Show HN: Bodo – high-performance compute engine for Python data processing (github.com/bodo-ai) Bodo is a cutting-edge compute engine for large-scale Python data processing. Powered by an innovative auto-parallelizing just-in-time compiler, Bodo transforms Python programs into highly optimized, parallel binaries without requiring code rewrites, making it 20x to 240x faster than alternatives!
42 points by rebanevapustus 104 days ago | 11 comments
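The README describes a decorator-based JIT workflow: you write ordinary pandas-style Python and Bodo compiles and parallelizes it. A minimal sketch of that idea is below, assuming the `@bodo.jit` decorator from the project; the file name and column names are illustrative, not taken from the post.

```python
# Minimal sketch of Bodo's JIT workflow as described in the README:
# decorating a pandas-style function compiles it into a parallel binary.
# "sales.csv", "day", and "sales" are illustrative assumptions.
import bodo
import pandas as pd

@bodo.jit
def daily_totals(path):
    # Bodo parallelizes the CSV read and the groupby across cores
    df = pd.read_csv(path)
    return df.groupby("day")["sales"].sum()

print(daily_totals("sales.csv"))
```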
How to Flatten nested JSON arrays (datazip.io) Flattening nested JSON or MongoDB's BSON, normalizing semi-structured data, and writing analytical or everyday queries against it is a common challenge in data processing.
99 points by todsacerdoti 143 days ago | 114 comments
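One common way to do this kind of flattening in Python is pandas' `json_normalize`, which expands nested objects into dotted columns and explodes arrays via `record_path`. A small sketch follows; the sample document is illustrative and not taken from the article.

```python
# Flattening a nested document with pandas: nested objects become dotted
# columns, and the "items" array is exploded into one row per element.
import pandas as pd

doc = {
    "order_id": 1,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B2", "qty": 1},
    ],
}

# One row per array element, with parent fields repeated as metadata columns.
flat = pd.json_normalize(
    doc,
    record_path="items",
    meta=["order_id", ["customer", "name"], ["customer", "city"]],
)
print(flat)
```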
Parsing Gigabytes of JSON per Second (arxiv.org) JavaScript Object Notation or JSON is a ubiquitous data exchange format on the Web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible.
79 points by chenxi9649 164 days ago | 15 comments
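The paper describes simdjson, a SIMD-accelerated C++ parser. One way to try the same parser from Python is the third-party pysimdjson binding; the sketch below assumes that package is installed and simply compares it against the standard-library `json` baseline on synthetic data.

```python
# Sketch of using the SIMD-accelerated parser from the paper in Python,
# assuming the third-party pysimdjson binding (pip install pysimdjson).
import json
import time

import simdjson  # pysimdjson exposes the simdjson C++ parser

payload = json.dumps([{"id": i, "value": i * 0.5} for i in range(100_000)]).encode()

parser = simdjson.Parser()

start = time.perf_counter()
doc = parser.parse(payload)        # document backed by the simdjson parser
print(len(doc), "records via simdjson in", time.perf_counter() - start, "s")

start = time.perf_counter()
data = json.loads(payload)         # standard-library baseline for comparison
print(len(data), "records via json.loads in", time.perf_counter() - start, "s")
```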
I use Nim instead of Python for data processing (2021) (benjamindlee.com) Lazy programmers often prefer to substitute computing effort for programming effort. I am just such a programmer. For my research, I often need to design and run algorithms over large datasets ranging into the scale of terabytes. As a fellow at the NIH, I have access to Biowulf, a 100,000+ processor cluster, so it's usually not worth spending a ton of time optimizing single-threaded performance for a single experiment when I can just perform a big [MapReduce](https://en.wikipedia.org/wiki/MapReduce).
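The "throw cores at it instead of tuning a single thread" pattern the author mentions can be sketched with only the Python standard library. The per-record statistic below is an illustrative stand-in, not the author's actual workload.

```python
# Sketch of the map-reduce pattern described above, using multiprocessing.
# The line-length sum is an illustrative stand-in for a real per-record analysis.
from functools import reduce
from multiprocessing import Pool

def map_chunk(lines):
    # Map step: compute a partial result for one chunk of records.
    return sum(len(line) for line in lines)

def combine(a, b):
    # Reduce step: merge partial results.
    return a + b

if __name__ == "__main__":
    records = [f"record-{i}" for i in range(1_000_000)]
    chunks = [records[i::8] for i in range(8)]  # split work across 8 workers
    with Pool(processes=8) as pool:
        partials = pool.map(map_chunk, chunks)
    print("total characters:", reduce(combine, partials))
```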