Hacker News with Generative AI: Data Analysis

Discord Unveiled: A Comprehensive Dataset of Public Communication (2015-2024) (arxiv.org)
Discord has evolved from a gaming-focused communication tool into a versatile platform supporting diverse online communities.
Capalyze – Natural language data analysis (capalyze.ai)
Upload a spreadsheet, ask questions and get intelligent answers, and generate insights.
What Is This Thing Called Swing? (ds.mpg.de)
Jazz must swing - jazz musicians agree on that. However, even a century after the beginnings of jazz, there is still no general agreement on what exactly constitutes the swing feel. With a dedicated experiment and data analyses on more than 450 well-known jazz solos, we have tried to unravel some secrets of swing.
Show HN: Buckaroo – Data table UI for Notebooks (github.com/paddymul)
Buckaroo is a modern data table for Jupyter that expedites the most common exploratory data analysis tasks.
Show HN: Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java (github.com/moustafa-nasr)
Fahmatrix is a lightweight, modern Java library for working with tabular data, inspired by Python's Pandas and rooted in the idea of making data understanding (fahm) easy on the JVM.
Show HN: CSV GB+ by Data.olllo – Open and Process CSVs Locally (microsoft.com)
Backblaze Drive Stats for Q1 2025 (backblaze.com)
Welcome to the first Drive Stats of 2025. In case you missed it, the 2024 Drive Stats report was the last for long-time Drive Stats guru, Andy Klein, who is happily retired—off putting the “green” in greener pastures by working on his golf game. We–being Backblaze staff writer Stephanie Doyle and Chief Technical Evangelist Pat Patterson–are picking up where Andy left off, bringing you the metrics and analysis you know and love. Now, on to the numbers! 
Gmail to SQLite (github.com/marcboeker)
This is a script to download emails from Gmail and store them in a SQLite database for further analysis.
Launch HN: Nao Labs (YC X25) – Cursor for Data (ycombinator.com)
Hey HN, we’re Claire and Christophe from nao Labs (https://getnao.io/). We just launched nao, an AI code editor to work with data: a local editor, directly connected with your data warehouse, and powered by an AI copilot with built-in context of your data schema and data-specific tools.
Show HN: Using eBPF to see through encryption without a proxy (github.com/qpoint-io)
Qtap: An eBPF agent that captures pre-encrypted network traffic, providing rich context about egress connections and their originating processes.
Show HN: YouTube Time Machine – browser extension to find forgotten videos (frankmeeuwsen.com)
Did you know that the average YouTube video is viewed about 41 times? For a platform that seemingly features professional content creators and is the second most visited site after Google, this view count might seem disappointing. Or is something else going on? I’ll explain how this fact inspired me to create a browser extension that makes visible the videos that the YouTube algorithm keeps hidden.
Energy efficiency of heat pumps in residential buildings using operation data (nature.com)
As heat pumps become more prevalent in residential buildings, effective performance monitoring is essential.
Show HN: TextQuery – Query CSV, JSON, XLSX Files with SQL (textquery.app)
TextQuery is an all-in-one desktop app to import, query, modify, and visualize your raw data with SQL.
Internet usage pattern during power outage in Spain and Portugal (akamai-mpulse.com)
On Monday this week, the Iberian Peninsula suffered a major power outage that disabled many services across these countries. In this post I'll look at the patterns we saw in mPulse data during this time.
DuckDB is probably the most important geospatial software of the last decade (dbreunig.com)
What happens when you embed geospatial capabilities in generalist data tools? More people engaging with geo data.
Determining favorite t-shirt color using science (ostwilkens.se)
I'm looking to simplify my wardrobe, and the t-shirt is a staple. I like solid color t-shirts, and so the main differentiating factor is the color. But what color? There is only one way to find out. That is: create images of myself with different colored t-shirts, and evaluate them in an ELO-based arena.
Backstory to the Survivorship Bias Plane (yuxi-liu-wired.github.io)
I discover the exact backstory to that picture of an airplane with red dots on top of it.
The PCAP (weberblog.net)
For the last couple of years, I captured many different network and upper-layer protocols and published the pcaps along with some information and Wireshark screenshots on this blog. However, it always takes me some time to find the correct pcap when I am searching for a concrete protocol example. There are way too many pcaps out there.
Zipf's Law (wikipedia.org)
Zipf's law (/zɪf/; German pronunciation: [tsɪpf]) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the n-th entry is often approximately inversely proportional to n.
Normalizing Ratings (blogspot.com)
You Wouldn't Download a Hacker News (jasonthorsness.com)
And now I can analyze it with DuckDB. Behold the fraction of total comments and stories referencing key topics over time!
US Tariff Flow Analyzer (tradeflows.us)
We Found Insurance Fraud in Our Crash Data (levs.fyi)
When we set out to build geospatial risk scores for vehicle crashes at Matrisk AI, we never expected that a side-by-side look at Vehicle Identification Numbers and crash timelines would hint at possible insurance fraud. But data sometimes surprises you. Below, I’ll walk through how we stumbled upon this discovery, what we found, and why it might matter for anyone insuring vehicles.
Show HN: Deep Research across 30k GitHub repos (gitwiki.com)
Which repo would you like to understand?
How effective and safe are measles vaccines? (ourworldindata.org)
Data from large meta-analyses show that measles vaccination is highly effective and safe, reducing the chances of getting measles by 95%.
Web Browser telemetry – 2025 edition (sizeof.cat)
This is a re-release of my “world-renowned” Web Browser Telemetry - 2021 edition article, updated for 2025.
Show HN: Comparelists.org – Instantly Compare Two Lists, Find Differences (comparelists.org)
The easiest way to compare two lists online. Free tool to find matches, differences, duplicates and unique items between lists instantly. Support for TXT, CSV and Excel files.
How to Run Python in Production (ashishb.net)
My previous article recommended that one should reconsider using Python in production. However, there’s one category of use case where Python is the dominant option for running production workloads. And that’s data analysis and machine learning.
I analyzed chord progressions in 680k songs (cantgetmuchhigher.com)
ICE Hands Palantir Millions for Comprehensive Analysis of Known Groups (404media.co)
Last week Immigration and Customs Enforcement (ICE) paid contracting giant Palantir tens of millions of dollars to make modifications to a powerful ICE database and search tool to allow “complete target analysis of known populations” and to update the tool’s targeting and enforcement priorities, according to procurement records reviewed by 404 Media.