Hacker News with Generative AI: Data Analysis

Optimizing a large SQLite database for reading (jacobfilipp.com)
I recently needed to speed up a simple read query on a large SQLite file (620 MB) for my grocery price CSV export tool.
Show HN: Search and analyze millions of SEC filings with AI. (publicview.ai)
Exploring UK Environment Agency Data with DuckDB and Rill (rmoff.net)
The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this data, I wanted to understand the model of it first.
A data analysis of speeches at the Oscars (stephenfollows.com)
Some see the Oscars as the pinnacle of artistic achievement, a night of cinematic excellence that honours the best of the best.
Show HN: Telescope – an open-source web-based log viewer for logs in ClickHouse (github.com/iamtelescope)
Telescope is a web application designed to provide an intuitive interface for exploring log data. It is built to work with any type of logs, as long as they are stored in ClickHouse. Users can easily configure connections to their ClickHouse databases and run queries to filter, search, and analyze logs efficiently.
ChatGPT clicks convert 6.8x higher than Google organic (medium.com)
Here’s the deal: I recently dug into some data in GA4 for our website and found that while Google Organic brings in more traffic, ChatGPT clicks convert way better, 6.8X better for free trial conversions, to be exact.
What do people see when they're tripping? Analyzing Erowid's trip reports (themicrodose.substack.com)
The existence of synesthesia blew Sean Noah’s mind the first time he learned about it in high school biology class.
Winners of the $10k ISBN visualization bounty (annas-archive.org)
A few months ago we announced a $10,000 bounty to make the best possible visualization of our data showing the ISBN space. We emphasized showing which files we have/haven’t archived already, and we later added a dataset describing how many libraries hold ISBNs (a measure of rarity).
Show HN: I scrape Steam data every month and it's yours to download for free (gginsights.io)
Leverage the power of AI to help answer your questions about the Steam market and become a data expert, transforming data into actionable insights.
The Deep Research problem (ben-evans.com)
Most of what I do for a living is research and analysis. I think of data I’d like to see and go looking for it; I compile and collate it, make charts, decide they’re boring and try again, find new ways and new data to understand and explain the issue, and produce text and charts that try to express what I’m thinking. Then I go and talk to people about it.
Rust, C++, and Python trends in jobs on Hacker News (February 2025) (wojtczyk.de)
How are Rust, C++, and Python trending on Hacker News in the job market?
Show HN: I analyzed 1500+ job ads to find the most wanted skills by recruiters (skillsets.tech)
Discover the most wanted skills by recruiters
Augurs demo (augu.rs)
augurs is a time series analysis library for Rust with bindings for JavaScript. It provides a set of tools for analyzing time series data, including clustering, outlier detection, forecasting, and changepoint detection.
Modern CSV: Multi-Platform CSV File Editor and Viewer (moderncsv.com)
Modern CSV is a powerful CSV file editor/viewer application for Windows, Mac, and Linux. Professionals at all levels of technical proficiency use it to analyze data, check files for uploading to databases, modify configuration files, maintain customer lists, and more. We designed it to compensate for the deficiencies of spreadsheet programs in handling CSV/TSV/DSV/etc. files. We strive to create a user experience our customers describe as “blissful”. 
Numberholders Age 100 or Older Who Did Not Have Death Information on Numident [pdf] (oig.ssa.gov)
How a computer that 'drunk dials' videos is exposing YouTube's secrets (bbc.com)
YouTube is about to turn 20. An unusual research method is unveiling statistics about the platform that Google doesn't want you to know.
I pulled data on 1378 restaurants from Google Maps to rank them in order (mattsayar.com)
See how this list was created. This page was last updated February 12, 2025.
Where are the best restaurants in my city? A statistical analysis (mattsayar.com)
Everyone wants to know the best places to eat, but the "best place" is inherently subjective.
Reclassification is making US tech job losses look worse than they are (theregister.com)
The latest job numbers from the US Bureau of Labor Statistics make IT hiring look like it's in freefall, but that's not the case at all, says consultancy firm Janco.
Sea level in Honolulu (1905-2025) has been 1.56mm per year (noaa.gov)
The relative sea level trend is 1.56 millimeters/year, based on monthly mean sea level data from 1905 to 2025.
Latest jobs data shows decline in tech jobs over the last year (twitter.com)
Competition and survival in modern academia: A bibliometric case study (arxiv.org)
We study the career lengths of researchers in theoretical high-energy physics from 1950 to 2020. Using a cohort-based analysis and bibliometric data from 30,149 authors in three physics disciplines we observe a dramatic increase in the ratio of academic dropouts over time.
You're missing your near misses (surfingcomplexity.blog)
FAA data shows 30 near-misses at Reagan Airport – NPR, Jan 30, 2025
Google Says "Links Matter Less"–We Looked at 1M SERPs to See If It's True (ahrefs.com)
Threads Drives 73.6% More Engagement Than X (buffer.com)
We analyzed 10.2 million posts published to X and Threads in 2024, breaking down their engagement rates, trends, and strengths so you know where to focus your content strategy — and why cross-posting might be the key to maximizing your reach.
I used Google Gemini Pro to compare party manifestos for an Indian election (github.com/shijithpk)
Collecting code and docs here related to some work on party manifestos for Delhi assembly elections in 2025.
Exploring Nine Simultaneous Transients on April 12th, 1950 (2021) (nature.com)
Macrodata Refinement (lumon-industries.com)
AI and Palantir are reshaping how we fight crime (thetimes.com)
FAA data shows 30 near-misses at Reagan Airport (npr.org)
When pilots report near-midair collisions around Reagan National Airport, there's often a military aircraft involved, an NPR analysis of Federal Aviation Administration data shows.