Hacker News with Generative AI: Data Analysis

DEDA – Tracking Dots Extraction, Decoding and Anonymisation Toolkit (github.com/dfd-tud)
Document Colour Tracking Dots, or yellow dots, are small systematic dots which encode information about the printer and/or the printout itself. This process is integrated in almost every commercial colour laser printer. This means that almost every printout contains coded information about the source device, such as the serial number.
The R Inferno (2011) [pdf] (burns-stat.com)
Most promoted and blocked domains on Kagi (kagi.com)
Kagi Search Stats
CNCF Git Data Miner (github.com/cncf)
This is the Cloud Native Computing Foundation's fork of Jon Corbet and Greg KH's gitdm tool for calculating contributions based on developers and their companies.
Matrix Profiles (aneksteind.github.io)
Lately I’ve been thinking about time series analysis to aid in Reflect’s insights features. Towards this end, I’ve had a Hacker News thread about anomaly detection bookmarked in Later. I finally got to looking at it and there was a comment that mentioned the article left out matrix profiles, which I had never heard of, so I decided to look into them.
Some Reflections After a Month of Tracking My Own Online Activity (mcwhittemore.com)
Since 8:38 PM on February 22nd, I’ve been recording all my browsing activity in a database I manage using a custom-built browser extension and a wrapper around @rosskevin/ifvisible. The result? I now have a clear picture of just how much time I’ve spent on the web this past month. And, well… I spend a lot of time reading email. Go figure.
The persistent mischaracterization of Google and Facebook A/B tests (sciencedirect.com)
Marketing research has increasingly relied on online platform studies, which are studies conducted in a naturalistic online environment and which leverage the A/B testing tool provided by platforms such as Facebook or Google Ads.
A glitch in an online survey replaced the word 'yes' with 'forks' (pewresearch.org)
At Pew Research Center, we routinely ask the people who take our surveys to give us feedback about their experience. Were the survey questions clear? Were they engaging? Were they politically neutral?
Honking Complaints Plunge 69% Inside Congestion Pricing Zone (thecity.nyc)
Honking-mad motorists are laying off the horn in the core of Manhattan since the January launch of congestion pricing, data reveals — with New Yorkers’ beefs about blaring horns plummeting nearly 70% from the same time last year.
The Business of Phish (2013) (priceonomics.com)
Over the past four years, the rock band Phish has generated over $120 million in ticket sales, handily surpassing more well known artists like Radiohead, The Black Keys, and One Direction.
Ask HN: Is anyone with AI expertise analyzing JFK files? (ycombinator.com)
The JFK Files, some of which were just released today, are thousands of text-based PDF files. They seem like a really good match for the capabilities of current LLMs.
Password reuse is rampant: nearly half of observed user logins are compromised (cloudflare.com)
Based on Cloudflare's observed traffic between September and November 2024, 41% of successful logins across websites protected by Cloudflare involve compromised passwords.
Cloudflare Analyzes Login Credentials (benjojo.co.uk)
Publications from 2025 are shared more on Bluesky than on X/Twitter (bsky.app)
Using traditional ML and LLMs to analyze Executive Orders (1789 – 2025) (hyperarc.com)
Executive orders have been making the news recently, but aside from basic counts and individual analysis, it’s been hard to make sense of the entirety of all 11,000 accessible documents — especially for numerical analysis and trending. Thankfully we have LLMs to help with that.
DOGE Makes Its Latest Errors Harder to Find (nytimes.com)
Elon Musk’s Department of Government Efficiency has repeatedly posted error-filled data that inflated its success at saving taxpayer money. But after a series of news reports called out those mistakes, the group changed its tactics.
Optimizing a large SQLite database for reading (jacobfilipp.com)
I recently needed to speed up a simple read query on a large SQLite file (620Mb), for my grocery price CSV export tool.
What if football championships were lineal? (ufnc.xyz)
Starting from the Italian championship, you can follow titles, defenses and challenges.
Show HN: Search and analyze millions of SEC filings with AI. (publicview.ai)
Exploring UK Environment Agency Data with DuckDB and Rill (rmoff.net)
The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this data, I wanted to understand the model of it first.
A data analysis of speeches at the Oscars (stephenfollows.com)
Some see the Oscars as the pinnacle of artistic achievement, a night of cinematic excellence that honours the best of the best.
Show HN: Telescope – an open-source web-based log viewer for logs in ClickHouse (github.com/iamtelescope)
Telescope is a web application designed to provide an intuitive interface for exploring log data. It is built to work with any type of logs, as long as they are stored in ClickHouse. Users can easily configure connections to their ClickHouse databases and run queries to filter, search, and analyze logs efficiently.
ChatGPT clicks convert 6.8x higher than Google organic (medium.com)
Here’s the deal: I recently dug into some data in GA4 for our website and found that while Google Organic brings in more traffic, ChatGPT clicks convert way better, 6.8X better for free trial conversions, to be exact.
What do people see when they're tripping? Analyzing Erowid's trip reports (themicrodose.substack.com)
The existence of synesthesia blew Sean Noah’s mind the first time he learned about it in high school biology class.
Winners of the $10k ISBN visualization bounty (annas-archive.org)
A few months ago we announced a $10,000 bounty to make the best possible visualization of our data showing the ISBN space. We emphasized showing which files we have/haven’t archived already, and we later added a dataset describing how many libraries hold ISBNs (a measure of rarity).
Show HN: I scrape Steam data every month and it's yours to download for free (gginsights.io)
Leverage the power of AI to help answer your questions about the Steam market and become a data expert, transforming data into actionable insights.
The Deep Research problem (ben-evans.com)
Most of what I do for a living is research and analysis. I think of data I’d like to see and go looking for it; I compile and collate it, make charts, decide they’re boring and try again, find new ways and new data to understand and explain the issue, and produce text and charts that try to express what I’m thinking. Then I go and talk to people about it.
Rust, C++, and Python trends in jobs on Hacker News (February 2025) (wojtczyk.de)
How are Rust, C++, and Python trending on Hacker News in the job market?
Show HN: I analyzed 1500+ job ads to find the most wanted skills by recruiters (skillsets.tech)
Discover the most wanted skills by recruiters
Augurs demo (augu.rs)
augurs is a time series analysis library for Rust with bindings for JavaScript. It provides a set of tools for analyzing time series data, including clustering, outlier detection, forecasting, and changepoint detection.