Hacker News with Generative AI: Data

Google and the Art of Weaponizing Privacy (vincentschmalbach.com)
Every time Google makes a "privacy" change, competitors mysteriously lose access to data while Google's own data empire grows stronger.
Boring Iceberg Catalog – 1 JSON file. 0 Setup (juhache.substack.com)
TeleMessage Explorer: a new open source research tool (micahflee.com)
I've spent the last week or two writing code to make sense of the massive hack of data from TeleMessage, the comically insecure company that makes a modified Signal app that Trump's former national security advisor Mike Waltz was caught using. I've decided to publish my code as open source in the hopes that other journalists will use it to find revelations in this dataset.
Is TfL losing the battle against heat on the Victoria line? (swlondoner.co.uk)
The Victoria Line stubbornly remains the hottest underground line, according to new TfL data, resisting the numerous cooling efforts enacted by TfL and presenting little potential for change.
Music, teeth, and AI but not all at once (erikheintare.substack.com)
Hi! It’s Erik. My Slack bio says "I'm probably taller than you," but other than that, I’m a husband, father, and owner of two dogs. For the last 10+ years, I have swum in data waters. I currently lead engineering teams at Bolt, focusing on making Bolt's data usable to everyone. These letters will contain bits and pieces I’ve noticed over the last few weeks.
U.S. Spy Agencies – One-Stop Shop to Buy Your Personal Data (theintercept.com)
The ever-growing market for personal data has been a boon for American spy agencies. The U.S. intelligence community is now buying up vast volumes of sensitive information that would have previously required a court order, essentially bypassing the Fourth Amendment. But the surveillance state has encountered a problem: There’s simply too much data on sale from too many corporations and brokers.
What If Every Picture You've Ever Seen Already Exists? (ycombinator.com)
I was thinking recently about how images work at the data level, and it kind of broke my brain.
Where does your weather forecast come from? (text.npr.org)
Millions of Americans rely on weather forecasts every day.
Wikipedia's Most Translated Articles (sohom.dev)
This is a list of articles ranked by the number of Wikipedia language editions in which they appear.
California vanity license plate applications with reasons for rejection (2020) (github.com/veltman)
Warning: this dataset contains vulgar and offensive language (quite a lot of it).
Is the current state of querying observability data broken? (ycombinator.com)
I feel that current observability tooling significantly lags behind user expectations by failing to support a critical capability: querying across different telemetry signals.
Publisher: The Malloy Semantic Model Server (github.com/malloydata)
Welcome to Publisher, the open-source semantic model server for the Malloy data language.
Getting AI to write good SQL (cloud.google.com)
Organizations depend on fast and accurate data-driven insights to make decisions, and SQL is at the core of how they access that data. With Gemini, Google can generate SQL directly from natural language — a.k.a. text-to-SQL. This capability increases developers’ and analysts’ productivity and empowers non-technical users to interact directly with the data they need.
Airlines Are Collecting Your Data and Selling It to ICE (levernews.com)
A massive aviation industry clearinghouse that processes data for 12 billion passenger flights per year is selling that information to the Trump administration amid the White House’s new immigration crackdown, according to documents reviewed by The Lever.
NOAA says it will discontinue its billion-dollar disaster database (scrippsnews.com)
The National Oceanic and Atmospheric Administration announced Thursday it will archive its database of billion-dollar climate disasters, as the Trump administration reduces the resources available to the agency.
Trump admin ends extreme weather database (tracked cost of disasters since 1980) (cnn.com)
IPinfo started offering free unlimited country-level geolocation and ASN details (ipinfo.io)
Accurate country-level geolocation and ASN details for free. No monthly fees, no credit card required, and unlimited API requests.
Pg_parquet v0.4.0: Google Cloud Storage, HTTPS storage, and more (crunchydata.com)
What began as a hobby Rust project to explore the PostgreSQL extension ecosystem and the Parquet file format has grown into a handy component for folks integrating Postgres and Parquet into their data architecture. Today, we’re excited to release version 0.4 of pg_parquet.
Databricks in talks to acquire startup Neon for about $1B (upstartsmedia.com)
Data and AI unicorn Databricks is in talks to make a splash with another startup acquisition, Upstarts has learned.
So much blood: But how much exactly? (dynomight.substack.com)
The Latest Trump and Doge Casualty: Energy Data (propublica.org)
The Trump administration has eliminated or stifled critical data at dozens of federal agencies. Now the administration’s actions are hitting a new realm: the energy industry.
NOAA Datasets Will Soon Disappear (eos.org)
NOAA has quietly reported that it will soon decommission 14 datasets, products, and catalogs related to earthquakes and marine, coastal, and estuary science.
Low Background Steel – Content from Before AI (lowbackgroundsteel.ai)
Sources of data that haven’t been contaminated by AI-created content. Low Background Steel (and lead) is a type of metal uncontaminated by radioactive isotopes from nuclear testing. That steel and lead is usually recovered from ships that sank before the Trinity Test in 1945. This blog is about uncontaminated content that I'm terming "Low Background Steel".
Ask HN: Are there any apps to track grocery prices in local stores? (ycombinator.com)
With tariffs kicking in and imports slowing, I want to track the local impact at my grocery stores.
DOGE is building a master database for immigration enforcement, sources say (cnn.com)
Staffers from Elon Musk’s Department of Government Efficiency are building a master database to speed up immigration enforcement and deportations by combining sensitive data from across the federal government, multiple sources familiar with the plans tell CNN.
Major European institutes join race to save US science data (nature.com)
Several research institutes in Germany are joining a worldwide grassroots effort to save science data sets that researchers fear could be deleted or decommissioned on the orders of US President Donald Trump’s administration, Nature has learnt.
A Principled Approach to Querying Data – A Type-Safe Search DSL (claudiu-ivan.com)
The rise of local-first web applications demands a rethinking of traditional client-server architectures.
How safe is the air to breathe? 50M people in the US don't know (phys.org)
In 2024, more than 50 million people in the United States lived in counties with no air-quality monitoring, according to a new study from researchers in the Penn State College of Health and Human Development.
Table: NSF Grant Terminations in 2025 (airtable.com)