Hacker News with Generative AI: Data

Trump admin ends extreme weather database (tracked cost of disasters since 1980) (cnn.com)
IPinfo started offering free unlimited country-level geolocation and ASN details (ipinfo.io)
Accurate country-level geolocation and ASN details for free. No monthly fees, no credit card required, and unlimited API requests.
pg_parquet v0.4.0: Google Cloud Storage, HTTPS storage, and more (crunchydata.com)
What began as a hobby Rust project to explore the PostgreSQL extension ecosystem and the Parquet file format has grown into a handy component for folks integrating Postgres and Parquet into their data architecture. Today, we’re excited to release version 0.4 of pg_parquet.
Databricks in talks to acquire startup Neon for about $1B (upstartsmedia.com)
Data and AI unicorn Databricks is in talks to make a splash with another startup acquisition, Upstarts has learned.
So much blood: But how much exactly? (dynomight.substack.com)
The Latest Trump and DOGE Casualty: Energy Data (propublica.org)
The Trump administration has eliminated or stifled critical data at dozens of federal agencies. Now the administration’s actions are hitting a new realm: the energy industry.
NOAA Datasets Will Soon Disappear (eos.org)
NOAA has quietly reported that they will soon decommission 14 datasets, products, and catalogs related to earthquakes and marine, coastal, and estuary science.
Low Background Steel – Content from Before AI (lowbackgroundsteel.ai)
Sources of data that haven’t been contaminated by AI-created content. Low Background Steel (and lead) is a type of metal uncontaminated by radioactive isotopes from nuclear testing. That steel and lead are usually recovered from ships that sank before the Trinity Test in 1945. This blog is about uncontaminated content that I'm terming "Low Background Steel".
Ask HN: Are there any apps to track grocery prices in local stores? (ycombinator.com)
With tariffs kicking in and imports slowing, I want to track the local impact at my grocery stores.
DOGE is building a master database for immigration enforcement, sources say (cnn.com)
Staffers from Elon Musk’s Department of Government Efficiency are building a master database to speed up immigration enforcement and deportations by combining sensitive data from across the federal government, multiple sources familiar with the plans tell CNN.
Major European institutes join race to save US science data (nature.com)
Several research institutes in Germany are joining a worldwide grassroots effort to save science data sets that researchers fear could be deleted or decommissioned on the orders of US President Donald Trump’s administration, Nature has learnt.
A Principled Approach to Querying Data – A Type-Safe Search DSL (claudiu-ivan.com)
The rise of local-first web applications demands a rethinking of traditional client-server architectures.
How safe is the air to breathe? 50M people in the US don't know (phys.org)
In 2024, more than 50 million people in the United States lived in counties with no air-quality monitoring, according to a new study from researchers in the Penn State College of Health and Human Development.
Table: NSF Grant Terminations in 2025 (airtable.com)
Full Text Search of US Court records (judyrecords.com)
740 million+ United States Court Cases
Wikipedia offers AI developers its article data on Kaggle to stop scraping (siliconangle.com)
The Wikimedia Foundation, the organization behind the internet’s largest free encyclopedia Wikipedia, is offering an artificial intelligence-ready dataset on Kaggle that’s aimed at dissuading AI companies and large language model trainers from scraping the website.
All Databases Are Just Files (tselai.com)
SQLite and DuckDB have earned their popularity in the data world, and for good reason. I’m a big fan of both. Their appeal is simple: they’re just files. You can see, copy, and move them around like any other file.
Tell HN: Warning Google trains on your data when using aistudio.google.com (ycombinator.com)
This applies even if you're using it with a Google account that has billing connected.
A weird phrase is plaguing scientific papers due to a glitch in AI training data (theconversation.com)
Earlier this year, scientists discovered a peculiar term appearing in published papers: “vegetative electron microscopy”.
Show HN: H-1B salary search without fuss (h1bsalaries.fyi)
This website indexes the Labor Condition Application (LCA) disclosure data from the United States Department of Labor (DOL).
Levels of configuration languages (tuxen.de)
Code is data and data is code. Years ago, I had a brief affair with Lisp and there I picked up this meme. Today, I believe there are also benefits in separating code and data.
Palantir Is Helping DOGE with a Massive IRS Data Project (wired.com)
Palantir, the software company cofounded by Peter Thiel, is part of an effort by Elon Musk’s so-called Department of Government Efficiency (DOGE) to build a new “mega API” for accessing Internal Revenue Service records, IRS sources tell WIRED.
Trustworthy AI Without Trusted Data (epfl.ch)
EPFL researchers developed a ground-breaking new tool to help build safer AI.
Google to embrace MCP (techcrunch.com)
Just a few weeks after OpenAI said it would adopt rival Anthropic’s standard for connecting AI models to the systems where data resides, Google is following suit.
Colossus: The secret ingredient behind Google Cloud's Rapid Storage (cloud.google.com)
As an object storage service, Google Cloud Storage is popular for its simplicity and scale, a big part of which is due to the stateless REST protocols that you can use to read and write data. But with the rise of AI and as more customers look to run data-intensive workloads, two major obstacles to using object storage are its higher latency and lack of file-oriented semantics.
All of the Data That Elon Musk's DOGE May Have on You and Your Family (gizmodo.com)
As Elon Musk’s Department of Government Efficiency burrows through the federal government like a tapeworm, it has increasingly requested (and been given) access to large amounts of information on the American public.
Our Privacy Act Lawsuit Against DOGE and OPM: Why a Judge Let It Move Forward (eff.org)
Last week, a federal judge rejected the government’s motion to dismiss our Privacy Act lawsuit against the U.S. Office of Personnel Management (OPM) and Elon Musk’s “Department of Government Efficiency” (DOGE). OPM is disclosing to DOGE agents the highly sensitive personal information of tens of millions of federal employees, retirees, and job applicants. This disclosure violates the federal Privacy Act, a watershed law that tightly limits how the federal government can use our personal information.
DOGE reportedly planning a hackathon to build 'mega API' for IRS data (techcrunch.com)
Elon Musk’s Department of Government Efficiency (DOGE) plans to host a hackathon next week focused on the creation of a “mega API” that will provide access to taxpayer data, according to Wired.
Capitol Trades: Tracking Stock Market Transactions of Politicians (capitoltrades.com)
Tracking Capitol Hill politicians' trades can provide valuable insights for your investment research — and we offer you a free solution to do just that.