Hacker News with Generative AI: Data

Stock Market and Financial Data API (financialdata.net)
Our API provides access to a variety of financial information.
Larry Ellison wants to put all America's data in AI, including DNA (theregister.com)
If governments want AI to improve services and security for their citizens, then they need to put all their information in one place – even citizens’ genomic data – according to Larry Ellison, the Oracle database tycoon.
Federal data is disappearing. On Thursday, meet the teams working to rescue it (muckrock.com)
Since the start of the new Trump administration, hundreds of federal data sets and government websites have gone offline without warning, sometimes returning with major changes and sometimes not returning at all.
NOAA's public weather data powers the local forecasts on your phone and TV (theconversation.com)
When a hurricane or tornado starts to form, your local weather forecasters can quickly pull up maps tracking its movement and showing where it’s headed. But have you ever wondered where they get all that information?
Musk Team Cuts Education Department Arm That Tracks National School Performance (propublica.org)
The Trump administration has terminated more than $900 million in Education Department contracts, taking away a key source of data on the quality and performance of the nation’s schools.
Smuggling arbitrary data through an emoji (paulbutler.org)
USASpending.gov (usaspending.gov)
A 16TB Mirror of Data.gov on Source.Coop (source.coop)
Source Cooperative is a Radiant Earth initiative.
Announcing the data.gov archive (law.harvard.edu)
Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov.
Data Hoarders Are Rushing to Save Vanishing US Health Records (bloomberg.com)
A grassroots effort to preserve US government data takes shape
US health websites, datasets taken down as agencies comply with Trump executive orders (cnn.com)
CDC datasets uploaded before January 28th, 2025 (archive.org)
An archive of all CDC datasets uploaded to https://data.cdc.gov/browse before January 28th, 2025. Excludes corrupt datasets and data not publicly accessible.
Elon Musk's staff has been caught installing drives inside the OPM office (bsky.app)
Turn any topic into an interactive timeline (0xmmo.co)
Turn any topic into an interactive chronological timeline.
CDC data are disappearing (theatlantic.com)
Last night, scientists began to hear cryptic and foreboding warnings from colleagues: Go to the CDC website, and download your data now. They were all telling one another the same thing: Data on the website were about to disappear, or be altered, to comply with the Trump administration’s ongoing attempt to scrub federal agencies of any mention of gender, DEI, and accessibility.
POTUS Tracker – Executive Orders, Presidential Schedule, Signed Legislation (potustracker.us)
POTUS Tracker experienced significant downtime on 1/28 due to an overloaded server.
Show HN: Spice.ai OSS 1.0 – data query and AI-inference engine built in Rust (spiceai.org)
🎉 Today marks the 1.0-stable release of Spice.ai Open Source—purpose-built to help enterprises ground AI in data. By unifying federated data query, retrieval, and AI inference into a single engine, Spice mitigates AI hallucinations, accelerates data access for mission-critical workloads, and makes it simple and easy for developers to build fast and accurate data-intensive applications across cloud, edge, or on-prem.
r/DataHoarder: The White House Is Removing Everything (reddit.com)
GM faces ban on selling driver data that can be used to raise insurance rates (arstechnica.com)
GM sold geolocation and other driving data without adequate consent, FTC says.
Zuckerberg appeared to know Llama trained on Libgen (rollingstone.com)
The AI rush has brought with it thorny questions of copyright and ownership of data as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as they worked to integrate such tools into Facebook and Instagram.
Ask HN: Is there money in RAG? (ycombinator.com)
I see that the market strength of RAG-based solutions is in internal (proprietary) repositories of data.
GM parks claims driver location data was given to insurers, pushing up premiums (theregister.com)
General Motors on Thursday said that it has reached a settlement with the FTC "to address privacy concerns about our now-discontinued Smart Driver program."
General Motors Is Banned from Selling Driving Behavior Data for 5 Years (nytimes.com)
The Federal Trade Commission said on Thursday that it had reached a settlement with General Motors that would ban the automaker from providing drivers’ behavior and geolocation data to consumer reporting agencies.
43K fewer drivers on Manhattan roads after congestion pricing turned on (gothamist.com)
Meta Confesses to Training Llama with Pirated LibGen Data [pdf] (courtlistener.com)
Levels.fyi's annual compensation report 2024 (levels.fyi)
Levels.fyi's annual compensation report. View top paying companies, cities, titles & other trends.
Small Data [video] (youtube.com)
25% of the top websites are blocking OpenAI from crawling (originality.ai)
Bots such as OpenAI’s GPTBot, the Applebot, CCBot, Google-Extended, and Bytespider analyze, store, or scrape your website’s data in order to provide data to train more advanced LLMs.
Brief Introduction to FIX and FIX JSON (fixparser.dev)
The FIX Protocol (Financial Information Exchange) is a standardized messaging system for real-time electronic communication of trade-related information in financial markets.
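To make the FIX snippet above more concrete, here is a minimal sketch of the classic tag=value encoding. It assumes FIX 4.2 framing: fields separated by the SOH character (0x01), BeginString (8) and BodyLength (9) first, and CheckSum (10) last. The helper names `build_message`, `parse_message` and `checksum` are hypothetical. It also leaves out the session header fields (49, 56, 34, 52) that a real counterparty would require. FIX JSON, the other encoding in the article title, is not shown.

```python
# Minimal sketch of FIX classic tag=value encoding (assumes FIX 4.2 framing).
# Helper names are hypothetical; session header fields (49, 56, 34, 52) are omitted.
SOH = "\x01"  # field delimiter in tag=value FIX


def checksum(data: str) -> str:
    """CheckSum (tag 10): byte sum of everything before '10=', mod 256, as 3 digits."""
    return f"{sum(data.encode('ascii')) % 256:03d}"


def build_message(msg_type: str, fields: list[tuple[int, str]], begin: str = "FIX.4.2") -> str:
    """Assemble 8=..|9=..|35=..|<fields>|10=.. with SOH delimiters."""
    # BodyLength (9) counts from the start of tag 35 up to and including
    # the SOH that precedes tag 10.
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    head = f"8={begin}{SOH}9={len(body.encode('ascii'))}{SOH}"
    return head + body + f"10={checksum(head + body)}{SOH}"


def parse_message(raw: str) -> dict[int, str]:
    """Split a tag=value message into {tag: value}.

    Simplification: repeating groups (the same tag occurring several times)
    are not supported; later occurrences overwrite earlier ones.
    """
    fields: dict[int, str] = {}
    for pair in raw.rstrip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields
```

For example, `build_message("D", [(55, "IBM"), (54, "1"), (38, "100")])` builds a NewOrderSingle (35=D) to buy (54=1) 100 shares (38) of IBM (55), and `parse_message` recovers those fields by tag number. Because each field is a flat, delimiter-separated pair, parsing is cheap, which is one reason the format suits latency-sensitive trading systems.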