Hacker News with Generative AI: Data Management

Modern CSV: Multi-Platform CSV File Editor and Viewer (moderncsv.com)
Modern CSV is a powerful CSV file editor/viewer application for Windows, Mac, and Linux. Professionals at all levels of technical proficiency use it to analyze data, check files for uploading to databases, modify configuration files, maintain customer lists, and more. We designed it to compensate for the deficiencies of spreadsheet programs in handling CSV/TSV/DSV/etc. files. We strive to create a user experience our customers describe as “blissful”. 
Will AI Agents Revolutionize How We Query and Use Data? (ycombinator.com)
Snowflake just announced AI Data Agents in Cortex, a new way to automate and streamline data workflows with AI.
A new approach to data handling between systems/for AI (github.com/dev-formata-io)
Stof is an efficient, governable, and accessible data format that is much simpler to use, offering fine-grained control and sandboxed data manipulation between computer systems without the need for additional application code, servers, or dependencies.
Apache Iceberg now supports geospatial data types natively (wherobots.com)
Geospatial solutions have long been treated as "special" because the technologies that modernized today's data ecosystem largely left geospatial data behind. That changes today. Thanks to the efforts of the Apache Iceberg and Parquet communities, we are excited to share that both Iceberg and Parquet now support geometry and geography (collectively, the GEO) data types.
Bulk inserts on ClickHouse: How to avoid overstuffing your instance (runportcullis.co)
As we hit the midway point of the second month in 2025, a lot of you might be starting to really dig in on new data initiatives and planning key infrastructure changes to your company’s data stack.
PostgreSQL Best Practices (speakdatascience.com)
PostgreSQL (Postgres) is one of the most powerful and popular relational database management systems available today. Whether you’re a database administrator, developer, or DevOps engineer, following best practices ensures optimal performance, security, and maintainability of your database systems.
Over 700M events/second: How Cloudflare makes sense of too much data (cloudflare.com)
Cloudflare's network provides an enormous array of services to our customers. We collect and deliver associated data to customers in the form of event logs and aggregated analytics. As of December 2024, our data pipeline is ingesting up to 706M events per second generated by Cloudflare's services, and that represents 100x growth since our 2018 data pipeline blog post.
Datawave: Open source data fusion across structured and unstructured datasets (code.nsa.gov)
DataWave is a Java-based ingest and query framework that leverages Apache Accumulo to provide fast, secure access to your data.
Apple Passwords is hostile to backups (lapcatsoftware.com)
In my view, a useful backup system must be (1) chronological, (2) granular, and (3) redundant. A chronological backup system includes multiple historical snapshots of your data, allowing you to recover not only the latest version of your data but also past data that has been deleted or edited. A granular backup system allows you to selectively recover specific fragments of data from your backup without disturbing, deleting, or corrupting the rest of your current data. A redundant backup system keeps multiple copies of your data, so that the loss or corruption of any single copy is not fatal.
Data Branching for Batch Job Systems (isaacjordan.me)
Data is being increasingly treated like code has been treated for decades. For many use-cases it isn't enough to know "What is the current value?" but also "What was the value previously?", "Who last changed the value?", and "Why did they change the value?"
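The questions in the excerpt amount to storing each value together with its history and provenance. A minimal sketch of such a versioned store (all names hypothetical, not from the article):

```python
from dataclasses import dataclass, field

@dataclass
class Revision:
    value: object
    author: str
    reason: str

@dataclass
class VersionedStore:
    # key -> list of revisions, oldest first
    _history: dict = field(default_factory=dict)

    def set(self, key, value, author, reason):
        self._history.setdefault(key, []).append(Revision(value, author, reason))

    def current(self, key):
        return self._history[key][-1].value

    def previous(self, key):
        return self._history[key][-2].value

    def last_change(self, key):
        rev = self._history[key][-1]
        return rev.author, rev.reason

store = VersionedStore()
store.set("price", 100, author="alice", reason="initial import")
store.set("price", 90, author="bob", reason="promo discount")
print(store.current("price"))      # 90
print(store.previous("price"))     # 100
print(store.last_change("price"))  # ('bob', 'promo discount')
```

Real systems (Git-style data branching, bitemporal tables) add branching and merging on top of exactly this append-only revision log.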
Storage is cheap, but not thinking about logging is expensive (counting-stuff.com)
The bad habits of data over-collection run deep.
InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License (influxdata.com)
Today we’re excited to announce the alpha release of InfluxDB 3 Core (download), the new open source product in the InfluxDB 3 product line along with InfluxDB 3 Enterprise (download), a commercial version that builds on Core’s foundation.
ZFS 2.3 released with ZFS raidz expansion (github.com/openzfs)
We are excited to announce the release of OpenZFS 2.3.0.
Gmvault: Backup and restore your Gmail account (ycombinator.com)
Parquet and ORC's many shortfalls for machine learning, and what to do about it? (starburst.io)
At the turn of the century (around a quarter of a century ago), over 99% of the data management industry used row-oriented storage to store data for all workloads involving structured data — including transactional and analytical workloads.
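The row-versus-column distinction the excerpt refers to can be sketched in a few lines — rows keep each record contiguous, while columnar layouts (Parquet, ORC) keep each attribute contiguous:

```python
# Row-oriented: each record stored contiguously -- good for
# transactional reads/writes of whole records.
rows = [
    {"id": 1, "amount": 10.0, "region": "EU"},
    {"id": 2, "amount": 25.5, "region": "US"},
]

# Column-oriented: each attribute stored contiguously -- good for
# analytical scans that touch only a few columns.
def to_columns(rows):
    cols = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            cols[key].append(value)
    return cols

cols = to_columns(rows)
print(cols["amount"])       # [10.0, 25.5]
print(sum(cols["amount"]))  # scan one column, not whole records
```

The article's point is that machine-learning workloads stress this layout differently than classic analytics, which is where Parquet and ORC fall short.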
Using watermarks to coordinate change data capture in Postgres (sequinstream.com)
In change data capture, consistency is paramount. A single missing or duplicate message can cascade into time-consuming bugs and erode trust in your entire system. The moment you find a record missing in the destination, you have to wonder: is this the only one? How many others are there?
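One way to reason about the missing/duplicate problem is a watermark over monotonically increasing sequence numbers (e.g. Postgres LSNs): everything at or below the watermark is accounted for, so anything below it is a duplicate and any gap above it means messages are missing. A simplified sketch of that bookkeeping (not Sequin's actual implementation):

```python
class WatermarkTracker:
    """Track the highest contiguous sequence number seen (the watermark).
    Sequence numbers at or below the watermark are duplicates; gaps
    above it signal missing messages that must arrive before it can
    advance."""
    def __init__(self):
        self.watermark = 0   # highest contiguous seq processed
        self.pending = set() # out-of-order seqs seen above the watermark

    def observe(self, seq: int) -> str:
        if seq <= self.watermark or seq in self.pending:
            return "duplicate"  # already accounted for -> safe to skip
        self.pending.add(seq)
        # advance the watermark over any now-contiguous run
        while self.watermark + 1 in self.pending:
            self.watermark += 1
            self.pending.remove(self.watermark)
        return "ok"

t = WatermarkTracker()
assert t.observe(1) == "ok" and t.watermark == 1
assert t.observe(3) == "ok" and t.watermark == 1  # gap: 2 is missing
assert t.observe(2) == "ok" and t.watermark == 3  # gap filled, watermark advances
assert t.observe(2) == "duplicate"
```

The article applies the same idea at the database level, using watermark messages written through Postgres itself to fence off table scans from the replication stream.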
Managing Data Corruption in the Cloud (mongodb.com)
MongoDB has been named a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud Database Management Systems (DBMSs) for the third consecutive year.
Tell HN: Deduplicating a 10.4 TiB game preservation archive (WIP) (ycombinator.com)
Is stuff online worth saving? (rubenerd.com)
Related to my post about self-hosted bookmarking tools (thanks to everyone for the suggestions!), I’ve been exporting my bookmarks from various sites so I can eventually aggregate them into one place. It’s a lot of work, and it might mostly be for naught.
Ask questions of SQLite databases and CSV/JSON files in your terminal (simonwillison.net)
I built a new plugin for my sqlite-utils CLI tool that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.
DELETEs Are Difficult (boringsql.com)
Your database is ticking along nicely - until a simple DELETE brings it to its knees. What went wrong? While we tend to focus on optimizing SELECT and INSERT operations, we often overlook the hidden complexities of DELETE. Yet, removing unnecessary data is just as critical. Outdated or irrelevant data can bloat your database, degrade performance, and make maintenance a nightmare. Worse, retaining some types of data without valid justification might even lead to compliance issues.
NASA SC24: NASA-GPT: Searching the Entire NASA Technical Reports Server Using AI (nasa.gov)
Researchers at NASA’s Ames Research Center in Silicon Valley are using artificial intelligence (AI) tools to create a data-sharing resource for the agency’s scientific and engineering staff.
Snowflake opens chat-driven access to enterprise and third-party data (theregister.com)
Snowflake is set to preview a new platform it claims will help organizations build chatbots that can serve up data from its own analytics systems and those external to the cloud data platform vendor.
Netflix's Distributed Counter Abstraction (netflixtechblog.com)
In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction. This counting service, built on top of the TimeSeries Abstraction, enables distributed counting at scale while maintaining similar low latency performance. As with all our abstractions, we use our Data Gateway Control Plane to shard, configure, and deploy this service globally.
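The core trick behind counting at this scale is avoiding a single hot counter. A classic sharded-counter sketch illustrates the idea (this is the textbook pattern, not Netflix's actual design, which layers counts on top of their TimeSeries Abstraction):

```python
import random

class ShardedCounter:
    """Classic sharded counter: each increment goes to one of N shards
    to avoid contention on a single hot row; reads sum all shards."""
    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        shard = random.randrange(len(self.shards))  # spread write load
        self.shards[shard] += amount

    def count(self):
        # Reads pay the cost of aggregation; writes stay cheap.
        return sum(self.shards)

c = ShardedCounter()
for _ in range(1000):
    c.increment()
print(c.count())  # 1000
```

The trade-off is the same one the blog post navigates: cheap, contention-free writes in exchange for aggregation work (and possible staleness) at read time.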
Utilizing HubSpot's APIs to Break Down Data Silos (accelant.com)
I’ve seen this movie countless times: Business Has Tons of Useful Data That’s Scattered Across Systems and Isn’t Being Used for Much (or anything).
Evolving a NoSQL Database Schema (karmanivero.us)
In a NoSQL environment, Entity Manager organizes the physical distribution of data to support efficient query operations.
Get me out of data hell (mataroa.blog)
It is 9:59 AM in Melbourne, 9th October, 2024. Sunlight filters through my windows, illuminating swirling motes of dust across my living room. There is a cup of tea in my hand. I take a sip and savor it.
Make It Ephemeral: Software Should Decay and Lose Data (pocoo.org)
Most software that exists today does not forget. Creating software that remembers is easy, but designing software that deliberately “forgets” is a bit more complex. By “forgetting,” I don't mean losing data because it wasn’t saved or losing it randomly due to bugs. I'm referring to making a deliberate design decision to discard data at a later time. This ability to forget can be an incredibly beneficial property for many applications. Most importantly, software that forgets enables different user experiences.
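The simplest form of deliberate forgetting is attaching an expiry to every piece of data, in the style of a TTL cache. A minimal sketch (hypothetical names, assuming expiry-on-read semantics):

```python
import time

class ForgetfulStore:
    """Key-value store where every entry carries an expiry time.
    Expired entries are dropped on access -- a deliberate design
    decision to discard data, not an accident."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # forget on read
            return default
        return value

s = ForgetfulStore()
s.set("session", "abc123", ttl_seconds=0.05)
assert s.get("session") == "abc123"
time.sleep(0.06)
assert s.get("session") is None  # deliberately forgotten
```

Production systems push the same idea down into storage (Redis `EXPIRE`, DynamoDB TTL, log retention policies) so forgetting happens even when nobody reads the key.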
Nulls: Revisiting null representation in modern columnar formats (dl.acm.org)
Nulls are common in real-world data sets, yet recent research on columnar formats and encodings rarely addresses Null representation.
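One common representation the paper's topic evokes is the validity bitmap used by Apache Arrow: a dense value array plus one presence bit per row. A toy sketch of that layout:

```python
class NullableColumn:
    """Columnar layout in the style of Apache Arrow: a dense array of
    values plus a validity bitmap, one bit per row (1 = present,
    0 = null, with a placeholder occupying the value slot)."""
    def __init__(self, values):
        self.data = []
        self.validity = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v is None:
                self.data.append(0)  # placeholder slot for the null
            else:
                self.data.append(v)
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def get(self, i):
        return self.data[i] if self.is_valid(i) else None

col = NullableColumn([10, None, 30])
assert [col.get(i) for i in range(3)] == [10, None, 30]
assert col.is_valid(0) and not col.is_valid(1)
```

The design space the paper explores includes alternatives to this placeholder scheme, such as compacting out the null slots entirely, each with different compression and scan-speed trade-offs.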
A brief history of Notion's data catalog (notion.so)
Over the past few years, the number of data assets and systems Notion uses has skyrocketed.