Hacker News with Generative AI: Training Data

Ask HN: Is politeness towards LLMs good training data, or just expensive noise? (ycombinator.com)
Sam Altman recently said user politeness towards ChatGPT costs OpenAI "tens of millions" of dollars but is "money well spent."
The Unbelievable Scale of AI's Pirated-Books Problem (theatlantic.com)
Meta pirated millions of books to train its AI. Search through them here.
There's No Longer Any Doubt That Hollywood Writing Is Powering AI (theatlantic.com)
Dialogue from these movies and TV shows has been used by companies such as Apple and Anthropic to train AI systems.
SwiGLU activation function causes instability in FP8 LLM training (arxiv.org)
We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens, a 20-fold increase over previous limits.
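For context on the activation the headline refers to: SwiGLU is a gated unit that multiplies a Swish-activated projection by a second linear projection. A minimal NumPy sketch (the weight names `W` and `V` are illustrative, not from the paper):

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish activation: x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

def swiglu(x, W, V):
    # SwiGLU gated unit: Swish(x @ W) elementwise-multiplied by (x @ V).
    # The elementwise product of two projections is what can produce
    # large-magnitude outliers, which is problematic in low-precision
    # formats such as FP8.
    return swish(x @ W) * (x @ V)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))   # batch of 2, hidden size 4
W = rng.standard_normal((4, 8))   # gate projection
V = rng.standard_normal((4, 8))   # value projection
out = swiglu(x, W, V)
print(out.shape)  # (2, 8)
```

The multiplicative gating means activation magnitudes can grow roughly as the product of two terms, which narrows the usable dynamic range under FP8.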
Leaked Docs Show Nvidia Scraping a Human Lifetime of Videos per Day to Train AI (404media.co)
Apple, Nvidia, Anthropic Used Swiped YouTube Videos to Train AI (proofnews.org)
YouTube creators surprised to find Apple and others trained AI on their videos (arstechnica.com)
Figma will use your content to train its AI (stackdiary.com)
OpenAI destroyed a trove of books used to train AI models (businessinsider.com)