What Your Email Address Reveals About You: LLMs and Digital Footprints (maximepeabody.com)
LLMs are famous for being trained on massive amounts of data. Some estimates for GPT-4, for example, put its training set at up to 1 petabyte. This data comes from crawling the open internet, as well as from collections of books, articles, scientific papers, and other sources.