Hacker News with Generative AI: Statistics

P hacking – Five ways it could happen to you (nature.com)
It can happen so easily. You’re excited about an experiment, so you sneak an early peek at the data to see if the P value — a measure of statistical significance — has dipped below the threshold of 0.05. Or maybe you’ve tried analysing your results in several different ways, hoping one will give you that significant finding. These temptations are common, especially in the cut-throat world of publish-or-perish academia.
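The "early peek" described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the article's own analysis: the data are pure noise drawn from a standard normal distribution (so every "significant" result is a false positive), the test is a two-sided z-test with known variance, and the function names `two_sided_p` and `false_positive_rate` are hypothetical.

```python
import math
import random


def two_sided_p(sample_sum: float, n: int) -> float:
    """Two-sided z-test p-value for mean 0, assuming known variance 1."""
    z = sample_sum / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek_every: int, max_n: int = 100,
                        sims: int = 2000, seed: int = 1) -> float:
    """Share of null experiments that ever reach p < 0.05.

    The experimenter checks the p-value after every `peek_every`
    observations and stops as soon as it looks significant.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)  # the null is true: no real effect
            if n % peek_every == 0 and two_sided_p(total, n) < 0.05:
                hits += 1
                break
    return hits / sims


print("one look at n=100:", false_positive_rate(peek_every=100))
print("peek every 10 obs:", false_positive_rate(peek_every=10))
```

With a single look the rate should sit near the nominal 0.05, while repeated peeking pushes it well above that, even though nothing in the data has changed.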
Don't Die of Heart Disease (empirical.health)
Heart disease kills more people than all cancers combined—but it’s also the area of health most in your control. 80% of heart attacks can be avoided, and your risk is predictable using statistical models up to 30 years in advance.
Backstory to the Survivorship Bias Plane (yuxi-liu-wired.github.io)
I discover the exact backstory to that picture of an airplane with red dots on top of it.
Zipf's Law (wikipedia.org)
Zipf's law (/zɪf/; German pronunciation: [tsɪpf]) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the n-th entry is often approximately inversely proportional to n.
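As a minimal sketch of the relationship stated above, the idealized case can be written so that the n-th value is exactly the largest value divided by n. The function name `zipf_values` and the starting value 1200 are illustrative assumptions; real data follow the law only approximately.

```python
def zipf_values(top_value: float, count: int) -> list[float]:
    """Idealized Zipf sequence: the n-th value equals top_value / n."""
    return [top_value / n for n in range(1, count + 1)]


values = zipf_values(1200.0, 5)
print(values)  # [1200.0, 600.0, 400.0, 300.0, 240.0]
```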
Normalizing Ratings (blogspot.com)
Derivation and Intuition behind Poisson distribution (notion.site)
Liverpool's title win has completed a mysterious Fibonacci sequence (bbc.com)
Liverpool FC's victory at the weekend has clinched them their second Premier League title but it also resulted in something curious – producing a strange series of numbers in the league's record books.
Kids twice as likely to die if hit by SUV than car (rte.ie)
Pedestrians and cyclists are 44% more likely to die if they are hit by an SUV or similar-sized vehicle rather than a traditional car, a study has found.
Can LLMs do randomness? (rnikhil.com)
While LLMs theoretically understand “randomness,” their training data distributions may create unexpected patterns. In this article we will test different LLMs from OpenAI and Anthropic to see if they provide unbiased results. For the first experiment we will make it toss a fair coin and for the next, we will make it guess a number between 0-10 and see if it's equally distributed between even and odd. I know the sample sizes are small and probably not very statistically significant.
Drug Overdose Deaths in the United States, 2003–2023 (cdc.gov)
The age-adjusted rate of drug overdose deaths declined 4.0% between 2022 and 2023, which follows a nonsignificant increase between 2021 and 2022 (1). Previously, rates had generally increased across most years over the period 2003–2023.
Are 1/3 of American Millennials Flat Earthers? (stackexchange.com)
A Forbes article and the University of Melbourne, among other sources, claim “Only Two-Thirds Of American Millennials Believe The Earth Is Round”, which seems to imply that one third of American Millennials are flat Earthers or similar.
Economists don't know what's going on (economist.com)
The British government has launched an investigation into the Office for National Statistics. Last month the ONS found errors in some numbers that underpin its GDP calculations, and investors no longer trust its monthly jobs report. The episode hints at a wider trend: global economic data have become alarmingly poor.
San Francisco crime is down, way down (growsf.org)
Citywide crime in San Francisco is now at its lowest point in 23 years. And in the past year, San Francisco saw one of the biggest drops in crime among major U.S. cities, including a 45% drop in property crime in the first quarter of 2025, alone.
A puzzle of two unreliable sensors (wordpress.com)
Suppose you are trying to measure a value P and you have two unreliable sensors. Sensor A returns 0.5P + 0.5U, where U is uniform random noise over the same domain as P. Sensor B will return either P or U with 50% likelihood. In other words, sensor A is a noisy measurement of your variable, and B is sometimes the correct value and sometimes pure noise.
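The two sensor models above can be simulated directly. This is a sketch under assumptions the snippet does not state: the domain is taken to be [0, 1], P itself is drawn uniformly from it, and the helper names (`sensor_a`, `sensor_b`, `mean_abs_error`) are hypothetical.

```python
import random


def sensor_a(p: float, rng: random.Random) -> float:
    """Sensor A: an even blend of the true value and uniform noise."""
    return 0.5 * p + 0.5 * rng.random()


def sensor_b(p: float, rng: random.Random) -> float:
    """Sensor B: the true value half the time, pure noise otherwise."""
    return p if rng.random() < 0.5 else rng.random()


def mean_abs_error(sensor, trials: int = 100_000, seed: int = 0) -> float:
    """Average |reading - P| over many draws of P from [0, 1] (assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p = rng.random()
        total += abs(sensor(p, rng) - p)
    return total / trials


print("sensor A:", mean_abs_error(sensor_a))
print("sensor B:", mean_abs_error(sensor_b))
```

Under these assumptions both sensors have the same expected absolute error, 1/6, since the mean distance between two independent uniform values on [0, 1] is 1/3. The two differ in how that error is distributed, which is what makes combining them interesting.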
Markov Chain Monte Carlo Without All the Bullshit (2015) (jeremykun.com)
I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated.
Prevalence and Early Identification of ASD Among Children Aged 4 and 8 Years (cdc.gov)
Prevalence of ASD among children aged 8 years was higher in 2022 than previous years.
Monte Carlo Crash Course: Sampling (thenumb.at)
In the previous chapter, we assumed that we can uniformly randomly sample our domain. However, it’s not obvious how to actually do so—in fact, how can a deterministic computer even generate random numbers?
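One classic answer to the question above is a pseudorandom generator such as a linear congruential generator (LCG), sketched here for illustration. The linked chapter may well use a different generator; the class name `LCG` is hypothetical, and the multiplier and increment are the widely cited Numerical Recipes constants.

```python
class LCG:
    """Linear congruential generator: state -> (a * state + c) mod m."""

    MULTIPLIER = 1664525     # a
    INCREMENT = 1013904223   # c
    MODULUS = 2**32          # m

    def __init__(self, seed: int) -> None:
        self.state = seed % self.MODULUS

    def next_uniform(self) -> float:
        """Advance the state and map it to a float in [0, 1)."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state / self.MODULUS


rng = LCG(seed=42)
samples = [rng.next_uniform() for _ in range(5)]
print(samples)
```

The sequence is fully determined by the seed, so two generators created with the same seed produce identical samples. That makes results reproducible, and it is also why such generators are unsuitable for cryptography.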
Fashionable Nonsense. Behavioral Science Is Bullshit (thebaffler.com)
You’ve heard the rumors. People named Dennis are more likely to become dentists. If you do a little ritual before you go on stage, you’ll perform better. If you give your employees chocolate chip cookies, they will become, as if by magic, more motivated. What you think, and the judgments you make, are conditioned by “bias” that you need to overcome with data. With statistics. With science.
1 in Every 22 NYers Is a Millionaire (secretnyc.co)
Henley & Partners just released its 2025 World's Wealthiest Cities Report, announcing NYC as the wealthiest city in the world, yet again!
Cross-Entropy and KL Divergence (thegreenplace.net)
Cross-entropy is widely used in modern ML to compute the loss for classification tasks. This post is a brief overview of the math behind it and a related concept called Kullback-Leibler (KL) divergence.
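A minimal sketch of the two quantities for discrete distributions, using natural logarithms. The function names and the example distributions are illustrative assumptions rather than anything taken from the linked post, and the code assumes q is nonzero wherever p is nonzero.

```python
import math


def entropy(p: list[float]) -> float:
    """H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p: list[float], q: list[float]) -> float:
    """H(p, q) = -sum_i p_i * log(q_i); assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [1.0, 0.0, 0.0]   # one-hot label, as in a classification task
q = [0.7, 0.2, 0.1]   # hypothetical model prediction

print(cross_entropy(p, q))   # equals -log(0.7) for a one-hot label
print(kl_divergence(p, q))
```

The quantities are linked by the identity H(p, q) = H(p) + KL(p || q). For a one-hot label H(p) is zero, so minimizing cross-entropy is the same as minimizing the KL divergence.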
CPI for all items falls 0.1% in March, up 2.4% YoY (bls.gov)
The Consumer Price Index for All Urban Consumers (CPI-U) decreased 0.1 percent on a seasonally adjusted basis in March, after rising 0.2 percent in February, the U.S. Bureau of Labor Statistics reported today.
Announcing Think Stats 3e (allendowney.com)
The third edition of Think Stats is on its way to the printer! You can preorder now from Bookshop.org and Amazon (those are affiliate links), or if you can’t wait to get a paper copy, you can read the free, online version here.
In U.S., Inability to Pay for Care, Medicine Hits New High (gallup.com)
WASHINGTON, D.C. -- The percentage of U.S. adults who have recently been unable to afford or access quality healthcare has reached 11% -- equivalent to nearly 29 million people -- its highest level since 2021, according to new findings from the West Health-Gallup Healthcare Indices Study, which classifies these individuals as “Cost Desperate.”
Sample Size [in Baseball] (fangraphs.com)
A baseball season is the amalgamation of a lot of little events. Each pitch fits into a plate appearance which fits into an inning which fits into a game which fits into a series which fits into a season. That’s a lot of little data points flowing into an overall end result. We care a lot about which players will have good seasons and careers.
Accuracy and Precision (wikipedia.org)
Accuracy and precision are two measures of observational error.
Collectively, the Tesla fleet has driven more than 3.6B miles on FSD (twitter.com)
The R Inferno (2011) [pdf] (burns-stat.com)
The Minard System (visionscarto.net)
“The Minard System,” a book to be published in November 2018, features “the complete statistical graphics of Charles-Joseph Minard — from the collection of the École nationale des Ponts et Chaussées”.