Hacker News with Generative AI: Statistics

Drug-Sniffing Dogs Are Wrong More Often Than Right (npr.org)
The Chicago Tribune sifted through three years' worth of cases in which law enforcement used dogs to sniff out drugs in cars in suburban Chicago. According to the analysis, officers found drugs or paraphernalia in only 44 percent of cases in which the dogs had alerted them.
Lieferando.de has captured 5.7% of restaurant related domain names (mondaybits.com)
I recently decided to compile a very large list of domain names for the German country code top-level domain .de.
It is time to stop teaching frequentism to non-statisticians (2012) (arxiv.org)
We should cease teaching frequentist statistics to undergraduates and switch to Bayes. Doing so will reduce the amount of confusion and over-certainty rife among users of statistics.
Frequentism and Bayesianism: A Practical Introduction (2014) (jakevdp.github.io)
One of the first things a scientist hears about statistics is that there are two different approaches: frequentism and Bayesianism. Despite their importance, many scientific researchers never have the opportunity to learn the distinctions between them and the different practical approaches that result. The purpose of this post is to synthesize the philosophical and pragmatic aspects of the frequentist and Bayesian approaches, so that scientists like myself might be better prepared to understand the types of data analysis people do.
Visualizing Bayes Theorem (2009) (oscarbonilla.com)
I recently came up with what I think is an intuitive way to explain Bayes’ Theorem. I searched Google for a while and could not find any article that explains it in this particular way.
I'm in the final third of my life (sive.rs)
According to statistics, I’m in the final third of my life.
Bayesian Modeling and Computation in Python (2021) (bayesiancomputationbook.com)
Ask HN: Anyone working in traditional ML/stats research instead of LLMs? (ycombinator.com)
I am curious about those who are working in the machine learning or statistics domain but are focusing on traditional ML research rather than large language models (LLMs).
Perfect Recession Predictors (perfectpredictors.com)
Each line shows the fraction of perfect yield curve spreads that were negative for a given prediction window (12, 18, or 24 months).  An index value of 0 implies the lowest likely chance of recession, whereas an index value of 1 means the highest chance of recession.
Initial USA Unemployment Claims (stlouisfed.org)
Why Are ADHD Rates So Much Higher in the U.S.? (gizmodo.com)
Roughly 11% of children and 6% of adults in the U.S. are currently diagnosed with ADHD, rates that are significantly higher than those reported in most other countries.
How to avoid P hacking (nature.com)
It can happen so easily. You’re excited about an experiment, so you sneak an early peek at the data to see if the P value — a measure of statistical significance — has dipped below the threshold of 0.05. Or maybe you’ve tried analysing your results in several different ways, hoping one will give you that significant finding. These temptations are common, especially in the cut-throat world of publish-or-perish academia.
P hacking – Five ways it could happen to you (nature.com)
It can happen so easily. You’re excited about an experiment, so you sneak an early peek at the data to see if the P value — a measure of statistical significance — has dipped below the threshold of 0.05. Or maybe you’ve tried analysing your results in several different ways, hoping one will give you that significant finding. These temptations are common, especially in the cut-throat world of publish-or-perish academia.
Don't Die of Heart Disease (empirical.health)
Heart disease kills more people than all cancers combined—but it’s also the area of health most in your control. 80% of heart attacks can be avoided, and your risk is predictable using statistical models up to 30 years in advance.
Backstory to the Survivorship Bias Plane (yuxi-liu-wired.github.io)
I discover the exact backstory to that picture of an airplane with red dots on top of it.
Zipf's Law (wikipedia.org)
Zipf's law (/zɪf/; German pronunciation: [tsɪpf]) is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the n-th entry is often approximately inversely proportional to n.
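A minimal sketch of the relationship described above, assuming an idealized list in which the n-th value is exactly the top value divided by n (real data only approximates this). The function name `zipf_values` and the top value of 1000 are illustrative choices, not taken from the linked article.

```python
def zipf_values(top_value, count):
    """Idealized Zipf sequence: the n-th value equals top_value / n."""
    return [top_value / n for n in range(1, count + 1)]


# Hypothetical top value of 1000 for the first-ranked entry.
values = zipf_values(1000.0, 5)

# Under an exactly inverse-proportional law, rank * value stays constant.
products = [rank * value for rank, value in enumerate(values, start=1)]
```

On measured data, the same check (rank times value staying roughly constant) is one quick way to see whether a sorted list is approximately Zipfian.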
Normalizing Ratings (blogspot.com)
Derivation and Intuition behind Poisson distribution (notion.site)
Liverpool's title win has completed a mysterious Fibonacci sequence (bbc.com)
Liverpool FC's victory at the weekend has clinched them their second Premier League title but it also resulted in something curious – producing a strange series of numbers in the league's record books.
Kids twice as likely to die if hit by SUV than car (rte.ie)
Pedestrians and cyclists are 44% more likely to die if they are hit by an SUV or similar-sized vehicle rather than a traditional car, a study has found.
Can LLMs do randomness? (rnikhil.com)
While LLMs theoretically understand “randomness,” their training data distributions may create unexpected patterns. In this article we will test different LLMs from OpenAI and Anthropic to see if they provide unbiased results. For the first experiment we will make each model toss a fair coin, and for the next we will make it guess a number between 0 and 10 and see if the guesses are equally distributed between even and odd. I know the sample sizes are small and probably not very statistically significant.
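One way to judge whether a small batch of coin tosses is consistent with a fair coin is an exact two-sided binomial test. The sketch below is an assumption about how such a check could be done, not the method used in the linked article, and the function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    probs = [
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small relative tolerance so that ties are not lost to rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 10 tosses, all heads, against a fair coin.
p_all_heads = binomial_two_sided_p(10, 10)
```

With only a handful of tosses, even a strongly biased source can produce a large p-value, which is the small-sample concern the excerpt raises.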
Drug Overdose Deaths in the United States, 2003–2023 (cdc.gov)
The age-adjusted rate of drug overdose deaths declined 4.0% between 2022 and 2023, which follows a nonsignificant increase between 2021 and 2022 (1). Previously, rates had generally increased across most years over the period 2003–2023.
Are 1/3 of American Millennials Flat Earthers? (stackexchange.com)
A Forbes article and the University of Melbourne, among other sources, claim “Only Two-Thirds Of American Millennials Believe The Earth Is Round”, which seems to imply that one third of American Millennials are flat Earthers or similar.
Economists don't know what's going on (economist.com)
The British government has launched an investigation into the Office for National Statistics. Last month the ONS found errors in some numbers that underpin its GDP calculations, and investors no longer trust its monthly jobs report. The episode hints at a wider trend: global economic data have become alarmingly poor.
San Francisco crime is down, way down (growsf.org)
Citywide crime in San Francisco is now at its lowest point in 23 years. And in the past year, San Francisco saw one of the biggest drops in crime among major U.S. cities, including a 45% drop in property crime in the first quarter of 2025 alone.
A puzzle of two unreliable sensors (wordpress.com)
Suppose you are trying to measure a value P and you have two unreliable sensors. Sensor A returns 0.5P + 0.5U, where U is uniform random noise over the same domain as P. Sensor B will return either P or U with 50% likelihood. In other words, sensor A is a noisy measurement of your variable, and B is sometimes the correct value and sometimes pure noise.
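The two sensor models can be compared with a short simulation. This is a sketch under stated assumptions: P and U are drawn uniformly from [0, 1), each sensor gets its own independent noise draw, and error is measured as mean absolute deviation from P. None of these choices is confirmed by the linked post, and the function name `simulate` is hypothetical.

```python
import random


def simulate(trials=100_000, seed=0):
    """Return the mean absolute error of sensor A and sensor B."""
    rng = random.Random(seed)
    err_a = 0.0
    err_b = 0.0
    for _ in range(trials):
        p = rng.random()  # true value (assumed uniform on [0, 1))
        # Sensor A: equal blend of the true value and uniform noise.
        reading_a = 0.5 * p + 0.5 * rng.random()
        # Sensor B: the true value half the time, pure noise otherwise.
        reading_b = p if rng.random() < 0.5 else rng.random()
        err_a += abs(reading_a - p)
        err_b += abs(reading_b - p)
    return err_a / trials, err_b / trials


mean_err_a, mean_err_b = simulate()
```

Under these assumptions both mean absolute errors come out close to 1/6, since the expected distance between two independent uniform draws is 1/3 and each sensor contributes half of it on average. The sensors differ in how that error is distributed, not in its mean.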
Markov Chain Monte Carlo Without All the Bullshit (2015) (jeremykun.com)
I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated.
Prevalence and Early Identification of ASD Among Children Aged 4 and 8 Years (cdc.gov)
Prevalence of ASD among children aged 8 years was higher in 2022 than previous years.
Monte Carlo Crash Course: Sampling (thenumb.at)
In the previous chapter, we assumed that we can uniformly randomly sample our domain. However, it’s not obvious how to actually do so—in fact, how can a deterministic computer even generate random numbers?
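One classic answer to how a deterministic machine produces random-looking numbers is a linear congruential generator. The sketch below is illustrative only and is not necessarily the approach the linked chapter takes. The class name `LCG` is hypothetical, and the constants are the commonly cited Numerical Recipes parameters.

```python
class LCG:
    """Linear congruential generator: state = (a * state + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a = a
        self.c = c
        self.m = m
        self.state = seed % m

    def next_float(self):
        """Advance the state and return a value in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


gen = LCG(seed=42)
samples = [gen.next_float() for _ in range(10_000)]
```

The sequence is fully determined by the seed, so the same seed always reproduces the same samples. That is why such generators are called pseudorandom.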