Hacker News with Generative AI: Statistics

9.5% of software engineers are ghosts (twitter.com)
The number of exceptional people: Fewer than 85 per 1M across key traits (sciencedirect.com)
Cognitive biases can lead to overestimating the expected prevalence of exceptional multi-talented candidates, leading to potential dissatisfaction in recruitment contexts.
Overview of differential geometry for Hamiltonian Monte Carlo (arxiv.org)
Hamiltonian Monte Carlo has proven a remarkable empirical success, but only recently have we begun to develop a rigorous understanding of why it performs so well on difficult problems and how it is best applied in practice.
Statistical Rethinking (2024 Edition) (github.com/rmcelreath)
This course teaches data analysis, but it focuses on scientific models.
Three-Quarters of U.S. Adults Are Now Overweight or Obese (nytimes.com)
Nearly three quarters of U.S. adults are overweight or obese, according to a sweeping new study.
Trust no one: why we can't trust most stats about the cybersecurity industry (ventureinsecurity.net)
There is a problem in cybersecurity: solid industry analysis is hard to come by.
An alternative construction of Shannon entropy (rkp.science)
TL;DR: Shannon’s entropy formula is usually justified by showing it satisfies key mathematical criteria, or by computing how much space is needed to encode a variable. But one can also construct Shannon’s formula starting purely from the simpler notion of entropy as a (logarithm of a) count—of how many different ways a distribution could have emerged from a sequence of samples.
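The counting idea can be checked numerically. For n samples whose category counts n_1, ..., n_k match a distribution p, the number of distinct sequences that produce those counts is the multinomial coefficient n!/(n_1! ... n_k!). By Stirling's approximation, the base-2 logarithm of that count divided by n approaches H(p) as n grows. The Python below is a minimal sketch under that assumption; the function names are hypothetical, and the linked post's own derivation may proceed differently.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def log2_multinomial(counts):
    """log2 of n! / (n_1! * ... * n_k!): the number of distinct sample
    sequences of length n that yield exactly these category counts.
    lgamma(m + 1) == ln(m!) avoids computing huge factorials directly."""
    n = sum(counts)
    ln_count = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_count / math.log(2)

def entropy_from_counting(p, n):
    """Per-sample log-count for roughly n samples whose counts follow p."""
    counts = [round(pi * n) for pi in p]  # counts proportional to p
    return log2_multinomial(counts) / sum(counts)

p = [0.5, 0.25, 0.25]
for n in (10, 1_000, 100_000):
    # The counting estimate creeps up toward H(p) = 1.5 bits as n grows.
    print(n, entropy_from_counting(p, n), shannon_entropy(p))
```

The gap between the two numbers shrinks roughly like (log n)/n, which is why the counting view and the usual formula agree only in the limit of many samples.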
Backblaze Drive Stats for Q3 2024 (backblaze.com)
As of the end of Q3 2024, Backblaze was monitoring 292,647 hard disk drives (HDDs) and solid state drives (SSDs) in our cloud storage servers located in our data centers around the world.
US Literacy Statistics 2022-2023 (thenationalliteracyinstitute.com)
Illiteracy has become such a serious problem in our country that 130 million adults are now unable to read a simple story to their children.
How do countries measure immigration, and how accurate is this data? (ourworldindata.org)
Debates about migration are often in the news. People quote numbers about how many people are entering and leaving different countries. Governments need to plan and manage public resources based on how their own populations are changing.
The data hinted at racism among white doctors. Then scholars looked again (economist.com)
Black babies in America are more than twice as likely as white babies to die before their first birthday. This shocking statistic has barely changed for many decades, and even after controlling for socioeconomic differences, a wide mortality gap persists. Yet in 2020 researchers discovered a factor that appeared to substantially reduce a black baby's risk.
Understanding privacy risk with k-anonymity and l-diversity (marcusolsson.dev)
Imagine you’re a data analyst at a global company who’s been asked to provide employee statistics for a survey on remote working and distributed teams. You’ve extracted the relevant employee data, but sharing it as-is could violate privacy laws. How can you anonymize this data while ensuring it’s still useful? In this article, you’ll learn about k-anonymity and l-diversity—two valuable techniques in privacy engineering to help you reduce the privacy risk in datasets.
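Both checks come down to grouping rows by their quasi-identifiers. k-anonymity is the size of the smallest group, and (distinct) l-diversity is the smallest number of different sensitive values found in any group. The Python below is a minimal sketch under those textbook definitions; the employee rows, column names, and function names are hypothetical and not taken from the linked article.

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for every quasi-identifier."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups

def k_anonymity(rows, quasi_identifiers):
    """Largest k such that every quasi-identifier combination
    appears in at least k rows (raises ValueError on empty input)."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(g) for g in groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: the fewest distinct sensitive values
    found in any equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({r[sensitive] for r in g}) for g in groups.values())

# Hypothetical, already-generalized employee records.
rows = [
    {"country": "SE", "age_band": "30-39", "remote": "full"},
    {"country": "SE", "age_band": "30-39", "remote": "hybrid"},
    {"country": "SE", "age_band": "30-39", "remote": "full"},
    {"country": "DE", "age_band": "40-49", "remote": "none"},
    {"country": "DE", "age_band": "40-49", "remote": "none"},
]
qis = ["country", "age_band"]
print(k_anonymity(rows, qis))            # 2
print(l_diversity(rows, qis, "remote"))  # 1
```

The example is 2-anonymous, yet anyone who knows an employee is in Germany and aged 40-49 learns their remote status anyway, because that group has only one sensitive value. That gap is what l-diversity is meant to expose.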
Statistical challenges and misreadings of literature create unreplicable science [pdf] (stat.columbia.edu)
Probability-generating functions (entropicthoughts.com)
I have long struggled with understanding what probability-generating functions are and how to intuit them. There were two pieces of the puzzle missing for me, and we’ll go through both in this article.
Waymo does over 150k paid trips and over 1M autonomous miles every week (twitter.com)
Survival Analysis Part I: Basic concepts and first analyses (2003) (nature.com)
In many cancer studies, the main outcome under assessment is the time to an event of interest.
Using Survival Analysis to estimate product lifetime (jumpdata.co.uk)
We needed to estimate how many years after purchase, on average, various consumer products broke down, using a large dataset containing:
Dispelling Myths about Randomisation (bps.org.uk)
When we are interested in cause and effect relationships (which is much of the time!) we have two options: We can simply observe the world to identify associations between X and Y, or we can randomise people to different levels of X and then measure Y.
Please show me lots of digits (dynomight.substack.com)
Hi there. It’s me, the person who stares very hard at the numbers in the papers you write. I’ve brought you here today to ask a favor.
Understanding Gaussians (gestalt.ink)
The Gaussian distribution, or normal distribution, is a key subject in statistics, machine learning, physics, and pretty much any other field that deals with data and probability. It’s one of those subjects, like π or Bayes’ rule, that is so fundamental that people treat it like an icon.
Over 40% of foreigner deaths in Korea have unknown causes (koreatimes.co.kr)
The government has failed to establish the causes of more than 40 percent of deaths of foreign nationals who died in Korea in recent years, a Ministry of Justice report shows.
Sampling with SQL (moertel.com)
Sampling is one of the most powerful tools you can wield to extract meaning from large datasets.
Why do random forests work? They are self-regularizing adaptive smoothers (arxiv.org)
Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees are still not fully understood.
In Good Health: Weight Loss Drugs and the Falling Obesity Rate (npr.org)
For the first time in decades, obesity rates in the U.S. are not on the rise.
The Surprising Predictability of Long Runs (2012) [pdf] (csun.edu)
CDC stats reveal startling number of people with ADHD in US (msn.com)
The digits of pi are not random (github.com/seccode)
Statistically significant proof that the digits of pi are not random
Nearly 50% of researchers quit science within a decade, huge study reveals (nature.com)
A study of nearly 400,000 scientists across 38 countries finds that one-third of them quit science within five years of authoring their first paper, and almost half leave within a decade.
Most of today's children are unlikely to live to 100, analysis says (cnn.com)
Breast cancer rates rise, especially among women under 50 (usatoday.com)
Over the past decade, breast cancer rates have risen by 1% a year, with the steepest increase occurring in women younger than 50, according to a new report published by the American Cancer Society Tuesday.