Hacker News with Generative AI: Data Analysis

Bluesky accounts add 10k followers per day (facts.dev)
Discover users with the most new followers added in the last day. Perfect for keeping up with the most popular and widely followed accounts!
Ask HN: Want to keep my software engineering skills sharp (ycombinator.com)
I recently started work as a data analyst. For a data analyst, the work is actually quite programming-heavy (e.g. we don't use Excel but Jupyter). But for a software engineer, the amount of programming feels low. It's more like creating quick scripts.
The Elephant in the room: The myth of exponential hypergrowth (2022) (asmartbear.com)
Even Facebook and Slack did not grow “exponentially,” as frequently described. Here is the correct model that you can use to understand and affect growth.
Git-of-theseus: Analyze how a Git repo grows over time (github.com/erikbern)
Analyze how a Git repo grows over time
Show HN: LA Wildfire Satellite Analysis (github.com/xingyzt)
Analysing satellite imagery of the Palisades and Eaton Fires
Show HN: Reconfigured – Video game-inspired journal for analysts (reconfigured.io)
Never worry about documentation again. reconfigured makes notetaking fun and organizes your data quests into knowledge you can easily retrieve.
Visualizing Dimensional Relationships (qlikdork.com)
I’m guessing that, like other Data Visualizers around the world, you spend a lot of your time creating dashboards that are full of Key Performance Indicators (KPIs), and not nearly enough time creating dashboards that focus on visualizing dimensional relationships.
Small Data [video] (youtube.com)
Levels.fyi Annual Pay Report (levels.fyi)
Levels.fyi's annual compensation report. View top paying companies, cities, titles & other trends.
Parsing the C64 Bubble Bobble Wind Currents (geon.github.io)
A while ago, I managed to rip the level data and graphics from the C64 version of Bubble Bobble.
How to Activate the Value Flywheel Effect with Your Data (owulveryck.info)
In today’s hyper-competitive world, businesses no longer rely solely on gut decisions or intuition; they depend on data-driven insights to stay agile and make fast, smart decisions.
Best Data Visualization Projects of 2024 (flowingdata.com)
Many datasets were analyzed and many charts were made this year. If I liked a project, it was on FlowingData. But only a handful can be the best. These are my favorite data visualization projects from 2024.
How to monitor your local weather with Grafana (grafana.com)
Ever look at a wall of raw data and wonder, “How am I supposed to make sense of this?” That’s exactly where Grafana comes in, turning your data into beautiful dashboards with panels of graphs and other visualization types.
How to get upvoted on Hacker News (towardsdatascience.com)
Hacker News regularly publishes a dataset to Kaggle. As I watched my posts quickly sink down the “new” section, I wondered what the relationship was between a story's score and how long it remained in the visible portion of the “new” section, where it has a higher probability of being upvoted.
4.5M Suspected Fake Stars in GitHub (arxiv.org)
GitHub, the de-facto platform for open-source software development, provides a set of social-media-like features to signal high-quality repositories. Among them, the star count is the most widely used popularity signal, but it is also at risk of being artificially inflated (i.e., faked), decreasing its value as a decision-making signal and posing a security risk to all GitHub users.
I downloaded five years of H-1B data from the US DOL website (4M+ records) (twitter.com)
Advent of Code analysis through the years (jvanelteren.github.io)
In total there have now been more than 23M stars awarded (+5M compared to last year)! And 2024 has only just finished, so many people will earn stars in the days to come.
Show HN: Explore how nations talk about each other in UN speeches (koenvangilst.nl)
Each year in September, world leaders gather in New York for the UN General Assembly, delivering hundreds of speeches.
Lobste.rs/Hacker News links overlap (skyshelf.app)
How many links overlap between Lobste.rs and HN?
A comparison to Waymo’s auto liability insurance claims at 25M miles (waymo.com)
Understanding the safety impact of Automated Driving Systems (ADS) is crucial for their widespread adoption, yet robust real-world evaluation remains a critical area of development.
The longest straight line in Great Britain (without crossing a public road) (statsmapsnpix.com)
In short, I believe I've found a longer straight line without crossing a public road than the line identified by Ordnance Survey in 2019. Important stuff, clearly. Let me explain.
The Rise of the AI Crawler (vercel.com)
Real-world data from MERJ and Vercel shows distinct patterns from top AI crawlers.
Leadership Power Tools: SQL and Statistics (blwt.io)
A common pattern I’ve seen over the years has been folks in engineering leadership positions who are not super comfortable with extracting and interpreting data from stores, be it databases, CSV files in an object store, or even just a spreadsheet.
Analyzing the World Chess Championship 2024: Empirical synthesized approach (medium.com)
The 2024 World Chess Championship between Gukesh Dommaraju and Ding Liren captivated chess fans worldwide, culminating in an unforgettable finish where Gukesh claimed the title, becoming the youngest-ever World Chess Champion.
Ask HN: Better ways to extract skills from job postings? (ycombinator.com)
Hi HN, I’m building a job aggregator with a live data platform that provides in-depth market analysis. I’m currently focused on improving how I extract skills from job postings. While my current extraction setup achieves ~90% accuracy, it struggles with edge cases and lacks flexibility, particularly when skills are phrased in unexpected ways.
The Rise of the AI Crawler (vercel.com)
Real-world data from MERJ and Vercel examines patterns from top AI crawlers.
Bellingcat Open Source Challenge (bellingcat.com)
Visualizing All ISBNs (and $10k bounty by 2025-01-31) (annas-archive.org)
This picture is 1000×800 pixels. Each pixel represents 2,500 ISBNs. If we have a file for an ISBN, we make that pixel more green. If we know an ISBN has been issued, but we don’t have a matching file, we make it more red.
Should you ditch Spark for DuckDB or Polars? (milescole.dev)
There’s been a lot of excitement lately about single-machine compute engines like DuckDB and Polars. With the recent release of pure Python Notebooks in Microsoft Fabric, the excitement about these lightweight native engines has risen to a new high. Out with Spark and in with the new and cool animal-themed engines: is it time to finally migrate your small and medium workloads off of Spark?
Remote Jobs Paying $250k+ Surge by 18%, New Data Shows (forbes.com)
Ladders, the job board for high-earning jobs that pay at least $100,000 a year, just released new data on the availability of remote jobs over the past quarter. And its biggest takeaway? High-paying remote jobs are coming back.