Hacker News with Generative AI: Experimentation

Give a whole server to an agent with the full permission of doing whatever (ycombinator.com)
Setting ethical considerations aside for the moment, I think it can be interesting to see what happens if we have a whole server with a self-evolving AI agent inside with full permission to modify its server.
Four-day week at a south London school (theguardian.com)
At a small independent school on the fringes of a National Trust park in Morden, south London, a quiet revolution is under way.
The evolution of a structural code editor (crowdhailer.me)
I'm building EYG, an experiment in building better languages and tools, for some measure of better.
The Online Sports Gambling Experiment Has Failed (lesswrong.com)
It brings me no pleasure to conclude that this was not the case. The results are in. Legalized mobile gambling on sports, let alone casino games, has proven to be a huge mistake. The societal impacts are far worse than I expected.
Narrative Jailbreaking for Fun and Profit (interconnected.org)
A game I like to play with any AI chatbot is to persuade it to break its narrative frame.
Experiment with LLMs and Random Walk on a Grid (github.com/attentionmech)
This is a simple experiment in asking LLMs to do a random walk. The test was done with the open-source llama3.1/2 and gemma2 series. My general expectation was that as temperature grows, the random walk would keep growing. But somehow the gemma2:9b model is behaving weirdly. That is what I am investigating. Nonetheless, it's cool to look at LLMs visually, and not just in loss graphs / tokens.
Haskell vs. Ada vs. C++ vs. an Experiment in Prototyping Productivity (1994) [pdf] (cs.yale.edu)
Microbenchmarks Are Experiments (mrale.ph)
Benchmarks are not numerology. Their results are not a divine revelation. Benchmarks are experiments. Their results are meaningless without interpretation and validation.
What happens if we remove 50 percent of Llama? (neuralmagic.com)
Making Your Connection Bad (5snb.club)
This is directly inspired by Engineering for Slow Internet. I figured I’d give running with dogshit internet on my desktop and phone a go to see how poorly (or well!) specific applications behave.
How did you do on the AI art Turing test? (astralcodexten.com)
Last month, I challenged 11,000 people to classify fifty pictures as either human art or AI-generated images.
User Inyerface – A worst-practice UI experiment (2018) (userinyerface.com)
The online sports gambling experiment (thezvi.substack.com)
I Took a 'Decision Holiday' and Put A.I. In Charge of My Life (nytimes.com)
Generative A.I. took over my life.
Polish radio station ditches DJs, journalists for AI-generated college kids (theregister.com)
A Polish radio station has ditched its on-air talent for AI in what its editor-in-chief calls an experiment on the effect of AI in society, though it looks like a bid to save cash.
10B Integers Walk into an Array (medium.com)
How an experiment with 64-bit Pharo Smalltalk surprised me.
Unoffice Hours (2020) (interconnected.org)
For the past month or so, as an experiment, I’ve been opening my calendar each week for video calls with whoever books a time. It’s been amazing. Wednesday is now my favourite day.
LLMs still can't reason like humans (freethink.com)
Imagine what would happen if you attempted the following experiment: First, place a washed, fresh tomato and an equally clean carrot on top of a normal kitchen plate. With one hand behind your back, flip the non-stick plate upside-down, inspecting the underside of the plate for marks. Now, slowly turn the plate right-side up and count the number of vegetables remaining on top. How many are on the plate?
Five ways to reduce variance in A/B testing (bytepawn.com)
When performing A/B testing, we're measuring the mean of a metric (such as spend or conversion) on two distinct subsets, and then comparing the means to each other.
Wikipedia: "Add a Fact" LLM Future Audiences Experiment (wikipedia.org)
Add A Fact is a temporary experimental tool created by the Wikimedia Foundation's Future Audience team to learn whether and how we can make it possible to contribute productively to Wikipedia from outside of Wikipedia, and whether guidance to the contributor from a large language model (LLM) could be useful in this process.
WordPress Contribution Health Dashboards: An Experiment (wordpress.org)
What Michael Pollan Learned from Quitting Caffeine for 3 Months [video] (youtube.com)
Show HN: Nomadic – Minimize RAG Hallucinations with 1 Hyperparameter Experiment (ycombinator.com)
Hey HN! Mustafa, Lizzie, and Varun here from NomadicML (https://nomadicml.com). We’re excited to show you Nomadic (https://github.com/nomadic-ml/nomadic): a platform focused on parameter search to continuously optimize AI systems.
I Spent a Week Eating Discarded Restaurant Food. But Was It Going to Waste? (wired.com)
YouTube is currently experimenting with server-side ad injection (twitter.com)
LumoSQL, an experimental SQLite with LMDB and ABE encryption (lumosql.org)
Google AI said to put glue in pizza – so I made a pizza with glue and ate it (businessinsider.com)