Hacker News with Generative AI: Linguistics

Encoding Hangeul, Korea's writing system (brookjeynes.dev)
Hangeul (한글) is the modern writing system for the Korean language, created in 1443 by King Sejong the Great, the fourth king of the Joseon dynasty.
Ancient DNA Points to Origins of Indo-European Language (nytimes.com)
A new study claims to have identified the first speakers of Indo-European language, which gave rise to English, Sanskrit and hundreds of others.
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo (wikipedia.org)
"Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" is a grammatically correct sentence in English that is often presented as an example of how homonyms and homophones can be used to create complicated linguistic constructs through lexical ambiguity.
Affixes: The Building Blocks of English (affixes.org)
This dictionary contains more than 1,250 entries, illustrated by some 10,000 examples, all defined and explained.
Can you lose your native tongue? (2024) (nytimes.com)
It happened the first time over dinner. I was saying something to my husband, who grew up in Paris where we live, and suddenly couldn’t get the word out.
Do Lake Names Reflect Their Properties? (ivanludvig.dev)
A few months ago, I did a hike to a lake called “Lac Vert” (Green Lake) in France. It’s a mountain lake located close to the Italian border. I found it remarkable how vividly green the lake was. Although the name describes its appearance well, I was still surprised. This made me wonder: is it common for lakes to have appropriate names, reflecting their properties?
Whalesong patterns follow a universal law of human language, new research finds (theconversation.com)
Ancient-DNA study identifies originators of Indo-European language family (hms.harvard.edu)
A pair of landmark studies has genetically identified the originators of the massive Indo-European family of 400-plus languages.
Stares and ear-twitches: The linguist learning to speak the language of cows (bbc.com)
Dutch linguist Leonie Cornips has become fascinated with how cows communicate. But can this really be called 'language'?
Interrobang (wikipedia.org)
The Language Construction Kit (1996, 2012) (zompist.com)
This set of webpages (what’s a set of webpages? a webchapter?) is intended for anyone who wants to create artificial languages— for a fantasy or an alien world, as a hobby, as an interlanguage. It presents linguistically sound methods for creating naturalistic languages— which can be reversed to create non-naturalistic languages. It suggests further reading for those who want to know more, and shortcuts for those who want to know less.
Searching for DeepSeek's glitch tokens (outsidetext.substack.com)
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text.
Why is zero plural? (2024) (stackexchange.com)
For example, if we choose two 2s, zero 3s, and one 5, we get the divisor
Brits still associate working-class accents with criminals – study warns of bias (cam.ac.uk)
People who speak with accents perceived as ‘working-class’ including those from Liverpool, Newcastle, Bradford and London risk being stereotyped as more likely to have committed a crime, and becoming victims of injustice, a new study suggests.
The rise and fall of the English sentence (2017) (nautil.us)
The surprising forces influencing the complexity of the language we speak and write.
Bog Standard (2005) (bbc.co.uk)
It's pretty rare in English to find a compound word with a slang first part and a formal second part.
Did OpenAI's O1 Decipher the Indus Valley Script? (yashgoenka.com)
A few weeks ago, I had a fascinating conversation with OpenAI's O1 model about decoding the Indus Valley script - one of the world's oldest and still undeciphered writing systems.
English-friendly Romanization system proposed for Japanese language (asahi.com)
The Agency for Cultural Affairs is soliciting public comments about its plans to change romanization rules of the Japanese language for the first time in about 70 years.
2025 Banished Words List (lssu.edu)
Lake Superior State University (LSSU) proudly reveals the 2025 edition of its Banished Words List, a quirky tradition that dates back to 1976, when former LSSU Public Relations Director Bill Rabe and his colleagues delighted word enthusiasts with the first “List of Words Banished from the Queen’s English for Mis-Use, Over-Use and General Uselessness”.
Ancient Indus Valley Script Deciphered (indusscript.net)
The official Indus inscriptions repository
Ancient genomes provide final word in Indo-European linguistic origins (phys.org)
A team of 91 researchers—including famed geneticist Eske Willerslev at the Lundbeck Foundation GeoGenetics Center, University of Copenhagen—has discovered a Bronze Age genetic divergence connected to eastern and western Mediterranean Indo-European language speakers.
Interpol wants everyone to stop saying 'pig butchering' (theregister.com)
Interpol wants to put an end to the online scam known as "pig butchering" – through linguistic policing, rather than law enforcement.
Noam Chomsky at 96 (theconversation.com)
Noam Chomsky, one of the world’s most famous and respected intellectuals, will be 96 years old on Dec. 7, 2024. For more than half a century, multitudes of people have read his works in a variety of languages, and many people have relied on his commentaries and interviews for insights about intellectual debates and current events.
MIT study explains why laws are written in an incomprehensible style (news.mit.edu)
Legal documents are notoriously difficult to understand, even for lawyers. This raises the question: Why are these documents written in a style that makes them so impenetrable?
Mysterious tablet with unknown language unearthed in Georgia (archaeologymag.com)
A basalt tablet inscribed with an enigmatic language has been unearthed near Lake Bashplemi in Georgia’s Dmanisi region.
Learning Tibetan changed the way I think (2023) (lionsroar.com)
Translator Estefania Duque shares her journey studying Tibetan, revealing how language shapes the mind, influences perspective, and offers spiritual inspiration.
AI Guesses Your Accent (boldvoice.com)
Do you have an accent when speaking English? I bet I can guess your native language in less than 30 seconds.
Martha's Vineyard Sign Language (atlasobscura.com)
In 1979 in the town of Chilmark, on Martha's Vineyard, Joan Poole Nash sat across from her great-grandmother Emily Howland Poole, surrounded by a team of linguists and a video camera. "Do you remember the signs for rain or snow?" In response, her great-grandmother moved her hands, and the signs were recorded on grainy black-and-white tape.
Phonetic Matching (smoores.dev)
Just a heads-up: this post starts out somewhat technical and includes a discussion of interesting algorithmic topics, like forced alignment and phonetic matching. But it ends by delving into some deeper social and human topics that might not be what everyone is looking for in a blog that's mostly about software.
Chrestomathy (wikipedia.org)
A chrestomathy (/krɛˈstɒməθi/ kreh-STOM-ə-thee; from the Ancient Greek χρηστομάθεια khrēstomátheia 'desire of learning', from χρηστός khrēstós 'useful' + μανθάνω manthánō 'learn') is a collection of selected literary passages (usually from a single author); a selection of literary passages from a foreign language assembled for studying the language; or a text in various languages, used especially as an aid in learning a subject.