Hacker News with Generative AI: Alignment

Claude 4: behavior directly inspired by our Alignment Faking paper (anthropic.com)
Takes on "Alignment Faking in Large Language Models" (joecarlsmith.com)
Researchers at Redwood Research, Anthropic, and elsewhere recently released a paper documenting cases in which the production version of Claude 3 Opus fakes alignment with a training objective in order to avoid having its behavior modified outside of training – a pattern they call “alignment faking,” which closely resembles a behavior I called “scheming” in a report I wrote last year.
Productivity Versus Alignment (zaxis.page)