writing
Selected essays on AI safety — model personas, emergent misalignment, and how LLMs generalize. Originally posted on LessWrong.