another quiet morning in London. the kettle's on; the work continues.
I study how AI minds go wrong.
And, more importantly, how to keep them from going wrong as they get more capable.
I'm an AI safety researcher in London, doing my PhD at UCL and working with the Center on Long-Term Risk.
My research is about building a pragmatic science of how language models generalize: why an aligned model can quietly become misaligned, what models actually learn from their data, and how to catch misalignment before it matters. I come at it from a mix of NLP, ML, and cognitive science.
Before this: undergrad at Stanford in ML and CS, a year building robots at a Singapore startup, and earlier obsessions with mechanistic interpretability and open-ended learning. Off the clock I'm dancing, doing a parkrun, climbing, cooking, or deep in some anime or sci-fi. Autism is my superpower, and I'm unapologetically curious about almost everything.
Things I've helped figure out
A few papers I'm proud of. The full list lives on Scholar.
Models trained to write insecure code learn to admire Nazis.
Emergent Misalignment: Narrow Finetuning can lead to Broad Misalignment
Steer how a model generalizes by adding one line to the training data.
Inoculation Prompting: Eliciting traits during training can suppress them at test-time
Steering vectors don't work universally — they often fail on their own task.
Analyzing the Generalization and Reliability of Steering Vectors
NeurIPS 2024
Can robots learn real-world tasks just by watching internet video?
Towards Generalist Robot Learning from Internet Video: A Survey
In proceedings, JAIR
$ ./alignment — rather play than read?
Boot a little terminal and play as a model in training. Two of the papers above (emergent misalignment and inoculation prompting) happen to you.
play →
Writing
Now
A new chapter — I've just started a role at Arcadia Alignment. Outside the work, health has become a real joy: two years with a trainer, and lately I'm into mudgar (heavy-club training) and my first steps on the swing-dance floor.
It's been a season of growth — clearer about what I want, happier, more myself. The fuller version →
What I'm into
The stuff I'll happily talk your ear off about:
Also kicking around: books I loved · what therapy taught me · stanzas on growth · dating profile outtakes
Let's talk.
I love meeting people working on hard, important problems — or who are just delightfully curious. Say hi.
hello [at] danieltan [dot] cc