Welcome to my corner of the lightcone.
Me in 10 seconds
llm psychologist. avid sci-fi enjoyer. aspiring calisthenics bro. karaoke enthusiast.
Model motivations team lead // Arcadia Alignment
Previously:
- ML and CS // Stanford
- robots and RL // A*STAR
- PhD // UCL
- MATS 7.0 // Owain Evans
- model personas researcher // CLR
Things I've helped figure out
A few papers I'm proud of. The full list lives on Scholar.
Models trained to write insecure code learn to admire Nazis.
Emergent Misalignment: Narrow Finetuning can lead to Broad Misalignment
Steer how a model generalizes by adding one line to the training data.
Inoculation Prompting: Eliciting traits during training can suppress them at test-time
Steering vectors don't work universally — they often fail on their own task.
Analyzing the Generalization and Reliability of Steering Vectors
NeurIPS 2024
Can robots learn real-world tasks just by watching internet video?
Towards Generalist Robot Learning from Internet Video: A Survey
In proceedings, JAIR
$ ./alignment — rather play than read?
Boot a little terminal and play as a model in training. Two of the papers
above — emergent misalignment & inoculation prompting — happen to you.
play →
Writing
Show, not tell: GPT-4o is more opinionated in images than in text↗
Superhuman latent knowledge: illegible reasoning despite faithful CoT↗
Why I'm moving from mechanistic to prosaic interpretability↗
Also kicking around: books I loved · what therapy taught me · stanzas on growth · dating profile outtakes
Now
CYCLE · JUNE 2026
My last 6 months: career moves, fitness, dancing
I'm dating again. Have a peek at my profile
Let's talk.
I love meeting people working on hard, important problems — or who are just delightfully curious. Say hi.
hello [at] danieltan [dot] cc
Or tell me what you really think — anonymous feedback →