Intuitions for Transformer Circuits
A mental model for addressing the residual stream
In a previous post on language modeling, I implemented a GPT-style transformer. Lately I've been learning mechanistic interpretability to go deeper and understand why the transformer works on a mathem… [+20487 chars]