6 Summary
In summary, we covered superposition and its presence in toy models as well as in more sophisticated convolutional neural networks and language models. We then explored how sparse autoencoders can be used to interpret the features learned by a network's neurons.
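As a reminder of the basic recipe, the sketch below is a minimal, illustrative sparse autoencoder in the spirit of the dictionary-learning setup referenced below: activations are reconstructed through an overcomplete ReLU bottleneck with an L1 sparsity penalty. The class name, dimensions, and `l1_coeff` value are assumptions for illustration, not the implementation from any of the cited works.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Illustrative sparse autoencoder over model activations."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # d_hidden is typically much larger than d_model (overcomplete dictionary)
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstructed activations
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()
```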
6.1 References
Toy models of superposition by Nelson Elhage, Tristan Hume and Catherine Olsson et al. (2022)
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by Trenton Bricken, Adly Templeton and Joshua Batson et al. (2023)
Polysemanticity and Capacity in Neural Networks by Adam Scherlis, Kshitij Sachan and Adam S. Jermyn et al. (2022)