5 Superposition for Policy Specialists

This section covers what policy staff should know about superposition.

5.0.1 What is superposition?

Superposition in neural networks describes how these systems represent and process multiple concepts, or features, within the same set of neurons or parameters. Unlike traditional computer programs, where every piece of information typically has a specific address in memory, neural nets distribute information across their architecture. The stored representations overlap, which lets a network encode more about its inputs than a one-concept-per-neuron scheme would allow.

This property lets neural nets use their computational resources efficiently: a network can represent far more features of its input data than it has neurons.
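As a rough illustration of the idea above, the following sketch packs 512 features into 128 "neurons" by giving each feature a random direction in neuron space. It is a toy model under stated assumptions, not how any particular network is trained: the function names (`random_feature_directions`, `encode`, `decode`) and all sizes are hypothetical choices made for this example.

```python
import numpy as np

def random_feature_directions(n_features, n_neurons, seed=0):
    """Assign each feature a random unit-length direction in neuron space."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_features, n_neurons))
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)

def encode(active_features, directions):
    """Sum the directions of the active features into one activation vector."""
    return directions[active_features].sum(axis=0)

def decode(activation, directions, top_k):
    """Return the top_k features whose directions best match the activation."""
    scores = directions @ activation
    return sorted(np.argsort(scores)[-top_k:].tolist())

# Hypothetical sizes: four times more features than neurons.
directions = random_feature_directions(n_features=512, n_neurons=128)
active = [3, 57, 142]  # only a few features are "on" at once
activation = encode(active, directions)
recovered = decode(activation, directions, top_k=len(active))
print("active:", active, "recovered:", recovered)
```

When only a few features are active at once, the readout usually matches the active set because random directions in a high-dimensional space are nearly orthogonal. As more features are active simultaneously, the overlap between directions produces interference and recovery degrades.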

5.0.2 The interpretability challenge

Superposition is of interest to both research scientists and policy experts because it presents interpretability challenges: overlapping information makes it difficult to isolate individual concepts or to understand how a network arrives at a prediction from its input data. This is often referred to as the ‘black box’ nature of AI systems. Superposition helps AI systems generalize robustly from training data to new inputs. However, it can also cause unexpected behaviours when a system encounters scenarios significantly different from its training data.

5.0.3 Privacy

Because information is stored in a distributed fashion across the network, it isn’t straightforward to pinpoint where a network keeps specific information or to delete it, which raises privacy concerns. Information entangled with other concepts can be difficult to alter or remove.
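To make the entanglement point concrete, here is a minimal sketch, assuming a toy setting in which six features share three neurons. The helper `erase_feature` is a hypothetical name introduced for this example; it removes one feature by projecting its direction out of every stored direction, and is not a real unlearning method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy assumption: 6 features stored as unit directions across only 3 neurons.
directions = rng.normal(size=(6, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

def erase_feature(directions, index):
    """Remove one feature's direction from every stored direction."""
    target = directions[index] / np.linalg.norm(directions[index])
    # Subtract each direction's component along the target direction.
    return directions - np.outer(directions @ target, target)

edited = erase_feature(directions, index=0)

# How far each feature's representation moved because feature 0 was erased.
collateral_change = np.linalg.norm(edited - directions, axis=1)
print(np.round(collateral_change, 3))
```

Feature 0 is removed entirely, but every other feature's representation also shifts, because each one overlapped with the erased direction to some degree. In this toy setting, that overlap is what makes a clean, isolated deletion hard.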

5.0.4 Bias and fairness

Since biased associations can be subtly encoded across many parameters rather than in easily identifiable locations, superposition can make identifying and mitigating bias challenging.

5.0.5 Regulatory challenges

All of the above make regulation difficult, since there is no straightforward, inspectable relationship between a system’s inputs and its outputs.

5.1 Avenues to improve interpretability

5.1.1 Visualization

We have covered some techniques in this course for using visualizations to detect and understand superposition.

5.1.2 Probing and ablation

Techniques that isolate specific neurons, or groups of neurons, to understand their effect on behaviour. For excellent examples, see Anthropic’s work: Towards Monosemanticity and its visualization.
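The ablation idea can be sketched on a tiny network as follows. This is an illustrative example only: the two-layer network, its random weights, and the `forward` function with its `ablate` parameter are assumptions made here, not taken from the work cited above.

```python
import numpy as np

def forward(x, w_in, w_out, ablate=None):
    """Run a two-layer ReLU network, optionally zeroing chosen hidden neurons."""
    hidden = np.maximum(0.0, x @ w_in)
    if ablate is not None:
        hidden[..., ablate] = 0.0  # "switch off" the selected neuron(s)
    return hidden @ w_out

rng = np.random.default_rng(0)
w_in = rng.normal(size=(4, 8))   # 4 inputs -> 8 hidden neurons (toy sizes)
w_out = rng.normal(size=(8, 2))  # 8 hidden neurons -> 2 outputs
x = rng.normal(size=(16, 4))     # a batch of 16 random inputs

baseline = forward(x, w_in, w_out)

# Mean absolute change in the outputs when each hidden neuron is removed.
effect = np.array([
    np.abs(forward(x, w_in, w_out, ablate=j) - baseline).mean()
    for j in range(w_in.shape[1])
])
print(np.round(effect, 3))
```

A neuron with a large effect matters more to the output on these inputs. Under superposition, a single neuron may contribute to several unrelated features, so a score like this describes the neuron's overall influence rather than one clean concept.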

5.1.3 Disentanglement

Research focused on training networks to encourage more separated, interpretable representations. Please see Disentangled Explanations of Neural Network Predictions.

5.1.4 Formal theories of representation

Using mathematical frameworks to describe how information is encoded and processed could improve understanding.

5.1.5 Explainable AI

Developing tools to explain how neural nets are making decisions.

5.2 Conclusion

Superposition is a fundamental characteristic that gives neural nets the capacity to make predictions and generate outputs from often voluminous input data. For policy specialists, understanding this characteristic of AI systems helps inform governance strategies that mitigate the ethical and societal impacts of non-deterministic systems.