Publication Details

Publications / Conference Presentation

Arbitrary Autoencoder Injection for Interpretability Experimentation

Presents a simple Python software library for applying sparse autoencoders to models other than language models.