Publications Details
Arbitrary Autoencoder Injection for Interpretability Experimentation
Campos, Marco V.; Cauthen, Katherine R.; Krofcheck, Daniel J.; Naugle, Asmeret; Simpson, Sarah E.; Doyle, Casey L.; Sweitzer, Matthew D.; Xi, Michael
Presents a simple Python software library for applying sparse autoencoders to models beyond language models.