ML Research
A Black Box Made Less Opaque (Part 1)
Introduction to Sparse Autoencoders (SAEs)
Objective
Apply sparse autoencoders (SAEs) to GPT-2 Small to explore which features activate and how those activations change as the model processes its input.
Motivation
Use SAEs to study how language models begin to classify and 'understand' user inputs.
About this installment
An introductory application of SAEs to GPT-2 Small, exploring feature activation and how it changes as the model processes inputs.
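As a rough illustration of what "feature activation" means here, the sketch below shows the forward pass of a standard ReLU sparse autoencoder applied to per-token activation vectors. It is a minimal sketch under stated assumptions, not the method used in this series: the weights are random rather than trained, the dictionary size (8 × 768) is a hypothetical choice, and the input is random data standing in for GPT-2 Small residual-stream activations (GPT-2 Small's residual width is 768). Capturing real activations from the model is omitted. The loss (reconstruction error plus an L1 penalty on feature activations) is one common SAE training objective.

```python
import numpy as np

# GPT-2 Small's residual stream width is 768.
D_MODEL = 768
# Hypothetical dictionary size: an assumed expansion factor of 8.
D_SAE = 8 * D_MODEL

rng = np.random.default_rng(0)

# Random weights stand in for a trained SAE (assumption, for illustration only).
W_enc = rng.normal(scale=0.02, size=(D_MODEL, D_SAE))
b_enc = np.zeros(D_SAE)
W_dec = rng.normal(scale=0.02, size=(D_SAE, D_MODEL))
b_dec = np.zeros(D_MODEL)


def encode(x: np.ndarray) -> np.ndarray:
    """Map activations of shape (n_tokens, D_MODEL) to non-negative feature activations."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(f: np.ndarray) -> np.ndarray:
    """Reconstruct activations of shape (n_tokens, D_MODEL) from feature activations."""
    return f @ W_dec + b_dec


def sae_loss(x: np.ndarray, l1_coeff: float = 1e-3):
    """Reconstruction MSE plus an L1 sparsity penalty; also returns the features."""
    f = encode(x)
    x_hat = decode(f)
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(f), axis=-1))
    return mse + l1_coeff * l1, f


# Random stand-in for the activations of a 5-token prompt (hypothetical data).
acts = rng.normal(size=(5, D_MODEL))
loss, feats = sae_loss(acts)

# Per-token view: the three most strongly activated features at each position.
for pos, row in enumerate(feats):
    top = np.argsort(row)[::-1][:3]
    print(pos, top.tolist(), np.round(row[top], 3).tolist())
```

With a trained SAE and real activations, the per-token printout is the kind of view that shows how feature activations shift as the model moves through a prompt; with the random weights above, the indices and values carry no meaning.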