ML Research
A Black Box Made Less Opaque (Part 3)
SAEs on Gemma 2 9B
Objective
Use sparse autoencoders (SAEs) to better understand representational structure and to control model output by manipulating feature activations, this time on a larger, more modern model.
Motivation
Extend my understanding of representational geometry and feature activation to more modern, complex models.
About this installment
This third installment explores how SAEs can be used to understand representational structure and to control output by manipulating feature activations. It focuses on Gemma 2 9B and compares the results with those from GPT-2 Small.
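To make "manipulating feature activation" concrete, here is a minimal sketch of the general idea, not the code used in this series. It assumes a plain ReLU SAE with random weights standing in for a trained one and toy dimensions rather than Gemma 2 9B's real sizes. The names `encode`, `decode`, and `steer` are hypothetical helpers introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # toy sizes (assumption), far smaller than a real model

# Random weights stand in for a trained SAE (assumption for illustration).
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map a model activation vector to non-negative SAE feature activations."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(f):
    """Reconstruct a model activation vector from SAE feature activations."""
    return f @ W_dec + b_dec


def steer(x, feature_idx, value):
    """Set one feature's activation to `value` and return the edited activation.

    The SAE's reconstruction error is added back so that only the chosen
    feature's contribution changes.
    """
    f = encode(x)
    error = x - decode(f)
    f_steered = f.copy()
    f_steered[feature_idx] = value
    return decode(f_steered) + error


x = rng.normal(size=d_model)       # stand-in for one residual-stream activation
x_steered = steer(x, feature_idx=3, value=5.0)
```

In this sketch the edit shifts the activation along a single decoder direction: `x_steered - x` equals `(5.0 - encode(x)[3]) * W_dec[3]`. In a real experiment the steered vector would be written back into the model's forward pass at the layer the SAE was trained on.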