A Black Box Made Less Opaque (Part 3)

SAEs on Gemma 2 9B

Objective

Use sparse autoencoders (SAEs) to better understand representational structure and to control output by manipulating feature activations, this time on a larger, more modern model.

Motivation

Extend my understanding of representational geometry and feature activation to more modern, complex models.

About this installment

The third installment explores how sparse autoencoders (SAEs) can be used to better understand representational structure and control output by manipulating feature activation. This installment focuses on Gemma 2 9B and compares results to GPT-2 Small.
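
To make "manipulating feature activation" concrete, here is a minimal sketch of the general idea, assuming a standard ReLU SAE applied to a single activation vector. It is not the code used in this installment: the weights are random, the dimensions are toy-sized rather than Gemma 2 9B's, and the function names (`sae_encode`, `sae_decode`, `steer`) are hypothetical names chosen for illustration.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """Map a model activation to non-negative SAE feature activations (ReLU encoder)."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Map SAE feature activations back to model-activation space."""
    return f @ W_dec + b_dec

def steer(x, W_enc, b_enc, W_dec, b_dec, feature_idx, target):
    """Set one feature's activation to `target` and return the edited activation.

    The SAE's reconstruction error is added back so that only the chosen
    feature's contribution changes.
    """
    f = sae_encode(x, W_enc, b_enc)
    error = x - sae_decode(f, W_dec, b_dec)
    f_edit = f.copy()
    f_edit[..., feature_idx] = target
    return sae_decode(f_edit, W_dec, b_dec) + error

# Toy setup (assumption): random weights stand in for a trained SAE.
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

x = rng.normal(size=(d_model,))  # stand-in for one activation vector
x_steered = steer(x, W_enc, b_enc, W_dec, b_dec, feature_idx=3, target=5.0)
```

Because the reconstruction error is preserved, the edit shifts the activation along that feature's decoder direction by the change in its activation and leaves everything else untouched. In a real experiment the edited activation would be written back into the model's forward pass at the layer where the SAE was trained.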
