ML Research

A Black Box Made Less Opaque (Part 3)

SAEs on Gemma 2 9B

Objective

Use sparse autoencoders (SAEs) to better understand representational structure and to control model output by manipulating feature activations, this time on a larger, more modern model.

Motivation

Extend my understanding of representational geometry and feature activation to more modern, complex models.

About this installment

This third installment explores how sparse autoencoders (SAEs) can be used to better understand representational structure and to control output by manipulating feature activations. It focuses on Gemma 2 9B and compares the results with those from GPT-2 Small.

Read on Substack → | Read on LessWrong →
