
A Black Box Made Less Opaque (Part 1)

Introduction to SAEs

Objective

Apply sparse autoencoders (SAEs) to GPT-2 Small to explore which features activate and how those activations change as the model processes inputs.

Use SAEs to examine how the model begins to classify and 'understand' user inputs.

An introductory application of SAEs to GPT-2 Small, exploring feature activation and how it changes as the model processes inputs.
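
To make "feature activation" concrete, here is a minimal sketch of the encode and decode steps of a sparse autoencoder. It is a toy under stated assumptions, not the setup used in this series: the weights are random rather than trained, the dimensions are tiny (GPT-2 Small's residual stream is 768 wide), the input vector is a random stand-in for a real model activation, and the names `TinySAE` and `top_features` are hypothetical. The formulation shown (ReLU encoder, linear decoder, a decoder bias subtracted before encoding) is one common variant. Real work would use a tensor library instead of Python lists.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinySAE:
    """Toy SAE: f = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec f + b_dec."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = random.Random(seed)
        # Untrained random weights, for illustration only.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_sae)]
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_sae)] for _ in range(d_model)]
        self.b_enc = [0.0] * d_sae
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Map a model activation to non-negative feature activations."""
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, f):
        """Reconstruct the model activation from feature activations."""
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]


def top_features(f, k=3):
    """Indices of the k most strongly activated features, strongest first."""
    return sorted(range(len(f)), key=lambda i: f[i], reverse=True)[:k]


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(8)]  # stand-in for one token's activation
sae = TinySAE(d_model=8, d_sae=32)
features = sae.encode(x)
recon = sae.decode(features)
print("most active features:", top_features(features))
```

With a trained SAE, most entries of `features` would be zero for any given token, and the few non-zero ones are the features one inspects to see what the model has picked up on at that point in the input.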