SAELens
File size: 5,133 Bytes
0001d78
178c5c1
bcccf5f
902591a
 
0001d78
 
a6a8623
63ca469
cca32c0
ca42375
0ae72a1
505bd44
a6493ea
a6a8623
 
 
 
 
 
 
 
 
a6493ea
 
4c740a4
 
2bfd2e9
4c740a4
2bfd2e9
4c740a4
 
 
 
 
 
a6a8623
 
 
 
71859ae
a6a8623
 
 
 
 
 
 
 
 
 
 
 
 
655b4d7
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
---
license: cc-by-4.0
library_name: saelens
tags:
- arxiv:2408.05147
---

# Gemma Scope:

![](gemma_scope.gif)

This is a landing page for **Gemma Scope**, a comprehensive, open suite of sparse autoencoders for Gemma 2 9B and 2B. Sparse Autoencoders are a "microscope" of sorts that can help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.

**There are no model weights in this repo. If you are looking for them, please visit one of our repos:**

- https://huggingface.co/google/gemma-scope-2b-pt-res
- https://huggingface.co/google/gemma-scope-2b-pt-mlp
- https://huggingface.co/google/gemma-scope-2b-pt-att
- https://huggingface.co/google/gemma-scope-9b-pt-res
- https://huggingface.co/google/gemma-scope-9b-pt-mlp
- https://huggingface.co/google/gemma-scope-9b-pt-att
- https://huggingface.co/google/gemma-scope-27b-pt-res

[This tutorial](https://colab.research.google.com/drive/17dQFYUYnuKnP6OwQPH9v_GSYUW5aj-Rp?ts=66a77041) has instructions on how to load the SAEs.

# Key links:

![](gs-demo-tweet.gif)
- Check out the [interactive Gemma Scope demo](https://www.neuronpedia.org/gemma-scope) made by [Neuronpedia](https://www.neuronpedia.org/).
- Learn more about Gemma Scope in our [Google DeepMind blog post](https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models).
- Check out our [Google Colab notebook tutorial](https://colab.research.google.com/drive/17dQFYUYnuKnP6OwQPH9v_GSYUW5aj-Rp?ts=66a77041) for how to use Gemma Scope.
- Read [the Gemma Scope technical report](https://storage.googleapis.com/gemma-scope/gemma-scope-report.pdf).
- Check out [Mishax](https://github.com/google-deepmind/mishax), a GDM internal tool that we used in this project to expose the internal activations inside Gemma 2 models.

# Full weight set:

The full list of SAEs we trained at which sites and layers are linked from the following table, adapted from Figure 1 of our technical report:

| <big>Gemma 2 Model</big> | <big>SAE Width</big> | <big>Attention</big> | <big>MLP</big> | <big>Residual</big> | <big>Tokens</big> |
|---------------|-----------|-----------|-----|----------|----------|
| 2.6B PT<br>(26 layers) | 2^14 ≈ 16.4K | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 4B |
| | 2^15 |  |  | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_32k/)} | 8B |
| | 2^16 | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 8B |
| | 2^17 |  |  | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_131k/)} | 8B |
| | 2^18 |  |  | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_262k/)} | 8B |
| | 2^19 |  |  | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_524k/)} | 8B |
| | 2^20 ≈ 1M |  |  | {[5](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_5/width_1m/), [12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_1m/), [19](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_19/width_1m/)} | 16B |
| 9B PT<br>(42 layers) | 2^14 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 4B |
| | 2^15 |  |  | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_32k/)} | 8B |
| | 2^16 |  |  | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_65k/)} | 8B |
| | 2^17 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 8B |
| | 2^18 |  |  | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_262k/)} | 8B |
| | 2^19 |  |  | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_524k/)} | 8B |
| | 2^20 |  |  | {[9](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_9/width_1m/), [20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_1m/), [31](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_31/width_1m/)} | 16B |
| 27B PT<br>(46 layers) | 2^17 |  |  | {[10](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_10/width_131k/), [22](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_22/width_131k/), [34](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_34/width_131k/)} | 8B |

# Which SAE is in the [Neuronpedia demo](https://www.neuronpedia.org/gemma-scope)?

https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_20/width_16k/average_l0_71