@tim-lawson

FYI: I don't know whether you're open to contributions, but someone might find this helpful.

Modifies the activations lib and the expgen project to generate exemplars for activations other than MLP neurons: the residual stream, MLP inputs/outputs, and self-attention outputs. These match the activations returned by the Subject collect_acts method, excluding self-attention maps (which have a different shape) and unembedding inputs/outputs (which have no layer index).

Follows the recommendation in project/expgen/README.md by extending get_activations_computing_func. Specifically, it introduces an ActivationType enum that determines two access paths: from the subject to the component, and from the component to its activations.
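
To make the two access paths concrete, here is a minimal sketch of the idea. It is not the code in this PR: the enum member names and values, the AccessPath table, the attribute names ("mlp", "act_fn", "self_attn"), and the helpers get_component and which_tensor are all hypothetical. It uses a stand-in layer object, so no real model is needed.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any, List, Literal


class ActivationType(str, Enum):
    """Hypothetical members; "neurons" keeps the existing MLP-neuron behavior."""

    NEURONS = "neurons"    # MLP neuron activations
    RESID = "resid"        # residual stream
    MLP_IN = "mlp_in"      # input to the MLP block
    MLP_OUT = "mlp_out"    # output of the MLP block
    ATTN_OUT = "attn_out"  # output of the self-attention block


@dataclass(frozen=True)
class AccessPath:
    component: str                      # dotted path from a layer ("" = the layer itself)
    tensor: Literal["input", "output"]  # which side of the component to record


# Hypothetical attribute names; a real subject model would use its own module names.
ACCESS_PATHS = {
    ActivationType.NEURONS: AccessPath("mlp.act_fn", "output"),
    ActivationType.RESID: AccessPath("", "output"),
    ActivationType.MLP_IN: AccessPath("mlp", "input"),
    ActivationType.MLP_OUT: AccessPath("mlp", "output"),
    ActivationType.ATTN_OUT: AccessPath("self_attn", "output"),
}


def get_component(layers: List[Any], layer: int, act_type: ActivationType) -> Any:
    """Subject -> component: follow the dotted path from the given layer."""
    obj = layers[layer]
    for name in filter(None, ACCESS_PATHS[act_type].component.split(".")):
        obj = getattr(obj, name)
    return obj


def which_tensor(act_type: ActivationType) -> str:
    """Component -> activations: record the component's input or its output."""
    return ACCESS_PATHS[act_type].tensor


# Usage with a stand-in layer object.
layers = [SimpleNamespace(mlp=SimpleNamespace(act_fn="act_fn_0"), self_attn="attn_0")]
print(get_component(layers, 0, ActivationType("attn_out")), which_tensor(ActivationType.MLP_IN))
```

Because both paths live in one table keyed by the enum, adding another activation type should only require a new enum member and a new table entry.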

Adds an activation-type suffix to the exemplars folder name, except when the activation type is "neurons", so that existing folder names are unchanged. Similarly, adds a command-line argument for the activation type to the expgen compute_exemplars.py script, with the default value "neurons".
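
The naming rule and the default argument can be sketched roughly as follows. This is an illustration under assumptions, not the actual compute_exemplars.py code: the flag name --activation_type, the base folder name "exemplars", the "_" separator, and the enum values are all hypothetical.

```python
import argparse
from enum import Enum
from typing import List, Optional


class ActivationType(str, Enum):
    # Hypothetical member values, repeated here so the sketch is self-contained.
    NEURONS = "neurons"
    RESID = "resid"
    MLP_IN = "mlp_in"
    MLP_OUT = "mlp_out"
    ATTN_OUT = "attn_out"


def exemplars_dir_name(base: str, act_type: ActivationType) -> str:
    """Append an activation-type suffix, except for "neurons" (keeps old folder names)."""
    if act_type is ActivationType.NEURONS:
        return base
    return f"{base}_{act_type.value}"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Compute exemplars (sketch).")
    # Hypothetical flag name; defaults to "neurons" to preserve current behavior.
    parser.add_argument(
        "--activation_type",
        choices=[t.value for t in ActivationType],
        default=ActivationType.NEURONS.value,
    )
    return parser.parse_args(argv)


args = parse_args(["--activation_type", "mlp_out"])
print(exemplars_dir_name("exemplars", ActivationType(args.activation_type)))
```

Passing plain string choices to argparse keeps the help and error messages readable. The string is converted to the enum only after parsing.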

Caveat: not tested with the rest of the expgen pipeline yet.

@choidami
Member

Thank you for implementing this! We are totally open to contributions (it's precisely why we open-sourced our code!).
I'll merge the changes in once I test the integration with the expgen pipeline.

@tim-lawson
Author

You're welcome! Thanks for open-sourcing it. There's no rush to merge; if you find there are incompatibilities with the pipeline as-is, I can revisit the changes.
