TransformerBridge.enable_compatibility_mode() appears to expose some legacy HookedTransformer hook names without preserving the same computation points.
In particular, the legacy aliases for attention input hooks seem to be semantically different from the legacy HookedTransformer hooks:
blocks.{layer}.hook_q_input
blocks.{layer}.hook_k_input
blocks.{layer}.hook_v_input
My understanding is that in legacy HookedTransformer, these correspond to pre-LN residual-stream forks, while in current TransformerBridge compatibility mode they fire on post-LN tensors instead.
This means compatibility mode may currently provide:
- matching or near-matching logits,
- matching hook names,
- matching hook shapes,
while still not preserving the same hooked activations and backward gradients.
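To make the distinction concrete, here is a toy sketch in plain Python (not TransformerLens code). It assumes a LayerNorm without learned scale or bias and a made-up linear downstream readout; `resid_pre`, `readout_weights`, and all function names are hypothetical. It shows how two hook points can agree on shape while disagreeing on both values and activation-times-gradient scores.

```python
import math

def layer_norm(x, eps=1e-5):
    """Toy LayerNorm without learned scale/bias (an assumption for illustration)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def readout(post_ln, weights):
    """Hypothetical scalar that a downstream computation derives from the post-LN tensor."""
    return sum(a * w for a, w in zip(post_ln, weights))

def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical residual-stream vector at one position, and readout weights.
resid_pre = [1.0, 2.0, 4.0, 9.0]
readout_weights = [0.5, -1.0, 2.0, 0.25]

# Pre-LN hook point: the fork of the residual stream before normalisation.
pre_ln_input = list(resid_pre)
# Post-LN hook point: the tensor after normalisation.
post_ln_input = layer_norm(resid_pre)

same_shape = len(pre_ln_input) == len(post_ln_input)
max_abs_diff = max(abs(a - b) for a, b in zip(pre_ln_input, post_ln_input))

# Gradients of the same scalar with respect to each hook point.
grad_pre = numeric_grad(lambda x: readout(layer_norm(x), readout_weights), pre_ln_input)
grad_post = numeric_grad(lambda y: readout(y, readout_weights), post_ln_input)

# Attribution-style score: sum of activation * gradient at the hooked tensor.
score_pre = sum(a * g for a, g in zip(pre_ln_input, grad_pre))
score_post = sum(a * g for a, g in zip(post_ln_input, grad_post))

print("same shape:", same_shape)
print("max abs value diff:", max_abs_diff)
print("pre-LN score:", score_pre, "post-LN score:", score_post)
```

In this toy setup the pre-LN score comes out close to zero because LayerNorm is nearly invariant to rescaling its input, while the post-LN score does not. That is the kind of divergence an attribution tool would see even when logits, hook names, and hook shapes all line up.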
Observed behavior
For GPT-2, the current behavior appears to be:
- logits are close,
- legacy hook aliases can be registered,
- hook shapes match,
- but Q/K/V input hook values do not match legacy HookedTransformer,
- and downstream attribution-style scores diverge substantially.
There may also be a similar issue for:
blocks.{layer}.hook_mlp_in
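A parity check along these lines could make the comparison reproducible. This is only a sketch: it assumes the activations from both models have already been captured and flattened into plain lists of floats keyed by hook name, and `hook_names` and `compare_caches` are hypothetical helpers, not part of either API.

```python
# Legacy hook names discussed in this issue.
LEGACY_HOOK_TEMPLATES = [
    "blocks.{layer}.hook_q_input",
    "blocks.{layer}.hook_k_input",
    "blocks.{layer}.hook_v_input",
    "blocks.{layer}.hook_mlp_in",
]

def hook_names(n_layers):
    """Expand the templates for every layer index."""
    return [t.format(layer=layer) for layer in range(n_layers) for t in LEGACY_HOOK_TEMPLATES]

def compare_caches(legacy_cache, bridge_cache, names, atol=1e-4):
    """Classify each hook as missing, shape_mismatch, value_mismatch, or match.

    Both caches are assumed to map hook name -> flat list of floats.
    Returns {name: (status, max_abs_diff or None)}.
    """
    report = {}
    for name in names:
        if name not in legacy_cache or name not in bridge_cache:
            report[name] = ("missing", None)
            continue
        a, b = legacy_cache[name], bridge_cache[name]
        if len(a) != len(b):
            report[name] = ("shape_mismatch", None)
            continue
        diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        report[name] = ("match" if diff <= atol else "value_mismatch", diff)
    return report

# Made-up example data standing in for real captured activations.
legacy = {
    "blocks.0.hook_q_input": [1.0, 2.0],
    "blocks.0.hook_k_input": [1.0, 2.0],
    "blocks.0.hook_v_input": [1.0],
}
bridge = {
    "blocks.0.hook_q_input": [1.0, 2.0],
    "blocks.0.hook_k_input": [0.5, 2.0],
    "blocks.0.hook_v_input": [1.0, 2.0],
}
report = compare_caches(legacy, bridge, hook_names(1))
for name, (status, diff) in report.items():
    print(name, status, diff)
```

Separating "shape matches" from "values match" in the report mirrors the distinction above: name and shape compatibility can pass while semantic parity fails.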
Checklist
It seems to me like one of these should be made explicit:
1. Compatibility mode should preserve legacy hook semantics.
2. If exact semantic parity is not intended, the aliases/docs should say they are name/shape compatible rather than semantically equivalent.
3. Both should exist:
   - bridge-native canonical post-LN hooks,
   - legacy-compatible aliases for legacy pre-LN hook semantics.
My preference would be option 3, since it preserves bridge-native behavior while making compatibility mode meaningful for legacy tooling.
I’d be happy to work on this and open a PR if this direction sounds right to maintainers.