[Bug Report] Compatibility mode legacy hook aliases do not preserve HookedTransformer hook semantics #1317

@SamuelePunzo

Description

TransformerBridge.enable_compatibility_mode() appears to expose some legacy HookedTransformer hook names without preserving the same computation points.

In particular, the legacy aliases for attention input hooks seem to be semantically different from the legacy HookedTransformer hooks:

  • blocks.{layer}.hook_q_input
  • blocks.{layer}.hook_k_input
  • blocks.{layer}.hook_v_input

My understanding is that in legacy HookedTransformer, these correspond to pre-LN residual-stream forks, while in current TransformerBridge compatibility mode they fire on post-LN tensors instead.

This means compatibility mode may currently provide:

  • matching or near-matching logits,
  • matching hook names,
  • matching hook shapes,

while still not preserving the same hooked activations and backward gradients.
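To illustrate the distinction, here is a toy sketch in plain Python. It is not TransformerLens code: `layer_norm`, `toy_attention_input`, and the `"pre_ln"` / `"post_ln"` labels are made up for this example, and the "projection" is just a scalar multiply. It only shows how two hooks with the same name and shape, feeding an identical downstream output, can still capture different tensors.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_attention_input(resid, hook_point):
    """Run resid -> LN -> stand-in projection, capturing one hook on the way.

    hook_point is "pre_ln" or "post_ln". Both labels are hypothetical and
    exist only in this sketch.
    """
    captured = {}
    if hook_point == "pre_ln":
        # Legacy-style fork: the hook sees the raw residual stream.
        captured["hook_q_input"] = list(resid)
    normed = layer_norm(resid)
    if hook_point == "post_ln":
        # The hook sees the tensor after LayerNorm.
        captured["hook_q_input"] = list(normed)
    q = [2.0 * v for v in normed]  # stand-in for the Q projection
    return q, captured


resid = [1.0, 2.0, 4.0, 9.0]
q_pre, cache_pre = toy_attention_input(resid, "pre_ln")
q_post, cache_post = toy_attention_input(resid, "post_ln")

# Same downstream output and same hook shape, but different hooked values.
same_output = q_pre == q_post
same_shape = len(cache_pre["hook_q_input"]) == len(cache_post["hook_q_input"])
max_diff = max(
    abs(a - b)
    for a, b in zip(cache_pre["hook_q_input"], cache_post["hook_q_input"])
)
print(same_output, same_shape, max_diff)
```

Because the gradient with respect to the hooked tensor is taken at a different point in the graph, any attribution method that multiplies activations by gradients at this hook would also differ between the two variants.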

Observed behavior

For GPT-2, the current behavior appears to be:

  • logits are close,
  • legacy hook aliases can be registered,
  • hook shapes match,
  • but Q/K/V input hook values do not match legacy HookedTransformer,
  • and downstream attribution-style scores diverge substantially.

There may also be a similar issue for:

  • blocks.{layer}.hook_mlp_in
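A check along the lines of the observations above could be sketched as follows. This is a minimal, library-free helper: `compare_hook_caches`, `shape_of`, and `flatten` are hypothetical names, and caches are assumed to have been converted to nested Python lists keyed by hook name (for example via `.tolist()` on the cached tensors). The numbers in the usage lines are placeholders, not measured GPT-2 activations.

```python
def shape_of(x):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def flatten(x):
    """Yield the scalar leaves of a nested list in order."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def compare_hook_caches(legacy, bridge, hook_names, atol=1e-4):
    """Per hook name, report presence, shape agreement, and value agreement."""
    report = {}
    for name in hook_names:
        if name not in legacy or name not in bridge:
            report[name] = {"present": False, "shape_match": False,
                            "value_match": False, "max_abs_diff": None}
            continue
        a, b = legacy[name], bridge[name]
        shape_match = shape_of(a) == shape_of(b)
        max_diff = None
        if shape_match:
            max_diff = max(
                (abs(p - q) for p, q in zip(flatten(a), flatten(b))),
                default=0.0,
            )
        report[name] = {
            "present": True,
            "shape_match": shape_match,
            "value_match": shape_match and max_diff <= atol,
            "max_abs_diff": max_diff,
        }
    return report


# Placeholder values only: they stand in for cached activations.
legacy_cache = {"blocks.0.hook_q_input": [[1.0, 2.0], [3.0, 4.0]]}
bridge_cache = {"blocks.0.hook_q_input": [[-1.0, 1.0], [-1.0, 1.0]]}
report = compare_hook_caches(
    legacy_cache,
    bridge_cache,
    ["blocks.0.hook_q_input", "blocks.0.hook_mlp_in"],
)
for name, row in report.items():
    print(name, row)
```

Reporting shape agreement and value agreement separately makes the failure mode described here visible: a hook can pass the name and shape checks while failing the value check.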

Checklist

  • I have checked that there is no similar issue

It seems to me that one of the following should be made explicit:

  1. Compatibility mode should preserve legacy hook semantics.
  2. If exact semantic parity is not intended, the aliases/docs should say they are name/shape compatible rather than semantically equivalent.
  3. Both should exist:
    • bridge-native canonical post-LN hooks,
    • legacy-compatible aliases for legacy pre-LN hook semantics.

My preference would be option 3, since it preserves bridge-native behavior while making compatibility mode meaningful for legacy tooling.
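One way option 3 might look, as a rough sketch only: a resolver that maps the legacy alias names to a dedicated pre-LN hook point and records the semantics explicitly, while leaving every other name untouched. Everything here is an assumption for illustration. `HookPoint`, `resolve_hook`, and the `legacy_{q,k,v}_input_pre_ln` target names are hypothetical and do not reflect how TransformerBridge is actually implemented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HookPoint:
    name: str       # resolved hook point name (hypothetical for legacy aliases)
    semantics: str  # "pre_ln" for legacy aliases, "native" for everything else


# Matches the legacy aliases discussed in this issue, e.g. blocks.3.hook_q_input.
_LEGACY_QKV_INPUT = re.compile(r"^blocks\.(\d+)\.hook_([qkv])_input$")


def resolve_hook(name):
    """Resolve a requested hook name to a hook point with explicit semantics.

    Legacy Q/K/V input aliases go to a separate pre-LN fork (hypothetical
    target name); any other name passes through as a bridge-native hook.
    """
    match = _LEGACY_QKV_INPUT.match(name)
    if match:
        layer, which = match.groups()
        target = f"blocks.{layer}.legacy_{which}_input_pre_ln"  # hypothetical
        return HookPoint(name=target, semantics="pre_ln")
    return HookPoint(name=name, semantics="native")


print(resolve_hook("blocks.3.hook_k_input"))
print(resolve_hook("blocks.3.attn.hook_pattern"))
```

Keeping the semantics as data on the resolved hook point would let documentation and tests assert which aliases are semantically equivalent to the legacy hooks, rather than only name and shape compatible.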

I’d be happy to work on this and open a PR if this direction sounds right to maintainers.
