Fix Gemma model detection and Gemma-2 2B attn_types config#1227

Merged
jlarson4 merged 1 commit into TransformerLensOrg:main from brendanlong:brendanlong/gemma-2b-configs
Mar 31, 2026
Conversation

@brendanlong
Contributor

Description

"gemma-2b" was being detected as having architecture "Gemma2ForCausalLM" because its name starts with "gemma-2", but it is actually a v1 model. "gemma-2-2b" (an actual v2 model) lists 42 entries in attn_types but has only 26 layers (this was probably a copy+paste from gemma-2-9b).

Neither of these causes real problems right now, but they're confusing and could cause problems if this code were refactored.
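To illustrate the detection bug: a bare prefix check on "gemma-2" also matches the v1 name "gemma-2b", while requiring the trailing hyphen does not. This is a minimal sketch, not the actual TransformerLens code; the function name is hypothetical.

```python
# Hypothetical sketch of the architecture-detection fix.
# The real TransformerLens detection logic differs; this only shows
# why the trailing hyphen matters.
def gemma_architecture(model_name: str) -> str:
    """Classify a Gemma checkpoint name as v1 or v2."""
    # "gemma-2-" (with trailing hyphen) matches "gemma-2-2b" and "gemma-2-9b"
    # but not the v1 model "gemma-2b", which a bare "gemma-2" prefix would hit.
    if model_name.startswith("gemma-2-"):
        return "Gemma2ForCausalLM"
    return "GemmaForCausalLM"
```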

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Fix "gemma-2" substring match incorrectly classifying Gemma v1 "gemma-2b"
as Gemma2ForCausalLM by requiring trailing hyphen ("gemma-2-"). Also fix
Gemma-2 2B attn_types having 42 entries for 26 layers (changed *21 to *13).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
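The attn_types change can be sketched as follows. Variable names here are illustrative, not the exact TransformerLens config schema; the point is that the repeated local/global pattern must cover exactly one entry per layer.

```python
# Illustrative sketch of the Gemma-2 2B config fix (names are assumptions).
n_layers = 26  # Gemma-2 2B layer count
# The old config repeated the pattern 21 times (42 entries, matching the
# 42-layer gemma-2-9b); 13 repetitions gives one entry per layer for 2B.
attn_types = ["local", "global"] * 13
assert len(attn_types) == n_layers  # 26 entries for 26 layers
```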
@jlarson4 jlarson4 merged commit d1dc12d into TransformerLensOrg:main Mar 31, 2026
25 of 26 checks passed
