Commit 9c1237a
Replace hardcoded Gemma config dicts with HF config cache entries

Move all Gemma 1, 2, and 3 models to _HF_CONFIG_CACHE using their
native HF config classes (GemmaConfig, Gemma2Config, Gemma3TextConfig,
Gemma3Config). Add generic architecture handlers for GemmaForCausalLM,
Gemma2ForCausalLM, Gemma3ForCausalLM, and Gemma3ForConditionalGeneration
that read from the HF config objects, eliminating ~460 lines of
hardcoded dicts and the name-based architecture detection.

Fixes google/gemma-2b incorrectly getting Gemma2ForCausalLM architecture
(now correctly gets GemmaForCausalLM from cache). Gemma 2 attn_types now
exactly match n_layers instead of having extra unused entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 83602aa commit 9c1237a
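
The handler approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `StubHFConfig`, `ARCH_HANDLERS`, `attn_types_for`, and `convert` are hypothetical names, the stub stands in for a real HF config object so the example runs without `transformers` installed, and the alternating `("global", "local")` pattern and the numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class StubHFConfig:
    """Hypothetical stand-in for an HF config object such as Gemma2Config."""

    architectures: List[str]
    num_hidden_layers: int
    hidden_size: int
    num_attention_heads: int
    head_dim: int


def attn_types_for(
    n_layers: int, pattern: Sequence[str] = ("global", "local")
) -> List[str]:
    """Repeat `pattern` and cut it to exactly n_layers entries."""
    return list(islice(cycle(pattern), n_layers))


def _base_fields(hf: StubHFConfig) -> Dict[str, Any]:
    # Fields are read from the config object instead of a hardcoded dict.
    return {
        "original_architecture": hf.architectures[0],
        "n_layers": hf.num_hidden_layers,
        "d_model": hf.hidden_size,
        "n_heads": hf.num_attention_heads,
        "d_head": hf.head_dim,
    }


def _gemma1_handler(hf: StubHFConfig) -> Dict[str, Any]:
    return _base_fields(hf)


def _gemma2_handler(hf: StubHFConfig) -> Dict[str, Any]:
    cfg = _base_fields(hf)
    # One attention type per layer, with no unused trailing entries.
    cfg["attn_types"] = attn_types_for(hf.num_hidden_layers)
    return cfg


# Dispatch on the architecture string the config reports, not on the model name.
ARCH_HANDLERS: Dict[str, Callable[[StubHFConfig], Dict[str, Any]]] = {
    "GemmaForCausalLM": _gemma1_handler,
    "Gemma2ForCausalLM": _gemma2_handler,
}


def convert(hf: StubHFConfig) -> Dict[str, Any]:
    arch = hf.architectures[0]
    try:
        handler = ARCH_HANDLERS[arch]
    except KeyError:
        raise ValueError(f"no handler for architecture {arch!r}") from None
    return handler(hf)


# Illustrative values only.
gemma2_like = StubHFConfig(["Gemma2ForCausalLM"], 26, 2304, 8, 256)
cfg = convert(gemma2_like)
```

Keying the lookup on the reported architecture avoids the kind of name-based misdetection the message mentions for google/gemma-2b, since a model whose name merely contains "gemma-2" is no longer matched by substring.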
3 files changed
Lines changed: 299 additions & 457 deletions
File tree
- tests/unit
- transformer_lens
- pretrained