add TransformerConfig with typed attention and MLP variants #202

gerardcode wants to merge 1 commit into modelpack:main from
Conversation
Code Review
This pull request introduces a `transformerConfig` object to the model configuration across the documentation, JSON schema, and Go specification, enabling the declaration of architectural parameters such as attention and MLP types. Feedback recommends clarifying the default values for `attentionType` and `mlpType` and resolving a logical inconsistency regarding `numKVHeads` in GQA configurations. Additionally, the reviewer noted that the schema lacks the specific parameters required for the MoE and MLA variants, such as expert counts and latent ranks.
- **attentionType** _string_, OPTIONAL

  The attention mechanism variant. Supported values: `mha`, `gqa`, `mla`.
- **mlpType** _string_, OPTIONAL

  The feed-forward / MLP layer variant. Supported values: `dense`, `moe`.
- **numKVHeads** _integer_, OPTIONAL

  Number of key/value heads. For GQA this is smaller than `numAttentionHeads`. Omitting this field or setting it equal to `numAttentionHeads` implies standard MHA.
```go
// NumKVHeads is the number of key/value heads. For GQA this is less than NumAttentionHeads.
// Omitted or equal to NumAttentionHeads implies full MHA.
NumKVHeads *int `json:"numKVHeads,omitempty"`
```
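The head-count relationship described in the docs can be checked mechanically: `numKVHeads` must divide `numAttentionHeads`, with omission or equality meaning plain MHA. A minimal sketch of such a check (the `validateHeads` helper and the struct subset below are illustrative, not part of the PR):

```go
package main

import "fmt"

// TransformerConfig mirrors only the fields discussed in this review.
type TransformerConfig struct {
	AttentionType     string `json:"attentionType,omitempty"`
	NumAttentionHeads int    `json:"numAttentionHeads,omitempty"`
	NumKVHeads        *int   `json:"numKVHeads,omitempty"`
}

// validateHeads enforces the documented GQA invariant: numKVHeads, when
// present, must be in [1, numAttentionHeads] and divide numAttentionHeads
// (each KV head serves an equal-sized group of query heads).
func validateHeads(c TransformerConfig) error {
	if c.NumKVHeads == nil {
		return nil // omitted field implies standard MHA
	}
	kv := *c.NumKVHeads
	if kv < 1 || kv > c.NumAttentionHeads {
		return fmt.Errorf("numKVHeads %d out of range for %d attention heads", kv, c.NumAttentionHeads)
	}
	if c.NumAttentionHeads%kv != 0 {
		return fmt.Errorf("numAttentionHeads %d not divisible by numKVHeads %d", c.NumAttentionHeads, kv)
	}
	return nil
}

func main() {
	kv := 8
	gqa := TransformerConfig{AttentionType: "gqa", NumAttentionHeads: 32, NumKVHeads: &kv}
	fmt.Println(validateHeads(gqa)) // <nil>

	bad := 5
	gqa.NumKVHeads = &bad
	fmt.Println(validateHeads(gqa) != nil) // true: 32 is not divisible by 5
}
```

Such a check would also surface the inconsistency the reviewer flagged, since a config claiming `"attentionType": "gqa"` with `numKVHeads == numAttentionHeads` is effectively MHA.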
```json
"TransformerConfig": {
  "type": "object",
  "properties": {
    "attentionType": {
      "type": "string",
      "enum": ["mha", "gqa", "mla"]
    },
    "mlpType": {
      "type": "string",
      "enum": ["dense", "moe"]
    },
    "numLayers": {
      "type": "integer",
      "minimum": 1
    },
    "numAttentionHeads": {
      "type": "integer",
      "minimum": 1
    },
    "numKVHeads": {
      "type": "integer",
      "minimum": 1
    },
    "hiddenSize": {
      "type": "integer",
      "minimum": 1
    },
    "intermediateSize": {
      "type": "integer",
      "minimum": 1
    }
  },
  "additionalProperties": false
},
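For reference, a fragment that validates against this schema might look like the following; the surrounding `transformerConfig` key and the specific values (a Llama-style 32-head GQA model) are illustrative only, not taken from the PR:

```json
"transformerConfig": {
  "attentionType": "gqa",
  "mlpType": "dense",
  "numLayers": 32,
  "numAttentionHeads": 32,
  "numKVHeads": 8,
  "hiddenSize": 4096,
  "intermediateSize": 11008
}
```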
The TransformerConfig definition includes "moe" and "mla" variants but lacks the parameters necessary for an inference engine to actually configure them (e.g., numExperts and numExpertsPerToken for MoE, or latent ranks for MLA). To support the goal of auto-detection mentioned in the PR description, these fields should be added to the schema.
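One way to address this would be additional optional properties alongside the existing ones. The field names below are hypothetical (loosely following common MoE and MLA configuration conventions) and would need to be settled in the spec:

```json
"numExperts": {
  "type": "integer",
  "minimum": 1
},
"numExpertsPerToken": {
  "type": "integer",
  "minimum": 1
},
"qLoraRank": {
  "type": "integer",
  "minimum": 1
},
"kvLoraRank": {
  "type": "integer",
  "minimum": 1
}
```

Since `additionalProperties` is `false`, any such fields must be added to the schema explicitly before configs may carry them.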
feat: add TransformerConfig with typed attention and MLP variants
Introduce TransformerConfig struct to ModelConfig with typed constants
for attention type (mha, gqa, mla) and MLP type (dense, moe), along
with key hyperparameter fields (numLayers, numAttentionHeads, numKVHeads,
hiddenSize, intermediateSize). Update config-schema.json and docs/config.md
accordingly.
This lays the groundwork for the unified Transformer specification,
enabling inference engines to auto-detect model variants without
per-model adaptation.