add TransformerConfig with typed attention and MLP variants#202

Open
gerardcode wants to merge 1 commit into modelpack:main from gerardcode:gerard-changes
Conversation

@gerardcode

feat: add TransformerConfig with typed attention and MLP variants

Introduce TransformerConfig struct to ModelConfig with typed constants
for attention type (mha, gqa, mla) and MLP type (dense, moe), along
with key hyperparameter fields (numLayers, numAttentionHeads, numKVHeads,
hiddenSize, intermediateSize). Update config-schema.json and docs/config.md
accordingly.

This lays the groundwork for the unified Transformer specification,
enabling inference engines to auto-detect model variants without
per-model adaptation.
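Based on the commit message and the review excerpts, the Go additions presumably look something like the following sketch. The constant names `GroupedQueryAttention` and the field names come from the PR description and review comments; the overall shape is an assumption, not the PR's actual diff:

```go
package main

import "fmt"

// AttentionType enumerates the supported attention variants.
type AttentionType string

const (
	MultiHeadAttention    AttentionType = "mha"
	GroupedQueryAttention AttentionType = "gqa"
	MultiLatentAttention  AttentionType = "mla"
)

// MLPType enumerates the supported feed-forward variants.
type MLPType string

const (
	DenseMLP MLPType = "dense"
	MoEMLP   MLPType = "moe"
)

// TransformerConfig mirrors the hyperparameter fields named in the
// PR description; pointer types distinguish "omitted" from zero.
type TransformerConfig struct {
	AttentionType     AttentionType `json:"attentionType,omitempty"`
	MLPType           MLPType       `json:"mlpType,omitempty"`
	NumLayers         *int          `json:"numLayers,omitempty"`
	NumAttentionHeads *int          `json:"numAttentionHeads,omitempty"`
	NumKVHeads        *int          `json:"numKVHeads,omitempty"`
	HiddenSize        *int          `json:"hiddenSize,omitempty"`
	IntermediateSize  *int          `json:"intermediateSize,omitempty"`
}

func main() {
	heads, kv := 32, 8
	cfg := TransformerConfig{
		AttentionType:     GroupedQueryAttention,
		NumAttentionHeads: &heads,
		NumKVHeads:        &kv,
	}
	fmt.Println(cfg.AttentionType) // prints "gqa"
}
```

Typed string constants keep the JSON representation human-readable while giving Go callers compile-time checking against typos in variant names.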

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a transformerConfig object to the model configuration across documentation, JSON schema, and Go specifications, enabling the definition of architectural parameters like attention and MLP types. Feedback recommends clarifying default values for attentionType and mlpType and resolving a logical inconsistency regarding numKVHeads in GQA configurations. Additionally, the reviewer noted that the schema lacks specific parameters required for MoE and MLA variants, such as expert counts and latent ranks.


- **attentionType** _string_, OPTIONAL

The attention mechanism variant. Supported values:

medium

It is recommended to explicitly state the default value for attentionType when it is omitted from the configuration. Typically, this defaults to "mha".

Suggested change
The attention mechanism variant. Supported values:
The attention mechanism variant. Defaults to "mha". Supported values:


- **mlpType** _string_, OPTIONAL

The feed-forward / MLP layer variant. Supported values:

medium

It is recommended to explicitly state the default value for mlpType when it is omitted. Typically, this defaults to "dense".

Suggested change
The feed-forward / MLP layer variant. Supported values:
The feed-forward / MLP layer variant. Defaults to "dense". Supported values:


- **numKVHeads** _integer_, OPTIONAL

Number of key/value heads. For GQA this is smaller than `numAttentionHeads`. Omitting this field or setting it equal to `numAttentionHeads` implies standard MHA.

medium

The description for numKVHeads states that omitting it implies standard MHA. This creates a logical conflict if attentionType is explicitly set to "gqa". The documentation should clarify that this field is required for GQA configurations and should be less than numAttentionHeads.
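One way to resolve the inconsistency the reviewer describes is a validation rule along these lines. This is a sketch, not part of the PR; the divisibility check is a common additional convention for GQA, labeled here as an assumption:

```go
package main

import (
	"errors"
	"fmt"
)

// validateKVHeads sketches the rule suggested by the review: for "gqa",
// numKVHeads must be present and strictly less than numAttentionHeads,
// while for "mha" (or when attentionType is omitted) it must be absent
// or equal to numAttentionHeads.
func validateKVHeads(attentionType string, numAttentionHeads int, numKVHeads *int) error {
	switch attentionType {
	case "gqa":
		if numKVHeads == nil {
			return errors.New(`numKVHeads is required when attentionType is "gqa"`)
		}
		if *numKVHeads >= numAttentionHeads {
			return fmt.Errorf("numKVHeads (%d) must be less than numAttentionHeads (%d) for gqa",
				*numKVHeads, numAttentionHeads)
		}
		// Assumption: query heads are evenly divided among KV groups.
		if numAttentionHeads%*numKVHeads != 0 {
			return fmt.Errorf("numAttentionHeads (%d) must be divisible by numKVHeads (%d)",
				numAttentionHeads, *numKVHeads)
		}
	case "mha", "":
		if numKVHeads != nil && *numKVHeads != numAttentionHeads {
			return errors.New("for mha, numKVHeads must equal numAttentionHeads or be omitted")
		}
	}
	return nil
}

func main() {
	kv := 8
	fmt.Println(validateKVHeads("gqa", 32, &kv)) // <nil>
	fmt.Println(validateKVHeads("gqa", 32, nil)) // prints the required-field error
}
```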

Comment on lines +142 to +144
// NumKVHeads is the number of key/value heads. For GQA this is less than NumAttentionHeads.
// Omitted or equal to NumAttentionHeads implies full MHA.
NumKVHeads *int `json:"numKVHeads,omitempty"`

medium

The comment for NumKVHeads suggests that omitting it always implies MHA, which contradicts the AttentionType field if it is set to GroupedQueryAttention. The documentation should reflect that this field is essential for GQA.

Comment on lines +51 to +84
"TransformerConfig": {
"type": "object",
"properties": {
"attentionType": {
"type": "string",
"enum": ["mha", "gqa", "mla"]
},
"mlpType": {
"type": "string",
"enum": ["dense", "moe"]
},
"numLayers": {
"type": "integer",
"minimum": 1
},
"numAttentionHeads": {
"type": "integer",
"minimum": 1
},
"numKVHeads": {
"type": "integer",
"minimum": 1
},
"hiddenSize": {
"type": "integer",
"minimum": 1
},
"intermediateSize": {
"type": "integer",
"minimum": 1
}
},
"additionalProperties": false
},

medium

The TransformerConfig definition includes "moe" and "mla" variants but lacks the parameters necessary for an inference engine to actually configure them (e.g., numExperts and numExpertsPerToken for MoE, or latent ranks for MLA). To support the goal of auto-detection mentioned in the PR description, these fields should be added to the schema.
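Concretely, the reviewer's suggestion could take a shape like the following Go additions. Only `numExperts` and `numExpertsPerToken` are named in the review; the MLA rank field names are placeholders for illustration:

```go
package main

import "fmt"

// MoEConfig holds the mixture-of-experts parameters the review says
// are missing from the schema.
type MoEConfig struct {
	NumExperts         *int `json:"numExperts,omitempty"`
	NumExpertsPerToken *int `json:"numExpertsPerToken,omitempty"`
}

// MLAConfig holds latent-rank parameters for multi-head latent
// attention; both field names are hypothetical.
type MLAConfig struct {
	KVLatentRank    *int `json:"kvLatentRank,omitempty"`    // hypothetical name
	QueryLatentRank *int `json:"queryLatentRank,omitempty"` // hypothetical name
}

func main() {
	n, k := 64, 8
	moe := MoEConfig{NumExperts: &n, NumExpertsPerToken: &k}
	fmt.Println(*moe.NumExperts, *moe.NumExpertsPerToken) // prints "64 8"
}
```

Grouping the variant-specific fields into sub-objects (rather than flattening them into TransformerConfig) would also let the JSON schema require them conditionally, e.g. via `if`/`then` on mlpType and attentionType.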
