add TransformerConfig with typed attention and MLP variants#202

Open
gerardcode wants to merge 1 commit into modelpack:main from gerardcode:gerard-changes
Conversation

@gerardcode

feat: add TransformerConfig with typed attention and MLP variants

Introduce TransformerConfig struct to ModelConfig with typed constants
for attention type (mha, gqa, mla) and MLP type (dense, moe), along
with key hyperparameter fields (numLayers, numAttentionHeads, numKVHeads,
hiddenSize, intermediateSize). Update config-schema.json and docs/config.md
accordingly.

This lays the groundwork for the unified Transformer specification,
enabling inference engines to auto-detect model variants without
per-model adaptation.
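Based on the commit message and the review excerpts, the Go additions presumably look something like the following sketch. The constant names `GroupedQueryAttention` and the field names come from the PR description and review comments; the overall shape is an assumption, not the PR's actual diff:

```go
package main

import "fmt"

// AttentionType enumerates the supported attention variants.
type AttentionType string

const (
	MultiHeadAttention    AttentionType = "mha"
	GroupedQueryAttention AttentionType = "gqa"
	MultiLatentAttention  AttentionType = "mla"
)

// MLPType enumerates the supported feed-forward variants.
type MLPType string

const (
	DenseMLP MLPType = "dense"
	MoEMLP   MLPType = "moe"
)

// TransformerConfig mirrors the hyperparameter fields named in the
// PR description; pointer types distinguish "omitted" from zero.
type TransformerConfig struct {
	AttentionType     AttentionType `json:"attentionType,omitempty"`
	MLPType           MLPType       `json:"mlpType,omitempty"`
	NumLayers         *int          `json:"numLayers,omitempty"`
	NumAttentionHeads *int          `json:"numAttentionHeads,omitempty"`
	NumKVHeads        *int          `json:"numKVHeads,omitempty"`
	HiddenSize        *int          `json:"hiddenSize,omitempty"`
	IntermediateSize  *int          `json:"intermediateSize,omitempty"`
}

func main() {
	heads, kv := 32, 8
	cfg := TransformerConfig{
		AttentionType:     GroupedQueryAttention,
		NumAttentionHeads: &heads,
		NumKVHeads:        &kv,
	}
	fmt.Println(cfg.AttentionType) // prints "gqa"
}
```

Typed string constants keep the JSON representation human-readable while giving Go callers compile-time checking against typos in variant names.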

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a transformerConfig object to the model configuration across documentation, JSON schema, and Go specifications, enabling the definition of architectural parameters like attention and MLP types. Feedback recommends clarifying default values for attentionType and mlpType and resolving a logical inconsistency regarding numKVHeads in GQA configurations. Additionally, the reviewer noted that the schema lacks specific parameters required for MoE and MLA variants, such as expert counts and latent ranks.


- **attentionType** _string_, OPTIONAL

The attention mechanism variant. Supported values:

medium

It is recommended to explicitly state the default value for attentionType when it is omitted from the configuration. Typically, this defaults to "mha".

Suggested change
The attention mechanism variant. Supported values:
The attention mechanism variant. Defaults to "mha". Supported values:


- **mlpType** _string_, OPTIONAL

The feed-forward / MLP layer variant. Supported values:

medium

It is recommended to explicitly state the default value for mlpType when it is omitted. Typically, this defaults to "dense".

Suggested change
The feed-forward / MLP layer variant. Supported values:
The feed-forward / MLP layer variant. Defaults to "dense". Supported values:


- **numKVHeads** _integer_, OPTIONAL

Number of key/value heads. For GQA this is smaller than `numAttentionHeads`. Omitting this field or setting it equal to `numAttentionHeads` implies standard MHA.

medium

The description for numKVHeads states that omitting it implies standard MHA. This creates a logical conflict if attentionType is explicitly set to "gqa". The documentation should clarify that this field is required for GQA configurations and should be less than numAttentionHeads.
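One way to resolve the inconsistency the reviewer describes is a validation rule along these lines. This is a sketch, not part of the PR; the divisibility check is a common additional convention for GQA, labeled here as an assumption:

```go
package main

import (
	"errors"
	"fmt"
)

// validateKVHeads sketches the rule suggested by the review: for "gqa",
// numKVHeads must be present and strictly less than numAttentionHeads,
// while for "mha" (or when attentionType is omitted) it must be absent
// or equal to numAttentionHeads.
func validateKVHeads(attentionType string, numAttentionHeads int, numKVHeads *int) error {
	switch attentionType {
	case "gqa":
		if numKVHeads == nil {
			return errors.New(`numKVHeads is required when attentionType is "gqa"`)
		}
		if *numKVHeads >= numAttentionHeads {
			return fmt.Errorf("numKVHeads (%d) must be less than numAttentionHeads (%d) for gqa",
				*numKVHeads, numAttentionHeads)
		}
		// Assumption: query heads are evenly divided among KV groups.
		if numAttentionHeads%*numKVHeads != 0 {
			return fmt.Errorf("numAttentionHeads (%d) must be divisible by numKVHeads (%d)",
				numAttentionHeads, *numKVHeads)
		}
	case "mha", "":
		if numKVHeads != nil && *numKVHeads != numAttentionHeads {
			return errors.New("for mha, numKVHeads must equal numAttentionHeads or be omitted")
		}
	}
	return nil
}

func main() {
	kv := 8
	fmt.Println(validateKVHeads("gqa", 32, &kv)) // <nil>
	fmt.Println(validateKVHeads("gqa", 32, nil)) // prints the required-field error
}
```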

Comment on lines +142 to +144
// NumKVHeads is the number of key/value heads. For GQA this is less than NumAttentionHeads.
// Omitted or equal to NumAttentionHeads implies full MHA.
NumKVHeads *int `json:"numKVHeads,omitempty"`

medium

The comment for NumKVHeads suggests that omitting it always implies MHA, which contradicts the AttentionType field if it is set to GroupedQueryAttention. The documentation should reflect that this field is essential for GQA.

Comment on lines +51 to +84
"TransformerConfig": {
"type": "object",
"properties": {
"attentionType": {
"type": "string",
"enum": ["mha", "gqa", "mla"]
},
"mlpType": {
"type": "string",
"enum": ["dense", "moe"]
},
"numLayers": {
"type": "integer",
"minimum": 1
},
"numAttentionHeads": {
"type": "integer",
"minimum": 1
},
"numKVHeads": {
"type": "integer",
"minimum": 1
},
"hiddenSize": {
"type": "integer",
"minimum": 1
},
"intermediateSize": {
"type": "integer",
"minimum": 1
}
},
"additionalProperties": false
},

medium

The TransformerConfig definition includes "moe" and "mla" variants but lacks the parameters necessary for an inference engine to actually configure them (e.g., numExperts and numExpertsPerToken for MoE, or latent ranks for MLA). To support the goal of auto-detection mentioned in the PR description, these fields should be added to the schema.
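Concretely, the reviewer's suggestion could take a shape like the following Go additions. Only `numExperts` and `numExpertsPerToken` are named in the review; the MLA rank field names are placeholders for illustration:

```go
package main

import "fmt"

// MoEConfig holds the mixture-of-experts parameters the review says
// are missing from the schema.
type MoEConfig struct {
	NumExperts         *int `json:"numExperts,omitempty"`
	NumExpertsPerToken *int `json:"numExpertsPerToken,omitempty"`
}

// MLAConfig holds latent-rank parameters for multi-head latent
// attention; both field names are hypothetical.
type MLAConfig struct {
	KVLatentRank    *int `json:"kvLatentRank,omitempty"`    // hypothetical name
	QueryLatentRank *int `json:"queryLatentRank,omitempty"` // hypothetical name
}

func main() {
	n, k := 64, 8
	moe := MoEConfig{NumExperts: &n, NumExpertsPerToken: &k}
	fmt.Println(*moe.NumExperts, *moe.NumExpertsPerToken) // prints "64 8"
}
```

Grouping the variant-specific fields into sub-objects (rather than flattening them into TransformerConfig) would also let the JSON schema require them conditionally, e.g. via `if`/`then` on mlpType and attentionType.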
