[REQUEST] Enabling support for vision draft models. #384

### Problem

I am using a quantized version of Pixtral Large, and I can't load the vision modules of a smaller variant as the draft model. As a result, I cannot perform inference with images; I can only perform inference with text.

### Solution

Add a config option that enables vision support for the draft model. I have a strong feeling that this is low-hanging fruit.

### Alternatives

_No response_

### Explanation

I imagine this will be a much-needed feature, because multimodal inference is typically slower than text-only inference, so a vision-capable draft model would help speed up generation for image inputs.

### Examples

_No response_

### Additional context

_No response_

### Acknowledgements

- [x] I have looked for similar requests before submitting this one.
- [x] I understand that the developers have lives and my issue will be answered when possible.
- [x] I understand the developers of this program are human, and I will make my requests politely.
