Models and systems that understand, generate, reason, plan, and act across modalities.
Benchmarks and defenses for MLLM agents operating through web and device interfaces.
Long-horizon agentic workflows for visual design, plotting, and multimodal content creation.
Selected projects I lead or contribute to have received a combined 450 GitHub stars and 24 forks across four personal and organization repositories.
Tracked repositories
| Repository | Stars | Forks | Focus |
|---|---|---|---|
| VILA-Lab/FigMirror | 315 | 19 | Automated plotting from paper figure styles. |
| MetaAgentX/OpenCaptchaWorld | 72 | 3 | Web-based benchmark and platform for evaluating multimodal LLM agents. |
| Yaxin9Luo/Gamma-MOD | 43 | 2 | Mixture-of-Depth adaptation for efficient multimodal large language models. |
| MetaAgentX/NextGen-CAPTCHAs | 20 | 0 | Scalable GUI-agent defense framework based on cognitive gaps. |
Last updated: 2026-05-27. Managed from data/research-repos.json.
Homepage · Publications · CV · LinkedIn · X · Yaxin.Luo@mbzuai.ac.ae



