feat: add Ansible file parsing (playbooks, roles, tasks, handlers)#415

Open
jnovack wants to merge 1 commit into tirth8205:main from jnovack:ansible

jnovack commented May 2, 2026

What this does

Adds structural parsing for Ansible YAML files, mapping Ansible's semantic concepts onto the existing graph model so that playbooks, roles, and task files show up as navigable nodes and edges alongside the rest of the codebase.

The graph now understands:

  • Plays (the hosts: + task sections in a playbook) → Class nodes
  • Tasks and handlers → Function nodes, with the full module name (including FQCNs like ansible.builtin.package) stored in extra.ansible_module
  • notify: chains → CALLS edges from task to handler, covering both scalar and list forms
  • include_tasks / import_tasks → IMPORTS_FROM edges to the included file
  • include_role / import_role → IMPORTS_FROM edges to the role name
  • roles: list in a play → IMPORTS_FROM edges, including {role: name, when: ...} dict form and {name: ns.role} collections format
  • import_playbook: → IMPORTS_FROM at the file level
  • vars_files: → IMPORTS_FROM edges so you can trace variable provenance
  • block:/rescue:/always: nesting → tasks extracted recursively, parented to the enclosing play
  • pre_tasks: and post_tasks: → treated the same as tasks:
  • listen: on handlers → stored in extra.ansible_listen so notify-by-alias can be resolved
  • Role meta/main.yml dependencies → DEPENDS_ON edges
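To make a few of these mappings concrete, here is a minimal sketch that derives edges from a play and a role's meta/main.yml that have already been loaded into plain Python dicts and lists. The function names (notify_targets, role_name, play_edges, meta_edges) and the (kind, source, target) tuple shape are illustrative assumptions, not the parser's actual API. The PR mentions yaml.compose(), so the real code presumably walks YAML nodes and also records line numbers.

```python
from __future__ import annotations

from typing import Any

Edge = tuple[str, str, str]  # (edge_kind, source, target); hypothetical shape


def notify_targets(task: dict[str, Any]) -> list[str]:
    """Return the handler names a task notifies, accepting scalar or list form."""
    notify = task.get("notify")
    if notify is None:
        return []
    if isinstance(notify, str):
        return [notify]
    return [str(item) for item in notify]


def role_name(entry: Any) -> str | None:
    """Normalize one roles:/dependencies: entry to a role name.

    Handles the plain string form, {role: name, ...} and the
    collections-style {name: ns.role, ...} dict form.
    """
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict):
        name = entry.get("role", entry.get("name"))
        return str(name) if name is not None else None
    return None


def play_edges(play: dict[str, Any]) -> list[Edge]:
    """Collect IMPORTS_FROM and CALLS edges for one parsed play."""
    play_name = play.get("name", "play")
    edges: list[Edge] = []
    for entry in play.get("roles") or []:
        name = role_name(entry)
        if name:
            edges.append(("IMPORTS_FROM", play_name, name))
    for path in play.get("vars_files") or []:
        edges.append(("IMPORTS_FROM", play_name, str(path)))
    # pre_tasks/tasks/post_tasks are treated identically.
    for section in ("pre_tasks", "tasks", "post_tasks"):
        for task in play.get(section) or []:
            task_name = task.get("name", "unnamed")
            for handler in notify_targets(task):
                edges.append(("CALLS", task_name, handler))
    return edges


def meta_edges(role: str, meta: dict[str, Any]) -> list[Edge]:
    """DEPENDS_ON edges from a role's meta/main.yml dependencies list."""
    edges: list[Edge] = []
    for entry in meta.get("dependencies") or []:
        name = role_name(entry)
        if name:
            edges.append(("DEPENDS_ON", role, name))
    return edges
```

Funneling roles: and dependencies: through the same role_name() helper keeps the string, {role: ...} and {name: ...} forms consistent across both features.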

How detection works

.yml/.yaml files now map to "yaml" in the extension table. If the path contains an Ansible directory component (playbooks/, roles/, tasks/, handlers/, group_vars/, host_vars/) or a well-known top-level filename (site.yml, deploy.yml, etc.), detect_language() promotes the result to "ansible".
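The path-based promotion can be sketched roughly as follows. detect_yaml_language() is a hypothetical stand-in for the relevant branch of detect_language(). The filename set contains only the two names listed above (the real list is longer), and the sketch matches the basename anywhere in the tree, which is a simplification of "top-level".

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Directory components that mark a path as Ansible content.
ANSIBLE_DIRS = {"playbooks", "roles", "tasks", "handlers", "group_vars", "host_vars"}
# Illustrative subset of well-known playbook filenames (assumption: the real list is longer).
ANSIBLE_FILENAMES = {"site.yml", "deploy.yml"}


def detect_yaml_language(path: str) -> str | None:
    """Return "ansible", "yaml", or None for non-YAML paths (hypothetical helper)."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in {".yml", ".yaml"}:
        return None
    if p.name in ANSIBLE_FILENAMES:
        return "ansible"
    # Only directory components count, not the filename itself.
    if any(part in ANSIBLE_DIRS for part in p.parts[:-1]):
        return "ansible"
    return "yaml"
```

Under these assumptions, roles/web/tasks/main.yml is promoted to "ansible", while .github/workflows/ci.yml stays "yaml".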

For clearly typed paths (tasks/, handlers/, meta/) we trust the path and parse directly. For playbooks and unknown paths we run a lightweight content sniff first to avoid false positives — specifically, we require that a top-level sequence item has both hosts: and at least one unambiguous Ansible play key (tasks:, gather_facts:, become:, etc.), not just hosts: alone. import_playbook: by itself is treated as unambiguous and skips the check.
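A minimal sketch of the content sniff, operating on an already-parsed document. The function name and the exact UNAMBIGUOUS_PLAY_KEYS membership are assumptions, since the list above ends in "etc.". Note that loading real Ansible files with yaml.safe_load fails on tags such as !vault, so an actual implementation would sniff composed nodes or use a more tolerant loader.

```python
from __future__ import annotations

from typing import Any

# Assumed set of keys that, next to hosts:, only make sense in an Ansible play.
UNAMBIGUOUS_PLAY_KEYS = {
    "tasks", "pre_tasks", "post_tasks", "handlers", "roles",
    "gather_facts", "become", "vars_files",
}


def looks_like_playbook(doc: Any) -> bool:
    """Return True if a parsed YAML document resembles an Ansible playbook."""
    if not isinstance(doc, list):
        return False  # playbooks are a top-level sequence of plays
    for item in doc:
        if not isinstance(item, dict):
            continue
        if "import_playbook" in item:
            return True  # unambiguous on its own, skips the hosts: check
        # hosts: alone is not enough; require at least one play-only key too.
        if "hosts" in item and not UNAMBIGUOUS_PLAY_KEYS.isdisjoint(item):
            return True
    return False
```

Requiring a second key is what keeps non-Ansible files that merely contain a hosts: field from being promoted.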

What's NOT here yet

A few things I knowingly left out or couldn't handle without more scope:

  • Inventory files (hosts/*.yml, INI-format inventories) — the structure is completely different; it deserves its own sub-parser
  • vars: inline dictionaries — individual variable names aren't extracted as nodes; they add noise more than signal at this point
  • listen: cross-file resolution — the alias is stored, but wiring notify targets across files to their matching listen: handler would require a post-parse pass similar to what the ReScript resolver does
  • Dynamic includes with Jinja2 (include_tasks: "tasks/{{ ansible_os_family }}.yml") — the raw template string is stored as the edge target; actual resolution at graph time would require knowing the variable values
  • ansible.cfg and requirements.yml — not parsed; ansible-galaxy role dependencies in requirements files would be a natural follow-on
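For the listen: item above, one possible shape for the missing post-parse pass is sketched below. It is purely illustrative: the handler dict layout and the "file::name" handler ID format are hypothetical. It also resolves names globally, ignoring Ansible's per-play handler scoping, so a real pass would likely need to restrict candidates to handlers reachable from the notifying task's play or role.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def as_list(value: Any) -> list[str]:
    """Normalize a scalar-or-list YAML value to a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]


def resolve_notify(
    handlers: list[dict[str, Any]],
    notifies: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Map (task_id, notify_target) pairs to (task_id, handler_id) pairs.

    A target matches a handler by its name or by any listen: alias,
    regardless of which file the handler lives in.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for handler in handlers:
        handler_id = f"{handler['file']}::{handler['name']}"
        for key in [handler["name"], *as_list(handler.get("listen"))]:
            if handler_id not in index[key]:
                index[key].append(handler_id)
    resolved: list[tuple[str, str]] = []
    for task_id, target in notifies:
        # One notify can fan out to every handler listening on that topic.
        for handler_id in index.get(target, []):
            resolved.append((task_id, handler_id))
    return resolved
```

Unresolved targets (typos, or handlers defined outside the parsed tree) simply produce no edge in this sketch.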

Known Ansible quirks / caveats

  • block: tasks with no name: get a fallback name like task@line42. These are common in real playbooks and show up in the graph, but the name is less useful than a named task.
  • with_*: loop keywords (with_items, with_first_found, etc.) are skipped when scanning for the module key. Any with_* variant missing from the explicit meta-key list is also skipped by the startswith("with_") guard; this is intentional.
  • Vault-encrypted values parse fine as ordinary strings — PyYAML treats the !vault | block as a scalar. No special handling needed.
  • The handlers: section in a play and a standalone handlers file are both parsed the same way, so handler nodes from inline play handlers and from roles/myrole/handlers/main.yml both appear as Function nodes with ansible_kind=handler.
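  • Multi-document YAML (multiple --- separated documents in one file) is not handled. PyYAML's yaml.compose() expects a single document and raises a ComposerError when it encounters a second one; reading every document would require yaml.compose_all(). Multi-document Ansible files are rare in practice, but worth noting.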

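The module-key scan, block flattening and fallback naming described above can be sketched together on plain dicts. META_KEYS is an illustrative subset, not the parser's actual list. The "__line__" key is a hypothetical stand-in for the line number that a node-based parser would read from the YAML node's start mark. The sketch does not emit a node for the block wrapper itself.

```python
from __future__ import annotations

from typing import Any, Iterator

# Illustrative subset of task keywords that are never the module key.
META_KEYS = {
    "name", "when", "notify", "register", "tags", "become", "become_user",
    "loop", "loop_control", "vars", "ignore_errors", "changed_when",
    "failed_when", "delegate_to", "run_once", "listen", "no_log", "args",
    "environment", "until", "retries", "delay", "block", "rescue", "always",
}
BLOCK_SECTIONS = ("block", "rescue", "always")


def module_key(task: dict[str, Any]) -> str | None:
    """Return the first key that looks like a module, e.g. ansible.builtin.package."""
    for key in task:
        # with_* loops are skipped by prefix; "__" keys are our hypothetical metadata.
        if key in META_KEYS or key.startswith("with_") or key.startswith("__"):
            continue
        return key
    return None


def iter_tasks(tasks: list[dict[str, Any]]) -> Iterator[tuple[str, str | None]]:
    """Yield (task_name, module) pairs, recursing into block/rescue/always."""
    for task in tasks:
        if any(section in task for section in BLOCK_SECTIONS):
            for section in BLOCK_SECTIONS:
                yield from iter_tasks(task.get(section) or [])
            continue
        # Unnamed tasks fall back to a positional name such as task@line42.
        name = task.get("name") or f"task@line{task.get('__line__', '?')}"
        yield name, module_key(task)
```

Because blocks are flattened recursively, every nested task ends up parented to whatever play the caller is iterating, matching the "parented to the enclosing play" behaviour described earlier.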
Testing

Ran against a real production Ansible repo (20 roles, 27 playbooks managing Docker Swarm, Elasticsearch, and supporting infrastructure) to validate the patterns before writing fixtures. Fixtures are sanitized versions of patterns found there.

36 new tests across three classes (TestAnsiblePlaybookParsing, TestAnsibleTasksParsing, TestAnsibleMetaParsing), all passing. Pre-existing test failures (Julia, Java, PHP, GDScript parsers) are unchanged.

🤖 Summary generated with Claude Code

jnovack commented May 2, 2026

It seems the Julia tests, the imports in main.py, and the import in review.py were already failing CI before my PR. I'll rebase once they are fixed.