fix: sanitize workspace label in Memgraph/Neo4j to prevent Cypher injection#2713
fix: sanitize workspace label in Memgraph/Neo4j to prevent Cypher injection#2713furkankoykiran wants to merge 5 commits intoHKUDS:mainfrom
Conversation
…ection Apply regex sanitization to _get_workspace_label() in both MemgraphStorage and Neo4JStorage, matching the pattern already used in PGGraphStorage. This prevents Cypher injection via the LIGHTRAG-WORKSPACE HTTP header by stripping all characters except alphanumeric and underscore. Closes HKUDS#2698
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac79a0b7be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Strips all characters except alphanumeric and underscore to prevent | ||
| Cypher injection via the LIGHTRAG-WORKSPACE header. | ||
| """ | ||
| safe_workspace = re.sub(r"[^a-zA-Z0-9_]", "_", self.workspace.strip()) |
There was a problem hiding this comment.
Keep Neo4j workspace labels one-to-one after sanitization
Replacing every non-[a-zA-Z0-9_] character with _ in _get_workspace_label() is non-injective, so distinct workspace headers like team-a and team_a now map to the same graph label; because the API currently accepts arbitrary LIGHTRAG-WORKSPACE values (only .strip() is applied in lightrag/api/lightrag_server.py), this can merge tenant data, allow cross-workspace reads/writes, and hide previously indexed data that used unsanitized labels before this change.
Useful? React with 👍 / 👎.
| Strips all characters except alphanumeric and underscore to prevent | ||
| Cypher injection via the LIGHTRAG-WORKSPACE header. | ||
| """ | ||
| safe_workspace = re.sub(r"[^a-zA-Z0-9_]", "_", self.workspace.strip()) |
There was a problem hiding this comment.
Keep Memgraph workspace labels one-to-one after sanitization
This sanitization has the same collision problem in Memgraph: different workspace names collapse to the same label (for example, project-a and project_a), which breaks workspace isolation and can expose one workspace’s graph to another when headers are user-controlled; it also makes existing nodes/edges under old unsanitized labels effectively unreachable after upgrade.
Useful? React with 👍 / 👎.
|
@codex review |
|
To use Codex here, create a Codex account and connect to github. |
There was a problem hiding this comment.
Pull request overview
Mitigates Cypher injection risk by sanitizing workspace-derived labels used by Memgraph/Neo4j backends and adding regression tests targeting malicious LIGHTRAG-WORKSPACE inputs.
Changes:
- Sanitize
_get_workspace_label()output in Neo4j and Memgraph implementations to neutralize backtick-based injection. - Add a new offline pytest suite covering workspace sanitization edge cases and injection payloads.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lightrag/kg/memgraph_impl.py | Sanitizes workspace label before it is used in Cypher. |
| lightrag/kg/neo4j_impl.py | Sanitizes workspace label before it is used in Cypher. |
| tests/test_workspace_sanitization.py | Adds offline tests intended to prevent regressions in workspace sanitization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR Update Summary 🚀I've completed a round of refinements to address the feedback and improve the implementation:
All tests for workspace sanitization are passing locally. Ready for review. cc @danielaskdd @HKUDS/maintainers |
Code reviewFound 3 issues:
LightRAG/lightrag/kg/neo4j_impl.py Lines 550 to 557 in 64a2681 Same issue in memgraph: LightRAG/lightrag/kg/memgraph_impl.py Lines 260 to 266 in 64a2681
LightRAG/lightrag/kg/memgraph_impl.py Lines 99 to 107 in 64a2681
LightRAG/lightrag/kg/memgraph_impl.py Lines 904 to 910 in 64a2681 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
PR #2792 resolved this issue. |
Summary
Fixes #2698 — Cypher injection via unsanitized
LIGHTRAG-WORKSPACEheader in Memgraph/Neo4j.Problem
The
_get_workspace_label()method in bothMemgraphStorageandNeo4JStoragereturnedself.workspacedirectly without sanitization. Since the workspace value can originate from theLIGHTRAG-WORKSPACEHTTP header, an attacker could inject arbitrary Cypher queries by manipulating this header.The PostgreSQL implementation (
PGGraphStorage._get_workspace_graph_name()) already had proper sanitization usingre.sub(r"[^a-zA-Z0-9_]", "_", ...).Changes
lightrag/kg/memgraph_impl.pyimport re_get_workspace_label()to sanitize workspace using the same regex pattern as PostgreSQL — strips all chars except[a-zA-Z0-9_], falls back to"base"if emptylightrag/kg/neo4j_impl.py_get_workspace_label()with identical sanitization (rewas already imported)tests/test_workspace_sanitization.py(NEW)Test Results