fix: sanitize workspace label in Memgraph/Neo4j to prevent Cypher injection by furkankoykiran · Pull Request #2713 · HKUDS/LightRAG

furkankoykiran · 2026-02-25T01:46:48Z

Summary

Fixes #2698 — Cypher injection via unsanitized LIGHTRAG-WORKSPACE header in Memgraph/Neo4j.

Problem

The _get_workspace_label() method in both MemgraphStorage and Neo4JStorage returned self.workspace directly without sanitization. Since the workspace value can originate from the LIGHTRAG-WORKSPACE HTTP header, an attacker could inject arbitrary Cypher queries by manipulating this header.

The PostgreSQL implementation (PGGraphStorage._get_workspace_graph_name()) already had proper sanitization using re.sub(r"[^a-zA-Z0-9_]", "_", ...).
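To make the attack surface concrete, here is a minimal sketch of how an unescaped label can break out of a backtick-quoted identifier. The helper `build_match_query` and the query shape are hypothetical and only imitate the f-string interpolation pattern used in the storage classes; they are not copied from LightRAG.

```python
def build_match_query(workspace_label: str) -> str:
    # Hypothetical helper: the label is interpolated into a backtick-quoted
    # identifier with no escaping, as in the vulnerable code path.
    return f"MATCH (n:`{workspace_label}`) RETURN n"


benign = build_match_query("team_a")

# A backtick in the header value closes the quoted identifier early, so the
# text that follows is parsed as Cypher instead of as part of the label.
malicious = build_match_query("x`) DETACH DELETE n //")

print(benign)
print(malicious)
```

The second query no longer contains a single quoted label: everything after the injected backtick is attacker-controlled query text.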

Changes

`lightrag/kg/memgraph_impl.py`

Added import re
Rewrote _get_workspace_label() to sanitize workspace using the same regex pattern as PostgreSQL: every character outside [a-zA-Z0-9_] is replaced with _, and the label falls back to "base" if the result is empty

`lightrag/kg/neo4j_impl.py`

Rewrote _get_workspace_label() with identical sanitization (re was already imported)
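A minimal standalone sketch of the sanitization described above. The function name `sanitize_workspace_label` is hypothetical, and the exact fallback logic is an assumption based on the description; only the `re.sub` pattern is taken from the PR.

```python
import re


def sanitize_workspace_label(workspace: str) -> str:
    # Replace every character outside [a-zA-Z0-9_] with an underscore.
    safe = re.sub(r"[^a-zA-Z0-9_]", "_", workspace.strip())
    # Assumed fallback: an empty or whitespace-only workspace becomes "base".
    return safe or "base"


print(sanitize_workspace_label("team_a"))
print(sanitize_workspace_label("x`) DETACH DELETE n //"))
print(sanitize_workspace_label("   "))
```

Because the output can only contain letters, digits, and underscores, it cannot terminate a backtick-quoted identifier.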

`tests/test_workspace_sanitization.py` (NEW)

20 offline unit tests covering normal inputs, special character replacement, Cypher injection payloads, empty/whitespace fallback, and edge cases
All 20 tests pass ✅

Test Results

tests/test_workspace_sanitization.py  20 passed in 0.04s

…ection Apply regex sanitization to _get_workspace_label() in both MemgraphStorage and Neo4JStorage, matching the pattern already used in PGGraphStorage. This prevents Cypher injection via the LIGHTRAG-WORKSPACE HTTP header by stripping all characters except alphanumeric and underscore. Closes HKUDS#2698

danielaskdd · 2026-02-27T04:58:05Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac79a0b7be

chatgpt-codex-connector · 2026-02-27T05:00:35Z

+        Strips all characters except alphanumeric and underscore to prevent
+        Cypher injection via the LIGHTRAG-WORKSPACE header.
+        """
+        safe_workspace = re.sub(r"[^a-zA-Z0-9_]", "_", self.workspace.strip())


Keep Neo4j workspace labels one-to-one after sanitization

Replacing every non-[a-zA-Z0-9_] character with _ in _get_workspace_label() is non-injective, so distinct workspace headers like team-a and team_a now map to the same graph label; because the API currently accepts arbitrary LIGHTRAG-WORKSPACE values (only .strip() is applied in lightrag/api/lightrag_server.py), this can merge tenant data, allow cross-workspace reads/writes, and hide previously indexed data that used unsanitized labels before this change.
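The collision can be reproduced offline with a short sketch. The `sanitize` function below reuses the regex from the diff; the sample workspace names are illustrative.

```python
import re
from collections import defaultdict


def sanitize(workspace: str) -> str:
    # Same pattern as in the reviewed diff.
    return re.sub(r"[^a-zA-Z0-9_]", "_", workspace.strip())


# Group distinct workspace names by the label they are mapped to.
groups = defaultdict(list)
for name in ["team-a", "team_a", "team a", "team.a"]:
    groups[sanitize(name)].append(name)

# Any label with more than one source name is a collision.
collisions = {label: names for label, names in groups.items() if len(names) > 1}
print(collisions)
```

All four names end up under the single label team_a, so their graphs would share one set of nodes.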

chatgpt-codex-connector · 2026-02-27T05:00:35Z

+        Strips all characters except alphanumeric and underscore to prevent
+        Cypher injection via the LIGHTRAG-WORKSPACE header.
+        """
+        safe_workspace = re.sub(r"[^a-zA-Z0-9_]", "_", self.workspace.strip())


Keep Memgraph workspace labels one-to-one after sanitization

This sanitization has the same collision problem in Memgraph: different workspace names collapse to the same label (for example, project-a and project_a), which breaks workspace isolation and can expose one workspace’s graph to another when headers are user-controlled; it also makes existing nodes/edges under old unsanitized labels effectively unreachable after upgrade.

…hars (HKUDS#2698)

furkankoykiran · 2026-03-05T20:17:19Z

@codex review

chatgpt-codex-connector · 2026-03-05T20:17:24Z

To use Codex here, create a Codex account and connect to github.

Copilot

Pull request overview

Mitigates Cypher injection risk by sanitizing workspace-derived labels used by Memgraph/Neo4j backends and adding regression tests targeting malicious LIGHTRAG-WORKSPACE inputs.

Changes:

Sanitize _get_workspace_label() output in Neo4j and Memgraph implementations to neutralize backtick-based injection.
Add a new offline pytest suite covering workspace sanitization edge cases and injection payloads.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
lightrag/kg/memgraph_impl.py	Sanitizes workspace label before it is used in Cypher.
lightrag/kg/neo4j_impl.py	Sanitizes workspace label before it is used in Cypher.
tests/test_workspace_sanitization.py	Adds offline tests intended to prevent regressions in workspace sanitization.

… and clarify docstrings

furkankoykiran · 2026-03-06T14:51:02Z

PR Update Summary 🚀

I've completed a round of refinements to address the feedback and improve the implementation:

Refined Workspace Isolation: Replaced the previous sanitization (which could cause collisions, e.g., 'team-a' vs 'team_a') with a backtick-escaping approach. This ensures a 1-to-1 mapping for all workspace names while staying secure against Cypher injection.
Addressed Review Feedback:
- Updated docstrings in Neo4j and Memgraph to clarify identifier safety when used with backticks.
- Removed unused imports.
Improved Test Robustness: Overhauled 'tests/test_workspace_sanitization.py' to validate the logic directly. The updated tests cover various scenarios and verify the 1-to-1 mapping.
Formatting & Quality: Ensured the code complies with the project's pre-commit hooks.

All tests for workspace sanitization are passing locally. Ready for review.
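For reference, a minimal sketch of the backtick-escaping idea, assuming the escape is done by doubling each backtick (the later review mentions foo``bar as the escaped form). The names `escape_workspace_label` and `build_match_query` are hypothetical, and empty-workspace fallback handling is left out.

```python
def escape_workspace_label(workspace: str) -> str:
    # Double every backtick so it stays inside a backtick-quoted identifier.
    return workspace.replace("`", "``")


def build_match_query(workspace: str) -> str:
    # Hypothetical query shape; the label is always wrapped in backticks.
    return f"MATCH (n:`{escape_workspace_label(workspace)}`) RETURN n"


payload = "x`) DETACH DELETE n //"
print(build_match_query(payload))

# The mapping is reversible, so distinct names keep distinct labels.
assert escape_workspace_label(payload).replace("``", "`") == payload
assert escape_workspace_label("team-a") != escape_workspace_label("team_a")
```

The escaped form is only meaningful inside a backtick-quoted identifier; anywhere the label is compared as a Python string or embedded as a string literal, the raw workspace name is the value that matches what the database stores.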

cc @danielaskdd @HKUDS/maintainers

danielaskdd · 2026-03-18T06:09:20Z

Code review

Found 3 issues:

Escaped label used in Python string equality comparisons against driver-returned (unescaped) labels: the database stores foo`bar but _get_workspace_label() returns foo``bar, so for workspaces containing backticks the filter if label != workspace_label never excludes anything, and the workspace label is never stripped from node_dict["labels"].

LightRAG/lightrag/kg/neo4j_impl.py

Lines 550 to 557 in 64a2681

    
    if "labels" in node_dict:
        node_dict["labels"] = [
            label
            for label in node_dict["labels"]
            if label != workspace_label
        ]
    # logger.debug(f"Neo4j query node {query} return: {node_dict}")
    return node_dict

Same issue in memgraph:

LightRAG/lightrag/kg/memgraph_impl.py

Lines 260 to 266 in 64a2681

    
    if "labels" in node_dict:
        node_dict["labels"] = [
            label
            for label in node_dict["labels"]
            if label != workspace_label
        ]
    return node_dict
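A small sketch of the mismatch and one possible way around it, assuming (as the review states) that the driver returns labels in their raw, unescaped form. The helper names are hypothetical; the idea is to keep the raw workspace name for Python-side comparisons and use the escaped form only when interpolating into Cypher.

```python
def escape_label(raw: str) -> str:
    # Escaped form, intended only for use inside a backtick-quoted identifier.
    return raw.replace("`", "``")


def strip_workspace_label(labels: list[str], raw_workspace: str) -> list[str]:
    # Compare against the raw name, because returned labels are unescaped.
    return [label for label in labels if label != raw_workspace]


raw = "foo`bar"
escaped = escape_label(raw)
driver_labels = [raw, "Person"]  # what a driver would hand back (assumed)

# Comparing against the escaped value leaves the workspace label in place.
buggy = [label for label in driver_labels if label != escaped]
fixed = strip_workspace_label(driver_labels, raw)

print(buggy)
print(fixed)
```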

Memgraph index creation uses workspace_label without backtick wrapping: Neo4j correctly wraps it in backticks (`{workspace_label}`), but Memgraph emits CREATE INDEX ON :{workspace_label}(entity_id) directly. If the workspace contains a backtick, _get_workspace_label() returns doubled backticks (e.g., foo``bar), which produces invalid Cypher at this unquoted position and causes initialize() to fail.

LightRAG/lightrag/kg/memgraph_impl.py

Lines 99 to 107 in 64a2681

    
    # Create index for base nodes on entity_id if it doesn't exist
    try:
        workspace_label = self._get_workspace_label()
        await session.run(
            f"""CREATE INDEX ON :{workspace_label}(entity_id)"""
        )
        logger.info(
            f"[{self.workspace}] Created index on :{workspace_label}(entity_id) in Memgraph."
        )
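One way the index statement could be built so the label sits in a quoted position, sketched under the assumption that Memgraph accepts a backtick-quoted label in CREATE INDEX ON. That assumption is not confirmed in this thread and should be checked against Memgraph before relying on it. `build_index_query` is a hypothetical helper.

```python
def build_index_query(raw_workspace: str) -> str:
    # Escape backticks by doubling, then wrap the label in backticks so the
    # escaped form appears only inside a quoted identifier.
    escaped = raw_workspace.replace("`", "``")
    return f"CREATE INDEX ON :`{escaped}`(entity_id)"


print(build_index_query("team-a"))
print(build_index_query("foo`bar"))
```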

workspace_label used as a Cypher string literal instead of a backtick-quoted identifier: the query WHERE '{workspace_label}' IN labels(n) embeds the escaped value (e.g., foo``bar) as a plain string, but Cypher string literals do not interpret doubled backticks as escapes, so it will never match the actual stored label (foo`bar), silently returning no graph results for any workspace name containing a backtick.

LightRAG/lightrag/kg/memgraph_impl.py

Lines 904 to 910 in 64a2681

    
    MATCH (start:`{workspace_label}`)
    WHERE start.entity_id = $entity_id
    MATCH path = (start)-[*BFS 0..{max_depth}]-(end:`{workspace_label}`)
    WHERE ALL(n IN nodes(path) WHERE '{workspace_label}' IN labels(n))
    WITH collect(DISTINCT end) + start AS all_nodes_unlimited
    WITH
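A possible alternative for the string-literal position, sketched under the assumption that the label check can take a query parameter: the raw workspace name is passed as a parameter, so no escaping or quoting is applied to it in the query text. The parameter name `$workspace_label` and the helper `build_label_filter` are hypothetical; this is not necessarily how PR #2792 resolved the issue.

```python
def build_label_filter(raw_workspace: str) -> tuple[str, dict[str, str]]:
    # The label is compared as a value, so it is passed as a parameter rather
    # than interpolated into the query text.
    clause = "WHERE ALL(n IN nodes(path) WHERE $workspace_label IN labels(n))"
    params = {"workspace_label": raw_workspace}
    return clause, params


clause, params = build_label_filter("foo`bar")
print(clause)
print(params)
```

With this shape, the value compared against labels(n) is the raw name, which is what the database stores.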

🤖 Generated with Claude Code

danielaskdd · 2026-03-18T07:02:23Z

PR #2792 resolved this issue.

chatgpt-codex-connector Bot reviewed Feb 27, 2026

danielaskdd added the security label Feb 28, 2026

fix(kg): preserve labels by escaping backticks instead of stripping c…

8fd0617

…hars (HKUDS#2698)

Copilot AI review requested due to automatic review settings March 5, 2026 20:16

Copilot AI reviewed Mar 5, 2026

furkankoykiran added 3 commits March 6, 2026 14:21

style: apply ruff formatting and fix unused import in memgraph_impl

1059e37

style: fix project-wide formatting to comply with pre-commit hooks

7b49f7a

refactor(tests): overhaul workspace sanitization tests for robustness…

64a2681

… and clarify docstrings

danielaskdd closed this Mar 18, 2026

furkankoykiran wants to merge 5 commits into HKUDS:main from furkankoykiran:fix/cypher-injection-workspace-label
