fix: prevent path traversal via task_id in cleaning task log/delete e… #508

Merged
Dallas98 merged 9 commits into main from security/path-cross on Jun 17, 2026

MoeexT (Contributor) commented Jun 17, 2026

…ndpoints

Add UUID format validation in check_task_id() and resolve+startsWith boundary checks on all log_path/task_path constructions to prevent arbitrary file read and arbitrary directory deletion.

  • cleaning_task_validator.py: add UUID regex to check_task_id()
  • cleaning_task_service.py: add path boundary checks in get_task_log() and delete_task()
  • cleaning_task_routes.py: add path boundary checks in stream and download endpoints

close: #507
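
A minimal sketch of the two layers described above: format validation first, then a resolved-path boundary check. The `FLOW_ROOT` directory and the `output.log` file name are hypothetical (neither appears on this page), and the regex is a generic canonical-UUID pattern, not necessarily the one the PR adds.

```python
import os
import re
from pathlib import Path

# Hypothetical base directory for task data; the real value is not shown here.
FLOW_ROOT = Path("/tmp/flow")

# Generic canonical UUID shape (8-4-4-4-12 hex digits); assumed, not the PR's regex.
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def check_task_id(task_id: str) -> None:
    """Reject anything that is not a canonical UUID string."""
    if not isinstance(task_id, str) or _UUID_RE.fullmatch(task_id) is None:
        raise ValueError("task_id must be a UUID")


def build_log_path(task_id: str) -> Path:
    """Validate task_id, then confirm the resolved path stays under FLOW_ROOT."""
    check_task_id(task_id)
    root = FLOW_ROOT.resolve()
    log_path = (root / task_id / "output.log").resolve()
    # Compare against root plus a separator so "/tmp/flow-other" cannot match.
    if not str(log_path).startswith(str(root) + os.sep):
        raise ValueError("log path escapes the task root")
    return log_path
```

With this shape, a value such as `../../etc/passwd` is rejected by the format check before any path is built, and the boundary check remains as a second line of defense.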

…ndpoints

Add UUID format validation in check_task_id() and resolve+startsWith boundary
checks on all log_path/task_path constructions to prevent arbitrary file read
and arbitrary directory deletion.

- cleaning_task_validator.py: add UUID regex to check_task_id()
- cleaning_task_service.py: add path boundary checks in get_task_log() and delete_task()
- cleaning_task_routes.py: add path boundary checks in stream and download endpoints
MoeexT added 4 commits June 17, 2026 10:13
…ile write)

Add normalize + startsWith boundary checks in ChunksSaver save/saveFile
methods to prevent directory traversal via crafted fileName containing '../'.

- ChunksSaver.java: validate final path stays within uploadPath in save() and saveFile()
- chunks_saver.py: validate resolved path stays within upload_path in save() and save_file()
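
The boundary check on the Python side could look roughly like the sketch below. The function name, its parameters, and the write itself are illustrative assumptions; only the idea of resolving the final path and confirming it stays inside `upload_path` comes from the commit message.

```python
from pathlib import Path


def save_file(upload_path: Path, file_name: str, data: bytes) -> Path:
    """Write data under upload_path, refusing names that resolve outside it.

    Hypothetical helper: the signature is an assumption, not chunks_saver.py's API.
    """
    base = upload_path.resolve()
    target = (base / file_name).resolve()
    # An absolute file_name, or one containing "../", resolves outside base,
    # so base will not appear among the target's ancestors.
    if base not in target.parents:
        raise ValueError(f"illegal file name: {file_name!r}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

The check runs before any directory is created or any byte is written, so a rejected name leaves the filesystem untouched.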
…d destPath injection

Add multi-layer path boundary checks to prevent cross-directory reads and writes:
- GlusterfsWriter: validate destPath stays within /dataset, reject .. in subPath,
  validate sourcePath stays within mount point, reject path separators in fileName
- GlusterfsReader: validate readPath stays within mount point
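
GlusterfsWriter and GlusterfsReader are Java classes; the Python sketch below only illustrates the two input-level rules named above (no `..` segment in a sub-path, no path separator in a file name). Both function names are hypothetical, and the Java implementation may apply the rules differently.

```python
from pathlib import PurePosixPath


def validate_sub_path(sub_path: str) -> str:
    """Reject a sub-path that contains a '..' segment."""
    # Treat backslashes as separators too, so "a\\..\\b" is also caught.
    parts = PurePosixPath(sub_path.replace("\\", "/")).parts
    if ".." in parts:
        raise ValueError("sub_path must not contain '..'")
    return sub_path


def validate_file_name(file_name: str) -> str:
    """Reject a file name that is empty, a dot entry, or contains a separator."""
    if file_name in ("", ".", "..") or "/" in file_name or "\\" in file_name:
        raise ValueError("file_name must be a single path component")
    return file_name
```

This version checks whole path segments, so a name like `v1..2` passes; a plain substring test for `..` is a stricter alternative if such names never occur.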
Add .. rejection in ValidPathValidator and ValidFilePathValidator, plus
normalize+startsWith boundary check on uploadPath in preUpload to prevent
cross-directory writes via crafted prefix parameters.

- ValidPathValidator: reject .. sequences
- ValidFilePathValidator: reject .. sequences
- DatasetFileApplicationService.preUpload(): validate uploadPath stays within datasetBasePath
Restructure path construction to validate task_id BEFORE building Path
objects, use Path / operator instead of f-string concatenation, and use
Path.relative_to() for boundary checks instead of str.startswith().

- Routes: add inline re.fullmatch validation before Path construction
- Service: use flow_root / task_id / filename pattern
- All 4 locations: replace startswith() with relative_to()
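
To illustrate why a string-prefix check is weaker than a component-wise one, here is a small comparison. The directory names are made up for the example, and both helpers assume their inputs are already resolved, since `relative_to()` is purely lexical and does not collapse `..` on its own.

```python
from pathlib import PurePosixPath


def inside_by_prefix(base: PurePosixPath, candidate: PurePosixPath) -> bool:
    """String-prefix check: also accepts siblings that share the prefix."""
    return str(candidate).startswith(str(base))


def inside_by_relative_to(base: PurePosixPath, candidate: PurePosixPath) -> bool:
    """Component-wise check; both paths must already be resolved."""
    try:
        candidate.relative_to(base)
    except ValueError:
        return False
    return True


# Hypothetical paths for illustration only.
base = PurePosixPath("/data/flow")
sibling = PurePosixPath("/data/flow-evil/output.log")
child = PurePosixPath("/data/flow/task/output.log")
```

Here `inside_by_prefix(base, sibling)` is true even though `/data/flow-evil` is a different directory, while `inside_by_relative_to(base, sibling)` is false; both accept `child`.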
Comment thread runtime/datamate-python/app/module/cleaning/service/cleaning_task_service.py Dismissed
Comment thread runtime/datamate-python/app/module/cleaning/service/cleaning_task_service.py Dismissed
Replace inline re.fullmatch with module-level _TASK_ID_PATTERN and
replace relative_to() try/except with parents containment check to
satisfy CodeQL taint tracking as proper sanitizers.

- routes: module-level _TASK_ID_PATTERN + flow_base not in log_path.parents
- service: flow_root not in parents for get_task_log and delete_task
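
A sketch of the pattern this commit describes, applied to the delete case. The pattern is a generic UUID shape and the function name is hypothetical; only the module-level `_TASK_ID_PATTERN` name and the `not in ... parents` test come from the commit message.

```python
import re
from pathlib import Path

# Compiled once at module level; generic UUID shape, assumed rather than copied.
_TASK_ID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def task_dir_for_delete(flow_root: Path, task_id: str) -> Path:
    """Return the task directory only if it is a strict descendant of flow_root."""
    if _TASK_ID_PATTERN.fullmatch(task_id) is None:
        raise ValueError("invalid task_id")
    flow_root = flow_root.resolve()
    task_path = (flow_root / task_id).resolve()
    # .parents lists strict ancestors only, so flow_root itself is rejected too.
    if flow_root not in task_path.parents:
        raise ValueError("task path escapes flow_root")
    return task_path
```

Because `parents` never contains the path itself, this check also refuses to hand back `flow_root`, which matters when the caller is about to delete the returned directory.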
Comment thread runtime/datamate-python/app/module/cleaning/service/cleaning_task_service.py Dismissed
…uction

Replace inline regex validation with _sanitize_task_id() / sanitize_task_id()
functions whose return values CodeQL tracks as sanitized, eliminating the
remaining 6 "Uncontrolled data used in path expression" warnings.
Comment thread runtime/datamate-python/app/module/cleaning/interface/cleaning_task_routes.py Dismissed
Comment thread runtime/datamate-python/app/module/cleaning/interface/cleaning_task_routes.py Dismissed
MoeexT added 2 commits June 17, 2026 11:25
Add _sanitize_retry_count() / sanitize_retry_count() with range validation
and use the sanitized return value in all 5 places where retry_count feeds
into log_path construction. CodeQL tracks the sanitizer return value as safe.
CodeQL cannot trace data flow through instance-method calls
(self.validator.sanitize_*). Replace them with module-level _sanitize_task_id /
_sanitize_retry_count so CodeQL recognizes the sanitized return values.
Comment thread runtime/datamate-python/app/module/cleaning/service/cleaning_task_service.py Dismissed
Dallas98 merged commit 1398d10 into main on Jun 17, 2026
11 checks passed
MoeexT deleted the security/path-cross branch on June 18, 2026 08:58
Development

Successfully merging this pull request may close these issues.

Path traversal issue

3 participants