Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 56 minutes and 59 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe Changes
Sequence DiagramsequenceDiagram
actor Caller as Caller
participant DL as download_file()
participant HTTP as HTTP Server
participant Helper as _download_file_once()
Caller->>DL: download_file(url, target_dir)
DL->>DL: _download_http_header_attempts()
loop For each header attempt
DL->>Helper: Call _download_file_once()
Helper->>HTTP: HEAD request (with headers)
HTTP-->>Helper: Response (status, Content-Disposition)
alt 200 OK
Helper->>HTTP: GET request
HTTP-->>Helper: Binary data
Helper->>Helper: Extract filename from Content-Disposition
Helper-->>DL: Success (filepath)
DL-->>Caller: Return filepath
else 403 Forbidden
Helper-->>DL: _HtmlWhenBinaryExpected or 403
Note over DL: Continue to next header attempt
else HTML when archive expected
Helper-->>DL: _HtmlWhenBinaryExpected
Note over DL: Retry next header attempt
else Other status
Helper-->>DL: Return error
DL-->>Caller: Return None after all attempts exhausted
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Set oss_version from tarball/archive filename after wget for clarified_version
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/fosslight_util/download.py (1)
87-112: Minor: broadexceptdefeats UA retries on first-attempt network errors.On line 109, any exception during the first attempt returns
Falseimmediately, bypassing the UA retry loop. If the intent is only to avoid retrying genuine network/TLS errors, this is fine; but if a mirror drops the connection specifically on the default UA (a common anti-bot behavior adjacent to the 403 case this PR addresses), retries never happen. Considercontinue-ing (with a warning) wheni < last_iand only returningFalseafter the last attempt.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_util/download.py` around lines 87 - 112, In is_downloadable, the broad except currently returns False on any exception and prevents later User-Agent retries; change the exception handling so that on failure you log the warning (including the exception) and if i < last_i simply continue to the next headers attempt, only returning False after the final attempt (i == last_i), ensuring attempts (from _download_http_header_attempts) are honored; update the except block in is_downloadable to implement this conditional continue/return behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fosslight_util/download.py`:
- Around line 688-708: The parsed filename from the Content-Disposition branch
(the variable filename used before computing local_path and joining with
target_dir) must be sanitized to prevent path traversal: after extracting
filename in the code that handles Content-Disposition (the block that sets
filename via m_star or m), strip any directory components (e.g., run
os.path.basename on filename), remove or replace any NULs and path separators
(both os.sep and os.altsep, forward/back slashes) and collapse any .. sequences
so the final filename is a simple basename-safe token before using
os.path.join(target_dir, filename) to produce local_path; apply the same
sanitization helper to any header-derived filename to match the URL-path
fallback behavior.
- Around line 693-695: The regex and handling around m_star in download.py
incorrectly assume a literal UTF-8'' prefix; update the detection and parsing
for filename* to capture the charset, optional language tag, and the
percent-encoded value (e.g. match
filename\*=(?P<charset>[^']*)'(?:[^']*)'(?P<enc>[^;\r\n]+)), then replace the
current urllib.parse.unquote call: extract charset (default to 'utf-8' if
missing), percent-decode to bytes (urllib.parse.unquote_to_bytes) and decode
with the captured charset (fall back safely if decoding fails), assigning the
result back to filename; keep using the existing m_star and filename symbols so
the rest of the function continues to work.
---
Nitpick comments:
In `@src/fosslight_util/download.py`:
- Around line 87-112: In is_downloadable, the broad except currently returns
False on any exception and prevents later User-Agent retries; change the
exception handling so that on failure you log the warning (including the
exception) and if i < last_i simply continue to the next headers attempt, only
returning False after the final attempt (i == last_i), ensuring attempts (from
_download_http_header_attempts) are honored; update the except block in
is_downloadable to implement this conditional continue/return behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d4a28e32-2ca4-4504-a041-207f49483ea9
📒 Files selected for processing (1)
src/fosslight_util/download.py
| filename = "" | ||
| cd = r.headers.get("Content-Disposition") or head_headers.get( | ||
| "Content-Disposition" | ||
| ) | ||
| if cd: | ||
| m_star = re.search(r"filename\*=(?:UTF-8'')?([^;\r\n]+)", cd) | ||
| if m_star: | ||
| filename = urllib.parse.unquote(m_star.group(1).strip('"\'')) | ||
| else: | ||
| local_path = target_dir | ||
| m = re.search(r"filename=([^;\r\n]+)", cd) | ||
| if m: | ||
| filename = m.group(1).strip('"\'') | ||
| if not filename: | ||
| final_for_name = r.url or final_url | ||
| filename = os.path.basename(urllib.parse.urlparse(final_for_name).path) | ||
| if not filename: | ||
| filename = "downloaded_file" | ||
| if os.path.isdir(target_dir): | ||
| local_path = os.path.join(target_dir, filename) | ||
| else: | ||
| local_path = target_dir |
There was a problem hiding this comment.
Path traversal risk: sanitize filenames from Content-Disposition.
The filename parsed from Content-Disposition is passed directly to os.path.join(target_dir, filename) with no sanitization. A hostile or misconfigured mirror (exactly the scenario this PR is hardening against) can respond with e.g. Content-Disposition: attachment; filename="../../../evil.sh" and cause the download to be written outside target_dir. Note the URL-path fallback already applies os.path.basename, but the header branch does not.
Strip directory components (and ideally NUL / path separators) before joining.
🛡️ Proposed fix
if cd:
m_star = re.search(r"filename\*=(?:UTF-8'')?([^;\r\n]+)", cd)
if m_star:
filename = urllib.parse.unquote(m_star.group(1).strip('"\''))
else:
m = re.search(r"filename=([^;\r\n]+)", cd)
if m:
filename = m.group(1).strip('"\'')
+ # Never trust server-provided filenames: strip any path components.
+ if filename:
+ filename = os.path.basename(filename.replace("\\", "/").strip()).lstrip(".") or ""
if not filename:
final_for_name = r.url or final_url
filename = os.path.basename(urllib.parse.urlparse(final_for_name).path)
if not filename:
filename = "downloaded_file"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fosslight_util/download.py` around lines 688 - 708, The parsed filename
from the Content-Disposition branch (the variable filename used before computing
local_path and joining with target_dir) must be sanitized to prevent path
traversal: after extracting filename in the code that handles
Content-Disposition (the block that sets filename via m_star or m), strip any
directory components (e.g., run os.path.basename on filename), remove or replace
any NULs and path separators (both os.sep and os.altsep, forward/back slashes)
and collapse any .. sequences so the final filename is a simple basename-safe
token before using os.path.join(target_dir, filename) to produce local_path;
apply the same sanitization helper to any header-derived filename to match the
URL-path fallback behavior.
| m_star = re.search(r"filename\*=(?:UTF-8'')?([^;\r\n]+)", cd) | ||
| if m_star: | ||
| filename = urllib.parse.unquote(m_star.group(1).strip('"\'')) |
There was a problem hiding this comment.
RFC 5987 filename* regex misses the language tag form.
Per RFC 5987, the filename* value is charset'lang'percent-encoded-value, and lang is frequently empty but can be non-empty (e.g. UTF-8'en'file.txt). The current pattern filename\*=(?:UTF-8'')?([^;\r\n]+) only strips the literal UTF-8'' prefix, so a value like filename*=UTF-8'en'file.txt is captured verbatim and urllib.parse.unquote leaves the UTF-8'en' prefix embedded in the filename. Also, non-UTF-8 charsets (e.g. ISO-8859-1''...) aren’t handled.
♻️ Proposed fix
- m_star = re.search(r"filename\*=(?:UTF-8'')?([^;\r\n]+)", cd)
- if m_star:
- filename = urllib.parse.unquote(m_star.group(1).strip('"\''))
+ m_star = re.search(
+ r"filename\*=(?:[\w-]+)?'[^']*'([^;\r\n]+)", cd, re.IGNORECASE
+ )
+ if m_star:
+ filename = urllib.parse.unquote(m_star.group(1).strip('"\''))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fosslight_util/download.py` around lines 693 - 695, The regex and
handling around m_star in download.py incorrectly assume a literal UTF-8''
prefix; update the detection and parsing for filename* to capture the charset,
optional language tag, and the percent-encoded value (e.g. match
filename\*=(?P<charset>[^']*)'(?:[^']*)'(?P<enc>[^;\r\n]+)), then replace the
current urllib.parse.unquote call: extract charset (default to 'utf-8' if
missing), percent-decode to bytes (urllib.parse.unquote_to_bytes) and decode
with the captured charset (fall back safely if decoding fails), assigning the
result back to filename; keep using the existing m_star and filename symbols so
the rest of the function continues to work.
Description
Retry with browser and curl-like UA on mirror blocks.