You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This change improves the quality of Markdown output from the Rust CLI, particularly for math-heavy (Wikipedia) and JavaScript-heavy sites. It ensures that raw XML/TeX from and tags is skipped, and that LaTeX alt text is correctly formatted for LLM consumption. It also fixes a bug in the integration test suite's assertion helper.
Updated strip_html in direct_fetch.rs to:
- Skip content in <math>, <svg>, and <noscript> tags to reduce noise.
- Wrap LaTeX in <img> alt attributes (starting with {\displaystyle) in $ delimiters.
- Added unit tests to verify these improvements.
- Fixed a bug in integration tests where assertions were incorrectly wrapped in tuples.
- Formatted code with cargo fmt and ruff.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.
When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.
I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!
For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!
We reviewed changes in b0fe7e8...7d7e098 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.
Some issues found as part of this review are outside of the diff in this pull request and aren't shown in the inline review comments due to GitHub's API limitations. You can see those issues on the DeepSource dashboard.
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer TIP This summary will be updated as you push new changes.
This commit improves the HTML-to-Markdown conversion in the Rust CLI and
fixes a critical bug in the integration test suite.
Changes:
- Updated `strip_html` in `direct_fetch.rs` to skip noise from `<math>`,
`<svg>`, and `<noscript>` tags.
- Improved LaTeX detection in `img` alt attributes, wrapping detected
TeX in `$` delimiters for LLM-readability.
- Added unit tests in Rust for the new parsing logic.
- Fixed `test_cli_markdown.py` where assertions were incorrectly using
tuples (e.g., `assert (cond), "msg"`), which were always evaluated
as True.
- Applied `black` formatting to satisfy CI linting requirements.
Verified with `cargo test` and `pytest -m integration`.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
@jules fn strip_html has a cyclomatic complexity of 39 with "very-high" risk
Minor
Anti Pattern RS-R1000
}
/// Strip HTML tags and convert to plain text with basic formatting
fn strip_html(html: &str) -> String {
let mut result = String::new();
let mut in_tag = false;
let mut current_tag = String::new(); cli/src/providers/direct_fetch.rs
@jules fn strip_html has a cyclomatic complexity of 39 with "very-high" risk
Minor
Anti Pattern RS-R1000
}
/// Strip HTML tags and convert to plain text with basic formatting
fn strip_html(html: &str) -> String {
let mut result = String::new();
let mut in_tag = false;
let mut current_tag = String::new(); cli/src/providers/direct_fetch.rs
Good catch! I've refactored strip_html by extracting the logic into a StripperState struct with focused helper methods (handle_tag, handle_opening_tag, handle_closing_tag). This significantly reduces the cyclomatic complexity and improves readability while maintaining the same functionality verified by tests.
- Addresses Clippy error `clippy::needless_lifetimes` by eliding the
explicit `'a` lifetime in the `impl` block for `StripperState`.
- This ensures CI checks pass for the Rust CLI.
- Mainloop and parsing logic remain unchanged and verified by tests.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
- Fixed Clippy error `clippy::needless_lifetimes` in `StripperState`.
- Applied `cargo fmt` to satisfy Rust formatting requirements.
- Verified all tests and lints pass locally.
- This commit updates the existing PR branch to resolve CI failures.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change improves the quality of Markdown output from the Rust CLI, particularly for math-heavy (Wikipedia) and JavaScript-heavy sites. It ensures that raw XML/TeX from and tags is skipped, and that LaTeX alt text is correctly formatted for LLM consumption. It also fixes a bug in the integration test suite's assertion helper.
PR created automatically by Jules for task 10941850418897376774 started by @d-oit