This document explains the high-level architecture of diff.rs. It is meant to
help readers get up to speed quickly on how the application works.
Currently, diff.rs is a single-page web application implemented in Rust using
Yew. It is structured like this:
- src/main.rs is the binary entrypoint. It sets up logging and the Yew rendering.
- src/lib.rs is the library entrypoint. It defines the routing and re-exports definitions. The router maps every route to a view.
- src/views/ contains views, which are the root components for entire pages. There is one module per view. Every module exports a single view, but can also contain private components that are only used in that particular view.
- src/components/ contains components that are reused across views.
- src/data.rs contains the types used to communicate with the crates.io API, the crate source parsing, and the diffing logic.
- src/cache.rs contains the in-memory caches for crate metadata and crate sources, which avoid repeat network requests during a session.
- src/syntax.rs contains syntax highlighting helpers built on top of syntect.
- src/version.rs contains the VersionId type, which can represent an exact version, a version requirement, or a named version such as latest.
- src/tailwind.css contains styles for the components and views.
- index.html contains the page skeleton and the metadata that tells Trunk which assets to build and bundle.
See also Contributing for more information on the structure.
To render a diff, diff.rs uses gloo-net to make a
request to the crates.io API in order to fetch crate
metadata. This is a JSON structure that is parsed into a CrateResponse using
serde and serde_json.
Using that response, the code will resolve the versions that are in the URL by
looking them up in the versions field of that response. If they exist, the
code then performs another request to fetch the crate sources. These are
gzip-compressed tarballs, which are decompressed using
flate2 and extracted in-memory using
tar.
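
The version lookup described above can be sketched with the standard library alone. This is a simplified illustration, not the code in src/version.rs: the `VersionId` enum here only covers the exact and latest cases (version requirements are left out), and `VersionInfo`, `resolve`, and `parse_triple` are hypothetical names. It also assumes that latest means the highest non-yanked `major.minor.patch` and ignores pre-release ordering.

```rust
/// Hypothetical, simplified stand-in for the version identifier found in a URL.
#[derive(Debug, Clone, PartialEq, Eq)]
enum VersionId {
    Exact(String),
    Latest,
}

/// Hypothetical, trimmed-down entry of the `versions` field in the metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
struct VersionInfo {
    num: String,
    yanked: bool,
}

/// Parses the numeric `major.minor.patch` core of a version string.
/// Pre-release and build suffixes are dropped (a simplification).
fn parse_triple(num: &str) -> Option<(u64, u64, u64)> {
    let core = num.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

/// Looks up the requested version in the list of known versions.
/// Returns `None` when the version does not exist.
fn resolve<'a>(versions: &'a [VersionInfo], id: &VersionId) -> Option<&'a VersionInfo> {
    match id {
        VersionId::Exact(num) => versions.iter().find(|v| &v.num == num),
        // Assumption: "latest" is the highest non-yanked version.
        // Versions that fail to parse sort lowest.
        VersionId::Latest => versions
            .iter()
            .filter(|v| !v.yanked)
            .max_by_key(|v| parse_triple(&v.num)),
    }
}
```

Comparing parsed number triples rather than strings matters here: as strings, "1.2.0" sorts after "1.10.0".
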
Before the archive is extracted, the SHA-256 hash of the downloaded tarball is
verified against the checksum returned by the crates.io API. This ensures that
what gets diffed is exactly what crates.io has on record. Non-UTF-8 paths and
paths outside the expected {crate}-{version}/ prefix inside the archive are
rejected, which prevents files from being silently hidden from the UI.
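
The path checks can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: `PathError` and `strip_archive_prefix` are hypothetical names, and the rejection of `..` components is an extra precaution added for this example rather than something the text above confirms. The SHA-256 check is omitted because it needs a hashing crate.

```rust
/// Hypothetical error type for rejected archive entries.
#[derive(Debug, PartialEq, Eq)]
enum PathError {
    NotUtf8,
    OutsidePrefix,
}

/// Validates a raw entry path from the tarball and returns it relative to
/// the expected `{crate}-{version}/` prefix.
fn strip_archive_prefix<'a>(
    raw: &'a [u8],
    krate: &str,
    version: &str,
) -> Result<&'a str, PathError> {
    // Reject paths that are not valid UTF-8 instead of lossily converting them.
    let path = std::str::from_utf8(raw).map_err(|_| PathError::NotUtf8)?;

    // Every entry is expected to live under `{crate}-{version}/`.
    let prefix = format!("{}-{}/", krate, version);
    let rest = path
        .strip_prefix(prefix.as_str())
        .ok_or(PathError::OutsidePrefix)?;

    // Assumption of this sketch: also reject `..` components, which could
    // otherwise point back outside the prefix.
    if rest.split('/').any(|component| component == "..") {
        return Err(PathError::OutsidePrefix);
    }
    Ok(rest)
}
```

Returning an error rather than skipping the entry is what keeps a malformed file from disappearing silently: the caller has to decide how to surface it.
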
Finally, the code uses similar to generate a diff and render it in the browser. It uses syntect for syntax highlighting.
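
To give a feel for what a line diff produces, here is a textbook longest-common-subsequence diff written with the standard library only. It is not the implementation in the similar crate and not the code in diff.rs; `Change` and `diff_lines` are hypothetical names, and the quadratic table is only suitable for small inputs.

```rust
/// One line of diff output (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Change<'a> {
    Equal(&'a str),
    Delete(&'a str),
    Insert(&'a str),
}

/// Computes a line-based diff using a longest-common-subsequence table.
fn diff_lines<'a>(old: &'a str, new: &'a str) -> Vec<Change<'a>> {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();

    // lcs[i][j] holds the LCS length of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting one change per step.
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(Change::Equal(a[i]));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(Change::Delete(a[i]));
            i += 1;
        } else {
            out.push(Change::Insert(b[j]));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure deletion or insertion.
    out.extend(a[i..].iter().map(|&line| Change::Delete(line)));
    out.extend(b[j..].iter().map(|&line| Change::Insert(line)));
    out
}
```

For example, diffing the lines a, b, c against a, c, d yields: equal a, delete b, equal c, insert d. A renderer would then style each variant differently.
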
To avoid repeated network requests for the same data during a session,
diff.rs maintains two in-memory caches in src/cache.rs:
- CRATE_RESPONSE_CACHE stores parsed CrateResponse values keyed by crate name.
- CRATE_SOURCE_CACHE stores parsed CrateSource values keyed by (crate_name, version).
Both caches are global singletons held behind a Mutex. They live for the
lifetime of the page; there is no persistence across reloads.
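
A global cache behind a Mutex can be sketched like this. It is an illustration of the pattern, not the contents of src/cache.rs: the `CrateSource` struct is a placeholder, `source_cache` and `get_or_fetch` are hypothetical names, the use of `OnceLock` and `Arc` is an assumption, and the real fetch is asynchronous while this one is a plain closure.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Placeholder for the parsed crate source (hypothetical fields).
#[derive(Debug)]
struct CrateSource {
    files: Vec<String>,
}

/// Cache key: (crate_name, version).
type SourceKey = (String, String);

/// Returns the process-wide cache, creating it on first use.
fn source_cache() -> &'static Mutex<HashMap<SourceKey, Arc<CrateSource>>> {
    static CACHE: OnceLock<Mutex<HashMap<SourceKey, Arc<CrateSource>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached source, or runs `fetch` once and stores the result.
fn get_or_fetch(
    name: &str,
    version: &str,
    fetch: impl FnOnce() -> CrateSource,
) -> Arc<CrateSource> {
    let key = (name.to_string(), version.to_string());

    // The lock guard is a temporary, so it is released at the end of this statement.
    let cached = source_cache().lock().unwrap().get(&key).cloned();
    if let Some(hit) = cached {
        return hit;
    }

    // Fetch without holding the lock, then store the result.
    let fresh = Arc::new(fetch());
    source_cache()
        .lock()
        .unwrap()
        .insert(key, Arc::clone(&fresh));
    fresh
}
```

Storing values behind `Arc` lets several callers share one parsed source without cloning its contents, and releasing the lock before fetching keeps a slow download from blocking lookups of other crates.
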
In addition to diffing two published crate versions, diff.rs can also diff a
published crate version against the source in its repository at the commit it
was published from. The commit SHA is read from the .cargo_vcs_info.json
file that Cargo embeds when a crate is packaged from a Git checkout. The repository tarball is
then downloaded from GitHub or GitLab. Because those hosts do not send
permissive CORS headers, the request is routed through corsproxy.io.
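
The URL handling for this step might look roughly like the sketch below. Everything here is illustrative: `Host`, `archive_url`, `percent_encode`, and `proxied_url` are hypothetical names, the archive URL patterns are the commonly used GitHub and GitLab download paths rather than something taken from diff.rs, and the `?url=` query format for corsproxy.io is an assumption that should be checked against the proxy's documentation.

```rust
/// Supported repository hosts (hypothetical type for this sketch).
enum Host {
    GitHub,
    GitLab,
}

/// Builds the tarball URL for a repository at a specific commit.
fn archive_url(host: Host, owner: &str, repo: &str, sha: &str) -> String {
    match host {
        Host::GitHub => format!(
            "https://github.com/{}/{}/archive/{}.tar.gz",
            owner, repo, sha
        ),
        Host::GitLab => format!(
            "https://gitlab.com/{0}/{1}/-/archive/{2}/{1}-{2}.tar.gz",
            owner, repo, sha
        ),
    }
}

/// Percent-encodes everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Wraps a target URL so the request goes through the CORS proxy.
/// Assumption: the proxy accepts the target in a `url` query parameter.
fn proxied_url(target: &str) -> String {
    format!("https://corsproxy.io/?url={}", percent_encode(target))
}
```

Encoding the target URL keeps its own `?` and `&` characters from being interpreted as part of the proxy's query string.
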