Fix path-traversal vulnerability in emergency P2P checkpoint service#3105
Fix path-traversal vulnerability in emergency P2P checkpoint service#3105YuvalElbar6 wants to merge 1 commit into
Conversation
|
Hi, thanks for your contribution! Since Orbax runs on top of JAX, and JAX's coordination service doesn't enforce strong authentication, the Orbax runtime generally assumes it's operating in a secure network environment. In practice, if there were a "compromised peer," we would already be facing much larger systemic risks. That said, we always welcome security enhancements. Could you please clean up the empty commits in the PR so we can get it merged? |
|
Thanks for the context on the threat model — agreed it's a defense-in-depth hardening rather than a critical fix. I've cleaned up the merge commits; the PR is now a single commit on top of main. |
|
Hi @orbax-dev, @linxiulei this PR is now one commit can we merge it? |
862ab75 to
3a54a83
Compare
|
Hey @YuvalElbar6, we don't have external contributions properly set up, so your commit would just get overwritten next time we sync from internal. Waiting for @linxiulei to send me an internal version of this change, at which point we can merge this PR. |
linxiulei
left a comment
There was a problem hiding this comment.
Review comments are informative if you have no objections and they will be in the final merge.
| self.enter_context( | ||
| mock.patch.object(service, '_ThreadingTCPServer', autospec=True) | ||
| ) | ||
| server = service._ThreadingTCPServer.return_value |
There was a problem hiding this comment.
mock_server_cls = self.enter_context(
mock.patch.object(service, '_ThreadingTCPServer', autospec=True)
)
server = mock_server_cls.return_value
|
|
||
| A malicious peer returns a manifest whose ``rel_path`` tries to escape the | ||
| staging directory (e.g., writing a ``.pth`` file into site-packages). | ||
| ``fetch_shard_from_peer`` must abort before any ``download`` call. |
There was a problem hiding this comment.
Args:
mock_download: Mock for download.
mock_request: Mock for request.
unused_mock_time: Unused mock for time.
unused_mock_move: Unused mock for move.
unused_mock_rmtree: Unused mock for rmtree.
to be consistent with code style.
| mock_request, | ||
| unused_mock_time, | ||
| unused_mock_move, | ||
| unused_mock_rmtree, |
|
Yes I have no objection, |
A malicious or compromised peer on the P2P network could supply a
manifest whose rel_path contained '..' segments or an absolute path,
causing P2PNode.fetch_shard_from_peer() to write attacker-controlled
bytes outside the staging directory (e.g. a .pth file in site-packages,
yielding persistent RCE on the training host).
- Add _safe_path_join() which joins a peer-supplied relative path onto
a base directory only if the resolved result stays inside that base.
Resolution goes through os.path.realpath so symlink-escape attempts
are caught as well.
- Apply the helper on both sides of the wire:
* Client: fetch_shard_from_peer() validates every manifest entry
against stage_dir and aborts the whole fetch on any unsafe entry.
* Server: handle_download() replaces the substring '..' check with
the same resolve-based containment check against self.directory.
- Log every rejection with peer and request context.
- Add regression tests for the helper and both call sites.
Reported via the Google OSS VRP.
|
Hi @linxiulei I fiixed the comments |
A malicious or compromised peer on the P2P network could supply a manifest whose rel_path contained '..' segments or an absolute path, causing P2PNode.fetch_shard_from_peer() to write attacker-controlled bytes outside the staging directory (e.g. a .pth file in site-packages, yielding persistent RCE on the training host).
Reported via the Google OSS VRP.