test: p2p_node_network_limited.py --v2transport intermittently disconnects during connect_nodes #7288

@thepastaclaw

Summary

The CI job linux64_tsan-test / Test source intermittently fails in p2p_node_network_limited.py --v2transport with AssertionError: Error: peer disconnected. The failure is not caused by the changes in dashpay/dash#7230: the same head SHA passed on rerun without any branch changes.

Evidence

Failure mode

The failure happens here:

File "test/functional/p2p_node_network_limited.py", line 83, in run_test
    self.connect_nodes(0, 2)
...
AssertionError: Error: peer disconnected

Combined logs show node 0 immediately disconnecting node 2 after node 2 requests a block below the NODE_NETWORK_LIMITED threshold:

ProcessGetBlockData [net] Ignore block request below NODE_NETWORK_LIMITED threshold, disconnect peer=2

connect_nodes() is still waiting for the outbound peer to stay connected long enough to exchange a pong, so the helper fails with peer disconnected.
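
To check whether another CI failure matches this mode, the combined log can be searched for the disconnect message quoted above. The following is a minimal sketch, not part of the test framework; the function name find_limited_disconnects is hypothetical, and the pattern assumes the message text stays exactly as it appears in this log.

```python
import re

# Pattern copied from the log line quoted in this report; only the peer id varies.
DISCONNECT_RE = re.compile(
    r"Ignore block request below NODE_NETWORK_LIMITED threshold, "
    r"disconnect peer=(\d+)"
)


def find_limited_disconnects(log_lines):
    """Return the peer ids disconnected for requesting a block below the threshold.

    Hypothetical helper for triage; log_lines is any iterable of strings.
    """
    peers = []
    for line in log_lines:
        match = DISCONNECT_RE.search(line)
        if match:
            peers.append(int(match.group(1)))
    return peers
```

Feeding it the lines of a combined log returns the list of affected peer ids, which should include 2 for the failure described here.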

Diagnosis

This looks timing-sensitive / transport-sensitive rather than PR-specific:

  • PR #7230 only changes src/node/interfaces.cpp and src/wallet/wallet.cpp.
  • The failing test is test/functional/p2p_node_network_limited.py.
  • The exact same PR head passed on rerun, so there is no deterministic wallet-side regression here.

The likely issue is that the test assumes the connection made by connect_nodes(0, 2) will stay up long enough for the helper to finish its handshake checks. Under TSAN + --v2transport, the pruned node (node 0) can disconnect node 2 so quickly that the helper's check fails first.

Reproduction ideas

I have not yet reproduced this outside CI. The closest reproduction path is to run the test repeatedly under a slow / TSAN-like environment, using this invocation:

python3 test/functional/test_runner.py p2p_node_network_limited.py --v2transport

or repeatedly rerun the TSAN functional shard in CI until the timing window appears.
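
A small driver can do the looping locally. This is a sketch under assumptions: run_until_failure is a hypothetical helper, CMD mirrors the invocation above and must be run from the repository root, and the run count passed in is arbitrary.

```python
import subprocess
import sys

# Same invocation as above, run with the current interpreter.
CMD = [
    sys.executable,
    "test/functional/test_runner.py",
    "p2p_node_network_limited.py",
    "--v2transport",
]


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Calling run_until_failure(CMD, max_runs=50) stops at the first failing run, so the logs of that run can be inspected. On a fast machine without TSAN the timing window may never open, in which case the function returns None.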

Suggested direction

Harden the test so it does not rely on connect_nodes() succeeding when the scenario itself can legitimately trigger a fast disconnect. For example, make the unsynced-node phase explicitly tolerate the disconnect and assert the expected postcondition (node2 stays at height 0) without requiring a stable pong handshake first.
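
One possible shape for that hardening is sketched below. It is an illustration, not a patch: connect_tolerating_disconnect is a hypothetical helper, and the two callables stand in for calls such as self.connect_nodes(0, 2) and self.nodes[2].getblockcount() in the real test. A real fix might also need to wait briefly before checking the height.

```python
def connect_tolerating_disconnect(connect, get_height, expected_height=0):
    """Attempt a connection that the remote side may legitimately drop.

    connect:     zero-argument callable, e.g. lambda: self.connect_nodes(0, 2)
    get_height:  zero-argument callable, e.g. lambda: self.nodes[2].getblockcount()

    Returns True if the peer dropped the connection during the helper
    handshake, False if the connection completed normally.
    """
    dropped = False
    try:
        connect()
    except AssertionError as exc:
        # Swallow only the specific failure from this report; re-raise anything else.
        if "peer disconnected" not in str(exc):
            raise
        dropped = True

    # The postcondition the scenario cares about: the unsynced node made no progress.
    height = get_height()
    assert height == expected_height, f"unexpected height {height}"
    return dropped
```

Matching on the message text keeps unrelated assertion failures visible, at the cost of coupling the test to the wording of the helper's error.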
