CSHARP-5930: Update replica-set discovery and reporting in tests#1974
Open
ajcvickers wants to merge 12 commits into
Open
CSHARP-5930: Update replica-set discovery and reporting in tests#1974ajcvickers wants to merge 12 commits into
ajcvickers wants to merge 12 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Updates test infrastructure to correctly identify replica-set topology and adjust spec/prose tests behavior when running against direct connections to replica-set members (especially single-node RS), preventing unsafe/invalid test executions and improving environment-dependent skipping.
Changes:
- Refines cluster/topology detection in test helpers to treat
directConnectionreplica-set members as replica-set (based on server type), and adds aReplicaSetDataBearingMembers(min)requirement. - Updates unified spec runner and SDAM prose tests to skip cases that require full replica-set topology discovery (or a secondary) when the environment can’t provide it.
- Makes examples use
CoreTestConfiguration.ConnectionStringinstead of hardcoded localhost URIs and adds an explicit skip for CSFLE2 auto-encryption whenCRYPT_SHARED_LIB_PATHis missing.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/MongoDB.Driver.Tests/Specifications/server-discovery-and-monitoring/prose-tests/ServerDiscoveryProseTests.cs | Skips SDAM prose test when no secondary is available. |
| tests/MongoDB.Driver.Tests/Specifications/UnifiedTestSpecRunner.cs | Adds direct-connect-to-RS detection and skips unified spec cases requiring full RS discovery events. |
| tests/MongoDB.Driver.Tests/ConnectionsSurvivePrimaryStepDownTests.cs | Ensures the replSetStepDown test only runs when >=2 data-bearing members exist. |
| tests/MongoDB.Driver.TestHelpers/DriverTestConfiguration.cs | Detects replica-set via server types (works with direct connect). |
| tests/MongoDB.Driver.TestHelpers/Core/XunitExtensions/RequireServer.cs | Uses effective cluster type for direct connections; adds RS member-count requirement; updates runOn topology matching to use server type. |
| tests/MongoDB.Driver.Examples/ClientSideEncryption2Examples.cs | Skips CSFLE2 auto-encryption without crypt_shared lib; uses test connection string for clients. |
| tests/MongoDB.Driver.Examples/CausalConsistencyExamples.cs | Uses test connection string + ReadPreference.SecondaryPreferred via settings rather than hardcoded localhost URI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
See CSHARP-6008 for details about why this change is needed.
There were multiple iterations getting to this point:
## Iteration 1
RequireServer.cs
- ClusterType()/ClusterTypes(): Now use GetActualClusterType() which maps a directConnect RS member to ClusterType.ReplicaSet (instead of the misleading Standalone)
- topology/topologies case: Uses actual server type — single matches only Standalone, replicaset matches IsReplicaSetMember()
DriverTestConfiguration.cs
- IsReplicaSet(): Checks server.Type.IsReplicaSetMember() instead of cluster.Description.Type == ClusterType.ReplicaSet, so a directConnect RS is correctly identified
ReadPreferenceOnStandaloneTests.cs (from earlier)
- Uses ServerType.Standalone instead of ClusterType.Standalone to decide whether $readPreference is expected
ServerDiscoveryProseTests.cs
- Missing secondary now throws SkipException instead of a plain Exception
UnifiedTestSpecRunner.cs
- Added IsDirectConnectionToReplicaSet() helper
- ServerDiscoveryAndMonitoring: Skips logging-replicaset.json, replicaset-emit-topology-changed-before-close.json, rediscover-quickly-after-step-down.json when on directConnect RS (these require full multi-server topology discovery events)
- ServerSelection: Skips replica-set.json when on directConnect RS (same reason)
## Iteration 2
- Root cause: My GetActualClusterType() change made RequireServer.ClusterType(ClusterType.ReplicaSet) pass for a directConnect RSPrimary. This caused Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore to run — a test that was previously always skipped on this setup. That test calls { replSetStepDown: 30, force: true }, which makes the single-node RS step down and refuse writes for up to 30 seconds. All subsequent write operations in other tests (like DropCollection in fixture setup) then fail with MongoNotPrimaryException.
- Fix: Added RequireServer.ReplicaSetDataBearingMembers(int minimum) — a new check that counts how many data-bearing RS members the cluster has
## Iteration 3
- Applied .ReplicaSetDataBearingMembers(2) to Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore — the step-down test only makes sense (and is safe) with at least 2 RS members so a new primary can be elected after the step-down
## Iteration 4
- CausalConsistencyExamples.cs — Causal_Consistency_Example_3: Replaced the hardcoded "mongodb://localhost/?readPreference=secondaryPreferred" with settings built from CoreTestConfiguration.ConnectionString plus ReadPreference.SecondaryPreferred. The example intent is preserved.
- ClientSideEncryption2Examples.cs — FLE2AutomaticEncryption: Added an explicit skip when CRYPT_SHARED_LIB_PATH is not set (CSFLE/auto-encryption requires it), and fixed two new MongoClient() calls to use CoreTestConfiguration.ConnectionString.ToString() instead of the default localhost:27017.
397c392 to
2b4c19b
Compare
…ue and a normal connection to a single node replica set. - Renamed IsDirectConnectionToReplicaSet() → IsSingleNodeReplicaSet() — better reflects that it covers both connection modes. - Updated the logic: the old method required description.DirectConnection == true. The new method checks servers.Count == 1 && servers[0].Type.IsReplicaSetMember(), which is true for both: - directConnection=true to a RS member (old case) - replicaSet=rs0 on a single-node RS (new case) - Updated both call sites in ServerDiscoveryAndMonitoring and ServerSelection with updated comments. Update Causal_Consistency_Example_2 to skip because it needs two nodes
- GetActualClusterType: add SpinWait (10 s) in the DirectConnection path so
the server type is known before reading it, avoiding spurious Unknown
results on a fresh cluster. Throws InvalidOperationException if the
timeout expires, matching the pattern in DriverTestConfiguration.IsReplicaSet.
- IsRequirementSatisfied "topologies"/"topology" case: same SpinWait before
reading the server type from the cluster description.
- SkipNotSupportedTestCases: add optional `reason` parameter so callers can
supply a specific message instead of the generic "not supported" text.
- Single-node RS skips in UnifiedTestSpecRunner now emit messages that
explicitly mention "single-node replica set" for searchable CI logs
("does not produce full RS topology discovery events" / "does not have a
secondary").
CSHARP-5930: Address second-round review feedback
- GetActualClusterType: replace InvalidOperationException with SkipException
on spin-wait timeout, consistent with every other Require* helper that
signals a skip rather than a hard failure when the topology is not as
expected.
- SkipNotSupportedTestCases: use StartsWith(operationName + ":") when
operationName ends in ".json" so that a similarly-named spec file (e.g.
"other-logging-replicaset.json") cannot accidentally match. Non-filename
matches continue to use Contains as before.
- ServerDiscoveryProseTests: add ReplicaSetDataBearingMembers(2) to the
RequireServer chain so single-node RS is skipped at the require level.
The secondary-not-found check below it is now a hard failure, because the
require guard means we must be on a multi-node RS where a secondary is
expected.
CSHARP-5930: Address third-round review feedback
- UnifiedTestSpecRunner: add CSHARP-6008 references to the single-node RS
skip-block comments so the skips are auditable and findable.
- UnifiedTestSpecRunner: split the dual-mode SkipNotSupportedTestCases into
two clearly-named helpers: SkipFile (StartsWith match on filename, always
requires an explicit reason) and SkipNotSupportedTestCases (Contains match
on description substring, unchanged callers).
- RequireServer.GetActualClusterType: add a one-line comment explaining why
the non-direct-connection path reads Description.Type without a spin-wait.
- RequireServer.IsTopologyMatch: add comment confirming that matching any
IsReplicaSetMember() for "replicaset" is intentional per unified-spec
semantics (Primary / Secondary / Arbiter / Other / Ghost all qualify).
- DriverTestConfiguration.IsReplicaSet: add comment noting the broad RS-
member match and directing callers that need a data-bearing member to use
GetReplicaSetNumberOfDataBearingMembers instead.
More review feedback
[review-iter 5] Address review findings
…ion.Servers would likely be empty or all-Unknown, the factory would return false, and the Lazy<bool> would cache that false for the rest of the process — silently un-skipping logging-replicaset.json, replicaset-emit-topology-changed-before-close.json, rediscover-quickly-after-step-down.json, and replica-set.json on real single-node RS environments. - The fix captures the SpinWait.SpinUntil return and throws SkipException with the cluster description on timeout, mirroring the precedent in RequireServer.GetActualClusterType() at RequireServer.cs:325-328. The Lazy then caches the exception, so all subsequent calls in a broken-cluster process throw the same skip rather than spinning another 10s each. - Fix at RequireServer.cs:256-263: short-circuits when cluster.Description.DirectConnection && minimum > 1 so each affected test skips instantly instead of paying a 10-second timeout. The original directConnection comment is reworded into the new short-circuit branch. - Summary: ReplicaSetDataBearingMembers now no-ops on non-RS topologies instead of throwing. Effect on the three callers: - ConnectionsSurvivePrimaryStepDownTests.cs:80 — unchanged (already gated by .ClusterType(ReplicaSet)). - ServerDiscoveryProseTests.cs:55 — unchanged (already gated by .ClusterTypes(ReplicaSet)). - CausalConsistencyExamples.cs:78 — now correctly runs on Sharded (where mongos handles per-shard secondary routing for ReadPreference.Secondary) while still skipping on single-node RS. - Three changes in ServerDiscoveryProseTests.cs:62-77: 1. Wait on the right predicate: spin on Any(s => Connected && ReplicaSetSecondary) instead of ClusterState.Connected. The old check was satisfied by the primary alone and never actually waited for a secondary. 2. Longer timeout: 3s → 10s, matching the other SDAM spins in the codebase, reducing the slow-environment flakiness Copilot flagged. 3. Richer diagnostic: include clusterDescription in the failure message, so when this branch does fire it points at the real culprit (likely ReplicaSetDataBearingMembers undercounting). - Kept Exception rather than SkipException per the iter 2 reviewer's intent — added an inline comment so a future reader doesn't try to re-skip it. The stale "this case should be skipped" line in my old commit message was just out-of-date; iter 2 explicitly reverted that.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See CSHARP-6008 for details about why this change is needed.
There were multiple iterations getting to this point:
Iteration 1
RequireServer.cs
DriverTestConfiguration.cs
ReadPreferenceOnStandaloneTests.cs (from earlier)
ServerDiscoveryProseTests.cs
UnifiedTestSpecRunner.cs
Iteration 2
Iteration 3
Iteration 4