Skip to content

CSHARP-5930: Update replica-set discovery and reporting in tests#1974

Open
ajcvickers wants to merge 12 commits into
mongodb:mainfrom
ajcvickers:CSHARP-5930k
Open

CSHARP-5930: Update replica-set discovery and reporting in tests#1974
ajcvickers wants to merge 12 commits into
mongodb:mainfrom
ajcvickers:CSHARP-5930k

Conversation

@ajcvickers
Copy link
Copy Markdown
Contributor

See CSHARP-6008 for details about why this change is needed.

There were multiple iterations getting to this point:

Iteration 1

RequireServer.cs

  • ClusterType()/ClusterTypes(): Now use GetActualClusterType() which maps a directConnect RS member to ClusterType.ReplicaSet (instead of the misleading Standalone)
  • topology/topologies case: Uses actual server type — single matches only Standalone, replicaset matches IsReplicaSetMember()

DriverTestConfiguration.cs

  • IsReplicaSet(): Checks server.Type.IsReplicaSetMember() instead of cluster.Description.Type == ClusterType.ReplicaSet, so a directConnect RS is correctly identified

ReadPreferenceOnStandaloneTests.cs (from earlier)

  • Uses ServerType.Standalone instead of ClusterType.Standalone to decide whether $readPreference is expected

ServerDiscoveryProseTests.cs

  • Missing secondary now throws SkipException instead of a plain Exception

UnifiedTestSpecRunner.cs

  • Added IsDirectConnectionToReplicaSet() helper
  • ServerDiscoveryAndMonitoring: Skips logging-replicaset.json, replicaset-emit-topology-changed-before-close.json, rediscover-quickly-after-step-down.json when on directConnect RS (these require full multi-server topology discovery events)
  • ServerSelection: Skips replica-set.json when on directConnect RS (same reason)

Iteration 2

  • Root cause: My GetActualClusterType() change made RequireServer.ClusterType(ClusterType.ReplicaSet) pass for a directConnect RSPrimary. This caused Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore to run — a test that was previously always skipped on this setup. That test calls { replSetStepDown: 30, force: true }, which makes the single-node RS step down and refuse writes for up to 30 seconds. All subsequent write operations in other tests (like DropCollection in fixture setup) then fail with MongoNotPrimaryException.
  • Fix: Added RequireServer.ReplicaSetDataBearingMembers(int minimum) — a new check that counts how many data-bearing RS members the cluster has

Iteration 3

  • Applied .ReplicaSetDataBearingMembers(2) to Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore — the step-down test only makes sense (and is safe) with at least 2 RS members so a new primary can be elected after the step-down

Iteration 4

  • CausalConsistencyExamples.cs — Causal_Consistency_Example_3: Replaced the hardcoded "mongodb://localhost/?readPreference=secondaryPreferred" with settings built from CoreTestConfiguration.ConnectionString plus ReadPreference.SecondaryPreferred. The example intent is preserved.
  • ClientSideEncryption2Examples.cs — FLE2AutomaticEncryption: Added an explicit skip when CRYPT_SHARED_LIB_PATH is not set (CSFLE/auto-encryption requires it), and fixed two new MongoClient() calls to use CoreTestConfiguration.ConnectionString.ToString() instead of the default localhost:27017.

@ajcvickers ajcvickers requested a review from Copilot April 28, 2026 10:59
@ajcvickers ajcvickers added the chore Non–user-facing code changes (tests, build scripts, etc.). label Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates test infrastructure to correctly identify replica-set topology and adjust spec/prose tests behavior when running against direct connections to replica-set members (especially single-node RS), preventing unsafe/invalid test executions and improving environment-dependent skipping.

Changes:

  • Refines cluster/topology detection in test helpers to treat directConnection replica-set members as replica-set (based on server type), and adds a ReplicaSetDataBearingMembers(min) requirement.
  • Updates unified spec runner and SDAM prose tests to skip cases that require full replica-set topology discovery (or a secondary) when the environment can’t provide it.
  • Makes examples use CoreTestConfiguration.ConnectionString instead of hardcoded localhost URIs and adds an explicit skip for CSFLE2 auto-encryption when CRYPT_SHARED_LIB_PATH is missing.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/MongoDB.Driver.Tests/Specifications/server-discovery-and-monitoring/prose-tests/ServerDiscoveryProseTests.cs Skips SDAM prose test when no secondary is available.
tests/MongoDB.Driver.Tests/Specifications/UnifiedTestSpecRunner.cs Adds direct-connect-to-RS detection and skips unified spec cases requiring full RS discovery events.
tests/MongoDB.Driver.Tests/ConnectionsSurvivePrimaryStepDownTests.cs Ensures the replSetStepDown test only runs when >=2 data-bearing members exist.
tests/MongoDB.Driver.TestHelpers/DriverTestConfiguration.cs Detects replica-set via server types (works with direct connect).
tests/MongoDB.Driver.TestHelpers/Core/XunitExtensions/RequireServer.cs Uses effective cluster type for direct connections; adds RS member-count requirement; updates runOn topology matching to use server type.
tests/MongoDB.Driver.Examples/ClientSideEncryption2Examples.cs Skips CSFLE2 auto-encryption without crypt_shared lib; uses test connection string for clients.
tests/MongoDB.Driver.Examples/CausalConsistencyExamples.cs Uses test connection string + ReadPreference.SecondaryPreferred via settings rather than hardcoded localhost URI.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/MongoDB.Driver.TestHelpers/Core/XunitExtensions/RequireServer.cs Outdated
Comment thread tests/MongoDB.Driver.Examples/ClientSideEncryption2Examples.cs Outdated
See CSHARP-6008 for details about why this change is needed.

There were multiple iterations getting to this point:

## Iteration 1

RequireServer.cs
- ClusterType()/ClusterTypes(): Now use GetActualClusterType() which maps a directConnect RS member to ClusterType.ReplicaSet (instead of the misleading Standalone)
- topology/topologies case: Uses actual server type — single matches only Standalone, replicaset matches IsReplicaSetMember()

DriverTestConfiguration.cs
- IsReplicaSet(): Checks server.Type.IsReplicaSetMember() instead of cluster.Description.Type == ClusterType.ReplicaSet, so a directConnect RS is correctly identified

ReadPreferenceOnStandaloneTests.cs (from earlier)
- Uses ServerType.Standalone instead of ClusterType.Standalone to decide whether $readPreference is expected

ServerDiscoveryProseTests.cs
- Missing secondary now throws SkipException instead of a plain Exception

UnifiedTestSpecRunner.cs
- Added IsDirectConnectionToReplicaSet() helper
- ServerDiscoveryAndMonitoring: Skips logging-replicaset.json, replicaset-emit-topology-changed-before-close.json, rediscover-quickly-after-step-down.json when on directConnect RS (these require full multi-server topology discovery events)
- ServerSelection: Skips replica-set.json when on directConnect RS (same reason)

## Iteration 2

- Root cause: My GetActualClusterType() change made RequireServer.ClusterType(ClusterType.ReplicaSet) pass for a directConnect RSPrimary. This caused Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore to run — a test that was previously always skipped on this setup. That test calls { replSetStepDown: 30, force: true }, which makes the single-node RS step down and refuse writes for up to 30 seconds. All subsequent write operations in other tests (like DropCollection in fixture setup) then fail with MongoNotPrimaryException.
- Fix: Added RequireServer.ReplicaSetDataBearingMembers(int minimum) — a new check that counts how many data-bearing RS members the cluster has

## Iteration 3

- Applied .ReplicaSetDataBearingMembers(2) to Connection_pool_should_not_be_cleared_when_replSetStepDown_and_GetMore — the step-down test only makes sense (and is safe) with at least 2 RS members so a new primary can be elected after the step-down

## Iteration 4

- CausalConsistencyExamples.cs — Causal_Consistency_Example_3: Replaced the hardcoded "mongodb://localhost/?readPreference=secondaryPreferred" with settings built from CoreTestConfiguration.ConnectionString plus ReadPreference.SecondaryPreferred. The example intent is preserved.
- ClientSideEncryption2Examples.cs — FLE2AutomaticEncryption: Added an explicit skip when CRYPT_SHARED_LIB_PATH is not set (CSFLE/auto-encryption requires it), and fixed two new MongoClient() calls to use CoreTestConfiguration.ConnectionString.ToString() instead of the default localhost:27017.
@ajcvickers ajcvickers force-pushed the CSHARP-5930k branch 2 times, most recently from 397c392 to 2b4c19b Compare May 18, 2026 15:27
…ue and a normal connection to a single node replica set.

- Renamed IsDirectConnectionToReplicaSet() → IsSingleNodeReplicaSet() — better reflects that it covers both connection modes.
- Updated the logic: the old method required description.DirectConnection == true. The new method checks servers.Count == 1 && servers[0].Type.IsReplicaSetMember(), which is true for both:
 - directConnection=true to a RS member (old case)
 - replicaSet=rs0 on a single-node RS (new case)
- Updated both call sites in ServerDiscoveryAndMonitoring and ServerSelection with updated comments.

Update Causal_Consistency_Example_2 to skip because it needs two nodes
- GetActualClusterType: add SpinWait (10 s) in the DirectConnection path so
  the server type is known before reading it, avoiding spurious Unknown
  results on a fresh cluster. Throws InvalidOperationException if the
  timeout expires, matching the pattern in DriverTestConfiguration.IsReplicaSet.

- IsRequirementSatisfied "topologies"/"topology" case: same SpinWait before
  reading the server type from the cluster description.

- SkipNotSupportedTestCases: add optional `reason` parameter so callers can
  supply a specific message instead of the generic "not supported" text.

- Single-node RS skips in UnifiedTestSpecRunner now emit messages that
  explicitly mention "single-node replica set" for searchable CI logs
  ("does not produce full RS topology discovery events" / "does not have a
  secondary").

CSHARP-5930: Address second-round review feedback

- GetActualClusterType: replace InvalidOperationException with SkipException
  on spin-wait timeout, consistent with every other Require* helper that
  signals a skip rather than a hard failure when the topology is not as
  expected.

- SkipNotSupportedTestCases: use StartsWith(operationName + ":") when
  operationName ends in ".json" so that a similarly-named spec file (e.g.
  "other-logging-replicaset.json") cannot accidentally match.  Non-filename
  matches continue to use Contains as before.

- ServerDiscoveryProseTests: add ReplicaSetDataBearingMembers(2) to the
  RequireServer chain so single-node RS is skipped at the require level.
  The secondary-not-found check below it is now a hard failure, because the
  require guard means we must be on a multi-node RS where a secondary is
  expected.

CSHARP-5930: Address third-round review feedback

- UnifiedTestSpecRunner: add CSHARP-6008 references to the single-node RS
  skip-block comments so the skips are auditable and findable.

- UnifiedTestSpecRunner: split the dual-mode SkipNotSupportedTestCases into
  two clearly-named helpers: SkipFile (StartsWith match on filename, always
  requires an explicit reason) and SkipNotSupportedTestCases (Contains match
  on description substring, unchanged callers).

- RequireServer.GetActualClusterType: add a one-line comment explaining why
  the non-direct-connection path reads Description.Type without a spin-wait.

- RequireServer.IsTopologyMatch: add comment confirming that matching any
  IsReplicaSetMember() for "replicaset" is intentional per unified-spec
  semantics (Primary / Secondary / Arbiter / Other / Ghost all qualify).

- DriverTestConfiguration.IsReplicaSet: add comment noting the broad RS-
  member match and directing callers that need a data-bearing member to use
  GetReplicaSetNumberOfDataBearingMembers instead.

More review feedback
@ajcvickers ajcvickers requested a review from Copilot May 20, 2026 10:21
@ajcvickers ajcvickers marked this pull request as ready for review May 20, 2026 10:21
@ajcvickers ajcvickers requested a review from a team as a code owner May 20, 2026 10:21
@ajcvickers ajcvickers requested a review from papafe May 20, 2026 10:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread tests/MongoDB.Driver.Tests/Specifications/UnifiedTestSpecRunner.cs Outdated
Comment thread tests/MongoDB.Driver.Examples/CausalConsistencyExamples.cs
…ion.Servers would likely be empty or all-Unknown, the factory would return false, and the Lazy<bool> would cache that false for the rest of the process — silently un-skipping logging-replicaset.json, replicaset-emit-topology-changed-before-close.json, rediscover-quickly-after-step-down.json, and replica-set.json on real single-node RS environments.

  - The fix captures the SpinWait.SpinUntil return and throws SkipException with the cluster description on timeout, mirroring the precedent in RequireServer.GetActualClusterType() at RequireServer.cs:325-328. The Lazy then caches the exception, so all subsequent calls in a broken-cluster process throw the same skip rather than spinning another 10s each.
- Fix at RequireServer.cs:256-263: short-circuits when cluster.Description.DirectConnection && minimum > 1 so each affected test skips instantly instead of paying a 10-second timeout. The original directConnection comment is reworded into the new short-circuit branch.
- Summary: ReplicaSetDataBearingMembers now no-ops on non-RS topologies instead of throwing. Effect on the three callers:
  - ConnectionsSurvivePrimaryStepDownTests.cs:80 — unchanged (already gated by .ClusterType(ReplicaSet)).
  - ServerDiscoveryProseTests.cs:55 — unchanged (already gated by .ClusterTypes(ReplicaSet)).
  - CausalConsistencyExamples.cs:78 — now correctly runs on Sharded (where mongos handles per-shard secondary routing for ReadPreference.Secondary) while still skipping on single-node RS.
-  Three changes in ServerDiscoveryProseTests.cs:62-77:
  1. Wait on the right predicate: spin on Any(s => Connected && ReplicaSetSecondary) instead of ClusterState.Connected. The old check was satisfied by the primary alone and never actually waited for a secondary.
  2. Longer timeout: 3s → 10s, matching the other SDAM spins in the codebase, reducing the slow-environment flakiness Copilot flagged.
  3. Richer diagnostic: include clusterDescription in the failure message, so when this branch does fire it points at the real culprit (likely ReplicaSetDataBearingMembers undercounting).
  - Kept Exception rather than SkipException per the iter 2 reviewer's intent — added an inline comment so a future reader doesn't try to re-skip it. The stale "this case should be skipped" line in my old commit message was just out-of-date; iter 2 explicitly reverted that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Non–user-facing code changes (tests, build scripts, etc.).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants