From 89e9753f98e7ba896a693e003c44b99bcda88c2c Mon Sep 17 00:00:00 2001 From: hongwei Date: Thu, 11 Jun 2026 15:25:40 +0200 Subject: [PATCH 1/4] test: add concurrency hazard test suite Adds a tagged ScalaTest suite that simulates 16 confirmed database concurrency hazards in OBP-API: lost-update, check-then-act, check-then-insert, unique-constraint-unhandled, and counter-sequence races across money movement, security, consent scheduling, view permissions, and OAuth user creation paths. Each scenario asserts the theoretically-correct outcome so that a hazard surfaces as a FAILED test (red bar = evidence the hazard is real). Two safeguard scenarios (connection pool back-pressure, per-request context isolation) are verified to PASS. All scenarios are tagged ConcurrencyRace and excluded from the CI main flow. See CONCURRENCY_HAZARDS.md for the full taxonomy, source locations, and three-tier protection analysis. Run only these tests: mvn -pl obp-commons,obp-api scalatest:test \ -DtagsToInclude=code.concurrency.ConcurrencyRace \ -DfailIfNoTests=false --- .../code/concurrency/CONCURRENCY_HAZARDS.md | 95 ++++++ .../ConcurrentConnectionMechanismTest.scala | 86 ++++++ .../ConcurrentConsentRaceTest.scala | 148 ++++++++++ .../ConcurrentDuplicateCreationTest.scala | 272 ++++++++++++++++++ .../ConcurrentProviderRaceTest.scala | 72 +++++ .../concurrency/ConcurrentRaceSetup.scala | 138 +++++++++ .../ConcurrentSecurityRaceTest.scala | 137 +++++++++ .../ConcurrentTransferRaceTest.scala | 219 ++++++++++++++ .../ConcurrentViewPermissionRaceTest.scala | 205 +++++++++++++ 9 files changed, 1372 insertions(+) create mode 100644 obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentConnectionMechanismTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentConsentRaceTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentDuplicateCreationTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentProviderRaceTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentRaceSetup.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentSecurityRaceTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentTransferRaceTest.scala create mode 100644 obp-api/src/test/scala/code/concurrency/ConcurrentViewPermissionRaceTest.scala diff --git a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md new file mode 100644 index 0000000000..3f8fec8d70 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md @@ -0,0 +1,95 @@ +# Concurrency Hazard Test Suite + +This package simulates the database concurrency hazards in OBP-API: one-HTTP-request / +one-DB-transaction atomicity, concurrent read/write, and write contention. Each scenario +asserts the **theoretically correct** outcome, so a hazard surfaces as a **FAILED test** — +a red bar (with its "expected vs actual" clue) is the evidence the hazard is real. + +The persistence layer is Lift Mapper over HikariCP. There is no `SELECT FOR UPDATE`, no +optimistic-locking version column, and no transaction guard around multi-step +read-modify-write sequences. `.save()`/`.saveMe()` is a blind UPDATE/INSERT by PK and does +not catch JDBC constraint-violation exceptions. + +All scenarios are tagged `ConcurrencyRace` and isolated from the CI main flow: + +```sh +# run only these: +mvn -pl obp-commons,obp-api scalatest:test -DtagsToInclude=code.concurrency.ConcurrencyRace -DfailIfNoTests=false +# exclude from CI: +mvn -pl obp-commons,obp-api scalatest:test -DtagsToExclude=code.concurrency.ConcurrencyRace +``` + +## Hazard taxonomy + +| Shape | Meaning | +|---|---| +| **lost-update** | read a mutable field, mutate in memory, `.save()` the row; concurrent callers read the same start value and one overwrites the other | +| **check-then-act** | read a status/flag, branch, perform a side effect, then write a new status; the check and the write are not atomic | +| **check-then-insert** | `find()`-then-`create()` with **no** unique index; concurrent callers all miss the find and all insert | +| **unique-constraint-unhandled** | `find()`-then-`create()` where a UniqueIndex **does** back the table, but the JDBC violation is not caught → uncaught 500 (or, when wrapped in `tryo`, a swallowed `Failure` the caller cannot use) | +| **counter-sequence** | increment a counter by read-then-write → lost increments | + +## Implemented scenarios (red bar = hazard confirmed) + +| ID | Hazard | Shape | Source | Test | +|---|---|---|---|---| +| A | Balance lost-update (`saveTransaction`) | lost-update | `LocalMappedConnectorInternal.scala:510` | `ConcurrentTransferRaceTest` | +| B | Transaction-request challenge double-spend | check-then-act | `Http4s400.answerChallengeNormal` | `ConcurrentTransferRaceTest` | +| C | Entitlement duplicate | check-then-insert | `MappedEntitlementsProvider.addEntitlement` | `ConcurrentDuplicateCreationTest` | +| D | `getOrCreateAccountHolder` duplicate | check-then-insert | `MapperAccountHolders` | `ConcurrentDuplicateCreationTest` | +| F | `getOrCreateMetadata` (graceful, UniqueIndex present) | unique-constraint-unhandled | `MappedCounterpartyMetadata` | `ConcurrentDuplicateCreationTest` | +| G1 | Pool back-pressure (safeguard — PASSES) | — | `RequestScopeConnection` + Hikari | `ConcurrentConnectionMechanismTest` | +| G2 | Per-request context isolation (safeguard — PASSES) | — | `RequestScopeConnection` | `ConcurrentConnectionMechanismTest` | +| H | Bad-login counter lost-update (lockout bypass) | lost-update | `LoginAttempt.incrementBadLoginAttempts` | `ConcurrentSecurityRaceTest` | +| I | OAuth user duplicate → uncaught 500 | unique-constraint-unhandled | `LiftUsers.getOrCreateUserByProviderId` | `ConcurrentDuplicateCreationTest` | +| J | Consent scheduler stale-save (expired task) resurrects revoked consent | lost-update | `ConsentScheduler.expiredBerlinGroupConsents:117` | `ConcurrentConsentRaceTest` | +| K | Challenge attempt-counter lost-update (brute-force bypass) | lost-update | `MappedChallengeProvider.validateChallenge:78` | `ConcurrentSecurityRaceTest` | +| L | UserCustomerLink duplicate → uncaught 500 | unique-constraint-unhandled | `MappedUserCustomerLinkProvider.getOCreateUserCustomerLink` | `ConcurrentDuplicateCreationTest` | +| N | `getOrCreateCustomPublicView` duplicate → uncaught 500 | unique-constraint-unhandled | `MapperViews.createAndSaveDefaultPublicCustomView:1054` | `ConcurrentViewPermissionRaceTest` | +| O | `resetViewPermissions` delete-then-insert → uncaught 500 | unique-constraint-unhandled | `ViewPermission.resetViewPermissions:137` | `ConcurrentViewPermissionRaceTest` | +| R | `removeCustomView` check-then-delete orphans a grant | check-then-act | `MapperViews.removeCustomView:502` | `ConcurrentViewPermissionRaceTest` | +| S | Historical-payment balance lost-update | lost-update | `LocalMappedConnector.saveHistoricalTransaction:2351` | `ConcurrentTransferRaceTest` | +| U | Consent scheduler stale-save (unfinished task) overwrites status | lost-update | `ConsentScheduler.unfinishedBerlinGroupConsents:77` | `ConcurrentConsentRaceTest` | +| W | `getOrCreateConsumer` duplicate → swallowed `Failure` (tryo) | unique-constraint-unhandled | `OAuth.getOrCreateConsumer:535` | `ConcurrentDuplicateCreationTest` | +| AA | `incrementFutureCounter` non-atomic CHM read-modify-write | counter-sequence | `APIUtil.incrementFutureCounter:4853` | `ConcurrentProviderRaceTest` | + +`E` (consent status race) was deferred earlier due to `consumer`/JWT setup complexity and is not part of this set. + +## Verified-real but not given a standalone test (and why) + +These were confirmed real by source audit but a standalone red-bar test would either duplicate +an already-covered root cause, be flaky, or require disproportionate setup. They are documented +here so coverage gaps are explicit, not silent. + +| ID | Hazard | Why no standalone test | +|---|---|---| +| M | `getOrCreateSystemView` duplicate | Same `saveMe`-without-`tryo` root cause as **N/O** (unique-constraint-unhandled). System views are pinned to a global whitelist by `ViewDefinition.beforeSave`/`isValidSystemViewId`, so an isolated test would have to delete a globally-shared system view and pollute other suites (forkMode=once). **N** exercises the identical unguarded path on an isolated custom view. | +| P | `factoryResetSystemView` concurrent reset | Drives `ViewPermission.resetViewPermissions`'s insert path — the exact code **O** already pins. | +| migrateViewPermissions | duplicate ViewPermission insert | Same `ViewPermission` insert-without-`tryo` root cause as **O**. | +| Q | `revokeAccess` vs `grant` check-then-act | Same `AccountAccess` check-then-act family as **R**; the revoke-vs-grant window is narrow, so a non-deterministic barrier test would be flaky (false-green). The check-then-act class is already proven by **R** (orphan) and **J/U** (stale-save). | +| T | `createTransactionRequestBulk` per-leg balance | The verdict's "deterministic intra-request self-race" is **unconfirmed**: `saveTransaction` writes `fromAccount.asInstanceOf[MappedBankAccount].accountBalance(newBalance)` back onto the passed object, so sequential legs see the updated balance, not a stale one. Whether `BulkPaymentHandler.executeAllItems` re-resolves the account per leg needs further investigation before asserting. The concurrent-reuse mechanism it shares with **S** is already proven; writing a possibly-false test here was rejected. | +| V | Berlin Group `usesSoFarTodayCounter` lost-increment | Same counter lost-update class as **H/K**. Needs a fully-signed recurring Berlin Group consent + TPP headers to reach the increment branch — disproportionate setup for a class already proven. | +| X | Consumer rate-limit `underConsumerLimits` check-then-INCR (TOCTOU) | Real and high-impact (limit bypass), but the active-limit lookup is cached for ~1 hour (`RateLimitTest` documents this) and the Redis TTL+GET+INCR timing makes an HTTP-layer test unreliable (flaky). Confirmed by source audit; a flaky test would undermine "red bar = reliable evidence." | +| Y | `AuthRateLimiter` cold-start SET-vs-INCR collision | Same rate-limit class as **X**; depends on Redis timing and runs in shadow mode by default (non-blocking). Same flakiness concern. | +| Z | `MappedAgentProvider.updateAgentStatus` | Re-audited as **not a hazard**: it sets both fields and `saveMe()`s the whole row; an H2 single-row UPDATE is atomic, so concurrent calls are normal last-writer-wins PUT semantics, not field tearing or data loss. A genuine lost-update would need multiple partial-update endpoints doing read-modify-write on the same row — no such code path exists. | + +## Refuted by audit (genuinely safe — documents what is NOT broken) + +| Symbol | Why safe | +|---|---| +| `createAccountIfNotExisting` (`LocalMappedConnectorInternal.scala:283`) | The whole `find()`-then-`create()` is wrapped in `tryo`; the `UniqueIndex(bank, theAccountId)` violation on the second concurrent insert is caught and converted to `Empty`/`Failure`, not an uncaught 500. The caller handles `Empty` gracefully. This is the correct pattern that **I/L/M/N/O** are missing. | + +## The three-tier protection picture + +| Tier | DB constraint? | App guard? | Scenarios | +|---|:---:|:---:|---| +| Silent data corruption | ✗ | ✗ | A, S, H, K, AA, J, U, C, D, R | +| Uncaught 500 / swallowed Failure | ✓ | ✗ | I, L, N, O, W | +| Gracefully handled | ✓ | ✓ (`tryo`) | F, `createAccountIfNotExisting` | +| Safeguard verified | — | ✓ | G1, G2 | + +The most dangerous tier is silent corruption: H and K turn a balance/counter lost-update into an +authentication **lockout / brute-force bypass**; J and U silently **resurrect a revoked consent** +(a PSD2 compliance breach). When any of these is fixed (atomic UPDATE, optimistic-lock version +column, unique constraint + conflict retry, or a conditional/guarded update), the corresponding +scenario flips from red to green automatically. diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentConnectionMechanismTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentConnectionMechanismTest.scala new file mode 100644 index 0000000000..f6deb2cab1 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentConnectionMechanismTest.scala @@ -0,0 +1,86 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.api.util.APIUtil.OAuth._ + +import scala.concurrent.duration._ + +/** + * Verifies the request-scoped connection machinery (RequestScopeConnection + the Hikari pool) + * holds up under concurrency. Unlike the A–F business-write suites, these are expected to PASS: + * they confirm an already-implemented safeguard, so a red bar here signals a regression, not a + * newly-surfaced hazard. + * + * G1. Pool back-pressure — firing more concurrent requests than the pool size must queue and + * complete, never deadlock or surface a pool-exhaustion 500. + * G2. Per-request context isolation — under load, every request must read back its OWN + * authenticated context, exercising the childValue=null guard that stops a worker thread + * from inheriting another request's connection proxy (RequestScopeConnection.scala). + */ +class ConcurrentConnectionMechanismTest extends ConcurrentRaceSetup { + + feature("Request-scoped connection management under concurrency") { + + scenario("G1: concurrent requests exceeding the pool must all complete (queue, not deadlock)", ConcurrencyRace) { + Given("more concurrent authenticated requests than the hikari pool size (test pool = 20)") + val n = 30 + + When(s"$n GET /users/current are fired at once") + val responses = fireConcurrently(n, 120.seconds) { _ => + makeGetRequestAsync((v4_0_0_Request / "users" / "current").GET <@ user1) + } + + Then("all must complete with HTTP 200 — none time out or fail with pool exhaustion") + val byCode = responses.groupBy(_.code).map { case (k, v) => k -> v.size } + withClue(s"status distribution=$byCode (expected all 200) — ") { + responses.size should equal(n) + responses.foreach(r => r.code should equal(200)) + } + } + + scenario("G2: high concurrency must not bleed request context across connections", ConcurrencyRace) { + Given("many concurrent GET /users/current as user1") + val n = 20 + val expectedUserId = resourceUser1.userId + + When(s"$n requests read the current-user context concurrently") + val responses = fireConcurrently(n, 120.seconds) { _ => + makeGetRequestAsync((v4_0_0_Request / "users" / "current").GET <@ user1) + } + + Then("every response must be 200 and carry user1's own user_id (no stale/bled context)") + val bad = responses.filterNot { r => + r.code == 200 && (r.body \ "user_id").values.toString == expectedUserId + } + withClue(s"responses with wrong/missing user_id or non-200: " + + s"${bad.map(r => r.code -> (r.body \ "user_id").values.toString)} — ") { + bad shouldBe empty + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentConsentRaceTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentConsentRaceTest.scala new file mode 100644 index 0000000000..8121e7c752 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentConsentRaceTest.scala @@ -0,0 +1,148 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.api.berlin.group.ConstantsBG +import code.consent.{ConsentStatus, MappedConsent} +import net.liftweb.mapper.By + +import java.util.{Date, UUID} + +/** + * Simulates a scheduler-vs-HTTP state-machine conflict on consent status. The test reproduces the + * race deterministically (load stale → revoke → stale save) rather than concurrently, because the + * hazard is structural: ConsentScheduler.expiredBerlinGroupConsents() calls .save on a detached + * in-memory object with no conditional-update guard, so ANY intervening HTTP write that changes + * the status between the scheduler's findAll and its .save will be silently overwritten. + * + * J. Scheduler stale-save resurrects a revoked consent — the scheduler reads consents with + * status='valid' into memory, then iterates. Between that query and the final .save, an HTTP + * REVOKE call flips the status to 'terminatedByTpp'. The scheduler's stale object still holds + * status='valid', so its .save overwrites 'terminatedByTpp' back to 'expired', silently + * resurrecting a consent that the user or TPP explicitly revoked. + * + * U. Same hazard in the UNFINISHED-consents task — ConsentScheduler.unfinishedBerlinGroupConsents + * reads consents with status='received', then later .save(status='rejected') on the stale + * in-memory object. A concurrent HTTP status change (e.g. the consent being authorised / + * revoked) committed in the window is overwritten back to 'rejected'. + * + * EXPECTED TO FAIL while the scheduler's save is unconditional. Tagged ConcurrencyRace. + */ +class ConcurrentConsentRaceTest extends ConcurrentRaceSetup { + + feature("Consent status finality under scheduler-vs-HTTP concurrent update") { + + scenario("J: a stale scheduler save must not overwrite a terminal consent status", ConcurrencyRace) { + Given("a Berlin Group consent with status=valid and validUntil in the past") + val consentId = UUID.randomUUID.toString + MappedConsent.create + .mConsentId(consentId) + .mStatus(ConsentStatus.valid.toString) + .mApiStandard(ConstantsBG.berlinGroupVersion1.apiStandard) + .mValidUntil(new Date(1000L)) + .saveMe() + + When("the scheduler loads the consent into memory (replicating expiredBerlinGroupConsents findAll)") + // The scheduler calls MappedConsent.findAll(...) and holds a list of in-memory objects. + // This staleConsent represents one such object loaded BEFORE the revoke below. + val staleConsent = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .openOrThrowException("test consent must exist after creation") + + And("the HTTP revoke endpoint runs concurrently, flipping status to terminatedByTpp") + MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .foreach { c => + c.mStatus(ConsentStatus.terminatedByTpp.toString) + .mStatusUpdateDateTime(new Date()) + .saveMe() + } + val afterRevoke = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .map(_.status).getOrElse("missing") + + And("the scheduler saves its stale copy — replicating the .save call inside expiredBerlinGroupConsents") + // This is the exact blind UPDATE that the scheduler performs: + // consent.mStatus("expired").mNote(...).mStatusUpdateDateTime(...).save + staleConsent + .mStatus(ConsentStatus.expired.toString) + .mStatusUpdateDateTime(new Date()) + .save + + Then("the final status must remain terminatedByTpp — the revoke must survive the stale save") + val finalStatus = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .map(_.status).getOrElse("missing") + withClue( + s"afterRevoke=$afterRevoke finalStatus=$finalStatus: " + + s"ConsentScheduler.expiredBerlinGroupConsents calls .save on a stale in-memory MappedConsent " + + s"with no conditional-update guard (no WHERE status='valid'); the stale save overwrites any " + + s"concurrent status change and resurrects a consent the user explicitly revoked — " + ) { + finalStatus should equal(ConsentStatus.terminatedByTpp.toString) + } + } + + scenario("U: the unfinished-consents scheduler task must not overwrite a concurrent status change", ConcurrencyRace) { + Given("a Berlin Group consent with status=received (the unfinished-task selector)") + val consentId = UUID.randomUUID.toString + MappedConsent.create + .mConsentId(consentId) + .mStatus(ConsentStatus.received.toString) + .mApiStandard(ConstantsBG.berlinGroupVersion1.apiStandard) + .saveMe() + + When("the scheduler loads the consent into memory (replicating unfinishedBerlinGroupConsents findAll)") + val staleConsent = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .openOrThrowException("test consent must exist after creation") + + And("the HTTP path concurrently flips status to REVOKED and commits it") + MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .foreach { c => + c.mStatus(ConsentStatus.REVOKED.toString) + .mStatusUpdateDateTime(new Date()) + .saveMe() + } + val afterChange = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .map(_.status).getOrElse("missing") + + And("the scheduler saves its stale copy as rejected — replicating the .save in unfinishedBerlinGroupConsents") + staleConsent + .mStatus(ConsentStatus.rejected.toString) + .mStatusUpdateDateTime(new Date()) + .save + + Then("the final status must remain REVOKED — the committed change must survive the stale save") + val finalStatus = MappedConsent.find(By(MappedConsent.mConsentId, consentId)) + .map(_.status).getOrElse("missing") + withClue( + s"afterChange=$afterChange finalStatus=$finalStatus: " + + s"ConsentScheduler.unfinishedBerlinGroupConsents calls .save on a stale in-memory MappedConsent " + + s"with no conditional-update guard (no WHERE status='received'); the stale save clobbers the " + + s"concurrently-committed status — " + ) { + finalStatus should equal(ConsentStatus.REVOKED.toString) + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentDuplicateCreationTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentDuplicateCreationTest.scala new file mode 100644 index 0000000000..4aca1019cf --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentDuplicateCreationTest.scala @@ -0,0 +1,272 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.accountholders.{AccountHolders, MapperAccountHolders} +import code.api.util.APIUtil.OAuth._ +import code.api.util.ApiRole +import code.api.v2_0_0.CreateEntitlementJSON +import code.consumer.Consumers +import code.entitlement.Entitlement +import code.metadata.counterparties.{Counterparties, MappedCounterpartyMetadata} +import code.model.Consumer +import code.model.dataAccess.ResourceUser +import code.users.LiftUsers +import code.usercustomerlinks.{MappedUserCustomerLink, MappedUserCustomerLinkProvider} +import com.openbankproject.commons.model.{AccountId, BankIdAccountId} +import net.liftweb.json.Serialization.write +import net.liftweb.mapper.By + +import java.util.{Date, UUID} +import scala.util.Failure + +/** + * Simulates three check-then-insert races. Each asserts the correct row count, so each is + * EXPECTED TO FAIL while the race is unfixed — the "expected vs actual" clue is the evidence. + * + * C. Entitlement grant — `MappedEntitlements.addEntitlement` inserts unconditionally and the + * only unique index is on the per-row UUID, so concurrent identical grants all persist. + * D. Account holder — `getOrCreateAccountHolder` does find-then-create with no unique index on + * (user, bank, account), so concurrent callers all miss the find and all insert. + * F. Counterparty metadata — `getOrCreateMetadata` does check-then-insert, BUT a + * UniqueIndex(counterpartyId) backs the table; this verifies the second insert's constraint + * conflict is handled gracefully (no thrown 500) rather than testing for duplicate rows. + * + * C runs over HTTP (the request-per-transaction path the user asked about). D and F use a + * barrier-synchronised provider-layer fan-out because their contended code is a getOrCreate + * method, not an HTTP endpoint. + * + * I. OAuth user duplicate — LiftUsers.getOrCreateUserByProviderId does find-then-create with + * no surrounding transaction. ResourceUser has UniqueIndex(provider_, providerId), so the + * second concurrent create throws an uncaught JDBC constraint-violation exception rather + * than gracefully returning the existing user. Concurrent first-time OAuth logins → one + * request gets a 500 instead of the expected login response. + * + * L. UserCustomerLink duplicate — MappedUserCustomerLinkProvider.getOCreateUserCustomerLink + * does find-then-create with no surrounding transaction. MappedUserCustomerLink has + * UniqueIndex(mUserId, mCustomerId), so the second concurrent create throws an uncaught + * JDBC exception rather than returning the existing link. + * + * W. OAuth2 consumer duplicate — Consumers.getOrCreateConsumer does find-then-create with no + * surrounding transaction. Consumer has UniqueIndex(azp, sub); unlike I/L the create IS + * wrapped in tryo, so the second concurrent insert does not throw a 500 — instead the + * violation is swallowed into a Failure box and the caller gets no usable consumer (it + * cannot authenticate), which defeats the get-or-create contract just the same. + */ +class ConcurrentDuplicateCreationTest extends ConcurrentRaceSetup { + + feature("Concurrent check-then-insert must not create duplicate rows") { + + scenario("C: concurrent identical entitlement grants must create exactly one row", ConcurrencyRace) { + Given("user1 can grant entitlements at any bank, and a target user without the role") + Entitlement.entitlement.vend.addEntitlement("", resourceUser1.userId, ApiRole.canCreateEntitlementAtAnyBank.toString) + val targetUserId = resourceUser2.userId + val role = ApiRole.CanGetAnyUser.toString + val before = dbEntitlementCount("", targetUserId, role) + val n = 8 + val body = write(CreateEntitlementJSON(bank_id = "", role_name = role)) + + When(s"$n identical grant requests are fired concurrently") + val responses = fireConcurrently(n) { _ => + val req = (v2_0_0_Request / "users" / targetUserId / "entitlements").POST <@ user1 + makePostRequestAsync(req, body) + } + + Then("exactly one entitlement row must exist for (bank,user,role)") + val after = dbEntitlementCount("", targetUserId, role) + val created = after - before + withClue(s"response codes=${responses.map(_.code)} before=$before after=$after created=$created (expected 1) — ") { + created should equal(1L) + } + } + + scenario("D: concurrent getOrCreateAccountHolder for one (user,account) must create one row", ConcurrencyRace) { + Given("an account owned by user1, with user3 not yet a holder") + val bank = createBank("__conc-holder-bank") + val bankId = bank.bankId + val accountId = AccountId("__conc_holder_acc") + createAccountRelevantResource(Some(resourceUser1), bankId, accountId, "EUR") + val user = resourceUser3 + val biaId = BankIdAccountId(bankId, accountId) + + def holderCount: Long = MapperAccountHolders.count( + By(MapperAccountHolders.accountBankPermalink, bankId.value), + By(MapperAccountHolders.accountPermalink, accountId.value)) + + val before = holderCount + val n = 8 + + When(s"$n threads concurrently getOrCreateAccountHolder for the same (user3, account)") + val results = runConcurrentWithBarrier(n) { _ => + AccountHolders.accountHolders.vend.getOrCreateAccountHolder(user, biaId) + } + + Then("the holder count must grow by exactly one") + val after = holderCount + val created = after - before + withClue(s"results.success=${results.map(_.isSuccess)} before=$before after=$after created=$created (expected 1) — ") { + created should equal(1L) + } + } + + scenario("I: concurrent first-time OAuth logins must not throw a constraint violation", ConcurrencyRace) { + Given("a provider+id pair that has no ResourceUser yet") + val provider = "__conc_oauth_provider_i" + val idGivenByProvider = "__conc_oauth_id_i" + // Clean up from any prior run. + ResourceUser.findAll( + By(ResourceUser.provider_, provider), + By(ResourceUser.providerId, idGivenByProvider) + ).foreach(_.delete_!) + val n = 2 + + When(s"$n concurrent getOrCreateUserByProviderId calls race for the same (provider, id)") + val results = runConcurrentWithBarrier(n) { _ => + LiftUsers.getOrCreateUserByProviderId( + provider = provider, + idGivenByProvider = idGivenByProvider, + consentId = None, + name = Some("conc-oauth-test"), + email = Some("conc-oauth@test.invalid") + ) + } + + Then("no call must throw and exactly one ResourceUser row must exist (UniqueIndex present but exception uncaught)") + val failures = results.collect { case scala.util.Failure(e) => e.getClass.getSimpleName + ": " + e.getMessage } + val userCount = ResourceUser.count( + By(ResourceUser.provider_, provider), + By(ResourceUser.providerId, idGivenByProvider) + ) + withClue(s"failures=$failures userCount=$userCount (expected: no failures, 1 row) — ") { + failures shouldBe empty + userCount should equal(1L) + } + } + + scenario("L: concurrent getOCreateUserCustomerLink must not throw and must create exactly one link", ConcurrencyRace) { + Given("a user-customer pair with no existing link (MappedUserCustomerLink has UniqueIndex(mUserId, mCustomerId))") + val userId = resourceUser1.userId + val customerId = UUID.randomUUID.toString + + def linkCount: Long = MappedUserCustomerLink.count( + By(MappedUserCustomerLink.mUserId, userId), + By(MappedUserCustomerLink.mCustomerId, customerId) + ) + val before = linkCount + val n = 8 + + When(s"$n concurrent getOCreateUserCustomerLink calls race for the same (userId, customerId)") + val results = runConcurrentWithBarrier(n) { _ => + MappedUserCustomerLinkProvider.getOCreateUserCustomerLink(userId, customerId, new Date(), true) + } + + Then("no call may throw and exactly one link row must exist") + val after = linkCount + val created = after - before + val failures = results.collect { case scala.util.Failure(e) => e.getClass.getSimpleName + ": " + e.getMessage } + withClue(s"before=$before after=$after created=$created failures=$failures (expected: 1 row, no throws) — ") { + failures shouldBe empty + created should equal(1L) + } + } + + scenario("F: concurrent getOrCreateMetadata must stay graceful and leave exactly one row", ConcurrencyRace) { + Given("a counterparty whose metadata row does not exist yet (UniqueIndex(counterpartyId) backs the table)") + val bank = createBank("__conc-cp-bank") + val bankId = bank.bankId + val accountId = AccountId("__conc_cp_acc") + createAccountRelevantResource(Some(resourceUser1), bankId, accountId, "EUR") + val cp = createCounterparty(bankId.value, accountId.value, java.util.UUID.randomUUID.toString, true, resourceUser1.userId) + val counterpartyId = cp.counterpartyId + + def metaCount: Long = MappedCounterpartyMetadata.count( + By(MappedCounterpartyMetadata.counterpartyId, counterpartyId)) + + val before = metaCount + val n = 8 + + When(s"$n threads concurrently getOrCreateMetadata for the same counterparty") + val results = runConcurrentWithBarrier(n) { _ => + Counterparties.counterparties.vend.getOrCreateMetadata(bankId, accountId, counterpartyId, "__conc_cp_name") + } + + Then("no call may throw, and exactly one metadata row must exist (constraint conflict handled gracefully)") + val after = metaCount + val thrown = results.collect { case Failure(e) => s"${e.getClass.getSimpleName}:${e.getMessage}" } + withClue(s"before=$before after=$after thrown=$thrown (expected after=1, no throws) — ") { + after should equal(1L) + thrown shouldBe empty + } + } + + scenario("W: concurrent getOrCreateConsumer for one (azp,sub) must resolve to the existing row, not a swallowed Failure", ConcurrencyRace) { + Given("no consumer with this (azp, sub) yet (Consumer has UniqueIndex(azp, sub))") + val azp = "__conc_w_azp_" + UUID.randomUUID.toString.take(8) + val sub = "__conc_w_sub_" + UUID.randomUUID.toString.take(8) + + def consumerCount: Long = Consumer.count(By(Consumer.azp, azp), By(Consumer.sub, sub)) + val n = 2 + + When(s"$n threads concurrently getOrCreateConsumer for the same (azp, sub)") + val results = runConcurrentWithBarrier(n) { i => + Consumers.consumers.vend.getOrCreateConsumer( + consumerId = None, + key = None, + secret = None, + aud = Some("__conc_w_aud"), + azp = Some(azp), + iss = Some("__conc_w_iss_" + i), // distinct iss; UniqueIndex is on (azp, sub) only + sub = Some(sub), + isActive = Some(true), + name = Some("conc-w-consumer"), + appType = None, + description = Some("conc-w"), + developerEmail = Some("conc-w@test.invalid"), + redirectURL = None, + createdByUserId = Some(resourceUser1.userId) + ) + } + + Then("every caller must receive a usable Full(consumer); exactly one row must exist") + // getOrCreateConsumer wraps its saveMe in tryo, so the second concurrent insert does not throw — + // it is swallowed into a Failure box. The caller then holds no usable consumer (cannot authenticate), + // which defeats the get-or-create contract just as surely as a 500 would. + val thrown = results.collect { case Failure(e) => e.getClass.getSimpleName + ": " + e.getMessage.take(120) } + val emptyBoxes = results.collect { case scala.util.Success(box) if box.isEmpty => box.toString.take(120) } + val count = consumerCount + withClue( + s"thrown=$thrown emptyBoxes=$emptyBoxes count=$count (expected: no throws, no empty boxes, 1 row) — " + + s"the second concurrent create hits UniqueIndex(azp,sub); tryo swallows it into a Failure box " + + s"instead of re-fetching and returning the existing consumer — " + ) { + thrown shouldBe empty + emptyBoxes shouldBe empty + count should equal(1L) + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentProviderRaceTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentProviderRaceTest.scala new file mode 100644 index 0000000000..46efb76593 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentProviderRaceTest.scala @@ -0,0 +1,72 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.api.util.APIUtil + +/** + * Provider-/util-layer counter races that do not touch the DB. + * + * AA. In-memory future counter lost-update — APIUtil.incrementFutureCounter reads the + * (calls, openFutures) tuple from a ConcurrentHashMap via getOrDefault, then put()s + * back tuple+1. getOrDefault and put are two separate CHM operations with no atomic + * compute/merge, so N concurrent increments each read the same starting tuple and + * overwrite each other — fewer increments land than calls made. This counter only + * drives open-futures back-off logging (no banking impact), so it is included for + * completeness; the same read-modify-write shape on a DB counter is what makes H/K + * dangerous. + * + * Asserts the correct count, so EXPECTED TO FAIL while the CHM access is non-atomic. + * Tagged ConcurrencyRace. + */ +class ConcurrentProviderRaceTest extends ConcurrentRaceSetup { + + feature("In-memory counter atomicity under concurrency") { + + scenario("AA: N concurrent incrementFutureCounter calls must each land", ConcurrencyRace) { + Given("a fresh service-counter key") + val serviceName = "__conc_future_counter_aa" + APIUtil.serviceNameCountersMap.remove(serviceName) + val n = 8 + + When(s"$n concurrent incrementFutureCounter calls hit the same key") + runConcurrentWithBarrier(n) { _ => + APIUtil.incrementFutureCounter(serviceName) + } + + Then("the call counter must equal N — every increment must land, no lost-updates") + val (callCounter, _) = APIUtil.serviceNameCountersMap.getOrDefault(serviceName, (0, 0)) + withClue( + s"callCounter=$callCounter (expected=$n): getOrDefault + put in incrementFutureCounter is a " + + s"non-atomic read-modify-write on a ConcurrentHashMap; concurrent callers read the same tuple " + + s"and overwrite each other — " + ) { + callCounter should equal(n) + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentRaceSetup.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentRaceSetup.scala new file mode 100644 index 0000000000..fd18544643 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentRaceSetup.scala @@ -0,0 +1,138 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.entitlement.MappedEntitlement +import code.model.dataAccess.MappedBankAccount +import code.setup.{APIResponse, DefaultUsers, ServerSetupWithTestData} +import com.openbankproject.commons.model.{AccountId, BankId} +import dispatch.Req +import net.liftweb.mapper.By +import org.scalatest.Tag + +import java.util.concurrent.{CyclicBarrier, Executors, TimeUnit} +import scala.concurrent.duration._ +import scala.concurrent.{Await, Future} +import scala.util.Try + +/** + * Tag for the concurrency-race simulations. + * + * The business-write suites (A balance, B state-machine, C/D/F duplicate creation) assert + * the THEORETICALLY-CORRECT outcome, so while the underlying read-modify-write / check-then-insert + * races remain unfixed they are EXPECTED TO FAIL — the red bar, with its "expected vs actual" clue, + * is the evidence the hazard is real. The connection-mechanism suites (G) instead VERIFY that an + * already-implemented safeguard holds, so they are expected to pass; a red bar there is a regression. + * + * Either way these must be isolated from the CI main flow: + * run only these: mvn ... scalatest:test -DtagsToInclude=code.concurrency.ConcurrencyRace -DfailIfNoTests=false + * exclude from CI: mvn ... scalatest:test -DtagsToExclude=code.concurrency.ConcurrencyRace + */ +object ConcurrencyRace extends Tag("code.concurrency.ConcurrencyRace") + +/** + * Shared helpers for the concurrent-race suites: two fan-out primitives (HTTP and + * provider-layer) plus direct-DB state assertions that bypass any read cache. + * + * Pitfalls these suites must respect (see the plan file for the full list): + * - The test DB is H2 in-memory. Application-level read-modify-write / check-then-insert races + * do NOT depend on DB isolation and CAN reproduce on H2, but H2's table locks may serialise + * some writes and lower the hit rate — assertions print the observed values so a red bar is + * self-documenting; raise N or use runConcurrentWithBarrier if a run comes back spuriously green. + * - The whole JVM shares one server, one H2 DB and one Hikari pool (forkMode=once). Use dedicated + * bank/account/user ids and keep the concurrency count modest (≤ ~30) so the pool is not + * exhausted for sibling suites. + * - Concurrent use of the shared dispatch HttpClient can briefly corrupt a pooled connection + * ("invalid version format"); SendServerRequests already retries once. + */ +trait ConcurrentRaceSetup extends ServerSetupWithTestData with DefaultUsers { + + // Future.sequence below only schedules the join; each async request helper in + // SendServerRequests carries its own ExecutionContext for the actual HTTP I/O. + private implicit val raceEc: scala.concurrent.ExecutionContext = + scala.concurrent.ExecutionContext.Implicits.global + + def v4_0_0_Request: Req = baseRequest / "obp" / "v4.0.0" + def v3_0_0_Request: Req = baseRequest / "obp" / "v3.0.0" + def v2_0_0_Request: Req = baseRequest / "obp" / "v2.0.0" + + /** System owner view — present on every test account, carries all read permissions. */ + val SystemOwnerViewId = "owner" + + /** + * Build `n` requests with `mk` and run them concurrently over the shared HTTP client, + * awaiting all results. `mk` is invoked once per index, so each request is constructed + * and (when the caller applies `<@`) OAuth-signed independently — a distinct nonce per + * request. This is a real parallel fan-out, not one signed request replayed n times + * (which the server's nonce check would reject). + */ + def fireConcurrently[T](n: Int, timeout: FiniteDuration = 90.seconds)(mk: Int => Future[T]): List[T] = + Await.result(Future.sequence((0 until n).map(mk)), timeout).toList + + /** + * Run `task` on `n` dedicated threads that all wait at a barrier before entering the + * critical section together, so concurrent check-then-act windows actually overlap + * (H2's table locks otherwise tend to serialise un-barriered writes and hide the race). + * Each invocation's result is wrapped in a Try, so a constraint violation or thrown + * exception is observable rather than aborting the whole fan-out. + * + * Used for provider-layer races whose contended code is a getOrCreate method rather + * than an HTTP endpoint (account holders, counterparty metadata). + */ + def runConcurrentWithBarrier[T](n: Int, timeout: FiniteDuration = 60.seconds)(task: Int => T): List[Try[T]] = { + val pool = Executors.newFixedThreadPool(n) + val taskEc = scala.concurrent.ExecutionContext.fromExecutorService(pool) + val barrier = new CyclicBarrier(n) + try { + val futs = (0 until n).map { i => + Future { + barrier.await(timeout.toMillis, TimeUnit.MILLISECONDS) + Try(task(i)) + }(taskEc) + } + Await.result(Future.sequence(futs), timeout).toList + } finally { + pool.shutdownNow() + () + } + } + + /** Balance persisted on the account row, read straight from the DB (no cache, no HTTP). */ + def dbAccountBalance(bankId: BankId, accountId: AccountId): Long = + MappedBankAccount + .find(By(MappedBankAccount.bank, bankId.value), By(MappedBankAccount.theAccountId, accountId.value)) + .map(_.accountBalance.get) + .getOrElse(fail(s"account row not found: ${bankId.value}/${accountId.value}")) + + /** Number of entitlement rows for one (bank,user,role) triple, straight from the DB. */ + def dbEntitlementCount(bankId: String, userId: String, roleName: String): Long = + MappedEntitlement.count( + By(MappedEntitlement.mBankId, bankId), + By(MappedEntitlement.mUserId, userId), + By(MappedEntitlement.mRoleName, roleName) + ) +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentSecurityRaceTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentSecurityRaceTest.scala new file mode 100644 index 0000000000..4dd2773e0d --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentSecurityRaceTest.scala @@ -0,0 +1,137 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.loginattempts.{LoginAttempt, MappedBadLoginAttempt} +import code.transactionChallenge.{MappedChallengeProvider, MappedExpectedChallengeAnswer} +import net.liftweb.mapper.By +import org.mindrot.jbcrypt.BCrypt + +import java.util.{Date, UUID} + +/** + * Simulates two authentication-layer concurrent counter races. Both assert the correct + * (theoretically sound) outcome, so they are EXPECTED TO FAIL while the races are unfixed — + * the "expected vs actual" clue is the evidence. Tagged ConcurrencyRace. + * + * H. Bad-login attempt counter lost-update — LoginAttempt.incrementBadLoginAttempts reads the + * counter, increments in memory, then writes back with no lock or version column. N concurrent + * bad-login attempts each observe the same starting value and overwrite each other, so fewer + * increments land than attempts counted. Under a 5-attempt lockout threshold an attacker can + * send far more than 5 guesses before the lockout triggers. + * + * K. Challenge attempt-counter lost-update — MappedChallengeProvider.validateChallenge reads + * attemptCounter, writes counter+1, then compares counter < allowedAttempts. Read and write are + * non-atomic, so N concurrent wrong answers each observe counter=0, each save counter=1, and + * each pass the gate. The counter never reaches the allowed-attempts threshold, enabling + * unlimited concurrent brute-force guesses without burning the attempt budget. + */ +class ConcurrentSecurityRaceTest extends ConcurrentRaceSetup { + + feature("Authentication counter atomicity under concurrency") { + + scenario("H: N concurrent bad-login increments must each land — no lockout bypass", ConcurrencyRace) { + Given("a bad-login record pre-seeded at zero attempts for a dedicated test credential") + val provider = "__conc_sec_provider_h" + val username = "__conc_sec_user_h" + // Clean up from any prior run (shared JVM, forkMode=once). + MappedBadLoginAttempt.findAll( + By(MappedBadLoginAttempt.Provider, provider), + By(MappedBadLoginAttempt.mUsername, username) + ).foreach(_.delete_!) + MappedBadLoginAttempt.create + .mUsername(username) + .Provider(provider) + .mBadAttemptsSinceLastSuccessOrReset(0) + .mLastFailureDate(new Date()) + .saveMe() + val n = 8 + + When(s"$n bad-login increments are fired concurrently for the same credential") + runConcurrentWithBarrier(n) { _ => + LoginAttempt.incrementBadLoginAttempts(provider, username) + } + + Then("the counter must equal N — every increment must land, no lost-updates") + val finalCounter = MappedBadLoginAttempt.find( + By(MappedBadLoginAttempt.Provider, provider), + By(MappedBadLoginAttempt.mUsername, username) + ).map(_.badAttemptsSinceLastSuccessOrReset).getOrElse(0) + withClue( + s"finalCounter=$finalCounter (expected=$n): each of $n concurrent bad-login attempts must " + + s"be counted — if fewer land, an attacker can bypass the lockout threshold by sending " + + s"concurrent requests — " + ) { + finalCounter should equal(n) + } + } + + scenario("K: N concurrent wrong challenge answers must each consume one attempt — no brute-force bypass", ConcurrencyRace) { + Given("a challenge seeded directly via MappedChallengeProvider with a known expected answer") + // Raise the attempt limit so the limit-guard never fires early and interferes with the counter test. + setPropsValues("transactionRequests_challenge_max_allowed_attempts" -> "100") + val challengeId = UUID.randomUUID.toString + val salt = BCrypt.gensalt() + MappedChallengeProvider.saveChallenge( + challengeId = challengeId, + transactionRequestId = UUID.randomUUID.toString, + salt = salt, + expectedAnswer = BCrypt.hashpw("123", salt).substring(0, 44), + expectedUserId = resourceUser1.userId, + scaMethod = None, + scaStatus = None, + consentId = None, + basketId = None, + authenticationMethodId = None, + challengeType = "OBP_TRANSACTION_REQUEST_CHALLENGE" + ) + val n = 8 + + When(s"$n concurrent wrong-answer validate calls hit the same challenge") + runConcurrentWithBarrier(n) { _ => + MappedChallengeProvider.validateChallenge( + challengeId = challengeId, + challengeAnswer = "definitelyWrongAnswer", + userId = Some(resourceUser1.userId) + ) + } + + Then("the attempt counter must equal N — each wrong answer must consume exactly one attempt") + val finalCounter = MappedExpectedChallengeAnswer + .find(By(MappedExpectedChallengeAnswer.ChallengeId, challengeId)) + .map(_.AttemptCounter.get) + .getOrElse(-1) + withClue( + s"finalCounter=$finalCounter (expected=$n): each of $n concurrent wrong-answer attempts must " + + s"be counted — if fewer land, an attacker can submit unlimited concurrent guesses without " + + s"exhausting the allowed-attempt budget — " + ) { + finalCounter should equal(n) + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentTransferRaceTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentTransferRaceTest.scala new file mode 100644 index 0000000000..37da745df3 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentTransferRaceTest.scala @@ -0,0 +1,219 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.api.util.APIUtil.OAuth._ +import code.api.v1_4_0.JSONFactory1_4_0.TransactionRequestAccountJsonV140 +import code.api.v2_0_0.TransactionRequestBodyJsonV200 +import code.api.v4_0_0.ChallengeAnswerJson400 +import code.bankconnectors.Connector +import code.model.BankAccountX +import code.transaction.MappedTransaction +import com.openbankproject.commons.model.{AccountId, AmountOfMoneyJsonV121} +import com.openbankproject.commons.model.enums.TransactionRequestStatus +import net.liftweb.json.Serialization.write +import net.liftweb.mapper.By + +import java.util.Date +import scala.concurrent.Await +import scala.concurrent.ExecutionContext.Implicits.global +import scala.concurrent.duration._ + +/** + * Simulates the two highest-impact money-movement races: + * + * A. Lost balance update — `LocalMappedConnectorInternal.saveTransaction` reads the + * balance, adds the amount in memory, then writes the whole row back with no lock + * and no version column. N concurrent transfers can each read the same starting + * balance and overwrite one another, so fewer debits land than transfers completed. + * + * B. Transaction-request state-machine double-spend — the answer-challenge handler + * checks `status == "INITIATED"`, then runs the payment, then flips the status. + * The check and the flip are not atomic, so two concurrent answers to the SAME + * request can both pass the gate and both execute the payment. + * + * S. Historical-payment balance lost-update — `LocalMappedConnector.saveHistoricalTransaction` + * is a second, independent read-modify-write path on the account balance (sibling of A): + * it reads `fromAccount.balance`, adds the amount, and `.save()`s the row with no lock. + * N concurrent `makeHistoricalPayment` calls reusing one fromAccount snapshot each read the + * same starting balance and overwrite one another. + * + * All assert the correct outcome, so they are EXPECTED TO FAIL while the races are + * unfixed — the "expected vs actual" clue is the evidence. Tagged ConcurrencyRace. + */ +class ConcurrentTransferRaceTest extends ConcurrentRaceSetup { + + feature("Concurrent money movement on a single account (transaction-level isolation)") { + + scenario("A: N concurrent transfers from one account must not lose balance updates", ConcurrencyRace) { + Given("a funded source account and a payee, with SANDBOX_TAN challenge disabled so each transfer is one-step") + // High threshold → amounts below it skip the challenge and complete in a single request. + setPropsValues("transactionRequests_challenge_threshold_SANDBOX_TAN" -> "100000000") + val bank = createBank("__conc-transfer-bank-a") + val bankId = bank.bankId + val fromId = AccountId("__conc_a_from") + val toId = AccountId("__conc_a_to") + createAccountRelevantResource(Some(resourceUser1), bankId, fromId, "EUR") + createAccountRelevantResource(Some(resourceUser1), bankId, toId, "EUR") + + val before = dbAccountBalance(bankId, fromId) + val n = 10 + val amountStr = "1.00" // 1.00 EUR = 100 in smallest currency units + val debitPerTransfer = 100L + + val toAccountJson = TransactionRequestAccountJsonV140(bankId.value, toId.value) + val body = write(TransactionRequestBodyJsonV200( + toAccountJson, AmountOfMoneyJsonV121("EUR", amountStr), "concurrency-A")) + + When(s"$n SANDBOX_TAN transfers are fired concurrently from the same account") + val responses = fireConcurrently(n) { _ => + val req = (v4_0_0_Request / "banks" / bankId.value / "accounts" / fromId.value / + SystemOwnerViewId / "transaction-request-types" / "SANDBOX_TAN" / "transaction-requests").POST <@ user1 + makePostRequestAsync(req, body) + } + + Then("the account must be debited exactly once for every transfer that reported COMPLETED") + val completed = responses.count { r => + r.code == 201 && (r.body \ "status").values.toString == TransactionRequestStatus.COMPLETED.toString + } + val after = dbAccountBalance(bankId, fromId) + val actualDebited = before - after + val expectedDebited = completed * debitPerTransfer + val lostUpdates = if (debitPerTransfer == 0) 0 else (expectedDebited - actualDebited) / debitPerTransfer + withClue(s"completed=$completed before=$before after=$after " + + s"actualDebited=$actualDebited expectedDebited=$expectedDebited lostUpdates=$lostUpdates — ") { + actualDebited should equal(expectedDebited) + } + } + + scenario("B: concurrent answers to one challenge must execute the payment only once", ConcurrencyRace) { + Given("a transaction request left in INITIATED state, with SANDBOX_TAN challenge forced on") + // Zero threshold → every amount requires a challenge, leaving the request INITIATED. + // DUMMY transport → the challenge is stored as hash("123"), so the fixed answer works + // without sending a real OTP (same pattern used by ACCOUNT/SEPA tests in test.default.props). + setPropsValues( + "transactionRequests_challenge_threshold_SANDBOX_TAN" -> "0", + "SANDBOX_TAN_OTP_INSTRUCTION_TRANSPORT" -> "DUMMY" + ) + val bank = createBank("__conc-transfer-bank-b") + val bankId = bank.bankId + val fromId = AccountId("__conc_b_from") + val toId = AccountId("__conc_b_to") + createAccountRelevantResource(Some(resourceUser1), bankId, fromId, "EUR") + createAccountRelevantResource(Some(resourceUser1), bankId, toId, "EUR") + + val before = dbAccountBalance(bankId, fromId) + val amountStr = "10.00" // 10.00 EUR = 1000 in smallest currency units + val debit = 1000L + + val toAccountJson = TransactionRequestAccountJsonV140(bankId.value, toId.value) + val createBody = write(TransactionRequestBodyJsonV200( + toAccountJson, AmountOfMoneyJsonV121("EUR", amountStr), "concurrency-B")) + + val createReq = (v4_0_0_Request / "banks" / bankId.value / "accounts" / fromId.value / + SystemOwnerViewId / "transaction-request-types" / "SANDBOX_TAN" / "transaction-requests").POST <@ user1 + val createResp = makePostRequest(createReq, createBody) + withClue(s"the create transaction-request must be INITIATED: code=${createResp.code} body=${createResp.body} — ") { + createResp.code should equal(201) + (createResp.body \ "status").values.toString should equal(TransactionRequestStatus.INITIATED.toString) + } + val transRequestId = (createResp.body \ "id").values.toString + // `challenges` is a JArray; pluck the first element's id rather than letting + // `\ "id"` map over the array (which would stringify to "List(...)"). + val challengeId = (createResp.body \ "challenges") match { + case net.liftweb.json.JArray(h :: _) => (h \ "id").values.toString + case other => (other \ "id").values.toString + } + + When("the same challenge is answered concurrently N times") + val n = 8 + val answerBody = write(ChallengeAnswerJson400(id = challengeId, answer = "123")) + val answers = fireConcurrently(n) { _ => + val req = (v4_0_0_Request / "banks" / bankId.value / "accounts" / fromId.value / + SystemOwnerViewId / "transaction-request-types" / "SANDBOX_TAN" / "transaction-requests" / + transRequestId / "challenge").POST <@ user1 + makePostRequestAsync(req, answerBody) + } + + Then("the payment must execute exactly once — no double-spend") + val after = dbAccountBalance(bankId, fromId) + val actualDebited = before - after + val txnCount = MappedTransaction.count( + By(MappedTransaction.bank, bankId.value), By(MappedTransaction.account, fromId.value)) + withClue(s"challengeId=[$challengeId] answer codes=${answers.map(_.code)} " + + s"firstAnswerBody=${answers.headOption.map(_.body).getOrElse("")} " + + s"before=$before after=$after actualDebited=$actualDebited (expected=$debit) " + + s"mappedTxnCount=$txnCount (expected=1) — ") { + actualDebited should equal(debit) + txnCount should equal(1L) + } + } + + scenario("S: N concurrent makeHistoricalPayment calls must not lose balance updates", ConcurrencyRace) { + Given("a funded source account and a payee, with one shared fromAccount snapshot") + val bank = createBank("__conc-hist-bank-s") + val bankId = bank.bankId + val fromId = AccountId("__conc_s_from") + val toId = AccountId("__conc_s_to") + createAccountRelevantResource(Some(resourceUser1), bankId, fromId, "EUR") + createAccountRelevantResource(Some(resourceUser1), bankId, toId, "EUR") + + // makeHistoricalPayment takes BankAccount objects directly — the same snapshot is reused by + // every concurrent call, which is exactly how saveHistoricalTransaction reads a stale balance. + val fromAccount = BankAccountX(bankId, fromId).getOrElse(fail("couldn't get from account")) + val toAccount = BankAccountX(bankId, toId).getOrElse(fail("couldn't get to account")) + + val before = dbAccountBalance(bankId, fromId) + val n = 8 + val amount = BigDecimal("1.00") // 1.00 EUR = 100 in smallest currency units + val debitPerCall = 100L + + When(s"$n historical payments are fired concurrently from the same account") + val results = runConcurrentWithBarrier(n) { i => + Await.result( + Connector.connector.vend.makeHistoricalPayment( + fromAccount, toAccount, new Date(), new Date(), + amount, "EUR", s"concurrency-S-$i", "SANDBOX_TAN", "SHARED", None + ).map(_._1), + 30.seconds + ) + } + + Then("the account must be debited once per successful payment") + val succeeded = results.count(_.map(_.isDefined).getOrElse(false)) + val after = dbAccountBalance(bankId, fromId) + val actualDebited = before - after + val expectedDebited = succeeded * debitPerCall + val lostUpdates = if (debitPerCall == 0) 0 else (expectedDebited - actualDebited) / debitPerCall + withClue(s"succeeded=$succeeded before=$before after=$after " + + s"actualDebited=$actualDebited expectedDebited=$expectedDebited lostUpdates=$lostUpdates — " + + s"saveHistoricalTransaction reads fromAccount.balance and .save()s with no lock — ") { + actualDebited should equal(expectedDebited) + } + } + } +} diff --git a/obp-api/src/test/scala/code/concurrency/ConcurrentViewPermissionRaceTest.scala b/obp-api/src/test/scala/code/concurrency/ConcurrentViewPermissionRaceTest.scala new file mode 100644 index 0000000000..8c8a978d6f --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/ConcurrentViewPermissionRaceTest.scala @@ -0,0 +1,205 @@ +/** +Open Bank Project - API +Copyright (C) 2011-2019, TESOBE GmbH. + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published by +the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see . + +Email: contact@tesobe.com +TESOBE GmbH. +Osloer Strasse 16/17 +Berlin 13359, Germany + +This product includes software developed at +TESOBE (http://www.tesobe.com/) + + */ +package code.concurrency + +import code.api.Constant.ALL_CONSUMERS +import code.views.Views +import code.views.system.{AccountAccess, ViewDefinition, ViewPermission} +import com.openbankproject.commons.model.{AccountId, BankId, ViewId} +import net.liftweb.mapper.By + +import java.util.UUID + +/** + * Simulates the view-permission check-then-insert / delete-then-insert races. ViewPermission + * carries UniqueIndex(bank_id, account_id, view_id, permission) but the insert paths call .save() + * with NO tryo/try wrapper, so a concurrent duplicate insert throws an uncaught JDBC constraint + * violation (a 500 at the HTTP layer) rather than resolving gracefully. + * + * N. getOrCreateCustomPublicView check-then-insert — Views.getOrCreateCustomPublicView does + * find-then-create with no surrounding transaction; createAndSaveDefaultPublicCustomView calls + * .saveMe with no tryo. ViewDefinition's UniqueIndex(composite_unique_key) backs the natural key, + * so the second concurrent create throws an uncaught JDBC violation. (This is the same root cause + * as getOrCreateSystemView, which cannot be tested in isolation because system views are pinned + * to a global whitelist via ViewDefinition.beforeSave/isValidSystemViewId — a custom public view + * exercises the identical unguarded saveMe path on an isolated (bank,account) key.) + * + * O. resetViewPermissions delete-then-insert — ViewPermission.resetViewPermissions deletes the + * view's permissions, then for each permission name does find-then-delete-then-insert (.save, + * no tryo). Two concurrent resets for the same view both clear the set, then both insert the + * same (bank,account,view,permission) tuple → the second INSERT violates the unique index, + * uncaught. + * + * R. removeCustomView check-then-delete orphan — removeCustomView checks that no AccountAccess + * references the view, then deletes the view. The two steps are not atomic and there is no + * transaction, so a grant committing an AccountAccess in the window leaves a row pointing at a + * now-deleted view. This deterministically replays that window (the structural hazard). + * + * Asserts the correct (graceful, exactly-one-row, no-orphan) outcome, so EXPECTED TO FAIL while the + * paths are unguarded. Tagged ConcurrencyRace. + */ +class ConcurrentViewPermissionRaceTest extends ConcurrentRaceSetup { + + feature("Concurrent view-permission mutation must stay graceful and consistent") { + + scenario("N: concurrent getOrCreateCustomPublicView must not throw and leave exactly one view", ConcurrencyRace) { + Given("allow_public_views=true and an account with no _public view yet") + setPropsValues("allow_public_views" -> "true") + val bank = createBank("__conc-pubview-bank") + val bankId = bank.bankId + val accountId = AccountId("__conc_pubview_acc") + createAccountRelevantResource(Some(resourceUser1), bankId, accountId, "EUR") + + def viewCount: Long = ViewDefinition.count( + By(ViewDefinition.bank_id, bankId.value), + By(ViewDefinition.account_id, accountId.value), + By(ViewDefinition.view_id, "_public") // CUSTOM_PUBLIC_VIEW_ID + ) + val before = viewCount + val n = 2 + + When(s"$n threads concurrently getOrCreateCustomPublicView for the same account") + val results = runConcurrentWithBarrier(n) { _ => + Views.views.vend.getOrCreateCustomPublicView(bankId, accountId, "conc public view") + } + + Then("no call may throw, and exactly one _public view row must exist") + val thrown = results.collect { case scala.util.Failure(e) => e.getClass.getSimpleName + ": " + e.getMessage.take(120) } + val created = viewCount - before + withClue( + s"thrown=$thrown created=$created (expected: no throws, 1 row) — " + + s"createAndSaveDefaultPublicCustomView .saveMe is unguarded against ViewDefinition's " + + s"UniqueIndex(composite_unique_key); concurrent creates collide on the insert — " + ) { + thrown shouldBe empty + created should equal(1L) + } + } + + scenario("O: concurrent resetViewPermissions on one view must not throw and must leave one row per permission", ConcurrencyRace) { + Given("a dedicated custom view with a known permission set") + val bank = createBank("__conc-viewperm-bank") + val bankId = bank.bankId + val accountId = AccountId("__conc_viewperm_acc") + createAccountRelevantResource(Some(resourceUser1), bankId, accountId, "EUR") + + // A dedicated custom view row so the (bank,account,view) key is isolated from real test views. + val viewIdStr = "__conc_o_view_" + UUID.randomUUID.toString.take(8) + val view: ViewDefinition = ViewDefinition.create + .isSystem_(false) + .isFirehose_(false) + .bank_id(bankId.value) + .account_id(accountId.value) + .view_id(viewIdStr) + .name_("conc-o-view") + .description_("conc-o") + .isPublic_(false) + .usePrivateAliasIfOneExists_(false) + .usePublicAliasIfOneExists_(false) + .hideOtherAccountMetadataIfAlias_(false) + .saveMe() + + val permissionNames = List( + "can_see_transaction_amount", + "can_see_transaction_currency", + "can_see_transaction_description" + ) + + def permCount: Long = ViewPermission.count( + By(ViewPermission.bank_id, bankId.value), + By(ViewPermission.account_id, accountId.value), + By(ViewPermission.view_id, viewIdStr) + ) + + val n = 2 + + When(s"$n threads concurrently resetViewPermissions for the same view") + val results = runConcurrentWithBarrier(n) { _ => + ViewPermission.resetViewPermissions(view, permissionNames) + } + + Then("no call may throw, and exactly one row per permission must remain") + val thrown = results.collect { case scala.util.Failure(e) => e.getClass.getSimpleName + ": " + e.getMessage.take(120) } + val finalCount = permCount + withClue( + s"thrown=$thrown finalCount=$finalCount (expected: no throws, ${permissionNames.size} rows) — " + + s"resetViewPermissions .save() is unguarded against UniqueIndex(bank_id,account_id,view_id,permission); " + + s"concurrent resets collide on the insert — " + ) { + thrown shouldBe empty + finalCount should equal(permissionNames.size.toLong) + } + } + + scenario("R: removeCustomView's empty-check then delete must not orphan a concurrent grant", ConcurrencyRace) { + Given("a custom view with no AccountAccess, so removeCustomView's emptiness guard would pass") + val bank = createBank("__conc-orphan-bank") + val bankId = bank.bankId + val accountId = AccountId("__conc_orphan_acc") + createAccountRelevantResource(Some(resourceUser1), bankId, accountId, "EUR") + + val viewIdStr = "__conc_r_view_" + UUID.randomUUID.toString.take(8) + val view: ViewDefinition = ViewDefinition.create + .isSystem_(false) + .isFirehose_(false) + .bank_id(bankId.value) + .account_id(accountId.value) + .view_id(viewIdStr) + .name_("conc-r-view") + .description_("conc-r") + .isPublic_(false) + .usePrivateAliasIfOneExists_(false) + .usePublicAliasIfOneExists_(false) + .hideOtherAccountMetadataIfAlias_(false) + .saveMe() + + // removeCustomView (MapperViews.scala:502-517): (1) checks AccountAccess for the view is empty, + // (2) then deletes the view. The two steps are not atomic and there is no transaction, so a grant + // committing an AccountAccess in the window orphans a permission row. Replay that window deterministically. + When("the emptiness check passes, then a concurrent grant commits an AccountAccess, then the view is deleted") + val checkSawEmpty = AccountAccess.findAllByBankIdAccountIdViewId(bankId, accountId, ViewId(viewIdStr)).isEmpty + AccountAccess.create + .user_fk(resourceUser1.userPrimaryKey.value) + .bank_id(bankId.value) + .account_id(accountId.value) + .view_id(viewIdStr) + .consumer_id(ALL_CONSUMERS) + .saveMe() + view.delete_! + + Then("no AccountAccess may reference the now-deleted view (no orphaned permission row)") + val orphans = AccountAccess.findAllByBankIdAccountIdViewId(bankId, accountId, ViewId(viewIdStr)) + withClue( + s"checkSawEmpty=$checkSawEmpty orphans=${orphans.size} (expected 0): removeCustomView checks " + + s"AccountAccess emptiness then deletes the view with no atomicity; a grant landing in the window " + + s"leaves an AccountAccess pointing at a deleted view — " + ) { + orphans shouldBe empty + } + } + } +} From 92097220eda4deac1cc36793f3a77addc48f37c8 Mon Sep 17 00:00:00 2001 From: hongwei Date: Thu, 11 Jun 2026 15:27:35 +0200 Subject: [PATCH 2/4] docs: add concurrency test suite summary --- .../concurrency/CONCURRENCY_TEST_SUMMARY.md | 214 ++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md diff --git a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md new file mode 100644 index 0000000000..b54cd1ad60 --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md @@ -0,0 +1,214 @@ +# OBP-API Concurrency Hazard Test Suite — Summary + +**Branch**: `feature/concurrency-hazard-tests` +**Commit**: `89e9753f9` +**Test run result**: 16 FAILED (hazards confirmed) · 3 PASSED (safeguards verified) · BUILD SUCCESS + +--- + +## Overview + +This suite was created to systematically surface every known database concurrency hazard in OBP-API. +The persistence layer uses Lift Mapper over HikariCP. There is no `SELECT FOR UPDATE`, no +optimistic-locking version column, and no transaction guard around multi-step read-modify-write +sequences. `.save()` / `.saveMe()` issues a blind `UPDATE`/`INSERT` by primary key and does not +catch JDBC constraint-violation exceptions. + +Each scenario **asserts the theoretically correct outcome**, so a hazard surfaces as a **FAILED +test** — a red bar (with its `expected vs actual` clue) is the evidence that the hazard is real. +When a path is fixed (atomic `UPDATE`, optimistic-lock version column, unique constraint + +conflict retry, conditional guarded update), the corresponding scenario flips from red to green +automatically. + +--- + +## How to Run + +```sh +# Run only concurrency tests +mvn -pl obp-commons,obp-api scalatest:test \ + -DtagsToInclude=code.concurrency.ConcurrencyRace \ + -DfailIfNoTests=false + +# Exclude from CI main flow +mvn -pl obp-commons,obp-api scalatest:test \ + -DtagsToExclude=code.concurrency.ConcurrencyRace +``` + +> **Requirement**: `hikari.maximumPoolSize=20` in test props. Several scenarios hold connections +> across a `CyclicBarrier`; a pool of 10 exhausts at 5 concurrent requests. + +--- + +## Test Files (8 classes · 19 scenarios · 1,277 lines) + +| File | Scenarios | Lines | +|---|---|---| +| `ConcurrentRaceSetup.scala` | base trait | 138 | +| `ConcurrentTransferRaceTest.scala` | A, B, S | 219 | +| `ConcurrentDuplicateCreationTest.scala` | C, D, F, I, L, W | 272 | +| `ConcurrentConnectionMechanismTest.scala` | G1, G2 | 86 | +| `ConcurrentSecurityRaceTest.scala` | H, K | 137 | +| `ConcurrentConsentRaceTest.scala` | J, U | 148 | +| `ConcurrentViewPermissionRaceTest.scala` | N, O, R | 205 | +| `ConcurrentProviderRaceTest.scala` | AA | 72 | + +--- + +## Hazard Taxonomy + +| Shape | Meaning | +|---|---| +| **lost-update** | Read a mutable field → mutate in memory → `.save()` the row; concurrent callers read the same start value and one overwrites the other | +| **check-then-act** | Read a status/flag → branch → side-effect → write new status; the check and the write are not atomic | +| **check-then-insert** | `find()`-then-`create()` with **no** unique index; concurrent callers all miss the find and all insert | +| **unique-constraint-unhandled** | `find()`-then-`create()` where a `UniqueIndex` backs the table but the JDBC violation is not caught → uncaught 500 or swallowed `Failure` | +| **counter-sequence** | Increment a counter by read-then-write → lost increments | + +--- + +## Scenario Results + +### Money Movement — `ConcurrentTransferRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **A** | 🔴 FAILED | 10 concurrent SANDBOX_TAN transfers lost 9 balance updates (`actualDebited=100 expectedDebited=1000`) | lost-update | `LocalMappedConnectorInternal.scala:510` `saveTransaction` | +| **B** | 🔴 FAILED | 8 concurrent challenge answers executed the payment 8 times (`mappedTxnCount=8`, expected 1) | check-then-act | `Http4s400.answerChallengeNormal` | +| **S** | 🔴 FAILED | 8 concurrent `makeHistoricalPayment` calls lost 4 balance updates (`actualDebited=200 expectedDebited=600`) | lost-update | `LocalMappedConnector.saveHistoricalTransaction:2351` | + +**Impact**: Direct financial loss. A and S create phantom balances; B enables double-spend of a +single transaction request. + +--- + +### Duplicate Creation — `ConcurrentDuplicateCreationTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **C** | 🔴 FAILED | 8 concurrent entitlement grants created 8 rows (expected 1) | check-then-insert | `MappedEntitlementsProvider.addEntitlement` | +| **D** | 🔴 FAILED | 8 concurrent `getOrCreateAccountHolder` calls created 8 rows (expected 1) | check-then-insert | `MapperAccountHolders.getOrCreateAccountHolder` | +| **F** | 🔴 FAILED | 8 concurrent `getOrCreateMetadata` calls threw an exception (UniqueIndex present but unhandled) | unique-constraint-unhandled | `MappedCounterpartyMetadata.getOrCreateMetadata` | +| **I** | 🔴 FAILED | 2 concurrent first-time OAuth logins: one got uncaught JDBC `23505` constraint-violation (500 at HTTP layer) | unique-constraint-unhandled | `LiftUsers.getOrCreateUserByProviderId` | +| **L** | 🔴 FAILED | 8 concurrent `getOCreateUserCustomerLink` calls: second concurrent insert threw uncaught JDBC exception | unique-constraint-unhandled | `MappedUserCustomerLinkProvider.getOCreateUserCustomerLink` | +| **W** | 🔴 FAILED | 2 concurrent `getOrCreateConsumer` calls: second insert swallowed into `Failure` box by `tryo` — caller receives no usable consumer | unique-constraint-unhandled | `OAuth.getOrCreateConsumer:535` | + +**Impact**: C/D silently bloat entitlement and account-holder tables; I/L cause 500 for one of two +simultaneous new users; W silently breaks OAuth2 authentication for one caller. + +--- + +### Security — `ConcurrentSecurityRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **H** | 🔴 FAILED | 8 concurrent bad-login increments: only 1 landed (`finalCounter=1`, expected 8) — account lockout can be bypassed | lost-update | `LoginAttempt.incrementBadLoginAttempts` | +| **K** | 🔴 FAILED | 8 concurrent wrong challenge answers: only 1 attempt counted (`finalCounter=1`, expected 8) — brute-force lockout can be bypassed | lost-update | `MappedChallengeProvider.validateChallenge:78` | + +**Impact**: Critical. An attacker can saturate the challenge-answer endpoint with concurrent +requests, consuming only 1 of the permitted attempts per burst — effectively bypassing both +account-lockout and transaction-challenge brute-force protection. + +--- + +### Consent Scheduling — `ConcurrentConsentRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **J** | 🔴 FAILED | Scheduler stale-save resurrected a revoked consent (`afterRevoke=terminatedByTpp finalStatus=expired`) | lost-update | `ConsentScheduler.expiredBerlinGroupConsents:117` | +| **U** | 🔴 FAILED | Unfinished-consent scheduler task overwrote a concurrent HTTP status change (`afterChange=REVOKED finalStatus=rejected`) | lost-update | `ConsentScheduler.unfinishedBerlinGroupConsents:77` | + +**Impact**: PSD2 compliance breach. A consent the user or TPP explicitly revoked can be silently +resurrected as `expired` by a background scheduler task that holds a stale in-memory copy. + +--- + +### View Permissions — `ConcurrentViewPermissionRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **N** | 🔴 FAILED | 2 concurrent `getOrCreateCustomPublicView` calls: second insert threw JDBC constraint violation on `ViewDefinition` unique index | unique-constraint-unhandled | `MapperViews.createAndSaveDefaultPublicCustomView:1054` | +| **O** | 🔴 FAILED | 2 concurrent `resetViewPermissions` calls: second insert threw JDBC constraint violation on `ViewPermission` unique index | unique-constraint-unhandled | `ViewPermission.resetViewPermissions:137` | +| **R** | 🔴 FAILED | `removeCustomView` emptiness check passed; concurrent grant committed `AccountAccess`; view deleted → 1 orphaned `AccountAccess` row pointing at non-existent view | check-then-act | `MapperViews.removeCustomView:502` | + +**Impact**: N/O cause 500 errors during concurrent view provisioning (e.g. account onboarding); +R leaves orphaned permission rows that reference deleted views, potentially causing foreign-key +confusion or privilege-escalation edge cases. + +--- + +### In-Memory Counter — `ConcurrentProviderRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **AA** | 🟢 PASSED\* | 8 concurrent `incrementFutureCounter` calls: all increments landed in this run | counter-sequence | `APIUtil.incrementFutureCounter:4853` | + +\* AA uses `ConcurrentHashMap.getOrDefault + put` which is not atomic. The hazard is real but +timing-sensitive — with low thread count and fast CHM operations the race window is narrow and +may not trigger in every run. The source-level audit confirms the structural hazard. + +--- + +### Connection-Pool Safeguards — `ConcurrentConnectionMechanismTest` + +| ID | Result | Description | +|---|---|---| +| **G1** | 🟢 PASSED | 30 concurrent requests against a pool of 20: all 200, no deadlock, no timeout — HikariCP back-pressure works correctly | +| **G2** | 🟢 PASSED | 20 concurrent requests each see their own `user_id` — `RequestScopeConnection` per-request isolation is intact | + +--- + +## Three-Tier Protection Picture + +| Tier | DB constraint? | App guard? | Scenarios | +|---|:---:|:---:|---| +| **Silent data corruption** | ✗ | ✗ | A, S, H, K, AA, J, U, C, D, R | +| **Uncaught 500 / swallowed Failure** | ✓ | ✗ | I, L, N, O, W, F | +| **Gracefully handled** | ✓ | ✓ (`tryo`) | `createAccountIfNotExisting` (not broken) | +| **Safeguard verified** | — | ✓ | G1, G2 | + +The most dangerous tier is **silent corruption**: +- **H and K** turn a counter lost-update into an authentication **lockout bypass / brute-force bypass** +- **J and U** silently **resurrect a revoked consent** — a PSD2 compliance breach +- **A and S** produce phantom account balances — direct financial loss + +--- + +## Verified-Real Hazards Without Standalone Tests + +These were confirmed real by source audit but are deliberately not given standalone tests (the +reason is noted to make coverage gaps explicit, not silent). + +| ID | Hazard | Reason not tested | +|---|---|---| +| M | `getOrCreateSystemView` duplicate | Same `saveMe`-without-`tryo` root cause as N/O; system views are pinned to a global whitelist via `ViewDefinition.beforeSave` — deleting one would pollute other suites. **N** exercises the identical path on an isolated key. | +| P | `factoryResetSystemView` concurrent reset | Drives `ViewPermission.resetViewPermissions` insert — the exact code **O** already pins. | +| migrateViewPermissions | duplicate `ViewPermission` insert | Same insert-without-`tryo` root cause as **O**. | +| Q | `revokeAccess` vs `grant` check-then-act | Same `AccountAccess` check-then-act family as **R**; the window is narrow → non-deterministic barrier test would be flaky (false-green). The class is proven by **R**. | +| T | `createTransactionRequestBulk` per-leg balance | Verdict: unconfirmed intra-request self-race. `saveTransaction` mutates the passed object's `accountBalance` field — sequential legs may see the updated value, not a stale one. Writing a possibly-false test was rejected. | +| V | Berlin Group `usesSoFarTodayCounter` lost-increment | Same counter lost-update class as H/K; requires fully-signed recurring BG consent + TPP headers — disproportionate setup for a class already proven. | +| X | Consumer rate-limit `underConsumerLimits` TOCTOU | Real and high-impact (limit bypass), but active-limit lookup is cached ~1 hour → HTTP-layer timing unreliable → would be flaky. | +| Y | `AuthRateLimiter` cold-start SET-vs-INCR collision | Same rate-limit class as X; runs in shadow mode by default. Same flakiness concern. | +| Z | `MappedAgentProvider.updateAgentStatus` | Re-audited as **not a hazard**: sets both fields and `saveMe()`s the whole row — normal last-writer-wins PUT semantics, not field tearing. | + +--- + +## Refuted by Audit (Genuinely Safe) + +| Symbol | Why safe | +|---|---| +| `createAccountIfNotExisting` (`LocalMappedConnectorInternal.scala:283`) | The whole `find()`-then-`create()` is wrapped in `tryo`; the `UniqueIndex(bank, theAccountId)` violation is caught and converted to `Empty`/`Failure`. The caller handles `Empty` gracefully. This is the correct pattern that I/L/N/O are missing. | + +--- + +## Fix Patterns + +When fixing a confirmed hazard, the corresponding test flips from red to green automatically. + +| Hazard shape | Recommended fix | +|---|---| +| **lost-update** (balance, counter, consent status) | Atomic `UPDATE … SET x = x + delta WHERE pk = ?` (raw SQL) or optimistic-lock version column with retry | +| **check-then-insert** (no unique index) | Add `UniqueIndex` on the natural key, then wrap the insert in `tryo` and re-fetch on `Failure` | +| **unique-constraint-unhandled** | Wrap the existing `.saveMe()` in `tryo`; on `Failure`, re-fetch with `find()` and return the existing row | +| **check-then-act** (state machine) | Move the status check + flip into a single conditional `UPDATE … WHERE status = 'old'`; check affected-rows count to detect a lost race | +| **scheduler stale-save** | Replace unconditional `.save()` with a conditional `UPDATE … WHERE status = 'expected_status'`; skip if 0 rows updated | From cd8adee8a8d303523cb022225251f49c11930858 Mon Sep 17 00:00:00 2001 From: hongwei Date: Thu, 11 Jun 2026 16:50:24 +0200 Subject: [PATCH 3/4] docs: merge concurrency hazard docs into single CONCURRENCY_HAZARDS.md --- .../code/concurrency/CONCURRENCY_HAZARDS.md | 264 +++++++++++++----- .../concurrency/CONCURRENCY_TEST_SUMMARY.md | 214 -------------- 2 files changed, 189 insertions(+), 289 deletions(-) delete mode 100644 obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md diff --git a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md index 3f8fec8d70..9a379bf720 100644 --- a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md +++ b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md @@ -1,95 +1,209 @@ -# Concurrency Hazard Test Suite +# OBP-API Concurrency Hazard Test Suite -This package simulates the database concurrency hazards in OBP-API: one-HTTP-request / -one-DB-transaction atomicity, concurrent read/write, and write contention. Each scenario -asserts the **theoretically correct** outcome, so a hazard surfaces as a **FAILED test** — -a red bar (with its "expected vs actual" clue) is the evidence the hazard is real. +**Branch**: `feature/concurrency-hazard-tests` +**Commit**: `92097220e` +**Test run result**: 16 FAILED (hazards confirmed) · 3 PASSED (safeguards verified) · BUILD SUCCESS -The persistence layer is Lift Mapper over HikariCP. There is no `SELECT FOR UPDATE`, no -optimistic-locking version column, and no transaction guard around multi-step -read-modify-write sequences. `.save()`/`.saveMe()` is a blind UPDATE/INSERT by PK and does -not catch JDBC constraint-violation exceptions. +--- -All scenarios are tagged `ConcurrencyRace` and isolated from the CI main flow: +## Overview + +This suite systematically surfaces every known database concurrency hazard in OBP-API. +The persistence layer uses Lift Mapper over HikariCP. There is no `SELECT FOR UPDATE`, no +optimistic-locking version column, and no transaction guard around multi-step read-modify-write +sequences. `.save()` / `.saveMe()` issues a blind `UPDATE`/`INSERT` by primary key and does not +catch JDBC constraint-violation exceptions. + +Each scenario **asserts the theoretically correct outcome**, so a hazard surfaces as a **FAILED +test** — a red bar (with its `expected vs actual` clue) is the evidence that the hazard is real. +When a path is fixed (atomic `UPDATE`, optimistic-lock version column, unique constraint + +conflict retry, conditional guarded update), the corresponding scenario flips from red to green +automatically. + +--- + +## How to Run ```sh -# run only these: -mvn -pl obp-commons,obp-api scalatest:test -DtagsToInclude=code.concurrency.ConcurrencyRace -DfailIfNoTests=false -# exclude from CI: -mvn -pl obp-commons,obp-api scalatest:test -DtagsToExclude=code.concurrency.ConcurrencyRace +# Run only concurrency tests +mvn -pl obp-commons,obp-api scalatest:test \ + -DtagsToInclude=code.concurrency.ConcurrencyRace \ + -DfailIfNoTests=false + +# Exclude from CI main flow +mvn -pl obp-commons,obp-api scalatest:test \ + -DtagsToExclude=code.concurrency.ConcurrencyRace ``` -## Hazard taxonomy +> **Requirement**: `hikari.maximumPoolSize=20` in test props. Several scenarios hold connections +> across a `CyclicBarrier`; a pool of 10 exhausts at 5 concurrent requests. + +--- + +## Test Files (8 classes · 19 scenarios · 1,277 lines) + +| File | Scenarios | Lines | +|---|---|---| +| `ConcurrentRaceSetup.scala` | base trait | 138 | +| `ConcurrentTransferRaceTest.scala` | A, B, S | 219 | +| `ConcurrentDuplicateCreationTest.scala` | C, D, F, I, L, W | 272 | +| `ConcurrentConnectionMechanismTest.scala` | G1, G2 | 86 | +| `ConcurrentSecurityRaceTest.scala` | H, K | 137 | +| `ConcurrentConsentRaceTest.scala` | J, U | 148 | +| `ConcurrentViewPermissionRaceTest.scala` | N, O, R | 205 | +| `ConcurrentProviderRaceTest.scala` | AA | 72 | + +--- + +## Hazard Taxonomy | Shape | Meaning | |---|---| -| **lost-update** | read a mutable field, mutate in memory, `.save()` the row; concurrent callers read the same start value and one overwrites the other | -| **check-then-act** | read a status/flag, branch, perform a side effect, then write a new status; the check and the write are not atomic | +| **lost-update** | Read a mutable field → mutate in memory → `.save()` the row; concurrent callers read the same start value and one overwrites the other | +| **check-then-act** | Read a status/flag → branch → side-effect → write new status; the check and the write are not atomic | | **check-then-insert** | `find()`-then-`create()` with **no** unique index; concurrent callers all miss the find and all insert | -| **unique-constraint-unhandled** | `find()`-then-`create()` where a UniqueIndex **does** back the table, but the JDBC violation is not caught → uncaught 500 (or, when wrapped in `tryo`, a swallowed `Failure` the caller cannot use) | -| **counter-sequence** | increment a counter by read-then-write → lost increments | +| **unique-constraint-unhandled** | `find()`-then-`create()` where a `UniqueIndex` backs the table but the JDBC violation is not caught → uncaught 500 or swallowed `Failure` | +| **counter-sequence** | Increment a counter by read-then-write → lost increments | + +--- + +## Scenario Results + +### Money Movement — `ConcurrentTransferRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **A** | 🔴 FAILED | 10 concurrent SANDBOX_TAN transfers lost 9 balance updates (`actualDebited=100 expectedDebited=1000`) | lost-update | `LocalMappedConnectorInternal.scala:510` `saveTransaction` | +| **B** | 🔴 FAILED | 8 concurrent challenge answers executed the payment 8 times (`mappedTxnCount=8`, expected 1) | check-then-act | `Http4s400.answerChallengeNormal` | +| **S** | 🔴 FAILED | 8 concurrent `makeHistoricalPayment` calls lost 4 balance updates (`actualDebited=200 expectedDebited=600`) | lost-update | `LocalMappedConnector.saveHistoricalTransaction:2351` | -## Implemented scenarios (red bar = hazard confirmed) +**Impact**: Direct financial loss. A and S create phantom balances; B enables double-spend of a single transaction request. -| ID | Hazard | Shape | Source | Test | +--- + +### Duplicate Creation — `ConcurrentDuplicateCreationTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **C** | 🔴 FAILED | 8 concurrent entitlement grants created 8 rows (expected 1) | check-then-insert | `MappedEntitlementsProvider.addEntitlement` | +| **D** | 🔴 FAILED | 8 concurrent `getOrCreateAccountHolder` calls created 8 rows (expected 1) | check-then-insert | `MapperAccountHolders.getOrCreateAccountHolder` | +| **F** | 🔴 FAILED | 8 concurrent `getOrCreateMetadata` calls threw an exception (UniqueIndex present but unhandled) | unique-constraint-unhandled | `MappedCounterpartyMetadata.getOrCreateMetadata` | +| **I** | 🔴 FAILED | 2 concurrent first-time OAuth logins: one got uncaught JDBC `23505` constraint-violation (500 at HTTP layer) | unique-constraint-unhandled | `LiftUsers.getOrCreateUserByProviderId` | +| **L** | 🔴 FAILED | 8 concurrent `getOCreateUserCustomerLink` calls: second concurrent insert threw uncaught JDBC exception | unique-constraint-unhandled | `MappedUserCustomerLinkProvider.getOCreateUserCustomerLink` | +| **W** | 🔴 FAILED | 2 concurrent `getOrCreateConsumer` calls: second insert swallowed into `Failure` box by `tryo` — caller receives no usable consumer | unique-constraint-unhandled | `OAuth.getOrCreateConsumer:535` | + +**Impact**: C/D silently bloat entitlement and account-holder tables; I/L cause 500 for one of two simultaneous new users; W silently breaks OAuth2 authentication for one caller. + +--- + +### Security — `ConcurrentSecurityRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | |---|---|---|---|---| -| A | Balance lost-update (`saveTransaction`) | lost-update | `LocalMappedConnectorInternal.scala:510` | `ConcurrentTransferRaceTest` | -| B | Transaction-request challenge double-spend | check-then-act | `Http4s400.answerChallengeNormal` | `ConcurrentTransferRaceTest` | -| C | Entitlement duplicate | check-then-insert | `MappedEntitlementsProvider.addEntitlement` | `ConcurrentDuplicateCreationTest` | -| D | `getOrCreateAccountHolder` duplicate | check-then-insert | `MapperAccountHolders` | `ConcurrentDuplicateCreationTest` | -| F | `getOrCreateMetadata` (graceful, UniqueIndex present) | unique-constraint-unhandled | `MappedCounterpartyMetadata` | `ConcurrentDuplicateCreationTest` | -| G1 | Pool back-pressure (safeguard — PASSES) | — | `RequestScopeConnection` + Hikari | `ConcurrentConnectionMechanismTest` | -| G2 | Per-request context isolation (safeguard — PASSES) | — | `RequestScopeConnection` | `ConcurrentConnectionMechanismTest` | -| H | Bad-login counter lost-update (lockout bypass) | lost-update | `LoginAttempt.incrementBadLoginAttempts` | `ConcurrentSecurityRaceTest` | -| I | OAuth user duplicate → uncaught 500 | unique-constraint-unhandled | `LiftUsers.getOrCreateUserByProviderId` | `ConcurrentDuplicateCreationTest` | -| J | Consent scheduler stale-save (expired task) resurrects revoked consent | lost-update | `ConsentScheduler.expiredBerlinGroupConsents:117` | `ConcurrentConsentRaceTest` | -| K | Challenge attempt-counter lost-update (brute-force bypass) | lost-update | `MappedChallengeProvider.validateChallenge:78` | `ConcurrentSecurityRaceTest` | -| L | UserCustomerLink duplicate → uncaught 500 | unique-constraint-unhandled | `MappedUserCustomerLinkProvider.getOCreateUserCustomerLink` | `ConcurrentDuplicateCreationTest` | -| N | `getOrCreateCustomPublicView` duplicate → uncaught 500 | unique-constraint-unhandled | `MapperViews.createAndSaveDefaultPublicCustomView:1054` | `ConcurrentViewPermissionRaceTest` | -| O | `resetViewPermissions` delete-then-insert → uncaught 500 | unique-constraint-unhandled | `ViewPermission.resetViewPermissions:137` | `ConcurrentViewPermissionRaceTest` | -| R | `removeCustomView` check-then-delete orphans a grant | check-then-act | `MapperViews.removeCustomView:502` | `ConcurrentViewPermissionRaceTest` | -| S | Historical-payment balance lost-update | lost-update | `LocalMappedConnector.saveHistoricalTransaction:2351` | `ConcurrentTransferRaceTest` | -| U | Consent scheduler stale-save (unfinished task) overwrites status | lost-update | `ConsentScheduler.unfinishedBerlinGroupConsents:77` | `ConcurrentConsentRaceTest` | -| W | `getOrCreateConsumer` duplicate → swallowed `Failure` (tryo) | unique-constraint-unhandled | `OAuth.getOrCreateConsumer:535` | `ConcurrentDuplicateCreationTest` | -| AA | `incrementFutureCounter` non-atomic CHM read-modify-write | counter-sequence | `APIUtil.incrementFutureCounter:4853` | `ConcurrentProviderRaceTest` | - -`E` (consent status race) was deferred earlier due to `consumer`/JWT setup complexity and is not part of this set. - -## Verified-real but not given a standalone test (and why) - -These were confirmed real by source audit but a standalone red-bar test would either duplicate -an already-covered root cause, be flaky, or require disproportionate setup. They are documented -here so coverage gaps are explicit, not silent. - -| ID | Hazard | Why no standalone test | +| **H** | 🔴 FAILED | 8 concurrent bad-login increments: only 1 landed (`finalCounter=1`, expected 8) — account lockout can be bypassed | lost-update | `LoginAttempt.incrementBadLoginAttempts` | +| **K** | 🔴 FAILED | 8 concurrent wrong challenge answers: only 1 attempt counted (`finalCounter=1`, expected 8) — brute-force lockout can be bypassed | lost-update | `MappedChallengeProvider.validateChallenge:78` | + +**Impact**: Critical. An attacker can saturate the challenge-answer endpoint with concurrent +requests, consuming only 1 of the permitted attempts per burst — effectively bypassing both +account-lockout and transaction-challenge brute-force protection. + +--- + +### Consent Scheduling — `ConcurrentConsentRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **J** | 🔴 FAILED | Scheduler stale-save resurrected a revoked consent (`afterRevoke=terminatedByTpp finalStatus=expired`) | lost-update | `ConsentScheduler.expiredBerlinGroupConsents:117` | +| **U** | 🔴 FAILED | Unfinished-consent scheduler task overwrote a concurrent HTTP status change (`afterChange=REVOKED finalStatus=rejected`) | lost-update | `ConsentScheduler.unfinishedBerlinGroupConsents:77` | + +**Impact**: PSD2 compliance breach. A consent the user or TPP explicitly revoked can be silently +resurrected as `expired` by a background scheduler task that holds a stale in-memory copy. + +--- + +### View Permissions — `ConcurrentViewPermissionRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **N** | 🔴 FAILED | 2 concurrent `getOrCreateCustomPublicView` calls: second insert threw JDBC constraint violation on `ViewDefinition` unique index | unique-constraint-unhandled | `MapperViews.createAndSaveDefaultPublicCustomView:1054` | +| **O** | 🔴 FAILED | 2 concurrent `resetViewPermissions` calls: second insert threw JDBC constraint violation on `ViewPermission` unique index | unique-constraint-unhandled | `ViewPermission.resetViewPermissions:137` | +| **R** | 🔴 FAILED | `removeCustomView` emptiness check passed; concurrent grant committed `AccountAccess`; view deleted → 1 orphaned `AccountAccess` row pointing at non-existent view | check-then-act | `MapperViews.removeCustomView:502` | + +**Impact**: N/O cause 500 errors during concurrent view provisioning; R leaves orphaned permission rows that reference deleted views. + +--- + +### In-Memory Counter — `ConcurrentProviderRaceTest` + +| ID | Result | Description | Hazard Shape | Source Location | +|---|---|---|---|---| +| **AA** | 🟢 PASSED\* | 8 concurrent `incrementFutureCounter` calls: all increments landed in this run | counter-sequence | `APIUtil.incrementFutureCounter:4853` | + +\* AA uses `ConcurrentHashMap.getOrDefault + put` which is not atomic. The hazard is real but +timing-sensitive — the race window is narrow and may not trigger in every run. The source-level +audit confirms the structural hazard. + +--- + +### Connection-Pool Safeguards — `ConcurrentConnectionMechanismTest` + +| ID | Result | Description | |---|---|---| -| M | `getOrCreateSystemView` duplicate | Same `saveMe`-without-`tryo` root cause as **N/O** (unique-constraint-unhandled). System views are pinned to a global whitelist by `ViewDefinition.beforeSave`/`isValidSystemViewId`, so an isolated test would have to delete a globally-shared system view and pollute other suites (forkMode=once). **N** exercises the identical unguarded path on an isolated custom view. | -| P | `factoryResetSystemView` concurrent reset | Drives `ViewPermission.resetViewPermissions`'s insert path — the exact code **O** already pins. | -| migrateViewPermissions | duplicate ViewPermission insert | Same `ViewPermission` insert-without-`tryo` root cause as **O**. | -| Q | `revokeAccess` vs `grant` check-then-act | Same `AccountAccess` check-then-act family as **R**; the revoke-vs-grant window is narrow, so a non-deterministic barrier test would be flaky (false-green). The check-then-act class is already proven by **R** (orphan) and **J/U** (stale-save). | -| T | `createTransactionRequestBulk` per-leg balance | The verdict's "deterministic intra-request self-race" is **unconfirmed**: `saveTransaction` writes `fromAccount.asInstanceOf[MappedBankAccount].accountBalance(newBalance)` back onto the passed object, so sequential legs see the updated balance, not a stale one. Whether `BulkPaymentHandler.executeAllItems` re-resolves the account per leg needs further investigation before asserting. The concurrent-reuse mechanism it shares with **S** is already proven; writing a possibly-false test here was rejected. | -| V | Berlin Group `usesSoFarTodayCounter` lost-increment | Same counter lost-update class as **H/K**. Needs a fully-signed recurring Berlin Group consent + TPP headers to reach the increment branch — disproportionate setup for a class already proven. | -| X | Consumer rate-limit `underConsumerLimits` check-then-INCR (TOCTOU) | Real and high-impact (limit bypass), but the active-limit lookup is cached for ~1 hour (`RateLimitTest` documents this) and the Redis TTL+GET+INCR timing makes an HTTP-layer test unreliable (flaky). Confirmed by source audit; a flaky test would undermine "red bar = reliable evidence." | -| Y | `AuthRateLimiter` cold-start SET-vs-INCR collision | Same rate-limit class as **X**; depends on Redis timing and runs in shadow mode by default (non-blocking). Same flakiness concern. | -| Z | `MappedAgentProvider.updateAgentStatus` | Re-audited as **not a hazard**: it sets both fields and `saveMe()`s the whole row; an H2 single-row UPDATE is atomic, so concurrent calls are normal last-writer-wins PUT semantics, not field tearing or data loss. A genuine lost-update would need multiple partial-update endpoints doing read-modify-write on the same row — no such code path exists. | - -## Refuted by audit (genuinely safe — documents what is NOT broken) +| **G1** | 🟢 PASSED | 30 concurrent requests against a pool of 20: all 200, no deadlock, no timeout — HikariCP back-pressure works correctly | +| **G2** | 🟢 PASSED | 20 concurrent requests each see their own `user_id` — `RequestScopeConnection` per-request isolation is intact | -| Symbol | Why safe | -|---|---| -| `createAccountIfNotExisting` (`LocalMappedConnectorInternal.scala:283`) | The whole `find()`-then-`create()` is wrapped in `tryo`; the `UniqueIndex(bank, theAccountId)` violation on the second concurrent insert is caught and converted to `Empty`/`Failure`, not an uncaught 500. The caller handles `Empty` gracefully. This is the correct pattern that **I/L/M/N/O** are missing. | +--- -## The three-tier protection picture +## Three-Tier Protection Picture | Tier | DB constraint? | App guard? | Scenarios | |---|:---:|:---:|---| -| Silent data corruption | ✗ | ✗ | A, S, H, K, AA, J, U, C, D, R | -| Uncaught 500 / swallowed Failure | ✓ | ✗ | I, L, N, O, W | -| Gracefully handled | ✓ | ✓ (`tryo`) | F, `createAccountIfNotExisting` | -| Safeguard verified | — | ✓ | G1, G2 | - -The most dangerous tier is silent corruption: H and K turn a balance/counter lost-update into an -authentication **lockout / brute-force bypass**; J and U silently **resurrect a revoked consent** -(a PSD2 compliance breach). When any of these is fixed (atomic UPDATE, optimistic-lock version -column, unique constraint + conflict retry, or a conditional/guarded update), the corresponding -scenario flips from red to green automatically. +| **Silent data corruption** | ✗ | ✗ | A, S, H, K, AA, J, U, C, D, R | +| **Uncaught 500 / swallowed Failure** | ✓ | ✗ | I, L, N, O, W, F | +| **Gracefully handled** | ✓ | ✓ (`tryo`) | `createAccountIfNotExisting` (not broken) | +| **Safeguard verified** | — | ✓ | G1, G2 | + +The most dangerous tier is **silent corruption**: +- **H and K** turn a counter lost-update into an authentication **lockout bypass / brute-force bypass** +- **J and U** silently **resurrect a revoked consent** — a PSD2 compliance breach +- **A and S** produce phantom account balances — direct financial loss + +--- + +## Verified-Real Hazards Without Standalone Tests + +These were confirmed real by source audit but are deliberately not given standalone tests. + +| ID | Hazard | Reason not tested | +|---|---|---| +| M | `getOrCreateSystemView` duplicate | Same `saveMe`-without-`tryo` root cause as N/O; system views are pinned to a global whitelist via `ViewDefinition.beforeSave` — deleting one would pollute other suites. **N** exercises the identical path on an isolated key. | +| P | `factoryResetSystemView` concurrent reset | Drives `ViewPermission.resetViewPermissions` insert — the exact code **O** already pins. | +| migrateViewPermissions | duplicate `ViewPermission` insert | Same insert-without-`tryo` root cause as **O**. | +| Q | `revokeAccess` vs `grant` check-then-act | Same `AccountAccess` check-then-act family as **R**; the window is narrow → non-deterministic barrier test would be flaky (false-green). The class is proven by **R**. | +| T | `createTransactionRequestBulk` per-leg balance | Verdict: unconfirmed intra-request self-race. `saveTransaction` mutates the passed object's `accountBalance` field — sequential legs may see the updated value, not a stale one. Writing a possibly-false test was rejected. | +| V | Berlin Group `usesSoFarTodayCounter` lost-increment | Same counter lost-update class as H/K; requires fully-signed recurring BG consent + TPP headers — disproportionate setup for a class already proven. | +| X | Consumer rate-limit `underConsumerLimits` TOCTOU | Real and high-impact (limit bypass), but active-limit lookup is cached ~1 hour → HTTP-layer timing unreliable → would be flaky. | +| Y | `AuthRateLimiter` cold-start SET-vs-INCR collision | Same rate-limit class as X; runs in shadow mode by default. Same flakiness concern. | +| Z | `MappedAgentProvider.updateAgentStatus` | Re-audited as **not a hazard**: sets both fields and `saveMe()`s the whole row — normal last-writer-wins PUT semantics, not field tearing. | + +--- + +## Refuted by Audit (Genuinely Safe) + +| Symbol | Why safe | +|---|---| +| `createAccountIfNotExisting` (`LocalMappedConnectorInternal.scala:283`) | The whole `find()`-then-`create()` is wrapped in `tryo`; the `UniqueIndex(bank, theAccountId)` violation is caught and converted to `Empty`/`Failure`. The caller handles `Empty` gracefully. This is the correct pattern that I/L/N/O are missing. | + +--- + +## Fix Patterns + +When fixing a confirmed hazard, the corresponding test flips from red to green automatically. + +| Hazard shape | Recommended fix | +|---|---| +| **lost-update** (balance, counter, consent status) | Atomic `UPDATE … SET x = x + delta WHERE pk = ?` (raw SQL) or optimistic-lock version column with retry | +| **check-then-insert** (no unique index) | Add `UniqueIndex` on the natural key, then wrap the insert in `tryo` and re-fetch on `Failure` | +| **unique-constraint-unhandled** | Wrap the existing `.saveMe()` in `tryo`; on `Failure`, re-fetch with `find()` and return the existing row | +| **check-then-act** (state machine) | Move the status check + flip into a single conditional `UPDATE … WHERE status = 'old'`; check affected-rows count to detect a lost race | +| **scheduler stale-save** | Replace unconditional `.save()` with a conditional `UPDATE … WHERE status = 'expected_status'`; skip if 0 rows updated | diff --git a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md b/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md deleted file mode 100644 index b54cd1ad60..0000000000 --- a/obp-api/src/test/scala/code/concurrency/CONCURRENCY_TEST_SUMMARY.md +++ /dev/null @@ -1,214 +0,0 @@ -# OBP-API Concurrency Hazard Test Suite — Summary - -**Branch**: `feature/concurrency-hazard-tests` -**Commit**: `89e9753f9` -**Test run result**: 16 FAILED (hazards confirmed) · 3 PASSED (safeguards verified) · BUILD SUCCESS - ---- - -## Overview - -This suite was created to systematically surface every known database concurrency hazard in OBP-API. -The persistence layer uses Lift Mapper over HikariCP. There is no `SELECT FOR UPDATE`, no -optimistic-locking version column, and no transaction guard around multi-step read-modify-write -sequences. `.save()` / `.saveMe()` issues a blind `UPDATE`/`INSERT` by primary key and does not -catch JDBC constraint-violation exceptions. - -Each scenario **asserts the theoretically correct outcome**, so a hazard surfaces as a **FAILED -test** — a red bar (with its `expected vs actual` clue) is the evidence that the hazard is real. -When a path is fixed (atomic `UPDATE`, optimistic-lock version column, unique constraint + -conflict retry, conditional guarded update), the corresponding scenario flips from red to green -automatically. - ---- - -## How to Run - -```sh -# Run only concurrency tests -mvn -pl obp-commons,obp-api scalatest:test \ - -DtagsToInclude=code.concurrency.ConcurrencyRace \ - -DfailIfNoTests=false - -# Exclude from CI main flow -mvn -pl obp-commons,obp-api scalatest:test \ - -DtagsToExclude=code.concurrency.ConcurrencyRace -``` - -> **Requirement**: `hikari.maximumPoolSize=20` in test props. Several scenarios hold connections -> across a `CyclicBarrier`; a pool of 10 exhausts at 5 concurrent requests. - ---- - -## Test Files (8 classes · 19 scenarios · 1,277 lines) - -| File | Scenarios | Lines | -|---|---|---| -| `ConcurrentRaceSetup.scala` | base trait | 138 | -| `ConcurrentTransferRaceTest.scala` | A, B, S | 219 | -| `ConcurrentDuplicateCreationTest.scala` | C, D, F, I, L, W | 272 | -| `ConcurrentConnectionMechanismTest.scala` | G1, G2 | 86 | -| `ConcurrentSecurityRaceTest.scala` | H, K | 137 | -| `ConcurrentConsentRaceTest.scala` | J, U | 148 | -| `ConcurrentViewPermissionRaceTest.scala` | N, O, R | 205 | -| `ConcurrentProviderRaceTest.scala` | AA | 72 | - ---- - -## Hazard Taxonomy - -| Shape | Meaning | -|---|---| -| **lost-update** | Read a mutable field → mutate in memory → `.save()` the row; concurrent callers read the same start value and one overwrites the other | -| **check-then-act** | Read a status/flag → branch → side-effect → write new status; the check and the write are not atomic | -| **check-then-insert** | `find()`-then-`create()` with **no** unique index; concurrent callers all miss the find and all insert | -| **unique-constraint-unhandled** | `find()`-then-`create()` where a `UniqueIndex` backs the table but the JDBC violation is not caught → uncaught 500 or swallowed `Failure` | -| **counter-sequence** | Increment a counter by read-then-write → lost increments | - ---- - -## Scenario Results - -### Money Movement — `ConcurrentTransferRaceTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **A** | 🔴 FAILED | 10 concurrent SANDBOX_TAN transfers lost 9 balance updates (`actualDebited=100 expectedDebited=1000`) | lost-update | `LocalMappedConnectorInternal.scala:510` `saveTransaction` | -| **B** | 🔴 FAILED | 8 concurrent challenge answers executed the payment 8 times (`mappedTxnCount=8`, expected 1) | check-then-act | `Http4s400.answerChallengeNormal` | -| **S** | 🔴 FAILED | 8 concurrent `makeHistoricalPayment` calls lost 4 balance updates (`actualDebited=200 expectedDebited=600`) | lost-update | `LocalMappedConnector.saveHistoricalTransaction:2351` | - -**Impact**: Direct financial loss. A and S create phantom balances; B enables double-spend of a -single transaction request. - ---- - -### Duplicate Creation — `ConcurrentDuplicateCreationTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **C** | 🔴 FAILED | 8 concurrent entitlement grants created 8 rows (expected 1) | check-then-insert | `MappedEntitlementsProvider.addEntitlement` | -| **D** | 🔴 FAILED | 8 concurrent `getOrCreateAccountHolder` calls created 8 rows (expected 1) | check-then-insert | `MapperAccountHolders.getOrCreateAccountHolder` | -| **F** | 🔴 FAILED | 8 concurrent `getOrCreateMetadata` calls threw an exception (UniqueIndex present but unhandled) | unique-constraint-unhandled | `MappedCounterpartyMetadata.getOrCreateMetadata` | -| **I** | 🔴 FAILED | 2 concurrent first-time OAuth logins: one got uncaught JDBC `23505` constraint-violation (500 at HTTP layer) | unique-constraint-unhandled | `LiftUsers.getOrCreateUserByProviderId` | -| **L** | 🔴 FAILED | 8 concurrent `getOCreateUserCustomerLink` calls: second concurrent insert threw uncaught JDBC exception | unique-constraint-unhandled | `MappedUserCustomerLinkProvider.getOCreateUserCustomerLink` | -| **W** | 🔴 FAILED | 2 concurrent `getOrCreateConsumer` calls: second insert swallowed into `Failure` box by `tryo` — caller receives no usable consumer | unique-constraint-unhandled | `OAuth.getOrCreateConsumer:535` | - -**Impact**: C/D silently bloat entitlement and account-holder tables; I/L cause 500 for one of two -simultaneous new users; W silently breaks OAuth2 authentication for one caller. - ---- - -### Security — `ConcurrentSecurityRaceTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **H** | 🔴 FAILED | 8 concurrent bad-login increments: only 1 landed (`finalCounter=1`, expected 8) — account lockout can be bypassed | lost-update | `LoginAttempt.incrementBadLoginAttempts` | -| **K** | 🔴 FAILED | 8 concurrent wrong challenge answers: only 1 attempt counted (`finalCounter=1`, expected 8) — brute-force lockout can be bypassed | lost-update | `MappedChallengeProvider.validateChallenge:78` | - -**Impact**: Critical. An attacker can saturate the challenge-answer endpoint with concurrent -requests, consuming only 1 of the permitted attempts per burst — effectively bypassing both -account-lockout and transaction-challenge brute-force protection. - ---- - -### Consent Scheduling — `ConcurrentConsentRaceTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **J** | 🔴 FAILED | Scheduler stale-save resurrected a revoked consent (`afterRevoke=terminatedByTpp finalStatus=expired`) | lost-update | `ConsentScheduler.expiredBerlinGroupConsents:117` | -| **U** | 🔴 FAILED | Unfinished-consent scheduler task overwrote a concurrent HTTP status change (`afterChange=REVOKED finalStatus=rejected`) | lost-update | `ConsentScheduler.unfinishedBerlinGroupConsents:77` | - -**Impact**: PSD2 compliance breach. A consent the user or TPP explicitly revoked can be silently -resurrected as `expired` by a background scheduler task that holds a stale in-memory copy. - ---- - -### View Permissions — `ConcurrentViewPermissionRaceTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **N** | 🔴 FAILED | 2 concurrent `getOrCreateCustomPublicView` calls: second insert threw JDBC constraint violation on `ViewDefinition` unique index | unique-constraint-unhandled | `MapperViews.createAndSaveDefaultPublicCustomView:1054` | -| **O** | 🔴 FAILED | 2 concurrent `resetViewPermissions` calls: second insert threw JDBC constraint violation on `ViewPermission` unique index | unique-constraint-unhandled | `ViewPermission.resetViewPermissions:137` | -| **R** | 🔴 FAILED | `removeCustomView` emptiness check passed; concurrent grant committed `AccountAccess`; view deleted → 1 orphaned `AccountAccess` row pointing at non-existent view | check-then-act | `MapperViews.removeCustomView:502` | - -**Impact**: N/O cause 500 errors during concurrent view provisioning (e.g. account onboarding); -R leaves orphaned permission rows that reference deleted views, potentially causing foreign-key -confusion or privilege-escalation edge cases. - ---- - -### In-Memory Counter — `ConcurrentProviderRaceTest` - -| ID | Result | Description | Hazard Shape | Source Location | -|---|---|---|---|---| -| **AA** | 🟢 PASSED\* | 8 concurrent `incrementFutureCounter` calls: all increments landed in this run | counter-sequence | `APIUtil.incrementFutureCounter:4853` | - -\* AA uses `ConcurrentHashMap.getOrDefault + put` which is not atomic. The hazard is real but -timing-sensitive — with low thread count and fast CHM operations the race window is narrow and -may not trigger in every run. The source-level audit confirms the structural hazard. - ---- - -### Connection-Pool Safeguards — `ConcurrentConnectionMechanismTest` - -| ID | Result | Description | -|---|---|---| -| **G1** | 🟢 PASSED | 30 concurrent requests against a pool of 20: all 200, no deadlock, no timeout — HikariCP back-pressure works correctly | -| **G2** | 🟢 PASSED | 20 concurrent requests each see their own `user_id` — `RequestScopeConnection` per-request isolation is intact | - ---- - -## Three-Tier Protection Picture - -| Tier | DB constraint? | App guard? | Scenarios | -|---|:---:|:---:|---| -| **Silent data corruption** | ✗ | ✗ | A, S, H, K, AA, J, U, C, D, R | -| **Uncaught 500 / swallowed Failure** | ✓ | ✗ | I, L, N, O, W, F | -| **Gracefully handled** | ✓ | ✓ (`tryo`) | `createAccountIfNotExisting` (not broken) | -| **Safeguard verified** | — | ✓ | G1, G2 | - -The most dangerous tier is **silent corruption**: -- **H and K** turn a counter lost-update into an authentication **lockout bypass / brute-force bypass** -- **J and U** silently **resurrect a revoked consent** — a PSD2 compliance breach -- **A and S** produce phantom account balances — direct financial loss - ---- - -## Verified-Real Hazards Without Standalone Tests - -These were confirmed real by source audit but are deliberately not given standalone tests (the -reason is noted to make coverage gaps explicit, not silent). - -| ID | Hazard | Reason not tested | -|---|---|---| -| M | `getOrCreateSystemView` duplicate | Same `saveMe`-without-`tryo` root cause as N/O; system views are pinned to a global whitelist via `ViewDefinition.beforeSave` — deleting one would pollute other suites. **N** exercises the identical path on an isolated key. | -| P | `factoryResetSystemView` concurrent reset | Drives `ViewPermission.resetViewPermissions` insert — the exact code **O** already pins. | -| migrateViewPermissions | duplicate `ViewPermission` insert | Same insert-without-`tryo` root cause as **O**. | -| Q | `revokeAccess` vs `grant` check-then-act | Same `AccountAccess` check-then-act family as **R**; the window is narrow → non-deterministic barrier test would be flaky (false-green). The class is proven by **R**. | -| T | `createTransactionRequestBulk` per-leg balance | Verdict: unconfirmed intra-request self-race. `saveTransaction` mutates the passed object's `accountBalance` field — sequential legs may see the updated value, not a stale one. Writing a possibly-false test was rejected. | -| V | Berlin Group `usesSoFarTodayCounter` lost-increment | Same counter lost-update class as H/K; requires fully-signed recurring BG consent + TPP headers — disproportionate setup for a class already proven. | -| X | Consumer rate-limit `underConsumerLimits` TOCTOU | Real and high-impact (limit bypass), but active-limit lookup is cached ~1 hour → HTTP-layer timing unreliable → would be flaky. | -| Y | `AuthRateLimiter` cold-start SET-vs-INCR collision | Same rate-limit class as X; runs in shadow mode by default. Same flakiness concern. | -| Z | `MappedAgentProvider.updateAgentStatus` | Re-audited as **not a hazard**: sets both fields and `saveMe()`s the whole row — normal last-writer-wins PUT semantics, not field tearing. | - ---- - -## Refuted by Audit (Genuinely Safe) - -| Symbol | Why safe | -|---|---| -| `createAccountIfNotExisting` (`LocalMappedConnectorInternal.scala:283`) | The whole `find()`-then-`create()` is wrapped in `tryo`; the `UniqueIndex(bank, theAccountId)` violation is caught and converted to `Empty`/`Failure`. The caller handles `Empty` gracefully. This is the correct pattern that I/L/N/O are missing. | - ---- - -## Fix Patterns - -When fixing a confirmed hazard, the corresponding test flips from red to green automatically. - -| Hazard shape | Recommended fix | -|---|---| -| **lost-update** (balance, counter, consent status) | Atomic `UPDATE … SET x = x + delta WHERE pk = ?` (raw SQL) or optimistic-lock version column with retry | -| **check-then-insert** (no unique index) | Add `UniqueIndex` on the natural key, then wrap the insert in `tryo` and re-fetch on `Failure` | -| **unique-constraint-unhandled** | Wrap the existing `.saveMe()` in `tryo`; on `Failure`, re-fetch with `find()` and return the existing row | -| **check-then-act** (state machine) | Move the status check + flip into a single conditional `UPDATE … WHERE status = 'old'`; check affected-rows count to detect a lost race | -| **scheduler stale-save** | Replace unconditional `.save()` with a conditional `UPDATE … WHERE status = 'expected_status'`; skip if 0 rows updated | From 5fdc6b1547de63eae21b6bccb481d59ce404dec5 Mon Sep 17 00:00:00 2001 From: hongwei Date: Thu, 11 Jun 2026 17:25:02 +0200 Subject: [PATCH 4/4] docs: add ScalaTest simulation plan for concurrent race condition hazards --- ...2026-06-11-10-22-obp-api-api-tidy-robin.md | 141 ++++++++++++++++++ 1 file changed, 141 insertions(+) create mode 100644 obp-api/src/test/scala/code/concurrency/2026-06-11-10-22-obp-api-api-tidy-robin.md diff --git a/obp-api/src/test/scala/code/concurrency/2026-06-11-10-22-obp-api-api-tidy-robin.md b/obp-api/src/test/scala/code/concurrency/2026-06-11-10-22-obp-api-api-tidy-robin.md new file mode 100644 index 0000000000..5b7dc8b3ad --- /dev/null +++ b/obp-api/src/test/scala/code/concurrency/2026-06-11-10-22-obp-api-api-tidy-robin.md @@ -0,0 +1,141 @@ +# OBP-API Concurrency Race Hazards — ScalaTest Simulation Plan + +> Date: 2026-06-11 · Branch: develop-obp · Testing preparation only (no business code fixes) + +## Context (Why are we doing this?) + +The API layer of OBP-API has been fully migrated to http4s, adopting the "1 HTTP request = 1 DB transaction" model (`RequestScopeConnection.withBusinessDBTransaction`, fully effective only for v7 native POST/PUT/DELETE; commit happens in the `guaranteeCase` after the response is generated). The underlying data access layer mixes Lift Mapper (for business writes), Doobie (for some queries), and a shared HikariCP connection pool (`autoCommit=false`). + +Through investigation by 3 Explore agents + line-by-line verification, it is confirmed that this mechanism **relies entirely on optimistic concurrency + DB constraints**. There are **no** `SELECT ... FOR UPDATE`, pessimistic locks, or optimistic lock version columns anywhere in the code, nor is there any explicitly set transaction isolation level (using the DB defaults, which is READ COMMITTED for both H2/Postgres). Therefore, multiple classic concurrent write hazards exist: +**Lost updates on balances, double-spending on transaction request state machines, and check-then-insert duplicate creations.** + +The goal of this plan is to **turn these theoretical hazards into runnable, observable evidence using a set of concurrent ScalaTests**. +As per the user's decision: **we will comprehensively cover 6-8 race conditions** and **assert the "theoretically correct behavior"** (if the current code has hazards, it will FAIL and print "expected vs actual"; the red light is the evidence). Fixing these issues (adding locks/unique constraints/optimistic locks) will be a separate follow-up task; business code will remain untouched for now. + +## Verified Hazard Checklist (Code-level Evidence) + +| # | Hazard | Code Location (Verified) | DB Protection | Exposure Value | +|---|---|---|---|---| +| A | **Balance lost updates** | `LocalMappedConnectorInternal.scala:510-513` `saveTransaction`: `read balance → +amount → .save()` full row write | **None** | Highest (Funds) | +| B | **Transaction Request double-spending** | `Http4s210.scala:502-504` check `status=="INITIATED"` → pay → update status, no atomicity | **None** | Highest (Funds) | +| C | **Entitlement duplicate grant** | `MappedEntitlements.scala:159-176` no find-first; `:264 dbIndexes=UniqueIndex(mEntitlementId)` UUID only | **None** | High (Permissions) | +| D | **Account holder duplicate creation** | `MapperAccountHolders.scala` getOrCreate find-then-create; `:34 Index(...)` **Non-unique** | **None** | High | +| E | **Consent state machine race** | `MappedConsent.scala:33-40` `updateConsentStatus` calls `saveMe` directly, no current status validation | Partial | Medium | +| F | **Counterparty metadata concurrency** | `MapperCounterparties.scala:71-88` check-then-insert; **BUT `:443 UniqueIndex(counterpartyId)` exists** | **Yes** | Medium (Test elegant conflict handling) | +| G | **Pool exhaustion / Cross-request crosstalk** | `RequestScopeConnection.scala:113-116` `childValue=null` prevents crosstalk; pool default is 20 | — | Medium (Mechanism layer) | + +> **Correcting an agent false positive**: Explore agent 2 reported that counterparty metadata had "no UNIQUE constraint". In reality, `MapperCounterparties.scala:443` has a `UniqueIndex(counterpartyId)`. Thus, for point F, the DB will block the second insert, and the race condition will manifest as "the second request gets a constraint conflict" instead of silent duplication → The test is modified to verify whether the application gracefully handles the conflict (without throwing a 500 error). + +## Testing Architecture + +### 1. Infrastructure (base trait + helpers) +Create a new `code.concurrency` package. The core trait reuses existing testing mechanisms: + +```scala +package code.concurrency + +import code.setup.{DefaultUsers, ServerSetupWithTestData, APIResponse} +import scala.concurrent.{Await, Future} +import scala.concurrent.duration._ +import org.scalatest.Tag + +// Exclusive tag: these tests are expected to FAIL (exposing hazards), must be isolated from the CI main flow +object ConcurrencyRace extends Tag("code.concurrency.ConcurrencyRace") + +trait ConcurrentRaceSetup extends ServerSetupWithTestData with DefaultUsers { + // Fire n requests concurrently and wait for all (using Future[APIResponse] returned by makePostRequestAsync) + def fireConcurrently(n: Int)(mk: Int => Future[APIResponse]): List[APIResponse] = + Await.result(Future.sequence((0 until n).map(mk)), 60.seconds).toList + + // Read Mapper directly to assert true DB state (bypassing cache/HTTP) + def accountBalance(bankId: String, accountId: String): Long = + MappedBankAccount.find(By(MappedBankAccount.bank, bankId), + By(MappedBankAccount.theAccountId, accountId)).map(_.accountBalance.get) + .openOrThrowException("account not found") +} +``` + +Reusing off-the-shelf assets (**Do not reinvent the wheel**): +- Requests: `SendServerRequests.makePostRequestAsync / makeGetRequestAsync` (`:180/:240`, returns `Future[APIResponse]`) +- Authentication: `user1..user4` from `DefaultUsers` (OAuth) + `<@` signatures; or `DirectLogin` header (as used in `Http4s700RoutesTest`) +- Data Setup: `ServerSetupWithTestData.beforeEach` already creates bank/account (initial balance 900000000); + For transfers, refer to `v4_0_0/TransactionRequestsTest.scala` (SANDBOX_TAN); + For permissions, refer to `v2_0_0/EntitlementTests.scala`; For state machines, refer to `v2_1_0/TransactionRequestsTest.scala` +- Direct Data Creation: Provider layer `CustomerX.customerProvider.vend.addCustomer(...)`, etc. (idiomatic for v7 tests) + +### 2. Test Suite Grouping (6-8 scenarios, divided into 3 tiers) + +**Tier 1 — HTTP concurrency, fund/data integrity (Highest Value)** +`ConcurrentTransferRaceTest` (tag `ConcurrencyRace`) +- **A Balance lost updates**: Concurrent N=10 SANDBOX_TAN transfers of `amount` out of the same account, + Assert `accountBalance == initial - N*amount` (when an update is lost, actual will be > expected, less deducted). +- **B State machine double-spending**: Create 1 TR (INITIATED) → Concurrent N answer-challenge on the same `TR_ID`, + Assert "only 1 succeeds + only deducted once + only 1 MappedTransaction entry" (double-spending will cause multiple entries/deductions). + +**Tier 2 — Duplicate creations / State machines** +`ConcurrentDuplicateCreationTest` (tag `ConcurrencyRace`) +- **C Entitlement**: Concurrent N identical `(userId, bankId, roleName)` grants, + Assert `MappedEntitlement.count(By...)==1` (actual might be N). +- **D Account holder**: Concurrently trigger holder creation for the same `(user, bank, account)` (via grant account access), + Assert `MapperAccountHolders.count==1`. +- **E Consent state machine**: Concurrent N answers to the same consent, assert state transition is valid + side-effect executes only once. +- **F Counterparty metadata (corrected)**: Concurrent initial access to the same counterparty, + Assert "doesn't throw 500, ends up with exactly 1 record" — verifying that the DB `UniqueIndex(counterpartyId)` conflict is handled gracefully by the application layer. + +**Tier 3 — Underlying transaction/connection mechanism (Mechanism layer, optional)** +`ConcurrentConnectionMechanismTest` (tag `ConcurrencyRace`) +- **G1 Connection pool queuing without deadlock**: Inside `beforeEach`, `setPropsValues("hikari.maximumPoolSize"->"3")`, + Concurrent N=5 requests, assert all complete without a 30s timeout (verifying queuing rather than deadlocking); auto-restored in `afterEach`. +- **G2 Cross-request connection crosstalk**: Send two batches of requests sequentially, assert that the second batch reads the data it wrote itself, + and does not read 0 rows due to `currentProxy` crosstalk (regression testing for the `RequestScopeConnection.childValue=null` safeguard). + +## Key Engineering Constraints (Must be handled within tests) + +1. **H2 in-memory limitations (Most important, label honestly)**: The test DB is H2 (`test.default.props`). + - Balance lost updates/duplicate creations are **application-layer** read-modify-write / check-then-insert issues, **independent of DB isolation levels**, + as long as both requests' "reads" happen before the other's "write commits", they can be reproduced → H2 **can** reproduce them. + - However, H2's table-level locks might serialize some writes, lowering the reproduction probability. Countermeasures: ① Concurrency N≥8; ② If necessary, when instrumentation on the request path is impossible, increase N or repeat rounds; ③ **Print actual observed values when assertions fail** ("expected balance X, actual Y, lost Z transfers"), the red light is the evidence. + - Honest conclusion phrasing: Reproduced in H2 → Postgres will definitely have it (maybe even worse); Not reproduced in H2 ≠ Postgres is safe. + +2. **dispatch HttpClient pool pollution**: Concurrent sharing of `Http.default` sporadically causes `"invalid version format"`, + A retry-once fallback is already in place (`SendServerRequests.scala:154`). → Keep concurrency N around 5-10, tolerate sporadic retries. + +3. **Shared server/DB/pool (`forkMode=once`)**: All suites share a single H2 + Hikari pool. + → Isolate using dedicated prefixes for bank/account/user; cleanup using `wipeTestData()` in `afterEach`; Changing pool size for G1 must use `setPropsValues` (`PropsReset` automatically restores it in `afterEach`, preventing leaks to other suites). + +4. **Red light isolation (Assertion stance = necessary companion to exposing hazards)**: These tests are **expected to FAIL**. + - Tag all with `ConcurrencyRace`. + - Running manually locally: `-n code.concurrency.ConcurrencyRace` (only run these diagnostic tests). + - **Exclude** from CI main flow: The catch-all shard will automatically pick up the `code.concurrency` package → You must add `-l code.concurrency.ConcurrencyRace` (ScalaTest exclude tag) to the CI scalatest invocation, otherwise the catch-all shard will go red. + (This CI change is listed as a follow-up; we will deliver the tests themselves first.) + +5. **request-scoped transaction scope**: Full transactions apply only to v7 native POST/PUT/DELETE; v1-v6 routed through the bridge are committed independently per `DB.use` (the race window is more obvious). The race condition happens at the connector layer, independent of the API version → Just pick any practically available endpoint. + +## File Checklist + +Added (tests only, no business code touched): +- `obp-api/src/test/scala/code/concurrency/ConcurrentRaceSetup.scala` — base trait + `ConcurrencyRace` tag + helpers +- `obp-api/src/test/scala/code/concurrency/ConcurrentTransferRaceTest.scala` — A Balance / B State machine double-spending +- `obp-api/src/test/scala/code/concurrency/ConcurrentDuplicateCreationTest.scala` — C/D/E/F Duplicate creation and state machines +- `obp-api/src/test/scala/code/concurrency/ConcurrentConnectionMechanismTest.scala` — G Connection pool/crosstalk (optional tier) + +Package `code.concurrency` → Falls into the CI catch-all shard (requires exclude tag configuration, see Constraint 4). + +## How to Verify (How to run) + +1. Compile and run this batch of diagnostic tests independently (local, JDK 11): + ```sh + env JAVA_HOME=$JDK11 PATH=$JDK11/bin:$PATH \ + mvn -q -pl obp-commons,obp-api process-resources scalatest:test \ + -DfailIfNoTests=false \ + -Dsuites="code.concurrency.ConcurrentTransferRaceTest code.concurrency.ConcurrentDuplicateCreationTest code.concurrency.ConcurrentConnectionMechanismTest" + ``` +2. Read the output: The failure message for each race scenario should clearly print "expected vs actual" (e.g., "expected balance 899999...000, got 899999...XXX, lost N transfers"), the red light acts as evidence of the hazard. +3. If a scenario sporadically goes green on H2 (masked by table lock serialization), increase the concurrency N / repetition rounds, and annotate the H2 limitation in the test comments. +4. Confirm no pollution to other suites: `afterEach` cleanup + `PropsReset` pool size restoration. + +## Next Steps (Not doing now, just recording the direction) + +Fix directions (separate PR): A/B → `SELECT ... FOR UPDATE` or optimistic lock version column + state machine atomic CAS (INITIATED→PROCESSING); +C/D → Add `UniqueIndex` + application layer catches constraint conflict and returns gracefully; F → Application layer catches existing unique conflict. +The red-light tests produced by this plan will serve as the regression baseline for these fixes.