Skip to content

Enforce exhaustive enum handling in AML decision logic #3640

@TaprootFreak

Description

@TaprootFreak

Summary

Refactor the AML decision pipeline so that every value of decision-driving enums (RiskStatus, UserDataStatus, KycStatus, PhoneCallStatus, …) is forced — at compile time — to map to an explicit AML outcome. Today the logic is a long if-chain over getter properties (isBlocked, isRiskBlocked, isSuspicious, …), which makes it possible to add a new enum value without any handler. PR #3638 fixed one concrete instance of this class of bug, but the architectural gap that allowed it remains.

This issue proposes a layered fix: a Record<Enum, AmlError | null> pattern enforced by tsc --noEmit (already in CI as npm run type-check), plus a Kuma-based production invariant monitor as a universal backstop.

Background — the bug that triggered this

PR #2389 (DEV-4326 userData RiskStatus, merged 2025-08-27) introduced a SQL-level filter in the buy-crypto and buy-fiat AML preparation services:

transaction: { userData: { riskStatus: Not(RiskStatus.SUSPICIOUS) } },

The intent — keep suspicious users out of the auto-approval path — was correct. The implementation was not: there was no corresponding rule in `AmlHelperService.getAmlErrors` that would assign `SUSPICIOUS` users an AML outcome. Result: any `buy_crypto` / `buy_fiat` whose user is flagged `Suspicious` between deposit arrival and the AML cron tick gets silently filtered out, ends up with `amlCheck = NULL` and `status = Created`, and stays there indefinitely.

Concrete production incident: `buy_crypto 120973` (35'000 CHF, userDataId 383865, transaction 312001, bank_tx 197696), stuck since 2026-04-28 07:29 UTC. It was the only entry in the entire DB with `amlCheck IS NULL AND status = 'Created'`. The bug had been latent in develop for ~8 months before it hit a real customer.

The 14-day expired-pending reaper in `aml-helper.service.ts:642` does not cover this case — it only acts on entries already in `PENDING`, never on `null`.

PR #3638 fixed the specific symptom by adding a `USER_DATA_SUSPICIOUS` rule that mirrors the existing `USER_DATA_BLOCKED` handling. But the same anti-pattern (filter at query level + missing handler in the rule chain) can occur in other places and for other enums. This issue addresses the architectural cause.

Why current testing did not catch it

  • No test exists for `doAmlCheck` that exercises each `RiskStatus` value.
  • The only AML test (`aml-helper.service.spec.ts`) covers a single feature flag (`isTrustedReferrer`).
  • TypeScript already enforces exhaustiveness for `AmlError → AmlErrorResult` via the mapped type `{ [b in AmlError]: ... }` — but no equivalent enforcement exists for `RiskStatus → AmlError` (or any other input enum). Adding a new enum value compiles cleanly even if no handler is defined.
  • The bug class is invisible to unit tests because the failure is ambient (entry never updated) rather than an exception.

Proposed solution — layered

Tier 1 — Record-map + compile-time exhaustiveness for AML input enums (this issue)

Replace the current ad-hoc if-chain in `AmlHelperService.getAmlErrors` (`aml-helper.service.ts:78–82` and follow-ups at lines 221, 391) with explicit `Record<Enum, AmlError | null>` maps:

```ts
// src/subdomains/core/aml/enums/aml-policy.ts (new file)
import { AmlError } from './aml-error.enum';
import { RiskStatus, UserDataStatus, KycStatus } from '...';

export const RiskStatusAmlError: Record<RiskStatus, AmlError | null> = {
[RiskStatus.NA]: null,
[RiskStatus.RELEASED]: null,
[RiskStatus.BLOCKED]: AmlError.USER_DATA_BLOCKED,
[RiskStatus.SUSPICIOUS]: AmlError.USER_DATA_SUSPICIOUS,
[RiskStatus.BLOCKED_BUY_CRYPTO]: AmlError.USER_DATA_BLOCKED, // overridden per-flow, see below
[RiskStatus.BLOCKED_BUY_FIAT]: AmlError.USER_DATA_BLOCKED,
};

export const UserDataStatusAmlError: Record<UserDataStatus, AmlError | null> = {
[UserDataStatus.NA]: null,
[UserDataStatus.ACTIVE]: null,
[UserDataStatus.BLOCKED]: AmlError.USER_DATA_BLOCKED,
[UserDataStatus.DEACTIVATED]: AmlError.USER_DATA_DEACTIVATED,
// ...every value must be present, or tsc fails
};
```

`getAmlErrors` then becomes:

```ts
const riskError = RiskStatusAmlError[entity.userData.riskStatus];
if (riskError) errors.push(riskError);

const statusError = UserDataStatusAmlError[entity.userData.status];
if (statusError) errors.push(statusError);
```

The `Record<Enum, X>` type forces every enum member to appear in the map — a missing key is a `tsc` error. The existing `AmlErrorResult` mapping (`aml-error.enum.ts:91`) already proves the pattern works in this codebase and is idiomatic.

Flow-specific variants (`isRiskBuyCryptoBlocked`, `isRiskBuyFiatBlocked` at lines 221/391) require either two separate maps (`RiskStatusBuyCryptoAmlError`, `RiskStatusBuyFiatAmlError`) or a single `Record<RiskStatus, { generic, buyCrypto, buyFiat }>` map. Recommend the second — keeps all riskStatus knowledge in one place.

Tier 2 — Apply the same pattern to other AML input enums

Once the pattern is in place for `RiskStatus`, extend to:

  • `UserDataStatus` (currently `isBlocked`, `isDeactivated`, `isPaymentStatusEnabled` getters in `user-data.entity.ts`)
  • `KycStatus` (currently `isPaymentKycStatusEnabled` getter)
  • `PhoneCallStatus` (currently nested ternaries at `aml-helper.service.ts:70–76` and `120–126` — already hard to read, easy to break on enum change)
  • `AccountType` if it gates AML decisions

Each migration follows the same shape: extract an `AmlError | null` Record, replace the matching getter call in `getAmlErrors`, delete the now-unused getter if it has no other callers.

Tier 3 — Production invariant monitor (parallel, low cost, universal backstop)

Add a Kuma push monitor (or a scheduled API job) that fires when:

```sql
SELECT COUNT(*) FROM buy_crypto
WHERE amlCheck IS NULL AND status = 'Created' AND created < DATEADD(hour, -2, GETDATE());
-- and identical for buy_fiat
```

returns `> 0`. This is independent of the type system — it catches any future filter-without-handler bug, in any pipeline, in any service, even ones the type-level work above does not cover. ~30 min of setup, recurring zero cost.

What we explicitly do not propose

  • Mutation testing (Stryker): too slow for per-PR CI, requires substantially more existing test coverage to be useful.
  • Property-based testing (fast-check): large initial investment in fixture infrastructure; premature given current AML test coverage.
  • Pure parametric runtime tests as the only defense: the repo's AML test setup is sparse and fragile fixtures would push the bug detection back to test-quality questions rather than guarantees.

Implementation plan

  1. Create `src/subdomains/core/aml/enums/aml-policy.ts` with `RiskStatusAmlError` (covering generic + per-flow variants).
  2. Refactor `AmlHelperService.getAmlErrors` (`aml-helper.service.ts:78–82`) to consume the map. Drop the if-chain for risk-status and the duplicated checks at lines 221, 391.
  3. Verify all callers of `UserData.isSuspicious`, `isRiskBlocked`, `isRiskBuyCryptoBlocked`, `isRiskBuyFiatBlocked` getters and decide whether to keep the getters (used by Auth guard `user-active.guard.ts`) or inline. Default: keep the getters, they read well at call sites that are not the AML rule chain.
  4. Repeat for `UserDataStatus` and `KycStatus` (separate commits, easier review).
  5. Add the Kuma monitor for stuck `amlCheck IS NULL` rows. Coordinate with infra — see status.dfx.swiss / the Kuma instance referenced internally.
  6. Update `CONTRIBUTING.md` — add a short section under "AML / decision logic" stating: "All decision-driving enums must dispatch via a `Record<Enum, …>` map; never via SQL-level filtering of enum values."

Acceptance criteria

  • `RiskStatusAmlError` (or equivalent) exists; adding a hypothetical new `RiskStatus` value fails `npm run type-check` until handled.
  • `AmlHelperService.getAmlErrors` no longer contains the SQL-level enum filter pattern (`Not(RiskStatus.X)` etc.) for AML input dispatch.
  • At least one decision-driving enum besides `RiskStatus` (`UserDataStatus` recommended first) migrated to the same pattern.
  • Kuma alarm exists for `amlCheck IS NULL AND status = 'Created' AND created < now()-2h` on `buy_crypto` and `buy_fiat`. Triggering it in DEV produces an alert.
  • CONTRIBUTING.md documents the dispatch convention.
  • No regression: `buy_crypto 120973` and analogous cases continue to receive a non-null `amlCheck` after deploy.

References

  • Bug fix PR: fix: route AML check for suspicious users to GSheet review #3638 (route AML check for suspicious users to GSheet review)
  • Original regression: PR [DEV-4326] userData RiskStatus #2389 (DEV-4326 userData RiskStatus), merged 2025-08-27
  • Affected files (current locations):
    • `src/subdomains/core/aml/services/aml-helper.service.ts` (rules at lines 78–82, 221, 391)
    • `src/subdomains/core/aml/enums/aml-error.enum.ts` (existing `AmlErrorResult` mapped type — pattern reference)
    • `src/subdomains/core/aml/enums/aml-reason.enum.ts`
    • `src/subdomains/core/buy-crypto/process/services/buy-crypto-preparation.service.ts`
    • `src/subdomains/core/sell-crypto/process/services/buy-fiat-preparation.service.ts`
    • `src/subdomains/generic/user/models/user-data/user-data.entity.ts` (getters used today)
    • `src/shared/auth/user-active.guard.ts` (other consumer of riskStatus getters)
  • Production incident: buy_crypto 120973 / userDataId 383865 / bank_tx 197696

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions