Dev/jodavis/initial commits for new ml pipeline by jodavis · Pull Request #239 · jodavis/AdaptiveRemote

jodavis · 2026-06-28T14:06:29Z

Commit changes up to ADR-223 to the new branch. No changes are needed for these tasks.

…be used for reference only.

…core/

- sample.py: Sample(ABC), SampleWithPath(Sample,ABC), TextSample, AudioSample, SampleSpectrogram, SampleTokens dataclasses with all spec fields.
- manifest.py: Manifest[S] generic with O(1) by_id/by_content_hash lookups; ManifestStore.read() dispatches on sample_type; write() emits version-1 JSON. TextSample JSON omits path/parent_content_hash/applied_values; Spectrogram and Tokens JSON omits applied_values and includes parent_id per spec.
- pyproject.toml: minimal pytest config so plain `pytest` from ml/ works.
- Ambiguity resolutions: mutable dataclasses (no frozen=True); seed not a default field (callers pass seed=0 explicitly, matching spec's "PhraseVariator sets seed=0").

https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
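To illustrate the O(1) `by_id` / `by_content_hash` lookups described for manifest.py, here is a minimal sketch that keeps two dictionaries next to the sample list. The `Sample` stand-in, the private attribute names, and returning `None` on a miss are assumptions for illustration only, not the code in this PR. Duplicate handling is deliberately left out.

```python
from dataclasses import dataclass
from typing import Generic, Iterable, Optional, TypeVar


@dataclass
class Sample:
    # Hypothetical minimal stand-in for the PR's Sample classes.
    id: str
    content_hash: str


S = TypeVar("S", bound=Sample)


class Manifest(Generic[S]):
    """Holds samples plus two dict indexes so lookups are constant time."""

    def __init__(self, samples: Iterable[S]) -> None:
        self.samples: list[S] = list(samples)
        # One pass builds both indexes; this sketch assumes unique keys.
        self._by_id: dict[str, S] = {}
        self._by_hash: dict[str, S] = {}
        for sample in self.samples:
            self._by_id[sample.id] = sample
            self._by_hash[sample.content_hash] = sample

    def by_id(self, sample_id: str) -> Optional[S]:
        # Assumption: a miss returns None rather than raising.
        return self._by_id.get(sample_id)

    def by_content_hash(self, content_hash: str) -> Optional[S]:
        return self._by_hash.get(content_hash)
```

Building the indexes once in the constructor trades a little memory for lookups that do not scan the sample list.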

1. Manifest.__init__: raise ValueError on duplicate sample ids (UUIDs must be unique); keep first occurrence for duplicate content_hash (documented contract: deterministic stages can produce identical hashes).
2. ManifestStore.read(): use data.get("sample_type") and raise ValueError explicitly when the key is absent, matching the defensive pattern used for "version".
3. ManifestStore.write(): validate that all samples share the same type before writing; a mixed-type Manifest would produce a JSON file whose sample_type header mismatches later entries, causing read() to KeyError.

Adds four new tests covering each guard.

https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
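The three guards above could look roughly like the following. This is a sketch under assumptions: the helper names `index_samples`, `read_sample_type`, and `check_single_type` are hypothetical stand-ins for logic the PR places inside `Manifest.__init__`, `ManifestStore.read()`, and `ManifestStore.write()`, and the error messages are invented.

```python
from collections.abc import Sequence
from typing import Any


def index_samples(samples: Sequence[Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build id and content-hash indexes using the duplicate rules above."""
    by_id: dict[str, Any] = {}
    by_hash: dict[str, Any] = {}
    for sample in samples:
        if sample.id in by_id:
            raise ValueError(f"duplicate sample id: {sample.id}")
        by_id[sample.id] = sample
        # setdefault keeps the first sample seen for a repeated content hash.
        by_hash.setdefault(sample.content_hash, sample)
    return by_id, by_hash


def read_sample_type(data: dict[str, Any]) -> str:
    """Return the manifest's sample_type, failing loudly when it is absent."""
    sample_type = data.get("sample_type")
    if sample_type is None:
        raise ValueError("manifest is missing required key 'sample_type'")
    return sample_type


def check_single_type(samples: Sequence[Any]) -> None:
    """Reject a mixed-type sample list before anything is written."""
    kinds = {type(sample).__name__ for sample in samples}
    if len(kinds) > 1:
        raise ValueError(f"mixed sample types in manifest: {sorted(kinds)}")
```

Raising `ValueError` in all three places gives callers a single exception type to handle for a malformed manifest, instead of a `KeyError` surfacing later during a read.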

Tests use pytest's tmp_path fixture; tempfile was never called. https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT

TextSample now derives its id from content_hash via __post_init__, so callers no longer pass id= explicitly. This means two TextSamples with the same content always share the same id, enforcing the deduplication invariant at the type level. Update ManifestStore._deserialise to omit id= when constructing TextSample, and update tests to match: duplicate-id and duplicate-content-hash tests now use AudioSample (where id and content_hash are independent) to exercise those code paths. https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
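A minimal sketch of deriving `id` from `content_hash` in `__post_init__` follows. The field set is reduced to the essentials, the `text_hash` helper and its choice of SHA-256 over UTF-8 text are hypothetical, and using the hash itself as the id is an assumption; the PR only states that the id is derived from `content_hash`.

```python
import hashlib
from dataclasses import dataclass, field


def text_hash(text: str) -> str:
    # Hypothetical helper: the PR does not say how content_hash is computed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class TextSample:
    text: str
    content_hash: str
    # init=False removes id from __init__, so callers cannot pass id=.
    id: str = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: the id is the content hash itself.
        self.id = self.content_hash


a = TextSample(text="volume up", content_hash=text_hash("volume up"))
b = TextSample(text="volume up", content_hash=text_hash("volume up"))
same_id = a.id == b.id  # identical content yields an identical id
```

Because the dataclass is mutable (no `frozen=True`), plain attribute assignment in `__post_init__` is enough.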

* ADR-222: Implement seed-based randomisation engine (PassFilter, VariationGenerator)

  Adds ml/pipeline/core/randomization.py with:
  - PassFilter ABC (density, sample_domain)
  - MinMaxFilter: uniform rejection over [min, max]
  - NormalFilter: Gaussian density normalised to peak 1.0; rejects std_dev <= 0
  - VariationGenerator: should_vary, generate (float rejection-sampling), generate_int (bitmask rejection), choose (direct index-0 hash)

  All values derived from sha256 with deterministic key patterns; no random module. generate_int special-cases range==0 to return min_val immediately.

  46 unit tests covering determinism, exact values, boundary conditions, ValueError after 1000 attempts, probability convergence, and stability across MinMaxFilter range widening/narrowing using concrete seed values.

* Address code review: add input validation and strengthen independence tests
* VariationGenerator.generate() uses a power-of-2 modulo algorithm to improve variable value stability as parameters change
* Add Python pytest to CI and validate-tests scripts
* Add error handling for cd ml in validate-tests scripts

---------

Co-authored-by: Joe Davis <ElwoodMoves@hotmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
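As an illustration of the seed-based idea, here is a sketch of hash-driven rejection sampling against a pass filter. It follows the float rejection-sampling form named in the first commit, not the later power-of-2 modulo revision, whose details are not given here. The key format (`seed:name:value:attempt`), the free function `generate`, the `unit_float` helper, and the four-sigma domain for `NormalFilter` are all assumptions, not the PR's actual key patterns or signatures.

```python
import hashlib
import math
from abc import ABC, abstractmethod


def unit_float(key: str) -> float:
    """Map a string key to a deterministic float in [0, 1) via SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exact and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class PassFilter(ABC):
    @abstractmethod
    def density(self, x: float) -> float:
        """Acceptance weight in [0, 1] for candidate x."""

    @abstractmethod
    def sample_domain(self) -> tuple[float, float]:
        """Interval that candidates are drawn from."""


class MinMaxFilter(PassFilter):
    def __init__(self, min_val: float, max_val: float) -> None:
        if min_val > max_val:
            raise ValueError("min_val must not exceed max_val")
        self.min_val, self.max_val = min_val, max_val

    def density(self, x: float) -> float:
        return 1.0 if self.min_val <= x <= self.max_val else 0.0

    def sample_domain(self) -> tuple[float, float]:
        return (self.min_val, self.max_val)


class NormalFilter(PassFilter):
    def __init__(self, mean: float, std_dev: float) -> None:
        if std_dev <= 0:
            raise ValueError("std_dev must be positive")
        self.mean, self.std_dev = mean, std_dev

    def density(self, x: float) -> float:
        # Gaussian shape without the normalising constant, so the peak is 1.0.
        return math.exp(-0.5 * ((x - self.mean) / self.std_dev) ** 2)

    def sample_domain(self) -> tuple[float, float]:
        # Assumption: candidates are drawn within four standard deviations.
        return (self.mean - 4 * self.std_dev, self.mean + 4 * self.std_dev)


def generate(seed: int, name: str, flt: PassFilter, max_attempts: int = 1000) -> float:
    """Draw a value for `name`; the same seed and name always give the same value."""
    lo, hi = flt.sample_domain()
    for attempt in range(max_attempts):
        candidate = lo + (hi - lo) * unit_float(f"{seed}:{name}:value:{attempt}")
        # Accept with probability equal to the filter's density at the candidate.
        if unit_float(f"{seed}:{name}:accept:{attempt}") < flt.density(candidate):
            return candidate
    raise ValueError(f"no value accepted for {name!r} after {max_attempts} attempts")
```

Since every draw is a pure function of the key string, no random-number state needs to be stored or restored to reproduce a value.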

* Use dev-team plugin from feature branch

* ADR-223: Implement ModifierStage abstract base class

  Adds ModifierStage[T_in, T_out] with the three-case transform() algorithm (skip / regen-with-stored-seed / new sample), _compute_content_hash() static helper, GC of orphaned output files, and manifest read/write integration.

  Also adds randomization.py (ADR-222 prerequisite) containing PassFilter, MinMaxFilter, NormalFilter, and VariationGenerator, required for modifier_stage.py to compile and for the regen path to construct generators from stored seeds.

  Key decisions:
  - ValueError from ManifestStore.read() propagates (hard error); no fallback to "no previous manifest" for unreadable/unsupported manifests.
  - output_dir creation is caller's responsibility; transform() does not call mkdir (consistent with DVC/entry-point owning directory setup per spec).
  - GC guards with output_dir.exists() so an empty first run does not raise.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ADR-223: address review feedback

  Bound T_in TypeVar to Sample so mypy can verify content_hash access at all four call sites in transform(); removes four type: ignore[arg-type] suppressions.

---------

Co-authored-by: Joe Davis <ElwoodMoves@hotmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
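The garbage-collection decision above (guard with `output_dir.exists()`, never create the directory) can be sketched as below. The function name `collect_garbage`, the `live_names` parameter, and matching orphans by file name are assumptions for illustration; the PR does not show how ModifierStage identifies orphaned files.

```python
from pathlib import Path


def collect_garbage(output_dir: Path, live_names: set[str]) -> list[Path]:
    """Delete files in output_dir that no current sample refers to."""
    if not output_dir.exists():
        # Empty first run: the caller never created the directory, so there
        # is nothing to clean and iterdir() must not be called.
        return []
    removed: list[Path] = []
    for path in sorted(output_dir.iterdir()):
        # Assumption: an orphan is any regular file whose name is not live.
        if path.is_file() and path.name not in live_names:
            path.unlink()
            removed.append(path)
    return removed
```

Returning the removed paths keeps the helper easy to test and lets a caller log what was deleted.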

github-actions · 2026-06-28T14:12:13Z

Test Results

401 tests ±0 401 ✅ ±0 2m 14s ⏱️ +9s
5 suites ±0 0 💤 ±0
5 files ±0 0 ❌ ±0

Results for commit 27c59fe. ± Comparison against base commit 8f1b8b7.

This pull request removes 3 and adds 2 tests. Note that renamed tests count towards both.

,False)
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesKeyNameAsync (Hello
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesValueAsync (Invalid

AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesKeyNameAsync (Hello
,False)
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesValueAsync (Invalid
,False)

ElwoodMoves and others added 9 commits June 28, 2026 07:04

Remove local Claude settings from repo

9237364

Seed old script files from a previous pipeline attempt. These should …

60f78a2

…be used for reference only.

Spec for ADR-191: Refactor DVC pipeline into proper OOP design patterns

1a4bd7d

ADR-221: remove unused import tempfile from test_manifest.py

a6e72a2

jodavis enabled auto-merge (rebase) June 28, 2026 14:08

jodavis merged commit c6638cc into feature/adr-191-new-ml-pipeline Jun 28, 2026
5 checks passed

jodavis deleted the dev/jodavis/initial-commits-for-new-ml-pipeline branch June 28, 2026 14:12

jodavis merged 9 commits into feature/adr-191-new-ml-pipeline from dev/jodavis/initial-commits-for-new-ml-pipeline
