feat: distributed FaaS coordination heuristics (FaaS-MADiG / MAPoD / MABR) + coordination rework #2
miciav wants to merge 108 commits into
All quality gates pass:
- 83 tests pass (18 hierarchical-specific)
- ruff: clean
- mypy: clean
- coverage: 51% (hierarchical_auction core: 87-96%)

…are after hierarchical levels
- engine: broadcast service_quantum per-function, skip seller==buyer, sort candidates by effective bid, compute quantity = min(want, tokens*quantum)
- runner: extract compute_offloaded_demand(), initialize rmp_omega, recompute compute_social_welfare after hierarchical allocations, pass rmp_omega to check_stopping_criteria
- token_manager: preserve quantity ratio on partial token acceptance
- tests: +5 tests for service quantum, no self-allocation, seller preference, offloaded demand, partial acceptance ratio

…unction
Removed the fragile zero-sentinel guard (`if np.allclose(structure_price, 0.0)`) that prevented recomputation when the legitimate price is zero (eta=0, zero node prices). The call to compute_structure_price is now made once per structure, immediately before the inner per-function loop, making intent explicit and safe. Added regression test for a two-function zero-price network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… production
Add _extract_latency helper using nx_adjacency_matrix with network_latency weight, call it once before the time loop, and pass the result to both define_bids and run_higher_levels (replacing the previous np.zeros placeholders).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… call
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…blic exports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ze loops, cache available tokens, move engine out of loop

…LI parsing
Brings coverage from 52% to 60% (+406 covered statements). Highlights:
- models/sp.py 44→86%, models/auction_models.py 45→84%
- generate_data.py 49→76%, run_centralized_model.py 37→63%
- what_if_analysis.py 33→53%, run_faasmacro.py 26→35%
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add config_files/planar_hierarchical.json for running the hierarchical auction model on Sage-generated planar degree-3 graphs (Nn 10-50, 3 repetitions). Document the workflow and conda install requirement in README.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- fix: omega_bar and y_bar params use PYO_PARAM_TYPE (NonNegativeReals) instead of PYO_VAR_TYPE, since solver outputs can be fractional
- fix: import PYO_PARAM_TYPE in models/sp.py
- fix: use nx. prefix for circular_ladder_graph and adjacency_matrix in generators/generate_data.py after merge removed explicit imports
- fix: add hierarchical termination condition format to postprocessing parser in run.py (missing obj. deviation / best it fields)
- fix: remove undefined title_key references in rlagents/postprocessing.py
- feat: add pre-commit ruff hook (pre-push stage)
- test: regression tests for omega_bar/y_bar float domain and missing import
- config: update planar_hierarchical.json load to sinusoidal trace type
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- hierarchical runner now saves runtime.csv with 'tot' column so that results_postprocessing can read it without falling back to FaaS-MACrO log parsing
- fix deviation append to handle None (not just the string "None") in load_termination_condition for hierarchical TC format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace planar_hierarchical.json with planar_comparison.json covering centralized, faas-macro, and hierarchical on planar degree-3 graphs. Update README accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…quisites
cmd_run (Inventory -> Project -> jobs -> Dispatcher -> run_batch -> Manifest) had no automated coverage; only cmd_define and arg parsing were tested. Adds a fake Dispatcher (context-manager variant of the existing FakeDispatcher test double from test_remote_experiments_runner.py) to exercise the full wiring without real Ray/SSH, and asserts the resulting Manifest reflects a completed run.
README was missing the ../ray-dispatcher sibling-checkout requirement (a uv path dependency), VM prerequisites (SSH + licensed Gurobi), and --project-path defaults/excludes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added 13 tasks, TDD throughout, individually reviewed + a final whole-branch review (one fix-wave addressed: added …). Depends on a small prerequisite addition to …. See …
The centralized-feasibility series restricted the hierarchical auction's offloading to neighbours (no ping-pong), but left the replica-acquisition path (start_additional_replicas) greedily filling residual memory to receive offload that the restriction now keeps from arriving. The leftover replicas made the combined solution violate utilization_equilibrium2, and since combine_solutions validates every auction iteration, the run aborted on the first such intermediate state. This was a regression: the pre-fix code produced 0 utilization_equilibrium2 violations on the affected instances.

combine_solutions now sets r = ceil(served_utilization / max_utilization), the centralized model's own replica equilibrium, for the realized served load (local + received offload). r does not enter the welfare objective (alpha*x + beta*y - gamma*z), so this is an objective-neutral feasibility repair: it frees the wasted replicas, never increases memory use, and satisfies utilization_equilibrium and utilization_equilibrium2 by construction. For FaaS-MACrO the MILP already pins r to this value, so the recomputation is a no-op there.

Verified: the three previously-crashing planar instances now run to completion with centrally-feasible solutions and hierarchical objective <= centralized at every step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
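The replica repair described in this commit can be illustrated with a minimal sketch. This is not the code in combine_solutions: the function names, the dictionary layout keyed by (node, function), and the small tolerance `tol` are all assumptions made for illustration. Only the formula r = ceil(served_utilization / max_utilization) comes from the commit message.

```python
import math


def replica_equilibrium(served_utilization, max_utilization, tol=1e-9):
    """Smallest replica count that can carry the served load.

    Implements r = ceil(served_utilization / max_utilization). The tolerance
    is an illustrative guard (an assumption, not from the source) so that
    floating-point noise just above an integer does not add a replica.
    """
    if max_utilization <= 0:
        raise ValueError("max_utilization must be positive")
    return max(0, math.ceil(served_utilization / max_utilization - tol))


def repair_replicas(served, max_utilization):
    """Recompute r for every (node, function) pair from the realized load.

    served: {(node, function): local + received offload}   (hypothetical layout)
    max_utilization: {function: per-replica utilization cap}
    """
    return {
        (node, fn): replica_equilibrium(load, max_utilization[fn])
        for (node, fn), load in served.items()
    }


# Two nodes, two functions; values chosen to be exact in binary floating point.
served = {(0, "f"): 0.0, (0, "g"): 1.2, (1, "f"): 1.5, (1, "g"): 0.5}
caps = {"f": 0.5, "g": 0.5}
repaired = repair_replicas(served, caps)
```

A pair with no served load ends up with zero replicas, which is how replicas acquired for offload that never arrived would be released under this sketch.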
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
This branch adds a family of price-free distributed heuristics for DiFRALB/DeFRALB and reworks the shared coordination logic across all three for correctness, determinism, and consistency. The heuristics form a spectrum of coordination styles, all built as controlled ablations of the FaaS-MADeA auction:
- … d-choices (removes full visibility; buyer probes only `d` sampled neighbours per step, serves best of sample).
All three keep the local planning stack and shared helpers, reuse the same seller-side clearing, and isolate exactly one mechanism each (pricing → visibility → sequential-vs-simultaneous coordination).
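The d-choices probing step mentioned above can be sketched as follows. This is a hedged illustration, not the branch's implementation: `probe_d_choices`, the `score` mapping (higher is assumed better), and the lowest-id tie-break are hypothetical choices; only "sample `d` neighbours, serve the best of the sample" comes from the description.

```python
import random


def probe_d_choices(neighbours, score, d, rng):
    """Probe at most d randomly sampled neighbours and return the best one.

    neighbours: iterable of integer node ids
    score: mapping node id -> float (hypothetical attractiveness, higher wins)
    rng: a random.Random instance, so a fixed seed reproduces the sample
    Returns None when there is nobody to probe.
    """
    candidates = sorted(neighbours)  # fixed order before sampling, for reproducibility
    if not candidates or d <= 0:
        return None
    sample = rng.sample(candidates, k=min(d, len(candidates)))
    # Best of the sample only; ties go to the lowest node id.
    return max(sample, key=lambda j: (score[j], -j))


score = {1: 0.2, 2: 0.9, 3: 0.9, 4: 0.1}
full = probe_d_choices(score.keys(), score, d=10, rng=random.Random(0))
first = probe_d_choices(score.keys(), score, d=2, rng=random.Random(7))
second = probe_d_choices(score.keys(), score, d=2, rng=random.Random(7))
```

When `d` is at least the number of neighbours the probe sees everyone and degenerates to full visibility, which is what makes it usable as a visibility ablation in this sketch.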
Validation: `uv run pytest -q` → 270 passed. FaaS-MABR e2e runs under real Gurobi (smoke + same-seed reproducibility for all three variants).
FaaS-MABR (Gauss-Seidel best response): `decentralized_bestresponse.py`
- … `new_row − previous_row`). Later nodes observe earlier nodes' updates through the live ledger (the defining Gauss-Seidel property), and the loop converges to a fixed point.
- … `allocation_changed` (no node revised its row this sweep), the correct signal under release-and-recompute (raw placement volume never settles).
- `FaaS-MABR-S` (fixed order), `FaaS-MABR-R` (seeded random order; variance via `--n_experiments`), `FaaS-MABR-O` (capped local re-optimization via `LSP_capped`/`LSP_capped_fixedr` and a per-node re-solve).
- `compute_sweep_runtime` (re-optimization time excluded before amortizing bookkeeping over active nodes); input validation for `order`/`response`.
- `faas-br-s`/`-r`/`-o`, method names `FaaS-MABR-S`/`-R`/`-O` (mkey `LSPc`), `compare_results.py` palette + default set, `planar_comparison.json` `br_*` blocks. Paper-ready LaTeX note under `faas-bestresponse-note/`, positioning it honestly as the textbook Gauss-Seidel relaxation (not claimed novel).
Cross-method coordination rework (diffusion / powerd / bestresponse)
Landed in lockstep across the three runners so they stay mutually consistent:
- … `rho[j] >= memory_requirement[f]` (was `rho[j] > 0`).
- `(score, index)` tie-breaks replace unstable `np.argsort`; `evaluate_assignments` reworked: `seller_pairs` now includes current hosts (so saturated sellers can still be re-evaluated for incumbent replacement), leftover-aware replica start, and lowest-score-incumbent-first reassignment.
- `LSPr_fixedr` under `--fix_r` (social-welfare re-solve now fixes replicas consistently with the subproblem); `best_centralized_cost` initialized to `-inf` (fixes a latent bug where a negative centralized objective could never set the initial best).
- `coordination_rho` zeroes memory/replica expansion under `--fix_r` (no new replicas when replicas are pinned).
- `force_memory_bids` parity in the block-A memory-bid emission; vectorized `rmp_omega`/`omega`/`fairness` updates.
Shared
- `run_faasmadea.start_additional_replicas`: deterministic proportional allocation plus a leftover-memory packing pass (the old per-function floor division left memory unused).
Also on this branch
- … `faas-madig-note/`, `faas-mapod-note/`: 5/5 cited works verified against CrossRef, PDFs committed).
- `run.py`: method → `(mkey, name)` mapping flattened into a single `METHOD_RESULT_MODELS` dict.
Heads-up for reviewers / reproducibility
The determinism rework of `define_assignments`/`evaluate_assignments` and the leftover-packing in `start_additional_replicas` change the numeric outputs of the existing baselines (FaaS-MADeA, FaaS-MADiG, FaaS-MAPoD), not just the new method. The changes are more correct and deterministic, but any benchmark CSVs/figures produced before this rework are now stale and should be regenerated for an apples-to-apples three-way comparison.
🤖 Generated with Claude Code
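To illustrate why the tie-break change alone can shift baseline outputs, here is a minimal sketch of a `(score, index)` ordering. It is written in plain Python rather than taken from `define_assignments`; the helper name `rank_by_score` and the choice of descending score are assumptions for illustration.

```python
def rank_by_score(scores):
    """Order candidate indices by descending score, ties by ascending index.

    Putting the index into the sort key makes the order a pure function of
    the inputs, independent of which sorting algorithm is used underneath.
    """
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


scores = [0.5, 0.9, 0.5, 0.9]
order = rank_by_score(scores)  # candidates 1 and 3 tie, as do 0 and 2
```

With a sort that gives no guarantee for equal keys, the relative order of the tied candidates is unspecified, so a different candidate may be picked first and every downstream allocation can differ from one run or platform to the next.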