feat: distributed FaaS coordination heuristics (FaaS-MADiG / MAPoD / MABR) + coordination rework by miciav · Pull Request #2 · unimib-datAI/DFaaSOptimizer

miciav · 2026-05-19T14:31:39Z

Summary

This branch adds a family of price-free distributed heuristics for DiFRALB/DeFRALB and reworks the coordination logic shared by the three runners (diffusion, power-of-d, best response) for correctness, determinism, and consistency. The heuristics form a spectrum of coordination styles, all built as controlled ablations of the FaaS-MADeA auction:

FaaS-MADiG — greedy diffusion (removes the price signal; buyer scans its whole one-hop neighbourhood greedily by score).
FaaS-MAPoD — power-of-d-choices (removes full visibility; buyer probes only d sampled neighbours per step, serves best of sample).
FaaS-MABR-S / -R / -O — Gauss-Seidel best response (the sequential counterpart to the simultaneous diffusion methods): fixed-order, randomized-order, and capped-reoptimization variants.

All keep the local planning stack and shared helpers, reuse the same seller-side clearing, and isolate exactly one mechanism each (pricing → visibility → sequential-vs-simultaneous coordination).
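The visibility ablation between FaaS-MADiG and FaaS-MAPoD can be illustrated with a minimal sketch. The function names, the score dictionary, and the "higher score wins, lower index breaks ties" convention are assumptions made for this example and are not taken from the branch.

```python
import random


def pick_seller_greedy(scores, neighbours):
    """Full-visibility choice: scan every one-hop neighbour (hypothetical helper)."""
    # Assumption: higher score is better; ties go to the lower node index.
    return max(neighbours, key=lambda j: (scores[j], -j))


def pick_seller_power_of_d(scores, neighbours, d, rng):
    """Sampled choice: probe only d neighbours and serve the best of the sample."""
    sample = rng.sample(sorted(neighbours), min(d, len(neighbours)))
    return max(sample, key=lambda j: (scores[j], -j))


scores = {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.9}
neighbours = [1, 2, 3, 4]
rng = random.Random(0)  # seeded so repeated runs probe the same sample

best_full = pick_seller_greedy(scores, neighbours)
best_sampled = pick_seller_power_of_d(scores, neighbours, d=2, rng=rng)
```

With d equal to the neighbourhood size the sampled choice coincides with the greedy one, so the sample size d is the only knob separating the two behaviours in this sketch.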

Validation: uv run pytest -q → 270 passed. FaaS-MABR e2e runs under real Gurobi (smoke + same-seed reproducibility for all three variants).

FaaS-MABR (Gauss-Seidel best response) — `decentralized_bestresponse.py`

True best-response sweep: each node releases its current buyer row back to the shared residual-capacity ledger, recomputes its placement greedily by score, and commits the coordinate delta (new_row − previous_row). Later nodes observe earlier nodes' updates through the live ledger — the defining Gauss-Seidel property — and the loop converges to a fixed point.
Fixed-point termination on allocation_changed (no node revised its row this sweep), the correct signal under release-and-recompute (raw placement volume never settles).
Three variants: FaaS-MABR-S (fixed order), FaaS-MABR-R (seeded random order; variance via --n_experiments), FaaS-MABR-O (capped local re-optimization via LSP_capped/LSP_capped_fixedr and a per-node re-solve).
Runtime amortization via compute_sweep_runtime (re-optimization time excluded before amortizing bookkeeping over active nodes); input validation for order/response.
Additive wiring: CLI keys faas-br-s/-r/-o, method names FaaS-MABR-S/-R/-O (mkey LSPc), compare_results.py palette + default set, planar_comparison.json br_* blocks. Paper-ready LaTeX note under faas-bestresponse-note/ positioning it honestly as the textbook Gauss-Seidel relaxation (not claimed novel).
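The release-and-recompute sweep described above can be sketched on a toy instance. This is an illustrative reduction under stated assumptions, not the code in `decentralized_bestresponse.py`: the data layout (dictionaries of integer capacities and demands), the helper names, and the score convention are all hypothetical.

```python
def best_response(demand, scores, residual):
    """Greedy placement of one node's demand, best-scored host first."""
    row = {}
    # Assumption: higher score is better; ties go to the smaller host key.
    for host in sorted(residual, key=lambda h: (-scores[h], h)):
        if demand <= 0:
            break
        take = min(demand, residual[host])
        if take > 0:
            row[host] = take
            demand -= take
    return row


def gauss_seidel_sweeps(demands, scores, capacity, max_sweeps=50):
    residual = dict(capacity)        # shared live residual-capacity ledger
    rows = {n: {} for n in demands}  # current buyer row of every node
    for sweep in range(1, max_sweeps + 1):
        allocation_changed = False
        for node in sorted(demands):  # fixed order, as in the -S variant
            # Release the node's current row back to the ledger.
            for host, amount in rows[node].items():
                residual[host] += amount
            # Recompute against the live ledger, then commit the new row.
            # Releasing and re-taking equals applying new_row - previous_row.
            new_row = best_response(demands[node], scores[node], residual)
            for host, amount in new_row.items():
                residual[host] -= amount
            if new_row != rows[node]:
                allocation_changed = True
            rows[node] = new_row
        if not allocation_changed:    # fixed point: no node revised its row
            return rows, sweep
    return rows, max_sweeps


capacity = {"a": 5, "b": 5}
demands = {0: 4, 1: 4}
scores = {0: {"a": 1.0, "b": 0.5}, 1: {"a": 0.9, "b": 0.4}}
rows, sweeps = gauss_seidel_sweeps(demands, scores, capacity)
```

On this instance the second sweep reproduces every row unchanged, so the loop stops after two sweeps. Placement volume is the same in both sweeps, which is why the sketch tests row equality rather than volume.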

Cross-method coordination rework (diffusion / powerd / bestresponse)

Landed in lockstep across the three runners so they stay mutually consistent:

Memory-aware seller eligibility: a node can host a replica only if rho[j] >= memory_requirement[f] (was rho[j] > 0).
Deterministic seller clearing: explicit (score, index) tie-breaks replace unstable np.argsort; evaluate_assignments reworked — seller_pairs now includes current hosts (so saturated sellers can still be re-evaluated for incumbent replacement), leftover-aware replica start, and lowest-score-incumbent-first reassignment.
LSPr_fixedr under --fix_r (social-welfare re-solve now fixes replicas consistently with the subproblem); best_centralized_cost initialized to -inf (fixes a latent bug where a negative centralized objective could never set the initial best).
coordination_rho zeroes memory/replica expansion under --fix_r (no new replicas when replicas are pinned).
force_memory_bids parity in the block-A memory-bid emission; vectorized rmp_omega/omega/fairness updates.
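Two of the items above, memory-aware eligibility and explicit (score, index) tie-breaks, can be shown in a few lines. The helper names and the sort direction (higher score first) are assumptions for illustration; only the `rho[j] >= memory_requirement[f]` condition comes from the description.

```python
def eligible_sellers(rho, memory_requirement_f):
    """Nodes with enough residual memory to host one replica of function f."""
    # The previous rule (rho[j] > 0) would also admit nodes too small for f.
    return [j for j, free in enumerate(rho) if free >= memory_requirement_f]


def rank_sellers(scores, candidates):
    """Order candidates by score, breaking ties on the node index."""
    # A (score, index) key makes the order independent of sort stability.
    return sorted(candidates, key=lambda j: (-scores[j], j))


rho = [0, 128, 512, 256, 512]         # residual memory per node
scores = [0.9, 0.9, 0.7, 0.8, 0.7]    # hypothetical seller scores
candidates = eligible_sellers(rho, 256)
ranking = rank_sellers(scores, candidates)
```

Node 1 has positive residual memory but cannot fit a 256-unit replica, so it is excluded; nodes 2 and 4 tie on score and are ordered by index.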

Shared run_faasmadea.start_additional_replicas: deterministic proportional allocation plus a leftover-memory packing pass (the old per-function floor division left memory unused).
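The difference between floor division alone and floor division followed by a packing pass can be sketched as follows. This is not the body of `start_additional_replicas`; the weights, the packing order, and the function name are assumptions, and memory requirements are taken to be positive integers.

```python
def allocate_replicas(free_memory, memory_requirement, weight):
    """Proportional replica allocation followed by a leftover packing pass."""
    n = len(memory_requirement)
    total_weight = sum(weight)
    if total_weight <= 0:
        return [0] * n, free_memory
    # Step 1: each function gets a weight-proportional memory share,
    # rounded down to whole replicas (this alone can strand memory).
    replicas = [
        (free_memory * weight[f]) // (total_weight * memory_requirement[f])
        for f in range(n)
    ]
    leftover = free_memory - sum(
        replicas[f] * memory_requirement[f] for f in range(n)
    )
    # Step 2: pack leftover memory in a fixed order (weight, then index)
    # until no further replica fits.
    order = sorted(range(n), key=lambda f: (-weight[f], f))
    packed = True
    while packed:
        packed = False
        for f in order:
            if memory_requirement[f] <= leftover:
                replicas[f] += 1
                leftover -= memory_requirement[f]
                packed = True
    return replicas, leftover


replicas, leftover = allocate_replicas(1000, [300, 200], [1, 1])
```

Here the proportional step yields one and two replicas (700 units used), and the packing pass places one more 300-unit replica so that no memory is left idle.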

Also on this branch

FaaS-MADiG and FaaS-MAPoD (greedy diffusion + power-of-d), each with a design spec, plan, and citation-audited LaTeX note (faas-madig-note/, faas-mapod-note/ — 5/5 cited works verified against CrossRef, PDFs committed).
run.py: method→(mkey, name) mapping flattened into a single METHOD_RESULT_MODELS dict.
Hierarchical auction, uv migration, and extended test coverage.

Heads-up for reviewers / reproducibility

The determinism rework of define_assignments/evaluate_assignments and the leftover-packing in start_additional_replicas change the numeric outputs of the existing baselines (FaaS-MADeA, FaaS-MADiG, FaaS-MAPoD), not just the new method — the changes are more correct and deterministic, but any benchmark CSVs/figures produced before this rework are now stale and should be regenerated for an apples-to-apples three-way comparison.

🤖 Generated with Claude Code

All quality gates pass:
- 83 tests pass (18 hierarchical-specific)
- ruff: clean
- mypy: clean
- coverage: 51% (hierarchical_auction core: 87-96%)

…are after hierarchical levels
- engine: broadcast service_quantum per-function, skip seller==buyer, sort candidates by effective bid, compute quantity = min(want, tokens*quantum)
- runner: extract compute_offloaded_demand(), initialize rmp_omega, recompute compute_social_welfare after hierarchical allocations, pass rmp_omega to check_stopping_criteria
- token_manager: preserve quantity ratio on partial token acceptance
- tests: +5 tests for service quantum, no self-allocation, seller preference, offloaded demand, partial acceptance ratio
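The quantity rule and the partial-acceptance behaviour mentioned in this commit can be sketched numerically. The formula `min(want, tokens*quantum)` is quoted from the message; the proportional scaling below is only one plausible reading of "preserve quantity ratio", and both function names are hypothetical.

```python
def offered_quantity(want, tokens, quantum):
    """Quantity a buyer can request given its tokens and the service quantum."""
    return min(want, tokens * quantum)


def accepted_quantity(quantity, tokens_offered, tokens_accepted):
    """Scale the quantity so that quantity per token stays constant.

    Assumption: partial acceptance keeps the offered quantity/token ratio.
    """
    if tokens_offered == 0:
        return 0.0
    return quantity * tokens_accepted / tokens_offered


offered = offered_quantity(want=7, tokens=3, quantum=2)   # capped by tokens
accepted = accepted_quantity(offered, tokens_offered=3, tokens_accepted=2)
```

With three tokens of quantum two, a demand of seven is capped at six; accepting two of the three tokens then carries four units.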

…unction Removed the fragile zero-sentinel guard (`if np.allclose(structure_price, 0.0)`) that prevented recomputation when the legitimate price is zero (eta=0, zero node prices). The call to compute_structure_price is now made once per structure, immediately before the inner per-function loop, making intent explicit and safe. Added regression test for a two-function zero-price network. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… production Add _extract_latency helper using nx_adjacency_matrix with network_latency weight, call it once before the time loop, and pass the result to both define_bids and run_higher_levels (replacing the previous np.zeros placeholders). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…LI parsing

Brings coverage from 52% to 60% (+406 covered statements). Highlights:
- models/sp.py 44→86%, models/auction_models.py 45→84%
- generate_data.py 49→76%, run_centralized_model.py 37→63%
- what_if_analysis.py 33→53%, run_faasmacro.py 26→35%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add config_files/planar_hierarchical.json for running the hierarchical auction model on Sage-generated planar degree-3 graphs (Nn 10-50, 3 repetitions). Document the workflow and conda install requirement in README. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- fix: omega_bar and y_bar params use PYO_PARAM_TYPE (NonNegativeReals) instead of PYO_VAR_TYPE (solver outputs can be fractional)
- fix: import PYO_PARAM_TYPE in models/sp.py
- fix: use nx. prefix for circular_ladder_graph and adjacency_matrix in generators/generate_data.py after merge removed explicit imports
- fix: add hierarchical termination condition format to postprocessing parser in run.py (missing obj. deviation / best it fields)
- fix: remove undefined title_key references in rlagents/postprocessing.py
- feat: add pre-commit ruff hook (pre-push stage)
- test: regression tests for omega_bar/y_bar float domain and missing import
- config: update planar_hierarchical.json load to sinusoidal trace type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- hierarchical runner now saves runtime.csv with 'tot' column so that results_postprocessing can read it without falling back to FaaS-MACrO log parsing
- fix deviation append to handle None (not just the string "None") in load_termination_condition for hierarchical TC format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace planar_hierarchical.json with planar_comparison.json covering centralized, faas-macro, and hierarchical on planar degree-3 graphs. Update README accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…quisites

cmd_run (Inventory -> Project -> jobs -> Dispatcher -> run_batch -> Manifest) had no automated coverage; only cmd_define and argument parsing were tested. This commit adds a fake Dispatcher (a context-manager variant of the existing FakeDispatcher test double from test_remote_experiments_runner.py) to exercise the full wiring without real Ray/SSH, and asserts that the resulting Manifest reflects a completed run.

The README was missing the ../ray-dispatcher sibling-checkout requirement (a uv path dependency), the VM prerequisites (SSH + licensed Gurobi), and the --project-path defaults/excludes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

miciav · 2026-06-30T15:06:40Z

Added remote_experiments/: dispatches DFaaSOptimizer algorithm comparisons to remote Gurobi VMs via ray-dispatcher, with a two-phase define/run workflow, a pluggable experiment-suite registry, manifest-based stop/resume, and a rich TUI for live per-experiment/per-VM progress.

13 tasks, TDD throughout, individually reviewed + a final whole-branch review (one fix-wave addressed: added cmd_run integration test coverage and README prerequisites). 320/320 tests passing.

Depends on a small prerequisite addition to ray-dispatcher itself (Dispatcher.running_hosts(), already merged on that repo's main) for live per-VM job attribution.

See docs/superpowers/specs/2026-06-30-remote-experiments-design.md and docs/superpowers/plans/2026-06-30-remote-experiments-implementation.md.

The centralized-feasibility series restricted the hierarchical auction's offloading to neighbours (no ping-pong), but left the replica-acquisition path (start_additional_replicas) greedily filling residual memory to receive offload that the restriction now keeps from arriving. The leftover replicas made the combined solution violate utilization_equilibrium2, and since combine_solutions validates every auction iteration, the run aborted on the first such intermediate state. This was a regression: the pre-fix code produced 0 utilization_equilibrium2 violations on the affected instances.

combine_solutions now sets r = ceil(served_utilization / max_utilization), the centralized model's own replica equilibrium, for the realized served load (local + received offload). r does not enter the welfare objective (alpha*x + beta*y - gamma*z), so this is an objective-neutral feasibility repair: it frees the wasted replicas, never increases memory use, and satisfies utilization_equilibrium and utilization_equilibrium2 by construction. For FaaS-MACrO the MILP already pins r to this value, so the recomputation is a no-op there.

Verified: the three previously-crashing planar instances now run to completion with centrally-feasible solutions and hierarchical objective <= centralized at every step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
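The replica repair rule quoted above, r = ceil(served_utilization / max_utilization), can be checked on a small example. The function name and the numbers are illustrative assumptions; only the formula comes from the commit message.

```python
import math


def repaired_replicas(served_utilization, max_utilization):
    """Smallest replica count that covers the realized served load."""
    if max_utilization <= 0:
        raise ValueError("max_utilization must be positive")
    return math.ceil(served_utilization / max_utilization)


# Hypothetical node: 1.5 units of local load plus 1.0 of received offload,
# with each replica allowed to run at up to 0.8 utilization.
served = 1.5 + 1.0
r_new = repaired_replicas(served, 0.8)

# A node that ends up serving nothing needs no replicas at all.
r_idle = repaired_replicas(0.0, 0.8)
```

If such a node had acquired six replicas in anticipation of offload that never arrived, the repair would bring it down to four and release the memory held by the other two.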

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

miciav and others added 30 commits February 16, 2026 16:03

feat: migrate to uv and add extended unit tests

650403b

add uv quality gates and coverage tests

127541f

docs: revise hierarchical auction plan

0b620f8

feat: add hierarchical auction allocation types

08445a2

feat: add hierarchical structure graph

1e05077

feat: add cumulative hierarchical token manager

8929942

feat: add hierarchical structure pricing

e3e141e

feat: map hierarchical tokens to flows

8ca314a

feat: add hierarchical auction engine

d8c645c

feat: add hierarchical auction runner

d50dc7f

feat: register hierarchical auction method in run.py

070da5b

test: add hierarchical auction import smoke test

e69d1c4

fix: assert global feasibility invariant after each run_higher_levels…

90d3f27

… call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: remove dead omega assignment and bid_price field; expand pu…

0d06285

…blic exports Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add engine early-termination and multi-level cascade tests

8e1fb00

docs: add post-review fixes plan and update GitNexus index metadata

58138ac

refactor: simplify engine/runner — remove narrating comments, vectori…

b961d7a

…ze loops, cache available tokens, move engine out of loop

fix hierarchical auction edge cases

a20cec7

add gurobi planar e2e test

8e5154e

fix: Update benchmark and test files

3985554

fix: Update benchmark and test files

2dc9c74

Merge remote-tracking branch 'origin/main' into feat/uv-migration-and…

c0e96ac

…-extended-tests

docs: rename planar config and extend to three-way comparison

34f1ec4

Replace planar_hierarchical.json with planar_comparison.json covering centralized, faas-macro, and hierarchical on planar degree-3 graphs. Update README accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

miciav and others added 7 commits June 30, 2026 15:52

feat: add batch/VM progress stats and ETA heuristic

6930fe5

feat: add submit/poll loop with stop-on-interrupt

f699688

feat: add rich TUI rendering for batch/VM/experiment progress

7de8480

feat: add define/run CLI for remote_experiments

1ba0fcc

docs: add remote_experiments usage README and example inventory

4bb514f

document centralized feasibility contract

b199ef9

miciav and others added 22 commits June 30, 2026 17:11

plan centralized feasibility enforcement

b5eea4b

add centralized feasibility validator

2e78b4b

cover distinct non-neighbor offload

04f9068

harden centralized feasibility validation

15c5c01

fix centralized validator edge cases

d996673

fix malformed validator diagnostics

545f3da

enforce centralized feasibility for heuristic solutions

9ca332c

keep outputs centrally feasible

4791f6f

fix centralized feasibility regressions

5f84d8f

fix rejection handling and add regression instances

fc2427c

document remote experiments fixes

ca5b97d

fix remote experiment execution

ef46f2e

plan hierarchical model experiments

ce22917

design paper experiment generators

72fb5a0

add paper experiment generators

3406412

design FaaS-MALD dual coordination

9dc03fa

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

spec: add LaTeX note deliverable and Sonnet-5 execution policy

e92a5e3

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

add materialized paper experiment instances

9acee0b

preserve combined solution snapshots

1074ca7

hierarchical madea runner added

4bb18a4

added hierarchical madea runner

e4137ed

miciav wants to merge 108 commits into main from feat/uv-migration-and-extended-tests
