explorer: architectural direction — make filter semantics coherent across all surfaces #234

@rdhyee

Purpose

Capture the architectural direction for the explorer's filter semantics, sequenced as a roadmap, with explicit decision rationale. Filed for review (Codex, Gemini, and human collaborators). The goal is alignment before implementation begins on the substantive steps.

Context

The explorer has six surfaces that show numbers about samples: map dots, the "Samples in View" stat, the "Samples Rendered" stat, the samples table count, the facet-legend counts, and the search-results line. Today each surface applies a different combination of constraints:

| Constraint | Map / Stats / Table | Facet counts | Search results line |
|---|---|---|---|
| Source checkboxes | | | |
| Material / Sampled Feature / Specimen Type | | | |
| Bbox (viewport) | | | optional (area-scope) |
| Search text | | | |

Result: three different "filter" semantics on one page. The 2026-05-22 investigation session (see #229 closure note, and the design briefing in ~/dev-journal/projects/isamples-facets.md) hit three concrete confusions stemming from this:

  • "I have pottery Cyprus in the search box but the facet counts and 'Samples in View' don't reflect it."
  • "I filtered to material=soil but the cluster dots include non-SESAR colors even though most soil is SESAR."
  • "What does '5,451 samples match the current filters' actually mean?"

The decision space

Three orthogonal axes (named for cross-reference):

  • A1: search is a global filter — restricts map, table, stats, facet counts.
    A2: search is a side-panel lookup — restricts only the search-results list. (current)
  • B1: facet counts reflect viewport bbox — pan, counts change.
    B2: facet counts stay global regardless of viewport. (current)
  • C1: cluster mode honestly reflects facet filter — H3 dots per filtered subset (expensive — pre-bake per facet, or live aggregate).
    C2: cluster mode ignores filter, surface this loudly with #facetNote. (current — but the note is bugged on URL-load).
    C3: auto-switch to point mode when any facet is active. No cluster dishonesty, but point-density problems (see #231, "explorer: dense point overlap saturates to yellow, looks like Smithsonian dots").

Direction picked

A1 + B1 + (C3-when-feasible, C2-with-prominent-warning-when-too-dense), with progressive refinement (sampled-fast then full-when-idle) underlying every dynamic surface, and issue #233's progressive heatmap as the eventual unifier that retires the cluster-vs-point dichotomy.

Mental model the user gets

"The explorer is a single coherent answer to: what samples match my current intent? Every number on the page tells me the size of that intent. Every dot tells me where one of those samples is. If there are too many dots to draw individually, the page tells me so and falls back to cluster mode with a visible warning that it's an approximation."

Why this combination

  • A1 (search global): the search box stops being decorative. Users naturally assume what they type restricts what they see; the page should honor that.
  • B1 (counts viewport-aware): legend becomes "what's in front of me" — agreeing with the table and the stats. The legend stops being a global pivot tool (which is conceptually clean but in practice confused users in the 2026-05-22 session).
  • C3-then-C2 fallback: cluster mode is treated as a perf optimization, not a feature. When it's feasible to draw individual dots, draw them. When the count exceeds a threshold (still TBD), keep cluster but warn loudly that what you see isn't the filtered set.
  • Progressive refinement: addresses the "want both snappy and honest" tension. Counts and dots show a coarse approximation during active panning, then refine to honest values once the user sits still for ~500 ms. Any new pan cancels the refinement in flight. The facetCountsReqId and requestId patterns already in the codebase generalize directly.
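
The C3-then-C2 fallback in the list above could be expressed as a small pure function. This is a minimal sketch, not the explorer's actual mode-selection code: `chooseMapMode`, `ModeDecision`, and every parameter name are hypothetical, and `densityCap` is a placeholder because the real threshold is still TBD. It also assumes the existing zoom rule, when it already picks point mode, is left alone.

```typescript
// Hypothetical sketch of the C3-then-C2 fallback. None of these names
// come from the explorer codebase.
type MapMode = "point" | "cluster";

interface ModeDecision {
  mode: MapMode;
  // True when cluster dots are shown while a facet filter is active,
  // i.e. the dots are NOT the filtered set and the page must say so.
  warnApproximate: boolean;
}

function chooseMapMode(
  activeFacetCount: number,   // how many facet values are selected
  matchingSamples: number,    // samples matching the filters in the viewport
  zoomPrefersPoints: boolean, // result of the existing zoom-based rule
  densityCap: number,         // placeholder; the real threshold is undecided
): ModeDecision {
  if (zoomPrefersPoints) {
    // Assumption: the existing zoom rule keeps working unchanged.
    return { mode: "point", warnApproximate: false };
  }
  if (activeFacetCount === 0) {
    // No facet filter, so cluster dots are not misleading.
    return { mode: "cluster", warnApproximate: false };
  }
  if (matchingSamples <= densityCap) {
    // C3: few enough dots to draw individually.
    return { mode: "point", warnApproximate: false };
  }
  // C2 fallback: too dense, stay clustered and warn prominently.
  return { mode: "cluster", warnApproximate: true };
}
```

Keeping the decision pure makes the threshold easy to test empirically: the same function can be replayed against recorded viewport counts with different `densityCap` values.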

Why NOT the "cleanest" earlier framing (A1 + B2)

An earlier version of this briefing recommended A1 + B2 — keep the legend global as a pivot tool ("what could I navigate to"). Decision made to go with B1 instead because:

  • The explorer's primary user is studying data, not navigating around. "What's here" matters more than "where could I go."
  • All other numbers on the page reflect the viewport; making the legend the lone exception causes silent disagreement.
  • Progressive refinement makes B1's perf cost (100-300 ms recompute per pan) feel acceptable — the user sees stale counts go italic instantly, then update.

Sequenced roadmap

| # | Step | Effort | Architectural change | Unblocks |
|---|---|---|---|---|
| 1 | Fix #facetNote-on-URL-load bug | 1-2 hours | None (pure bug fix) | C2 honesty when arriving via shared URL |
| 2 | #232: "50+" → real count | ½ day | None (adds a COUNT query) | Honest disclosure of search-result size |
| 3 | B1: facet counts viewport-aware, with .recomputing italic state during query | 1-2 days | Add bbox predicate to updateCrossFilteredCounts live-query path; cube fast-path falls back when bbox is non-global | Legend agrees with table and stats |
| 4 | A1: search as global filter; add ILIKE search predicate to facet counts, table query, and loadViewportSamples | 2-3 days | Touches every count surface; biggest behavior change | Search box becomes a real filter |
| 5 | C3: auto-promote to point mode when any facet active, with density-cap fallback to cluster + prominent "showing cluster, too dense for individual dots" warning | 2-3 days | Mode-selection logic now considers facet state, not just zoom | Map dots honestly reflect filter |
| 6 | #233: progressive heatmap spike, a third visualization that replaces the cluster apology with an actually-filter-honest density layer | ~1 week | New visualization mode; reuses DuckDB-WASM + wide-parquet stack | Retires the cluster-vs-point tradeoff for high-density filtered views |

Steps 1-2 are quick-win, independent of the architectural direction. Steps 3-4-5 are the substantive coherence work. Step 6 is the long-term answer that makes the cluster-mode apology obsolete.
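
One way to keep steps 3 and 4 coherent is to derive every surface's WHERE clause from a single description of the user's intent, so the table, stats, and facet counts cannot drift apart. The sketch below is illustrative only. `FilterIntent`, `buildWhere`, the `samples` table, and all column names (`source`, `material`, `longitude`, `latitude`, and the three search columns) are assumptions, not the explorer's schema. It treats the search text as a single substring pattern and ignores bounding boxes that cross the antimeridian.

```typescript
// Hypothetical shared predicate builder. All names and columns are
// placeholders chosen for illustration.
interface Bbox { west: number; south: number; east: number; north: number; }

interface FilterIntent {
  sources: string[];
  materials: string[];
  bbox: Bbox | null;   // null means "global", no viewport constraint
  searchText: string;  // empty string means "no search"
}

// Placeholder names for the three text columns the search scans.
const SEARCH_COLUMNS = ["label", "description", "place_name"];

// Quote a string literal for SQL by doubling single quotes.
// A real implementation would prefer prepared statements.
function quote(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

function buildWhere(intent: FilterIntent): string {
  const clauses: string[] = [];
  if (intent.sources.length > 0) {
    clauses.push(`source IN (${intent.sources.map(quote).join(", ")})`);
  }
  if (intent.materials.length > 0) {
    clauses.push(`material IN (${intent.materials.map(quote).join(", ")})`);
  }
  if (intent.bbox !== null) {
    const b = intent.bbox;
    clauses.push(`longitude BETWEEN ${b.west} AND ${b.east}`);
    clauses.push(`latitude BETWEEN ${b.south} AND ${b.north}`);
  }
  const text = intent.searchText.trim();
  if (text !== "") {
    // Note: % and _ typed by the user still act as wildcards here.
    const pattern = quote(`%${text}%`);
    const ors = SEARCH_COLUMNS.map((c) => `${c} ILIKE ${pattern}`);
    clauses.push(`(${ors.join(" OR ")})`);
  }
  return clauses.length > 0 ? "WHERE " + clauses.join(" AND ") : "";
}

// Every surface reuses the same clause, so the numbers agree by construction.
const exampleIntent: FilterIntent = {
  sources: ["SESAR"],
  materials: ["soil"],
  bbox: { west: 32, south: 34, east: 35, north: 36 },
  searchText: "pottery",
};
const exampleWhere = buildWhere(exampleIntent);
const tableCountSql = `SELECT COUNT(*) AS n FROM samples ${exampleWhere}`;
const facetCountSql =
  `SELECT material, COUNT(*) AS n FROM samples ${exampleWhere} GROUP BY material`;
```

A cross-filtered legend usually omits a facet's own selection when counting that facet, so a real version would likely need a way to drop one clause per facet query. That detail is left out here.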

Progressive refinement pattern (applies to steps 3, 5, 6)

A single debounce + cancel + progressive-refinement scaffold, reused across surfaces:

moveStart:
  - snapshot current values
  - apply `.recomputing` italic state

moveEnd + 250 ms (debounce — cancels if another move comes):
  - kick off coarse-pass query (10% TABLESAMPLE for counts; cube for legend single-axis case)
  - apply result; keep `.recomputing` if there's a refine pass pending

moveEnd + 1-2 s still idle:
  - run full-scan query
  - apply result; drop `.recomputing`

any new move / filter change:
  - bump request token; in-flight queries discard their result via stale guard

The codebase already has the cancellation primitives (facetCountsReqId, requestId, freshSelectionToken) and the .recomputing CSS class.
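
The scaffold above might look roughly like the following in TypeScript. This is a sketch under stated assumptions, not the existing code: `ProgressiveRefiner` and its option names are hypothetical, the coarse and full queries are injected as plain async functions, and a single integer token stands in for primitives such as facetCountsReqId. Error handling for failed queries is omitted.

```typescript
// Hypothetical debounce + cancel + progressive-refinement helper.
type Query<T> = () => Promise<T>;

interface RefinerOptions<T> {
  coarse: Query<T>;             // fast approximate pass (e.g. sampled count)
  full: Query<T>;               // full-scan pass
  // refinePending is true while a more accurate result is still expected,
  // so the caller can keep the `.recomputing` state on.
  apply: (value: T, refinePending: boolean) => void;
  markRecomputing: () => void;  // apply the italic state immediately
  debounceMs: number;           // e.g. 250
  idleMs: number;               // e.g. 1000-2000
}

class ProgressiveRefiner<T> {
  private readonly opts: RefinerOptions<T>;
  private token = 0;            // bumped on every move or filter change
  private fullDoneFor = -1;     // token whose full result was already applied
  private timers: Array<ReturnType<typeof setTimeout>> = [];

  constructor(opts: RefinerOptions<T>) {
    this.opts = opts;
  }

  // Call on moveStart or on any filter change.
  invalidate(): void {
    this.token += 1;            // in-flight queries become stale
    for (const t of this.timers) clearTimeout(t);
    this.timers = [];
    this.opts.markRecomputing();
  }

  // Call on moveEnd. A later invalidate() or settle() cancels this run.
  settle(): void {
    this.invalidate();
    const mine = this.token;
    this.timers.push(
      setTimeout(() => void this.run(this.opts.coarse, mine, true), this.opts.debounceMs),
      setTimeout(() => void this.run(this.opts.full, mine, false), this.opts.idleMs),
    );
  }

  private async run(query: Query<T>, mine: number, isCoarse: boolean): Promise<void> {
    const value = await query();
    if (mine !== this.token) return;                     // stale guard
    if (isCoarse && this.fullDoneFor === mine) return;   // never overwrite a full result
    if (!isCoarse) this.fullDoneFor = mine;
    this.opts.apply(value, isCoarse);
  }
}
```

In use, one refiner per surface would be created with that surface's queries, with `invalidate()` wired to the map's move-start and filter-change handlers and `settle()` wired to move-end. For counts, the coarse query could scale a sampled count up to an estimate while the full query returns the exact value.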

Open questions for review

  1. Is A1 + B1 actually the right call? The earlier framing argued B2 keeps the legend stable and avoids per-pan jitter. We chose B1 because it makes the page coherent and the user is studying data. But B2 + A1 is also defensible — does the review prefer it?
  2. Density-cap threshold for C3 → C2 fallback? When should auto-point-mode give up and revert to cluster? 5,000 dots? 50,000? Empirically test, or pick a number?
  3. "Snappy vs accurate" explicit toggle? Or is progressive refinement enough that the user doesn't need to choose? The current lean is no toggle — the page just behaves fast-while-moving and honest-when-still.
  4. Does step 4 (A1) require the FTS work in #168 through #172 (Explorer FTS Track 1b, "Honesty fix for query-spec / live mismatch", through Explorer FTS Track 5, "GO/NO-GO decision gate") to land first? The current search uses ILIKE against three text columns. That performs acceptably for the search-results list at LIMIT 50, but folding it into every count query means scanning the same columns much more often. The BM25 index might need to be ready before A1 is shippable at scale.
  5. C3's interaction with #231 ("explorer: dense point overlap saturates to yellow, looks like Smithsonian dots"). Auto-promoting to point mode reveals the yellow-saturation bug at high densities. Should the #231 fix (sub-options A/B/C/D in that issue) land before C3, or alongside it?
  6. Should step 6 (the #233 heatmap spike, "explorer: spike a progressive heatmap layer as filter-honest alternative to cluster mode") actually come earlier? If the spike works, it might supplant the C3 work entirely: there is no point auto-promoting to point mode if a filter-honest heatmap is the better visualization. The question is the risk vs. value of trying the spike before committing to C3.

What's NOT in this issue

Cross-refs

Acceptance for this issue (not the implementation)

Once those are settled, individual PRs follow against the existing tracking issues (#230, #231, #232, #233 plus the #facetNote bug to be filed).

Labels: explorer (Interactive Explorer features), needs-discussion (Requires team input before implementing)
