Proposal: safe-by-default patch/script API with unsafe-* escape hatches by andriytyurnikov · Pull Request #36 · starfederation/datastar-clojure

andriytyurnikov · 2026-04-29T13:31:37Z

Status: proposal — alternative design to #32

Filed as a separate proposal because @JeremS has indicated in #32 that he prefers opt-in helpers + docstring warnings over baked-in validation. I respect that call; this PR is a sketch of the alternative for evaluation, not a request to override that decision. Reasonable people can disagree here, and either design is shippable.

Design

The five user-facing patch/script functions become safe by default — they validate / escape values that get written raw onto the SSE wire or into a <script> tag, throwing on inputs that would inject extra SSE lines or close the tag. Each has an unsafe-* twin that preserves the current behavior, for callers who have already validated their input or know the value is trusted.

;; safe (default): throws on \n in id
(d*/patch-elements! sse-gen html {d*/id user-supplied-id})

;; explicit opt-out
(d*/unsafe-patch-elements! sse-gen html {d*/id pre-validated-id})

Atomic helpers backing the safe path are public, so callers can validate at a different boundary or pre-validate before calling an unsafe-* variant:

assert-sse-line-safe!
assert-script-body-safe!
escape-script-attribute-value
assert-script-attribute-name-safe!
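
As a rough illustration of what two of these helpers check, the sketch below assumes the simplest possible implementations: reject CR/LF for values written onto a single SSE line, and reject any case-insensitive `</script` in a script body. The `sketch-` names and the `field` argument are hypothetical and are not the PR's code; only the rules themselves come from the description.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for assert-sse-line-safe!: a value written onto a
;; single SSE line must not contain CR or LF.
(defn sketch-assert-sse-line-safe!
  [field value]
  (when (re-find #"[\r\n]" (str value))
    (throw (ex-info "Value would inject extra SSE lines"
                    {:field field :value value})))
  value)

;; Hypothetical stand-in for assert-script-body-safe!: the body must not
;; contain "</script" in any letter case.
(defn sketch-assert-script-body-safe!
  [body]
  (when (str/includes? (str/lower-case body) "</script")
    (throw (ex-info "Script body would close the <script> tag"
                    {:body body})))
  body)
```

Returning the value unchanged on success lets a caller pre-validate at its own boundary and then hand the result to an `unsafe-*` variant.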

Why I think safe-by-default is worth considering

1. The validations only reject inputs that are already spec violations. SSE id/event lines are single-line by spec. CSS selectors don't span lines. HTML attribute names can't contain whitespace. </script (any case) always closes a script tag regardless of escaping. No legitimate caller is broken — the only inputs rejected are ones that would silently corrupt the output anyway.

2. "Developer-controlled" is a fuzzy claim in modern apps. A few realistic patterns where user data flows into these fields without anyone consciously thinking about it:

id ← (str "msg-" (:request-id req)), where request-id originates client-side.
selector ← (str "#row-" username), where username comes from auth.
attributes map values: data-user-name, data-href, etc. The whole point of data-* attributes is to carry arbitrary data.

In each case the burden of remembering "this can carry user input, validate it" sits at every call site. Safe-by-default flips that: callers have to deliberately opt out, which is much rarer.

3. Asymmetric cost. Validation is one regex per call (~10 ns). The cost of getting it wrong is a CVE on a security-adjacent library. That asymmetry is the same reason parameterized SQL beat string-concatenated SQL, and why Hiccup / React / Selmer / every modern HTML library escape by default.

4. The Go / PHP SDK precedent. Lateral conformity is a reasonable starting point, but "the others don't either" is a weak design argument; it perpetuates a default rather than justifying it. If a vulnerability were ever reported against any of those SDKs, all of them would likely move in this direction at once.
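
To make points 1 and 2 concrete, here is a small sketch of how a line break in an id turns into an extra SSE line. `naive-event` is a hypothetical, simplified serializer written for this example, not the SDK's actual writer, and the event type and data line contents are only illustrative.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, simplified SSE serializer: each field becomes one line,
;; and the value is written raw.
(defn naive-event
  [event-type id data]
  (str "event: " event-type "\n"
       "id: " id "\n"
       "data: " data "\n\n"))

;; An id built from a client-supplied request id.
(defn message-id [req]
  (str "msg-" (:request-id req)))

(def benign  {:request-id "42"})
(def hostile {:request-id "42\ndata: elements <div id=\"x\">injected</div>"})

;; Count the lines that start with "data:" in the serialized event.
(defn data-line-count [event-text]
  (count (filter #(str/starts-with? % "data:")
                 (str/split-lines event-text))))

(data-line-count
 (naive-event "datastar-patch-elements" (message-id benign) "elements <p>hi</p>"))
;; => 1

(data-line-count
 (naive-event "datastar-patch-elements" (message-id hostile) "elements <p>hi</p>"))
;; => 2
```

The second event carries a data line the caller never wrote, which is the kind of input the CR/LF check is meant to reject.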

Where I think the maintainer's view holds up

It's the project's call. Library authors get to set the contract. Disagreeing on technical merit doesn't override that.
The attribute escaping (the auto-transform of & " etc., as opposed to the validation) is a behavior change that could surprise a developer who happened to want a literal & or < in an attribute. This PR mitigates that by exposing unsafe-execute-script! and escape-script-attribute-value for explicit control, but it's still a real concern.
unsafe-* doubles the API surface to document and maintain. That's a legitimate tax.
This is RC; some users may be relying on existing permissive behavior (though I'd argue no correct code can be).

Test plan

bb test:bb — 78/78 pass (existing tests + 16 new specs covering safe-rejection across all five functions and pass-through across the four unsafe-* twins).
No diff vs main outside api.clj (sanitizers, safe wrappers, unsafe-* twins), utils.clj (private helper), and api_test.clj (tests).

Disposition

Happy with whatever you decide:

Accept this PR: close #32, "Add opt-in sanitizers and docstring warnings for user-controlled input" (which has the previous, opt-in design), and this becomes the contract for 1.0.
Reject this PR and accept #32, "Add opt-in sanitizers and docstring warnings for user-controlled input": your judgment, no hard feelings; opt-in helpers + docstring warnings is a perfectly defensible design.
Reject both and accept only the brotli charset fix (#35, "Fix: use UTF-8 explicitly in brotli compress/decompress"): also fine. I'll continue to use the safe-by-default version on my fork.

The five user-facing functions now validate / escape values that get written raw onto the SSE wire or into a `<script>` tag, throwing on inputs that would inject extra SSE lines or close the script tag. Each has an `unsafe-*` twin that preserves the previous behavior for callers who have already validated their input or know the value is trusted.

Validations applied by the safe variants:

- `patch-elements!` / `patch-elements-seq!`: id, selector, patch-mode, element-ns must not contain CR/LF.
- `remove-element!`: same, plus the positional `selector`.
- `patch-signals!`: id must not contain CR/LF.
- `execute-script!`:
  - script body must not contain `</script` (any case);
  - id must not contain CR/LF;
  - attribute names validated against `[A-Za-z_][A-Za-z0-9_:.-]*`;
  - attribute values HTML-escaped (& " < >).

Atomic helpers backing the safe path are also public so callers can validate at a different boundary or pre-validate before calling an `unsafe-*` variant: `assert-sse-line-safe!`, `assert-script-body-safe!`, `escape-script-attribute-value`, `assert-script-attribute-name-safe!`.

The validations only reject inputs that are *already* spec violations (SSE id/event lines are single-line; CSS selectors don't span lines; HTML attribute names can't contain whitespace; `</script` always closes a script tag regardless of escaping). No legitimate caller is broken; the change tightens the contract from "trust developer input" to "fail loudly on inputs that would silently inject events or HTML".

bb test:bb: 78/78.
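
The attribute rules for `execute-script!` can be sketched as follows. This is a minimal illustration under stated assumptions: the regex and the four escaped characters are taken from the commit message, while the `sketch-` function names, the error message, and the specific entity strings are hypothetical and may differ from the PR's code.

```clojure
(require '[clojure.string :as str])

;; Pattern quoted in the commit message for attribute names.
(def attribute-name-pattern #"[A-Za-z_][A-Za-z0-9_:.-]*")

;; Hypothetical stand-in for assert-script-attribute-name-safe!:
;; the whole name must match the pattern.
(defn sketch-assert-attribute-name-safe!
  [attr-name]
  (when-not (re-matches attribute-name-pattern attr-name)
    (throw (ex-info "Unsafe script attribute name" {:name attr-name})))
  attr-name)

;; Hypothetical stand-in for escape-script-attribute-value:
;; escape & first so the entities added afterwards are not escaped again.
(defn sketch-escape-attribute-value
  [value]
  (-> (str value)
      (str/replace "&" "&amp;")
      (str/replace "\"" "&quot;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")))

(sketch-assert-attribute-name-safe! "data-user-name")
;; => "data-user-name"

(sketch-escape-attribute-value "Tom & \"Jerry\" <b>")
;; => "Tom &amp; &quot;Jerry&quot; &lt;b&gt;"
```

Names are rejected rather than rewritten because there is no sensible escaped form of an attribute name, whereas values can be escaped without losing information.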

teodorlu · 2026-05-07T08:58:56Z

Is the purpose of this PR to improve developer ergonomics or to secure Datastar applications?

The answer to that question influences the desired Datastar SDK design. If it's for developer ergonomics, I want all checks turned on in local development, and turned off in production. If it's for security, I want it on. I already compile my HTML strings from Hiccup with escapes, and I don't want those checks/escapes to happen twice.

andriy-swareco · 2026-05-07T10:27:17Z

@teodorlu - I understand that you have your workflow, stack, perspective and habits, but this PR is not about those.

The purpose of this PR is to present my perspective, and as far as I am concerned it addresses both security and ergonomics.

teodorlu · 2026-05-07T11:25:44Z

What you are presenting as my habits is also the thinking behind clojure.core/assert, which can be turned off/on without changing surrounding code.

Assert is for developer ergonomics. For security, the advice (from e.g. Dave Liepmann) is to throw exceptions.

andriy-swareco · 2026-05-07T11:46:11Z

@teodorlu it looks like you are doubling down on pushing your perspective.

The analogy to clojure.core/assert doesn't quite land. assert is toggled globally via *assert* at compile time — it's a build-level switch for developer-time invariants. What's here is a per-call opt-in at the API surface, which is a different mechanism with a different threat model.
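
For readers less familiar with the mechanism, here is a minimal sketch of the difference on JVM Clojure. The function names are hypothetical; `binding` plus `eval` is used only to simulate compiling the same source with `*assert*` on and off, and the sketch assumes a runtime where `eval` is available.

```clojure
;; Source of a check written with assert. The assert macro consults *assert*
;; when it is expanded, so the check can be compiled away entirely.
(def check-with-assert-source
  '(fn [id]
     (assert (not (re-find #"[\r\n]" id)) "id must be single-line")
     id))

;; Same source, compiled with assertions enabled and disabled.
(def checked-when-on
  (binding [*assert* true] (eval check-with-assert-source)))
(def checked-when-off
  (binding [*assert* false] (eval check-with-assert-source)))

;; A check written with an explicit throw is always present, whatever the
;; build-level setting of *assert* was.
(defn check-with-throw
  [id]
  (when (re-find #"[\r\n]" id)
    (throw (ex-info "id must be single-line" {:id id})))
  id)

;; (checked-when-on  "a\nb") throws java.lang.AssertionError
;; (checked-when-off "a\nb") returns the string unchanged
;; (check-with-throw "a\nb") throws clojure.lang.ExceptionInfo
```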

Liepmann's piece is about error signaling style, not about whether opt-ins should exist. Off-point here.

What, specifically, are you objecting to?

andriytyurnikov · 2026-05-07T15:37:35Z

To wrap my participation in this thread:

This PR is a sharp tool vs batteries included question — caller validates vs library validates with explicit unsafe-* opt-outs. Both designs are coherent. This PR proposes the second as Datastar Clojure's default. The claim is about where the library's default should sit, not where any one app currently puts the boundary. I recognize it as a matter of preference, and obviously express mine.

The PR is here for whoever finds it useful. I won't be pushing further in this thread.

andriytyurnikov mentioned this pull request Apr 29, 2026

Add opt-in sanitizers and docstring warnings for user-controlled input #32

Closed

3 tasks

andriytyurnikov wants to merge 1 commit into starfederation:main from andriytyurnikov:proposal/safe-by-default-sanitization
