Proposal: safe-by-default patch/script API with unsafe-* escape hatches#36
Conversation
The five user-facing functions now validate / escape values that get written raw onto the SSE wire or into a `<script>` tag, throwing on inputs that would inject extra SSE lines or close the script tag. Each has an `unsafe-*` twin that preserves the previous behavior for callers who have already validated their input or know the value is trusted. Validations applied by the safe variants: - `patch-elements!` / `patch-elements-seq!`: id, selector, patch-mode, element-ns must not contain CR/LF. - `remove-element!`: same, plus the positional `selector`. - `patch-signals!`: id must not contain CR/LF. - `execute-script!`: - script body must not contain `</script` (any case); - id must not contain CR/LF; - attribute names validated against `[A-Za-z_][A-Za-z0-9_:.-]*`; - attribute values HTML-escaped (& " < >). Atomic helpers backing the safe path are also public so callers can validate at a different boundary or pre-validate before calling an `unsafe-*` variant: `assert-sse-line-safe!`, `assert-script-body-safe!`, `escape-script-attribute-value`, `assert-script-attribute-name-safe!`. The validations only reject inputs that are *already* spec violations (SSE id/event lines are single-line; CSS selectors don't span lines; HTML attribute names can't contain whitespace; `</script` always closes a script tag regardless of escaping). No legitimate caller is broken; the change tightens the contract from "trust developer input" to "fail loudly on inputs that would silently inject events or HTML". bb test:bb: 78/78.
|
Is the purpose of this PR to improve developer ergonomics or to secure Datastar applications? The answer to that question influences desired the Datastar SDK design. If it's for developer ergonomics, I want all checks turned on in local development, and turned off in production. If it's for security, I want it on. I already compile my HTML strings from Hiccup with escapes, I don't want those checks/escapes to happen twice. |
|
@teodorlu - I understand that you have your workflow, stack, perspective and habits, but this PR is not about those. Purpose of this PR is to present my perspective, and as far as I am concerned it addresses both, security and ergonomics. |
|
What you are presenting as my habits is also the thinking behind clojure.core/assert, which can be turned off/on without changing surrounding code. Assert is for developer ergonomics. For security, the advice (from eg Dave Lieppmann) is to throw exceptions. |
|
@teodorlu it looks like you doubling down on pushing your perspective. The analogy to Liepmann's piece is about error signaling style, not about whether opt-ins should exist. Off-point here. What are you actually objecting to? Specifically |
|
To wrap my participation in this thread: This PR is a sharp tool vs batteries included question — caller validates vs library validates with explicit The PR is here for whoever finds it useful. I won't be pushing further in this thread. |
Status: proposal — alternative design to #32
Filed as a separate proposal because @JeremS has indicated in #32 that he prefers opt-in helpers + docstring warnings over baked-in validation. I respect that call; this PR is a sketch of the alternative for evaluation, not a request to override that decision. Reasonable people can disagree here, and either design is shippable.
Design
The five user-facing patch/script functions become safe by default — they validate / escape values that get written raw onto the SSE wire or into a
<script>tag, throwing on inputs that would inject extra SSE lines or close the tag. Each has anunsafe-*twin that preserves the current behavior, for callers who have already validated their input or know the value is trusted.Atomic helpers backing the safe path are public, so callers can validate at a different boundary or pre-validate before calling an
unsafe-*variant:assert-sse-line-safe!assert-script-body-safe!escape-script-attribute-valueassert-script-attribute-name-safe!Why I think safe-by-default is worth considering
1. The validations only reject inputs that are already spec violations. SSE id/event lines are single-line by spec. CSS selectors don't span lines. HTML attribute names can't contain whitespace.
</script(any case) always closes a script tag regardless of escaping. No legitimate caller is broken — the only inputs rejected are ones that would silently corrupt the output anyway.2. "Developer-controlled" is a fuzzy claim in modern apps. A few realistic patterns where user data flows into these fields without anyone consciously thinking about it:
id←(str \"msg-\" (:request-id req))— request-id originates client-side.selector←(str \"#row-\" username)— username from auth.attributesmap values —data-user-name,data-href, etc. The whole point ofdata-*attributes is to carry arbitrary data.In each case the burden of remembering "this can carry user input, validate it" sits at every call site. Safe-by-default flips that: callers have to deliberately opt out, which is much rarer.
3. Asymmetric cost. Validation is one regex per call (~10 ns). The cost of getting it wrong is a CVE on a security-adjacent library. That asymmetry is the same reason parameterized SQL beat string-concatenated SQL, and why Hiccup / React / Selmer / every modern HTML library escape by default.
4. The Go / PHP SDK precedent. Lateral conformity is a reasonable starting point, but "the others don't either" is a weak design argument; it perpetuates a default rather than justifying it. If a vulnerability were ever reported against any of those SDKs, all of them would likely move in this direction at once.
Where I think the maintainer's view holds up
&\"etc.) is a behavior change that could surprise a developer who happened to want a literal&or<in an attribute. This PR mitigates by exposingunsafe-execute-script!andescape-script-attribute-valuefor explicit control, but it's still a real concern.unsafe-*doubles the API surface to document and maintain. That's a legitimate tax.Test plan
bb test:bb— 78/78 pass (existing tests + 16 new specs covering safe-rejection across all five functions and pass-through across the fourunsafe-*twins).mainoutsideapi.clj(sanitizers, safe wrappers,unsafe-*twins),utils.clj(private helper), andapi_test.clj(tests).Disposition
Happy with whatever you decide: