Skip to content

Fix: use UTF-8 explicitly in brotli compress/decompress#35

Merged
JeremS merged 1 commit intostarfederation:mainfrom
andriytyurnikov:fix/brotli-utf-8-charset
May 1, 2026
Merged

Fix: use UTF-8 explicitly in brotli compress/decompress#35
JeremS merged 1 commit intostarfederation:mainfrom
andriytyurnikov:fix/brotli-utf-8-charset

Conversation

@andriytyurnikov
Copy link
Copy Markdown
Contributor

Split off from #32 per @JeremS's request — this is the brotli-only piece.

Summary

brotli/compress and brotli/decompress had no charset argument on String/.getBytes and String/new, so they fell back to the JVM platform default. Non-ASCII content silently corrupts on JVMs whose default charset isn't UTF-8 (Windows JDK ≤17, some misconfigured containers).

This PR uses StandardCharsets/UTF_8 explicitly, matching the rest of the SDK (adapter/common.clj's ->os-writer) and the convention in hyperlith.

Test plan

  • New roundtrip test with non-ASCII content ("héllo — café — Ω 🚀") added to brotli_test.clj.
  • bb test:bb — 46/46 (this lib is JVM-only; bb run is just a smoke check that nothing else broke).
  • bb test:all non-browser tests pass; the only failures are the pre-existing etaoin/geckodriver smoke tests, unrelated.

`(String/.getBytes data)` and `(String/new (.getDecompressedData ...))`
had no charset argument, so they fell back to the JVM platform default.
Non-ASCII content silently corrupts on JVMs whose default charset isn't
UTF-8 (Windows JDK <=17, some misconfigured containers). Use
StandardCharsets/UTF_8 explicitly, matching the rest of the SDK
(adapter/common.clj's ->os-writer) and the convention in hyperlith.

Adds a roundtrip test with non-ASCII content.
andriytyurnikov added a commit to andriytyurnikov/datastar-clojure that referenced this pull request Apr 29, 2026
Per discussion in starfederation#32: the SDK trusts its inputs by design — option
values, ids, selectors, script bodies and attributes are expected to
come from the developer, not from end users. Enforcing newline/escape
sanitization in the hot path (the original PR) was overreach.

Instead, this PR adds:

- Opt-in helpers in `starfederation.datastar.clojure.api` for callers
  that need defense-in-depth at a boundary:
  - `assert-sse-line-safe!` — throws on \\n/\\r in id/selector/etc.
  - `assert-script-body-safe!` — throws on `</script` (any case).
  - `escape-script-attribute-value` — HTML-escapes & " < > for
    attribute-context values.
  - `assert-script-attribute-name-safe!` — validates attr-name shape.
  Backed by a small `utils/assert-no-newline!` helper.

- `> [!WARNING]` blocks on `patch-elements!`, `patch-elements-seq!`,
  `remove-element!`, `patch-signals!` and `execute-script!` matching
  the style added by starfederation#31 to the script helpers, calling out the
  injection vectors and pointing at the helpers.

No behavior change to existing functions.

Brotli charset fix split out to starfederation#35.
@JeremS
Copy link
Copy Markdown
Collaborator

JeremS commented May 1, 2026

Thanks @andriytyurnikov !

@JeremS JeremS merged commit 5e789c2 into starfederation:main May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants