Fix: use UTF-8 explicitly in brotli compress/decompress #35
Merged
JeremS merged 1 commit into starfederation:main on May 1, 2026
`(String/.getBytes data)` and `(String/new (.getDecompressedData ...))` had no charset argument, so they fell back to the JVM platform default. Non-ASCII content silently corrupts on JVMs whose default charset isn't UTF-8 (Windows JDK <=17, some misconfigured containers). Use StandardCharsets/UTF_8 explicitly, matching the rest of the SDK (adapter/common.clj's ->os-writer) and the convention in hyperlith. Adds a roundtrip test with non-ASCII content.
andriytyurnikov added a commit to andriytyurnikov/datastar-clojure that referenced this pull request Apr 29, 2026
Per discussion in starfederation#32: the SDK trusts its inputs by design. Option values, ids, selectors, script bodies and attributes are expected to come from the developer, not from end users. Enforcing newline/escape sanitization in the hot path (the original PR) was overreach.

Instead, this PR adds:

- Opt-in helpers in `starfederation.datastar.clojure.api` for callers that need defense-in-depth at a boundary:
  - `assert-sse-line-safe!`: throws on `\n`/`\r` in id/selector/etc.
  - `assert-script-body-safe!`: throws on `</script` (any case).
  - `escape-script-attribute-value`: HTML-escapes `& " < >` for attribute-context values.
  - `assert-script-attribute-name-safe!`: validates attr-name shape.

  Backed by a small `utils/assert-no-newline!` helper.
- `> [!WARNING]` blocks on `patch-elements!`, `patch-elements-seq!`, `remove-element!`, `patch-signals!` and `execute-script!`, matching the style added by starfederation#31 to the script helpers, calling out the injection vectors and pointing at the helpers.

No behavior change to existing functions. Brotli charset fix split out to starfederation#35.
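For readers who want a feel for what opt-in boundary checks of this kind do, here is a minimal sketch. It is not the SDK's implementation: the function names ending in `-sketch` are hypothetical, and only the behaviors described in the commit message (throw on `\n`/`\r`, throw on `</script` in any case, HTML-escape `& " < >`) are modeled.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: reject values that would break an SSE line.
(defn assert-no-newline-sketch!
  [s]
  (when (re-find #"[\r\n]" s)
    (throw (ex-info "Value contains a newline or carriage return" {:value s})))
  s)

;; Hypothetical helper: reject script bodies that could close the
;; surrounding <script> element; (?i) makes the match case-insensitive.
(defn assert-script-body-safe-sketch!
  [s]
  (when (re-find #"(?i)</script" s)
    (throw (ex-info "Script body contains a closing script tag" {:value s})))
  s)

;; Hypothetical helper: escape the four characters named in the commit
;; message for use inside a double-quoted HTML attribute value.
(defn escape-attribute-value-sketch
  [s]
  (str/escape s {\& "&amp;" \" "&quot;" \< "&lt;" \> "&gt;"}))
```

A caller would apply these only where untrusted input enters, for example `(assert-no-newline-sketch! user-supplied-id)` before passing the id on, which keeps the hot path free of checks as the commit message intends.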
Collaborator: Thanks @andriytyurnikov!
Split off from #32 per @JeremS's request — this is the brotli-only piece.
Summary
`brotli/compress` and `brotli/decompress` had no charset argument on `String/.getBytes` and `String/new`, so they fell back to the JVM platform default. Non-ASCII content silently corrupts on JVMs whose default charset isn't UTF-8 (Windows JDK ≤17, some misconfigured containers).

This PR uses `StandardCharsets/UTF_8` explicitly, matching the rest of the SDK (`adapter/common.clj`'s `->os-writer`) and the convention in hyperlith.

Test plan

- Roundtrip test with non-ASCII content (`"héllo — café — Ω 🚀"`) added to `brotli_test.clj`.
- `bb test:bb`: 46/46 (this lib is JVM-only; the bb run is just a smoke check that nothing else broke).
- `bb test:all`: non-browser tests pass; the only failures are the pre-existing etaoin/geckodriver smoke tests, which are unrelated.
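The failure mode can be reproduced without brotli at all. The sketch below is an illustration, not the code from this PR: it uses classic interop forms rather than `String/.getBytes` and `String/new`, the function names are hypothetical, and ISO-8859-1 stands in for a non-UTF-8 platform default (an assumption, since the real default depends on the JVM and OS).

```clojure
(import '(java.nio.charset StandardCharsets))

;; Same sample string as the roundtrip test described above.
(def sample "héllo — café — Ω 🚀")

;; Explicit UTF-8 on both sides: independent of the platform default.
(defn utf8-roundtrip
  [^String s]
  (String. (.getBytes s StandardCharsets/UTF_8) StandardCharsets/UTF_8))

;; Simulates a JVM whose default charset cannot represent every character.
;; Unmappable characters are replaced during encoding, so the loss happens
;; before any compression would run and cannot be undone on decode.
(defn legacy-roundtrip
  [^String s]
  (String. (.getBytes s StandardCharsets/ISO_8859_1) StandardCharsets/ISO_8859_1))

(println (= sample (utf8-roundtrip sample)))   ; the explicit charset preserves the text
(println (= sample (legacy-roundtrip sample))) ; the stand-in default does not
```

Because the corruption occurs inside `getBytes` with no exception thrown, only a roundtrip assertion on non-ASCII input catches it, which is presumably why the test plan adds one.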