Scrub CR/LF in SSE outputs to prevent field forging (fixes #18)#22
Closed
andriytyurnikov wants to merge 12 commits intostarfederation:mainfrom
Closed
Scrub CR/LF in SSE outputs to prevent field forging (fixes #18)#22andriytyurnikov wants to merge 12 commits intostarfederation:mainfrom
andriytyurnikov wants to merge 12 commits intostarfederation:mainfrom
Conversation
Previously CI only exercised Ruby 4.0. Adds a build matrix so unit tests and the Datastar SDK conformance suite run on every supported Ruby. Also enables the workflow on pull_request events.
CI runs on ubuntu-latest. Without x86_64-linux locked, Bundler resolves platform-specific gems at install time on CI, which can cause version drift relative to local development. Locking the platform keeps CI reproducible.
bundle install fails on Ruby 3.1 because console 1.34.2 (transitive dep via async) requires ruby >= 3.2. The dependency tree's effective floor is 3.2.
Adds an optional generator_class: keyword to Dispatcher.new (defaulting to ServerSentEventGenerator) and routes the three internal generator instantiations — stream_one, stream_many's connection generator, and each per-stream generator inside spawned threads — through it. Lets downstream wrappers layer behavior (logging, metrics, input scrubbing, etc.) on top of the SDK by passing a subclass, instead of monkey-patching the SDK class.
Pre-fix #redirect interpolated the URL raw inside a single-quoted JS
string:
window.location = '#{url}'
A ' in url broke out of the literal, letting attacker-influenced
fragments execute as JS. The classic vulnerable pattern:
datastar.redirect("/page?ref=#{params[:ref]}")
is a Rails idiom that's safe under Rails' own redirect_to, so the
mismatch is a footgun.
Fix: encode the URL with JSON.generate(ascii_only: true,
escape_slash: true). The output is a properly-quoted JS string
literal that:
- escapes ", \, and control characters,
- escapes U+2028 / U+2029 (which terminate JS string literals
even when delimited by " or '),
- escapes / to \/ so a </script> substring in the URL can't
prematurely close the surrounding <script> tag during HTML
parsing (the parser does not recognize <\/script> as the end
tag, while \/ is a no-op inside a JS string literal).
Output format change: window.location is now wrapped in double
quotes with JSON-style escapes instead of single quotes. Behavior
at runtime is unchanged for safe URLs.
6 new specs cover single quotes, double quotes, backslashes,
</script> breakout, U+2028, and U+2029.
stream_one's lifecycle contract (handling_sync_errors) fires exactly one of on_server_disconnect / on_client_disconnect / on_error per stream — never a combination. stream_many violated this: its ensure block always fired on_server_disconnect, even when an error or client disconnect had already triggered the matching callback. Consumers writing observability/cleanup logic against these callbacks would see different behavior depending on whether they called .stream once or many times — a subtle source of bugs. Fix: track a completed_normally flag in stream_many's control thread, set it false when handle_streaming_error runs, and only fire on_server_disconnect in the ensure block when the flag is still true. 2 new shared examples (run for both ThreadExecutor and AsyncExecutor): - streamer raises → on_error fires, on_server_disconnect doesn't - client disconnects → on_client_disconnect fires, on_server_disconnect doesn't
Author
|
c173c0d to
39a6094
Compare
CI: run matrix against Ruby 3.2, 3.3, 3.4, 4.0
Lock x86_64-linux platform in Gemfile.lock
Allow Dispatcher to inject a custom generator_class
Make multi-stream disconnect callbacks mutually exclusive
Encode redirect URL as a safe JS string literal
Closes the SSE injection surface where attacker-controlled strings reaching element bodies, script bodies, attribute values, scalar option values, or array/hash option entries could forge SSE fields the browser then dispatches as legitimate events. The WHATWG SSE parser treats \r, \n, and \r\n all as line terminators. Two scrubbers, applied at the API boundary: - scrub_body: strip \r only (used for element/script bodies and String signal payloads). \n is preserved because the SDK splits bodies on \n to emit per-line `data:` fields. - scrub_option: recursively strip \r and \n from option values (Strings, Arrays, Hashes). Option values are written as single `data:` lines, so any embedded line terminator forges a field. Hash signals are unaffected — JSON.dump already escapes \r/\n in string values, so the serialized payload is always a single safe line. execute_script attribute values are scrubbed pre-tag-construction since they are interpolated raw and bypass patch_elements' splitter. 17 new specs covering element bodies, scalar/array/hash options, String and Hash signals, remove_elements, execute_script (body and attributes), redirect, and a top-level reproduction of the issue's attack vector.
39a6094 to
89010fc
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #18.
Summary
Closes the SSE injection surface where caller-supplied strings reaching:
patch_elements),Stringsignal payloads (patch_signals),execute_script,redirect),id,selector,mode,useViewTransition, etc.),could forge SSE fields the browser then dispatches as legitimate events. The WHATWG SSE parser treats
\r,\n, and\r\nall as line terminators.Approach
Two private helpers on
ServerSentEventGenerator, applied at the API boundary (matches the sketch in #18):`scrub_option` walks Strings, Arrays, and Hashes; Ints/Floats/Booleans pass through. Hash signal payloads are unaffected because `JSON.dump` already escapes `\r`/`\n` in string values, so the serialized payload is always one safe line.
`execute_script` attribute values are scrubbed before the `<script>` tag is constructed, because they're interpolated raw into the tag and would otherwise bypass `patch_elements` element-body splitter.
Behavior change
Calls that don't pass
\r/\nthrough user-controlled paths see no change. Existing 100-spec baseline still green. Calls that do pass through CR/LF strings now see those stripped (bodies) or normalized into the parentdata:line (options) — which is what callers intended in 100% of legitimate cases.Tests
17 new specs under `Dispatcher SSE injection guards`, including:
Full suite: 117 examples, 0 failures.
Verification
The issue's exact reproduction script now produces only the 4 legitimate
event: datastar-patch-elementsheaders (one per call). Strings that were previously parsed as forged SSE fields (event: forged,injected: evil) are now absorbed as text into the parentdata:lines.Test plan