fix(mdxish): support unquoted attributes in component tags #1389
maximilianfalco wants to merge 10 commits into `next`
@rafegoldberg this works as is to get parity with the legacy engine (i.e. supporting non-quoted attributes), but there is another way, which involves making a whole new tokenizer for PascalCase tags. That is the definitively correct approach, but I'm wondering whether it would be overkill. Wanted to know your thoughts on this as well.
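To make "non-quoted attributes" concrete, here is a minimal sketch of reading attribute values that may be double-quoted, single-quoted, or bare, as in `<Image src=https://example.com/img.png />`. `parseAttributes` and `ATTRIBUTE` are hypothetical names, not the parser in `mdxish-component-blocks`, which may behave differently. The sketch assumes its input is only the attribute text between the tag name and the closing `/>`, and it ignores `{...}` JSX expressions.

```typescript
// Hypothetical sketch: reading attributes whose values may be unquoted.
// Not the parser in mdxish-component-blocks; names are illustrative.
type Attributes = { [name: string]: string | true };

// A name, then optionally `=` and a double-quoted, single-quoted, or unquoted value.
// Assumption: unquoted values run to the next whitespace and, unlike HTML,
// may contain `=` so URL query strings survive.
const ATTRIBUTE = /([A-Za-z_:][\w:.-]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>`]+)))?/g;

function parseAttributes(input: string): Attributes {
  const attrs: Attributes = {};
  ATTRIBUTE.lastIndex = 0; // the regex is global and shared, so reset it per call
  let match: RegExpExecArray | null;
  while ((match = ATTRIBUTE.exec(input)) !== null) {
    const [, name, doubleQuoted, singleQuoted, unquoted] = match;
    // `??` (not `||`) keeps empty quoted values such as alt="";
    // a name with no `=` becomes a boolean attribute.
    attrs[name] = doubleQuoted ?? singleQuoted ?? unquoted ?? true;
  }
  return attrs;
}
```

For example, `parseAttributes('src=https://example.com/img.png alt="A cat" border')` yields `src` as the full URL, `alt` as `"A cat"`, and `border` as `true`.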
@rafegoldberg @eaglethrost I tried a new approach in 642774c, which creates a micromark tokenizer specifically for these component tags. It doesn't do any processing; it just guards the PascalCase component tags from being malformed by other transformers (mostly remarkParse). The old approach, a preprocess step, lives in e6f8533...
eaglethrost left a comment
@maximilianfalco Wow, quite a few changes 😅 Overall, yeah, I agree a tokenizer is a safer and cleaner way to parse these component syntaxes.
I QA'd the tokenizer and it seems to work well; I didn't notice any regressions so far! I just have a couple of questions and suggestions, some general ones:
- Just curious: was the previous preprocess approach unable to deal with multiline attribute syntax? (The example I gave before.)
- I think this tokenizer works, but since we need to set up a character-sequence tokenizer for each individual component, I wonder how we can make the code more scalable and extensible. Right now it looks fine, but even so we haven't accounted for every single ReadMe component (e.g. Columns, Tabs, etc.). Do you think we'll want to do that as well, or just for certain components? Just gauging the vision here.
Yes, the old one couldn't match attributes that span multiple lines.
Adding them should be pretty easy, I feel like; it's simply a matter of extending them in the …
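On the scalability question, one way to keep the names in a single place, sketched with hypothetical names (`KNOWN_COMPONENT_TAGS`, `createTagNameMatcher`, `run`) that are not the PR's code: generate the character-by-character matcher from an array, so supporting another component such as `Tabs` is a one-line change. Each state mimics a micromark state function by taking one character code and returning the next state or a final result, but it uses plain `charCodeAt` codes; micromark itself represents tabs and line endings with special codes, which this sketch ignores.

```typescript
// Hypothetical sketch: a list-driven, char-by-char tag-name matcher.
// Not the PR's code; names are illustrative.
const KNOWN_COMPONENT_TAGS = ["Image", "img", "Callout", "Embed", "Recipe", "Anchor"];

type Result = "ok" | "nok";
// Like a micromark state: takes one character code (null = end of input)
// and returns the next state or a final result.
type State = (code: number | null) => State | Result;

// Codes allowed right after a tag name: tab, LF, CR, space, `/`, `>`.
const DELIMITERS = [9, 10, 13, 32, 47, 62];

// `candidates` are the names still consistent with the `offset` characters seen so far.
function inName(candidates: readonly string[], offset: number): State {
  return (code) => {
    if (code === null) return "nok";
    // A fully matched name is accepted only if a delimiter follows it.
    if (DELIMITERS.indexOf(code) !== -1 && candidates.some((n) => n.length === offset)) {
      return "ok";
    }
    const next = candidates.filter(
      (n) => n.length > offset && n.charCodeAt(offset) === code,
    );
    return next.length > 0 ? inName(next, offset + 1) : "nok";
  };
}

function createTagNameMatcher(names: readonly string[]): State {
  // The first code must be `<` (60); then match one name character per state.
  return (code) => (code === 60 ? inName(names, 0) : "nok");
}

// Feed `text` one code at a time, then null, the way a tokenizer would.
function run(start: State, text: string): Result {
  let state: State | Result = start;
  for (let i = 0; i <= text.length; i++) {
    if (state === "ok" || state === "nok") return state;
    state = state(i < text.length ? text.charCodeAt(i) : null);
  }
  return state === "ok" ? "ok" : "nok";
}
```

With `"Tabs"` appended to the list, `<Table>` is still rejected once `l` fails to match `s`, so `jsxTable` would keep ownership of tables.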

🧰 Changes
A follow-up to #1375 and this comment.
Adds a micromark tokenizer extension (`jsxComponentBlock`) that claims ownership of known component tags (`<Image`, `<img`, `<Callout`, `<Embed`, `<Recipe`, `<Anchor`) before micromark's built-in HTML block parser can reject them. This prevents GFM autolinks from fragmenting unquoted URLs in attribute values.

How it works
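A rough plain-string sketch of the guard described above. `GUARDED_TAGS` and `claimComponentTag` are hypothetical, and the sketch skips micromark's real tokenizer API (effects, state functions, token types) entirely; it only shows the boundary decision. It assumes a guarded tag ends at the first `>` outside a quoted value or a `{...}` expression, and returns that end index so the caller can keep the text verbatim as raw HTML.

```typescript
// Hypothetical sketch of the guard idea, not the PR's micromark extension.
const GUARDED_TAGS = ["Image", "img", "Callout", "Embed", "Recipe", "Anchor"];

/**
 * If a guarded tag opens at `start`, return the index just past its closing `>`,
 * so the caller can keep src.slice(start, end) verbatim as raw HTML.
 * Returns -1 if no guarded tag starts there or the tag never closes.
 */
function claimComponentTag(src: string, start = 0): number {
  if (src.charAt(start) !== "<") return -1;

  // 1. Tag name: must be one of GUARDED_TAGS, followed by whitespace, `/`, or `>`,
  //    so `<Imagery` and `<Table` are left to other constructs.
  let i = -1;
  for (const tag of GUARDED_TAGS) {
    const end = start + 1 + tag.length;
    if (src.slice(start + 1, end) === tag && /[\s\/>]/.test(src.charAt(end))) {
      i = end;
      break;
    }
  }
  if (i === -1) return -1;

  // 2. Attributes: scan (across newlines) to the first `>` that is not inside
  //    a quoted value or a `{...}` expression. Unquoted values need no special case.
  let quote = "";
  let depth = 0;
  for (; i < src.length; i++) {
    const ch = src.charAt(i);
    if (quote) {
      if (ch === quote) quote = "";
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      if (depth > 0) depth--;
    } else if (ch === ">" && depth === 0) {
      return i + 1;
    }
  }
  return -1; // unterminated tag
}
```

Because the scan never stops at a line ending, a tag whose attributes span several lines is claimed whole, and an unquoted URL such as `src=https://example.com/img.png` stays inside the claimed text instead of being left for autolink parsing.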
Note
The extension doesn't process or transform the HTML; it just guards the tag text so it passes through intact as an `html` MDAST node. All actual attribute parsing and component rendering still happen downstream in `mdxish-component-blocks`.

Without the extension, micromark rejects `<Image src=https://example.com/img.png />` as invalid HTML and falls through to text parsing, where GFM autolinks fragment the URL into a link node, which breaks the component.

Tag matching
Uses the same pattern as the existing `jsxTable` extension: it matches specific tag names character by character (`I-m-a-g-e`, `i-m-g`, `C-a-l-l-o-u-t`, etc.) and rejects anything else. There is no conflict with `jsxTable`, since `Table` is not in the list.

Supported tag names are `<Image`, `<img`, `<Callout`, `<Embed`, `<Recipe`, and `<Anchor`.

🧬 QA & Testing