fix(mdxish): support unquoted attributes in component tags by maximilianfalco · Pull Request #1389 · readmeio/markdown

maximilianfalco · 2026-03-23T05:50:16Z

	Fix RM-XYZ

🧰 Changes

A follow-up to #1375 and this comment.

Adds a micromark tokenizer extension (jsxComponentBlock) that claims ownership of known component tags (<Image, <img, <Callout, <Embed, <Recipe, <Anchor) before micromark's built-in HTML block parser can reject them. This prevents GFM autolinks from fragmenting unquoted URLs in attribute values.

How it works

Note

The extension doesn't process or transform the HTML; it only guards the tag text so that it passes through intact as an html MDAST node. All attribute parsing and component rendering still happen downstream in mdxish-component-blocks.

Without the extension, micromark rejects <Image src=https://example.com/img.png /> as invalid HTML and falls through to text parsing. GFM autolinks then fragment the URL into a link node, which breaks the component.
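To illustrate what "guarding the tag text" means, here is a minimal sketch in plain TypeScript. It is not the code from this PR and does not use the micromark API; `scanOpeningTag` is a hypothetical helper that finds where an opening tag ends, so the whole tag (unquoted URL included) can be handed on as one opaque chunk. Quoted values are skipped over so that a `>` or an apostrophe inside a caption does not end the tag early, and newlines are allowed so multiline attributes work.

```typescript
// Hypothetical helper, not the PR's implementation: return the index just past
// the closing `>` of the tag that starts at `start`, or null if it never closes.
function scanOpeningTag(source: string, start: number): number | null {
  if (source[start] !== '<') return null;
  let quote: string | null = null; // the quote character we are inside, if any
  for (let i = start + 1; i < source.length; i += 1) {
    const ch = source[i];
    if (quote !== null) {
      if (ch === quote) quote = null; // matching quote ends the quoted value
    } else if (ch === '"' || ch === "'") {
      quote = ch; // everything up to the matching quote is opaque
    } else if (ch === '>') {
      return i + 1; // end of the tag, newlines in between are fine
    }
  }
  return null; // unterminated tag: leave it to the normal parser
}

const doc = 'before\n<Image src=https://example.com/img.png />\nafter';
const tagStart = doc.indexOf('<Image');
const tagEnd = scanOpeningTag(doc, tagStart);
// The guarded slice keeps the unquoted URL in one piece.
const guarded = tagEnd === null ? null : doc.slice(tagStart, tagEnd);
```

In this sketch `guarded` is the full tag text, which is the part a downstream step such as mdxish-component-blocks would then parse for attributes.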

Tag matching

It uses the same pattern as the existing jsxTable extension: it matches specific tag names character by character (I-m-a-g-e, i-m-g, C-a-l-l-o-u-t, etc.) and rejects anything else. There is no conflict with jsxTable, since Table is not in the list.

Supported tag names are:

<Image
<img
<Callout
<Embed
<Recipe
<Anchor
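The character-by-character matching described above can be sketched roughly as follows. This is a simplified standalone illustration, not the actual tokenizer: `COMPONENT_TAGS` and `matchComponentTag` are hypothetical names, and a real micromark tokenizer would consume character codes through state functions rather than index into a string.

```typescript
// Hypothetical list mirroring the tag names listed in this PR description.
const COMPONENT_TAGS: string[] = ['Image', 'img', 'Callout', 'Embed', 'Recipe', 'Anchor'];

// Walk the input one character at a time, narrowing the candidate names after
// each character, and reject as soon as no candidate is left.
function matchComponentTag(input: string): string | null {
  if (input[0] !== '<') return null;
  let candidates = COMPONENT_TAGS;
  let pos = 1;
  while (pos < input.length) {
    const ch = input[pos];
    const offset = pos - 1; // position inside the tag name
    // Whitespace, `/` or `>` ends the name; it must match a full candidate.
    if (ch === ' ' || ch === '\n' || ch === '\t' || ch === '/' || ch === '>') {
      return candidates.find(name => name.length === offset) || null;
    }
    candidates = candidates.filter(name => name[offset] === ch);
    if (candidates.length === 0) return null; // e.g. <Table falls out at "T"
    pos += 1;
  }
  // Input ended right after the name.
  return candidates.find(name => name.length === pos - 1) || null;
}
```

Matching is case-sensitive and requires the full name, so `<Images` and `<image` are both rejected, and `<Table` never matches, which is why there is no overlap with jsxTable in this sketch.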

🧬 QA & Testing

<Image src=https://files.readme.io/6f52e22-man-eating-pizza-and-making-an-ok-gesture.jpg caption="lol he's eating pizza!" height="100px" align="center" border="true" />

<img src=https://files.readme.io/6f52e22-man-eating-pizza-and-making-an-ok-gesture.jpg />

…o/fix-components-not-accepting-unquoted-attributes

maximilianfalco · 2026-03-24T06:22:29Z

@rafegoldberg this works as is to get parity with the legacy engine (i.e. supporting unquoted attributes like src in image tags), but it requires us to add another preprocessing step.

there is another way though: writing a whole new tokenizer for pascal case tags, which is the definitively correct approach. I'm just wondering if that would be overkill? wanted to know your thoughts on this as well

maximilianfalco · 2026-03-24T07:38:49Z

@rafegoldberg @eaglethrost I tried a new approach here in 642774c, which basically creates a micromark tokenizer specifically for <Image, <img, <Callout, <Anchor, <Embed and <Recipe

it doesn't do any processing; it just guards these pascal case component tags from being mangled by other transformers (mostly remarkParse)

the old approach lives in e6f8533 where it was a preprocess step...

eaglethrost · 2026-03-26T02:46:38Z

I brought this up before with the Table tokenizer, but I think this tokenizer lets us handle one case better: when an escaped or code-formatted closing tag appears in the middle of the component syntax, it shouldn't break the component.

Example:

<Callout>
Here is some code: `</Callout>` and more text.
</Callout>
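One possible way to handle the case in this example, sketched in plain TypeScript, is to skip over backtick code spans while searching for the closing tag. This is only an illustration of the suggestion, not behavior the PR is confirmed to implement; `findClosingTag` is a hypothetical helper, and its backtick handling is simplified compared with CommonMark's code span rules.

```typescript
// Hypothetical helper: return the index of the first closing tag for `name`
// that is not inside a backtick code span, or -1 if there is none.
function findClosingTag(body: string, name: string): number {
  const closing = '</' + name + '>';
  let i = 0;
  while (i < body.length) {
    if (body[i] === '`') {
      // Measure the opening backtick run, then jump past the next run of the
      // same length (simplified: a longer run would also be accepted).
      let run = 0;
      while (body[i + run] === '`') run += 1;
      const end = body.indexOf('`'.repeat(run), i + run);
      // If the span never closes, treat the backticks as literal text.
      i = end === -1 ? i + run : end + run;
      continue;
    }
    if (body.startsWith(closing, i)) return i;
    i += 1;
  }
  return -1;
}

const calloutExample =
  '<Callout>\nHere is some code: `</Callout>` and more text.\n</Callout>';
const closeIndex = findClosingTag(calloutExample, 'Callout');
```

Here `closeIndex` points at the final `</Callout>` on the last line rather than the one inside the code span.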

eaglethrost

@maximilianfalco Wow, quite a few changes 😅 Overall, yeah, I agree that a tokenizer is a safer and cleaner way to parse these component syntaxes.

QA'd the tokenizer and it seems to work well, didn't notice any regressions so far! Just have a couple of questions and suggestions, some general ones:

Just curious, was the previous preprocess way unable to deal with multiline attribute syntaxes? (The example I gave before)
I think this tokenizer works, but since we need to set up the character sequence tokenizer for individual components, I wonder how we can make the code more scalable and extensible. Right now it looks fine, but even then we haven't accounted for every single readme component (e.g. Columns, Tabs, etc). Do you think we'll want to do that as well? Or just for certain components? Just gauging the vision here

maximilianfalco · 2026-03-27T01:07:23Z

> Just curious, was the previous preprocess way unable to deal with multiline attribute syntaxes? (The example I gave before)

yes, the old one couldn't match attributes that span multiple lines

> I think this tokenizer works, but since we need to set up the character sequence tokenizer for individual components, I wonder how we can make the code more scalable and extensible. Right now it looks fine, but even then we haven't accounted for every single readme component (e.g. Columns, Tabs, etc). Do you think we'll want to do that as well? Or just for certain components? Just gauging the vision here

adding them should be pretty easy i feel like: extending the list in the tags.ts file and adding a suffix should be enough

maximilianfalco added 2 commits March 23, 2026 16:49

fix: support unquoted attrs in PascalCase component tags

e6f8533

Merge branch 'next' of https://github.com/readmeio/markdown into falc…

b623854

…o/fix-components-not-accepting-unquoted-attributes

eaglethrost reviewed Mar 24, 2026

Comment thread __tests__/transformers/normalize-component-attributes.test.ts Outdated

maximilianfalco added 2 commits March 24, 2026 18:31

feat: add jsxComponentBlock micromark extension for grabbing html tags

642774c

feat: replace attrs preprocessor with new micromark extension

7d8ae4e

maximilianfalco changed the title ~~fix(mdxish): support unquoted attributes in PascalCase component tags~~ fix(mdxish): support unquoted attributes in component tags Mar 24, 2026

chore: make more test for multiline image components

466df18

maximilianfalco requested a review from eaglethrost March 24, 2026 07:56

maximilianfalco marked this pull request as ready for review March 24, 2026 09:08

maximilianfalco requested review from kevinports and rafegoldberg March 24, 2026 09:08