fix(mdxish): support unquoted attributes in component tags#1389

Draft
maximilianfalco wants to merge 10 commits into next from
falco/fix-components-not-accepting-unquoted-attributes

@maximilianfalco
Contributor

@maximilianfalco maximilianfalco commented Mar 23, 2026

PR App Fix RM-XYZ

🧰 Changes

A follow-up to #1375 and this comment.

Adds a micromark tokenizer extension (jsxComponentBlock) that claims ownership of known component tags (<Image, <img, <Callout, <Embed, <Recipe, <Anchor) before micromark's built-in HTML block parser can reject them. This prevents GFM autolinks from fragmenting unquoted URLs in attribute values.

How it works

Note

The extension doesn't process or transform the HTML; it only guards the tag text so it passes through intact as an html MDAST node. All actual attribute parsing and component rendering still happen downstream in mdxish-component-blocks.

Without the extension, micromark rejects <Image src=https://example.com/img.png /> as invalid HTML and falls through to text parsing, where GFM autolinks fragment the URL into a link node, which breaks the component.

Tag matching

Uses the same pattern as the existing jsxTable extension: it matches specific tag names character by character (I-m-a-g-e, i-m-g, C-a-l-l-o-u-t, etc.) and rejects anything else. There is no conflict with jsxTable since Table is not in the list.

Supported tag names are:

  • <Image
  • <img
  • <Callout
  • <Embed
  • <Recipe
  • <Anchor
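As a rough illustration of the character-by-character matching described above, here is a minimal, self-contained sketch. It is not the PR's tokenizer and does not use micromark's API; `KNOWN_TAGS` and `matchKnownTag` are hypothetical names, and the boundary rule (whitespace, `/`, or `>` must follow the name) is an assumption.

```typescript
// Hypothetical tag list copied from the PR description; not the PR's actual source.
const KNOWN_TAGS = ['Image', 'img', 'Callout', 'Embed', 'Recipe', 'Anchor'] as const;

/**
 * Returns the matched tag name if `input` starts with `<` plus a known tag
 * name followed by whitespace, `/`, or `>`; otherwise returns null.
 */
function matchKnownTag(input: string, tags: readonly string[] = KNOWN_TAGS): string | null {
  if (input[0] !== '<') return null;

  // Candidates still consistent with the characters consumed so far.
  let candidates = [...tags];
  let i = 1;

  while (i < input.length) {
    const ch = input[i];
    const offset = i - 1; // index into the tag name

    // A boundary character ends the name: accept only a fully consumed candidate.
    if (ch === ' ' || ch === '\t' || ch === '\n' || ch === '/' || ch === '>') {
      return candidates.find(tag => tag.length === offset) ?? null;
    }

    // Narrow the candidates one character at a time (case-sensitive).
    candidates = candidates.filter(tag => tag[offset] === ch);
    if (candidates.length === 0) return null;
    i += 1;
  }

  return null; // ran out of input before reaching a boundary
}
```

Under these assumptions `<Table>` and `<Images>` are rejected, so anything outside the list is left to other extensions or to the built-in HTML parser.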

🧬 QA & Testing

<Image src=https://files.readme.io/6f52e22-man-eating-pizza-and-making-an-ok-gesture.jpg caption="lol he's eating pizza!" height="100px" align="center" border="true" />

<img src=https://files.readme.io/6f52e22-man-eating-pizza-and-making-an-ok-gesture.jpg />
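For readers wondering what the downstream step has to cope with once the tag text survives intact, here is a minimal sketch of attribute parsing that accepts quoted and unquoted values. It is an assumption-laden illustration, not the code in mdxish-component-blocks: `parseAttributes` is a hypothetical helper, and the rule that an unquoted value runs until whitespace, a quote, or `>` is a simplification.

```typescript
/**
 * Parses the attributes of a single opening tag into a plain object.
 * Supports name="double", name='single', name=unquoted, and bare boolean names.
 */
function parseAttributes(tag: string): Record<string, string | true> {
  // Strip the leading `<Name` and the trailing `>` or `/>`.
  const body = tag.replace(/^<[A-Za-z]\w*/, '').replace(/\/?>\s*$/, '');

  // Built per call so the global regex's lastIndex is never shared.
  const attribute = /([A-Za-z_][\w-]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;

  const attrs: Record<string, string | true> = {};
  let match: RegExpExecArray | null;
  while ((match = attribute.exec(body)) !== null) {
    const name = match[1];
    // Exactly one of the three value groups is defined; none means a boolean attribute.
    attrs[name] = match[2] ?? match[3] ?? match[4] ?? true;
  }
  return attrs;
}
```

Because `\s` also matches newlines, the same sketch handles attributes spread over several lines. For the first QA input above, `src` would come back as the full URL and `caption` as `lol he's eating pizza!`.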

@maximilianfalco
Contributor Author

@rafegoldberg this works as is to get parity with the legacy engine (i.e. supporting unquoted attributes like src in image tags), but it requires us to add another preprocessing step.

there is another way though: writing a whole new tokenizer for PascalCase tags. that's the definitively correct approach, but i'm wondering if it would be overkill. wanted to know your thoughts on this as well

Comment thread __tests__/transformers/normalize-component-attributes.test.ts Outdated
@maximilianfalco
Contributor Author

maximilianfalco commented Mar 24, 2026

@rafegoldberg @eaglethrost I tried a new approach here in 642774c, which basically creates a micromark tokenizer specifically for <Image, <img, <Callout, <Anchor, <Embed and <Recipe

it doesn't do any processing, it just guards these pascal case component tags from being mangled by other transformers (mostly remarkParse)

the old approach lives in e6f8533 where it was a preprocess step...

@maximilianfalco maximilianfalco changed the title fix(mdxish): support unquoted attributes in PascalCase component tags fix(mdxish): support unquoted attributes in component tags Mar 24, 2026
@maximilianfalco maximilianfalco marked this pull request as ready for review March 24, 2026 09:08
@eaglethrost
Contributor

I brought this up before with the Table tokenizer, but I think with this tokenizer we can now better handle the case where an escaped / code-formatted closing tag appears in the middle of the component syntax, and make it not break.

Example:

<Callout>
Here is some code: `</Callout>` and more text.
</Callout>
Screenshot 2026-03-26 at 1 45 56 pm
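One way to picture the behaviour suggested here is a scanner that looks for the real closing tag while skipping backtick code spans and backslash escapes. This is only a sketch under simplifying assumptions: `findClosingTag` is a hypothetical helper, not part of the PR, and its code-span handling is looser than CommonMark's (it does not require the closing backtick run to be exactly the same length).

```typescript
/**
 * Returns the index of the first `</tagName>` that is neither inside a
 * backtick code span nor preceded by a backslash, or -1 if there is none.
 */
function findClosingTag(source: string, tagName: string, from = 0): number {
  const closing = `</${tagName}>`;
  let i = from;

  while (i < source.length) {
    const ch = source[i];

    if (ch === '\\') {
      // Skip the escaped character so `\</Callout>` is not treated as a close.
      i += 2;
      continue;
    }

    if (ch === '`') {
      // Measure the opening backtick run, then look for a matching run.
      let run = 0;
      while (source[i + run] === '`') run += 1;
      const end = source.indexOf('`'.repeat(run), i + run);
      // Unterminated code span: treat the backticks as literal text.
      i = end === -1 ? i + run : end + run;
      continue;
    }

    if (source.startsWith(closing, i)) return i;
    i += 1;
  }

  return -1;
}
```

With the Callout example above, the scan skips the `</Callout>` inside the code span and lands on the final closing tag.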

Comment thread lib/micromark/pascalcase-html-block/syntax.ts Outdated
Comment thread lib/micromark/pascalcase-html-block/syntax.ts Outdated
Comment thread lib/micromark/jsx-component/syntax.ts
Comment thread lib/micromark/pascalcase-html-block/syntax.ts Outdated
Contributor

@eaglethrost eaglethrost left a comment

@maximilianfalco Wow, quite a few changes 😅 Overall, yeah, I agree the tokenizer is a safer and cleaner way to parse these component syntaxes.

QA'd the tokenizer and it seems to work well, didn't notice any regressions so far! Just have a couple of questions and suggestions, some general ones:

  • Just curious, was the previous preprocess way unable to deal with multiline attribute syntaxes? (The example I gave before)
  • I think this tokenizer works, but since we need to set up the character sequence tokenizer for individual components, I wonder how we can make the code more scalable and extensible. Right now it looks fine, but even then we haven't accounted for every single readme component (e.g. Columns, Tabs, etc). Do you think we'll want to do that as well? Or just for certain components? Just gauging the vision here

@maximilianfalco maximilianfalco marked this pull request as draft March 26, 2026 04:46
@maximilianfalco
Contributor Author

  • Just curious, was the previous preprocess way unable to deal with multiline attribute syntaxes? (The example I gave before)

yes, the old one couldn't match attributes that span multiple lines

  • I think this tokenizer works, but since we need to set up the character sequence tokenizer for individual components, I wonder how we can make the code more scalable and extensible. Right now it looks fine, but even then we haven't accounted for every single readme component (e.g. Columns, Tabs, etc). Do you think we'll want to do that as well? Or just for certain components? Just gauging the vision here

adding them should be pretty easy i feel like, simply extending them in the tags.ts file and adding a suffix should be enough
