PDF stage 4: Type3 glyphs + non-embedded fonts (deferred from stage 3) #553

Draft
andiwand wants to merge 7 commits into main from pdf-stage-3.6-type3-nonembedded

@andiwand andiwand commented Jun 23, 2026

Type3 glyphs + non-embedded fonts — deferred from stage 3 to stage 4, where the path → SVG machinery they need naturally lives. Stacked on the 3.5 Type1 PR.

⚠️ Draft / design-only. Seeds the branch with docs/design/pdf/stage-3.6-type3-nonembedded.md (filename keeps its original 3.6 slug); implementation follows once stage 4 (graphics / SVG) is underway.

Plan

  • Type3: char procs → SVG via stage 4's path → SVG machinery (no longer a minimal slice pulled forward into stage 3); per-glyph SVG placed at the text transform, Unicode from the stage-1 chain.
  • Non-embedded: standard-14 + common-name substitution to CSS font-family stacks; metrics from /Widths, AFM widths for the standard 14 (closes stage 2's deferred item) as a generated data table.
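To make the Type3 bullet concrete, here is a minimal sketch of turning path-construction operators from a char proc into an SVG path `d` string. It is not the project's C++ code: the function name `ops_to_svg_path` and the `(operator, operands)` input shape are assumptions for illustration. Only the PDF operators `m`, `l`, `c`, `h` and `re` are handled, and the y-axis flip is assumed to be applied by the transform on the enclosing SVG element rather than here.

```python
def fmt(value):
    """Format a coordinate compactly, e.g. 10.0 -> '10', 0.5 -> '0.5'."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def ops_to_svg_path(ops):
    """Convert (operator, operands) pairs into an SVG path 'd' string.

    Hypothetical helper: `ops` is assumed to be already parsed from the
    char proc's content stream.
    """
    parts = []
    for op, args in ops:
        if op == "m":        # moveto
            parts.append("M" + " ".join(fmt(a) for a in args))
        elif op == "l":      # lineto
            parts.append("L" + " ".join(fmt(a) for a in args))
        elif op == "c":      # cubic Bezier with two control points
            parts.append("C" + " ".join(fmt(a) for a in args))
        elif op == "h":      # closepath
            parts.append("Z")
        elif op == "re":     # rectangle: x y w h, expanded to a closed subpath
            x, y, w, h = args
            corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
            parts.append("M" + " ".join(fmt(v) for v in corners[0]))
            for corner in corners[1:]:
                parts.append("L" + " ".join(fmt(v) for v in corner))
            parts.append("Z")
        else:
            raise ValueError(f"unsupported path operator: {op}")
    return " ".join(parts)
```

The resulting string would go into a `<path d="...">` element placed at the text transform; fill and stroke operators are deliberately left out of this sketch.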

With this deferred, stage 3 completes at 3.5 (Type1). See the roadmap in src/odr/internal/pdf/AGENTS.md.

🤖 Generated with Claude Code

@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from 289f85a to dccb1d9 Compare June 23, 2026 19:23
@andiwand andiwand force-pushed the pdf-stage-3.6-type3-nonembedded branch from b548931 to 5c20c27 Compare June 23, 2026 19:23
@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from dccb1d9 to 424f31f Compare June 23, 2026 19:32
@andiwand andiwand force-pushed the pdf-stage-3.6-type3-nonembedded branch from 5c20c27 to c9e95c6 Compare June 23, 2026 19:32
@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from 424f31f to 75d240f Compare June 23, 2026 19:46
@andiwand andiwand force-pushed the pdf-stage-3.6-type3-nonembedded branch 4 times, most recently from 4b596bf to e230889 Compare June 23, 2026 20:19
@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from f9351ef to 29cdc2f Compare June 23, 2026 21:05
@andiwand andiwand force-pushed the pdf-stage-3.6-type3-nonembedded branch from e230889 to 5f7e680 Compare June 23, 2026 21:05
andiwand and others added 7 commits June 24, 2026 12:25
Seed the stage-3.5 branch. Read a Type1 program (eexec + charstring
decryption), translate Type1 -> Type2 charstrings, build a CFF and reuse
3.4's CFF -> OTF path; reverse map via glyph names -> AGL. Stacked on 3.4.
Implementation follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
First self-contained piece of 3.5: the Type1 running-key cipher
(font::type1::decrypt) and its two entry points — decrypt_eexec (key 55665,
4-byte skip, binary or ASCII-hex/PFA auto-detected) and decrypt_charstring
(key 4330, /lenIV-aware). These don't depend on the CFF translation work, so
they land ahead of the full Type1Font reader (eexec parse + Type1->Type2
charstring translation -> reuse 3.4's CFF->OTF path).

Tests: round-trips against an independent forward-cipher reference (so they're
not circular), the lenIV override, and the hex eexec form.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
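For readers unfamiliar with the Type1 running-key cipher described in the commit above, the following is a small Python sketch of it, not the project's C++ `font::type1::decrypt`. The constants 52845 and 22719 and the keys 55665 (eexec) and 4330 (charstrings) come from the Adobe Type 1 font format; the function names and the `skip` parameter are illustrative assumptions. A forward cipher is included only so the round trip can be checked.

```python
C1, C2 = 52845, 22719          # cipher constants from the Type 1 font format
EEXEC_KEY, CHARSTRING_KEY = 55665, 4330


def type1_decrypt(cipher, key, skip=4):
    """Decrypt `cipher` and drop the first `skip` bytes of plaintext.

    `skip` is 4 for eexec and equals /lenIV (default 4) for charstrings.
    """
    r = key
    plain = bytearray()
    for c in cipher:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(plain[skip:])


def type1_encrypt(plain, key):
    """Forward cipher, used here only as a reference for round-trip checks."""
    r = key
    cipher = bytearray()
    for p in plain:
        c = p ^ (r >> 8)
        cipher.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(cipher)
```

A round trip such as `type1_decrypt(type1_encrypt(b"\0\0\0\0hello", EEXEC_KEY), EEXEC_KEY)` yields `b"hello"`. ASCII-hex (PFA) input would need hex decoding before this step, which the sketch omits.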
Parse an Adobe Type1 font program into its decrypted parts: split the
clear-text header / eexec section / trailer, read /FontName, /FontMatrix,
/FontBBox and /Encoding (StandardEncoding or a custom dup-code-name-put
array) from the header, decrypt the eexec section (type1_crypt) and extract
every glyph's decrypted charstring plus /Subrs (RD/-| binary entries,
/lenIV-aware). PFB segment framing is stripped if present. Charstrings are
not yet interpreted — that's the Type1->Type2 translation that follows,
feeding 3.4's CFF->OTF path.

Tests: a hand-built encrypted Type1 program (independent forward cipher) —
magic, header/encoding parse, and the decrypted charstrings/subrs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
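The PFB framing mentioned above is simple enough to sketch. Each segment starts with the marker byte 0x80, a type byte (1 = ASCII, 2 = binary, 3 = end of file) and, for types 1 and 2, a 4-byte little-endian length. The Python below is an illustration under those format rules, not the project's parser; `strip_pfb` is a hypothetical name.

```python
def strip_pfb(data):
    """Return the font program with PFB segment headers removed.

    Input that does not start with the 0x80 marker is returned unchanged,
    on the assumption that it is already a plain PFA-style program.
    """
    if not data or data[0] != 0x80:
        return data
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data) and data[pos] == 0x80:
        kind = data[pos + 1]
        if kind == 3:                      # end-of-file segment, no payload
            break
        if kind not in (1, 2):
            raise ValueError(f"unknown PFB segment type {kind}")
        if pos + 6 > len(data):
            raise ValueError("truncated PFB segment header")
        length = int.from_bytes(data[pos + 2:pos + 6], "little")
        segment = data[pos + 6:pos + 6 + length]
        if len(segment) != length:
            raise ValueError("truncated PFB segment payload")
        out += segment
        pos += 6 + length
    return bytes(out)
```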
cff::build_cff serializes a name-keyed CFF from a list of (name, Type2
charstring) glyphs + default/nominalWidthX + bbox: Header, Name INDEX, Top
DICT (FontBBox + charset/CharStrings/Private offsets, fixed-width so the
layout resolves in one pass), String INDEX (every glyph name as a custom SID,
so no standard-strings table is needed), empty Global Subr INDEX, CharStrings
INDEX, format-0 charset, Private DICT. This is the assembly target for the
Type1 -> CFF path: the translated Type2 charstrings land here, the result
feeds CffFont + wrap_to_otf (3.4).

Test: build a 2-glyph CFF, read it back through CffFont (name, glyph name,
bbox, charstring width vs. default) and confirm it wraps to a loadable OTTO.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
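Most of the structures listed in the commit above (Name, String, Global Subr and CharStrings) are CFF INDEX tables, so a sketch of INDEX serialization may help. An INDEX is a 2-byte count, a 1-byte offset size, `count + 1` offsets starting at 1, then the concatenated data; an empty INDEX is just a zero count. The Python below is illustrative and not `cff::build_cff` itself; `build_index` is a hypothetical name.

```python
import struct


def build_index(items):
    """Serialize a list of byte strings as a CFF INDEX."""
    if not items:
        return struct.pack(">H", 0)        # empty INDEX: count only
    offsets = [1]                          # offsets are 1-based
    for item in items:
        offsets.append(offsets[-1] + len(item))
    last = offsets[-1]
    # Smallest offset size (1..4 bytes) that can hold the final offset.
    off_size = 1 if last <= 0xFF else 2 if last <= 0xFFFF else 3 if last <= 0xFFFFFF else 4
    out = bytearray(struct.pack(">HB", len(items), off_size))
    for offset in offsets:
        out += offset.to_bytes(off_size, "big")
    for item in items:
        out += item
    return bytes(out)
```

For example, `build_index([b"A", b"BC"])` gives count 2, offSize 1, offsets 1, 2, 4, followed by the bytes `ABC`.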
type1::to_type2 translates a decrypted Type1 charstring to Type2 (CFF): a
stack machine that flattens callsubr (inlining the font's /Subrs, depth
guarded), folds div, lifts the hsbw side bearing into the first moveto and
returns the advance width separately, drops Type1-only hints (dotsection,
*stem3, hint-replacement OtherSubr 3), and translates the flex OtherSubrs
(1/2/0 -> two rrcurvetos) and seac (-> Type2 endchar form). Path operators
(r/h/v lineto, rr/vh/hv curveto, stems, moves, endchar) share opcodes with
Type2 and pass through. Best-effort / display-oriented: hints affect
rendering quality, not glyph shape.

Tests: exact Type2 output for hsbw width + side-bearing folding into the
first move, callsubr inlining, and div folding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
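Two of the translation steps above, folding `div` and lifting the `hsbw` side bearing into the first moveto, can be sketched on an already-tokenized charstring. This is a simplified Python illustration, not `type1::to_type2`: the token format (numbers plus operator names as strings) and the name `fold_hsbw_and_div` are assumptions, and subroutines, flex, seac and hint removal are left out. It also assumes the outline starts with a moveto, as Type1 glyphs normally do.

```python
MOVES = {"rmoveto", "hmoveto", "vmoveto"}


def fold_hsbw_and_div(tokens):
    """Return (advance_width, translated_tokens) for a tokenized charstring."""
    stack, out = [], []
    width = None
    sbx = 0                      # side bearing still waiting for the first move
    for tok in tokens:
        if not isinstance(tok, str):
            stack.append(tok)    # operand
            continue
        if tok == "div":         # fold a b div into a single operand
            b, a = stack.pop(), stack.pop()
            q = a / b
            stack.append(int(q) if q == int(q) else q)
            continue
        if tok == "hsbw":        # sbx wx hsbw: remember both, emit nothing
            sbx, width = stack
            stack.clear()
            continue
        if tok in MOVES and sbx:
            # Type1 starts the path at (sbx, 0), Type2 at (0, 0), so the
            # first move absorbs sbx; it is normalized to rmoveto here.
            if tok == "hmoveto":
                dx, dy = stack[0], 0
            elif tok == "vmoveto":
                dx, dy = 0, stack[0]
            else:
                dx, dy = stack
            stack, tok = [dx + sbx, dy], "rmoveto"
            sbx = 0
        out.extend(stack)
        out.append(tok)
        stack.clear()
    return width, out
```

The width is returned separately, matching the commit's description; a CFF writer would later emit it relative to `nominalWidthX`.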
type1::to_cff translates every glyph (to_type2, flattening /Subrs), places
.notdef at glyph 0 (synthesizing one when absent) and assembles a CFF via the
builder. load_embedded_font now reads /FontFile: parse the Type1 program,
convert to CFF, and hold it as a CffFont — so embedded Type1 reuses the entire
3.4 CFF path (PUA re-encode, @font-face wrap, reverse map) with no new
abstract::Font subclass.

Simple-font glyph selection by PostScript name (PDF /Encoding -> name -> glyph)
is the shared CFF/Type1 follow-up tied to the AGL/name-mapping decision;
composite and the wrap/display path work today.

Tests: a Type1 program converts to a CFF that reads back through CffFont
(glyph count incl. synthesized .notdef, names) and wraps to a loadable OTTO.
Full font + PDF + HTML corpus green (460 tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
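The `.notdef` handling above can be illustrated in a few lines. This Python sketch is not `type1::to_cff`; `order_glyphs` is a hypothetical name, and the synthesized glyph is assumed to be a bare `endchar` (Type2 opcode 14), which is the smallest valid empty outline.

```python
ENDCHAR = bytes([14])            # Type2 charstring containing only endchar


def order_glyphs(glyphs):
    """Return (name, charstring) pairs with .notdef guaranteed at index 0.

    `glyphs` maps glyph names to Type2 charstrings; the relative order of
    the other glyphs is preserved.
    """
    notdef = glyphs.get(".notdef", ENDCHAR)   # synthesize when absent
    rest = [(name, cs) for name, cs in glyphs.items() if name != ".notdef"]
    return [(".notdef", notdef)] + rest
```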
Seed the stage-3.6 branch. Type3 char procs -> SVG via a minimal path->SVG
capability pulled forward from stage 4; non-embedded standard-14 substitution
+ AFM widths (closes stage 2's deferred item). Stacked on 3.5.
Implementation follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
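As an illustration of the standard-14 substitution idea, the sketch below maps a PDF `/BaseFont` name to CSS font properties. It is a Python sketch rather than the project's code: the function name `css_font_for` and the specific fallback stacks are assumptions, not the stacks the design doc settles on. Symbol and ZapfDingbats are left unmapped here because they need glyph-level handling rather than a family swap.

```python
# Assumed fallback stacks, chosen for illustration only.
FAMILY_STACKS = {
    "Times": '"Times New Roman", Times, serif',
    "Helvetica": "Helvetica, Arial, sans-serif",
    "Courier": '"Courier New", Courier, monospace',
}


def css_font_for(base_font):
    """Map a standard-14 text font name to CSS properties, or None."""
    family, _, style = base_font.partition("-")
    stack = FAMILY_STACKS.get(family)
    if stack is None:
        return None              # e.g. Symbol, ZapfDingbats, unknown names
    return {
        "font-family": stack,
        "font-weight": "bold" if "Bold" in style else "normal",
        "font-style": "italic" if ("Italic" in style or "Oblique" in style) else "normal",
    }
```

For instance, `css_font_for("Helvetica-BoldOblique")` selects the sans-serif stack with bold weight and italic style, while `css_font_for("Times-Roman")` keeps both at `normal`.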
@andiwand andiwand force-pushed the pdf-stage-3.6-type3-nonembedded branch from 5f7e680 to d5ed651 Compare June 24, 2026 10:30
@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from 29cdc2f to 5908b96 Compare June 24, 2026 10:30
@andiwand andiwand changed the title from "PDF stage 3.6: Type3 glyphs + non-embedded fonts" to "PDF stage 4: Type3 glyphs + non-embedded fonts (deferred from stage 3)" Jun 24, 2026
@andiwand andiwand force-pushed the pdf-stage-3.5-type1 branch from 5908b96 to fac6b57 Compare June 24, 2026 11:07
Base automatically changed from pdf-stage-3.5-type1 to main June 24, 2026 14:18