
lukeswindale/threadfold


Threadfold


An open format for capturing communication and work artifacts for AI consumption.

Your working life arrives as threads — email chains, chat conversations, a meeting and its transcript, an issue and its comments. Threadfold gathers those scattered threads and folds them into one durable, plain-file record that both a person and a language model can follow: a structured, portable, on-disk archive of your email, chat, meetings, notes, calendar, issues, and documents, captured once and kept in a form that outlives the software that made it.

Threadfold is owned by no product. Anything can write it (a browser capturer, a CLI exporter, an IMAP or Notion bridge) and anything can read it (a retrieval index, a personal "second brain", an analytics tool). A conforming writer and a conforming reader interoperate with no further coordination.

your tools ──▶  capturer (writes Threadfold)  ──▶  📁 Threadfold archive  ──▶  reader (indexes Threadfold)  ──▶  LLM / search

Why a format, not just an app

  • Portable. Your archive is plain files — Markdown, JSON, WebVTT — that you can back up, grep, diff, and read in fifty years without the original software.
  • LLM-ready. Each item's body is a self-contained context unit: it opens with YAML frontmatter carrying its provenance, so any chunk sliced from it stays attributable without a database lookup.
  • Source-agnostic. Items are classified by shape (message, conversation, transcript, note, event, task, document, image) rather than by which app produced them, so a reader understands a brand-new source for free.
  • Lossless. Source-specific fields with no standard home are preserved verbatim in a namespaced escape hatch.
  • Forward compatible. Readers ignore what they don't recognise and fall back to the core, so a newer writer never breaks an older reader.
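The forward-compatibility rule can be sketched as a reader that keeps the core and quietly drops everything else. This is a minimal sketch under stated assumptions, not a reference reader: CORE_KEYS is copied from the example item in this README rather than from SPEC.md (which holds the normative core), and read_core and the shapeKnown flag are hypothetical names.

```python
# Assumption: these six keys stand in for the mandatory core; they are copied
# from the example item in this README, and SPEC.md is the normative list.
CORE_KEYS = ("formatVersion", "itemId", "source", "itemType", "title", "composedAt")

# The eight shapes listed above.
KNOWN_SHAPES = {"message", "conversation", "transcript", "note",
                "event", "task", "document", "image"}


def read_core(meta: dict) -> dict:
    """Keep the core fields of a parsed -metadata.json and ignore everything else."""
    core = {k: meta[k] for k in CORE_KEYS if k in meta}
    # A newer writer may emit a shape this reader predates: keep the item as
    # core-only instead of rejecting it.
    core["shapeKnown"] = core.get("itemType") in KNOWN_SHAPES
    return core
```

An item carrying an unfamiliar itemType or extra top-level keys still comes back with its identity intact; the reader simply has less to say about it.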

The shape of an item

Every item is two required files plus optional siblings:

2026-05-28_091400_Re-Q3-planning.md            # body + frontmatter (the context unit)
2026-05-28_091400_Re-Q3-planning-metadata.json # structured metadata (the retrieval surface)
2026-05-28_091400_Re-Q3-planning.json          # raw source payload (recommended)
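The three names above share one stem. A hedged sketch of how a writer might derive it, assuming (from the example alone) that the stem is the composedAt wall-clock time followed by a hyphenated title slug; SPEC.md §2 defines the real layout, and item_stem and item_files are hypothetical helpers.

```python
import re
from datetime import datetime


def item_stem(composed_at: str, title: str) -> str:
    """Build a filename stem such as 2026-05-28_091400_Re-Q3-planning (hypothetical rule)."""
    # Keep the wall-clock time in the item's own UTC offset, as the example names do.
    ts = datetime.fromisoformat(composed_at)
    # Collapse each run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    return f"{ts:%Y-%m-%d_%H%M%S}_{slug}"


def item_files(stem: str) -> dict:
    """The sibling files that make up one item."""
    return {
        "body": stem + ".md",                 # required: frontmatter + body
        "metadata": stem + "-metadata.json",  # required: structured metadata
        "raw": stem + ".json",                # recommended: raw source payload
    }
```

item_stem("2026-05-28T09:14:00+10:00", "Re: Q3 planning") gives "2026-05-28_091400_Re-Q3-planning". A title written entirely in non-ASCII characters would produce an empty slug under this rule, which is one reason to follow SPEC.md §2 rather than this sketch.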

The metadata has a small mandatory core, optional standard blocks (account, participants, conversation, channel, links, derivations), at most one typed shape block, and the sourceExtra escape hatch. See SPEC.md for the full normative definition and examples/ for one worked item per shape.

Every .md opens with traditional --- YAML frontmatter carrying the core identity (the hidden provenance block a chunker relies on). For a visible metadata block, Threadfold 1.1.1 makes the meta:frontmatter fenced form canonical: a fence opened with ```yaml meta:frontmatter that carries the payload directly, with no --- wrapper. The earlier 1.1.0 form (the same fenced block with its payload wrapped in --- lines) and plain document frontmatter remain accepted for import. See SPEC.md §3.2.
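A reader therefore looks for two blocks in each .md. The sketch below only locates them and returns their raw YAML text (parsing it, for example with PyYAML's yaml.safe_load, is left out). It assumes \n line endings and does not handle the legacy 1.1.0 wrapped form; split_item and the regex names are hypothetical, and SPEC.md §3.2 has the exact grammar.

```python
import re

FENCE = "`" * 3  # built at runtime so this snippet can itself live inside a Markdown fence

# Hidden provenance: "---" frontmatter that must open the file.
HIDDEN = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)

# Visible 1.1.1 form: a fence whose info string is "yaml meta:frontmatter".
VISIBLE = re.compile(
    "^" + FENCE + r"[ \t]*yaml meta:frontmatter[ \t]*\n(.*?)\n" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_item(md: str) -> tuple:
    """Return (hidden YAML text or None, visible YAML text or None, body after the hidden block)."""
    hidden = HIDDEN.match(md)
    body = md[hidden.end():] if hidden else md
    visible = VISIBLE.search(body)
    return (hidden.group(1) if hidden else None,
            visible.group(1) if visible else None,
            body)
```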

{
  "formatVersion": "1.1.1",
  "itemId": "acme-mail:AAQk0001",
  "source": "ms-outlook",
  "itemType": "message",
  "title": "Re: Q3 planning",
  "composedAt": "2026-05-28T09:14:00+10:00",
  "account": { "source": "ms-outlook", "id": "you@acme.com", "email": "you@acme.com" },
  "participants": [ { "role": "author", "name": "Sarah Chen", "email": "sarah@acme.com" } ],
  "conversation": { "id": "conv_AAQk", "position": "reply", "depth": 1, "index": 3 },
  "message": { "format": "markdown", "importance": "high" }
}
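Because the metadata is the retrieval surface, a chunker can stamp every slice of the body with provenance taken from it. A minimal sketch, assuming the field names shown in the example above; the header format and both function names are hypothetical, and the fixed-width split stands in for a real paragraph- or token-aware chunker.

```python
def provenance_header(meta: dict) -> str:
    """One-line attribution built from -metadata.json fields (hypothetical format)."""
    authors = []
    for p in meta.get("participants", []):
        who = p.get("name") or p.get("email")
        if p.get("role") == "author" and who:
            authors.append(who)
    parts = [meta[k] for k in ("title", "source", "composedAt") if meta.get(k)]
    if authors:
        parts.append("from " + ", ".join(authors))
    return "[" + " | ".join(parts) + "] (" + meta["itemId"] + ")"


def attributed_chunks(meta: dict, body: str, size: int = 800) -> list:
    """Split a body into fixed-width chunks, each prefixed with the provenance header."""
    header = provenance_header(meta)
    # Naive character windows; a production chunker would split on paragraphs or tokens.
    return [header + "\n" + body[i:i + size] for i in range(0, len(body), size)]
```

For the example item the header reads "[Re: Q3 planning | ms-outlook | 2026-05-28T09:14:00+10:00 | from Sarah Chen] (acme-mail:AAQk0001)", so a chunk retrieved on its own still names its source.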

Conformance

A writer conforms to Threadfold 1.1.1 if every item it emits has a .md (with frontmatter) and a -metadata.json that validates against schema/v1.1.1/metadata.schema.json, every path follows the layout in SPEC.md §2, and a manifest.json validating against schema/v1.1.1/manifest.schema.json sits at the archive root.
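Schema validation covers the JSON files, but the pairing and root-manifest requirements are structural. A hedged sketch of those extra checks, assuming every .md under the archive root is an item body; structural_problems is a hypothetical helper and does not implement the full SPEC.md §2 layout.

```python
from pathlib import Path


def structural_problems(root: Path) -> list:
    """Report layout problems a JSON Schema validator cannot see (hypothetical helper)."""
    problems = []
    if not (root / "manifest.json").is_file():
        problems.append("manifest.json: missing at archive root")
    # Assumption: every .md under the root is an item body.
    for md in sorted(root.rglob("*.md")):
        rel = md.relative_to(root)
        sibling = md.with_name(md.stem + "-metadata.json")
        if not sibling.is_file():
            problems.append(f"{rel}: no sibling {sibling.name}")
        if not md.read_text(encoding="utf-8").startswith("---"):
            problems.append(f"{rel}: no leading --- frontmatter")
    return problems
```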

Validate an archive with any JSON Schema validator. Example with Python:

pip install jsonschema
python - <<'PY'
import json, glob
from jsonschema import Draft202012Validator, FormatChecker
v = Draft202012Validator(json.load(open("schema/v1.1.1/metadata.schema.json")), format_checker=FormatChecker())
for f in glob.glob("examples/**/*-metadata.json", recursive=True):
    errs = list(v.iter_errors(json.load(open(f))))
    print(("FAIL " if errs else "ok   ") + f)
    for e in errs: print("   -", e.message)
PY

Repository layout

SPEC.md                     # the normative specification
schema/v1.1.1/              # JSON Schemas (the conformance mechanism)
  metadata.schema.json
  manifest.schema.json
examples/                   # worked items across 11 source types & 8 shapes, plus the
                            #   referenced-original, fan-out, windowed-stream, and
                            #   supersede patterns (synthetic data)
CHANGELOG.md
VERSIONING.md
CONTRIBUTING.md
LICENSE

Status

Threadfold 1.1.1 is the current release; 1.0.0 was the initial stable release (see CHANGELOG.md). 1.1.1 is backward-compatible — it makes the meta:frontmatter fenced block the preferred visible metadata form (SPEC.md §3.2) while keeping 1.0.0/1.1.0 content valid for import. The format is implemented by at least one capturer and one reader; see the specification's references for the standards it builds on (JMAP, ActivityStreams, schema.org, WebVTT).
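One way a reader might act on this compatibility promise is a version gate on formatVersion. The sketch assumes MAJOR.MINOR.PATCH versions where only a major bump breaks readers; that is an assumption, and VERSIONING.md is the authority on what counts as breaking. can_import and READER_MAJOR are hypothetical names.

```python
import re

# Assumption: formatVersion is MAJOR.MINOR.PATCH and only a major bump breaks
# readers; VERSIONING.md is the authority on what actually counts as breaking.
READER_MAJOR = 1


def can_import(format_version: str) -> bool:
    """Decide whether a reader built for 1.1.1 should accept an item's formatVersion."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", format_version)
    # Same major line: older items import as-is, newer ones via the ignore-unknowns rule.
    return bool(m) and int(m.group(1)) == READER_MAJOR
```

Under this rule 1.0.0, 1.1.0 and 1.1.1 items all import, while a 2.0.0 item or a malformed version string is refused rather than misread.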

License

The specification text is licensed CC BY 4.0. The JSON Schemas and examples are released under CC0 1.0 (public domain) so that implementing Threadfold carries no licensing friction. See LICENSE.

Validating locally

The same check CI runs on every push:

pip install jsonschema
python tests/validate.py examples      # or any Threadfold archive path

About

An open, on-disk format for capturing communication and work artifacts (email, chat, meetings, notes, calendar, issues, documents) as structured, portable, LLM-ready files.
