Citation retrieval and Markdown extraction sidecar for AI agents — turns BibTeX references into local PDF and Markdown folders.
citation-needed ingests a BibTeX file, stores citation metadata, downloads PDFs when they can be resolved, and converts those PDFs into Markdown using JavaScript.
- 📚 BibTeX-first workflow — process a
.bibfile in one run - 🗄️ SQLite database — track citation metadata, PDF paths, and processing status
- 🔓 Open-access retrieval — arXiv API and Unpaywall API
- 🔒 Authenticated PDF download — via Playwright for proxy-gated content (optional)
- 📝 PDF to Markdown extraction — convert downloaded PDFs into Markdown in JavaScript
- 📁 Automatic output folders — write PDFs to
papers/pdf/and Markdown topapers/markdown/by default - 🤖 MCP server — Model Context Protocol tools for citation metadata and retrieval
npm install -g citation-needed
# or run locally:
git clone https://github.com/your-org/citation-needed
cd citation-needed
npm install
npm run buildFor authenticated PDF downloads via Playwright (optional):
npm install playwright
npx playwright install chromium# Import a BibTeX file, download PDFs into ./papers/pdf, and write Markdown into ./papers/markdown
citation-needed import-bibtex references.bib
# Override the PDF output directory for the run
citation-needed import-bibtex references.bib --paper-path ./downloaded-papers
# List stored citations
citation-needed list
# Download a single PDF manually if needed
citation-needed download 10.1234/example.doi --url https://arxiv.org/pdf/2301.12345
# Start the MCP server (stdio transport)
citation-needed serverBy default, import-bibtex writes PDFs to a papers/pdf/ folder next to the BibTeX file and Markdown files to papers/markdown/. File naming is determined by the current retrieval and conversion pipeline. Use --paper-path to change the PDF output directory for that run.
| Variable | Default | Description |
|---|---|---|
CITATION_NEEDED_DIR |
~/.citation-needed |
Base data directory (auth config, db, pdf defaults) |
CITATION_NEEDED_DB |
~/.citation-needed/citations.db |
Path to SQLite database |
CITATION_NEEDED_PDF_DIR |
~/.citation-needed/pdfs |
Fallback directory for standalone PDF downloads |
CITATION_NEEDED_EMAIL |
(unset) | Contact email sent to Unpaywall and the Crossref User-Agent |
LOG_LEVEL |
info |
Logger verbosity: debug / info / warn / error / silent |
See .env.example for a copy-paste starter.
Add to your MCP client configuration (e.g., Claude Desktop claude_desktop_config.json):
{
"mcpServers": {
"citation-needed": {
"command": "citation-needed",
"args": ["server"]
}
}
}| Tool | Description | Required params |
|---|---|---|
get-citation |
Get citation details | doi |
import-bibtex |
Import citations from BibTeX string into the database | bibtex |
download-pdf |
Download PDF for a citation | doi |
list-citations |
List all citations | — |
search-arxiv |
Search arXiv by paper title | title |
npm run build # Compile TypeScript
npm test # Run Jest tests
npm run dev # Run with ts-node (no build required)src/
├── index.ts # Entry point
├── models/ # Shared TypeScript interfaces
├── utils/ # Logger, RateLimiter, file helpers
├── parsers/ # BibTeX, DOI, URL parsers
├── db/ # SQLite database (schema + Database class)
├── retrieval/
│ ├── resolvers/ # arXiv, Unpaywall, DOI/Crossref resolvers
│ ├── downloaders/ # Open-access + authenticated PDF downloaders
│ ├── publishers/ # Publisher URL adapters (Springer, Elsevier, ACM)
│ └── index.ts # RetrievalOrchestrator
├── verification/ # PDF Markdown extraction helpers
├── workflows/ # BibTeX batch processing workflow
├── mcp/ # MCP server with per-tool modules
├── tui/ # Ink React components
└── cli/ # Commander CLI with per-command files
MIT
