A web app for scientists to review LLM-extracted claims and evaluations from research papers. The cllm tool produces claims.json and eval_llm.json per paper — this UI lets a reviewer browse results, agree/disagree, override judgments, add comments, compare runs, and save reviews as per-reviewer JSON files alongside existing paper data.
Requires Python 3.12+ and uv.
- Clone the repository and `cd` into it.
- Install dependencies:

  ```
  uv sync
  ```

- Start the server:

  ```
  uv run python main.py
  ```

- Open http://127.0.0.1:8000 in your browser (or set the `PORT` environment variable to use a different port).
The papers/ directory should contain paper data (XML source files, claims.json, eval_llm.json) for the UI to display.
To generate cross-run similarity reports (displayed in the UI), run compare_results.py for any paper that has multiple runs:
```
uv run python compare_results.py \
  --runs papers/<paper-id>/<run-a> papers/<paper-id>/<run-b> \
  --labels "run-a-label" "run-b-label" \
  --output papers/<paper-id>/compare_results_report.json
```

The report is saved as `compare_results_report.json` in the paper directory and is automatically displayed in the UI on that paper's runs page.
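Outside the UI, the report can be inspected directly. A minimal sketch, assuming only what is stated above (the report lives at `<paper-dir>/compare_results_report.json`); no assumptions are made about the fields inside it:

```python
import json
from pathlib import Path

def load_report(paper_dir: str) -> dict:
    """Load a paper's cross-run similarity report as plain JSON.

    The path matches the documented layout:
    <paper_dir>/compare_results_report.json
    """
    path = Path(paper_dir) / "compare_results_report.json"
    return json.loads(path.read_text())
```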
Each paper lives in its own directory under papers/, identified by a bioRxiv DOI suffix or custom name. Within each paper directory, dated subdirectories hold versioned pipeline runs. When you save a review in the UI, it is written to a reviews/ folder inside the corresponding run directory.
```
papers/
├── 2025.12.02.691876/                    # Paper directory (bioRxiv DOI suffix)
│   ├── 2025.12.02.691876.source.xml      # GROBID-parsed TEI/JATS XML
│   ├── compare_results_report.json       # Cross-run similarity report (generated by compare_results.py)
│   ├── comparisons/                      # Per-reviewer run comparison files
│   │   └── comparison_Dr_Smith.json
│   └── 20260206/                         # Run directory (dated)
│       ├── claims.json                   # Extracted claims
│       ├── eval_llm.json                 # LLM evaluation results
│       └── reviews/                      # Created automatically on first save
│           ├── review_Dr_Smith.json
│           └── review_Jane_Doe.json
└── nikbakht_diamond/                     # Paper directory (custom name)
    ├── elife-66429-v2.pdf.tei.xml
    ├── compare_results_report.json
    ├── comparisons/
    └── 20260316_anthropic/
        ├── claims.json
        ├── eval_llm.json
        └── reviews/
```
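The layout above can be walked programmatically. This is a sketch that mirrors the documented structure (a run is taken to be any paper subdirectory containing a `claims.json`), not necessarily the server's actual discovery logic:

```python
from pathlib import Path

def discover_runs(papers_root: str = "papers") -> dict[str, list[str]]:
    """Map each paper directory under papers_root to its run subdirectories.

    Treats any subdirectory containing claims.json as a run, per the
    directory structure shown above.
    """
    runs: dict[str, list[str]] = {}
    for paper in sorted(Path(papers_root).iterdir()):
        if not paper.is_dir():
            continue
        run_ids = sorted(d.name for d in paper.iterdir()
                         if d.is_dir() and (d / "claims.json").exists())
        if run_ids:
            runs[paper.name] = run_ids
    return runs
```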
Once you've completed your review, submit it via a pull request:
- Fork this repository on GitHub and clone your fork:

  ```
  git clone https://github.com/<your-username>/<repo-name>.git
  cd <repo-name>
  ```

- Create a branch for your review:

  ```
  git checkout -b review/<your-name>
  ```

- Complete your review in the UI, then add your review and comparison file(s). The patterns below cover all nesting levels (see Directory Structure):

  ```
  git add papers/*/reviews/ papers/*/*/reviews/ papers/*/comparisons/
  ```

  To submit a review for a specific paper only:

  ```
  git add papers/<paper-id>/*/reviews/ papers/<paper-id>/comparisons/
  ```

- Commit and push:

  ```
  git commit -m "Review: <paper title> by <your name>"
  git push -u origin review/<your-name>
  ```

- Open a pull request on GitHub targeting the original repository's `main` branch.
Each reviewer should use their own branch. If you are reviewing multiple papers, you can include all review files in a single PR.
On first load, a name prompt collects your reviewer name once. It is reused for all reviews and comparisons in that session.
The UI has three views, navigable with the browser's back/forward buttons:
- Paper list — Browse all papers. Includes reviewer instructions and a Refresh button to pick up newly added runs without restarting the server.
- Runs list — Click a paper to see all its pipeline runs. Hover over any row to see a tooltip with model, cost, and processing time. If a `compare_results_report.json` exists for the paper, a Run Similarity Analysis section appears showing an automated cross-run comparison: a summary metrics table (result similarity, match rate, evaluation type agreement, result type agreement, claim-set Jaccard, and claim similarity) and a per-pair accordion with matched/unmatched results and their underlying claim pairs. Column headers have hover tooltips explaining each metric. Below that, a Compare Runs section lets you drag runs into a ranked order and add free-form notes about which run performed better and why. This is saved per-reviewer as `comparisons/comparison_{name}.json` at the paper level.
- Paper review — Click a run to open it. If you have an existing review, you are asked whether to continue or start fresh; otherwise you go straight in. The review page includes:
- A sticky reference sidebar with key definitions and review actions
- Scrollable result cards showing the LLM's grouped evaluation
- Agree/Disagree on each result, with override dropdowns on disagree
- Accept/Oppose on individual claims
- "View in paper" side panel that highlights the source passage in the original text
- Per-claim and per-result comments
- An Overall Review text box at the bottom for free-form observations
- Auto-save throughout (progress is saved automatically as you review)
Reviews are saved as `review_{reviewer_name}.json` in the `reviews/` subdirectory (see Directory Structure), so multiple reviewers can work on the same paper independently.
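The example tree above shows names like `review_Dr_Smith.json`, which suggests spaces in reviewer names map to underscores. A sketch of that mapping, with the caveat that the server's exact sanitization may differ:

```python
def review_filename(reviewer_name: str) -> str:
    """Per-reviewer review file name, e.g. 'Dr Smith' -> 'review_Dr_Smith.json'.

    Assumes only the space-to-underscore convention visible in the
    directory structure examples.
    """
    return f"review_{reviewer_name.strip().replace(' ', '_')}.json"
```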
- Backend: FastAPI + Jinja2 + uvicorn
- Frontend: Vanilla HTML/CSS/JS (no build step)
- Storage: JSON files in `reviews/` subdirectories per paper
| Method | Path | Description |
|---|---|---|
| GET | `/` | Serve the review UI |
| GET | `/api/papers` | List all reviewable paper/run combos |
| GET | `/api/papers/{paper_id}/{run_id}/results` | Eval results with claims inlined |
| GET | `/api/papers/{paper_id}/text` | Paper text extracted from XML |
| GET | `/api/papers/{paper_id}/{run_id}/review?reviewer=` | Load a reviewer's review |
| POST | `/api/papers/{paper_id}/{run_id}/review` | Save/merge a review |
| GET | `/api/papers/{paper_id}/comparison?reviewer=` | Load a reviewer's run comparison |
| POST | `/api/papers/{paper_id}/comparison` | Save a run comparison |
| GET | `/api/papers/{paper_id}/results-comparison` | Load cross-run similarity report |
| POST | `/api/papers/refresh` | Clear paper discovery cache |
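For scripted access, endpoint URLs can be built from the table above. A minimal sketch assuming the default server address (adjust if you set `PORT`); it only constructs the URL and does not contact the server:

```python
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:8000"  # default address; change if PORT is set

def review_url(paper_id: str, run_id: str, reviewer: str) -> str:
    """URL for loading a reviewer's saved review (GET), per the API table."""
    return (f"{BASE}/api/papers/{quote(paper_id, safe='')}/"
            f"{quote(run_id, safe='')}/review?"
            f"{urlencode({'reviewer': reviewer})}")

print(review_url("2025.12.02.691876", "20260206", "Dr Smith"))
# → http://127.0.0.1:8000/api/papers/2025.12.02.691876/20260206/review?reviewer=Dr+Smith
```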