A web app for scientists to review LLM-extracted claims and evaluations from research papers. The cllm tool produces claims.json and eval_llm.json per paper — this UI lets a reviewer browse results, agree/disagree, override judgments, add comments, compare runs, and save reviews as per-reviewer JSON files alongside existing paper data.
Requires Python 3.12+ and uv.
- Clone the repository and `cd` into it.
- Install dependencies:

  ```
  uv sync
  ```

- Start the server:

  ```
  uv run python main.py
  ```

- Open http://127.0.0.1:8000 in your browser (or set the `PORT` environment variable to use a different port).
The papers/ directory should contain paper data (XML source files, claims.json, eval_llm.json) for the UI to display.
To generate cross-run similarity reports (displayed in the UI), run compare_results.py for any paper that has multiple runs:
```
uv run python compare_results.py \
  --runs papers/<paper-id>/<run-a> papers/<paper-id>/<run-b> \
  --labels "run-a-label" "run-b-label" \
  --output papers/<paper-id>/compare_results_report.json
```

The report is saved as `compare_results_report.json` in the paper directory and is automatically displayed in the UI on that paper's runs page.
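Outside the UI, the report can be inspected directly. A minimal sketch, assuming only what is stated above (the report lives at `<paper-dir>/compare_results_report.json`); no assumptions are made about the fields inside it:

```python
import json
from pathlib import Path

def load_report(paper_dir: str) -> dict:
    """Load a paper's cross-run similarity report as plain JSON.

    The path matches the documented layout:
    <paper_dir>/compare_results_report.json
    """
    path = Path(paper_dir) / "compare_results_report.json"
    return json.loads(path.read_text())
```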
Each paper lives in its own directory under papers/, identified by a bioRxiv DOI suffix or custom name. Within each paper directory, dated subdirectories hold versioned pipeline runs. When you save a review in the UI, it is written to a reviews/ folder inside the corresponding run directory.
```
papers/
├── 2025.12.02.691876/                    # Paper directory (bioRxiv DOI suffix)
│   ├── 2025.12.02.691876.source.xml      # GROBID-parsed TEI/JATS XML
│   ├── compare_results_report.json       # Cross-run similarity report (generated by compare_results.py)
│   ├── comparisons/                      # Per-reviewer run comparison files
│   │   └── comparison_Dr_Smith.json
│   └── 20260206/                         # Run directory (dated)
│       ├── claims.json                   # Extracted claims
│       ├── eval_llm.json                 # LLM evaluation results
│       └── reviews/                      # Created automatically on first save
│           ├── review_Dr_Smith.json
│           └── review_Jane_Doe.json
└── nikbakht_diamond/                     # Paper directory (custom name)
    ├── elife-66429-v2.pdf.tei.xml
    ├── compare_results_report.json
    ├── comparisons/
    └── 20260316_anthropic/
        ├── claims.json
        ├── eval_llm.json
        └── reviews/
```
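The layout above can be walked programmatically. This is a sketch that mirrors the documented structure (a run is taken to be any paper subdirectory containing a `claims.json`), not necessarily the server's actual discovery logic:

```python
from pathlib import Path

def discover_runs(papers_root: str = "papers") -> dict[str, list[str]]:
    """Map each paper directory under papers_root to its run subdirectories.

    Treats any subdirectory containing claims.json as a run, per the
    directory structure shown above.
    """
    runs: dict[str, list[str]] = {}
    for paper in sorted(Path(papers_root).iterdir()):
        if not paper.is_dir():
            continue
        run_ids = sorted(d.name for d in paper.iterdir()
                         if d.is_dir() and (d / "claims.json").exists())
        if run_ids:
            runs[paper.name] = run_ids
    return runs
```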
Once you've completed your review, submit it via a pull request:
- Fork this repository on GitHub and clone your fork:

  ```
  git clone https://github.com/<your-username>/<repo-name>.git
  cd <repo-name>
  ```

- Create a branch for your review:

  ```
  git checkout -b review/<your-name>
  ```

- Complete your review in the UI, then add your review and comparison file(s). The patterns below cover all nesting levels (see Directory Structure):

  ```
  git add papers/*/reviews/ papers/*/*/reviews/ papers/*/comparisons/
  ```

  To submit a review for a specific paper only:

  ```
  git add papers/<paper-id>/*/reviews/ papers/<paper-id>/comparisons/
  ```

- Commit and push:

  ```
  git commit -m "Review: <paper title> by <your name>"
  git push -u origin review/<your-name>
  ```

- Open a pull request on GitHub targeting the original repository's `main` branch.
Each reviewer should use their own branch. If you are reviewing multiple papers, you can include all review files in a single PR.
On first load, a name prompt collects your reviewer name once. It is reused for all reviews and comparisons in that session.
The UI has three views, navigable with the browser's back/forward buttons:
- Paper list — Browse all papers. Includes reviewer instructions and a Refresh button to pick up newly added runs without restarting the server.
- Runs list — Click a paper to see all its pipeline runs. Hover over any row to see a tooltip with model, cost, and processing time. If a `compare_results_report.json` exists for the paper, a Run Similarity Analysis section appears showing an automated cross-run comparison: a summary metrics table (result similarity, match rate, evaluation type agreement, result type agreement, claim-set Jaccard, and claim similarity) and a per-pair accordion with matched/unmatched results and their underlying claim pairs. Column headers have hover tooltips explaining each metric. Below that, a Compare Runs section lets you drag runs into a ranked order and add free-form notes about which run performed better and why. This is saved per-reviewer as `comparisons/comparison_{name}.json` at the paper level.
- Paper review — Click a run to open it. If you have an existing review, you are asked whether to continue or start fresh; otherwise you go straight in. The review page includes:
- A sticky reference sidebar with key definitions and review actions
- Scrollable result cards showing the LLM's grouped evaluation
- Agree/Disagree on each result, with override dropdowns on disagree
- Accept/Oppose on individual claims
- "View in paper" side panel that highlights the source passage in the original text
- Per-claim and per-result comments
- An Overall Review text box at the bottom for free-form observations
- Auto-save throughout (progress is saved automatically as you review)
Reviews are saved as `review_{reviewer_name}.json` in the `reviews/` subdirectory (see Directory Structure), so multiple reviewers can work on the same paper independently.
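The example tree above shows names like `review_Dr_Smith.json`, which suggests spaces in reviewer names map to underscores. A sketch of that mapping, with the caveat that the server's exact sanitization may differ:

```python
def review_filename(reviewer_name: str) -> str:
    """Per-reviewer review file name, e.g. 'Dr Smith' -> 'review_Dr_Smith.json'.

    Assumes only the space-to-underscore convention visible in the
    directory structure examples.
    """
    return f"review_{reviewer_name.strip().replace(' ', '_')}.json"
```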
- Backend: FastAPI + Jinja2 + uvicorn
- Frontend: Vanilla HTML/CSS/JS (no build step)
- Storage: JSON files in `reviews/` subdirectories per paper
| Method | Path | Description |
|---|---|---|
| GET | `/` | Serve the review UI |
| GET | `/api/papers` | List all reviewable paper/run combos |
| GET | `/api/papers/{paper_id}/{run_id}/results` | Eval results with claims inlined |
| GET | `/api/papers/{paper_id}/text` | Paper text extracted from XML |
| GET | `/api/papers/{paper_id}/{run_id}/review?reviewer=` | Load a reviewer's review |
| POST | `/api/papers/{paper_id}/{run_id}/review` | Save/merge a review |
| GET | `/api/papers/{paper_id}/comparison?reviewer=` | Load a reviewer's run comparison |
| POST | `/api/papers/{paper_id}/comparison` | Save a run comparison |
| GET | `/api/papers/{paper_id}/results-comparison` | Load cross-run similarity report |
| POST | `/api/papers/refresh` | Clear paper discovery cache |
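For scripted access, endpoint URLs can be built from the table above. A minimal sketch assuming the default server address (adjust if you set `PORT`); it only constructs the URL and does not contact the server:

```python
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:8000"  # default address; change if PORT is set

def review_url(paper_id: str, run_id: str, reviewer: str) -> str:
    """URL for loading a reviewer's saved review (GET), per the API table."""
    return (f"{BASE}/api/papers/{quote(paper_id, safe='')}/"
            f"{quote(run_id, safe='')}/review?"
            f"{urlencode({'reviewer': reviewer})}")

print(review_url("2025.12.02.691876", "20260206", "Dr Smith"))
# → http://127.0.0.1:8000/api/papers/2025.12.02.691876/20260206/review?reviewer=Dr+Smith
```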