A stem-cell differentiation copilot for computational biology teams.
OpenLineage is an AI agent built to serve the computational biology workflow at stem-cell engineering companies. It combines five tools in one system (single-cell analyst, differentiation predictor, protocol scout, batch-QC monitor, and literature curator), all driven from a single chat interface.
Drop in a scRNA-seq dataset and ask "what lineage is this drifting toward, and which factors would push it to cardiomyocyte fate?"
The agent annotates cell types against CELLxGENE Discover, the Human Cell Atlas, and Tabula Sapiens, maps differentiation trajectories, and cross-references transcription-factor targets from Open Targets, regulatory networks from Reactome and KEGG, and protein-level evidence from UniProt and the Human Protein Atlas. For protocol design it pulls published differentiation recipes from PubMed, bioRxiv, and Europe PMC, and surfaces small-molecule modulators from ChEMBL and DrugBank. For translational and regulatory context it queries ClinicalTrials.gov for cell-therapy trials and openFDA for cell-based product filings.
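Reference-based annotation is, at its core, label transfer: each query cell receives the label of the reference cell type it most resembles. The Annotator is described as built on Scanpy + scVI; the sketch below deliberately swaps in a much simpler technique, nearest-centroid classification by cosine similarity over a shared gene panel, just to show the idea without dependencies. The gene panel, centroid values, `annotate` helper, and `min_similarity` threshold are all hypothetical.

```python
import math

# Hypothetical reference centroids with made-up expression values for a tiny
# marker panel. Real references (CELLxGENE, HCA, Tabula Sapiens) are far larger.
GENES = ["POU5F1", "NANOG", "TNNT2", "NKX2-5", "SOX17", "FOXA2"]  # vector order
REFERENCE_CENTROIDS = {
    "pluripotent": [9.0, 8.5, 0.1, 0.1, 0.2, 0.3],
    "cardiomyocyte": [0.2, 0.1, 9.5, 7.8, 0.1, 0.2],
    "definitive_endoderm": [0.5, 0.4, 0.1, 0.2, 8.7, 7.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def annotate(cell: list[float], min_similarity: float = 0.5) -> tuple[str, float]:
    """Assign the most similar reference cell type, or 'unassigned' if none is close enough."""
    label, score = max(
        ((name, cosine(cell, centroid)) for name, centroid in REFERENCE_CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    return (label if score >= min_similarity else "unassigned"), score


query_cell = [0.3, 0.2, 8.1, 6.9, 0.0, 0.4]
print(annotate(query_cell))
```

The explicit "unassigned" fallback matters in differentiation cultures, where transitional cells often match no mature reference type well.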
- Given a scRNA-seq dataset, return the most likely lineage trajectory, drift risk, and the transcription factors / small molecules that would bias the population toward a target fate.
- Given a target cell type, recommend differentiation strategies from the published literature and predict batch-consistency risk from historical scRNA-seq profiles.
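Lineage prediction with CellRank treats cell-state transitions as a Markov chain in which terminal states are absorbing; the probability of ending up in each terminal state is a cell's fate probability. The toy sketch below computes those probabilities for a hand-written five-state chain by iterating the absorption equations h(s, t) = sum over j of P(s, j) · h(j, t) until they converge. The states and transition values are invented for illustration; CellRank itself builds its transition matrix from data through kernels (for example RNA velocity or pseudotime) and works at single-cell resolution rather than on coarse states.

```python
# Hypothetical coarse states and transition probabilities (each row sums to 1).
# Terminal states are absorbing: they only transition to themselves.
STATES = ["iPSC", "mesoderm", "endoderm", "cardiomyocyte", "hepatocyte"]
TERMINAL = ("cardiomyocyte", "hepatocyte")
P = {
    "iPSC": {"iPSC": 0.5, "mesoderm": 0.3, "endoderm": 0.2},
    "mesoderm": {"mesoderm": 0.4, "cardiomyocyte": 0.5, "endoderm": 0.1},
    "endoderm": {"endoderm": 0.5, "hepatocyte": 0.4, "mesoderm": 0.1},
    "cardiomyocyte": {"cardiomyocyte": 1.0},
    "hepatocyte": {"hepatocyte": 1.0},
}


def fate_probabilities(tol: float = 1e-12, max_iter: int = 10_000) -> dict[str, dict[str, float]]:
    """Probability that a chain started in each state is absorbed in each terminal state."""
    # Boundary condition: a terminal state is absorbed in itself with probability 1.
    probs = {s: {t: float(s == t) for t in TERMINAL} for s in STATES}
    for _ in range(max_iter):
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            for t in TERMINAL:
                # Absorption equation h(s, t) = sum_j P[s][j] * h(j, t),
                # updated in place (Gauss-Seidel style).
                new = sum(p * probs[j][t] for j, p in P[s].items())
                delta = max(delta, abs(new - probs[s][t]))
                probs[s][t] = new
        if delta < tol:
            break
    return probs


for state, fates in fate_probabilities().items():
    print(state, {t: round(v, 3) for t, v in fates.items()})
```

In this toy chain a mesoderm state drains mostly to cardiomyocyte and an endoderm state mostly to hepatocyte, which is the kind of "drifting toward" signal the first use case asks for.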
| Module | Purpose |
|---|---|
| Annotator | Cell-type calls against CELLxGENE / HCA / Tabula Sapiens, built on Scanpy + scVI |
| Trajectory | Pseudotime and lineage prediction with CellRank |
| Retrieval | Unified, cached query layer across 9+ public sources |
| Protocol scout | Mines bioRxiv / PubMed / Europe PMC for differentiation recipes |
| Batch-QC monitor | Flags runs drifting > 2 SD from a reference embedding |
| Dashboard | Streamlit chat-driven UI |
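The Batch-QC rule "drifting > 2 SD from a reference embedding" can be read several ways. One minimal interpretation, sketched below purely as an assumption: summarize each run as a vector in embedding space, measure every reference run's distance to the reference centroid, and flag a new run whose distance exceeds the reference mean by more than two standard deviations. The `drift_flag` helper and the 2-D example vectors are hypothetical; real per-run summaries might be, for example, mean scVI latent coordinates.

```python
import math
import statistics


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [statistics.fmean(column) for column in zip(*vectors)]


def drift_flag(reference_runs: list[list[float]], new_run: list[float],
               n_sd: float = 2.0) -> tuple[bool, float]:
    """Hypothetical drift rule: flag `new_run` when its distance to the reference
    centroid exceeds the reference runs' mean distance by more than `n_sd` SDs.
    Returns (flagged, z_score)."""
    center = centroid(reference_runs)
    ref_dists = [math.dist(run, center) for run in reference_runs]
    mu = statistics.fmean(ref_dists)
    sd = statistics.stdev(ref_dists)  # needs at least two reference runs
    if sd == 0:
        raise ValueError("reference runs have zero spread; z-score is undefined")
    z = (math.dist(new_run, center) - mu) / sd
    return z > n_sd, z


# Made-up 2-D per-run embedding summaries.
reference = [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [0.05, 0.05], [-0.05, -0.05]]
flagged, z = drift_flag(reference, [0.4, 0.3])
print(flagged, round(z, 1))
```

Scoring distance to the centroid, rather than each coordinate separately, keeps the rule to a single z-score per run, which is easy to plot across experiments on the dashboard.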
```text
                         ┌──────────────────────────┐
User question ──────────▶│    Agent (LangChain)     │
                         └────────────┬─────────────┘
                                      │
      ┌───────────────┬───────────────┼───────────────┬───────────────┐
      ▼               ▼               ▼               ▼               ▼
  Annotation      Trajectory      Retrieval       Protocol        Batch-QC
  (Scanpy,        (CellRank,      (cached over    scout           monitor
   scVI)           scFates)        9 sources)     (bioRxiv,       (drift vs.
                                                   Europe PMC)     reference)
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          ▼                           ▼                           ▼
  Single-cell refs            Targets & pathways          Chem & literature
  CELLxGENE Discover          Open Targets                ChEMBL · DrugBank
  Human Cell Atlas            Reactome · KEGG             PubMed · bioRxiv
  Tabula Sapiens              UniProt · HPA               Europe PMC
                                                          ClinicalTrials.gov
                                                          openFDA
                                      │
                                      ▼
                   Streamlit dashboard + benchmark dataset
```
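The Retrieval module is described as a unified, cached query layer. A minimal sketch of that pattern, assuming an in-memory cache with a time-to-live: every source sits behind the same `fetch(query)` signature, and results are memoized by `(source, normalized query)`. The `RetrievalLayer` class, the 24-hour TTL, and the stub fetchers are hypothetical; real fetchers would call each source's public API, and a production cache would more likely be persisted (perhaps in DuckDB, which appears in the tech stack) than held in memory.

```python
import time
from typing import Any, Callable

Fetcher = Callable[[str], Any]


class RetrievalLayer:
    """Hypothetical unified query layer: one fetcher per source, results memoized with a TTL."""

    def __init__(self, fetchers: dict[str, Fetcher], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetchers = fetchers
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._cache: dict[tuple[str, str], tuple[float, Any]] = {}
        self.misses = 0

    def query(self, source: str, query: str) -> Any:
        if source not in self._fetchers:
            raise KeyError(f"unknown source: {source!r}")
        # Collapse whitespace but keep case: gene symbols are case-sensitive.
        key = (source, " ".join(query.split()))
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        self.misses += 1
        result = self._fetchers[source](query)
        self._cache[key] = (now, result)
        return result


# Stub fetchers standing in for real API clients.
layer = RetrievalLayer({
    "pubmed": lambda q: [f"pubmed result for {q}"],
    "chembl": lambda q: [f"chembl result for {q}"],
})
layer.query("pubmed", "cardiomyocyte differentiation")
layer.query("pubmed", "cardiomyocyte  differentiation ")  # same key, served from cache
print(layer.misses)  # → 1
```

Routing every source through one interface is what lets the agent treat nine (or more) heterogeneous APIs as a single tool.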
- 📦 Open benchmark dataset of curated stem-cell differentiation references.
- 📊 Streamlit dashboard with chat-driven analysis, tracking lineage decisions across experiments.
- 🧰 Python library (`openlineage`) that wet-lab and computational teams can call from notebooks or pipelines.
Python · PyTorch · Scanpy · scVI · CellRank · Seurat (R) · Hugging Face Transformers · LangChain · FastAPI · DuckDB · Streamlit · AWS S3 · GitHub Actions
```bash
# clone
git clone https://github.com/ds4cabs/OpenLineage.git
cd OpenLineage

# install (uv recommended)
uv sync
# or: pip install -e .

# launch the dashboard
uv run streamlit run src/openlineage/app.py
```

End-to-end notebook walkthroughs land in `docs/` and `notebooks/` as modules ship.
🚧 Active development — built as a CABS Data Science Summer Intern Program 2026 project (June – August 2026). Milestones tracked in Issues.
Performance targets the team is shooting for:
- Lineage predictions returned in under 90 s for datasets up to 200K cells.
- Unified retrieval over 9 public sources, with ~12K cached query results.
- 400+ published differentiation protocols ranked by lineage-match score.
- Batch-QC validated against 8 publicly released iPSC differentiation time courses.
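The lineage-match score used to rank protocols is not defined here. As a purely hypothetical illustration, one simple choice is the Jaccard overlap between the marker genes a protocol reports for its end product and the markers of the requested target cell type; the sketch below ranks a few made-up protocol records that way. The `Protocol` record, `rank_protocols` helper, protocol names, and marker sets are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str
    reported_markers: frozenset[str]  # markers the protocol reports for its end product


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_protocols(protocols: list[Protocol], target_markers: frozenset[str],
                   top_k: int = 3) -> list[tuple[float, str]]:
    """Rank protocols by a hypothetical lineage-match score (Jaccard overlap)."""
    scored = [(jaccard(p.reported_markers, target_markers), p.name) for p in protocols]
    # Highest score first; ties broken alphabetically by name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]


# Made-up records; not real published protocols.
TARGET = frozenset({"TNNT2", "NKX2-5", "MYH6", "ACTN2"})
PROTOCOLS = [
    Protocol("Protocol A", frozenset({"TNNT2", "NKX2-5", "MYH6"})),
    Protocol("Protocol B", frozenset({"TNNT2", "ALB"})),
    Protocol("Protocol C", frozenset({"ALB", "AFP", "HNF4A"})),
]
for score, name in rank_protocols(PROTOCOLS, TARGET):
    print(f"{score:.2f}  {name}")
```

Jaccard penalizes protocols that report many off-target markers, not just those that miss target markers; a real scorer might instead weight markers by specificity.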
MIT — see LICENSE. The curated benchmark dataset is released under the same terms; individual source datasets retain their original licenses.
OpenLineage is a 2026 intern project of the Chinese American Biopharmaceutical Society (CABS), under the DS4CABS open-source data-science initiative.