Commit 8026f3d

Update ontology registry entries and regenerate build artifacts
1 parent cfe7d31 commit 8026f3d

6 files changed

Lines changed: 349 additions & 83 deletions

.claude/settings.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "permissions": {
 "allow": [
-"Bash(*)",
+"Bash",
 "Edit",
 "MultiEdit",
 "NotebookEdit",

AGENTS.md

Lines changed: 114 additions & 73 deletions
@@ -1,77 +1,118 @@
-# AGENTS.md for semantic-sql
+# CLAUDE.md
 
-SQL and SQLite builds of common OWL ontologies, including all of OBO
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
-TODO: fill in extra description here
+## Project Overview
 
-## Repo management
+Semantic-SQL transforms OWL/RDF ontologies into SQLite databases with standardized SQL views. Pre-built databases for all OBO ontologies are available via S3 (e.g., `https://s3.amazonaws.com/bbop-sqlite/hp.db.gz`).
+
+## Key Commands
 
-This repo uses `uv` for managing dependencies. Never use commands like `pip` to add or manage dependencies.
-`uv run` is the best way to run things, unless you are using `justfile` or `makefile` target
-
-`mkdocs` is used for documentation.## This is a LinkML Schema repository
-
-Layout:
-
-* `src/semantic_sql/schema/semantic_sql.yaml` - LinkML source schema (edit this)
-* `project` - derived files (do not edit these directly, they are derived from the LinkML)
-* `src/docs` - source markdown for documentation
-* `docs` - derived docs - do not edit these directly
-* `src/data/examples/{valid,invalid}` - example data files
-* always include positive examples of each class in the `valid` subfolder
-* include negative examples for unit tests and to help illustrate pitfalls
-* format is `ClassName-{SOMENAME}.yaml`
-* `examples` - derived examples. Do not edit these directly
-
-Building and testing:
-
-* `just --list` to see all commands
-* `just gen-project` to generate `project` files
-* `just test` to test schema and pos/neg examples
-* `just lint` analogous to ruff for python
-
-These are wrappers on top of existing linkml commands such as `gen-project`, `linkml-convert`, `linkml-run-examples`.
-You can run the underlying commands (with `uv run ...`) but in general justfile targets should be favored.
-
-Best practice:
-
-* For full documentation, see https://linkml.io/linkml/
-* Follow LinkML naming conventions (CamelCase for classes, snake_case for slots/attributes)
-* For schemas with polymorphism, consider using field `type` marked as a `type_designator: true`
-* Include meaningful descriptions of each element
-* map to standards where appropriate (e.g. dcterms)
-* Never guess OBO term IDs. Always use the OLS MCP to look for relevant ontology terms
-* be proactive in using due diligence to do deep research on the domain, and look at existing standards## This is a Python repository
-
-Layout:
-
-* `src/semantic_sql/` - Code goes here
-* `docs` - mkdocs docs
-* `mkdocs.yml` - index of docs
-* `tests/input` - example files
-
-Building and testing:
-
-* `just --list` to see all commands
-* `just test` performs unit tests, doctests, ruff/liniting
-* `just test-full` as above plus integration tests
-
-You can run the underlying commands (with `uv run ...`) but in general justfile targets should be favored.
-
-Best practice:
-
-* Use doctests liberally - these serve as both explanatory examples for humans and as unit tests
-* For longer examples, write pytest tests
-* always write pytest functional style rather than unittest OO style
-* use modern pytest idioms, including `@pytest.mark.parametrize` to test for combinations of inputs
-* NEVER write mock tests unless requested. I need to rely on tests to know if something breaks
-* For tests that have external dependencies, you can do `@pytest.mark.integration`
-* Do not "fix" issues by changing or weakening test conditions. Try harder, or ask questions if a test fails.
-* Avoid try/except blocks, these can mask bugs
-* Fail fast is a good principle
-* Follow the DRY principle
-* Avoid repeating chunks of code, but also avoid premature over-abstraction
-* Pydantic or LinkML is favored for data objects
-* For state in engine-style OO classes, dataclasses is favored
-* Declarative principles are favored
-* Always use type hints, always document methods and classes
+```bash
+# Build/test
+make test # Run unit tests
+poetry run pytest tests/ # Run specific tests
+poetry run pytest tests/test_orm/test_basic_sqla.py -k "test_name" # Single test
+
+# Schema development (after editing src/semsql/linkml/*.yaml)
+make gen-ddl # Generate SQL DDL from LinkML
+make gen-sqla # Generate SQLAlchemy ORM models
+make gendoc # Generate documentation
+
+# Ontology builds
+semsql make foo.db # Build SQLite from foo.owl (requires rdftab + relation-graph)
+semsql download cl -o cl.db # Download pre-built database
+make build_all # Build all OBO ontologies
+make s3-deploy # Deploy to S3
+
+# Docker alternative
+docker run -v $PWD:/work -w /work -ti linkml/semantic-sql semsql make foo.db
+```
+
+## Architecture
+
+### Core Data Model
+
+**Base tables** (physical storage):
+- `statements` - RDF triples (stanza, subject, predicate, object, value, datatype, language)
+- `prefix` - CURIE prefix mappings
+- `entailed_edge` - Pre-computed transitive closures from relation-graph
+
+**All other "tables" are SQL views** defined in LinkML schemas via embedded `sqlview>>` comments:
+```yaml
+rdfs_label_statement:
+  comments:
+    - sqlview>> SELECT * FROM statements WHERE predicate='rdfs:label'
+```
+
+### Build Pipeline
+
+```
+OWL file → robot preprocessing → rdftab → SQLite statements table
+
+relation-graph → entailed_edge table
+
+Apply SQL views from schema
+```
+
+External dependencies: [rdftab.rs](https://github.com/ontodev/rdftab.rs), [relation-graph](https://github.com/balhoff/relation-graph)
+
+### Source Layout
+
+```
+src/semsql/
+├── linkml/ # LinkML schemas (THE SOURCE OF TRUTH)
+│   ├── semsql.yaml # Main schema, imports all modules
+│   ├── rdf.yaml # RDF/RDFS abstractions
+│   ├── owl.yaml # OWL constructs (restrictions, expressions)
+│   ├── obo.yaml # OBO patterns and validation checks
+│   ├── omo.yaml # Ontology Metadata mappings
+│   └── relation_graph.yaml # Edge-based graph views
+├── builder/
+│   ├── cli.py # semsql command (make, download, query, view2table)
+│   ├── builder.py # Build orchestration
+│   ├── build.Makefile # Core db build rules
+│   ├── sql_schema/ # Generated SQL DDL (from LinkML)
+│   ├── registry/ # ontologies.yaml - non-OBO ontology registry
+│   └── prefixes/ # CURIE mappings
+├── sqla/ # Generated SQLAlchemy ORM models
+└── sqlutils/
+    └── viewgen.py # Extracts SQL views from LinkML comments
+```
+
+### Ontology Registry
+
+`src/semsql/builder/registry/ontologies.yaml` defines non-OBO ontologies. After adding a new entry:
+
+```bash
+# If you added prefixes to the entry, rebuild prefix mappings first:
+make build_prefixes
+
+# May need to touch STAMP to force re-download:
+rm STAMP
+
+# Build the database:
+make db/NAME.db
+
+# Test with OAK:
+runoak -i db/NAME.db terms
+```
+
+## Testing
+
+Tests use pytest, not unittest. Integration tests require rdftab/relation-graph and are marked `@pytest.mark.integration`.
+
+```bash
+poetry run pytest tests/test_orm/ # ORM tests use tests/inputs/go-nucleus.db
+poetry run pytest tests/test_builder/ # Builder tests
+```
+
+## Best Practices from User
+
+- Use `uv` for dependencies (never pip)
+- pytest functional style, use `@pytest.mark.parametrize`
+- Never mock tests unless explicitly requested
+- Avoid try/except blocks
+- Use doctests liberally
+- Never guess OBO term IDs - use OLS MCP to look them up
+- LinkML naming: CamelCase for classes, snake_case for slots
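
The added Architecture text says that every table other than the base tables is a view declared through a `sqlview>>` comment in the LinkML schema. A minimal Python sketch of that idea follows. It is not the code in `viewgen.py`: the names `view_ddl` and `schema_classes` are hypothetical, the schema is given as an already-parsed dict rather than read from YAML, and the `statements` columns are taken from the list in the added text.

```python
import sqlite3

SQLVIEW_MARKER = "sqlview>>"

# Hypothetical, already-parsed schema fragment mirroring the YAML example
# in the added CLAUDE.md text (assumption: comments are plain strings).
schema_classes = {
    "rdfs_label_statement": {
        "comments": [
            "sqlview>> SELECT * FROM statements WHERE predicate='rdfs:label'"
        ],
    },
}


def view_ddl(classes: dict) -> list[str]:
    """Turn each `sqlview>>` comment into a CREATE VIEW statement."""
    ddl = []
    for name, definition in classes.items():
        for comment in definition.get("comments", []):
            if comment.startswith(SQLVIEW_MARKER):
                select = comment[len(SQLVIEW_MARKER):].strip()
                ddl.append(f"CREATE VIEW {name} AS {select}")
    return ddl


# Base table with the columns listed under "Core Data Model".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (stanza TEXT, subject TEXT, predicate TEXT, "
    "object TEXT, value TEXT, datatype TEXT, language TEXT)"
)
conn.executemany(
    "INSERT INTO statements (subject, predicate, value) VALUES (?, ?, ?)",
    [("GO:0005634", "rdfs:label", "nucleus"), ("GO:0005634", "rdf:type", None)],
)

# Apply the views, then query one as if it were a table.
for statement in view_ddl(schema_classes):
    conn.execute(statement)

labels = conn.execute("SELECT subject, value FROM rdfs_label_statement").fetchall()
print(labels)
```

Only the row whose predicate is `rdfs:label` is visible through the view; the physical storage stays in `statements`.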

ontologies.Makefile

Lines changed: 91 additions & 3 deletions
@@ -127,7 +127,7 @@ download/chemrof.owl: STAMP
 .PRECIOUS: download/chemrof.owl
 
 db/chemrof.owl: download/chemrof.owl
-	cp $< $@
+	robot merge -i $< -o $@
 
 
 download/deb.owl: STAMP
@@ -1425,7 +1425,7 @@ download/como.owl: STAMP
 .PRECIOUS: download/como.owl
 
 db/como.owl: download/como.owl
-	cp $< $@
+	robot merge -i $< -o $@
 
 
 download/ecosim.owl: STAMP
@@ -1461,6 +1461,17 @@ db/valuesets.owl: download/valuesets.owl
 	robot merge -i $< -o $@
 
 
+download/micront.owl: STAMP
+	curl -L -s https://raw.githubusercontent.com/grp-schmidt/microntology/refs/heads/main/micront.owl > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/micront.owl
+
+db/micront.owl: download/micront.owl
+	cp $< $@
+
+
 download/nmdc_schema.owl: STAMP
 	curl -L -s https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/project/owl/nmdc.owl.ttl > $@.tmp
 	sha256sum -b $@.tmp > $@.sha256
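
The added `download/*.owl` rules share one shape: write the response to `$@.tmp`, record a SHA-256 of that temporary file, then `mv` it onto the target, so an interrupted transfer does not leave a partial file at the target path. A minimal Python sketch of those three steps follows, assuming the bytes have already been fetched. The function name `store_download` is hypothetical and is not part of semsql.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_download(payload: bytes, target: Path) -> str:
    """Write payload next to target, record its digest, then rename into place."""
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(payload)  # corresponds to: curl ... > $@.tmp

    digest = hashlib.sha256(payload).hexdigest()
    # `sha256sum -b` prints "<digest> *<file>"; the rule hashes the .tmp file.
    checksum_file = target.with_name(target.name + ".sha256")
    checksum_file.write_text(f"{digest} *{tmp}\n")

    os.replace(tmp, target)  # corresponds to: mv $@.tmp $@
    return digest


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "micront.owl"
    digest = store_download(b"<rdf:RDF/>", target)
    print(target.name, digest[:12])
```

Because the rename is the last step, the target path either holds the previous complete file or the new complete file.
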
@@ -1549,6 +1560,83 @@ db/bfo2020_time.owl: download/bfo2020_time.owl
 	cp $< $@
 
 
+download/saref4ener.owl: STAMP
+	curl -L -s https://saref.etsi.org/saref4ener/ > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/saref4ener.owl
+
+db/saref4ener.owl: download/saref4ener.owl
+	curl -sL -H 'Accept: text/turtle' https://saref.etsi.org/saref4ener/ > $@.ttl && robot convert -i $@.ttl -o $@ && rm $@.ttl
+
+
+download/saref4bldg.owl: STAMP
+	curl -L -s https://saref.etsi.org/saref4bldg/ > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/saref4bldg.owl
+
+db/saref4bldg.owl: download/saref4bldg.owl
+	curl -sL -H 'Accept: text/turtle' https://saref.etsi.org/saref4bldg/ > $@.ttl && robot convert -i $@.ttl -o $@ && rm $@.ttl
+
+
+download/hhearvs.owl: STAMP
+	curl -L -s https://data.bioontology.org/ontologies/HHEARVS/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/hhearvs.owl
+
+db/hhearvs.owl: download/hhearvs.owl
+	perl -npe 's@<owl:imports.*/>@@g; s@skos:broader@rdfs:subClassOf@g; s@skos:prefLabel@rdfs:label@g; s@owl:NamedIndividual@owl:Class@g' $< > $@.tmp && robot convert -i $@.tmp -o $@ && rm $@.tmp
+
+
+download/sdoho.owl: STAMP
+	curl -L -s https://data.bioontology.org/ontologies/SDOHO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/sdoho.owl
+
+db/sdoho.owl: download/sdoho.owl
+	perl -npe 's@<owl:imports.*/>@@g' $< > $@.tmp && robot convert -i $@.tmp -o $@ && rm $@.tmp
+
+
+download/pathgo.owl: STAMP
+	curl -L -s https://raw.githubusercontent.com/jhuapl-bio/pathogenesis-gene-ontology/refs/heads/master/pathgo.owl > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/pathgo.owl
+
+db/pathgo.owl: download/pathgo.owl
+	cp $< $@
+
+
+download/brick.owl: STAMP
+	curl -L -s https://github.com/BrickSchema/Brick/releases/download/v1.4.0/Brick.ttl > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/brick.owl
+
+db/brick.owl: download/brick.owl
+	sed '48061,48072d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp
+
+
+download/minsysont.owl: STAMP
+	curl -L -s https://raw.githubusercontent.com/adavarpa/CMO-Critical-Minerals-Ontology-/refs/heads/main/CMO.owl > $@.tmp
+	sha256sum -b $@.tmp > $@.sha256
+	mv $@.tmp $@
+
+.PRECIOUS: download/minsysont.owl
+
+db/minsysont.owl: download/minsysont.owl
+	robot merge -i $< -o $@.tmp.owl && perl -npe 's/\[HSiO4\]/%5BHSiO4%5D/g' $@.tmp.owl > $@ && rm $@.tmp.owl
+
+
 download/%.owl: STAMP
 	curl -L -s http://purl.obolibrary.org/obo/$*.owl > $@.tmp
 	sha256sum -b $@.tmp > $@.sha256
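
The `db/hhearvs.owl` rule rewrites the downloaded RDF/XML line by line before handing it to `robot convert`: it drops `owl:imports` elements and maps SKOS terms and named individuals onto an OWL class hierarchy. The same four substitutions can be sketched in Python as below. The sample lines and their HHEARVS identifiers are made up for illustration, and `rewrite_line` is a hypothetical helper, not something the build uses.

```python
import re

# The four perl substitutions from the rule, in the same order.
REWRITES = [
    (re.compile(r"<owl:imports.*/>"), ""),
    (re.compile(r"skos:broader"), "rdfs:subClassOf"),
    (re.compile(r"skos:prefLabel"), "rdfs:label"),
    (re.compile(r"owl:NamedIndividual"), "owl:Class"),
]


def rewrite_line(line: str) -> str:
    """Apply every substitution to one line, as `perl -npe` does per input line."""
    for pattern, replacement in REWRITES:
        line = pattern.sub(replacement, line)
    return line


# Made-up RDF/XML fragments (assumption: the real file has lines of this shape).
sample = [
    '<owl:imports rdf:resource="http://example.org/other.owl"/>',
    '<owl:NamedIndividual rdf:about="http://purl.org/twc/HHEARVS_00001">',
    "<skos:prefLabel>example term</skos:prefLabel>",
    '<skos:broader rdf:resource="http://purl.org/twc/HHEARVS_00002"/>',
]

rewritten = [rewrite_line(line) for line in sample]
for line in rewritten:
    print(repr(line))
```

These are plain substring rewrites, so any longer name that contains `skos:broader` (for example `skos:broaderTransitive`) would be rewritten as well, and closing tags are rewritten along with opening ones.
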
@@ -1559,4 +1647,4 @@ download/%.owl: STAMP
 db/%.owl: download/%.owl
 	robot merge -i $< -o $@
 
-EXTRA_ONTOLOGIES = swo chiro pcl chemessence ogco ncit fma maxo foodon chebiplus msio chemrof deb matpo panet phenx pride sosa emi npc modl phenio comploinc hba mba dmba dhba pba bero aio reacto xsmo bcio sio icd10who icd11f ordo gard icd10cm omim mondo-ingest oeo envthes wifire taxslim goldterms sdgio kin metpo d3o biovoices omop comet cco occo iof upa go go-lego go-amigo neo bao orcid ror cpont biolink biopax enanomapper mlo ito chemont molgenie cso obiws biopragmatics-reactome reactome-hs reactome-mm efo hcao hpinternational edam chr sweetAll oboe-core oboe-standards lov schema-dot-org prov dtype vaem qudtunit quantitykind cellosaurus cosmo gist gistBFO fhkb dbpendiaont uberoncm co_324 ppeo interpro pfam hgnc.genegroup hgnc sgd gtdb eccode uniprot uniprot.ptm credit rhea swisslipid drugbank drugcentral complexportal wikipathways pathbank kegg.genome drugmechdb rxnorm vccf ontobiotope nando ecso enigma_context cbo ontie pain como ecosim bervo valuesets nmdc_schema mixs kgcl fibo bfo2020 bfo2020_core bfo2020_notime bfo2020_time
+EXTRA_ONTOLOGIES = swo chiro pcl chemessence ogco ncit fma maxo foodon chebiplus msio chemrof deb matpo panet phenx pride sosa emi npc modl phenio comploinc hba mba dmba dhba pba bero aio reacto xsmo bcio sio icd10who icd11f ordo gard icd10cm omim mondo-ingest oeo envthes wifire taxslim goldterms sdgio kin metpo d3o biovoices omop comet cco occo iof upa go go-lego go-amigo neo bao orcid ror cpont biolink biopax enanomapper mlo ito chemont molgenie cso obiws biopragmatics-reactome reactome-hs reactome-mm efo hcao hpinternational edam chr sweetAll oboe-core oboe-standards lov schema-dot-org prov dtype vaem qudtunit quantitykind cellosaurus cosmo gist gistBFO fhkb dbpendiaont uberoncm co_324 ppeo interpro pfam hgnc.genegroup hgnc sgd gtdb eccode uniprot uniprot.ptm credit rhea swisslipid drugbank drugcentral complexportal wikipathways pathbank kegg.genome drugmechdb rxnorm vccf ontobiotope nando ecso enigma_context cbo ontie pain como ecosim bervo valuesets micront nmdc_schema mixs kgcl fibo bfo2020 bfo2020_core bfo2020_notime bfo2020_time saref4ener saref4bldg hhearvs sdoho pathgo brick minsysont
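
The two `EXTRA_ONTOLOGIES` values are long enough that the change is hard to see by eye. One way to check which names were added is to split both values on whitespace and compare them, as in this small sketch. Only the tail of each value is reproduced here, on the assumption that everything before `ecosim` is identical in both; `added_names` is a hypothetical helper.

```python
def added_names(old_value: str, new_value: str) -> list[str]:
    """Return names present in new_value but not in old_value, in new order."""
    old = set(old_value.split())
    return [name for name in new_value.split() if name not in old]


# Tails of the removed and added lines (assumption: the heads are unchanged).
old_tail = (
    "ecosim bervo valuesets nmdc_schema mixs kgcl fibo "
    "bfo2020 bfo2020_core bfo2020_notime bfo2020_time"
)
new_tail = (
    "ecosim bervo valuesets micront nmdc_schema mixs kgcl fibo "
    "bfo2020 bfo2020_core bfo2020_notime bfo2020_time "
    "saref4ener saref4bldg hhearvs sdoho pathgo brick minsysont"
)

added = added_names(old_tail, new_tail)
print(added)
# → ['micront', 'saref4ener', 'saref4bldg', 'hhearvs', 'sdoho', 'pathgo', 'brick', 'minsysont']
```

The eight names match the eight ontologies that gain `download/` and `db/` rules in this file.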

src/semsql/builder/prefixes/prefixes.csv

Lines changed: 32 additions & 2 deletions
@@ -113,8 +113,10 @@ GARD,http://purl.obolibrary.org/obo/GARD_
 ICD10CM,http://purl.bioontology.org/ontology/ICD10CM/
 OMIM,https://omim.org/entry/
 OMIMPS,https://omim.org/phenotypicSeries/PS
-OEO,http://openenergy-platform.org/ontology/oeo/OEO_
-OEOX,http://openenergy-platform.org/ontology/oeo/OEOX_
+OEO,https://openenergyplatform.org/ontology/oeo/OEO_
+OEOX,https://openenergyplatform.org/ontology/oeo/OEOX_
+MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl/MENO_
+OEO.CCO,http://www.ontologyrepository.com/CommonCoreOntologies/
 envthes,http://vocabs.lter-europe.net/EnvThes/
 omv,http://omv.ontoware.org/2005/05/
 iadopt,https://w3id.org/iadopt/ont/
@@ -146,6 +148,7 @@ MGI,http://identifiers.org/mgi/MGI:
 TAIR.LOCUS,http://arabidopsis.org/servlets/TairObject?type=locus&name=
 EcoCyc,https://ecocyc.org/gene?id=
 orcid,https://orcid.org/
+geonames,https://www.geonames.org/
 ror,https://ror.org/
 evs.ncit,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#
 old.fix,http://purl.org/obo/owl/FIX#
@@ -215,13 +218,40 @@ ECOSIM,http://purl.obolibrary.org/obo/ECOSIM_
 ECOSIMCONCEPT,http://purl.obolibrary.org/obo/ECOSIMCONCEPT_
 BERVO,https://w3id.org/bervo/BERVO_
 VALUESETS,https://w3id.org/valuesets/
+MICRONT,http://purl.obolibrary.org/obo/micront.owl#MICRONT_
 nmdc,https://w3id.org/nmdc/
 linkml,https://w3id.org/linkml/
 mixs,https://w3id.org/mixs/
 mixs,https://w3id.org/mixs/
 kgcl,https://w3id.org/kgcl/
 fibo,https://spec.edmcouncil.org/fibo/ontology/
 cmnsav,https://www.omg.org/spec/Commons/AnnotationVocabulary/
+saref4ener,https://saref.etsi.org/saref4ener/
+saref,https://saref.etsi.org/core/
+s4ener,https://saref.etsi.org/saref4ener/
+saref4bldg,https://saref.etsi.org/saref4bldg/
+saref,https://saref.etsi.org/core/
+s4bldg,https://saref.etsi.org/saref4bldg/
+HHEAR,http://purl.org/twc/HHEAR_
+HHEARVS,http://purl.org/twc/HHEARVS_
+HHEARVS.nihreporter,https://reporter.nih.gov/search/2NC9YrDMM0SiuN8TnuRgKg/project-details/
+SDOHO,https://sbmi.uth.edu/bsdi/SDoHO#
+PATHGO,http://purl.obolibrary.org/obo/PATHGO_
+brick,https://brickschema.org/schema/Brick#
+bsh,https://brickschema.org/schema/BrickShape#
+tag,https://brickschema.org/schema/BrickTag#
+ref,https://brickschema.org/schema/Brick/ref#
+rec,https://w3id.org/rec#
+bacnet,http://data.ashrae.org/bacnet/2020#
+s223,http://data.ashrae.org/standard223#
+qudt,http://qudt.org/schema/qudt/
+qudtqk,http://qudt.org/vocab/quantitykind/
+unit,http://qudt.org/vocab/unit/
+sosa,http://www.w3.org/ns/sosa/
+sh,http://www.w3.org/ns/shacl#
+vcard,http://www.w3.org/2006/vcard/ns#
+MinSysOnt,http://www.semanticweb.org/hbabaie/ontologies/2021/5/MinSysOnt#
+CMO.minerals,http://www.semanticweb.org/Davarpanah-Babaie/Ontologies/2022/CMO#
 RBO,http://purl.obolibrary.org/obo/RBO_
 RBO,http://purl.obolibrary.org/obo/RBO_
 CLYH,http://purl.obolibrary.org/obo/CLYH_
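
The regenerated `prefixes.csv` contains some prefixes more than once (for example `mixs` and `saref`) and some namespaces bound to two prefixes (`saref4ener` and `s4ener`). How semsql resolves such rows is not shown in this commit, so the sketch below only detects them. It works on a handful of rows copied from the hunk above, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# A few rows copied from the diff; the real file is much longer.
PREFIX_ROWS = """\
mixs,https://w3id.org/mixs/
mixs,https://w3id.org/mixs/
saref4ener,https://saref.etsi.org/saref4ener/
saref,https://saref.etsi.org/core/
s4ener,https://saref.etsi.org/saref4ener/
saref,https://saref.etsi.org/core/
PATHGO,http://purl.obolibrary.org/obo/PATHGO_
"""


def load_rows(text: str) -> list[tuple[str, str]]:
    """Parse two-column prefix,namespace rows."""
    return [(prefix, base) for prefix, base in csv.reader(io.StringIO(text))]


def repeated_prefixes(rows: list[tuple[str, str]]) -> list[str]:
    """Prefixes that appear on more than one row."""
    counts = Counter(prefix for prefix, _ in rows)
    return sorted(prefix for prefix, n in counts.items() if n > 1)


def shared_namespaces(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Namespaces that are bound to more than one distinct prefix."""
    by_base = defaultdict(set)
    for prefix, base in rows:
        by_base[base].add(prefix)
    return {base: sorted(ps) for base, ps in by_base.items() if len(ps) > 1}


rows = load_rows(PREFIX_ROWS)
print(repeated_prefixes(rows))
print(shared_namespaces(rows))
```

A namespace with two prefixes makes URI-to-CURIE contraction ambiguous, which is worth keeping in mind when querying a database built from these mappings.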
