|
1 | | -# AGENTS.md for semantic-sql |
| 1 | +# CLAUDE.md |
2 | 2 |
|
3 | | -SQL and SQLite builds of common OWL ontologies, including all of OBO |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
4 | 4 |
|
5 | | -TODO: fill in extra description here |
| 5 | +## Project Overview |
6 | 6 |
|
7 | | -## Repo management |
| 7 | +Semantic-SQL transforms OWL/RDF ontologies into SQLite databases with standardized SQL views. Pre-built databases for all OBO ontologies are available via S3 (e.g., `https://s3.amazonaws.com/bbop-sqlite/hp.db.gz`). |
| 8 | + |
| 9 | +## Key Commands |
8 | 10 |
|
9 | | -This repo uses `uv` for managing dependencies. Never use commands like `pip` to add or manage dependencies. |
10 | | -`uv run` is the best way to run things, unless you are using `justfile` or `makefile` target |
11 | | - |
12 | | -`mkdocs` is used for documentation.## This is a LinkML Schema repository |
13 | | - |
14 | | -Layout: |
15 | | - |
16 | | - * `src/semantic_sql/schema/semantic_sql.yaml` - LinkML source schema (edit this) |
17 | | - * `project` - derived files (do not edit these directly, they are derived from the LinkML) |
18 | | - * `src/docs` - source markdown for documentation |
19 | | - * `docs` - derived docs - do not edit these directly |
20 | | - * `src/data/examples/{valid,invalid}` - example data files |
21 | | - * always include positive examples of each class in the `valid` subfolder |
22 | | - * include negative examples for unit tests and to help illustrate pitfalls |
23 | | - * format is `ClassName-{SOMENAME}.yaml` |
24 | | - * `examples` - derived examples. Do not edit these directly |
25 | | - |
26 | | -Building and testing: |
27 | | - |
28 | | -* `just --list` to see all commands |
29 | | -* `just gen-project` to generate `project` files |
30 | | -* `just test` to test schema and pos/neg examples |
31 | | -* `just lint` analogous to ruff for python |
32 | | - |
33 | | -These are wrappers on top of existing linkml commands such as `gen-project`, `linkml-convert`, `linkml-run-examples`. |
34 | | -You can run the underlying commands (with `uv run ...`) but in general justfile targets should be favored. |
35 | | - |
36 | | -Best practice: |
37 | | - |
38 | | -* For full documentation, see https://linkml.io/linkml/ |
39 | | -* Follow LinkML naming conventions (CamelCase for classes, snake_case for slots/attributes) |
40 | | -* For schemas with polymorphism, consider using field `type` marked as a `type_designator: true` |
41 | | -* Include meaningful descriptions of each element |
42 | | -* map to standards where appropriate (e.g. dcterms) |
43 | | -* Never guess OBO term IDs. Always use the OLS MCP to look for relevant ontology terms |
44 | | -* be proactive in using due diligence to do deep research on the domain, and look at existing standards## This is a Python repository |
45 | | - |
46 | | -Layout: |
47 | | - |
48 | | - * `src/semantic_sql/` - Code goes here |
49 | | - * `docs` - mkdocs docs |
50 | | - * `mkdocs.yml` - index of docs |
51 | | - * `tests/input` - example files |
52 | | - |
53 | | -Building and testing: |
54 | | - |
55 | | -* `just --list` to see all commands |
56 | | -* `just test` performs unit tests, doctests, ruff/linting |
57 | | -* `just test-full` as above plus integration tests |
58 | | - |
59 | | -You can run the underlying commands (with `uv run ...`) but in general justfile targets should be favored. |
60 | | - |
61 | | -Best practice: |
62 | | - |
63 | | -* Use doctests liberally - these serve as both explanatory examples for humans and as unit tests |
64 | | -* For longer examples, write pytest tests |
65 | | -* always write pytest functional style rather than unittest OO style |
66 | | -* use modern pytest idioms, including `@pytest.mark.parametrize` to test for combinations of inputs |
67 | | -* NEVER write mock tests unless requested. I need to rely on tests to know if something breaks |
68 | | -* For tests that have external dependencies, you can do `@pytest.mark.integration` |
69 | | -* Do not "fix" issues by changing or weakening test conditions. Try harder, or ask questions if a test fails. |
70 | | -* Avoid try/except blocks, these can mask bugs |
71 | | -* Fail fast is a good principle |
72 | | -* Follow the DRY principle |
73 | | -* Avoid repeating chunks of code, but also avoid premature over-abstraction |
74 | | -* Pydantic or LinkML is favored for data objects |
75 | | -* For state in engine-style OO classes, dataclasses is favored |
76 | | -* Declarative principles are favored |
77 | | -* Always use type hints, always document methods and classes |
| 11 | +```bash |
| 12 | +# Build/test |
| 13 | +make test # Run unit tests |
| 14 | +poetry run pytest tests/ # Run specific tests |
| 15 | +poetry run pytest tests/test_orm/test_basic_sqla.py -k "test_name" # Single test |
| 16 | + |
| 17 | +# Schema development (after editing src/semsql/linkml/*.yaml) |
| 18 | +make gen-ddl # Generate SQL DDL from LinkML |
| 19 | +make gen-sqla # Generate SQLAlchemy ORM models |
| 20 | +make gendoc # Generate documentation |
| 21 | + |
| 22 | +# Ontology builds |
| 23 | +semsql make foo.db # Build SQLite from foo.owl (requires rdftab + relation-graph) |
| 24 | +semsql download cl -o cl.db # Download pre-built database |
| 25 | +make build_all # Build all OBO ontologies |
| 26 | +make s3-deploy # Deploy to S3 |
| 27 | + |
| 28 | +# Docker alternative |
| 29 | +docker run -v "$PWD":/work -w /work -ti linkml/semantic-sql semsql make foo.db |
| 30 | +``` |
| 31 | + |
| 32 | +## Architecture |
| 33 | + |
| 34 | +### Core Data Model |
| 35 | + |
| 36 | +**Base tables** (physical storage): |
| 37 | +- `statements` - RDF triples (stanza, subject, predicate, object, value, datatype, language) |
| 38 | +- `prefix` - CURIE prefix mappings |
| 39 | +- `entailed_edge` - Pre-computed transitive closures from relation-graph |
| 40 | + |
| 41 | +**All other "tables" are SQL views** defined in LinkML schemas via embedded `sqlview>>` comments: |
| 42 | +```yaml |
| 43 | +rdfs_label_statement: |
| 44 | + comments: |
| 45 | + - sqlview>> SELECT * FROM statements WHERE predicate='rdfs:label' |
| 46 | +``` |
| 47 | +
|
| 48 | +### Build Pipeline |
| 49 | +
|
| 50 | +``` |
| 51 | +OWL file → robot preprocessing → rdftab → SQLite statements table |
| 52 | + ↓ |
| 53 | + relation-graph → entailed_edge table |
| 54 | + ↓ |
| 55 | + Apply SQL views from schema |
| 56 | +``` |
| 57 | + |
| 58 | +External dependencies: [rdftab.rs](https://github.com/ontodev/rdftab.rs), [relation-graph](https://github.com/balhoff/relation-graph) |
| 59 | + |
| 60 | +### Source Layout |
| 61 | + |
| 62 | +``` |
| 63 | +src/semsql/ |
| 64 | +├── linkml/ # LinkML schemas (THE SOURCE OF TRUTH) |
| 65 | +│ ├── semsql.yaml # Main schema, imports all modules |
| 66 | +│ ├── rdf.yaml # RDF/RDFS abstractions |
| 67 | +│ ├── owl.yaml # OWL constructs (restrictions, expressions) |
| 68 | +│ ├── obo.yaml # OBO patterns and validation checks |
| 69 | +│ ├── omo.yaml # Ontology Metadata mappings |
| 70 | +│ └── relation_graph.yaml # Edge-based graph views |
| 71 | +├── builder/ |
| 72 | +│ ├── cli.py # semsql command (make, download, query, view2table) |
| 73 | +│ ├── builder.py # Build orchestration |
| 74 | +│ ├── build.Makefile # Core db build rules |
| 75 | +│ ├── sql_schema/ # Generated SQL DDL (from LinkML) |
| 76 | +│ ├── registry/ # ontologies.yaml - non-OBO ontology registry |
| 77 | +│ └── prefixes/ # CURIE mappings |
| 78 | +├── sqla/ # Generated SQLAlchemy ORM models |
| 79 | +└── sqlutils/ |
| 80 | + └── viewgen.py # Extracts SQL views from LinkML comments |
| 81 | +``` |
| 82 | + |
| 83 | +### Ontology Registry |
| 84 | + |
| 85 | +`src/semsql/builder/registry/ontologies.yaml` defines non-OBO ontologies. After adding a new entry: |
| 86 | + |
| 87 | +```bash |
| 88 | +# If you added prefixes to the entry, rebuild prefix mappings first: |
| 89 | +make build_prefixes |
| 90 | + |
| 91 | +# May need to remove STAMP to force re-download: |
| 92 | +rm STAMP |
| 93 | + |
| 94 | +# Build the database: |
| 95 | +make db/NAME.db |
| 96 | + |
| 97 | +# Test with OAK: |
| 98 | +runoak -i db/NAME.db terms |
| 99 | +``` |
| 100 | + |
| 101 | +## Testing |
| 102 | + |
| 103 | +Tests use pytest, not unittest. Integration tests require rdftab/relation-graph and are marked `@pytest.mark.integration`. |
| 104 | + |
| 105 | +```bash |
| 106 | +poetry run pytest tests/test_orm/ # ORM tests use tests/inputs/go-nucleus.db |
| 107 | +poetry run pytest tests/test_builder/ # Builder tests |
| 108 | +``` |
| 109 | + |
| 110 | +## Best Practices from User |
| 111 | + |
| 112 | +- Use `uv` for dependencies (never pip) |
| 113 | +- pytest functional style, use `@pytest.mark.parametrize` |
| 114 | +- Never mock tests unless explicitly requested |
| 115 | +- Avoid try/except blocks |
| 116 | +- Use doctests liberally |
| 117 | +- Never guess OBO term IDs - use OLS MCP to look them up |
| 118 | +- LinkML naming: CamelCase for classes, snake_case for slots |
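
The `sqlview>>` convention described under "Core Data Model" can be illustrated with a minimal Python sketch. This is not the code in `viewgen.py`; it only shows the idea under stated assumptions. The `SCHEMA` dict is a hypothetical stand-in for a parsed LinkML file, `extract_views` and `build_demo_db` are made-up helper names, and the `EX:1` rows are invented demo data rather than real ontology terms. The `statements` columns follow the list given in the new file.

```python
import sqlite3

# Hypothetical stand-in for a parsed LinkML schema. The real schemas live in
# src/semsql/linkml/*.yaml; this dict only mirrors the snippet in the diff.
SCHEMA = {
    "classes": {
        "rdfs_label_statement": {
            "comments": [
                "sqlview>> SELECT * FROM statements WHERE predicate='rdfs:label'",
            ],
        },
    },
}

SQLVIEW_MARKER = "sqlview>>"


def extract_views(schema: dict) -> dict[str, str]:
    """Map each class name to the SELECT that follows its sqlview>> marker."""
    views: dict[str, str] = {}
    for name, cls in schema["classes"].items():
        for comment in cls.get("comments", []):
            if comment.startswith(SQLVIEW_MARKER):
                views[name] = comment[len(SQLVIEW_MARKER):].strip()
    return views


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory statements table and apply the extracted views."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements (stanza TEXT, subject TEXT, predicate TEXT, "
        "object TEXT, value TEXT, datatype TEXT, language TEXT)"
    )
    # Invented demo rows: one label triple and one type triple.
    conn.executemany(
        "INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("EX:1", "EX:1", "rdfs:label", None, "example term", None, None),
            ("EX:1", "EX:1", "rdf:type", "owl:Class", None, None, None),
        ],
    )
    # View names come from the trusted schema dict above, not from user input.
    for name, select in extract_views(SCHEMA).items():
        conn.execute(f"CREATE VIEW {name} AS {select}")
    return conn


conn = build_demo_db()
labels = conn.execute("SELECT subject, value FROM rdfs_label_statement").fetchall()
print(labels)  # [('EX:1', 'example term')]
```

The view is only a stored query, so the label row is read from `statements` at query time and nothing is copied. That is consistent with the file's statement that only the base tables are physical storage.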