
Commit 49fa461

Integer-Ctrl, Tahoora-Tabassum authored and committed
cli: delete datasets (#38)
* feat: databus api key for downloading
* refactored README.md
* feat: cli delete to delete datasets from databus
1 parent f6f67c0 commit 49fa461

5 files changed

Lines changed: 295 additions & 194 deletions


README.md

Lines changed: 19 additions & 157 deletions
@@ -10,7 +10,7 @@ Command-line and Python client for downloading and deploying datasets on DBpedia
 - [DBpedia](#dbpedia)
 - [Registration (Access Token)](#registration-access-token)
 - [DBpedia Knowledge Graphs](#dbpedia-knowledge-graphs)
-- [Download Live Fusion KG Dump (BUSL 1.1, registration needed)](#download-live-fusion-kg-dump-busl-11-registration-needed)
+- [Download Live Fusion KG Snapshot (BUSL 1.1, registration needed)](#download-live-fusion-kg-snapshot-busl-11-registration-needed)
 - [Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)](#download-enriched-knowledge-graphs-busl-11-registration-needed)
 - [Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)](#download-dbpedia-wikipedia-knowledge-graphs-cc-by-sa-no-registration-needed)
 - [Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)](#download-dbpedia-wikidata-knowledge-graphs-cc-by-sa-no-registration-needed)
@@ -20,9 +20,6 @@ Command-line and Python client for downloading and deploying datasets on DBpedia
 - [Delete](#cli-delete)
 - [Module Usage](#module-usage)
 - [Deploy](#module-deploy)
-- [Development & Contributing](#development--contributing)
-- [Linting](#linting)
-- [Testing](#testing)


 ## Quickstart
@@ -33,68 +30,21 @@ You can use either **Python** or **Docker**. Both methods support all client fea

 ### Python

-Requirements: [Python 3.11+](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/)
+Requirements: [Python](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/)

 Before using the client, install it via pip:

 ```bash
 python3 -m pip install databusclient
 ```

-Note: the PyPI release was updated and this repository prepares version `0.15`. If you previously installed `databusclient` via `pip` and observe different CLI behavior, upgrade to the latest release:
+You can then use the client in the command line:

 ```bash
-python3 -m pip install --upgrade databusclient==0.15
-```
-
-**Help and further general information:**
-
-```bash
-# Python
 databusclient --help
-# Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
-
-# Output:
-Usage: databusclient [OPTIONS] COMMAND [ARGS]...
-
-Databus Client CLI
-
-Options:
---help Show this message and exit.
-
-Commands:
-deploy Flexible deploy to Databus command supporting three modes:
-download Download datasets from databus, optionally using vault access...
-```
-
-<a id="cli-download"></a>
-### Download
-
-With the download command, you can download datasets or parts thereof from the Databus. The download command expects one or more Databus URIs or a SPARQL query as arguments. The URIs can point to files, versions, artifacts, groups, or collections. If a SPARQL query is provided, the query must return download URLs from the Databus which will be downloaded.
-
-```bash
-# Python
-databusclient download $DOWNLOADTARGET
-# Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET
-```
-
-- `$DOWNLOADTARGET`
-- Can be any Databus URI including collections OR SPARQL query (or several thereof).
-- `--localdir`
-- If no `--localdir` is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the Databus layout, i.e. `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`.
-- `--vault-token`
-- If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with `--vault-token /path/to/vault-token.dat`. See [Registration (Access Token)](#registration-access-token) for details on how to get a vault token.
-- `--databus-key`
-- If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.
-
-**Help and further information on download command:**
-```bash
-# Python
+databusclient deploy --help
 databusclient download --help
-# Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
+```

 ### Docker

@@ -123,48 +73,48 @@ To download BUSL 1.1 licensed datasets, you need to register and get an access t

 ### DBpedia Knowledge Graphs

-#### Download Live Fusion KG Dump (BUSL 1.1, registration needed)
-High-frequency, conflict-resolved knowledge graph that merges Live Wikipedia and Wikidata signals into a single, queryable dump for enterprise consumption. [More information](https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump)
+#### Download Live Fusion KG Snapshot (BUSL 1.1, registration needed)
+High-frequency, conflict-resolved knowledge graph that merges Live Wikipedia and Wikidata signals into a single, queryable snapshot for enterprise consumption. [More information](https://databus.dev.dbpedia.link/fhofer/live-fusion-kg-dump)
 ```bash
 # Python
-databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
+databusclient download https://databus.dev.dbpedia.link/fhofer/live-fusion-kg-dump --vault-token vault-token.dat
 # Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dev.dbpedia.link/fhofer/live-fusion-kg-dump --vault-token vault-token.dat
 ```

 #### Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)

 **DBpedia Wikipedia Extraction Enriched**

-DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). [More information](https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump)
+DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). [More information](https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-enriched-dump)

 ```bash
 # Python
-databusclient download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
+databusclient download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
 # Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
 ```

 #### Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)

-Original extraction of structured Wikipedia data before enrichment. [More information](https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump)
+Original extraction of structured Wikipedia data before enrichment. [More information](https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-dump)

 ```bash
 # Python
-databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
+databusclient download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-dump
 # Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-dump
 ```

 #### Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)

-Original extraction of structured Wikidata data before enrichment. [More information](https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump)
+Original extraction of structured Wikidata data before enrichment. [More information](https://databus.dev.dbpedia.link/fhofer/dbpedia-wikidata-kg-dump)

 ```bash
 # Python
-databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
+databusclient download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikidata-kg-dump
 # Docker
-docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dev.dbpedia.link/fhofer/dbpedia-wikidata-kg-dump
 ```

 ## CLI Usage
@@ -210,8 +160,6 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOAD
 - If no `--localdir` is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the Databus layout, i.e. `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`.
 - `--vault-token`
 - If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with `--vault-token /path/to/vault-token.dat`. See [Registration (Access Token)](#registration-access-token) for details on how to get a vault token.
-
-Note: Vault tokens are only required for certain protected Databus hosts (for example: `data.dbpedia.io`, `data.dev.dbpedia.link`). The client now detects those hosts and will fail early with a clear message if a token is required but not provided. Do not pass `--vault-token` for public downloads.
 - `--databus-key`
 - If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.

@@ -235,8 +183,6 @@ Options:
 e.g. https://databus.dbpedia.org/sparql)
 --vault-token TEXT Path to Vault refresh token file
 --databus-key TEXT Databus API key to download from protected databus
---all-versions When downloading artifacts, download all versions
-instead of only the latest
 --authurl TEXT Keycloak token endpoint URL [default:
 https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
 connect/token]
@@ -329,7 +275,7 @@ Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
 - Upload & deploy via Nextcloud (--webdav-url, --remote, --path)

 Options:
---versionid TEXT Target databus version/dataset identifier of the form <h
+--version-id TEXT Target databus version/dataset identifier of the form <h
 ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
 RSION> [required]
 --title TEXT Dataset title [required]
@@ -451,7 +397,6 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
 ./data_folder
 ```

-
 <a id="cli-delete"></a>
 ### Delete

@@ -528,47 +473,6 @@ databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-sna
 docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
 ```

-### mkdist command
-
-Create a distribution string from components.
-
-Usage:
-```
-databusclient mkdist URL --cv key=value --cv key2=value2 --format ttl --compression gz --sha-length <sha256hex>:<length>
-```
-
-Example:
-```
-python -m databusclient mkdist "https://example.org/file.ttl" --cv lang=en --cv part=sorted --format ttl --compression gz --sha-length aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:12345
-```
-
-## Completion
-
-Enable shell completion (bash example):
-```
-eval "$(_DATABUSCLIENT_COMPLETE=source_bash python -m databusclient)"
-```
-
-### mkdist command
-
-Create a distribution string from components.
-
-Usage:
-```
-databusclient mkdist URL --cv key=value --cv key2=value2 --format ttl --compression gz --sha-length <sha256hex>:<length>
-```
-
-Example:
-```
-python -m databusclient mkdist "https://example.org/file.ttl" --cv lang=en --cv part=sorted --format ttl --compression gz --sha-length aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:12345
-```
-
-## Completion
-
-Enable shell completion (bash example):
-```
-eval "$(_DATABUSCLIENT_COMPLETE=source_bash python -m databusclient)"
-```
 ## Module Usage

 <a id="module-deploy"></a>
@@ -647,45 +551,3 @@ from databusclient import deploy
 # API key can be found (or generated) at https://$$DATABUS_BASE$$/$$USER$$#settings
 deploy(dataset, "mysterious API key")
 ```
-
-## Development & Contributing
-
-Install development dependencies yourself or via [Poetry](https://python-poetry.org/):
-
-```bash
-poetry install --with dev
-```
-
-### Linting
-
-The used linter is [Ruff](https://ruff.rs/). Ruff is configured in `pyproject.toml` and is enforced in CI (`.github/workflows/ruff.yml`).
-
-For development, you can run linting locally with `ruff check .` and optionally auto-format with `ruff format .`.
-
-To ensure compatibility with the `pyproject.toml` configured dependencies, run Ruff via Poetry:
-
-```bash
-# To check for linting issues:
-poetry run ruff check .
-
-# To auto-format code:
-poetry run ruff format .
-```
-
-### Testing
-
-When developing new features please make sure to add appropriate tests and ensure that all tests pass. Tests are under `tests/` and use [pytest](https://docs.pytest.org/en/7.4.x/) as test framework.
-
-When fixing bugs or refactoring existing code, please make sure to add tests that cover the affected functionality. The current test coverage is very low, so any additional tests are highly appreciated.
-
-To run tests locally, use:
-
-```bash
-pytest tests/
-```
-
-Or to ensure compatibility with the `pyproject.toml` configured dependencies, run pytest via Poetry:
-
-```bash
-poetry run pytest tests/
-```
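
This commit drops the duplicated `mkdist` documentation, whose usage line expects `--sha-length <sha256hex>:<length>`. A minimal sketch of how that value might be computed for a local file is shown below. It assumes `<length>` means the file size in bytes, which the usage line does not state, and `sha_length` is a hypothetical helper, not part of `databusclient`.

```python
import hashlib
import os


def sha_length(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: return '<sha256hex>:<length>' for a local file.

    Assumes <length> is the file size in bytes (not confirmed by the mkdist docs).
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large dumps do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}:{os.path.getsize(path)}"
```

Streaming the file keeps memory use flat regardless of dump size; the chunk size only affects speed, not the resulting digest.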
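
The download documentation kept by this commit says files are stored under `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`, and the renamed `--version-id` option takes a URI of the form `https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION`. The sketch below maps such a version URI to that local directory. `local_version_dir` is a hypothetical helper that only mirrors the documented layout; it is not the client's own code, and it handles version URIs (exactly four path segments) but not files, groups, or collections.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_version_dir(version_uri: str, base: str = ".") -> PurePosixPath:
    """Hypothetical helper: map a Databus version URI to base/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION."""
    # Split the URI path and drop empty segments from leading/trailing slashes.
    segments = [s for s in urlparse(version_uri).path.split("/") if s]
    if len(segments) != 4:
        raise ValueError(
            f"expected /$ACCOUNT/$GROUP/$ARTIFACT/$VERSION, got {len(segments)} segments"
        )
    # PurePosixPath collapses a leading "." so the default result is a relative path.
    return PurePosixPath(base, *segments)
```

For example, with the illustrative URI `https://databus.dbpedia.org/acc/grp/art/2024-01-01` and `base="/data"` (the mount point used in the Docker commands), the result is `PurePosixPath('/data/acc/grp/art/2024-01-01')`.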
