Skip to content

stoolap/stoolap-python

Repository files navigation

stoolap-python

High-performance Python driver for Stoolap embedded SQL database. Built with PyO3 for native Rust performance with both sync and async APIs.

Run the benchmark yourself: python benchmark.py

Installation

pip install stoolap-python

Quick Start

from stoolap import Database

# In-memory database
db = Database.open(":memory:")

# exec() runs one or more DDL/DML statements (no parameters)
db.exec("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT
    );
    CREATE INDEX idx_users_name ON users(name);
""")

# execute() runs a single statement with parameters, returns rows affected
db.execute(
    "INSERT INTO users (id, name, email) VALUES ($1, $2, $3)",
    [1, "Alice", "alice@example.com"],
)

# Named parameters (:key)
db.execute(
    "INSERT INTO users (id, name, email) VALUES (:id, :name, :email)",
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
)

# query() returns a list of dicts
users = db.query("SELECT * FROM users ORDER BY id")
# [{"id": 1, "name": "Alice", "email": "alice@example.com"}, ...]

# query_one() returns a single dict or None
user = db.query_one("SELECT * FROM users WHERE id = $1", [1])
# {"id": 1, "name": "Alice", "email": "alice@example.com"}

# query_raw() returns columnar format (faster for large results)
raw = db.query_raw("SELECT id, name FROM users ORDER BY id")
# {"columns": ["id", "name"], "rows": [[1, "Alice"], [2, "Bob"]]}

db.close()

Prepared Statements

Parse SQL once, execute many times with different parameters:

insert = db.prepare("INSERT INTO users (id, name) VALUES ($1, $2)")
insert.execute([1, "Alice"])
insert.execute([2, "Bob"])

# Batch execution (auto-wrapped in a transaction)
insert.execute_batch([
    [3, "Charlie"],
    [4, "Diana"],
])

# Prepared queries
lookup = db.prepare("SELECT * FROM users WHERE id = $1")
user = lookup.query_one([1])       # Single row as dict or None
rows = lookup.query([1])           # All rows as list of dicts
raw  = lookup.query_raw([1])       # Columnar format

# Named parameters also work with prepared statements
lookup = db.prepare("SELECT * FROM users WHERE id = :id")
user = lookup.query_one({"id": 1})

Transactions

# Context manager (auto-commit on clean exit, auto-rollback on exception)
with db.begin() as tx:
    tx.execute("INSERT INTO users (id, name) VALUES ($1, $2)", [1, "Alice"])
    tx.execute("INSERT INTO users (id, name) VALUES ($1, $2)", [2, "Bob"])

# Manual control
tx = db.begin()
try:
    tx.execute("INSERT INTO users (id, name) VALUES ($1, $2)", [1, "Alice"])
    tx.commit()
except:
    tx.rollback()
    raise

Transactions support execute(), query(), query_one(), query_raw(), and execute_batch() with both positional ($1, $2) and named (:key) parameters.

Batch Execution

Execute the same statement with multiple parameter sets, auto-wrapped in a transaction:

# On Database
changes = db.execute_batch(
    "INSERT INTO users (id, name) VALUES ($1, $2)",
    [[1, "Alice"], [2, "Bob"], [3, "Charlie"]],
)
# changes == 3

# On PreparedStatement (reuses cached plan)
stmt = db.prepare("INSERT INTO users (id, name) VALUES ($1, $2)")
changes = stmt.execute_batch([[4, "Diana"], [5, "Eve"]])

Async API

All methods release the GIL and run on a thread executor:

from stoolap import AsyncDatabase

db = await AsyncDatabase.open(":memory:")

await db.exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
await db.execute("INSERT INTO users (id, name) VALUES ($1, $2)", [1, "Alice"])

rows = await db.query("SELECT * FROM users")

# Async transactions
async with await db.begin() as tx:
    await tx.execute("INSERT INTO users (id, name) VALUES ($1, $2)", [2, "Bob"])

# Async prepared statements
stmt = db.prepare("SELECT * FROM users WHERE id = $1")
user = await stmt.query_one([1])

await db.close()

Error Handling

All database errors raise StoolapError:

from stoolap import Database, StoolapError

db = Database.open(":memory:")
try:
    db.query("SELECT * FROM nonexistent_table")
except StoolapError as e:
    print(f"Database error: {e}")

Persistence

# File-based database (data persists across restarts)
db = Database.open("file:///path/to/mydata")

# Relative paths also work
db = Database.open("./mydata")

Configuration Options

Pass options as query parameters in the DSN:

# Max durability
db = Database.open("file:///path/to/mydata?sync_mode=full")

# Max throughput (less durable)
db = Database.open("file:///path/to/mydata?sync_mode=none&checkpoint_interval=120")
Parameter Values Default Description
sync_mode none, normal, full normal Durability level (full = fsync every write, normal = fsync every 1s)
checkpoint_interval seconds 60 Seconds between checkpoint cycles (seal + compact + WAL truncate)
compact_threshold count 4 Sub-target volumes per table before merging
target_volume_rows count 1048576 Target rows per cold volume (controls compaction split boundary)
checkpoint_on_close on, off on Seal all hot rows on clean shutdown for fast startup
keep_snapshots count 3 Number of backup snapshots to retain
compression on, off on Enable both WAL + volume compression (LZ4)
wal_compression on, off on WAL compression only
volume_compression on, off on Cold volume file compression only
compression_threshold bytes 64 Minimum data size before compression
wal_buffer_size bytes 65536 WAL write buffer size
wal_flush_trigger bytes 32768 WAL size before flush
wal_max_size bytes 67108864 WAL size before rotation (64 MB)
commit_batch_size count 100 Commits batched before syncing (normal mode)
sync_interval_ms milliseconds 1000 Minimum ms between syncs (normal mode)

Type Mapping

Python Stoolap Notes
int INTEGER 64-bit signed
float FLOAT 64-bit double
str TEXT UTF-8
bool BOOLEAN
None NULL
datetime.datetime TIMESTAMP Converted to/from UTC
dict / list JSON Serialized via json.dumps
Vector VECTOR(N) list[float] on output

Vector Similarity Search

Store embeddings and perform k-NN similarity search using HNSW indexes:

from stoolap import Database, Vector

db = Database.open(":memory:")

# Create a table with a VECTOR column
db.exec("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT,
        embedding VECTOR(3)
    );
    CREATE INDEX idx_emb ON documents(embedding) USING HNSW WITH (metric = 'cosine');
""")

# Insert vectors using the Vector wrapper
db.execute(
    "INSERT INTO documents VALUES ($1, $2, $3)",
    [1, "Hello world", Vector([0.1, 0.2, 0.3])],
)
db.execute(
    "INSERT INTO documents VALUES ($1, $2, $3)",
    [2, "Goodbye world", Vector([0.9, 0.1, 0.0])],
)

# k-NN search: find 5 nearest neighbors
results = db.query(
    "SELECT id, title, VEC_DISTANCE_COSINE(embedding, '[0.1, 0.2, 0.3]') AS dist "
    "FROM documents ORDER BY dist LIMIT 5"
)

# Read vectors back as list[float]
row = db.query_one("SELECT embedding FROM documents WHERE id = 1")
emb = row["embedding"]  # [0.1, 0.2, 0.3]

Distance Functions

Function Description
VEC_DISTANCE_L2(a, b) Euclidean distance
VEC_DISTANCE_COSINE(a, b) Cosine distance (1 - similarity)
VEC_DISTANCE_IP(a, b) Negative inner product

Vector Utilities

Function Description
VEC_DIMS(v) Number of dimensions
VEC_NORM(v) L2 norm (magnitude)
VEC_TO_TEXT(v) Convert to string [1.0, 2.0, 3.0]

HNSW Index Options

CREATE INDEX idx ON table(column) USING HNSW WITH (metric = 'cosine');

Supported metrics: l2 (default), cosine, ip (inner product).

Features

Stoolap is a full-featured embedded SQL database:

  • MVCC Transactions with snapshot isolation
  • Cost-based query optimizer with adaptive execution
  • Parallel query execution (filter, join, sort, distinct)
  • JOINs: INNER, LEFT, RIGHT, FULL OUTER, CROSS, NATURAL
  • Subqueries: scalar, EXISTS, IN, NOT IN, ANY/ALL, correlated
  • Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE
  • CTEs: WITH and WITH RECURSIVE
  • Aggregations: GROUP BY, HAVING, ROLLUP, CUBE, GROUPING SETS
  • Vector similarity search with HNSW indexes (L2, cosine, inner product)
  • Indexes: B-tree, Hash, Bitmap (auto-selected), HNSW, multi-column composite
  • 110+ built-in functions: string, math, date/time, JSON, vector, aggregate
  • Immutable volume-based storage with columnar format, zone maps, bloom filters, and LZ4 compression
  • WAL + checkpoint cycles for crash recovery (seal + compact + WAL truncate)
  • Aggregation pushdown to cold volume statistics (COUNT, SUM, MIN, MAX)
  • Semantic query caching with predicate subsumption

Building from Source

Requires Rust (stable) and Python >= 3.9.

git clone https://github.com/stoolap/stoolap-python.git
cd stoolap-python
python -m venv .venv && source .venv/bin/activate
pip install maturin pytest pytest-asyncio
maturin develop --release
pytest

License

Apache-2.0

About

Stoolap Python Bindings

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors