Vector Database Benchmark Tool

This tool benchmarks and compares vector database performance, with current support for Milvus (DiskANN, HNSW, AISAQ indexing).

Installation

Clone the repository

git clone https://github.com/mlcommons/storage.git
cd storage/vdb_benchmark

Setup a virtual environment (recommended)

The python environment can be setup via the bash script setup_env.sh from the top level.

Manual installation

cd storage/vdb_benchmark
# Note:  For development use editable installs (-e option):
pip3 install -e ./

Installation via mlpstorage (recommended)

The VDB benchmark is integrated into the MLPerf Storage benchmark suite and can be run through the mlpstorage CLI. From the repository root:

cd storage

# Install with VDB dependencies
uv sync --extra vectordb

# Also install the vdbbench package into the uv-managed venv
uv pip install -e ./vdb_benchmark

This makes both `./mlpstorage vectordb` commands and standalone `uv run vdbbench` available in the same locked virtual environment.

Deploying a Standalone Milvus Instance

Stand-alone instances are available via Docker containers in the stacks directory.

stacks └── milvus ├── cluster └── standalone ├── minio │ ├── .env.example │ └── docker-compose.yml └── s3 ├── .env.example └── docker-compose-s3.yml

For each specific instance, copy the .env.example file to .env and update the values as needed.

# Example
cp stacks/milvus/standalone/minio/.env.example stacks/milvus/standalone/minio/.env

Local Storage -- MinIO

The configuration file stacks/milvus/standalone/minio/docker-compose.yml creates a 3-container Milvus stack using local storage:

Milvus database
MinIO object storage
etcd metadata store

The compose file uses /mnt/vdb as the root directory for Docker volumes. Set DOCKER_VOLUME_DIRECTORY or edit the compose file to point to your target storage location. To test more than one storage solution use separate compose stacks with different port mappings, or bring containers down, copy /mnt/vdb to a new location, update the mount point, and restart.

Start the container

# Version 1
docker compose -f stacks/milvus/standalone/minio/docker-compose.yml up -d
# Version 2
docker-compose -f stacks/milvus/standalone/minio/docker-compose.yml up -d

Tip: The -d flag detaches from container logs. Without it, ctrl+c stops all containers. For proxy issues see: https://medium.com/@SrvZ/docker-proxy-and-my-struggles-a4fd6de21861

docker ps -a
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                     PORTS                                                                                      NAMES
7bbb96825428   milvusdb/milvus:v2.5.10                    "/tini -- milvus run…"   14 minutes ago   Up 14 minutes (healthy)    0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 0.0.0.0:19530->19530/tcp, :::19530->19530/tcp   milvus-standalone
e35d11ee6eba   minio/minio:RELEASE.2023-03-20T20-16-18Z   "/usr/bin/docker-ent…"   14 minutes ago   Up 14 minutes (healthy)    0.0.0.0:9000-9001->9000-9001/tcp, :::9000-9001->9000-9001/tcp                              milvus-minio
06b3aa6c777b   quay.io/coreos/etcd:v3.5.18                "etcd -advertise-cli…"   14 minutes ago   Up 14 minutes (healthy)    0.0.0.0:2379->2379/tcp, :::2379->2379/tcp, 2380/tcp                                        milvus-etcd

S3 Storage

The configuration file stacks/milvus/standalone/s3/docker-compose.yml creates a Milvus stack using an external S3-compatible object storage service (such as MinIO or AWS S3).

Milvus database

Start the container

# Version 1
docker compose -f stacks/milvus/standalone/s3/docker-compose.yml up -d
# Version 2
docker-compose -f stacks/milvus/standalone/s3/docker-compose.yml up -d

Tip: The -d flag detaches from container logs. Without it, ctrl+c stops all containers. For proxy issues see: https://medium.com/@SrvZ/docker-proxy-and-my-struggles-a4fd6de21861

docker ps -a
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                      PORTS                                                                                      NAMES
beec3366bea4   milvusdb/milvus:v2.5.10                    "/tini -- milvus run…"   4 minutes ago    Up 4 minutes                0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 0.0.0.0:19530->19530/tcp, :::19530->19530/tcp   milvus-s3                                       milvus-etcd

Running the Benchmark

The benchmark workflow has four main steps:

Step 0 — Estimate Storage Requirements (optional)

Before loading data, estimate how much storage the dataset and index will require:

# via mlpstorage
./mlpstorage vectordb datasize \
    --dimension 1536 --num-vectors 10000000 \
    --index-type DISKANN --num-shards 10

This is pure math and does not require a running Milvus instance or pymilvus. It reports raw vector data size plus estimated index overhead for DISKANN, HNSW, and AISAQ index types.

Example output:

Vectors:       10,000,000 x dim=1536 x 4B
Raw data:      61.44 GB
Index type:    DISKANN (130% overhead)
Shards:        10
Estimated total: 798.72 GB

Step 1 — Load Vectors

Load 10 million vectors into the database (can take up to 8 hours):

python vdbbench/load_vdb.py --config vdbbench/configs/10m_diskann.yaml

For faster testing with a smaller dataset:

python vdbbench/load_vdb.py \
  --config vdbbench/configs/10m_diskann.yaml \
  --collection-name mlps_500k_10shards_1536dim_uniform_diskann \
  --num-vectors 500000

Key parameters: --collection-name, --dimension, --num-vectors, --chunk-size, --distribution (uniform or normal), --batch-size.

Example YAML config (vdbbench/configs/10m_diskann.yaml):

database:
  host: 127.0.0.1
  port: 19530
  database: milvus
  max_receive_message_length: 514_983_574
  max_send_message_length: 514_983_574

dataset:
  collection_name: mlps_10m_10shards_1536dim_uniform_diskann
  num_vectors: 10_000_000
  dimension: 1536
  distribution: uniform
  batch_size: 1000
  num_shards: 10
  vector_dtype: FLOAT_VECTOR

index:
  index_type: DISKANN
  metric_type: COSINE
  max_degree: 64
  search_list_size: 200

workflow:
  compact: True

Via mlpstorage

# Load using the default config (1M vectors, 1536-dim, DiskANN)
./mlpstorage vectordb datagen \
    --host 127.0.0.1 --port 19530 --config default \
    --force --results-dir /tmp/vdb_results

# Load using the 10M config
./mlpstorage vectordb datagen \
    --host 127.0.0.1 --port 19530 --config 10m \
    --force --results-dir /tmp/vdb_results

# Override vector count for quick testing
./mlpstorage vectordb datagen \
    --host 127.0.0.1 --port 19530 --config default \
    --num-vectors 50000 --dimension 1536 --num-shards 1 \
    --force --results-dir /tmp/vdb_results

The --config argument refers to YAML files in configs/vectordbbench/ (without the .yaml extension). The --force flag drops and recreates the collection if it already exists.

Important: When overriding --dimension in datagen, the same dimension must match the config used for run. Mismatched dimensions between datagen and run cause Milvus to reject queries with a vector dimension mismatch error. The safest approach is to create a custom config YAML with the desired dimension and use the same --config for both commands.

Note: Dimension consistency

The vector dimension must be consistent between data loading and benchmarking. If you override --dimension during datagen, the config YAML used for run must specify the same dimension — otherwise Milvus will reject queries with:

vector dimension mismatch, expected vector size(byte) 512, actual 6144

The safest approach is to use matching --config for both datagen and run without CLI dimension overrides, or to create a dedicated config YAML for non-standard dimensions.

Step 2 — Compact (if needed)

The load script performs compaction automatically when compact: true is set. If it exits early, run compaction manually:

python vdbbench/compact_and_watch.py \
  --config vdbbench/configs/10m_diskann.yaml \
  --interval 5

Step 3 — Run the Benchmark

Use enhanced_bench.py (the recommended benchmark script, described fully below) or the simpler simple_bench.py for a quick run:

# quick run with simple_bench
python vdbbench/simple_bench.py \
  --host 127.0.0.1 \
  --collection <collection_name> \
  --processes 4 \
  --batch-size 10 \
  --runtime 120

Via mlpstorage

# Timed run (default mode, uses simple_bench/vdbbench)
./mlpstorage vectordb run \
    --host 127.0.0.1 --port 19530 --config default \
    --num-query-processes 4 --runtime 120 \
    --results-dir /tmp/vdb_results

# Query-count mode
./mlpstorage vectordb run \
    --host 127.0.0.1 --port 19530 --config default \
    --num-query-processes 2 --queries 1000 --mode query_count \
    --results-dir /tmp/vdb_results

# Sweep mode (uses enhanced_bench)
./mlpstorage vectordb run \
    --host 127.0.0.1 --port 19530 --config default \
    --mode sweep --runtime 120 \
    --results-dir /tmp/vdb_results

The --mode parameter controls which benchmark script is invoked:

Mode	Script	Use case
`timed` (default)	`vdbbench` (simple_bench.py)	Fixed-duration sustained load
`query_count`	`vdbbench` (simple_bench.py)	Run exactly N queries
`sweep`	`enhanced-bench` (enhanced_bench.py)	Parameter tuning, recall sweeps

Results are saved under --results-dir in a timestamped subdirectory with metadata JSON, per-worker CSV files, recall stats, and disk I/O statistics.

History and Reports

# View run history
./mlpstorage history show

# Generate reports from results
./mlpstorage reports reportgen --results-dir /tmp/vdb_results

mlpstorage Quick Reference

The VDB benchmark is integrated into the MLPerf Storage CLI (./mlpstorage). All commands below use the uv run execution model introduced in PR #308 to ensure locked dependencies.

Available commands

./mlpstorage vectordb --help
./mlpstorage vectordb datasize --help
./mlpstorage vectordb datagen --help
./mlpstorage vectordb run --help

End-to-end example (single node)

# 1. Estimate storage
./mlpstorage vectordb datasize \
    --dimension 1536 --num-vectors 1000000 --index-type DISKANN

# 2. Load vectors
./mlpstorage vectordb datagen \
    --host 127.0.0.1 --port 19530 --config default \
    --force --results-dir ~/vdb_results

# 3. Run benchmark (2 processes, 60 seconds)
./mlpstorage vectordb run \
    --host 127.0.0.1 --port 19530 --config default \
    --num-query-processes 2 --runtime 60 \
    --results-dir ~/vdb_results

# 4. View history
./mlpstorage history show

Config files

YAML configs live in configs/vectordbbench/. The --config flag takes the filename without .yaml:

Config	Vectors	Dimension	Shards	Index
`default`	1M	1536	1	DiskANN
`10m`	10M	1536	10	DiskANN

Custom configs can be added to the same directory.

Preview status

The VDB benchmark is currently in preview status. All runs qualify for OPEN category only — closed submissions are not yet accepted. Pass --open to acknowledge this:

./mlpstorage vectordb run --open \
    --host 127.0.0.1 --config default \
    --num-query-processes 4 --runtime 120 \
    --results-dir ~/vdb_results

enhanced_bench.py — Full Reference

enhanced_bench.py merges simple_bench (operational features: FLAT GT auto-creation, runtime-based execution, per-worker CSV, full P99.9/P99.99 latency stats) with enhanced_bench (advanced features: parameter sweep, warm/cold cache regimes, budget mode, YAML config, memory estimator). It exposes a single unified command.

Two Execution Paths

The script automatically selects the path based on the flags you provide:

Path	Trigger	Best for
A — Runtime/query-count	`--runtime` or `--batch-size` present	Sustained load, CI gating, storage team testing
B — Sweep/cache	Neither `--runtime` nor `--batch-size` present	Parameter tuning, recall target sweep, warm vs. cold analysis

Execution Path A — Runtime / Query-Count Mode

Mimics simple_bench.py. Runs workers for a fixed duration or query count, writes per-process CSV files, and aggregates full latency/recall statistics.

Step A-1: Auto-create the FLAT Ground Truth Collection (first run only)

python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --auto-create-flat \
  --runtime 1  \
  --batch-size 1 \
  --processes 1

This copies all vectors + primary keys from your ANN collection into a new FLAT-indexed collection (<collection>_flat_gt) and uses it for exact ground-truth recall.
You only need to do this once per collection; subsequent runs reuse the existing FLAT collection.

Why FLAT? DiskANN/HNSW/AISAQ are approximate. FLAT performs brute-force exact search, giving true nearest neighbours — required for correct recall@k calculation.

Step A-2: Run the benchmark

# Runtime-based (120 seconds, 4 processes, batch size 10)
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --runtime 120 \
  --batch-size 10 \
  --processes 4 \
  --search-limit 10 \
  --search-ef 200

# Query-count-based (run exactly 50 000 queries total)
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --queries 50000 \
  --batch-size 10 \
  --processes 4

# With an explicit FLAT GT collection name
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --gt-collection mlps_10m_10shards_1536dim_uniform_diskann_flat_gt \
  --runtime 120 \
  --batch-size 10 \
  --processes 4

# YAML config + CLI overrides
python vdbbench/enhanced_bench.py \
  --config vdbbench/configs/10m_diskann.yaml \
  --runtime 300 \
  --batch-size 10 \
  --processes 8 \
  --output-dir /tmp/bench_results

enhanced_bench.py - Via mlpstorage

mlpstorage shortcut: enhanced_bench.py can be invoked through mlpstorage using --mode sweep:
./mlpstorage vectordb run --config default --mode sweep --runtime 120 \
    --results-dir ~/vdb_results
This is equivalent to running enhanced_bench.py directly with the config's parameters.

Path A — Key Parameters

Parameter	Default	Description
`--collection`	required	ANN-indexed collection name
`--runtime`	`None`	Benchmark duration in seconds
`--queries`	`1000`	Total query count (also sets query-set size in Path B)
`--batch-size`	required	Queries per batch
`--processes`	`8`	Worker processes
`--search-limit`	`10`	Top-k results per query
`--search-ef`	`200`	ef (HNSW) / search_list (DiskANN, AISAQ) / nprobe (IVF) override
`--num-query-vectors`	`1000`	Pre-generated query vectors for recall
`--recall-k`	`= --search-limit`	k for recall@k
`--gt-collection`	`<collection>_flat_gt`	FLAT GT collection name
`--auto-create-flat`	`False`	Auto-create FLAT GT collection from source
`--vector-dim`	`1536`	Vector dimension (auto-detected from schema when possible)
`--output-dir`	`vdbbench_results/<ts>`	Directory for CSV files + statistics
`--json-output`	`False`	Print summary as JSON instead of formatted text
`--report-count`	`10`	Batches between progress log lines
`--host` / `--port`	`localhost:19530`	Milvus connection
`--config`	`None`	YAML config file (CLI flags override YAML)

Path A — Outputs

<output-dir>/
  config.json                       # Run configuration
  milvus_benchmark_p0.csv           # Per-process timing rows (one file per worker)
  milvus_benchmark_p1.csv
  recall_hits_p0.jsonl              # Per-worker ANN result IDs for recall (one file per worker)
  recall_hits_p1.jsonl              # Each line: {"q": <query_idx>, "ids": [...]}
  recall_stats.json                 # Full recall@k statistics
  statistics.json                   # Aggregated latency + recall + disk I/O

recall_stats.json includes: mean_recall, median_recall, min_recall, max_recall, p95_recall, p99_recall, num_queries_evaluated.

statistics.json includes: mean_latency_ms, p95_latency_ms, p99_latency_ms, p999_latency_ms, p9999_latency_ms, throughput_qps, batch stats, recall stats, and disk I/O with throughput rates and IOPS per device — same fields as Path B's CSV columns.

Execution Path B — Sweep / Cache / Budget Mode

Runs a parameter sweep to find the best search parameters meeting a recall target, optionally under warm and/or cold cache conditions.

# Single-thread, both warm+cold cache, recall sweep targeting 0.95
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --gt-collection mlps_10m_10shards_1536dim_uniform_diskann_flat_gt \
  --mode single \
  --sweep \
  --target-recall 0.95 \
  --cache-state both \
  --queries 1000 \
  --k 10

# Multi-process, default (non-sweep) params
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --gt-collection mlps_10m_10shards_1536dim_uniform_diskann_flat_gt \
  --mode mp \
  --processes 8 \
  --cache-state warm \
  --queries 1000 \
  --k 10

# Multiple recall targets, optimize for latency
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --gt-collection mlps_10m_10shards_1536dim_uniform_diskann_flat_gt \
  --mode both \
  --sweep \
  --recall-targets 0.90 0.95 0.99 \
  --optimize latency \
  --cache-state warm

# Auto-create FLAT collection + sweep (combined, first run)
python vdbbench/enhanced_bench.py \
  --host 127.0.0.1 \
  --collection mlps_10m_10shards_1536dim_uniform_diskann \
  --auto-create-flat \
  --mode both \
  --sweep \
  --target-recall 0.95 \
  --cache-state both

Path B — Key Additional Parameters

Parameter	Default	Description
`--mode`	`both`	`single` / `mp` / `both`
`--k`	`10`	Top-k for recall calculation
`--seed`	`1234`	Query generation seed
`--normalize-cosine`	`False`	Normalize query vectors for COSINE metric
`--sweep`	`False`	Enable parameter sweep
`--target-recall`	`0.95`	Single recall target for sweep
`--recall-targets`	`None`	Multiple recall targets, e.g. `0.90 0.95 0.99`
`--optimize`	`quality`	Sweep objective: `quality` (QPS) / `latency` / `cost`
`--sweep-queries`	`300`	Queries used during sweep phase
`--cache-state`	`both`	`warm` / `cold` / `both`
`--drop-caches-cmd`	see help	Command to drop OS page cache for cold runs
`--restart-milvus-cmd`	`None`	Optional Milvus restart command for cold runs
`--milvus-container`	`None`	Container name(s) for RSS measurement (repeatable)
`--disk-dev`	`None`	Block device(s) to track (repeatable); default: all real disks
`--gt-cache-dir`	`gt_cache`	Directory for ground truth NPZ cache
`--gt-cache-disable`	`False`	Disable GT caching
`--gt-cache-force-refresh`	`False`	Force GT recomputation even if cache exists
`--mem-budget-gb`	`None`	Max container RSS in GB (requires `--milvus-container`)
`--host-mem-reserve-gb`	`None`	Min host MemAvailable required before each run
`--budget-soft`	`False`	Record budget violations and skip instead of exiting
`--out-dir`	`results`	Directory for JSON/CSV output files
`--tag`	`None`	Tag string included in output file names

Path B — Outputs

results/
  combined_bench_<tag>_<timestamp>.json       # All run results + sweep data (includes recall_stats + disk IOPS)
  combined_bench_<tag>_<timestamp>.csv        # Per-run tabular summary (see columns below)
  combined_bench_<tag>_<timestamp>.sweep.csv  # Per-candidate sweep details (if --sweep)

gt_cache/
  gt_<hash>.npz                 # Cached ground truth (compressed NumPy)
  gt_<hash>.meta.json           # Cache signature / metadata

The CSV now includes unified recall and disk columns identical to Path A's statistics.json:

Column	Description
`recall_mean` / `recall_median` / `recall_p95` / `recall_p99`	Per-query recall distribution
`recall_min` / `recall_max` / `recall_queries_evaluated`	Recall bounds and coverage
`disk_read_mbps` / `disk_write_mbps`	Average read/write throughput (MB/s)
`disk_read_iops` / `disk_write_iops`	Average read/write IOPS
`disk_duration_sec`	Benchmark wall-clock time used for rate derivation

Unified Statistics Output (Both Paths)

Both Path A and Path B now print the same summary block per run:

============================================================
BENCHMARK SUMMARY — <mode> [MAX THROUGHPUT]
============================================================
Index:         DISKANN  |  Metric: COSINE
Params:        {'search_list': 200}
Cache:         warm
Total Queries: 1000

QUERY STATISTICS
------------------------------------------------------------
Mean Latency:      12.34 ms
Median Latency:    11.89 ms
P95 Latency:       18.72 ms
P99 Latency:       24.10 ms
Throughput:        81.07 queries/second

RECALL STATISTICS (recall@10)
------------------------------------------------------------
Mean Recall:       0.9512
Median Recall:     0.9600
Min Recall:        0.7000
Max Recall:        1.0000
P95 Recall:        1.0000
P99 Recall:        1.0000
Queries Evaluated: 1000

DISK I/O DURING BENCHMARK
------------------------------------------------------------
Total Read:        14.82 GB  (312.45 MB/s,  8420 IOPS)
Total Write:       0.23 GB   (4.88 MB/s,    210 IOPS)
Read / Query:      15.12 MB
============================================================

Memory Estimator Mode

Plan memory requirements before indexing:

python vdbbench/enhanced_bench.py \
  --estimate-only \
  --est-index-type HNSW \
  --est-n 10000000 \
  --est-dim 1536 \
  --est-hnsw-m 64

HNSW Example

For HNSW indexing, use the matching config and update the collection name:

python vdbbench/load_vdb.py --config vdbbench/configs/10m_hnsw.yaml

python vdbbench/enhanced_bench.py \
  --collection mlps_10m_10shards_1536dim_uniform_hnsw \
  --auto-create-flat \
  --runtime 120 \
  --batch-size 10 \
  --processes 4

enhanced_bench.py auto-detects index type, metric, and vector field from the collection schema — no --vector-dim flag is needed for standard 1536-dim collections.

Supported Databases

Milvus with DiskANN, HNSW, and AISAQ indexing (implemented)
IVF flat/PQ indexes (basic support)

Dependencies

Via mlpstorage (recommended)

# From the repository root
uv sync --extra vectordb
uv pip install -e ./vdb_benchmark

This installs all required dependencies into the uv-managed virtual environment.

Manual installation

pip install pymilvus numpy pyyaml tabulate pandas

Package	Purpose
`pymilvus`	Milvus client
`numpy`	Vector generation + recall math
`pyyaml`	YAML config support
`tabulate`	Collection info table display
`pandas`	Full latency statistics aggregation

Note: The datasize command (./mlpstorage vectordb datasize) does not require pymilvus or a running Milvus instance — it is pure math. All other commands require pymilvus and a running Milvus server. If dependencies are missing, the benchmark will exit with a clear error listing the missing packages and install instructions.

How Recall Is Measured (Both Paths)

Recall is computed entirely outside the timed benchmark loop so it never inflates latency numbers. Both paths share the same _recall_from_lists() → calc_recall() pipeline, producing identical statistics.

Path A (runtime / query-count mode)

Ground truth is pre-computed before any timed work by searching a FLAT collection — exact nearest neighbours, no approximation.
During the benchmark each worker writes ANN result IDs to its own recall_hits_p<N>.jsonl file. Each line is a JSON object:
```
{"q": 42, "ids": [1000234, 9981, 720055, ...]}
```
Only the first result seen for each query index is recorded per worker. Using one local file per worker (instead of a shared mp.Manager dict) eliminates IPC race conditions that previously caused recall to report 0.000 under multiprocessing.
After all workers finish, the main process merges the JSONL files with load_recall_hits() and calls calc_recall() to compute per-query recall@k statistics.

Path B (sweep / cache / budget mode)

Ground truth is computed via compute_ground_truth() against the FLAT GT collection (or the same collection if none is provided) and optionally cached in gt_cache/ as an NPZ file.
bench_single and bench_multiprocess collect pred_ids as ordered lists of search result IDs.
Both call _recall_from_lists(gt_ids, pred_ids, k) which converts both lists to {query_idx → ids} dicts (avoiding silent truncation from length mismatches) before calling calc_recall().

Output statistics (identical for both paths)

Statistic	Description
`mean_recall`	Average recall@k across all evaluated queries
`median_recall`	Median recall (50th percentile)
`min_recall` / `max_recall`	Worst and best single-query recall
`p95_recall` / `p99_recall`	Tail recall percentiles
`num_queries_evaluated`	Number of queries with valid GT entries

Tip: If recall shows 0.000, check that the FLAT GT collection exists and contains the same vectors as the ANN collection. For Path A, also verify that recall_hits_p*.jsonl files are non-empty in the output directory.

Disk I/O Metrics

Disk I/O is measured by diffing /proc/diskstats before and after the benchmark. Fields captured per device:

Field	Source in `/proc/diskstats`	Description
`bytes_read`	`sectors_read × 512`	Total bytes read
`bytes_written`	`sectors_written × 512`	Total bytes written
`read_ios`	`reads_completed`	Read I/O operations completed
`write_ios`	`writes_completed`	Write I/O operations completed
`read_mbps`	derived	Average read throughput (MB/s)
`write_mbps`	derived	Average write throughput (MB/s)
`read_iops`	derived	Average read IOPS
`write_iops`	derived	Average write IOPS

All rates are averaged over the benchmark's total wall-clock time. Virtual/loop devices (loop*, ram*, dm-*) are filtered out of per-device breakdowns by default.

Contributing

Contributions are welcome! Please submit a Pull Request.

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Vector Database Benchmark Tool

Installation

Clone the repository

Setup a virtual environment (recommended)

Manual installation

Installation via mlpstorage (recommended)

This makes both ./mlpstorage vectordb commands and standalone uv run vdbbench available in the same locked virtual environment.

Deploying a Standalone Milvus Instance

Local Storage -- MinIO

Start the container

S3 Storage

Start the container

Running the Benchmark

Step 0 — Estimate Storage Requirements (optional)

Step 1 — Load Vectors

Via mlpstorage

Note: Dimension consistency

Step 2 — Compact (if needed)

Step 3 — Run the Benchmark

Via mlpstorage

History and Reports

mlpstorage Quick Reference

Available commands

End-to-end example (single node)

Config files

Preview status

enhanced_bench.py — Full Reference

Two Execution Paths

Execution Path A — Runtime / Query-Count Mode

Step A-1: Auto-create the FLAT Ground Truth Collection (first run only)

Step A-2: Run the benchmark

enhanced_bench.py - Via mlpstorage

Path A — Key Parameters

Path A — Outputs

Execution Path B — Sweep / Cache / Budget Mode

Path B — Key Additional Parameters

Path B — Outputs

Unified Statistics Output (Both Paths)

Memory Estimator Mode

HNSW Example

Supported Databases

Dependencies

Via mlpstorage (recommended)

Manual installation

How Recall Is Measured (Both Paths)

Path A (runtime / query-count mode)

Path B (sweep / cache / budget mode)

Output statistics (identical for both paths)

Disk I/O Metrics

Contributing

This makes both `./mlpstorage vectordb` commands and standalone `uv run vdbbench` available in the same locked virtual environment.