
Commit 2ce220a

docs & hardening: nkode-review fixes, rollback CLI, backup/resume docs
nkode-review findings addressed:
- Add --resume as deprecated alias for --backup-dir with warning
- Add num_workers >= 1 validation in split_keys() and CLI --workers
- Replace assert statements with ValueError for multi-worker guards
- Update apply() docstring to accurately describe multi-worker dump ordering

New features:
- Add 'rvl migrate rollback' CLI command to restore vectors from backups

Documentation:
- Expand executor/planner/async_executor docstrings with full parameter docs
- Add 'Backup, Resume & Rollback' section to migration guide
- Add Performance Tuning section with throughput tables and worker guidance
- Add HNSW vs FLAT index capacity technical note
- Add CLI migration examples to cli.ipynb
- Update common flags (replace --resume with --backup-dir, --workers, etc.)

Test scripts:
- Add test_migration_e2e.py (500K doc benchmark)
- Add test_crash_resume_e2e.py (crash-safe resume verification)
- Add verify_data_correctness.py (float32->float16 value correctness)
1 parent e0c8e45 commit 2ce220a

11 files changed

Lines changed: 1343 additions & 63 deletions


docs/user_guide/cli.ipynb

Lines changed: 50 additions & 32 deletions
````diff
@@ -364,6 +364,35 @@
     "!rvl stats -i vectorizers"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Migrate\n",
+    "\n",
+    "The ``rvl migrate`` command provides a full workflow for changing index schemas without losing data. Common use cases include vector quantization (float32 → float16), algorithm changes (HNSW → FLAT), and adding/removing fields.\n",
+    "\n",
+    "```bash\n",
+    "# List available indexes\n",
+    "rvl migrate list --url redis://localhost:6379\n",
+    "\n",
+    "# Build a migration plan interactively\n",
+    "rvl migrate wizard --index myindex --url redis://localhost:6379\n",
+    "\n",
+    "# Or generate from a schema patch file\n",
+    "rvl migrate plan --index myindex --schema-patch patch.yaml --url redis://localhost:6379\n",
+    "\n",
+    "# Apply with backup and multi-worker quantization\n",
+    "rvl migrate apply --plan migration_plan.yaml --url redis://localhost:6379 \\\n",
+    "    --backup-dir /tmp/backups --workers 4 --batch-size 500\n",
+    "\n",
+    "# Validate the result\n",
+    "rvl migrate validate --plan migration_plan.yaml --url redis://localhost:6379\n",
+    "```\n",
+    "\n",
+    "See the [Migration Guide](how_to_guides/migrate-indexes.md) for detailed usage, performance tuning, and examples."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
````
````diff
@@ -383,15 +412,6 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Choosing your Redis instance\n",
-    "By default rvl first checks if you have `REDIS_URL` environment variable defined and tries to connect to that. If not, it then falls back to `localhost:6379`, unless you pass the `--host` or `--port` arguments"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
    "metadata": {
     "execution": {
      "iopub.execute_input": "2026-02-16T15:58:08.651332Z",
@@ -400,33 +420,23 @@
      "shell.execute_reply": "2026-02-16T15:58:10.874011Z"
     }
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Indices:\n",
-      "1. vectorizers\n"
-     ]
-    }
-   ],
    "source": [
-    "# specify your Redis instance to connect to\n",
-    "!rvl index listall --host localhost --port 6379"
+    "### Choosing your Redis instance\n",
+    "By default rvl first checks if you have `REDIS_URL` environment variable defined and tries to connect to that. If not, it then falls back to `localhost:6379`, unless you pass the `--host` or `--port` arguments"
    ]
   },
   {
-   "cell_type": "markdown",
+   "cell_type": "code",
    "metadata": {},
    "source": [
-    "### Using SSL encryption\n",
-    "If your Redis instance is configured to use SSL encryption then set the `--ssl` flag.\n",
-    "You can similarly specify the username and password to construct the full Redis URL"
-   ]
+    "# specify your Redis instance to connect to\n",
+    "!rvl index listall --host localhost --port 6379"
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
-   "cell_type": "code",
-   "execution_count": 12,
+   "cell_type": "markdown",
    "metadata": {
     "execution": {
      "iopub.execute_input": "2026-02-16T15:58:10.876537Z",
@@ -435,10 +445,10 @@
      "shell.execute_reply": "2026-02-16T15:58:13.099303Z"
     }
    },
-   "outputs": [],
    "source": [
-    "# connect to rediss://jane_doe:password123@localhost:6379\n",
-    "!rvl index listall --user jane_doe -a password123 --ssl"
+    "### Using SSL encryption\n",
+    "If your Redis instance is configured to use SSL encryption then set the `--ssl` flag.\n",
+    "You can similarly specify the username and password to construct the full Redis URL"
    ]
   },
   {
@@ -462,8 +472,16 @@
     }
    ],
    "source": [
-    "!rvl index destroy -i vectorizers"
+    "# connect to rediss://jane_doe:password123@localhost:6379\n",
+    "!rvl index listall --user jane_doe -a password123 --ssl"
    ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "outputs": [],
+   "execution_count": null,
+   "source": "!rvl index destroy -i vectorizers"
   }
  ],
 "metadata": {
````

docs/user_guide/how_to_guides/migrate-indexes.md

Lines changed: 189 additions & 2 deletions
````diff
@@ -589,10 +589,15 @@ rvl migrate validate \
 - `--index` : Index name to migrate
 - `--plan` / `--plan-out` : Path to migration plan
 - `--async` : Use async executor for large migrations (apply only)
-- `--resume` : Path to checkpoint file for crash-safe quantization resume (apply only)
 - `--report-out` : Path for validation report
 - `--benchmark-out` : Path for performance metrics
 
+**Apply flags (quantization & reliability):**
+- `--backup-dir <dir>` : Directory for vector backup files. Enables crash-safe resume and manual rollback. Required when using `--workers` > 1.
+- `--batch-size <N>` : Keys per pipeline batch (default 500). Values 200–1000 are typical.
+- `--workers <N>` : Parallel quantization workers (default 1). Each worker opens its own Redis connection. See [Performance](#performance-tuning) for guidance.
+- `--keep-backup` : Retain backup files after a successful migration (default: auto-cleanup).
+
 **Batch-specific flags:**
 - `--pattern` : Glob pattern to match index names (e.g., `*_idx`)
 - `--indexes` : Explicit list of index names
````
````diff
@@ -631,6 +636,111 @@ If `apply` fails mid-migration:
 
 The underlying documents are never deleted by `drop_recreate`.
 
+## Backup, Resume & Rollback
+
+### How Backups Work
+
+When you pass `--backup-dir` (or `backup_dir` in the Python API), the
+migration executor saves **original vector bytes** to disk before mutating
+them. This enables two key capabilities:
+
+1. **Crash-safe resume** — if the process dies mid-migration, re-running the
+   same command with the same `--backup-dir` automatically detects partial
+   progress and resumes from the last completed batch.
+2. **Manual rollback** — the backup files contain the original (pre-quantization)
+   vector values, which can be restored to undo a migration.
+
+Backup files are written to the specified directory with this layout:
+
+```
+<backup-dir>/
+  migration_backup_<index_name>.header   # JSON: phase, progress counters, field metadata
+  migration_backup_<index_name>.data     # Binary: length-prefixed batches of original vectors
+```
+
+**Disk usage:** approximately `num_docs × dims × bytes_per_element`.
+For example, 1M docs with 768-dim float32 vectors ≈ 2.9 GB.
+
+By default, backup files are **automatically deleted** after a successful
+migration. Pass `--keep-backup` to retain them for post-migration auditing
+or potential rollback.
````
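The disk-usage formula in the diff above is easy to sanity-check in code. A minimal sketch follows; the helper name and dtype table are ours for illustration, not part of redisvl:

```python
# Hypothetical helper (not part of redisvl): estimate backup disk usage
# from the documented formula num_docs * dims * bytes_per_element.
DTYPE_BYTES = {"float64": 8, "float32": 4, "float16": 2}

def estimate_backup_gib(num_docs: int, dims: int, dtype: str = "float32") -> float:
    """Approximate backup size in GiB for a datatype migration."""
    total_bytes = num_docs * dims * DTYPE_BYTES[dtype]
    return total_bytes / 2**30

# 1M docs x 768-dim float32 vectors comes out to roughly 2.9 GiB,
# matching the example in the guide.
print(f"{estimate_backup_gib(1_000_000, 768):.1f} GiB")
```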
````diff
+### Crash-Safe Resume
+
+If a migration is interrupted (crash, network error, Ctrl+C), simply re-run
+the exact same command:
+
+```bash
+# Original command that was interrupted
+rvl migrate apply --plan plan.yaml --url redis://localhost:6379 \
+    --backup-dir /tmp/backups --workers 4
+
+# Just re-run it — progress is resumed automatically
+rvl migrate apply --plan plan.yaml --url redis://localhost:6379 \
+    --backup-dir /tmp/backups --workers 4
+```
+
+The executor detects the existing backup header, reads how many batches were
+completed, and resumes from the next unfinished batch. No data is duplicated
+or lost.
+
+```{note}
+**Single-worker vs multi-worker resume:** In single-worker mode, the full
+backup is written *before* the index is dropped, so a crash at any point
+leaves a complete backup on disk. In multi-worker mode, dump and quantize
+are fused (each worker reads, backs up, and converts its shard in one pass
+*after* the index drop). A crash during this fused phase may leave partial
+backup shards. Re-running detects and resumes from partial state.
+```
+
+### Rollback
+
+If you need to undo a quantization migration and restore original vectors,
+use the `rollback` command:
+
+```bash
+rvl migrate rollback --backup-dir /tmp/backups --url redis://localhost:6379
+```
+
+This reads every batch from the backup files and pipeline-HSETs the original
+(pre-quantization) vector bytes back into Redis. After rollback completes:
+
+- Your vector data is restored to its original datatype
+- You will need to **manually recreate the original index schema** if the
+  index was changed during migration (the rollback command restores data
+  only, not the index definition)
+
+```bash
+# After rollback, recreate the original index if needed:
+rvl index create --schema original_schema.yaml --url redis://localhost:6379
+```
+
+```{important}
+Rollback requires that backup files were preserved. Either pass
+`--keep-backup` during migration, or ensure the backup directory was not
+cleaned up. Without backup files, rollback is not possible.
+```
+
+### Python API for Rollback
+
+```python
+from redisvl.migration.backup import VectorBackup
+import redis
+
+r = redis.from_url("redis://localhost:6379")
+backup = VectorBackup.load("/tmp/backups/migration_backup_myindex")
+
+for keys, originals in backup.iter_batches():
+    pipe = r.pipeline(transaction=False)
+    for key in keys:
+        if key in originals:
+            for field_name, original_bytes in originals[key].items():
+                pipe.hset(key, field_name, original_bytes)
+    pipe.execute()
+
+print("Rollback complete")
+```
+
 ## Python API
 
 For programmatic migrations, use the migration classes directly:
````
````diff
@@ -652,6 +762,20 @@ report = executor.apply(plan, redis_url="redis://localhost:6379")
 print(f"Migration result: {report.result}")
 ```
 
+With backup and multi-worker quantization:
+
+```python
+report = executor.apply(
+    plan,
+    redis_url="redis://localhost:6379",
+    backup_dir="/tmp/migration_backups",  # enables crash-safe resume
+    batch_size=500,                       # keys per pipeline batch
+    num_workers=4,                        # parallel quantization workers
+    keep_backup=True,                     # retain backups for rollback
+)
+print(f"Quantized in {report.timings.quantize_duration_seconds}s")
+```
+
 ### Async API
 
 ```python
@@ -667,7 +791,12 @@ async def migrate():
     )
 
     executor = AsyncMigrationExecutor()
-    report = await executor.apply(plan, redis_url="redis://localhost:6379")
+    report = await executor.apply(
+        plan,
+        redis_url="redis://localhost:6379",
+        backup_dir="/tmp/migration_backups",
+        num_workers=4,
+    )
     print(f"Migration result: {report.result}")
 
 asyncio.run(migrate())
````
````diff
@@ -927,6 +1056,64 @@ print(f"Successful: {report.summary.successful}/{report.summary.total_indexes}")
 
 5. **Keep checkpoint files**: The `batch_state.yaml` is essential for resume. Don't delete it until the batch completes successfully.
 
+## Performance Tuning
+
+### Quantization Throughput
+
+Vector quantization (e.g. float32 → float16) is the most time-consuming
+phase of a datatype migration. Observed throughput on a local Redis instance:
+
+| Workers | Dims | Throughput | Notes |
+|---------|------|------------|-------|
+| 1 | 256 | ~70K docs/sec | Single worker is fastest for low dims |
+| 4 | 256 | ~62K docs/sec | Worker overhead exceeds parallelism benefit |
+| 1 | 1536 | ~15K docs/sec | Higher dims = more conversion work |
+| 4 | 1536 | ~15K docs/sec | I/O-bound; Redis is the bottleneck |
+
+**Guidance:**
+- For **low-dimensional vectors** (≤ 256 dims), use `--workers 1` (the default). Per-vector conversion is so cheap that process-spawning and extra-connection overhead outweigh the parallelism benefit.
+- For **high-dimensional vectors** (≥ 768 dims), `--workers 2-4` may help if the Redis server has available CPU headroom. Diminishing returns above 4–8 workers on a single Redis instance because Redis command processing is single-threaded.
+- The main bottleneck for large migrations is typically **index rebuild time** (the `FT.CREATE` background indexing after vectors are written), not quantization itself.
````
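The per-vector conversion work the guide describes can be sketched with NumPy. This is an illustrative stand-in, not redisvl's actual implementation; the function name is ours:

```python
import numpy as np

# Illustrative sketch of what a float32 -> float16 quantization pass does to
# each stored vector: reinterpret the raw bytes, convert, re-encode.
def quantize_vector_bytes(raw: bytes) -> bytes:
    """Re-encode raw float32 vector bytes as float16 (half the size)."""
    vec = np.frombuffer(raw, dtype=np.float32)
    return vec.astype(np.float16).tobytes()

original = np.array([0.25, -1.5, 3.0], dtype=np.float32).tobytes()
converted = quantize_vector_bytes(original)
print(len(original), len(converted))  # byte size halves: 12 -> 6
```

Values exactly representable in half precision (like those above) survive the round trip unchanged; in general the conversion is lossy, which is why `verify_data_correctness.py` in this commit checks value correctness.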
````diff
+### Batch Size
+
+The `--batch-size` flag controls how many keys are read/written per Redis
+pipeline round-trip. The default of 500 is a good balance. Larger batches
+(1000+) reduce round-trips but increase per-batch memory and latency.
````
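The batching pattern behind `--batch-size` is simple to illustrate: split the key set into fixed-size chunks and issue one pipeline round-trip per chunk. A minimal sketch, with a hypothetical helper (redisvl's executor does the equivalent internally; the `ValueError` guard mirrors the validation style this commit adds for `num_workers`):

```python
# Hypothetical helper showing what --batch-size controls: one chunk of keys
# per Redis pipeline round-trip.
def split_into_batches(keys: list, batch_size: int = 500) -> list:
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

# 1200 keys at the default batch size of 500 -> three round-trips.
batches = split_into_batches([f"doc:{i}" for i in range(1200)], batch_size=500)
print([len(b) for b in batches])  # [500, 500, 200]
```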
````diff
+### Backup Disk Space
+
+When `--backup-dir` is provided, original vectors are saved to disk before
+mutation. Approximate size: `num_docs × dims × bytes_per_element`.
+
+| Docs | Dims | Source dtype | Backup size |
+|------|------|--------------|-------------|
+| 100K | 768  | float32      | ~292 MB |
+| 1M   | 768  | float32      | ~2.9 GB |
+| 1M   | 1536 | float32      | ~5.7 GB |
+
+### HNSW vs FLAT Index Capacity
+
+```{note}
+When migrating from **HNSW** to **FLAT**, the target index may report a
+*higher* document count than the source. This is not a bug — it reflects
+a fundamental difference in how the two algorithms store vectors.
+
+HNSW maintains a navigable small-world graph with per-node neighbor lists.
+This graph overhead limits how many vectors can fit in available memory.
+FLAT stores vectors as a simple array with no graph overhead.
+
+If the source HNSW index was operating near its memory capacity, some
+documents may have been registered in Redis Search's document table but
+not fully indexed into the HNSW graph. After migration to FLAT, those
+same documents become fully searchable because FLAT requires less memory
+per vector.
+
+The migration validator compares the total key count
+(`num_docs + hash_indexing_failures`) between source and target, so this
+scenario is handled correctly in the general case.
+```
+
 ## Learn more
 
 - {doc}`/concepts/index-migrations`: How migrations work and which changes are supported
````
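The count comparison the HNSW/FLAT note describes can be sketched as a pure function over `FT.INFO`-style fields. The helper is illustrative, not redisvl's actual validator code:

```python
# Illustrative sketch of the validator's capacity check: source and target
# are considered consistent when their *total* key counts match, i.e.
# num_docs + hash_indexing_failures (field names as reported by FT.INFO).
def total_key_count(index_info: dict) -> int:
    return int(index_info["num_docs"]) + int(index_info["hash_indexing_failures"])

# HNSW source near memory capacity: 95K vectors indexed, 5K indexing failures.
source = {"num_docs": 95_000, "hash_indexing_failures": 5_000}
# FLAT target indexes everything: 100K docs, no failures.
target = {"num_docs": 100_000, "hash_indexing_failures": 0}

# Raw num_docs differs, but the totals agree, so validation passes.
print(total_key_count(source) == total_key_count(target))  # True
```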
