docs/user_guide/cli.ipynb
Lines changed: 50 additions & 32 deletions
@@ -364,6 +364,35 @@
364
364
"!rvl stats -i vectorizers"
365
365
]
366
366
},
367
+
{
368
+
"cell_type": "markdown",
369
+
"metadata": {},
370
+
"source": [
371
+
"## Migrate\n",
372
+
"\n",
373
+
"The ``rvl migrate`` command provides a full workflow for changing index schemas without losing data. Common use cases include vector quantization (float32 → float16), algorithm changes (HNSW → FLAT), and adding/removing fields.\n",
374
+
"\n",
375
+
"```bash\n",
376
+
"# List available indexes\n",
377
+
"rvl migrate list --url redis://localhost:6379\n",
"See the [Migration Guide](how_to_guides/migrate-indexes.md) for detailed usage, performance tuning, and examples."
394
+
]
395
+
},
367
396
{
368
397
"cell_type": "markdown",
369
398
"metadata": {},
@@ -383,15 +412,6 @@
383
412
},
384
413
{
385
414
"cell_type": "markdown",
386
-
"metadata": {},
387
-
"source": [
388
-
"### Choosing your Redis instance\n",
389
-
"By default rvl first checks if you have `REDIS_URL` environment variable defined and tries to connect to that. If not, it then falls back to `localhost:6379`, unless you pass the `--host` or `--port` arguments"
"By default rvl first checks if you have `REDIS_URL` environment variable defined and tries to connect to that. If not, it then falls back to `localhost:6379`, unless you pass the `--host` or `--port` arguments"
416
426
]
417
427
},
418
428
{
419
-
"cell_type": "markdown",
429
+
"cell_type": "code",
420
430
"metadata": {},
421
431
"source": [
422
-
"### Using SSL encryption\n",
423
-
"If your Redis instance is configured to use SSL encryption then set the `--ssl` flag.\n",
424
-
"You can similarly specify the username and password to construct the full Redis URL"
docs/user_guide/how_to_guides/migrate-indexes.md
Lines changed: 189 additions & 2 deletions
@@ -589,10 +589,15 @@ rvl migrate validate \
589
589
- `--index` : Index name to migrate
590
590
- `--plan` / `--plan-out` : Path to migration plan
591
591
- `--async` : Use async executor for large migrations (apply only)
592
-
- `--resume` : Path to checkpoint file for crash-safe quantization resume (apply only)
593
592
- `--report-out` : Path for validation report
594
593
- `--benchmark-out` : Path for performance metrics
595
594
595
+
**Apply flags (quantization & reliability):**
596
+
- `--backup-dir <dir>` : Directory for vector backup files. Enables crash-safe resume and manual rollback. Required when using `--workers` > 1.
597
+
- `--batch-size <N>` : Keys per pipeline batch (default 500). Values 200–1000 are typical.
598
+
- `--workers <N>` : Parallel quantization workers (default 1). Each worker opens its own Redis connection. See [Performance](#performance-tuning) for guidance.
599
+
- `--keep-backup` : Retain backup files after a successful migration (default: auto-cleanup).
600
+
596
601
**Batch-specific flags:**
597
602
- `--pattern` : Glob pattern to match index names (e.g., `*_idx`)
598
603
- `--indexes` : Explicit list of index names
@@ -631,6 +636,111 @@ If `apply` fails mid-migration:
631
636
632
637
The underlying documents are never deleted by `drop_recreate`.
633
638
639
+
## Backup, Resume & Rollback
640
+
641
+
### How Backups Work
642
+
643
+
When you pass `--backup-dir` (or `backup_dir` in the Python API), the
644
+
migration executor saves **original vector bytes** to disk before mutating
645
+
them. This enables two key capabilities:
646
+
647
+
1. **Crash-safe resume** — if the process dies mid-migration, re-running the
648
+
same command with the same `--backup-dir` automatically detects partial
649
+
progress and resumes from the last completed batch.
650
+
2. **Manual rollback** — the backup files contain the original (pre-quantization)
651
+
vector values, which can be restored to undo a migration.
652
+
653
+
Backup files are written to the specified directory with this layout:
654
+
655
+
```
656
+
<backup-dir>/
657
+
migration_backup_<index_name>.header # JSON: phase, progress counters, field metadata
658
+
migration_backup_<index_name>.data # Binary: length-prefixed batches of original vectors
659
+
```
660
+
661
+
**Disk usage:** approximately `num_docs × dims × bytes_per_element`.
662
+
For example, 1M docs with 768-dim float32 vectors ≈ 3.1 GB (about 2.9 GiB).
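
The disk-usage estimate can be checked with a few lines of Python. This is a minimal sketch: the helper name `estimate_backup_bytes` is hypothetical (it is not part of `rvl` or the RedisVL Python API), and it ignores the small overhead of the header file and any per-batch length prefixes.

```python
def estimate_backup_bytes(num_docs: int, dims: int, bytes_per_element: int) -> int:
    """Approximate backup size: num_docs x dims x bytes_per_element.

    Hypothetical helper for illustration only; not a RedisVL function.
    """
    return num_docs * dims * bytes_per_element


# 1M documents, 768 dimensions, float32 (4 bytes per element)
size = estimate_backup_bytes(1_000_000, 768, 4)
print(f"{size:,} bytes = {size / 2**30:.1f} GiB")
```

Because the backup stores the original vectors, the element size to plug in is that of the source datatype (4 bytes for float32), not the quantized target.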
663
+
664
+
By default, backup files are **automatically deleted** after a successful
665
+
migration. Pass `--keep-backup` to retain them for post-migration auditing
666
+
or potential rollback.
667
+
668
+
### Crash-Safe Resume
669
+
670
+
If a migration is interrupted (crash, network error, Ctrl+C), simply re-run
| 1 | 1536 | ~15K docs/sec | Higher dims = more conversion work |
1071
+
| 4 | 1536 | ~15K docs/sec | I/O-bound; Redis is the bottleneck |
1072
+
1073
+
**Guidance:**
1074
+
- For **low-dimensional vectors** (≤ 256 dims), use `--workers 1` (the default). Per-vector conversion is so cheap that process-spawning and extra-connection overhead outweigh the parallelism benefit.
1075
+
- For **high-dimensional vectors** (≥ 768 dims), `--workers 2` to `--workers 4` may help if the Redis server has CPU headroom. Expect diminishing returns above 4–8 workers on a single Redis instance, because Redis command processing is single-threaded.
1076
+
- The main bottleneck for large migrations is typically **index rebuild time** (the `FT.CREATE` background indexing after vectors are written), not quantization itself.
1077
+
1078
+
### Batch Size
1079
+
1080
+
The `--batch-size` flag controls how many keys are read/written per Redis
1081
+
pipeline round-trip. The default of 500 is a good balance. Larger batches
1082
+
(1000+) reduce round-trips but increase per-batch memory and latency.
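
As a rough illustration of this trade-off, the number of batches is the key count divided by the batch size, rounded up. The sketch below uses a hypothetical helper, `pipeline_batches`, which is not part of RedisVL; whether each batch costs one round-trip or several (for example, a read followed by a write) depends on the executor and is not assumed here.

```python
import math


def pipeline_batches(num_keys: int, batch_size: int = 500) -> int:
    """Number of pipeline batches needed to cover num_keys.

    Hypothetical helper for illustration only; not a RedisVL function.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_keys / batch_size)


# Compare typical batch sizes for a 1M-key migration
for batch_size in (200, 500, 1000):
    print(batch_size, pipeline_batches(1_000_000, batch_size))
```

Doubling the batch size halves the batch count, but each batch then holds twice as many vectors in memory at once.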
1083
+
1084
+
### Backup Disk Space
1085
+
1086
+
When `--backup-dir` is provided, original vectors are saved to disk before