Commit 183ffc5

feat:add batch indexing
1 parent 33ebf54 commit 183ffc5

4 files changed

Lines changed: 678 additions & 28 deletions

File tree

docs/user_guide/how_to_guides/migrate-indexes.md

Lines changed: 277 additions & 19 deletions
@@ -41,28 +41,15 @@ docker run -d --name redis -p 6379:6379 redis/redis-stack-server:latest
4141
## Step 1: Discover Available Indexes
4242

4343
```bash
44-
rvl migrate helper --url redis://localhost:6379
4544
rvl migrate list --url redis://localhost:6379
4645
```
4746

4847
**Example output:**
4948
```
50-
Index Migrator
51-
==============
52-
The migrator helps you safely change your index schema.
53-
54-
Supported changes:
55-
- Add, remove, or update text/tag/numeric/geo fields
56-
- Change vector algorithm (FLAT, HNSW, SVS-VAMANA)
57-
- Change distance metric (COSINE, L2, IP)
58-
- Quantize vectors (float32 → float16)
59-
60-
Commands:
61-
rvl migrate list List all indexes
62-
rvl migrate wizard Build a migration interactively
63-
rvl migrate plan Generate a migration plan
64-
rvl migrate apply Execute a migration
65-
rvl migrate validate Verify a migration
49+
Available indexes:
50+
1. products_idx
51+
2. users_idx
52+
3. orders_idx
6653
```
6754

6855
## Step 2: Build Your Schema Change
@@ -330,7 +317,6 @@ rvl migrate apply \
330317
**When to use async:**
331318

332319
- Quantizing millions of vectors (float32 to float16)
333-
- Redis instance has 40M+ keys
334320
- Integrating into an async application
335321

336322
For most migrations (index-only changes, small datasets), sync mode is sufficient and simpler.
@@ -379,15 +365,25 @@ rvl migrate validate \
379365

380366
## CLI Reference
381367

368+
### Single-Index Commands
369+
382370
| Command | Description |
383371
|---------|-------------|
384-
| `rvl migrate helper` | Show supported changes and usage tips |
385372
| `rvl migrate list` | List all indexes |
386373
| `rvl migrate wizard` | Build a migration interactively |
387374
| `rvl migrate plan` | Generate a migration plan |
388375
| `rvl migrate apply` | Execute a migration |
389376
| `rvl migrate validate` | Verify a migration result |
390377

### Batch Commands

| Command | Description |
|---------|-------------|
| `rvl migrate batch-plan` | Create a batch migration plan |
| `rvl migrate batch-apply` | Execute a batch migration |
| `rvl migrate batch-resume` | Resume an interrupted batch |
| `rvl migrate batch-status` | Check batch progress |

391387
**Common flags:**
392388
- `--url` : Redis connection URL
393389
- `--index` : Index name to migrate
@@ -397,6 +393,16 @@ rvl migrate validate \
397393
- `--report-out` : Path for validation report
398394
- `--benchmark-out` : Path for performance metrics
399395

**Batch-specific flags:**
- `--pattern` : Glob pattern to match index names (e.g., `*_idx`)
- `--indexes` : Explicit list of index names
- `--indexes-file` : File containing index names (one per line)
- `--schema-patch` : Path to shared schema patch YAML
- `--state` : Path to checkpoint state file
- `--failure-policy` : `fail_fast` or `continue_on_error`
- `--accept-data-loss` : Required for quantization (lossy changes)
- `--retry-failed` : Retry previously failed indexes on resume
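
The `--accept-data-loss` flag exists because float32 → float16 quantization is lossy. The sketch below illustrates that loss with Python's standard `struct` module, which supports IEEE 754 half precision via the `e` format. It is only an illustration of the number formats: the helper names are hypothetical, and it makes no claim about how the migrator itself converts vectors.

```python
import struct


def roundtrip_float32(value):
    """Encode a Python float as IEEE 754 single precision and decode it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def roundtrip_float16(value):
    """Encode a Python float as IEEE 754 half precision and decode it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Start from the value a float32 vector component would actually hold.
original = roundtrip_float32(0.1)
quantized = roundtrip_float16(original)

print(f"float32: {original!r}  float16: {quantized!r}")
print(f"bytes per component: {struct.calcsize('<f')} -> {struct.calcsize('<e')}")
```

Each component shrinks from 4 bytes to 2, at the cost of a small rounding error that cannot be recovered after the migration.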
405+
400406
## Troubleshooting
401407

402408
### Migration blocked: "unsupported change"
@@ -467,6 +473,258 @@ async def migrate():
467473
asyncio.run(migrate())
468474
```
469475

## Batch Migration

When you need to apply the same schema change to multiple indexes, use batch migration. This is common for:

- Quantizing all indexes from float32 → float16
- Standardizing vector algorithms across indexes
- Coordinated migrations during maintenance windows

### Quick Start: Batch Migration

```bash
# 1. Create a shared patch (applies to any index with an 'embedding' field)
cat > quantize_patch.yaml << 'EOF'
version: 1
changes:
  update_fields:
    - name: embedding
      attrs:
        datatype: float16
EOF

# 2. Create a batch plan for all indexes matching a pattern
rvl migrate batch-plan \
  --pattern "*_idx" \
  --schema-patch quantize_patch.yaml \
  --output batch_plan.yaml \
  --url redis://localhost:6379

# 3. Apply the batch plan
rvl migrate batch-apply \
  --plan batch_plan.yaml \
  --allow-downtime \
  --accept-data-loss \
  --url redis://localhost:6379

# 4. Check status
rvl migrate batch-status --state batch_state.yaml
```

### Batch Plan Options

**Select indexes by pattern:**
```bash
rvl migrate batch-plan \
  --pattern "*_idx" \
  --schema-patch quantize_patch.yaml \
  --output batch_plan.yaml \
  --url redis://localhost:6379
```
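
To preview which names a pattern such as `*_idx` would select, the following sketch uses Python's standard `fnmatch` module. It assumes `--pattern` follows shell-style glob rules, which this guide does not spell out. `select_indexes` and the `idx_archive` name are hypothetical and exist only for the illustration.

```python
from fnmatch import fnmatchcase


def select_indexes(names, pattern):
    """Return the names matching a shell-style glob pattern, preserving order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Index names from the `rvl migrate list` example, plus one that should not match.
candidates = ["products_idx", "users_idx", "orders_idx", "idx_archive"]
selected = select_indexes(candidates, "*_idx")
print(selected)
```

The generated batch plan remains the authoritative record of what the CLI actually matched.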
525+
526+
**Select indexes by explicit list:**
527+
```bash
528+
rvl migrate batch-plan \
529+
--indexes products_idx users_idx orders_idx \
530+
--schema-patch quantize_patch.yaml \
531+
--output batch_plan.yaml \
532+
--url redis://localhost:6379
533+
```
534+
535+
**Select indexes from a file (for 100+ indexes):**
536+
```bash
537+
# Create index list file
538+
echo -e "products_idx\nusers_idx\norders_idx" > indexes.txt
539+
540+
rvl migrate batch-plan \
541+
--indexes-file indexes.txt \
542+
--schema-patch quantize_patch.yaml \
543+
--output batch_plan.yaml \
544+
--url redis://localhost:6379
545+
```

### Batch Plan Review

The generated `batch_plan.yaml` shows which indexes will be migrated:

```yaml
version: 1
batch_id: "batch_20260320_100000"
mode: drop_recreate
failure_policy: fail_fast
requires_quantization: true

shared_patch:
  version: 1
  changes:
    update_fields:
      - name: embedding
        attrs:
          datatype: float16

indexes:
  - name: products_idx
    applicable: true
    skip_reason: null
  - name: users_idx
    applicable: true
    skip_reason: null
  - name: legacy_idx
    applicable: false
    skip_reason: "Field 'embedding' not found"

created_at: "2026-03-20T10:00:00Z"
```

**Key fields:**
- `applicable: true` means the patch applies to this index
- `skip_reason` explains why an index will be skipped
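
As a review aid, the sketch below splits plan entries into those that will be migrated and those that will be skipped. It assumes the plan has already been loaded into a plain dictionary (for example with a YAML parser); a literal mirroring the example is used here so the snippet needs only the standard library. `partition_indexes` is a hypothetical helper, not part of the tool.

```python
def partition_indexes(plan):
    """Split plan entries into names to migrate and (name, reason) pairs to skip."""
    to_migrate, skipped = [], []
    for entry in plan["indexes"]:
        if entry["applicable"]:
            to_migrate.append(entry["name"])
        else:
            skipped.append((entry["name"], entry["skip_reason"]))
    return to_migrate, skipped


# Mirrors the `indexes` section of the example batch_plan.yaml.
plan = {
    "indexes": [
        {"name": "products_idx", "applicable": True, "skip_reason": None},
        {"name": "users_idx", "applicable": True, "skip_reason": None},
        {"name": "legacy_idx", "applicable": False,
         "skip_reason": "Field 'embedding' not found"},
    ]
}

to_migrate, skipped = partition_indexes(plan)
print("Will migrate:", to_migrate)
for name, reason in skipped:
    print(f"Skipping {name}: {reason}")
```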

### Applying a Batch Plan

```bash
# Apply with fail-fast (default: stop on first error)
rvl migrate batch-apply \
  --plan batch_plan.yaml \
  --allow-downtime \
  --accept-data-loss \
  --url redis://localhost:6379

# Apply with continue-on-error (process all possible indexes)
rvl migrate batch-apply \
  --plan batch_plan.yaml \
  --allow-downtime \
  --accept-data-loss \
  --failure-policy continue_on_error \
  --url redis://localhost:6379
```

**Flags:**
- `--allow-downtime` : Required (each index is temporarily unavailable during migration)
- `--accept-data-loss` : Required when quantizing vectors (float32 → float16 is lossy)
- `--failure-policy` : `fail_fast` (default) or `continue_on_error`
- `--state` : Path to checkpoint file (default: `batch_state.yaml`)
- `--report-dir` : Directory for per-index reports (default: `./reports/`)
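
The difference between the two failure policies can be sketched as a simple loop. This is a minimal illustration of the behavior described above, not the tool's implementation: `run_batch` and `fake_migrate` are hypothetical names, and the assumption is that `fail_fast` leaves the indexes after the first failure unprocessed.

```python
def run_batch(index_names, migrate_one, failure_policy="fail_fast"):
    """Migrate indexes in order and return a {name: outcome} mapping."""
    if failure_policy not in ("fail_fast", "continue_on_error"):
        raise ValueError(f"unknown failure policy: {failure_policy}")
    results = {}
    for name in index_names:
        try:
            migrate_one(name)
            results[name] = "succeeded"
        except Exception as exc:
            results[name] = f"failed: {exc}"
            if failure_policy == "fail_fast":
                break  # later indexes are left untouched
    return results


def fake_migrate(name):
    """Stand-in for a real migration; fails for one index to show the policies."""
    if name == "users_idx":
        raise RuntimeError("Redis connection timeout")


names = ["products_idx", "users_idx", "orders_idx"]
fail_fast_results = run_batch(names, fake_migrate)
continue_results = run_batch(names, fake_migrate, "continue_on_error")
print(fail_fast_results)
print(continue_results)
```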

### Resume After Failure

Batch migration checkpoints progress automatically. If a run is interrupted, resume it from the state file:

```bash
# Resume from where it left off
rvl migrate batch-resume \
  --state batch_state.yaml \
  --allow-downtime \
  --url redis://localhost:6379

# Retry previously failed indexes
rvl migrate batch-resume \
  --state batch_state.yaml \
  --retry-failed \
  --allow-downtime \
  --url redis://localhost:6379
```
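
Conceptually, resuming means picking up the indexes that have not finished, and `--retry-failed` additionally re-queues the ones that failed. The sketch below models that choice. The state layout (`completed`, `failed`, `remaining` keys) is an assumption made for illustration and may not match the real `batch_state.yaml`; `indexes_to_process` is a hypothetical helper.

```python
def indexes_to_process(state, retry_failed=False):
    """Return the index names a resumed run would handle, in order."""
    pending = list(state["remaining"])
    if retry_failed:
        # Failed indexes are queued ahead of the ones never attempted.
        pending = list(state["failed"]) + pending
    return pending


# Hypothetical checkpoint contents after an interrupted run.
state = {
    "completed": ["products_idx"],
    "failed": ["users_idx"],
    "remaining": ["inventory_idx", "analytics_idx"],
}

print(indexes_to_process(state))
print(indexes_to_process(state, retry_failed=True))
```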

### Checking Batch Status

```bash
rvl migrate batch-status --state batch_state.yaml
```

**Example output:**
```
Batch Migration Status
======================
Batch ID: batch_20260320_100000
Started: 2026-03-20T10:00:00Z
Updated: 2026-03-20T10:25:00Z

Completed: 2
  - products_idx: succeeded (10:02:30)
  - users_idx: failed - Redis connection timeout (10:05:45)

In Progress: inventory_idx
Remaining: 1 (analytics_idx)
```

### Batch Report

After completion, a `batch_report.yaml` is generated:

```yaml
version: 1
batch_id: "batch_20260320_100000"
status: completed  # or partial_failure, failed
summary:
  total_indexes: 3
  successful: 3
  failed: 0
  skipped: 0
  total_duration_seconds: 127.5
indexes:
  - name: products_idx
    status: succeeded
    duration_seconds: 45.2
    docs_migrated: 15000
    report_path: ./reports/products_idx_report.yaml
  - name: users_idx
    status: succeeded
    duration_seconds: 38.1
    docs_migrated: 8500
  - name: orders_idx
    status: succeeded
    duration_seconds: 44.2
    docs_migrated: 22000
completed_at: "2026-03-20T10:02:07Z"
```
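
A report like this can be sanity-checked after a run. The sketch below assumes the report has been loaded into a dictionary with the layout shown above, that `total_indexes` equals `successful + failed + skipped`, and that `total_duration_seconds` is the sum of the per-index durations. Both relationships hold in the example but are assumptions about the format, and `check_report` is a hypothetical helper.

```python
import math


def check_report(report):
    """Return True if the summary agrees with the per-index entries."""
    summary = report["summary"]
    statuses = [entry["status"] for entry in report["indexes"]]
    counts_ok = (
        summary["successful"] == statuses.count("succeeded")
        and summary["total_indexes"]
        == summary["successful"] + summary["failed"] + summary["skipped"]
    )
    total = sum(entry["duration_seconds"] for entry in report["indexes"])
    # Compare floats with a tolerance instead of exact equality.
    duration_ok = math.isclose(
        total, summary["total_duration_seconds"], abs_tol=1e-6
    )
    return counts_ok and duration_ok


report = {
    "summary": {"total_indexes": 3, "successful": 3, "failed": 0,
                "skipped": 0, "total_duration_seconds": 127.5},
    "indexes": [
        {"name": "products_idx", "status": "succeeded", "duration_seconds": 45.2},
        {"name": "users_idx", "status": "succeeded", "duration_seconds": 38.1},
        {"name": "orders_idx", "status": "succeeded", "duration_seconds": 44.2},
    ],
}

print("Report consistent:", check_report(report))
```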

### Python API for Batch Migration

```python
from redisvl.migration import BatchMigrationPlanner, BatchMigrationExecutor

# Create batch plan
planner = BatchMigrationPlanner()
batch_plan = planner.create_plan(
    redis_url="redis://localhost:6379",
    pattern="*_idx",
    schema_patch_path="quantize_patch.yaml",
)

# Review applicability
for idx in batch_plan.indexes:
    if idx.applicable:
        print(f"Will migrate: {idx.name}")
    else:
        print(f"Skipping {idx.name}: {idx.skip_reason}")

# Execute batch
executor = BatchMigrationExecutor()
report = executor.apply(
    batch_plan,
    redis_url="redis://localhost:6379",
    state_path="batch_state.yaml",
    report_dir="./reports/",
    progress_callback=lambda name, pos, total, status: print(f"[{pos}/{total}] {name}: {status}"),
)

print(f"Batch status: {report.status}")
print(f"Successful: {report.summary.successful}/{report.summary.total_indexes}")
```

### Batch Migration Tips

1. **Test on a single index first**: Run a single-index migration to verify the patch works before applying to a batch.

2. **Use `continue_on_error` for large batches**: This ensures one failure doesn't block all remaining indexes.

3. **Schedule during low-traffic periods**: Each index has downtime during migration.

4. **Review skipped indexes**: The `skip_reason` often indicates schema differences that need attention.

5. **Keep checkpoint files**: The `batch_state.yaml` is essential for resume. Don't delete it until the batch completes successfully.

470728
## Learn more
471729

472730
- {doc}`/concepts/index-migrations`: How migrations work and which changes are supported
