diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 165 additions & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 22 additions & 1 deletion
diff --git a/nitin_docs/index_migrator/00_index.md b/nitin_docs/index_migrator/00_index.md
Lines changed: 71 additions & 0 deletions
diff --git a/nitin_docs/index_migrator/01_context.md b/nitin_docs/index_migrator/01_context.md
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,165 @@
+# AGENTS.md - RedisVL Project Context
+
+## Frequently Used Commands
+
+```bash
+# Development workflow
+make install          # Install dependencies
+make format           # Format code (black + isort)
+make check-types      # Run mypy type checking
+make lint             # Run all linting (format + types)
+make test             # Run tests (no external APIs)
+make test-all         # Run all tests (includes API tests)
+make check            # Full check (lint + test)
+
+# Redis setup
+make redis-start      # Start Redis container
+make redis-stop       # Stop Redis container
+
+# Documentation
+make docs-build       # Build documentation
+make docs-serve       # Serve docs locally
+```
+
+Pre-commit hooks are also configured; run them before you commit:
+```bash
+pre-commit run --all-files
+```
+
+## Important Architectural Patterns
+
+### Async/Sync Dual Interfaces
+- Most core classes have both sync and async versions (e.g., `SearchIndex` / `AsyncSearchIndex`)
+- Follow existing patterns when adding new functionality
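+
+A minimal, hypothetical sketch of one way to pair the two versions: I/O-free logic lives in a shared base class, and only the methods that talk to Redis differ. The `Widget` names are illustrative, not RedisVL APIs; check how `SearchIndex` and `AsyncSearchIndex` are actually structured before copying this shape.
+
+```python
+import asyncio
+from typing import Any, Dict
+
+
+class BaseWidget:
+    """Hypothetical shared base: logic with no I/O, reused by both variants."""
+
+    def __init__(self, name: str) -> None:
+        self.name = name
+
+    def _build_payload(self, value: int) -> Dict[str, Any]:
+        # Pure helper, so the sync and async classes can share it unchanged
+        return {"name": self.name, "value": value}
+
+
+class Widget(BaseWidget):
+    def save(self, value: int) -> Dict[str, Any]:
+        payload = self._build_payload(value)
+        # A real sync class would call a blocking Redis client here
+        return payload
+
+
+class AsyncWidget(BaseWidget):
+    async def save(self, value: int) -> Dict[str, Any]:
+        payload = self._build_payload(value)
+        # A real async class would await an async Redis client here
+        await asyncio.sleep(0)
+        return payload
+```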
+
+### Schema-Driven Design
+```python
+from redisvl.index import SearchIndex
+from redisvl.schema import IndexSchema
+
+# Index schemas define structure
+schema = IndexSchema.from_yaml("schema.yaml")
+index = SearchIndex(schema, redis_url="redis://localhost:6379")
+```
+
+## Critical Rules
+
+### Do Not Modify
+- **CRITICAL**: Do not change this line unless explicitly asked:
+  ```python
+  token.strip().strip(",").replace("“", "").replace("”", "").lower()
+  ```
+
+### Git Operations
+**CRITICAL**: NEVER use `git push` or attempt to push to remote repositories. The user will handle all git push operations.
+
+### Branch and Commit Policy
+**IMPORTANT**: Use conventional branch names and conventional commits.
+
+Branch naming:
+- Human-created branches should use `<type>/<short-kebab-description>`
+- Automation-created branches may use `codex/<type>/<short-kebab-description>`
+- Preferred branch types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
+- Examples:
+  - `feat/index-migrator`
+  - `fix/async-sentinel-pool`
+  - `docs/index-migrator-benchmarking`
+  - `codex/feat/index-migrator`
+
+Commit messages:
+- Use Conventional Commits: `<type>(optional-scope): <summary>`
+- Preferred commit types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
+- Examples:
+  - `feat(migrate): add drop recreate planning docs`
+  - `docs(index-migrator): add benchmarking guidance`
+  - `fix(cli): validate migrate plan inputs`
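+
+A hypothetical helper, not part of the RedisVL tooling, that checks names against the rules above with regular expressions. It covers only the forms listed here; for example, it does not accept the `!` breaking-change marker from the full Conventional Commits specification.
+
+```python
+import re
+
+# Types listed in the branch and commit policy above
+TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "build", "ci")
+_TYPE = "(?:" + "|".join(TYPES) + ")"
+
+# <type>/<short-kebab-description>, optionally prefixed with codex/ for automation
+BRANCH_RE = re.compile(rf"(?:codex/)?{_TYPE}/[a-z0-9]+(?:-[a-z0-9]+)*")
+
+# <type>(optional-scope): <summary>
+COMMIT_RE = re.compile(rf"{_TYPE}(?:\([a-z0-9-]+\))?: \S.*")
+
+
+def is_valid_branch(name: str) -> bool:
+    return BRANCH_RE.fullmatch(name) is not None
+
+
+def is_valid_commit_subject(subject: str) -> bool:
+    return COMMIT_RE.fullmatch(subject) is not None
+```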
+
+### Code Quality
+**IMPORTANT**: Always run `make format` before committing code so it meets the project's formatting standards (black + isort).
+
+### README.md Maintenance
+**IMPORTANT**: DO NOT modify README.md unless explicitly requested.
+
+**If you need to document something, use these alternatives:**
+- Development info → CONTRIBUTING.md
+- API details → docs/ directory
+- Examples → docs/examples/
+- Project memory (explicit preferences, directives, etc.) → AGENTS.md
+
+## Code Style Preferences
+
+### Import Organization
+- **Prefer module-level imports** by default for clarity and standard Python conventions
+- **Use local/inline imports only when necessary** for specific reasons:
+  - Avoiding circular import dependencies
+  - Improving startup time for heavy/optional dependencies
+  - Lazy loading for performance-critical paths
+- When using local imports, add a brief comment explaining why (e.g., `# Local import to avoid circular dependency`)
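+
+A small sketch of both styles side by side. `hypothetical_heavy_sdk` is a placeholder for an optional, heavy dependency, not a real package; the pattern of re-raising `ImportError` with an install hint is one common choice, not a documented RedisVL convention.
+
+```python
+import json  # Module-level import: the default for stdlib and required dependencies
+
+
+def load_heavy_backend():
+    # Local import to avoid loading an optional, heavy dependency at startup.
+    # "hypothetical_heavy_sdk" is a placeholder name, not a real package.
+    try:
+        import hypothetical_heavy_sdk
+    except ImportError as exc:
+        raise ImportError(
+            "hypothetical_heavy_sdk is required for this feature; "
+            "install it to enable the backend"
+        ) from exc
+    return hypothetical_heavy_sdk
+
+
+def dump_config(config: dict) -> str:
+    # Uses the module-level import; no local import needed
+    return json.dumps(config, sort_keys=True)
+```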
+
+### Comments and Output
+- **No emojis in code comments or print statements**
+- Keep comments professional and focused on technical clarity
+- Use emojis sparingly, and only in user-facing documentation (markdown files), never in Python code
+
+### General Guidelines
+- Follow existing patterns in the RedisVL codebase
+- Maintain consistency with the project's established conventions
+- Run `make format` before committing to ensure code quality standards
+
+## Testing Notes
+RedisVL uses `pytest` with `testcontainers` for testing.
+
+- `make test` - unit tests only (no external APIs)
+- `make test-all` - run the full suite, including tests that call external APIs
+- `pytest --run-api-tests` - explicitly run API-dependent tests (e.g., LangCache,
+  external vectorizer/reranker providers). These require the appropriate API
+  keys and environment variables to be set.
+
+## Project Structure
+
+```
+redisvl/
+├── cli/              # Command-line interface (rvl command)
+├── extensions/       # AI extensions (cache, memory, routing)
+│   ├── cache/        # Semantic caching for LLMs
+│   ├── llmcache/     # LLM-specific caching
+│   ├── message_history/  # Chat history management
+│   ├── router/       # Semantic routing
+│   └── session_manager/  # Session management
+├── index/            # SearchIndex classes (sync/async)
+├── query/            # Query builders (Vector, Range, Filter, Count)
+├── redis/            # Redis client utilities
+├── schema/           # Index schema definitions
+└── utils/            # Utilities (vectorizers, rerankers, optimization)
+    ├── rerank/       # Result reranking
+    └── vectorize/    # Embedding providers integration
+```
+
+## Core Components
+
+### 1. Index Management
+- `SearchIndex` / `AsyncSearchIndex` - Main interface for Redis vector indices
+- `IndexSchema` - Define index structure with fields (text, tags, vectors, etc.)
+- Support for JSON and Hash storage types
+
+### 2. Query System
+- `VectorQuery` - Semantic similarity search
+- `RangeQuery` - Vector search within distance range
+- `FilterQuery` - Metadata filtering and full-text search
+- `CountQuery` - Count matching records
+- Other query types are available in `redisvl/query/`
+
+### 3. AI Extensions
+- `SemanticCache` - LLM response caching with semantic similarity
+- `EmbeddingsCache` - Cache for vector embeddings
+- `MessageHistory` - Chat history with recency/relevancy retrieval
+- `SemanticRouter` - Route queries to topics/intents
+
+### 4. Vectorizers (Optional Dependencies)
+- OpenAI, Azure OpenAI, Cohere, HuggingFace, Mistral, VoyageAI
+- Custom vectorizer support
+- Batch processing capabilities
+
+## Documentation
+- Main docs: https://docs.redisvl.com
+- Built with Sphinx from `docs/` directory
+- Includes API reference and user guides
+- Example notebooks live in `docs/user_guide/...`
@@ -251,12 +251,33 @@ Before suggesting a new feature:
 
 ## Pull Request Process
 
-1. **Fork and create a branch**: Create a descriptive branch name (e.g., `fix-search-bug` or `add-vector-similarity`)
+1. **Fork and create a branch**: Use a conventional branch name such as `feat/index-migrator`, `fix/search-bug`, or `docs/vectorizer-guide`
 2. **Make your changes**: Follow our coding standards and include tests
 3. **Test thoroughly**: Ensure your changes work and don't break existing functionality
 4. **Update documentation**: Add or update documentation as needed
 5. **Submit your PR**: Include a clear description of what your changes do
 
+### Branch Naming and Commit Messages
+
+We use conventional branch names and Conventional Commits to keep history easy to scan and automate.
+
+Branch naming:
+
+- Use `<type>/<short-kebab-description>`
+- Recommended types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
+- Examples:
+  - `feat/index-migrator`
+  - `fix/async-sentinel-pool`
+  - `docs/migration-benchmarking`
+
+Commit messages:
+
+- Use `<type>(optional-scope): <summary>`
+- Examples:
+  - `feat(migrate): add drop recreate plan generation`
+  - `docs(index-migrator): add benchmark guidance`
+  - `fix(cli): reject unsupported migration diffs`
+
 ### Review Process
 
 - The core team reviews Pull Requests regularly
 
@@ -0,0 +1,71 @@
+# Index Migrator Workspace
+
+## Overview
+
+This directory is the sole source of truth for RedisVL index migration planning.
+
+No implementation should start unless the corresponding task exists in a `*_tasks.md` file in this directory.
+
+This workspace is organized around two phases:
+
+- Phase 1 MVP: `drop_recreate`
+- Phase 2: `iterative_shadow`
+
+The overall initiative covers both simple schema-only rebuilds and harder migrations that change vector dimensions, datatypes, precision, algorithms, or payload shape. Those advanced cases are deliberately scheduled after the MVP; they remain in scope for the product.
+
+The planning goal is to make handoff simple. Another engineer or process should be able to open this directory, read the active spec and task list, and start implementation without needing to rediscover product decisions.
+
+## Guiding Principles
+
+- Prefer simple and safe over clever orchestration.
+- Reuse existing RedisVL primitives before adding new abstractions.
+- Migrate one index at a time.
+- Keep cutover and platform scaling operator-owned.
+- Fail closed on unsupported schema changes.
+- Treat documentation artifacts as implementation inputs, not as narrative background.
+
+## Phase Status
+
+| Phase | Mode | Status | Implementation Target |
+| --- | --- | --- | --- |
+| Phase 1 | `drop_recreate` | Ready | Yes |
+| Phase 2 | `iterative_shadow` | Planned | No |
+
+## Doc Map
+
+- [01_context.md](./01_context.md): customer problem, constraints, and why the work is phased
+- [02_architecture.md](./02_architecture.md): shared architecture, responsibilities, capacity model, and diagrams
+- [03_benchmarking.md](./03_benchmarking.md): migration benchmarking goals, metrics, scenarios, and output artifacts
+- [90_prd.md](./90_prd.md): final product requirements document for team review
+- [10_v1_drop_recreate_spec.md](./10_v1_drop_recreate_spec.md): decision-complete MVP spec
+- [11_v1_drop_recreate_tasks.md](./11_v1_drop_recreate_tasks.md): implementable MVP task list
+- [12_v1_drop_recreate_tests.md](./12_v1_drop_recreate_tests.md): MVP test plan
+- [20_v2_iterative_shadow_spec.md](./20_v2_iterative_shadow_spec.md): future iterative shadow spec
+- [21_v2_iterative_shadow_tasks.md](./21_v2_iterative_shadow_tasks.md): future iterative shadow tasks
+- [22_v2_iterative_shadow_tests.md](./22_v2_iterative_shadow_tests.md): future iterative shadow test plan
+
+## Current Truth
+
+The active implementation target is Phase 1.
+
+- Spec: [10_v1_drop_recreate_spec.md](./10_v1_drop_recreate_spec.md)
+- Tasks: [11_v1_drop_recreate_tasks.md](./11_v1_drop_recreate_tasks.md)
+- Tests: [12_v1_drop_recreate_tests.md](./12_v1_drop_recreate_tests.md)
+
+## Next Actions
+
+- `V1-T01`
+- `V1-T02`
+- `V1-T03`
+
+## Locked Decisions
+
+- The planning workspace lives entirely under `nitin_docs/index_migrator/`.
+- The top-level migration notes have been removed to avoid competing sources of truth.
+- Phase 1 is in implementation scope, backed by the spec, task, and test documents in this directory.
+- Phase 2 stays planned until Phase 1 is implemented and learnings are folded back into this directory.
+- The default artifact format for plans and reports is YAML.
+- Benchmarking is required for migration duration, query impact, and resource-impact planning, but it should be implemented with simple structured outputs rather than a separate benchmarking framework.
+- The default execution unit is a single index.
+- The default operational model is operator-owned downtime, cutover, and scaling.
+- Phase 2 owns advanced vector and payload-shape migrations, including datatype, precision, dimension, and algorithm changes.
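+
+A minimal sketch of the "simple structured outputs" idea, assuming per-step wall-clock timings are one of the recorded metrics: a context manager that writes each step's duration into a plain dict. `timed_step` and the step names are hypothetical; the actual report fields belong in the benchmarking doc.
+
+```python
+import time
+from contextlib import contextmanager
+from typing import Dict, Iterator
+
+
+@contextmanager
+def timed_step(report: Dict[str, float], step: str) -> Iterator[None]:
+    """Record the wall-clock duration of one migration step, in seconds."""
+    start = time.perf_counter()
+    try:
+        yield
+    finally:
+        # Record even when the step raises, so failed runs still leave timings
+        report[step] = time.perf_counter() - start
+
+
+# Usage with stand-in steps; real steps would drop, recreate, and validate
+report: Dict[str, float] = {}
+with timed_step(report, "drop_index"):
+    time.sleep(0.01)
+with timed_step(report, "wait_for_indexing"):
+    time.sleep(0.01)
+```
+
+The resulting dict could then be written to the YAML report with PyYAML's `yaml.safe_dump(report)`.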
@@ -0,0 +1,100 @@
+# Index Migrator Context
+
+## Problem Statement
+
+RedisVL does not currently provide a first-class migration workflow for search index changes.
+
+Today, teams can create indexes, delete indexes, inspect index info, and load documents, but they still need ad hoc scripts and operational runbooks to handle schema evolution. This becomes risky when the index is large, shared by multiple applications, or deployed on clustered Redis Cloud or Redis Software.
+
+The migration problem has three different shapes:
+
+- A simple index rebuild that preserves existing documents and recreates the index definition in place.
+- A shadow migration over the same documents when the target schema can still be built from the current stored payload.
+- A shadow migration with transform or backfill when vector dimensions, datatypes, precision, algorithms, or payload shape change and a new target payload must be built.
+
+This workspace deliberately splits those shapes into phases instead of trying to solve everything in one design. Phase 1 proves the plan-first migration workflow. Phase 2 exists to take on the harder vector and payload-shape migrations safely.
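+
+A hypothetical sketch of a schema diff classifier, assuming schemas in the dict shape produced by `IndexSchema.to_dict()` (an `index` section plus a `fields` list with per-field `attrs`) with defaults filled in on both sides. The rules are assumptions drawn from the phase split above: changes to vector `dims`, `datatype`, or `algorithm` go to Phase 2, index-definition-only changes go to `drop_recreate`, and changes that would move payloads or turn a field into or out of a vector fail closed. The real spec may draw these lines differently.
+
+```python
+from typing import Any, Dict
+
+# Assumed rule: changing these vector attributes needs Phase 2 (iterative_shadow)
+PHASE2_VECTOR_ATTRS = ("dims", "datatype", "algorithm")
+
+# Assumed rule: these index settings would relocate or reshape payloads, so fail closed
+UNSUPPORTED_INDEX_KEYS = ("prefix", "storage_type")
+
+
+class UnsupportedMigration(ValueError):
+    """Raised when a schema diff falls outside both phases."""
+
+
+def _fields_by_name(schema: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
+    return {field["name"]: field for field in schema.get("fields", [])}
+
+
+def classify(source: Dict[str, Any], target: Dict[str, Any]) -> str:
+    """Return "drop_recreate" or "iterative_shadow", or raise to fail closed."""
+    src_index, tgt_index = source.get("index", {}), target.get("index", {})
+    for key in UNSUPPORTED_INDEX_KEYS:
+        if src_index.get(key) != tgt_index.get(key):
+            raise UnsupportedMigration(f"changing index {key!r} is not supported")
+
+    src_fields, tgt_fields = _fields_by_name(source), _fields_by_name(target)
+    needs_shadow = False
+    for name in sorted(src_fields.keys() & tgt_fields.keys()):
+        types = {src_fields[name]["type"], tgt_fields[name]["type"]}
+        if "vector" in types and len(types) > 1:
+            raise UnsupportedMigration(f"field {name!r} changes to or from a vector type")
+        if types == {"vector"}:
+            src_attrs = src_fields[name].get("attrs", {})
+            tgt_attrs = tgt_fields[name].get("attrs", {})
+            if any(src_attrs.get(a) != tgt_attrs.get(a) for a in PHASE2_VECTOR_ATTRS):
+                needs_shadow = True
+    # Added, removed, or non-vector field changes only touch the index definition
+    return "iterative_shadow" if needs_shadow else "drop_recreate"
+```
+
+Checking every shared field before returning keeps the classifier fail-closed: an unsupported change is reported even when another field already requires Phase 2.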
+
+## Customer Requirements
+
+The planning baseline for this work is:
+
+- preserve existing documents during migration
+- capture the previous index configuration before making changes
+- apply only the requested schema changes
+- preview the migration plan before execution
+- support advanced vector migrations such as `HNSW -> FLAT`, `FP32 -> FP16`, vector dimension changes, and payload-shape-changing model or algorithm swaps
+- estimate migration timing, memory impact, and operational impact using simple benchmark artifacts
+- benchmark source-versus-target memory and size changes, including peak overlap footprint during shadow migrations
+- support both guided and scripted workflows
+- make downtime and disruption explicit
+- support large datasets without defaulting to full-keyspace audits or fleet-wide orchestration
+- keep the implementation understandable enough that another team can operate it safely
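+
+A back-of-the-envelope sketch for the memory requirements above. It counts only raw vector bytes and assumes payloads are fully duplicated during a shadow overlap; it ignores index structures such as the HNSW graph, other fields, and per-key overhead, which is why measured benchmark artifacts are still required. The helper names are hypothetical.
+
+```python
+# Bytes per element for vector datatypes (raw payload only)
+BYTES_PER_ELEMENT = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2}
+
+
+def raw_vector_bytes(num_docs: int, dims: int, datatype: str) -> int:
+    """Raw stored vector size: documents x dimensions x bytes per element."""
+    return num_docs * dims * BYTES_PER_ELEMENT[datatype]
+
+
+def peak_overlap_bytes(
+    num_docs: int, src_dims: int, src_dtype: str, tgt_dims: int, tgt_dtype: str
+) -> int:
+    """Raw vector bytes while source and target payloads coexist."""
+    source = raw_vector_bytes(num_docs, src_dims, src_dtype)
+    target = raw_vector_bytes(num_docs, tgt_dims, tgt_dtype)
+    return source + target
+
+
+# FP32 -> FP16 for 1M documents with 768-dim vectors
+before = raw_vector_bytes(1_000_000, 768, "float32")  # 3,072,000,000 bytes
+after = raw_vector_bytes(1_000_000, 768, "float16")  # 1,536,000,000 bytes
+peak = peak_overlap_bytes(1_000_000, 768, "float32", 768, "float16")  # 4,608,000,000 bytes
+```
+
+In this example, the conversion halves raw vector storage from about 3.07 GB to 1.54 GB, but a shadow migration that keeps both copies peaks near 4.61 GB.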
+
+## Current RedisVL Capabilities
+
+RedisVL already has useful primitives that should be reused instead of replaced:
+
+- `SearchIndex.from_existing()` can reconstruct schema from a live index.
+- `SearchIndex.delete(drop=False)` can remove the index structure without deleting documents.
+- `SearchIndex.info()` can retrieve index stats used for planning and validation.
+- Existing CLI commands already establish the connection and index lookup patterns the migrator can follow.
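+
+A hypothetical sketch of how the Phase 1 `drop_recreate` sequence could compose these primitives. `IndexLike` and `drop_recreate` are illustrative names, not RedisVL APIs, and the sketch assumes the target schema keeps the same key prefix so `create()` re-indexes the existing documents. It is typed against a small protocol so it runs without Redis; in real use `source` might come from `SearchIndex.from_existing(...)` and `target` from `SearchIndex(target_schema, ...)`.
+
+```python
+from typing import Any, Dict, Protocol
+
+
+class IndexLike(Protocol):
+    """The subset of SearchIndex methods this sketch relies on."""
+
+    def info(self) -> Dict[str, Any]: ...
+
+    def delete(self, drop: bool = True) -> None: ...
+
+    def create(self) -> None: ...
+
+
+def drop_recreate(
+    source: IndexLike, target: IndexLike, acknowledge_downtime: bool
+) -> Dict[str, Any]:
+    """Hypothetical Phase 1 flow: keep the documents, replace only the index definition."""
+    if not acknowledge_downtime:
+        # Fail closed: the operator must accept the downtime explicitly
+        raise RuntimeError("drop_recreate requires acknowledging downtime")
+    before = source.info()  # capture stats for later validation
+    source.delete(drop=False)  # remove the index structure, keep the documents
+    target.create()  # build the target index over the same keys
+    return {"num_docs_before": before.get("num_docs")}
+```
+
+Search is unavailable from the `delete` call until the recreated index finishes indexing, so a real tool would also wait for readiness and compare document counts before reporting success.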
+
+RedisVL does not yet have:
+
+- a migration planner
+- a schema diff classifier
+- a migration-specific CLI workflow
+- a guided schema migration wizard
+- structured migration reports
+- capacity-aware orchestration across indexes
+- transform or backfill planning for migrations that need new stored payloads
+
+## Why Phase 1 Comes First
+
+Phase 1 is intentionally narrow because it gives the team an MVP that is both useful and low-risk:
+
+- It preserves documents while changing only the index definition.
+- It reuses current RedisVL primitives instead of introducing a separate migration runtime.
+- It keeps operational ownership clear: RedisVL handles planning, execution, and validation for a single index, while the operator handles the migration window and downstream application expectations.
+- It avoids the hardest problems for now: target-payload generation, shadow overlap estimation, cutover automation, and cluster-wide scheduling.
+
+Phase 1 does not define the full migration goal. The harder vector and payload-shape changes are the reason Phase 2 exists.
+
+The MVP should prove the planning model, CLI shape, plan artifact, and validation/reporting flow before more advanced orchestration is attempted.
+
+## Downtime and Disruption
+
+Phase 1 accepts downtime for the migrated index.
+
+Engineers need to plan for the following impacts:
+
+- Search on the target index is unavailable between index drop and recreated index readiness.
+- Query results can be partial or unstable while the recreated index is still completing its initial indexing pass.
+- Reindexing uses shared database resources and can increase CPU, memory, and indexing pressure on the deployment.
+- Shadow migrations can temporarily duplicate index structures and sometimes duplicate payloads as well, increasing peak memory requirements.
+- Downstream applications need either a maintenance window, a degraded mode, or a clear operational pause during the rebuild.
+
+The tooling should not hide these facts. The plan artifact and CLI output must force the user to acknowledge downtime before applying a `drop_recreate` migration.
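+
+Because results are only complete after the recreated index finishes its initial indexing pass, a migration tool needs a readiness check before reporting success. A minimal sketch, assuming a callable such as `index.info` that returns FT.INFO-style stats where `percent_indexed` is a fraction that reaches 1 when indexing is complete; `wait_until_indexed` is a hypothetical helper, not a RedisVL API.
+
+```python
+import time
+from typing import Any, Callable, Dict
+
+
+def wait_until_indexed(
+    get_info: Callable[[], Dict[str, Any]],
+    timeout_s: float = 600.0,
+    poll_s: float = 1.0,
+) -> Dict[str, Any]:
+    """Poll index info until indexing completes, or raise TimeoutError."""
+    deadline = time.monotonic() + timeout_s
+    while True:
+        info = get_info()
+        # Assumes FT.INFO-style progress: a fraction (possibly a string) where 1 means done
+        if float(info.get("percent_indexed", 0)) >= 1.0:
+            return info
+        if time.monotonic() >= deadline:
+            raise TimeoutError("index did not finish its initial indexing pass in time")
+        time.sleep(poll_s)
+```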
+
+## Non-Goals
+
+The following are explicitly out of scope for Phase 1, not for the overall initiative:
+
+- a generic migration framework for every schema evolution case
+- automatic platform scaling
+- automatic traffic cutover
+- full key manifest capture by default
+- document transforms or backfills in the MVP execution path
+- payload relocation to a new keyspace in the MVP execution path
+- concurrent migration of multiple large indexes
+- fully managed Redis Cloud or Redis Software integration
+- automatic transform inference or automatic re-embedding
+
+The simplicity rules for this effort are:
+
+- use existing RedisVL index introspection and lifecycle primitives
+- do not design a generic migration framework for the MVP
+- do not automate platform scaling
+- do not automate traffic cutover
+- do not require full key manifests by default
+- require an explicit transform or backfill plan before Phase 2 handles payload-shape-changing migrations