Commit 8ab9d71

docs(index-migrator): add planning workspace and repo guidance
1 parent a2808c4 commit 8ab9d71

13 files changed

Lines changed: 2479 additions & 1 deletion

AGENTS.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
1+
# AGENTS.md - RedisVL Project Context
2+
3+
## Frequently Used Commands
4+
5+
```bash
# Development workflow
make install      # Install dependencies
make format       # Format code (black + isort)
make check-types  # Run mypy type checking
make lint         # Run all linting (format + types)
make test         # Run tests (no external APIs)
make test-all     # Run all tests (includes API tests)
make check        # Full check (lint + test)

# Redis setup
make redis-start  # Start Redis container
make redis-stop   # Stop Redis container

# Documentation
make docs-build   # Build documentation
make docs-serve   # Serve docs locally
```
23+
24+
Pre-commit hooks are also configured; run them before you commit:
26+
```bash
pre-commit run --all-files
```
29+
30+
## Important Architectural Patterns
31+
32+
### Async/Sync Dual Interfaces
33+
- Most core classes have both sync and async versions (e.g., `SearchIndex` / `AsyncSearchIndex`)
34+
- Follow existing patterns when adding new functionality
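One common way to keep paired interfaces consistent is to put the I/O-free logic in a shared base class and let the sync and async variants differ only in how they perform I/O. The sketch below uses hypothetical `BaseStore`, `Store`, and `AsyncStore` classes for illustration; it is not a description of how RedisVL structures `SearchIndex` and `AsyncSearchIndex` internally.

```python
import asyncio


class BaseStore:
    """Shared, I/O-free logic used by both interfaces (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _format(self, payload: str) -> str:
        # Pure logic lives here so both variants stay in agreement.
        return f"{self.name}:{payload}"


class Store(BaseStore):
    """Synchronous variant."""

    def describe(self, payload: str) -> str:
        return self._format(payload)


class AsyncStore(BaseStore):
    """Asynchronous variant with the same method names and arguments."""

    async def describe(self, payload: str) -> str:
        await asyncio.sleep(0)  # stand-in for awaiting real I/O
        return self._format(payload)


print(Store("idx").describe("docs"))
print(asyncio.run(AsyncStore("idx").describe("docs")))
```

Keeping method names and signatures identical across the pair means a new feature added to one class has an obvious counterpart to add in the other.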
35+
36+
### Schema-Driven Design
37+
```python
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema

# Index schemas define structure
schema = IndexSchema.from_yaml("schema.yaml")
index = SearchIndex(schema, redis_url="redis://localhost:6379")
```
42+
43+
## Critical Rules
44+
45+
### Do Not Modify
46+
- **CRITICAL**: Do not change this line unless explicitly asked:
47+
```python
token.strip().strip(",").replace("“", "").replace("”", "").lower()
```
50+
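To make the effect of the protected line concrete, the sketch below wraps the same chain of calls in a hypothetical helper, `normalize_token`. It assumes the two characters being replaced are the typographic double quotes U+201C and U+201D, written here as escapes; the helper name and the sample tokens are illustrative and not part of RedisVL.

```python
def normalize_token(token: str) -> str:
    """Hypothetical wrapper around the protected normalization chain."""
    # Assumption: the replaced characters are the curly quotes U+201C / U+201D.
    return (
        token.strip()               # drop surrounding whitespace
        .strip(",")                 # drop leading/trailing commas
        .replace("\u201c", "")      # remove left curly double quote
        .replace("\u201d", "")      # remove right curly double quote
        .lower()                    # case-fold for matching
    )


samples = ["  Redis, ", "\u201cVector\u201d", "Search"]
print([normalize_token(t) for t in samples])
```

Note that the order matters: whitespace is stripped before commas, so a token such as `"Redis ,"` would keep its inner space.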
51+
### Git Operations
52+
**CRITICAL**: NEVER use `git push` or attempt to push to remote repositories. The user will handle all git push operations.
53+
54+
### Branch and Commit Policy
55+
**IMPORTANT**: Use conventional branch names and conventional commits.
56+
57+
Branch naming:
58+
- Human-created branches should use `<type>/<short-kebab-description>`
59+
- Automation-created branches may use `codex/<type>/<short-kebab-description>`
60+
- Preferred branch types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
61+
- Examples:
62+
- `feat/index-migrator`
63+
- `fix/async-sentinel-pool`
64+
- `docs/index-migrator-benchmarking`
65+
- `codex/feat/index-migrator`
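The naming rules above can be checked mechanically. The following is a minimal sketch, assuming the preferred types are treated as the complete allowed set and that descriptions are lowercase kebab-case; `is_conventional_branch` is a hypothetical helper, not an existing project script.

```python
import re

BRANCH_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "build", "ci")

# Optional "codex/" prefix, a known type, then a lowercase kebab-case description.
BRANCH_RE = re.compile(
    r"(?:codex/)?(?:%s)/[a-z0-9]+(?:-[a-z0-9]+)*" % "|".join(BRANCH_TYPES)
)


def is_conventional_branch(name: str) -> bool:
    """Return True if the whole branch name follows <type>/<short-kebab-description>."""
    return BRANCH_RE.fullmatch(name) is not None


for branch in ("feat/index-migrator", "codex/feat/index-migrator", "fix-search-bug"):
    print(branch, is_conventional_branch(branch))
```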
66+
67+
Commit messages:
68+
- Use Conventional Commits: `<type>(optional-scope): <summary>`
69+
- Preferred commit types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
70+
- Examples:
71+
- `feat(migrate): add drop recreate planning docs`
72+
- `docs(index-migrator): add benchmarking guidance`
73+
- `fix(cli): validate migrate plan inputs`
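A commit subject in this format splits cleanly into type, optional scope, and summary. The sketch below shows one way to parse it; `parse_commit_subject` and `ParsedCommit` are hypothetical names, and restricting scopes to lowercase letters, digits, and hyphens is an assumption based on the examples above.

```python
import re
from typing import NamedTuple, Optional

COMMIT_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "build", "ci")


class ParsedCommit(NamedTuple):
    type: str
    scope: Optional[str]
    summary: str


# <type>(optional-scope): <summary>
COMMIT_RE = re.compile(
    r"(?P<type>%s)(?:\((?P<scope>[a-z0-9-]+)\))?: (?P<summary>\S.*)" % "|".join(COMMIT_TYPES)
)


def parse_commit_subject(subject: str) -> Optional[ParsedCommit]:
    """Return the parsed parts, or None if the subject is not conventional."""
    match = COMMIT_RE.fullmatch(subject)
    if match is None:
        return None
    # The scope group is None when the parenthesized scope is omitted.
    return ParsedCommit(match["type"], match["scope"], match["summary"])


print(parse_commit_subject("docs(index-migrator): add benchmarking guidance"))
print(parse_commit_subject("updated some files"))
```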
74+
75+
### Code Quality
76+
**IMPORTANT**: Always run `make format` before committing code to ensure proper formatting and linting compliance.
77+
78+
### README.md Maintenance
79+
**IMPORTANT**: DO NOT modify README.md unless explicitly requested.
80+
81+
**If you need to document something, use these alternatives:**
82+
- Development info → CONTRIBUTING.md
83+
- API details → docs/ directory
84+
- Examples → docs/examples/
85+
- Project memory (explicit preferences, directives, etc.) → AGENTS.md
86+
87+
## Code Style Preferences
88+
89+
### Import Organization
90+
- **Prefer module-level imports** by default for clarity and standard Python conventions
91+
- **Use local/inline imports only when necessary** for specific reasons:
92+
- Avoiding circular import dependencies
93+
- Improving startup time for heavy/optional dependencies
94+
- Lazy loading for performance-critical paths
95+
- When using local imports, add a brief comment explaining why (e.g., `# Local import to avoid circular dependency`)
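As an illustration of these import rules, the sketch below keeps a cheap standard-library import at module level and defers an optional, heavy dependency to the function that needs it. The function `to_float_list` is hypothetical, and `numpy` is used only as an example of an optional dependency; the code falls back to plain Python when it is not installed.

```python
import json  # Module-level import: the default for cheap, always-available modules


def to_float_list(values):
    """Convert values to a list of floats, using numpy only if it is installed."""
    try:
        # Local import: numpy is an optional, heavy dependency
        import numpy as np
    except ImportError:
        return [float(v) for v in values]
    return np.asarray(values, dtype=float).tolist()


print(json.dumps(to_float_list([1, 2, 3])))
```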
96+
97+
### Comments and Output
98+
- **No emojis in code comments or print statements**
99+
- Keep comments professional and focused on technical clarity
100+
- Use emojis sparingly only in user-facing documentation (markdown files), not in Python code
101+
102+
### General Guidelines
103+
- Follow existing patterns in the RedisVL codebase
104+
- Maintain consistency with the project's established conventions
105+
- Run `make format` before committing to ensure code quality standards
106+
107+
## Testing Notes
108+
RedisVL uses `pytest` with `testcontainers` for testing.
109+
110+
- `make test` - unit tests only (no external APIs)
111+
- `make test-all` - run the full suite, including tests that call external APIs
112+
- `pytest --run-api-tests` - explicitly run API-dependent tests (e.g., LangCache, external vectorizer/reranker providers). These require the appropriate API keys and environment variables to be set.
115+
116+
## Project Structure
117+
118+
```
redisvl/
├── cli/                  # Command-line interface (rvl command)
├── extensions/           # AI extensions (cache, memory, routing)
│   ├── cache/            # Semantic caching for LLMs
│   ├── llmcache/         # LLM-specific caching
│   ├── message_history/  # Chat history management
│   ├── router/           # Semantic routing
│   └── session_manager/  # Session management
├── index/                # SearchIndex classes (sync/async)
├── query/                # Query builders (Vector, Range, Filter, Count)
├── redis/                # Redis client utilities
├── schema/               # Index schema definitions
└── utils/                # Utilities (vectorizers, rerankers, optimization)
    ├── rerank/           # Result reranking
    └── vectorize/        # Embedding providers integration
```
135+
136+
## Core Components
137+
138+
### 1. Index Management
139+
- `SearchIndex` / `AsyncSearchIndex` - Main interface for Redis vector indices
140+
- `IndexSchema` - Define index structure with fields (text, tags, vectors, etc.)
141+
- Support for JSON and Hash storage types
142+
143+
### 2. Query System
144+
- `VectorQuery` - Semantic similarity search
145+
- `RangeQuery` - Vector search within distance range
146+
- `FilterQuery` - Metadata filtering and full-text search
147+
- `CountQuery` - Count matching records
148+
- Other query classes live in `redisvl/query/`
149+
150+
### 3. AI Extensions
151+
- `SemanticCache` - LLM response caching with semantic similarity
152+
- `EmbeddingsCache` - Cache for vector embeddings
153+
- `MessageHistory` - Chat history with recency/relevancy retrieval
154+
- `SemanticRouter` - Route queries to topics/intents
155+
156+
### 4. Vectorizers (Optional Dependencies)
157+
- OpenAI, Azure OpenAI, Cohere, HuggingFace, Mistral, VoyageAI
158+
- Custom vectorizer support
159+
- Batch processing capabilities
160+
161+
## Documentation
162+
- Main docs: https://docs.redisvl.com
163+
- Built with Sphinx from `docs/` directory
164+
- Includes API reference and user guides
165+
- Example notebooks live in the documentation tree under `docs/user_guide/...`

CONTRIBUTING.md

Lines changed: 22 additions & 1 deletion
@@ -251,12 +251,33 @@ Before suggesting a new feature:
251251

252252
## Pull Request Process
253253

254-
1. **Fork and create a branch**: Create a descriptive branch name (e.g., `fix-search-bug` or `add-vector-similarity`)
254+
1. **Fork and create a branch**: Use a conventional branch name such as `feat/index-migrator`, `fix/search-bug`, or `docs/vectorizer-guide`
255255
2. **Make your changes**: Follow our coding standards and include tests
256256
3. **Test thoroughly**: Ensure your changes work and don't break existing functionality
257257
4. **Update documentation**: Add or update documentation as needed
258258
5. **Submit your PR**: Include a clear description of what your changes do
259259

260+
### Branch Naming and Commit Messages
261+
262+
We use conventional branch names and Conventional Commits to keep history easy to scan and automate.
263+
264+
Branch naming:
265+
266+
- Use `<type>/<short-kebab-description>`
267+
- Recommended types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
268+
- Examples:
269+
- `feat/index-migrator`
270+
- `fix/async-sentinel-pool`
271+
- `docs/migration-benchmarking`
272+
273+
Commit messages:
274+
275+
- Use `<type>(optional-scope): <summary>`
276+
- Examples:
277+
- `feat(migrate): add drop recreate plan generation`
278+
- `docs(index-migrator): add benchmark guidance`
279+
- `fix(cli): reject unsupported migration diffs`
280+
260281
### Review Process
261282

262283
- The core team reviews Pull Requests regularly
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
# Index Migrator Workspace
2+
3+
## Overview
4+
5+
This directory is the sole source of truth for RedisVL index migration planning.
6+
7+
No implementation should start unless the corresponding task exists in a `*_tasks.md` file in this directory.
8+
9+
This workspace is organized around two phases:
10+
11+
- Phase 1 MVP: `drop_recreate`
12+
- Phase 2: `iterative_shadow`
13+
14+
The overall initiative covers both simple schema-only rebuilds and harder migrations that change vector dimensions, datatypes, precision, algorithms, or payload shape. Those advanced cases are intentionally delivered after the MVP rather than being treated as out of scope for the product.
15+
16+
The planning goal is to make handoff simple. Another engineer or process should be able to open this directory, read the active spec and task list, and start implementation without needing to rediscover product decisions.
17+
18+
## Guiding Principles
19+
20+
- Prefer simple and safe over clever orchestration.
21+
- Reuse existing RedisVL primitives before adding new abstractions.
22+
- Migrate one index at a time.
23+
- Keep cutover and platform scaling operator-owned.
24+
- Fail closed on unsupported schema changes.
25+
- Treat documentation artifacts as implementation inputs, not as narrative background.
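"Fail closed" can be read as an allowlist: a migration proceeds only when every requested change is one the current mode is known to handle, and anything unrecognized is refused. The sketch below illustrates that idea; the change labels, the `SUPPORTED_CHANGES` set, and the `classify_changes` function are all hypothetical and do not reflect a decided Phase 1 scope.

```python
from typing import Iterable

# Hypothetical labels for changes a drop_recreate migration would accept.
SUPPORTED_CHANGES = frozenset({"add_field", "remove_field", "change_field_options"})


class UnsupportedMigration(Exception):
    """Raised when a requested change is not on the allowlist."""


def classify_changes(changes: Iterable[str]) -> str:
    """Return the migration mode, refusing anything not explicitly supported."""
    rejected = sorted(set(changes) - SUPPORTED_CHANGES)
    if rejected:
        # Fail closed: unknown and known-unsupported changes are treated alike.
        raise UnsupportedMigration(f"unsupported changes: {', '.join(rejected)}")
    return "drop_recreate"


print(classify_changes(["add_field"]))
try:
    classify_changes(["add_field", "change_vector_dims"])
except UnsupportedMigration as exc:
    print(exc)
```

Using an allowlist rather than a denylist means a change type nobody anticipated is rejected by default instead of slipping through.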
26+
27+
## Phase Status
28+
29+
| Phase | Mode | Status | Implementation Target |
| --- | --- | --- | --- |
| Phase 1 | `drop_recreate` | Ready | Yes |
| Phase 2 | `iterative_shadow` | Planned | No |
33+
34+
## Doc Map
35+
36+
- [01_context.md](./01_context.md): customer problem, constraints, and why the work is phased
37+
- [02_architecture.md](./02_architecture.md): shared architecture, responsibilities, capacity model, and diagrams
38+
- [03_benchmarking.md](./03_benchmarking.md): migration benchmarking goals, metrics, scenarios, and output artifacts
39+
- [90_prd.md](./90_prd.md): final product requirements document for team review
40+
- [10_v1_drop_recreate_spec.md](./10_v1_drop_recreate_spec.md): decision-complete MVP spec
41+
- [11_v1_drop_recreate_tasks.md](./11_v1_drop_recreate_tasks.md): implementable MVP task list
42+
- [12_v1_drop_recreate_tests.md](./12_v1_drop_recreate_tests.md): MVP test plan
43+
- [20_v2_iterative_shadow_spec.md](./20_v2_iterative_shadow_spec.md): future iterative shadow spec
44+
- [21_v2_iterative_shadow_tasks.md](./21_v2_iterative_shadow_tasks.md): future iterative shadow tasks
45+
- [22_v2_iterative_shadow_tests.md](./22_v2_iterative_shadow_tests.md): future iterative shadow test plan
46+
47+
## Current Truth
48+
49+
The active implementation target is Phase 1.
50+
51+
- Spec: [10_v1_drop_recreate_spec.md](./10_v1_drop_recreate_spec.md)
52+
- Tasks: [11_v1_drop_recreate_tasks.md](./11_v1_drop_recreate_tasks.md)
53+
- Tests: [12_v1_drop_recreate_tests.md](./12_v1_drop_recreate_tests.md)
54+
55+
## Next Actions
56+
57+
- `V1-T01`
58+
- `V1-T02`
59+
- `V1-T03`
60+
61+
## Locked Decisions
62+
63+
- The planning workspace lives entirely under `nitin_docs/index_migrator/`.
64+
- The top-level migration notes have been removed to avoid competing sources of truth.
65+
- Phase 1 is documentation-backed implementation scope.
66+
- Phase 2 stays planned until Phase 1 is implemented and learnings are folded back into this directory.
67+
- The default artifact format for plans and reports is YAML.
68+
- Benchmarking is required for migration duration, query impact, and resource-impact planning, but it should be implemented with simple structured outputs rather than a separate benchmarking framework.
69+
- The default execution unit is a single index.
70+
- The default operational model is operator-owned downtime, cutover, and scaling.
71+
- Phase 2 owns advanced vector and payload-shape migrations, including datatype, precision, dimension, and algorithm changes.
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# Index Migrator Context
2+
3+
## Problem Statement
4+
5+
RedisVL does not currently provide a first-class migration workflow for search index changes.
6+
7+
Today, teams can create indexes, delete indexes, inspect index info, and load documents, but they still need ad hoc scripts and operational runbooks to handle schema evolution. This becomes risky when the index is large, shared by multiple applications, or deployed on clustered Redis Cloud or Redis Software.
8+
9+
The migration problem has three different shapes:
10+
11+
- A simpler index rebuild that preserves existing documents and recreates the index definition in place.
12+
- A shadow migration over the same documents when the target schema can still be built from the current stored payload.
13+
- A shadow migration with transform or backfill when vector dimensions, datatypes, precision, algorithms, or payload shape change and a new target payload must be built.
14+
15+
This workspace deliberately splits those shapes into phases instead of trying to solve everything in one design. Phase 1 proves the plan-first migration workflow. Phase 2 exists to take on the harder vector and payload-shape migrations safely.
16+
17+
## Customer Requirements
18+
19+
The planning baseline for this work is:
20+
21+
- preserve existing documents during migration
22+
- capture the previous index configuration before making changes
23+
- apply only the requested schema changes
24+
- preview the migration plan before execution
25+
- support advanced vector migrations such as `HNSW -> FLAT`, `FP32 -> FP16`, vector dimension changes, and payload-shape-changing model or algorithm swaps
26+
- estimate migration timing, memory impact, and operational impact using simple benchmark artifacts
27+
- benchmark source-versus-target memory and size changes, including peak overlap footprint during shadow migrations
28+
- support both guided and scripted workflows
29+
- make downtime and disruption explicit
30+
- support large datasets without defaulting to full-keyspace audits or fleet-wide orchestration
31+
- keep the implementation understandable enough that another team can operate it safely
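The memory-related requirements lend themselves to simple arithmetic. The sketch below estimates raw vector storage for a source and target datatype and the peak footprint when both exist at once during a shadow migration. It is a rough illustration only: it counts raw vector bytes and ignores index overhead (for example, HNSW graph structures), key names, and other fields. The function names and the example corpus size are assumptions.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def vector_bytes(num_docs: int, dims: int, datatype: str) -> int:
    """Raw bytes needed to store one vector per document."""
    return num_docs * dims * BYTES_PER_ELEMENT[datatype]


def peak_overlap_bytes(source_bytes: int, target_bytes: int) -> int:
    """Source and target coexist during a shadow migration, so their sizes add."""
    return source_bytes + target_bytes


# Hypothetical corpus: 1,000,000 documents with 768-dimensional vectors.
source = vector_bytes(1_000_000, 768, "float32")
target = vector_bytes(1_000_000, 768, "float16")
print(source, target, peak_overlap_bytes(source, target))
```

In this example the FP16 target is half the size of the FP32 source, but the peak during overlap is 1.5 times the source, which is the figure capacity planning needs.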
32+
33+
## Current RedisVL Capabilities
34+
35+
RedisVL already has useful primitives that should be reused instead of replaced:
36+
37+
- `SearchIndex.from_existing()` can reconstruct schema from a live index.
38+
- `SearchIndex.delete(drop=False)` can remove the index structure without deleting documents.
39+
- `SearchIndex.info()` can retrieve index stats used for planning and validation.
40+
- Existing CLI commands already establish the connection and index lookup patterns the migrator can follow.
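The primitives listed above compose into the first half of a drop-and-recreate flow: capture the previous configuration, then remove only the index structure. The sketch below is written against a duck-typed `index` so it runs without Redis; in real use the object would come from something like `SearchIndex.from_existing(...)`. The `snapshot_and_drop` helper and the `FakeIndex` and `FakeSchema` stand-ins are hypothetical.

```python
from typing import Any, Dict


def snapshot_and_drop(index: Any) -> Dict[str, Any]:
    """Capture the current schema and stats, then drop only the index structure."""
    snapshot = {
        "schema": index.schema.to_dict(),  # previous index configuration
        "info": index.info(),              # stats for planning and validation
    }
    index.delete(drop=False)  # keep the documents, remove the index definition
    return snapshot


class FakeSchema:
    def to_dict(self) -> Dict[str, Any]:
        return {"index": {"name": "docs"}, "fields": []}


class FakeIndex:
    """Stand-in for a SearchIndex so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self.schema = FakeSchema()
        self.deleted_with_drop = None

    def info(self) -> Dict[str, Any]:
        return {"num_docs": 3}

    def delete(self, drop: bool = True) -> None:
        self.deleted_with_drop = drop


fake = FakeIndex()
snapshot = snapshot_and_drop(fake)
print(snapshot["schema"]["index"]["name"], snapshot["info"]["num_docs"], fake.deleted_with_drop)
```

Taking the snapshot before the delete matters: once the index definition is gone, the previous configuration can no longer be read back from the server.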
41+
42+
RedisVL does not yet have:
43+
44+
- a migration planner
45+
- a schema diff classifier
46+
- a migration-specific CLI workflow
47+
- a guided schema migration wizard
48+
- structured migration reports
49+
- capacity-aware orchestration across indexes
50+
- transform or backfill planning for migrations that need new stored payloads
51+
52+
## Why Phase 1 Comes First
53+
54+
Phase 1 is intentionally narrow because it gives the team an MVP that is both useful and low-risk:
55+
56+
- It preserves documents while changing only the index definition.
57+
- It reuses current RedisVL primitives instead of introducing a separate migration runtime.
58+
- It keeps operational ownership clear: RedisVL handles planning, execution, and validation for a single index, while the operator handles the migration window and downstream application expectations.
59+
- It avoids the hardest problems for now: target-payload generation, shadow overlap estimation, cutover automation, and cluster-wide scheduling.
60+
61+
Phase 1 does not define the full migration goal. The harder vector and payload-shape changes are the reason Phase 2 exists.
62+
63+
The MVP should prove the planning model, CLI shape, plan artifact, and validation/reporting flow before more advanced orchestration is attempted.
64+
65+
## Downtime and Disruption
66+
67+
Phase 1 accepts downtime for the migrated index.
68+
69+
Engineers need to plan for the following impacts:
70+
71+
- Search on the target index is unavailable between index drop and recreated index readiness.
72+
- Query results can be partial or unstable while the recreated index is still completing its initial indexing pass.
73+
- Reindexing uses shared database resources and can increase CPU, memory, and indexing pressure on the deployment.
74+
- Shadow migrations can temporarily duplicate index structures and sometimes duplicate payloads as well, increasing peak memory requirements.
75+
- Downstream applications need either a maintenance window, a degraded mode, or a clear operational pause during the rebuild.
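Because results can be partial until the initial indexing pass completes, a migration tool needs some way to wait for readiness. The sketch below polls index stats until indexing is reported complete or a timeout expires. It assumes the stats dictionary exposes a `percent_indexed` value between 0 and 1, as `FT.INFO` reports; `wait_until_indexed` and `FakeIndex` are hypothetical names.

```python
import time
from typing import Any, Callable


def wait_until_indexed(
    index: Any,
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll index.info() until indexing completes; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        info = index.info()
        # Assumption: percent_indexed is a number (or numeric string) in [0, 1].
        if float(info.get("percent_indexed", 0)) >= 1.0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


class FakeIndex:
    """Stand-in that reports a scripted sequence of indexing progress values."""

    def __init__(self, progress) -> None:
        self._progress = iter(progress)

    def info(self) -> dict:
        return {"percent_indexed": next(self._progress)}


print(wait_until_indexed(FakeIndex([0.4, 0.9, 1.0]), poll_s=0.0))
```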
76+
77+
The tooling should not hide these facts. The plan artifact and CLI output must force the user to acknowledge downtime before applying a `drop_recreate` migration.
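One way to enforce that acknowledgment is to make it a field on the plan itself and refuse to apply a `drop_recreate` plan until it is set. The sketch below shows the idea; `MigrationPlan`, `DowntimeNotAcknowledged`, and `ensure_can_apply` are hypothetical names, and the fields shown are not a proposed plan schema.

```python
from dataclasses import dataclass


@dataclass
class MigrationPlan:
    """Hypothetical, minimal plan artifact."""

    index_name: str
    mode: str
    downtime_acknowledged: bool = False


class DowntimeNotAcknowledged(Exception):
    """Raised when a disruptive plan is applied without explicit consent."""


def ensure_can_apply(plan: MigrationPlan) -> None:
    """Refuse to proceed with drop_recreate until downtime is acknowledged."""
    if plan.mode == "drop_recreate" and not plan.downtime_acknowledged:
        raise DowntimeNotAcknowledged(
            f"index '{plan.index_name}' will be unavailable during the rebuild; "
            "acknowledge downtime before applying"
        )


plan = MigrationPlan(index_name="docs", mode="drop_recreate")
try:
    ensure_can_apply(plan)
except DowntimeNotAcknowledged as exc:
    print(exc)

plan.downtime_acknowledged = True
ensure_can_apply(plan)  # no exception once acknowledged
```

Defaulting the flag to `False` keeps the safe path the default: a plan generated by tooling cannot be applied until someone deliberately flips it.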
78+
79+
## Non-Goals
80+
81+
The following are explicitly out of scope for Phase 1, not for the overall initiative:
82+
83+
- a generic migration framework for every schema evolution case
84+
- automatic platform scaling
85+
- automatic traffic cutover
86+
- full key manifest capture by default
87+
- document transforms or backfills in the MVP execution path
88+
- payload relocation to a new keyspace in the MVP execution path
89+
- concurrent migration of multiple large indexes
90+
- fully managed Redis Cloud or Redis Software integration
91+
- automatic transform inference or automatic re-embedding
92+
93+
The simplicity rules for this effort are:
94+
95+
- use existing RedisVL index introspection and lifecycle primitives
96+
- do not design a generic migration framework for the MVP
97+
- do not automate platform scaling
98+
- do not automate traffic cutover
99+
- do not require full key manifests by default
100+
- require an explicit transform or backfill plan before Phase 2 handles payload-shape-changing migrations
