# rag-cli CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## CRITICAL: No Emojis Policy
NEVER use emojis in ANY documentation, plans, guides, or written output for this project UNLESS explicitly given permission. This includes:
- README files
- Documentation files
- Code comments and docstrings
- Plan descriptions
- Commit messages
- All text-based output
Focus on clear, professional documentation without decorative elements.
## Project Overview
RAG-CLI v2.0 is a local Retrieval-Augmented Generation system designed as a Claude Code plugin. It processes documents locally, generates embeddings, stores vectors in ChromaDB, and uses claude-haiku-4-5-20251001 for response generation.
This version features a complete restructure with clean separation between core library and plugin code, marketplace-ready lifecycle management, and improved maintainability.
## Project Structure (v2.0)

```
RAG-CLI/
    src/
        rag_cli/                       # CORE LIBRARY (plugin-agnostic)
            __init__.py                # Version: 2.0.0
            core/                      # Core RAG functionality
                constants.py           # Centralized configuration
                document_processor.py  # Document chunking
                embeddings.py          # Embedding generation
                vector_store.py        # ChromaDB operations
                retrieval_pipeline.py  # Hybrid search + reranking
                claude_integration.py  # Claude API integration
                [30+ other core modules]
            agents/                    # Multi-agent framework
                base_agent.py          # Agent base class
                query_decomposer.py    # Query decomposition
                result_synthesizer.py  # Result synthesis
                maf/                   # Multi-Agent Framework
            integrations/              # External integrations
                arxiv_connector.py     # ArXiv integration
                tavily_connector.py    # Tavily search
                maf_connector.py       # MAF integration
            cli/                       # Command-line tools
                index.py               # rag-index command
                retrieve.py            # rag-retrieve command
            utils/                     # Shared utilities
        rag_cli_plugin/                # PLUGIN CODE (Claude Code specific)
            __init__.py                # Version: 2.0.0
            lifecycle/                 # Lifecycle management
                installer.py           # Marketplace installation
                updater.py             # Update handling
            commands/                  # Slash commands
                update_rag.py          # /update-rag command
                rag_project_indexer.py # /rag-project command
                [other commands]
            hooks/                     # Event hooks
                user-prompt-submit.py  # Main RAG orchestration
                document-indexing.py   # Auto-indexing
                session-start.py       # Session initialization
                [other hooks]
            mcp/                       # MCP server
                unified_server.py      # Single unified MCP server
            services/                  # Plugin services
                service_manager.py     # Service registry
                dashboard.py           # Web dashboard
                tcp_server.py          # Monitoring server
                [monitoring modules]
            skills/                    # Agent skills
    config/                            # Configuration
        defaults/                      # Default configurations
            mcp.json                   # MCP server config
            rag_settings.json          # RAG settings
            services.json              # Service settings
            [other defaults]
        templates/                     # User-editable templates
            .env.template              # Environment template
            citation_config.json.template
        schemas/                       # JSON schemas
            settings.schema.json       # Settings validation
    scripts/                           # Scripts
        install/                       # Installation scripts
        update/                        # Update scripts
        utils/                         # Utility scripts
            update_imports_v2.py       # Import updater
            update_plugin_imports.py   # Plugin import updater
    .claude-plugin/                    # Plugin metadata
        plugin.json                    # Plugin configuration (v2.0.0)
        hooks.json                     # Hook configurations
        lifecycle.json                 # Lifecycle hooks (NEW)
        commands/                      # Command documentation
    data/                              # Runtime data
        vectors/                       # ChromaDB indexes
        cache/                         # Query cache
        documents/                     # Source documents
        logs/                          # Application logs
    tests/                             # Test suite
    docs/                              # Documentation
    pyproject.toml                     # Package configuration (v2.0.0)
    requirements.txt                   # Python dependencies
    README.md                          # Project README
    LICENSE                            # MIT License
    CHANGELOG.md                       # Version history
```
## Package Structure

RAG-CLI v2.0 uses a dual-package src-layout structure:

- Core Library (`rag_cli`): Platform-agnostic RAG engine

  ```python
  from rag_cli.core.X import Y
  from rag_cli.agents.X import Y
  from rag_cli.integrations.X import Y
  ```

- Plugin Code (`rag_cli_plugin`): Claude Code integration

  ```python
  from rag_cli_plugin.services.X import Y
  from rag_cli_plugin.lifecycle.X import Y
  ```

This separation allows the core RAG engine to be used independently while keeping Claude Code-specific code isolated.
## Development Commands

### Initial Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install package in editable mode (development)
pip install -e .
```

### Plugin Installation

```bash
# Install as Claude Code plugin
python install_plugin.py

# This will:
# 1. Install RAG-CLI as a Python package (pip install -e .)
# 2. Create the plugin directory in ~/.claude/plugins/rag-cli/
# 3. Copy configuration files and commands
# 4. Set up data directory symlinks
# 5. Configure the MCP server
```

### Command-Line Tools

```bash
# Index documents (after installation)
rag-index ./data/documents --recursive --pattern "*.md"

# Retrieve and generate responses
rag-retrieve --query "How to configure API?" --top-k 5

# Interactive retrieval mode
rag-retrieve --interactive

# Run monitoring server
rag-monitor
# Or: python -m monitoring

# Test installation
python scripts/verify_installation.py
```

### Testing

```bash
# Run all tests
pytest

# Run specific test module
pytest tests/test_vector_store.py

# Run with coverage
pytest --cov=src --cov-report=html

# Run integration tests only
pytest tests/test_integration.py -v
```
## Core Implementation Details

### 0. Global Constants (core/constants.py)

Centralized configuration values for easier maintenance and tuning:

- Cache Configuration: `TCP_CHECK_CACHE_SECONDS`, `RESPONSE_CACHE_MAX_SIZE`, `EMBEDDING_CACHE_SIZE`
- Token Estimation: `CHARS_PER_TOKEN`, `TOKEN_ESTIMATION_RATIO`
- Search Parameters: `DEFAULT_TOP_K`, `MAX_TOP_K`, `MAX_QUERY_LENGTH`
- Retrieval Weights: `DEFAULT_VECTOR_WEIGHT` (0.7), `DEFAULT_KEYWORD_WEIGHT` (0.3)
- File Processing: `CHUNK_SIZE_TOKENS` (500), `CHUNK_OVERLAP_TOKENS` (100), `MAX_FILE_SIZE_MB`
- Vector Store Thresholds: `HNSW_THRESHOLD_VECTORS` (2000), `IVF_THRESHOLD_VECTORS` (1M)
- Performance Tuning: `DEFAULT_BATCH_SIZE` (32), `MAX_WORKERS` (4)
- Monitoring Limits: `MAX_EVENT_HISTORY`, `METRICS_HISTORY_SIZE`
- API Limits: `TAVILY_FREE_TIER_LIMIT`, `CLAUDE_RATE_LIMIT_REQUESTS`
- Timeouts: `DEFAULT_HTTP_TIMEOUT`, `EMBEDDING_TIMEOUT`, `SEARCH_TIMEOUT`

All magic numbers throughout the codebase should reference these constants for consistency and maintainability.
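For reference, a minimal sketch of how these definitions might look in core/constants.py, using only values quoted elsewhere in this file (the real module defines many more):

```python
# core/constants.py (sketch; only values quoted in this document)

# Retrieval weights
DEFAULT_VECTOR_WEIGHT = 0.7
DEFAULT_KEYWORD_WEIGHT = 0.3

# Search parameters
DEFAULT_TOP_K = 5
MAX_TOP_K = 100

# File processing
CHUNK_SIZE_TOKENS = 500
CHUNK_OVERLAP_TOKENS = 100
MAX_FILE_SIZE_MB = 10

# Vector store thresholds
HNSW_THRESHOLD_VECTORS = 2_000
IVF_THRESHOLD_VECTORS = 1_000_000

# Performance tuning
DEFAULT_BATCH_SIZE = 32
MAX_WORKERS = 4

# Caching and monitoring
EMBEDDING_CACHE_SIZE = 1000
RESPONSE_CACHE_MAX_SIZE = 100
TCP_CHECK_CACHE_SECONDS = 30
MAX_EVENT_HISTORY = 100
METRICS_HISTORY_SIZE = 1000
```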
### 1. Document Processing (core/document_processor.py)

- Chunk size: `CHUNK_SIZE_TOKENS` (500 tokens)
- Overlap: `CHUNK_OVERLAP_TOKENS` (100 tokens, 20%)
- Use RecursiveCharacterTextSplitter from langchain (sketched below)
- Add contextual headers (document title, section)
- Supported formats: MD, PDF, DOCX, HTML, TXT
- Max file size: `MAX_FILE_SIZE_MB` (10 MB)
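A minimal chunking sketch under these settings. It assumes the langchain splitter is driven by character counts (the ~4 chars/token conversion is an assumption) and only hints at the contextual-header step:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Sizes from core/constants.py; the chars-per-token conversion below is an
# illustrative assumption (the real CHARS_PER_TOKEN lives in constants.py).
CHUNK_SIZE_TOKENS = 500
CHUNK_OVERLAP_TOKENS = 100
CHARS_PER_TOKEN = 4  # assumption

splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE_TOKENS * CHARS_PER_TOKEN,
    chunk_overlap=CHUNK_OVERLAP_TOKENS * CHARS_PER_TOKEN,
)

def chunk_document(title: str, text: str) -> list[str]:
    # Prefix each chunk with a contextual header (document title).
    return [f"[{title}]\n{chunk}" for chunk in splitter.split_text(text)]
```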
### 2. Embeddings (core/embeddings.py)

- Model: sentence-transformers/all-MiniLM-L6-v2 (v5.1+)
- Dimensions: 384
- Batch processing: `DEFAULT_BATCH_SIZE` (32)
- LRU cache for repeated queries: `EMBEDDING_CACHE_SIZE` (1000); see the sketch below
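A sketch of batched encoding plus an LRU query cache, assuming sentence-transformers is called directly; the function names are illustrative, not the real EmbeddingGenerator API:

```python
from functools import lru_cache

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_batch(texts: list[str]) -> np.ndarray:
    # Returns an (n, 384) array; batch size mirrors DEFAULT_BATCH_SIZE.
    return model.encode(texts, batch_size=32, show_progress_bar=False)

@lru_cache(maxsize=1000)  # EMBEDDING_CACHE_SIZE
def embed_query(query: str) -> tuple[float, ...]:
    # A tuple keeps the cached value hashable and immutable.
    return tuple(model.encode(query))
```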
### 3. Vector Store (core/vector_store.py)

- ChromaDB PersistentClient for < `HNSW_THRESHOLD_VECTORS` (2000 vectors)
- ChromaDB HNSW (built-in) for 2K-1M vectors (threshold: `HNSW_THRESHOLD_VECTORS`)
- ChromaDB IVF for > `IVF_THRESHOLD_VECTORS` (1M+ vectors)
- Native persistence (automatic save/load)
- Metadata storage with pickle
### 4. Retrieval Pipeline (core/retrieval_pipeline.py)

- Hybrid search: `DEFAULT_VECTOR_WEIGHT` (0.7) + `DEFAULT_KEYWORD_WEIGHT` (0.3)
- Default results: `DEFAULT_TOP_K` (5), max: `MAX_TOP_K` (100)
- Two-stage: retrieve 2×top_k, rerank to top_k (sketched below)
- Cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
- Target latency: <100ms (configurable via `SEARCH_TIMEOUT`)
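A sketch of the two-stage flow with the weights above. `vector_search` and `keyword_search` are hypothetical stand-ins for the pipeline's internals, not real module functions:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_retrieve(query: str, top_k: int = 5) -> list[str]:
    # Stage 1: over-retrieve 2x top_k from each mode, then blend scores
    # with DEFAULT_VECTOR_WEIGHT (0.7) and DEFAULT_KEYWORD_WEIGHT (0.3).
    scores: dict[str, float] = {}
    for doc, s in vector_search(query, n=2 * top_k):    # hypothetical helper
        scores[doc] = scores.get(doc, 0.0) + 0.7 * s
    for doc, s in keyword_search(query, n=2 * top_k):   # hypothetical helper
        scores[doc] = scores.get(doc, 0.0) + 0.3 * s

    # Stage 2: cross-encoder reranks the blended candidates down to top_k.
    candidates = sorted(scores, key=scores.get, reverse=True)[: 2 * top_k]
    reranked = reranker.predict([(query, doc) for doc in candidates])
    order = sorted(zip(candidates, reranked), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in order[:top_k]]
```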
### 5. Claude Integration (core/claude_integration.py)

- Model: claude-haiku-4-5-20251001
- Stream responses for better UX (sketched below)
- Context assembly from retrieved chunks
- Prompt template with citations
- Response caching: `RESPONSE_CACHE_MAX_SIZE` (100)
- Rate limiting: `CLAUDE_RATE_LIMIT_REQUESTS` (100/min)
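A minimal streaming sketch with the anthropic SDK; the prompt template and citation format here are illustrative, and caching/rate limiting are omitted:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(query: str, chunks: list[str]) -> str:
    # Assemble context from retrieved chunks with numbered citations.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Answer using the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {query}"
    )

    parts: list[str] = []
    with client.messages.stream(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:  # stream tokens for better UX
            print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)
```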
### 6. Monitoring (services/tcp_server.py)

- TCP server on port 9999
- Endpoints: /status, /logs, /metrics (spot-check example below)
- JSON responses
- Real-time log streaming
- Event history: `MAX_EVENT_HISTORY` (100)
- Metrics retention: `METRICS_HISTORY_SIZE` (1000)
- Connection caching: `TCP_CHECK_CACHE_SECONDS` (30)
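A quick way to spot-check the endpoints, assuming the server accepts HTTP-style requests on port 9999 (if it speaks a raw TCP line protocol instead, swap in a plain socket client):

```python
# Spot-check the monitoring endpoints. Assumes HTTP framing on port 9999;
# the JSON responses are documented above.
import json
import urllib.request

for endpoint in ("/status", "/metrics"):
    with urllib.request.urlopen(f"http://localhost:9999{endpoint}", timeout=5) as resp:
        print(endpoint, json.loads(resp.read()))
```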
## ChromaDB Persistence and Document Update Best Practices (v2.0+)

This section documents the enhanced persistence and update strategies implemented in v2.0 to ensure reliable document management across sessions.

### Core Enhancements

#### 1. Automatic Persistence

- ChromaDB PersistentClient automatically saves all changes
- No manual save() or load() calls required
- Data persists to `data/vectors/chroma_db/`
- Singleton pattern ensures a single source of truth per session
- Graceful shutdown in the session-end hook saves the duplicate registry
#### 2. Upsert Functionality (vector_store.py:270-361)

Use upsert() instead of add() when re-indexing documents to prevent duplicates:

```python
# PREFERRED: Update existing or insert new
vector_store.upsert(
    embeddings=embeddings,
    texts=texts,
    ids=optional_ids,  # If None, auto-generates
    metadata=metadata,
    sources=sources
)

# OLD WAY: Always adds, creates duplicates on re-index
vector_store.add(embeddings, texts, metadata, sources)
```
When to use upsert():
- Re-indexing updated documents
- Correcting indexed content
- Maintaining document versions
- Any scenario where the same source may be indexed multiple times
#### 3. Source-Based Operations (vector_store.py:492-612)

Get all vectors from a source:

```python
# Returns List[VectorMetadata]
vectors = vector_store.get_by_source("path/to/document.md")
print(f"Found {len(vectors)} chunks from document")
```

Delete all vectors from a source:

```python
# Useful before re-indexing a modified file
deleted_count = vector_store.delete_by_source("path/to/document.md")
```

Replace all vectors from a source:

```python
# Combines delete + add in one operation
new_ids = vector_store.update_by_source(
    source="path/to/document.md",
    embeddings=new_embeddings,
    texts=new_texts,
    metadata=new_metadata
)
```
#### 4. Duplicate Detection Integration (cli/index.py:124-181)

The indexing pipeline now integrates content hash-based duplicate detection.

Incremental Indexing (skip unchanged documents):

```bash
rag-index ./docs --recursive --incremental
```

This mode:
- Computes a SHA-256 hash of each document
- Skips documents with a matching content hash
- Only processes new or changed documents
- Updates the duplicate registry on completion

Update Mode (replace changed documents):

```bash
rag-index ./docs --recursive --update
```

This mode:
- Detects documents with changed content
- Deletes old chunks from changed sources
- Uses upsert() to add new chunks
- Maintains duplicate registry consistency

Combine both modes:

```bash
rag-index ./docs --recursive --incremental --update
```
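A minimal sketch of the content-hash check behind --incremental, assuming a JSON registry keyed by source path (the registry location, data/vectors/content_hashes.json, is noted under Best Practices below); these helpers are illustrative, not the real cli/index.py code:

```python
import hashlib
import json
from pathlib import Path

REGISTRY = Path("data/vectors/content_hashes.json")

def content_hash(path: Path) -> str:
    # SHA-256 over the raw bytes; any content change yields a new hash.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def load_registry() -> dict:
    return json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}

def needs_indexing(path: Path) -> bool:
    # --incremental: skip documents whose hash matches the registry.
    return load_registry().get(str(path)) != content_hash(path)

def record_indexed(path: Path) -> None:
    registry = load_registry()
    registry[str(path)] = content_hash(path)
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    REGISTRY.write_text(json.dumps(registry, indent=2))
```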
#### 5. Metadata Validation (vector_store.py:186-225)

All metadata is validated before storage (sketched below):
- Checks for reserved ChromaDB keys (id, embedding, document, etc.)
- Validates that metadata is JSON-serializable
- Warns about large metadata (>10KB)
- Prevents invalid metadata from corrupting the index
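A sketch of such a validation step, mirroring the checks listed above rather than the actual vector_store.py implementation:

```python
import json
import logging

logger = logging.getLogger(__name__)

RESERVED_KEYS = {"id", "embedding", "document"}  # reserved ChromaDB keys

def validate_metadata(metadata: dict) -> dict:
    clashes = RESERVED_KEYS & metadata.keys()
    if clashes:
        raise ValueError(f"Metadata uses reserved ChromaDB keys: {clashes}")
    try:
        serialized = json.dumps(metadata)  # must be JSON-serializable
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Metadata is not JSON-serializable: {exc}") from exc
    if len(serialized) > 10 * 1024:  # warn above 10KB
        logger.warning("Large metadata (%d bytes) may slow queries", len(serialized))
    return metadata
```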
#### 6. Session Lifecycle Management

Session Start (hooks/session-start.py:93-199):
- Performs a comprehensive ChromaDB health check
- Verifies collection accessibility
- Tests query capability with peek() (sketched below)
- Reports vector count and status
- Non-blocking: allows the session to continue even if issues are detected

Session End (hooks/session-end.py:130-169):
- Saves the duplicate detector registry
- Logs the final vector count
- Allows ChromaDB auto-persistence to complete
- Cleans up temporary cache files
- Graceful shutdown prevents data loss
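A minimal health-check sketch in the spirit of the session-start hook, using only documented chromadb calls; the collection name is an assumption:

```python
import chromadb

def chromadb_health_check(persist_dir: str = "data/vectors/chroma_db") -> dict:
    """Non-blocking health probe: report status rather than raising."""
    try:
        client = chromadb.PersistentClient(path=persist_dir)
        collection = client.get_or_create_collection(name="documents")  # name is an assumption
        collection.peek(limit=1)  # verifies the collection is queryable
        return {"ok": True, "vectors": collection.count()}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}
```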
### Best Practices Summary

- Always use upsert() for re-indexing:

  ```python
  # Good: Update existing entries
  vector_store.upsert(embeddings, texts, sources=sources)

  # Bad: Creates duplicates on re-index
  vector_store.add(embeddings, texts, sources=sources)
  ```

- Use source-based operations for document management:

  ```python
  # Check if document is indexed
  existing = vector_store.get_by_source("doc.md")

  # Delete before re-indexing (if not using upsert)
  if existing:
      vector_store.delete_by_source("doc.md")

  # Or use update_by_source for atomic replace
  vector_store.update_by_source("doc.md", new_embeddings, new_texts)
  ```

- Use incremental indexing for large document sets:

  ```bash
  # First time: full index
  rag-index ./docs --recursive

  # Subsequent updates: only changed documents
  rag-index ./docs --recursive --incremental --update
  ```

- Trust automatic persistence:
  - No need to call save() or load()
  - ChromaDB handles persistence automatically
  - Session hooks ensure a clean shutdown
  - Data survives across sessions

- Monitor the duplicate registry:
  - Location: `data/vectors/content_hashes.json`
  - Updated automatically during indexing
  - Cleaned up by the session-end hook
  - Used for incremental indexing decisions
### Testing Persistence

Run the test suite to verify persistence and updates:

```bash
python tests/test_chromadb_persistence.py
```

Tests verify:
- Automatic persistence across operations
- Upsert replaces existing vectors
- Source-based operations work correctly
- Duplicate detection prevents re-processing
- Health checks detect collection issues
### Troubleshooting

Issue: Duplicates in index after re-indexing
- Solution: Use the `--update` flag or upsert() instead of add()

Issue: "Collection does not exist" error
- Solution: Normal on first run - the collection is created automatically
- If persistent: Check persist_directory permissions

Issue: Vectors not persisting across sessions
- Solution: Verify persist_directory exists and is writable
- Check that the session-end hook executed successfully
- Review logs for ChromaDB errors

Issue: Incremental indexing still processes all documents
- Solution: Ensure the duplicate_detector registry exists
- Check whether the document content actually changed
- Verify content hashing is working (see logs)
## Import Guidelines (v2.0)

All imports MUST use the new dual-package structure:

```python
# CORRECT - Core library imports
from rag_cli.core.config import get_config
from rag_cli.core.embeddings import EmbeddingGenerator
from rag_cli.agents.base_agent import BaseAgent
from rag_cli.integrations.tavily_connector import TavilyConnector

# CORRECT - Plugin imports
from rag_cli_plugin.services.service_manager import ServiceManager
from rag_cli_plugin.lifecycle.installer import install_dependencies
from rag_cli_plugin.mcp.unified_server import MCPServer

# INCORRECT - Old v1.x imports (DO NOT USE)
from core.config import get_config
from monitoring.logger import get_logger
from plugin.mcp.unified_server import MCPServer
from src.core.config import get_config
```

The package is installed via pip with both rag_cli and rag_cli_plugin as top-level packages.
## Git Commit Checkpoints
Create commits at these milestones:
- Project structure setup
- Document processor implementation
- Embedding system complete
- Vector store operations
- Retrieval pipeline working
- Claude integration functional
- Monitoring system online
- Tests passing
- Plugin components ready
Use conventional commits:
- `feature:` new functionality
- `fix:` bug fixes
- `refactor:` code improvements
- `test:` test additions
- `docs:` documentation
## Key Technical Decisions
- Local-first: Everything runs locally except Claude API and online research calls
- Lightweight model: all-MiniLM-L6-v2 for speed (0.5s/100 docs)
- ChromaDB for development: Simple, fast, no persistence complexity
- Hybrid retrieval: Better accuracy than pure vector search
- Streaming responses: Improved perceived performance
- TCP monitoring: PowerShell-friendly interface
## Testing Strategy
- Unit tests: Each module in isolation
- Integration tests: Component interactions
- Performance tests: Latency and throughput
- Quality tests: RAGAS metrics (precision, recall, faithfulness)
Target metrics (a latency-check sketch follows the list):
- Vector search: <100ms
- End-to-end: <5 seconds
- Retrieval precision: >0.8
- Faithfulness: >0.7
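A minimal pytest-style check against the <100ms vector-search target; `hybrid_retrieve` is the hypothetical entry point sketched earlier in this file:

```python
import time

def test_vector_search_latency():
    # Warm-up call so model loading does not count against the target.
    hybrid_retrieve("warm up query", top_k=5)

    start = time.perf_counter()
    results = hybrid_retrieve("How to configure API?", top_k=5)
    elapsed_ms = (time.perf_counter() - start) * 1000

    assert results, "expected at least one retrieved chunk"
    assert elapsed_ms < 100, f"search took {elapsed_ms:.1f}ms (target <100ms)"
```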
## Implementation Checklist
- [ ] Create project structure
- [ ] Set up virtual environment
- [ ] Install dependencies
- [ ] Initialize git repository
- [ ] Implement document processor
- [ ] Build embedding system
- [ ] Create ChromaDB vector store
- [ ] Develop retrieval pipeline
- [ ] Integrate Claude API
- [ ] Add monitoring server
- [ ] Create PowerShell interface
- [ ] Write unit tests
- [ ] Add integration tests
- [ ] Create indexing script
- [ ] Build retrieval CLI
- [ ] Setup Claude Code plugin structure
- [ ] Test end-to-end pipeline
- [ ] Document usage examples
## Quick Debugging

```python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Test embeddings
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("test query")
print(f"Embedding shape: {embedding.shape}")  # Should be (384,)

# Test ChromaDB
import chromadb
client = chromadb.PersistentClient(path="./test_chroma")
collection = client.get_or_create_collection(name="test")
collection.add(
    embeddings=[embedding.tolist()],
    documents=["test query"],
    ids=["test1"]
)
results = collection.query(query_embeddings=[embedding.tolist()], n_results=1)
print(f"Search result: {results}")  # Should return the test document
```
## Known Issues and Limitations

### PostToolUse Hook Disabled (Claude Code Framework Bug)

The PostToolUse hook (src/rag_cli_plugin/hooks/response-post.py) is currently disabled due to a JSON parsing bug in the Claude Code plugin framework.

Impact:
- RAG functionality works normally
- Context retrieval and injection is unaffected
- Citations are not automatically added to responses

Workaround:
- The hook is disabled in `.claude-plugin/hooks.json` (line 40: `"enabled": false`)
- The system remains stable and fully functional
- Users can manually request source information if needed

Resolution:
- Waiting for a Claude Code framework update to fix the JSON parsing bug
- Do not re-enable this hook until the framework bug is resolved
- See `KNOWN_ISSUES.md` for detailed information and testing instructions

For Developers:
- Do NOT modify the hook's enabled status without testing
- The hook code is functional in isolation (unit tests pass)
- The issue is specific to the plugin framework's PostToolUse processing
- Alternative citation methods can be explored via the UserPromptSubmit hook
## References

- Full specifications: `RAG-implementation.md`
- Known issues and workarounds: `KNOWN_ISSUES.md`
- Claude Code plugin docs: https://docs.claude.com/en/docs/claude-code/
- ChromaDB documentation: https://docs.trychroma.com/