context-creator CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Overview
context-creator is a high-performance Rust CLI tool that converts entire codebases into LLM-optimized Markdown for consumption by models like Gemini. It processes git repositories by intelligently filtering, prioritizing, and formatting code into a single cohesive document.
Key Commands
Development Commands
# Build and run validations (format check + lint)
make build
# Run tests with validation
make test
# Run in development mode with example
make dev
# Install locally
make install
# Generate documentation
make doc
Quality & CI Commands
# Format code
make fmt
# Check formatting (CI)
make fmt-check
# Lint with clippy
make lint
# Run all validation checks
make validate
# Generate test coverage
make coverage
Testing Commands
# Run all tests
cargo test
# Run specific test
cargo test test_name
# Run tests for all targets (lib, bins, examples, tests, benches)
cargo test --all-targets
# Run benchmarks
cargo bench
Project Architecture
Core Components
- CLI Layer (src/cli.rs)
  - Command-line argument parsing using clap
  - Configuration validation and loading
  - Supports multiple input sources: directories, glob patterns, GitHub repos
- Core Processing (src/core/)
  - walker.rs: Directory traversal with .gitignore support and custom filtering
  - context_builder.rs: Markdown generation with token management
  - prioritizer.rs: File prioritization based on importance and token limits
  - semantic/: Language-aware semantic analysis for imports and dependencies
  - cache.rs: File caching for performance optimization
- Configuration System (src/config.rs)
  - TOML-based configuration with .context-creator.toml
  - Support for custom priorities, token limits, and ignore patterns
  - Hierarchical configuration loading (CLI > config file > defaults)
- Semantic Analysis (src/core/semantic/)
  - Multi-language AST parsing using tree-sitter
  - Import tracing and dependency resolution
  - Language-specific analyzers in languages/ directory
  - Supports 20+ programming languages
Key Data Flow
- Input Processing: CLI args → Config validation → Directory resolution
- File Discovery: Walker scans directories → Applies ignore patterns → Filters by include patterns
- Semantic Analysis: Optional import tracing → Dependency resolution → Enhanced file relationships
- Prioritization: File importance scoring → Token budget allocation → Selection for output
- Output Generation: Markdown formatting → Token counting → Final document assembly
Configuration Files
- .contextignore: Exclude files/directories (gitignore syntax)
- .contextkeep: Prioritize important files
- .context-creator.toml: Advanced configuration (priorities, token limits, defaults)
Token Management
The tool manages its token budget through:
- Automatic token counting using tiktoken-rs
- Prompt token reservation for LLM interactions
- Intelligent file truncation and selection
- Per-LLM token limit configuration
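The budgeting steps above can be sketched as a greedy selection. This is a minimal illustration, not the tool's actual implementation: `Candidate`, `estimate_tokens`, and `select_within_budget` are hypothetical names, and the four-bytes-per-token estimate is an assumption standing in for a real tokenizer such as tiktoken-rs.

```rust
/// Rough token estimate: about four bytes per token, rounded up.
/// Assumption: stands in for a real tokenizer such as tiktoken-rs.
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// A candidate file with an importance score (hypothetical shape).
#[derive(Debug, Clone)]
struct Candidate {
    path: String,
    content: String,
    priority: f32,
}

/// Reserve tokens for the prompt, then greedily keep the highest-priority
/// files that still fit in the remaining budget.
fn select_within_budget(
    mut files: Vec<Candidate>,
    max_tokens: usize,
    prompt_tokens: usize,
) -> Vec<Candidate> {
    // saturating_sub avoids underflow when the prompt alone exceeds the limit.
    let mut remaining = max_tokens.saturating_sub(prompt_tokens);
    // Highest priority first; total_cmp gives a total order over f32.
    files.sort_by(|a, b| b.priority.total_cmp(&a.priority));

    let mut selected = Vec::new();
    for file in files {
        let cost = estimate_tokens(&file.content);
        if cost <= remaining {
            remaining -= cost;
            selected.push(file);
        }
    }
    selected
}
```

A file that does not fit is skipped rather than truncated, so a smaller, lower-priority file can still use the leftover budget. Truncation would be a separate step.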
Semantic Features
- Import Tracing: Follow import chains across files
- Caller Analysis: Find functions that call specific modules
- Type Dependencies: Include type definitions and interfaces
- Multi-language Support: Unified analysis across different languages
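Import tracing can be pictured as a breadth-first walk over an import graph. The sketch below assumes the graph has already been resolved into a map from file to imported files; `trace_imports` and its `max_depth` parameter are hypothetical and only illustrate the idea.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Follow import edges breadth-first from `start`, up to `max_depth` hops.
/// `imports` maps a file to the files it imports (hypothetical, pre-resolved).
fn trace_imports(
    imports: &HashMap<String, Vec<String>>,
    start: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::new();

    seen.insert(start.to_string());
    queue.push_back((start.to_string(), 0usize));

    while let Some((file, depth)) = queue.pop_front() {
        order.push(file.clone());
        if depth == max_depth {
            continue; // do not expand past the depth limit
        }
        for dep in imports.get(&file).into_iter().flatten() {
            // `insert` returns false for files already visited, which also
            // protects against import cycles.
            if seen.insert(dep.clone()) {
                queue.push_back((dep.clone(), depth + 1));
            }
        }
    }
    order
}
```

The returned list starts with `start` itself and contains each reachable file once, in order of discovery.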
Important Implementation Details
Error Handling
- Uses anyhow for error propagation
- Custom error types in src/utils/error.rs
- Comprehensive validation at multiple levels
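As a rough illustration of how a custom error type combines with `?` propagation, the sketch below uses only the standard library: `Box<dyn Error>` takes the place of `anyhow::Error`. `ContextError` and its variants are hypothetical and are not taken from src/utils/error.rs.

```rust
use std::fmt;

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug)]
enum ContextError {
    InvalidConfig(String),
    TokenLimitExceeded { limit: usize, needed: usize },
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContextError::InvalidConfig(msg) => write!(f, "invalid config: {msg}"),
            ContextError::TokenLimitExceeded { limit, needed } => {
                write!(f, "need {needed} tokens but the limit is {limit}")
            }
        }
    }
}

impl std::error::Error for ContextError {}

/// Validate a token limit parsed from user input.
fn parse_limit(raw: &str) -> Result<usize, ContextError> {
    let limit: usize = raw
        .trim()
        .parse()
        .map_err(|_| ContextError::InvalidConfig(format!("not a number: {raw:?}")))?;
    if limit == 0 {
        return Err(ContextError::InvalidConfig("limit must be positive".into()));
    }
    Ok(limit)
}

/// `Box<dyn Error>` plays the role that `anyhow::Error` would play here:
/// `?` converts the concrete error into the boxed form.
fn check_budget(raw_limit: &str, needed: usize) -> Result<usize, Box<dyn std::error::Error>> {
    let limit = parse_limit(raw_limit)?;
    if needed > limit {
        return Err(Box::new(ContextError::TokenLimitExceeded { limit, needed }));
    }
    Ok(limit - needed)
}
```

Validation happens at two levels here: `parse_limit` rejects malformed input, and `check_budget` rejects a well-formed limit that is too small.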
Performance Optimizations
- Parallel processing with rayon
- File caching to avoid re-reading
- Tree-sitter parser pooling
- LRU caching for AST parsing
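The file-caching idea can be sketched with a map from path to shared content. `FileCache` below is a hypothetical, single-threaded stand-in with no eviction; it does not reflect the structure of cache.rs, and an LRU policy or thread-safe wrapper would be layered on top.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Minimal content cache keyed by path (hypothetical; no eviction).
struct FileCache {
    entries: HashMap<String, Arc<str>>,
    /// Number of times the loader actually ran, useful for checking hits.
    loads: usize,
}

impl FileCache {
    fn new() -> Self {
        FileCache { entries: HashMap::new(), loads: 0 }
    }

    /// Return cached content, calling `load` only on the first request.
    fn get_or_load<F>(&mut self, path: &str, load: F) -> std::io::Result<Arc<str>>
    where
        F: FnOnce(&str) -> std::io::Result<String>,
    {
        if let Some(hit) = self.entries.get(path) {
            // Cloning an Arc copies a pointer, not the file content.
            return Ok(Arc::clone(hit));
        }
        let content: Arc<str> = Arc::from(load(path)?);
        self.loads += 1;
        self.entries.insert(path.to_string(), Arc::clone(&content));
        Ok(content)
    }
}
```

In real use the loader would typically be `|p| std::fs::read_to_string(p)`; passing it as a closure keeps the sketch testable without touching the file system.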
Testing Strategy
- Unit tests for core functionality
- Integration tests for CLI combinations
- Performance benchmarks
- Semantic analysis stress tests
- Security vulnerability tests
Development Notes
Adding New Languages
Language support is added in src/core/semantic/languages/:
- Implement language-specific analyzer
- Add tree-sitter grammar dependency
- Update language registry in mod.rs
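The three steps above might look roughly like the following. The trait, the `Registry` type, and the line-scanning analyzer are all hypothetical: the real analyzers parse an AST with tree-sitter, and the actual trait in src/core/semantic/ may differ.

```rust
use std::collections::HashMap;

/// Hypothetical analyzer interface; the real trait may look different.
trait LanguageAnalyzer {
    fn name(&self) -> &'static str;
    /// Return the module paths imported by `source`.
    fn imports(&self, source: &str) -> Vec<String>;
}

/// Toy analyzer that scans for `use ...;` lines instead of parsing an AST.
struct RustAnalyzer;

impl LanguageAnalyzer for RustAnalyzer {
    fn name(&self) -> &'static str {
        "rust"
    }

    fn imports(&self, source: &str) -> Vec<String> {
        source
            .lines()
            .filter_map(|line| line.trim().strip_prefix("use "))
            .map(|rest| rest.trim_end_matches(';').trim().to_string())
            .collect()
    }
}

/// Registry mapping file extensions to analyzers.
struct Registry {
    by_extension: HashMap<&'static str, Box<dyn LanguageAnalyzer>>,
}

impl Registry {
    fn new() -> Self {
        let mut by_extension: HashMap<&'static str, Box<dyn LanguageAnalyzer>> = HashMap::new();
        // Registering a new language means adding one entry here.
        by_extension.insert("rs", Box::new(RustAnalyzer));
        Registry { by_extension }
    }

    /// Look up the analyzer for a path by its extension, if one is registered.
    fn for_path(&self, path: &str) -> Option<&dyn LanguageAnalyzer> {
        let ext = path.rsplit_once('.')?.1;
        self.by_extension.get(ext).map(|boxed| boxed.as_ref())
    }
}
```

Keeping the registry as the single place that knows about extensions means a new language touches one analyzer file plus one registry line.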
Configuration Precedence
- Explicit CLI arguments (highest priority)
- Config file token limits (for prompts)
- Config file defaults
- Hard-coded defaults (lowest priority)
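The precedence order above maps naturally onto `Option` chaining, where `None` means "not set at this level". The function name, parameters, and default value below are assumptions for illustration, not the tool's real settings.

```rust
/// Hard-coded fallback (illustrative value, not the tool's real default).
const DEFAULT_MAX_TOKENS: usize = 100_000;

/// Resolve a token limit by walking the levels from highest to lowest
/// priority. Each argument is `None` when that level does not set a value.
fn resolve_max_tokens(
    cli: Option<usize>,
    config_token_limit: Option<usize>,
    config_default: Option<usize>,
) -> usize {
    cli.or(config_token_limit)
        .or(config_default)
        .unwrap_or(DEFAULT_MAX_TOKENS)
}
```

Because `Option::or` keeps the first `Some`, a lower level is consulted only when every higher level is unset.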
File Prioritization
Base file type priorities can be customized in the config file using glob patterns. Patterns are evaluated in order, and the first pattern that matches a file determines its priority ("first-match-wins").
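A minimal sketch of first-match-wins evaluation follows. `PriorityRule`, `priority_for`, and the weights are hypothetical, and `wildcard_match` is a simplified stand-in for real glob matching: it supports only `*` and `?`, and its `*` also matches `/`.

```rust
/// Simplified wildcard match: `*` matches any run of bytes (including `/`),
/// `?` matches exactly one byte. Real glob syntax has more rules.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it currently covers.
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == b'?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more byte and retry.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` can match the empty string.
    p[pi..].iter().all(|&b| b == b'*')
}

/// One configured rule: a pattern and the weight it assigns (hypothetical).
struct PriorityRule {
    pattern: &'static str,
    weight: f32,
}

/// Return the weight of the first rule whose pattern matches `path`,
/// or `default_weight` when no rule matches.
fn priority_for(path: &str, rules: &[PriorityRule], default_weight: f32) -> f32 {
    rules
        .iter()
        .find(|rule| wildcard_match(rule.pattern, path))
        .map(|rule| rule.weight)
        .unwrap_or(default_weight)
}
```

Rule order matters under this scheme: a specific pattern such as `src/core/*` has to be listed before a broader one such as `src/*`, or the broader rule will shadow it.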