CCProxy CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project: CCProxy – OpenAI-compatible proxy for Anthropic Messages API

Common commands

  • Install deps: uv pip install -r requirements.txt (includes asyncer for async operations and aiofiles for async file I/O)
  • Run dev (uvicorn): python main.py
  • Run via script (env checks): ./run-ccproxy.sh
  • Docker build/run (compose): ./docker-compose-run.sh up -d
  • Docker logs: ./docker-compose-run.sh logs -f
  • Lint (ruff check): ./start-lint.sh --check
  • Lint fix + format: ./start-lint.sh --all or ./start-lint.sh --fix
  • Typecheck: mypy . (strict mode enabled)
  • Tests (all): ./run-tests.sh or uv run pytest -q
  • Tests with coverage: ./run-tests.sh --coverage
  • Single test file: uv run pytest -q test_optimized_client.py
  • Single test by node: uv run pytest -q test_optimized_client.py::test_name

Environment configuration

Required (via .env or environment)

  • OPENAI_API_KEY or OPENROUTER_API_KEY
  • BIG_MODEL_NAME
  • SMALL_MODEL_NAME

Optional

  • OPENAI_BASE_URL (default https://api.openai.com/v1)
  • HOST (default 127.0.0.1)
  • PORT (default 11434)
  • LOG_LEVEL (default INFO)
  • LOG_FILE_PATH (default log.jsonl)
  • ERROR_LOG_FILE_PATH (default error.jsonl)
  • WEB_CONCURRENCY (for multi-worker Uvicorn deployments)

Thread Pool Configuration (all optional)

  • THREAD_POOL_MAX_WORKERS (default None - auto-calculates based on CPU cores, max 40)
  • THREAD_POOL_HIGH_CPU_THRESHOLD (default None - auto-calculates based on CPU count: 60% + 2.5% per core, max 90%)
  • THREAD_POOL_AUTO_SCALE (default False - enable dynamic scaling based on CPU contention)

Cache Warmup (all optional)

  • CACHE_WARMUP_ENABLED (default False)
  • CACHE_WARMUP_FILE_PATH (default cache_warmup.json)
  • CACHE_WARMUP_MAX_ITEMS (default 100)
  • CACHE_WARMUP_ON_STARTUP (default True)
  • CACHE_WARMUP_PRELOAD_COMMON (default True)
  • CACHE_WARMUP_AUTO_SAVE_POPULAR (default True)
  • CACHE_WARMUP_POPULARITY_THRESHOLD (default 3)
  • CACHE_WARMUP_SAVE_INTERVAL_SECONDS (default 3600)

Cython Optimization (all optional)

  • CCPROXY_ENABLE_CYTHON (default True - enable Cython-compiled modules for 15-35% performance improvement)
  • CCPROXY_BUILD_CYTHON (default True - build Cython extensions during installation)

Scripts create .env.example and validate the environment where helpful.
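
For reference, a minimal sketch of how these variables map onto Pydantic Settings; the field names below are illustrative assumptions, and the real class lives in ccproxy/config.py:

```python
# Hypothetical field names for illustration; see ccproxy/config.py for the
# actual Settings class and its validation rules.
from pydantic_settings import BaseSettings

class ExampleSettings(BaseSettings):
    openai_api_key: str                                # OPENAI_API_KEY
    big_model_name: str                                # BIG_MODEL_NAME
    small_model_name: str                              # SMALL_MODEL_NAME
    openai_base_url: str = "https://api.openai.com/v1"
    host: str = "127.0.0.1"
    port: int = 11434
    log_level: str = "INFO"

settings = ExampleSettings()  # values load from the environment or .env
```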

Run options

  • Local dev: python main.py (FastAPI with uvicorn; auto-reload per Settings.reload)
  • Production: ./run-ccproxy.sh (Uvicorn with multi-worker support; workers = CPU × 2 + 1)
  • Docker: docker build -t ccproxy:latest -f Dockerfile .; docker-compose up -d

Health/metrics

  • Health: GET / (root) returns {status: ok}
  • Metrics: GET /v1/metrics; cache stats: GET /v1/cache/stats; clear caches: POST /v1/cache/clear
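
A quick smoke test of these endpoints, assuming the default host and port:

```python
# Exercises the health and metrics endpoints listed above; adjust the base
# URL if HOST/PORT are overridden.
import httpx

with httpx.Client(base_url="http://127.0.0.1:11434") as client:
    print(client.get("/").json())                # {"status": "ok"}
    print(client.get("/v1/metrics").json())      # performance metrics
    print(client.get("/v1/cache/stats").json())  # cache statistics
```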

Big-picture architecture (Hexagonal/Clean Architecture)

Domain Layer (ccproxy/domain/)

  • Domain models and core business logic
  • ccproxy/domain/models.py: Core domain entities and data structures
  • ccproxy/domain/exceptions.py: Domain-specific exceptions and error handling

Application Layer (ccproxy/application/)

  • Use cases and application services
  • ccproxy/application/converters.py: Message format conversion between Anthropic and OpenAI (exports async converters)
  • ccproxy/application/converters_module/: Modular converter implementations with specialized processors
    • async_converter.py: AsyncMessageConverter and AsyncResponseConverter for parallel processing
    • Uses the Asyncer library for improved async operations (asyncify for CPU-bound work, anyio.create_task_group for parallel execution)
    • Optimized for high-throughput with parallel message and tool call processing
  • ccproxy/application/tokenizer.py: Async-aware token counting with a TTL cache (300s expiry); runs tiktoken encoding in parallel via anyio.create_task_group with asyncified operations; count_tokens_for_openai_request provides precise token counts for OpenAI-format requests
  • ccproxy/application/model_selection.py: Model mapping (opus/sonnet→BIG, haiku→SMALL)
  • ccproxy/application/request_validator.py: LRU cache (10,000 capacity) with cryptographic hashing
  • ccproxy/application/response_cache.py: Response caching abstraction (delegates to cache implementations)
  • ccproxy/application/cache/: Advanced caching with circuit breaker pattern, memory management, streaming de-duplication
    • warmup.py: CacheWarmupManager for preloading popular requests and common prompts; uses anyio.Path for async file operations and parallel warmup item loading
  • ccproxy/application/error_tracker.py: Comprehensive error tracking and monitoring system with async JSON serialization and parallel redaction processing using asyncer
  • ccproxy/application/thread_pool.py: Intelligent thread pool management for CPU-bound operations
    • Auto-detects multi-worker deployment via WEB_CONCURRENCY and adjusts accordingly
    • Prevents resource exhaustion: reduces threads per worker in multi-worker mode
    • Target total threads = CPU_count × 5 (distributed across workers)
    • Single worker: up to 40 threads; multi-worker: 4-20 threads per worker (see the sizing sketch after this list)
  • ccproxy/application/type_utils.py: Type utilities and helper functions (uses Cython optimizations for type checking)
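
The thread pool sizing rules above can be reconstructed as follows; this is an illustrative sketch, not the actual code in thread_pool.py:

```python
# Illustrative reconstruction of the documented sizing rules; the real
# logic in ccproxy/application/thread_pool.py may differ in detail.
import os

def threads_per_worker(cpu_count: int, web_concurrency: int = 1) -> int:
    target_total = cpu_count * 5          # target threads across all workers
    if web_concurrency <= 1:
        return min(target_total, 40)      # single worker: cap at 40
    per_worker = target_total // web_concurrency
    return max(4, min(per_worker, 20))    # multi-worker: clamp to 4-20

workers = int(os.environ.get("WEB_CONCURRENCY", "1"))
print(threads_per_worker(os.cpu_count() or 1, workers))
```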

Infrastructure Layer (ccproxy/infrastructure/)

  • External service integrations and infrastructure concerns
  • ccproxy/infrastructure/providers/: Provider implementations for external services
    • base.py: ChatProvider protocol definition
    • openai_provider.py: High-performance HTTP/2 client with connection pooling (500 connections, 120s keepalive); includes a circuit breaker (failure threshold 5, recovery 60s), comprehensive metrics (latency percentiles, health scoring), error tracking, adaptive timeouts, precise token estimation for rate limiting via tiktoken (through tokenizer.py), and request correlation IDs for resilience and monitoring
    • rate_limiter.py: Client-side adaptive rate limiter using a 1-minute sliding window; supports RPM/TPM limits, auto-start, 429 backoff (80% rate reduction), and success recovery (10% increase after 10 successes); uses asyncified list operations for non-blocking cleanup of request history and integrates with openai_provider, relying on count_tokens_for_openai_request for accurate TPM accounting (see the sketch below)
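
A minimal synchronous sketch of the sliding-window behavior described above; the real rate_limiter.py is async and also tracks TPM:

```python
# Minimal sketch: 1-minute sliding-window RPM limiting with 429 backoff
# (80% reduction) and gradual recovery (10% after 10 successes).
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, rpm_limit: int) -> None:
        self.base_rpm = rpm_limit
        self.current_rpm = float(rpm_limit)
        self.window: deque[float] = deque()
        self.successes = 0

    def try_acquire(self) -> bool:
        now = time.monotonic()
        while self.window and now - self.window[0] > 60.0:
            self.window.popleft()          # drop entries outside the window
        if len(self.window) >= int(self.current_rpm):
            return False
        self.window.append(now)
        return True

    def on_429(self) -> None:
        self.current_rpm = max(1.0, self.current_rpm * 0.2)  # cut rate by 80%
        self.successes = 0

    def on_success(self) -> None:
        self.successes += 1
        if self.successes >= 10:           # recover 10% after 10 successes
            self.current_rpm = min(float(self.base_rpm), self.current_rpm * 1.1)
            self.successes = 0
```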

Interface Layer (ccproxy/interfaces/)

  • External interfaces and delivery mechanisms
  • ccproxy/interfaces/http/: HTTP/REST API interface
    • app.py: FastAPI application factory and dependency injection
    • routes/: HTTP route handlers and controllers
    • streaming.py: SSE streaming for real-time responses (see the SSE sketch after this list)
    • errors.py: HTTP error handling and response formatting
    • middleware.py: Request/response middleware chain
    • guardrails.py: Input validation and security guards
    • http_status.py: HTTP status code utilities
    • upstream_limits.py: Upstream service rate limiting
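
The SSE pattern looks roughly like this generic FastAPI sketch; the event names follow the Anthropic Messages API, but the project's actual logic lives in streaming.py:

```python
# Generic SSE pattern for illustration only; see
# ccproxy/interfaces/http/streaming.py for the real implementation.
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def event_stream() -> AsyncIterator[str]:
    # Each SSE frame is "event: <name>\ndata: <json>\n\n"
    yield 'event: message_start\ndata: {"type": "message_start"}\n\n'
    yield 'event: message_stop\ndata: {"type": "message_stop"}\n\n'

@app.get("/stream-demo")
async def stream_demo() -> StreamingResponse:
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```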

Cython Optimization Layer (ccproxy/_cython/)

  • High-performance Cython-compiled modules for CPU-bound operations (15-35% performance improvement)
  • ccproxy/_cython/type_checks.pyx: Optimized type checking and dispatch (30-50% improvement) - integrated
  • ccproxy/_cython/lru_ops.pyx: LRU cache operations (20-40% improvement) - integrated
  • ccproxy/_cython/cache_keys.pyx: Cache key generation (15-25% improvement) - integrated
  • ccproxy/_cython/json_ops.pyx: JSON operations (10.7x faster for size estimation) - integrated
  • ccproxy/_cython/string_ops.pyx: String and pattern matching (40-50% improvement) - integrated
  • ccproxy/_cython/serialization.pyx: Content serialization (25-35% improvement) - integrated
  • ccproxy/_cython/stream_state.pyx: SSE event formatting (20-30% improvement) - integrated
  • ccproxy/_cython/dict_ops.pyx: Dictionary operations (7.83x faster for nested key counting) - integrated
  • ccproxy/_cython/validation.pyx: Validation operations (30-40% improvement) - integrated
  • See CYTHON_INTEGRATION.md for detailed documentation and benchmarks
  • Automatic fallback to pure Python if Cython is unavailable or disabled (pattern sketched below)
  • Control via CCPROXY_ENABLE_CYTHON environment variable (default: enabled)
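
The fallback typically follows an import-with-fallback pattern like this; the module and function names below are hypothetical, see CYTHON_INTEGRATION.md for the actual wiring:

```python
# Hypothetical names for illustration; the real modules live under
# ccproxy/_cython/ with pure-Python counterparts elsewhere in the package.
import os

_use_cython = os.environ.get("CCPROXY_ENABLE_CYTHON", "true").lower() != "false"

if _use_cython:
    try:
        from ccproxy._cython.lru_ops import lru_get   # compiled fast path
    except ImportError:                               # extension not built
        _use_cython = False

if not _use_cython:
    from ccproxy.application.lru_ops_fallback import lru_get  # pure Python
```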

Cross-cutting Concerns

  • ccproxy/config.py: Pydantic Settings with environment validation
  • ccproxy/logging.py: Structured JSON logging with request tracing (see the sketch after this list)
  • ccproxy/monitoring.py: Performance metrics and health monitoring
  • ccproxy/constants.py: Global constants and configuration (includes reasoning effort model support)
  • ccproxy/enums.py: Enumeration types used across layers
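
The structured-JSON pattern in general form, for reference; ccproxy/logging.py defines its own event schema (LogEvent), so this is illustrative only:

```python
# Generic structured-JSON logging sketch; the project's logger has its own
# event schema and request-tracing fields.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "event": record.getMessage(),
            "request_id": getattr(record, "request_id", None),  # tracing field
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("ccproxy-demo").addHandler(handler)
```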

Entry Points

  • main.py: Development server (uvicorn with auto-reload)
  • wsgi.py: Production ASGI application for Uvicorn
  • App factory: ccproxy/interfaces/http/app.py:create_app(Settings) provides dependency injection (usage sketched below)
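
Typical usage of the factory, assuming Settings exposes host and port fields (an assumption; check ccproxy/config.py):

```python
# Sketch of wiring the documented factory to uvicorn; the host/port field
# names on Settings are assumptions.
import uvicorn

from ccproxy.config import Settings
from ccproxy.interfaces.http.app import create_app

settings = Settings()        # validated from the environment / .env
app = create_app(settings)

if __name__ == "__main__":
    uvicorn.run(app, host=settings.host, port=settings.port)
```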

Development notes for Claude Code

  • Always construct the FastAPI app through create_app(Settings); do not import globals directly
  • Thread pool automatically adjusts for multi-worker deployment to prevent resource exhaustion
  • Follow hexagonal architecture principles: domain models should not depend on external concerns
  • Application layer orchestrates use cases; infrastructure layer handles external integrations
  • When adding parameters, ensure OpenAI parity: warn on or omit unsupported fields; map tool_choice carefully (see the mapping sketch after this list)
  • For non-stream requests, use application/cache layer to avoid duplicate upstream calls
  • Use async converters (convert_messages_async, convert_response_async) for better performance
  • Cache warmup runs on startup when enabled, preloading common prompts and popular requests
  • Preserve UTF‑8 throughout; never assume ASCII; rely on provider handlers converting decode errors to APIError
  • Follow existing logging events (LogEvent) and avoid logging secrets; Settings controls log file path
  • Use dependency injection through the app factory for testability and loose coupling
  • Error tracking is centralized in application/error_tracker.py for comprehensive monitoring
  • Reasoning support: Implement provider-specific reasoning configurations (OpenRouter vs standard) based on base_url detection
  • Cython optimizations: Enabled by default for 15-35% performance improvement; use CCPROXY_ENABLE_CYTHON=false to disable
  • When integrating Cython modules, always provide pure Python fallback for compatibility
  • Run benchmarks to verify Cython performance gains: pytest benchmarks/ --benchmark-only
  • Run tests with uv: ./run-tests.sh or uv run pytest
  • Always run linting after changes: ./start-lint.sh --check
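
One way to map tool_choice between the two APIs, based on the public Anthropic and OpenAI schemas; the project's real conversion lives in the converters module and may differ:

```python
# Illustrative Anthropic -> OpenAI tool_choice mapping; see
# ccproxy/application/converters_module/ for the project's actual logic.
from typing import Any

def map_tool_choice(anthropic: dict[str, Any]) -> str | dict[str, Any]:
    kind = anthropic.get("type")
    if kind == "auto":
        return "auto"          # model decides whether to call a tool
    if kind == "any":
        return "required"      # model must call some tool
    if kind == "tool":         # force a specific tool by name
        return {"type": "function", "function": {"name": anthropic["name"]}}
    raise ValueError(f"unsupported tool_choice: {anthropic!r}")
```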

Testing

  • Pytest is configured via pyproject.toml (pythonpath and testpaths); tests live in tests/ (test_*.py)
  • For async tests, use pytest-anyio (migrated from pytest-asyncio); respx is available for httpx mocking (see the sketch after this list)
  • Test runner script: ./run-tests.sh (supports parallel execution, coverage, watch mode)
  • Comprehensive test coverage: 120+ test cases across 27 test files covering error_tracker, converters, cache, routes, async components, rate_limiter, thread_pool, cache_warmup, guardrails, streaming, and more
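
A sketch of the async test style, using pytest-anyio for the event loop and respx to mock upstream httpx calls; the route and payload here are illustrative:

```python
# Illustrative async test with a mocked upstream; the URL and payloads are
# made up for the example.
import httpx
import pytest
import respx

@pytest.fixture
def anyio_backend() -> str:
    return "asyncio"

@pytest.mark.anyio
@respx.mock
async def test_upstream_is_mocked() -> None:
    respx.post("https://api.openai.com/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"choices": []})
    )
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.openai.com/v1/chat/completions", json={}
        )
    assert resp.status_code == 200
```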

CI/CD and tooling

  • GitHub Actions workflows in .github/workflows/
    • ci.yml: Comprehensive CI pipeline (lint, test with/without Cython, benchmarks, Docker)
    • performance.yml: Performance regression detection on PRs
    • See .github/README.md for workflow documentation
  • Ruff and mypy configured in pyproject.toml (strict type checking enabled)
  • Mypy strict mode: disallow_untyped_defs=true, warn_return_any=true, strict_optional=true
  • Dockerfile includes production (Debian) and Alpine targets; docker-compose.yml wires healthcheck and volumes
  • start-lint.sh provides lint workflow; docker-compose-run.sh wraps common compose actions
  • scripts/test-cython-build.sh: Local verification of Cython build and fallback behavior
  • scripts/verify-cython-status.sh: Check Cython module availability and integration status

Important Instruction Reminders

  • Do what has been asked; nothing more, nothing less.
  • NEVER create files unless they're absolutely necessary for achieving your goal.
  • ALWAYS prefer editing an existing file to creating a new one.
  • NEVER proactively create documentation files (*.md) or README files. Only create documentation files if explicitly requested by the User.