VoiceBridge CLAUDE.md
This file provides guidance to Claude Code when working with the VoiceBridge project.
Project Overview
VoiceBridge is a bidirectional voice-text CLI tool. It uses OpenAI's Whisper for speech recognition and VibeVoice for text-to-speech synthesis.
Core Features
- Speech-to-Text (STT): Real-time transcription, file processing, batch operations
- Text-to-Speech (TTS): High-quality voice synthesis with custom voices
- GPU acceleration (CUDA/Metal) with automatic device detection
- Memory optimization and streaming for large audio files
- Resume capability for interrupted transcriptions
- Performance monitoring and session management
- Audio processing: noise reduction, normalization, splitting, enhancement
- Export formats: JSON, SRT, VTT, plain text, CSV
- Hotkey support for hands-free operation
- Hexagonal architecture with ports and adapters pattern
Development Setup
Virtual Environment & Dependencies
IMPORTANT: This project uses uv for fast Python package management and a virtual environment at .venv/. Always use the Makefile commands or uv run for operations.
# Initialize environment and install all dependencies
make prepare
# CUDA support (for GPU acceleration)
make prepare-cuda
# System tray support (optional)
make prepare-tray
# Manual uv setup (if needed)
uv venv .venv
uv pip install --editable ".[dev]"
Key Commands
make help # Show all available commands
make prepare # Initialize .venv with uv and install dependencies
make prepare-cuda # Initialize .venv with CUDA support
make prepare-tray # Initialize .venv with system tray support
make lint # Run ruff linting and auto-fix issues
make test # Run all tests with coverage report
make test-fast # Run tests without coverage
make clean # Clean up cache files and .venv
Manual Commands (using uv directly)
# Always use uv run for any manual commands
uv run ruff check --fix . # Linting
uv run pytest # Testing
uv run python -m voicebridge --help # Run CLI
uv pip install package-name # Install packages
Architecture
Directory Structure
voicebridge/
├── domain/ # Core business logic and models
│ └── models.py # Data models (WhisperConfig, TTSConfig, GPUInfo, etc.)
├── ports/ # Interfaces/abstract base classes
│ └── interfaces.py
├── adapters/ # External integrations
│ ├── audio/ # Audio recording, playback, processing
│ ├── system.py # GPU detection, memory monitoring
│ ├── transcription.py # Whisper service implementation
│ ├── vibevoice_tts.py # VibeVoice TTS implementation
│ ├── session.py # Session persistence
│ └── config.py # Configuration management
├── services/ # Application services
│ ├── transcription_service.py # STT orchestration
│ ├── tts_service.py # TTS orchestration and daemon
│ ├── performance_service.py # Performance monitoring
│ ├── batch_service.py # Batch processing
│ ├── export_service.py # Export functionality
│ ├── confidence_service.py # Quality analysis
│ └── resume_service.py # Resume functionality
├── cli/ # Command line interface
│ ├── app.py # Main CLI app with Typer
│ └── commands.py # Command implementations
└── tests/ # Test suite
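The layering above can be illustrated with a minimal sketch of the ports and adapters pattern. All class names here (TranscriptionResult, TranscriptionPort, FakeTranscriptionAdapter, TranscriptionOrchestrator) are hypothetical and are not taken from the VoiceBridge codebase; the point is only that a service depends on an interface from ports/, while the concrete implementation lives in adapters/.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    """Hypothetical domain model (real models live in domain/models.py)."""

    text: str
    confidence: float


class TranscriptionPort(ABC):
    """Hypothetical port: the interface the service layer depends on."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> TranscriptionResult:
        """Turn raw audio bytes into a transcription result."""


class FakeTranscriptionAdapter(TranscriptionPort):
    """Stand-in adapter for tests; a real adapter would call Whisper."""

    def transcribe(self, audio: bytes) -> TranscriptionResult:
        return TranscriptionResult(text=f"{len(audio)} bytes of audio", confidence=1.0)


class TranscriptionOrchestrator:
    """Service that knows only the port, never a concrete adapter."""

    def __init__(self, transcriber: TranscriptionPort) -> None:
        self._transcriber = transcriber

    def run(self, audio: bytes) -> str:
        return self._transcriber.transcribe(audio).text


service = TranscriptionOrchestrator(FakeTranscriptionAdapter())
print(service.run(b"\x00" * 4))  # prints "4 bytes of audio"
```

Because the service receives its adapter through the constructor, tests can swap in a fake without loading a model or touching audio hardware.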
Key Features Implemented
Speech-to-Text (STT)
- Real-time Transcription: Hotkey-driven live speech recognition
- File Processing: Support for MP3, WAV, M4A, FLAC, OGG formats
- Batch Processing: Directory-wide transcription with parallel workers
- GPU Acceleration: Automatic detection and selection of CUDA/Metal devices
- Memory Optimization: Chunked processing and memory limit enforcement
- Resume Capability: Session persistence for interrupted transcriptions
- Export Formats: JSON, SRT, VTT, plain text, CSV output
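As a rough illustration of the automatic device selection mentioned above, the sketch below picks CUDA first, then Metal (MPS), then CPU. It assumes PyTorch is the backend, which this file does not state; the function name pick_device is hypothetical, and the real logic in adapters/system.py may differ.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda", "mps", or "cpu" depending on what is available."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # assumption: PyTorch is installed; fall back if not
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```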
Text-to-Speech (TTS)
- VibeVoice Integration: High-quality neural voice synthesis
- Multiple Input Modes: Clipboard monitoring, text selection, direct input
- Custom Voices: Voice sample detection and management
- Streaming/Non-streaming: Real-time or complete generation modes
- Hotkey Controls: Global shortcuts for hands-free operation
- Audio Output: Play immediately, save to file, or both
Advanced Processing
- Audio Enhancement: Noise reduction, normalization, silence trimming
- Audio Splitting: Duration, silence, or size-based segmentation
- Confidence Analysis: Quality assessment and review flagging
- Performance Monitoring: Comprehensive metrics collection and reporting
- Session Management: Progress tracking and resume functionality
- Profile Management: Multiple configuration profiles
- Webhook Integration: External notification support
Code Standards
- Architecture: Hexagonal/Ports & Adapters pattern
- Python Version: 3.10+
- Linting: ruff with auto-fix
- Testing: pytest with coverage reporting
- Type Hints: Required for all public interfaces
CLI Usage
Speech-to-Text Commands
# Real-time transcription with hotkeys
uv run python -m voicebridge listen
uv run python -m voicebridge hotkey --key f9 --mode toggle
# File transcription
uv run python -m voicebridge transcribe audio.mp3 --output transcript.txt
uv run python -m voicebridge batch-transcribe /path/to/audio/ --workers 4
# Resumable transcription for long files
uv run python -m voicebridge listen-resumable audio.wav --session-name "my-session"
# Real-time streaming transcription
uv run python -m voicebridge realtime --chunk-duration 2.0 --output-format live
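The --workers option of batch-transcribe suggests a worker pool. A minimal sketch of that idea follows, assuming a thread pool and a recursive directory scan; both are assumptions, and the helper names (find_audio_files, batch_transcribe, transcribe_one) are hypothetical rather than VoiceBridge APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# File types listed under "File Processing" in this document.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(root: Path) -> list[Path]:
    """Collect supported audio files under root (recursive by assumption)."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def batch_transcribe(
    files: list[Path],
    transcribe_one: Callable[[Path], str],
    workers: int = 4,
) -> dict[Path, str]:
    """Run transcribe_one over files with a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each file with its text.
        return dict(zip(files, pool.map(transcribe_one, files)))


demo = batch_transcribe([Path("a.mp3"), Path("b.wav")], lambda p: p.stem.upper(), workers=2)
print(demo)
```

A thread pool is chosen here only for brevity; a process pool or a queue of GPU jobs may suit CPU-bound or GPU-bound transcription better.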
Text-to-Speech Commands
# Generate speech from text
uv run python -m voicebridge tts generate "Hello, this is VoiceBridge!"
uv run python -m voicebridge tts generate "Text here" --voice en-Alice_woman --output audio.wav
# Clipboard and selection monitoring
uv run python -m voicebridge tts listen-clipboard --streaming
uv run python -m voicebridge tts listen-selection
# TTS daemon mode
uv run python -m voicebridge tts daemon start --mode clipboard
uv run python -m voicebridge tts daemon status
uv run python -m voicebridge tts daemon stop
# Voice management
uv run python -m voicebridge tts voices
uv run python -m voicebridge tts config show
Audio Processing Commands
# Audio information and formats
uv run python -m voicebridge audio info audio.mp3
uv run python -m voicebridge audio formats
# Audio enhancement and splitting
uv run python -m voicebridge audio preprocess input.wav output.wav --noise-reduction 0.8
uv run python -m voicebridge audio split large_file.mp3 --method duration --chunk-duration 300
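For the duration method, the chunk boundaries can be computed without touching the audio data. The sketch below shows one way to do that; split_by_duration is a hypothetical helper, not the project's implementation, and it only returns (start, end) offsets in seconds.

```python
def split_by_duration(
    total_seconds: float, chunk_seconds: float = 300.0
) -> list[tuple[float, float]]:
    """Return (start, end) offsets covering the file in fixed-size chunks."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    total = float(total_seconds)
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        # The last chunk is shortened so it never runs past the end of the file.
        end = min(start + chunk_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds


print(split_by_duration(700))
# [(0.0, 300.0), (300.0, 600.0), (600.0, 700.0)]
```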
System and Performance Commands
# GPU status and benchmarking
uv run python -m voicebridge gpu status
uv run python -m voicebridge gpu benchmark --model base
# Performance monitoring
uv run python -m voicebridge performance stats
# Session management
uv run python -m voicebridge sessions list
uv run python -m voicebridge sessions resume --session-id <id>
uv run python -m voicebridge sessions cleanup
Export and Analysis Commands
# Export transcriptions
uv run python -m voicebridge export session <session-id> --format srt
uv run python -m voicebridge export batch --format json --output-dir results/
# Confidence analysis
uv run python -m voicebridge confidence analyze <session-id> --detailed
uv run python -m voicebridge confidence analyze-all --threshold 0.7
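To make the srt export format concrete, here is a small sketch that renders timed segments as SubRip text. The (start, end, text) tuple layout and both function names are assumptions for illustration; the project's export_service.py may structure this differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(srt_timestamp(3661.5))  # 01:01:01,500
print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]))
```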
Configuration Commands
# General configuration
uv run python -m voicebridge config --show
uv run python -m voicebridge config --set-key use_gpu --value true
# Profile management
uv run python -m voicebridge profile save my-profile
uv run python -m voicebridge profile load my-profile
uv run python -m voicebridge profile list
# TTS configuration
uv run python -m voicebridge tts config set --default-voice en-Alice_woman
uv run python -m voicebridge tts config set --cfg-scale 1.5
Development Workflow
- Setup: make prepare
- Development: Edit code, use make lint frequently
- Testing: make test or make test-fast
- Before Commit: Ensure make lint and make test both pass
Important Notes
Development
- Always use uv run or .venv/bin/python for Python commands
- Use make lint to auto-fix most style issues
- Use make test to run full test suite with coverage
- Follow hexagonal architecture patterns for new features
- Add comprehensive tests for both STT and TTS functionality
System Requirements
- Python 3.10+ for modern type hints and async support
- FFmpeg for audio processing and format conversion
- GPU support: CUDA (NVIDIA) and Metal (Apple Silicon) detection
- Audio dependencies: pygame, pyaudio for playback
- Input handling: pyperclip, pynput for clipboard and hotkeys
Configuration
- Main config stored in ~/.config/voicebridge/
- Session files stored in local sessions/ directory
- TTS voice samples in demo/voices/ or configured directory
- Performance metrics kept in memory (last 1000 operations)
- Profile-based configuration for different use cases
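Keeping only the last 1000 operations in memory maps naturally onto a bounded deque. The sketch below shows that idea; MetricsBuffer and OperationMetric are hypothetical names, and the actual performance_service.py may track different fields.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


@dataclass
class OperationMetric:
    """Hypothetical record of one timed operation."""

    name: str
    duration_seconds: float


class MetricsBuffer:
    """Keep only the most recent max_operations metrics in memory."""

    def __init__(self, max_operations: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._items: deque[OperationMetric] = deque(maxlen=max_operations)

    def record(self, name: str, duration_seconds: float) -> None:
        self._items.append(OperationMetric(name, duration_seconds))

    def __len__(self) -> int:
        return len(self._items)

    def average_duration(self) -> float:
        """Mean duration over the retained operations, or 0.0 if empty."""
        if not self._items:
            return 0.0
        return mean(m.duration_seconds for m in self._items)


buffer = MetricsBuffer(max_operations=3)
for seconds in (1.0, 2.0, 3.0, 4.0, 5.0):
    buffer.record("transcribe", seconds)
print(len(buffer), buffer.average_duration())  # 3 4.0
```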
TTS Setup
- VibeVoice model: WestZhang/VibeVoice-Large-pt
- Voice samples: 3-10 second WAV files, 24kHz recommended
- Naming convention: language-name_gender.wav (e.g., en-Alice_woman.wav)
- Run python setup_tts.py for guided TTS setup
Hotkeys
- Default STT hotkey: F9 (start/stop recording)
- Default TTS hotkey: F12 (generate speech from clipboard/selection)
- TTS stop hotkey: Ctrl+Alt+S (stop current generation)
- Hotkeys work globally across all applications