Whisper Wayland - Developer Documentation
This document explains the architecture, implementation decisions, and development workflow for the Whisper Wayland voice-to-text service.
For users: See README.md for installation instructions, usage guide, and user documentation.
Architecture Overview
Design Philosophy
Whisper Wayland follows a modular architecture with clear separation of concerns:
- Application Layer: Main orchestrator that coordinates all components
- Component Management: Manages lifecycle of audio, transcription, key monitoring, and text insertion
- Service Components: Independent modules for core functionality
- Configuration Management: Environment-based configuration with validation
Core Components
1. Application (whisper_wayland/application/)
- application.py: Main orchestrator coordinating all components
- component_manager.py: Creates and manages component instances
- runtime.py: Handles main application loop and signal management
- hotkey_handler.py: Manages push-to-talk functionality
- transcription_processor.py: Processes audio data through transcription pipeline
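The push-to-talk flow that these modules coordinate can be sketched roughly as follows. This is an illustration only, not the project's actual code: `PushToTalkSession`, the `Recorder` protocol, and the `transcribe` and `insert_text` callables are hypothetical stand-ins for the real components.

```python
from typing import Callable, Protocol


class Recorder(Protocol):
    """Hypothetical interface; the real audio_recorder API may differ."""

    def start(self) -> None: ...
    def stop(self) -> bytes: ...


class PushToTalkSession:
    """Record while the hotkey is held, then transcribe and insert the text."""

    def __init__(
        self,
        recorder: Recorder,
        transcribe: Callable[[bytes], str],
        insert_text: Callable[[str], None],
    ) -> None:
        self._recorder = recorder
        self._transcribe = transcribe
        self._insert_text = insert_text
        self._recording = False

    def on_press(self) -> None:
        # Ignore key-repeat events while a recording is already running.
        if not self._recording:
            self._recording = True
            self._recorder.start()

    def on_release(self) -> None:
        if not self._recording:
            return
        self._recording = False
        audio = self._recorder.stop()
        text = self._transcribe(audio).strip()
        if text:  # Skip insertion when nothing was recognized.
            self._insert_text(text)
```

Passing the collaborators in through the constructor keeps the session testable with simple fakes.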
2. Audio Recording (whisper_wayland/audio_recorder/)
- audio_recorder.py: High-level audio recording interface
- recording_engine.py: Low-level PyAudio recording implementation
- audio_system_validator.py: Validates audio system availability
- wav_converter.py: Converts audio frames to WAV format
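As a rough idea of what converting recorded frames to WAV involves, here is a minimal sketch using only the standard library. The function name `frames_to_wav` and the defaults (16 kHz, mono, 16-bit) are assumptions for illustration; the project's actual converter and recording parameters may differ.

```python
import io
import wave


def frames_to_wav(
    frames: list[bytes],
    sample_rate: int = 16000,
    channels: int = 1,
    sample_width: int = 2,
) -> bytes:
    """Wrap raw PCM frames in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"".join(frames))
    return buffer.getvalue()
```

Building the file in memory avoids temporary files when the WAV data is only needed for an upload.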
3. Key Monitoring (whisper_wayland/key_monitor/)
- key_monitor.py: High-level keyboard monitoring interface
- monitor_loop.py: Main monitoring loop with device handling
- event_handler.py: Processes key events and triggers callbacks
- device_manager.py: Manages input device discovery and lifecycle
- key_mapping.py: Handles key combination parsing and mapping
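Key combination parsing could look something like the sketch below. The spec syntax (`Ctrl+Shift+Space`), the alias table, and both function names are assumptions made for this example, not the project's real key naming.

```python
# Hypothetical alias table; the project's real key names may differ.
ALIASES = {"control": "ctrl", "super": "meta", "win": "meta"}


def parse_key_combination(spec: str) -> frozenset[str]:
    """Parse a spec such as 'Ctrl+Shift+Space' into normalized key names."""
    keys = [part.strip().lower() for part in spec.split("+")]
    if not all(keys):
        raise ValueError(f"Empty key name in combination: {spec!r}")
    return frozenset(ALIASES.get(key, key) for key in keys)


def is_combination_pressed(combo: frozenset[str], pressed: set[str]) -> bool:
    """Return True when every key of the combination is currently held."""
    return combo <= pressed
```

Representing a combination as a set makes the check independent of the order in which keys were pressed.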
4. Text Insertion (whisper_wayland/text_inserter/)
- text_inserter.py: High-level text insertion interface
- method_executors.py: Implements different insertion methods (wtype, ydotool, xdotool, clipboard)
- capability_tester.py: Tests which insertion methods are available
- fallback_handler.py: Manages fallback between insertion methods
- text_processor.py: Cleans and processes text for insertion
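Falling back between insertion methods can be sketched as trying each executor in order until one succeeds. The `Executor` contract (return `True` on success, raise `OSError` when a tool is missing) and the name `insert_with_fallback` are assumptions for this example; the real fallback handler may behave differently.

```python
from typing import Callable, Sequence

# An executor takes the text and returns True on success (assumed contract).
Executor = Callable[[str], bool]


def insert_with_fallback(
    text: str, executors: Sequence[tuple[str, Executor]]
) -> str:
    """Try each insertion method in order; return the name of the one that worked."""
    errors: list[str] = []
    for name, execute in executors:
        try:
            if execute(text):
                return name
            errors.append(f"{name}: reported failure")
        except OSError as exc:  # e.g. the tool is not installed
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All insertion methods failed: " + "; ".join(errors))
```

Collecting the per-method errors means the final exception explains why every method was rejected, which helps when debugging a desktop with none of the tools installed.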
5. Transcription Client (whisper_wayland/transcription_client/)
- transcription_client.py: High-level OpenAI API interface
- transcription_engine.py: Handles API communication and retry logic
- model_mapper.py: Maps model names to API parameters
- connection_tester.py: Tests API connectivity
- test_audio_generator.py: Generates test audio for validation
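Retry logic around an API call is commonly written with exponential backoff. The sketch below shows one possible shape; `call_with_retry`, its defaults, and the choice of `ConnectionError` as the retryable error are assumptions, and the project's engine may retry on different conditions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so that tests can replace the real delay with a recorder.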
6. Configuration (whisper_wayland/config/)
- config.py: Main configuration orchestrator
- property_handlers.py: Environment variable processing and validation
- config_validator.py: Configuration validation and requirements checking
- env_loader.py: Environment file loading (.env support)
- logging_setup.py: Logging configuration and setup
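To make the .env support concrete, here is a deliberately simplified loader. It is a sketch under assumptions: the function names are hypothetical, it handles only plain `KEY=VALUE` lines (no `export` prefix, no inline comments), and the project may rely on a dedicated library instead.

```python
import os
from collections.abc import MutableMapping


def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, # comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(
    values: dict[str, str], environ: MutableMapping[str, str] = os.environ
) -> None:
    """Copy parsed values into the environment without overriding existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)
```

Using `setdefault` lets variables already exported in the shell take precedence over the file.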
Key Design Decisions
Why Modular Architecture?
- Testability: Each component can be tested in isolation
- Maintainability: Clear boundaries make changes safer and easier
- Extensibility: New components or alternatives can be added easily
- Debugging: Issues can be traced to specific components
Why Configuration-First Design?
- Environment Variables: Follows 12-factor app principles
- No Hard-coding: All behavior can be controlled externally
- Easy Deployment: Configuration changes don't require code changes
- Testing: Different configurations can be tested independently
Why Push-to-Talk vs. Voice Activation?
- Privacy: Audio only sent when user explicitly requests it
- Accuracy: No false triggers from background noise
- Control: User has complete control over when transcription occurs
- Cost: Only pay for intentional transcriptions
Development Workflow
Repository Setup
- Clone the repository:
  git clone https://github.com/rolandtritsch/whisper-wayland.git
  cd whisper-wayland
- Install dependencies:
  curl -LsSf https://astral.sh/uv/install.sh | sh  # Install uv
  uv sync  # Install project dependencies
- Configure environment:
  cp .env.example .env
  # Edit .env and add your OPENAI_API_KEY
Branch Management
Branch naming convention:
- Format: roland/<ticket-id>/<3-word-description>
- If no ticket exists: roland/ad-hoc/<3-word-description>
- Examples:
  - roland/ISSUE-123/fix-audio-recording
  - roland/ad-hoc/update-dependencies
Creating a branch:
git checkout trunk
git pull origin trunk
git checkout -b roland/<ticket-id>/<3-word-description>
Pull Request Process
PR title format: <ticket-id>: <3-word-description>
Examples:
- ISSUE-123: Fix audio recording
- ad-hoc: Update dependencies
Creating a PR:
# Push your branch
git push -u origin roland/<ticket-id>/<3-word-description>
# Create PR with gh CLI
gh pr create --title "<ticket-id>: <3-word-description>" --body ""
Development Standards
Code Quality Requirements
- All tests must pass: Both unit and integration tests
- 80% code coverage minimum: Measured by pytest-cov
- Type checking: All code must pass mypy validation
- Code formatting: All code must pass ruff formatting and linting
Running Quality Checks
# Run all tests
make tests
# Run code quality checks
make check
# Fix formatting issues
make format-fix
Commit Guidelines
- Commit often: Small, focused commits are preferred
- Descriptive messages: Explain the "why", not just the "what"
- Test before commit: Ensure tests pass before each commit
Testing Strategy
Test Structure
tests/
├── unit/ # Fast, isolated tests with mocks
├── integration/ # End-to-end tests with real dependencies
└── conftest.py # Shared test configuration and fixtures
Testing Principles
- Unit tests: Mocked, no external dependencies, test single components
- Integration tests: End-to-end, real dependencies, test complete workflows
- Test isolation: Each test should be independent and repeatable
- Mock reuse: Common mocks defined in tests/unit/conftest.py
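A unit test in this style might look like the sketch below. The function under test, `transcribe_and_insert`, and the `transcribe`/`insert` method names on the mocked collaborators are hypothetical; they only illustrate how mocks keep a test free of external dependencies.

```python
from unittest.mock import Mock


def transcribe_and_insert(audio: bytes, client, inserter) -> bool:
    """Hypothetical unit under test: transcribe audio and insert the result."""
    text = client.transcribe(audio)
    if not text:
        return False
    inserter.insert(text)
    return True


def test_inserts_transcribed_text() -> None:
    client = Mock()
    client.transcribe.return_value = "hello world"
    inserter = Mock()

    assert transcribe_and_insert(b"fake-wav", client, inserter) is True

    client.transcribe.assert_called_once_with(b"fake-wav")
    inserter.insert.assert_called_once_with("hello world")


def test_skips_insertion_for_empty_transcript() -> None:
    client = Mock()
    client.transcribe.return_value = ""
    inserter = Mock()

    assert transcribe_and_insert(b"fake-wav", client, inserter) is False
    inserter.insert.assert_not_called()
```

Each test builds its own mocks, so the tests stay independent and can run in any order.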
Running Tests
# Run all tests
make tests
# Run only unit tests
make tests-unit
# Run only integration tests
make tests-integration
# Run with coverage report
uv run pytest --cov=whisper_wayland --cov-report=html
Merging and Cleanup
Merge process:
# Ensure your PR is up to date
git checkout trunk
git pull origin trunk
git checkout your-branch
git rebase trunk
# Push updates
git push --force-with-lease
# Squash merge via GitHub UI or gh CLI
gh pr merge --squash --delete-branch
Post-merge cleanup:
git checkout trunk
git pull origin trunk
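# Only needed if the local branch still exists (gh's --delete-branch also removes it).
# Use -D: after a squash merge, git does not consider the branch merged.
git branch -D your-branch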
Common Development Tasks
Adding New Configuration
- Add property method to config/property_handlers.py
- Add property to main config/config.py
- Add to .env.example with documentation
- Add tests in tests/unit/test_config.py
- Update README.md configuration table if user-facing
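The first two steps might look roughly like this for an integer setting. Everything here is illustrative: `MAX_RECORDING_SECONDS` is a made-up variable, and `get_int_setting` and the `Config` stand-in are hypothetical names rather than the project's actual handlers.

```python
import os
from collections.abc import Mapping


def get_int_setting(
    name: str,
    default: int,
    minimum: int,
    maximum: int,
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Read an integer setting from the environment and validate its range."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not minimum <= value <= maximum:
        raise ValueError(
            f"{name} must be between {minimum} and {maximum}, got {value}"
        )
    return value


class Config:
    """Stand-in for the main configuration class (step 2)."""

    @property
    def max_recording_seconds(self) -> int:
        # MAX_RECORDING_SECONDS is a made-up variable used only for illustration.
        return get_int_setting(
            "MAX_RECORDING_SECONDS", default=60, minimum=1, maximum=600
        )
```

Accepting the environment as a parameter lets the unit tests pass a plain dictionary instead of modifying the real process environment.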
Adding New Component
- Create new module directory under whisper_wayland/
- Add __init__.py with public interface exports
- Implement component following existing patterns
- Add to application/component_manager.py if needed
- Add unit tests in tests/unit/
- Add integration tests if component has external dependencies
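One way to picture a component that plugs into lifecycle management is sketched below. The `Component` protocol with `start`/`stop` and the `ComponentRegistry` class are assumptions for illustration; check application/component_manager.py for the contract the project actually uses.

```python
from typing import Protocol


class Component(Protocol):
    """Assumed lifecycle contract; the real one may differ."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class ComponentRegistry:
    """Starts components in registration order and stops them in reverse."""

    def __init__(self) -> None:
        self._components: list[tuple[str, Component]] = []

    def register(self, name: str, component: Component) -> None:
        self._components.append((name, component))

    def start_all(self) -> None:
        for _, component in self._components:
            component.start()

    def stop_all(self) -> None:
        # Reverse order so that dependents shut down before their dependencies.
        for _, component in reversed(self._components):
            component.stop()
```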
Debugging Tips
- Enable debug logging: LOG_LEVEL=DEBUG uv run whisper-wayland
- Component isolation: Test individual components in isolation
- Mock external services: Use mocks to isolate issues
- Check configuration: Verify environment variables are correct
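For reference, honoring a `LOG_LEVEL` variable with the standard library can be as small as the sketch below. The function name `setup_logging`, the log format, and the fallback to `INFO` for unknown names are assumptions; the project's logging_setup.py may be configured differently.

```python
import logging
import os


def setup_logging(default_level: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL and return the numeric level used."""
    name = os.environ.get("LOG_LEVEL", default_level).upper()
    level = logging.getLevelName(name)
    if not isinstance(level, int):  # Unknown names come back as a string.
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # Replace handlers installed by earlier calls (Python 3.8+).
    )
    return level
```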
Architecture Trade-offs
Current Limitations
- Linux/Wayland only: Designed specifically for modern Linux desktops
- OpenAI dependency: Requires internet and OpenAI API access
- Python performance: Not optimized for absolute minimum latency
- Single hotkey: Only supports one global hotkey combination
Future Extensibility Points
- Multiple transcription providers: Architecture supports pluggable transcription backends
- Additional text insertion methods: New insertion methods can be added easily
- Custom hotkey combinations: Framework supports complex hotkey patterns
- Local transcription: Could add local Whisper model support
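Pluggable transcription backends could be organized around a small registry, as in this sketch. None of these names exist in the project as far as this document states: `TranscriptionBackend`, `register_backend`, `create_backend`, and `SilentBackend` are hypothetical and only show the general shape such an extension point might take.

```python
from typing import Callable, Optional, Protocol


class TranscriptionBackend(Protocol):
    """Assumed minimal contract shared by all backends."""

    def transcribe(self, wav_bytes: bytes) -> str: ...


BackendFactory = Callable[[], TranscriptionBackend]
_BACKENDS: dict[str, BackendFactory] = {}


def register_backend(name: str, factory: BackendFactory) -> None:
    """Make a backend available under a case-insensitive name."""
    _BACKENDS[name.lower()] = factory


def create_backend(name: str) -> TranscriptionBackend:
    """Instantiate a registered backend, or fail with the list of known names."""
    factory: Optional[BackendFactory] = _BACKENDS.get(name.lower())
    if factory is None:
        known = ", ".join(sorted(_BACKENDS)) or "none"
        raise ValueError(f"Unknown backend {name!r}; registered: {known}")
    return factory()


class SilentBackend:
    """Placeholder backend for illustration; always returns an empty transcript."""

    def transcribe(self, wav_bytes: bytes) -> str:
        return ""


register_backend("silent", SilentBackend)
```

With this layout, a local Whisper backend would only need to implement `transcribe` and register itself under a new name.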
Project Structure
whisper-wayland/
├── whisper_wayland/ # Main package
│ ├── application/ # Application orchestration
│ ├── audio_recorder/ # Audio recording components
│ ├── key_monitor/ # Keyboard monitoring
│ ├── text_inserter/ # Text insertion methods
│ ├── transcription_client/ # OpenAI API integration
│ ├── config/ # Configuration management
│ ├── constants.py # Application constants
│ └── main.py # Entry point
├── tests/ # Test suite
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── conftest.py # Test configuration
├── .env.example # Environment template
├── pyproject.toml # Project configuration
├── Makefile # Development commands
├── README.md # User documentation
└── CLAUDE.md # Developer documentation (this file)
Contributing Guidelines
- Follow the workflow: Use the branch naming and PR process described above
- Write tests: All new functionality must include appropriate tests
- Document changes: Update documentation for user-facing changes
- Review thoroughly: PRs require passing CI and code review
- Keep PRs focused: One feature or fix per PR
- Communicate: Use GitHub issues and discussions for planning