CLAUDE.mdpython

pr-summary CLAUDE.md

This document captures the design decisions, architecture, and development process for the PR Summary GitHub Action.

View Source

CLAUDE.md - Development Session Documentation

This document captures the design decisions, architecture, and development process for the PR Summary GitHub Action.

Project Overview

Goal: Create a GitHub Action that captures comprehensive PR activity (comments, reviews, checks, commits) and embeds it into the git repository as git notes, preserving GitHub collaboration history directly in git.

Implementation: Python 3.11+ with full type annotations, using git notes for storage.

Design Decisions

1. Storage Strategy: Git Notes

Decision: Use git notes (refs/notes/commits) instead of:

  • Modifying commit messages (irreversible, pollutes history)
  • Creating separate metadata files (clutters repository)

Rationale:

  • Git notes are separate from commits, preserving history integrity
  • Notes can be pushed/fetched independently
  • Easy to view with git notes show <sha>
  • Can be configured to display automatically in git log
  • Notes namespace prevents conflicts with other tools

Trade-offs:

  • Requires explicit fetch: git fetch origin refs/notes/commits:refs/notes/commits
  • Less discoverable than commit messages
  • Not all git UIs display notes by default

2. Implementation Language: Python with Type Annotations

Decision: Python 3.11+ with full type annotations throughout

Rationale:

  • Native GitHub Actions support
  • Excellent libraries (requests, subprocess)
  • Type safety via mypy reduces bugs
  • Readable and maintainable
  • Good datetime handling

Type Safety Examples:

def get_pull_request(self, pr_number: int) -> dict[str, Any]:
def collect_all_activity(self, pr_number: int) -> PRActivity:

3. Architecture: Separation of Concerns

Modules:

  1. models.py - Pure data models

    • No business logic
    • Computed properties only
    • Dataclasses with type annotations
  2. github_client.py - GitHub API interaction

    • Handles authentication, rate limiting, pagination
    • Retry logic with exponential backoff
    • Returns raw API responses
  3. collector.py - Data transformation

    • Converts API responses → typed models
    • Orchestrates multiple API calls
    • Extracts linked issues from PR body
  4. formatter.py - Markdown generation

    • Pure function: PRActivity → markdown string
    • Configurable (truncation, patches, etc.)
    • No side effects
  5. git_notes.py - Git operations

    • Encapsulates all git commands
    • Subprocess management
    • Error handling
  6. utils.py - Shared utilities

    • Environment variable handling
    • Input validation
    • GitHub Actions helpers
  7. main.py - Orchestration

    • Reads environment variables
    • Coordinates all modules
    • Exit codes for CI/CD

4. Error Handling Strategy

Layered approach:

  1. Custom Exceptions:

    GitHubAPIError (base)
    ├── RateLimitError
    └── AuthenticationError
    
    GitNotesError
    
  2. Retry Logic:

    • Network errors: 3 retries with backoff
    • Rate limits: Detected and reported
    • Transient failures: Automatic retry
  3. Graceful Degradation:

    • Missing check runs: Continue with empty list
    • Missing comments: Continue with partial data
    • Log warnings, don't fail
  4. Exit Codes:

    • 0: Success
    • 1: Any error (with specific error messages)

5. GitHub Actions Token Scopes

Critical Fix: GitHub Actions GITHUB_TOKEN doesn't have user scope.

Solution: Changed authentication validation from:

# ❌ Fails with GITHUB_TOKEN
response = self._make_request("/user")

To:

# ✅ Works with GITHUB_TOKEN
response = self._make_request(f"/repos/{owner}/{repo}")

Lesson: Always test with actual GitHub Actions tokens, not personal access tokens.

Architecture Diagrams

Data Flow

GitHub PR Merge
    ↓
GitHub Actions Trigger
    ↓
main.py
    ↓
┌─────────────────┬──────────────────┬────────────────┐
│                 │                  │                │
│ GitHubClient    │  PRActivityCollector  │  GitNotesManager
│                 │                  │                │
│ API Calls →     │  Transform →     │  Store →      │
│ Raw JSON        │  Typed Models    │  Git Notes    │
│                 │                  │                │
└─────────────────┴──────────────────┴────────────────┘
         ↓                  ↓                 ↓
    Pagination         PRActivity         refs/notes/commits
    Rate Limits        Validation
    Retry Logic        Statistics

Module Dependencies

main.py
├── github_client.py (no dependencies)
├── collector.py
│   ├── github_client.py
│   └── models.py
├── formatter.py
│   └── models.py
├── git_notes.py (no dependencies)
└── utils.py (no dependencies)

Clean dependency graph with no circular dependencies.

API Endpoints Used

GitHub REST API v2022-11-28:

# PR Data
GET /repos/{owner}/{repo}/pulls/{pr_number}
GET /repos/{owner}/{repo}/pulls/{pr_number}/commits
GET /repos/{owner}/{repo}/pulls/{pr_number}/files
GET /repos/{owner}/{repo}/pulls/{pr_number}/comments  # Review comments
GET /repos/{owner}/{repo}/issues/{pr_number}/comments  # Conversation
GET /repos/{owner}/{repo}/pulls/{pr_number}/reviews

# Check Runs
GET /repos/{owner}/{repo}/commits/{sha}/check-runs
GET /repos/{owner}/{repo}/commits/{sha}/status

# Repository (for auth validation)
GET /repos/{owner}/{repo}

Rate Limits:

  • Authenticated: 5000 requests/hour
  • Monitor via X-RateLimit-Remaining header
  • Warn when < 10 requests remaining

Testing Strategy

Test Coverage: 80%

100 tests across 7 test files:

  1. test_models.py (15 tests)

    • Data model creation
    • Computed properties
    • Filter methods
    • Statistics calculation
  2. test_utils.py (22 tests)

    • Environment variable handling
    • Input validation
    • GitHub Actions annotations
    • File size formatting
  3. test_github_client.py (18 tests)

    • Successful requests
    • Error handling (401, 403, 429, 500)
    • Pagination
    • Rate limit detection
    • Retry logic
  4. test_collector.py (11 tests)

    • Data collection orchestration
    • API response transformation
    • Issue extraction regex
    • Datetime parsing
  5. test_formatter.py (16 tests)

    • Markdown generation
    • Section formatting
    • Truncation
    • Empty section handling
  6. test_git_notes.py (18 tests)

    • Add/get/remove/list notes
    • Error handling
    • Git user configuration
    • Push/fetch operations (mocked)

Testing Tools:

  • pytest: Test runner
  • pytest-cov: Coverage reporting
  • pytest-mock: Mocking
  • fixtures: Shared test data in conftest.py

Key Testing Patterns:

  1. Mocking GitHub API:

    @pytest.fixture
    def mock_github_client(mock_pr_data, mock_commits_data):
        client = Mock(spec=GitHubClient)
        client.get_pull_request.return_value = mock_pr_data
        return client
    
  2. Temporary Git Repos:

    @pytest.fixture
    def temp_git_repo(tmp_path):
        # Creates isolated git repo for testing
        subprocess.run(["git", "init"], cwd=tmp_path)
        return tmp_path
    
  3. Shared Fixtures:

    • All mock data centralized in conftest.py
    • Consistent test data across all tests
    • Easy to extend

Development Process

Session Flow

  1. Planning Phase

    • Discussed storage options (git notes vs commits vs files)
    • Designed module structure
    • Identified GitHub API endpoints
    • Planned data models
  2. Implementation Phase

    • Models first (pure data, no dependencies)
    • GitHub client (external boundary)
    • Collector (data transformation)
    • Formatter (presentation)
    • Git notes manager (storage)
    • Main orchestration
    • Utils and helpers
  3. Testing Phase

    • Set up pytest configuration
    • Created shared fixtures
    • Wrote comprehensive unit tests
    • Fixed bugs found during testing
    • Achieved 80% coverage
  4. Deployment Phase

    • Created GitHub repository
    • Published as GitHub Action
    • Fixed authentication issue with GITHUB_TOKEN
    • Tested on real PR

Bugs Fixed During Development

  1. Authentication Failure with GITHUB_TOKEN

    • Issue: /user endpoint requires user scope
    • Fix: Use /repos/{owner}/{repo} for auth validation
    • Lesson: Test with actual GitHub Actions tokens
  2. Git Notes Error Message Variation

    • Issue: Git returns "has no note" vs "no note found"
    • Fix: Check for "no note" substring instead of exact match
    • Lesson: Don't assume exact error message format
  3. Mock Response Headers

    • Issue: Mock objects returned as header values
    • Fix: Always specify mock response headers as dicts
    • Lesson: Be explicit with mock data

Usage

As a GitHub Action (Recommended)

name: PR Summary to Git Notes

on:
  pull_request:
    types: [closed]

permissions:
  contents: write
  pull-requests: read
  checks: read

jobs:
  summarize:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: yan/pr-summary@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          pr-number: ${{ github.event.pull_request.number }}

Locally (for testing)

cd ~/projects/pr-summary

# Install dependencies
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

# Set environment variables
export GITHUB_TOKEN="your_token"
export PR_NUMBER="123"
export GITHUB_REPOSITORY="owner/repo"
export MERGE_COMMIT_SHA="abc123..."

# Run
.venv/bin/python -m src.main

# View the note
git notes --ref=refs/notes/commits show abc123...

Running Tests

cd ~/projects/pr-summary

# Install dev dependencies
.venv/bin/pip install -r requirements-dev.txt

# Run all tests
.venv/bin/pytest -v

# Run with coverage
.venv/bin/pytest --cov=src --cov-report=html

# Run specific test file
.venv/bin/pytest tests/test_models.py -v

Environment Variables

Required

  • GITHUB_TOKEN: GitHub authentication token
  • PR_NUMBER: Pull request number
  • GITHUB_REPOSITORY: Repository in format "owner/repo" (or use REPO_OWNER + REPO_NAME)

Optional

  • MERGE_COMMIT_SHA: Merge commit SHA (auto-detected if not provided)
  • REPO_PATH: Path to git repository (default: ".")
  • NOTES_REF: Git notes reference (default: "refs/notes/commits")
  • REMOTE: Git remote name (default: "origin")
  • PUSH_NOTES: Whether to push notes (default: "true")
  • LOG_LEVEL: Logging level (default: "INFO")

Output Format

The generated summary is structured markdown:

# 🟣 PR #123: Feature Title

## Metadata
- Author, dates, labels, linked issues, participants

## Description
PR description body

## Commits (N)
List of commits with authors

## File Changes (N)
Grouped by: Added, Modified, Removed, Renamed
With line counts

## Reviews (N)
Grouped by: Approved, Changes Requested, Commented

## Discussion (N comments)
### Conversation
PR-level comments

### Code Review Comments
Inline comments with file:line

## Checks (N)
Grouped by: Successful, Failed, Other
With durations

---
*Summary generated for PR #123 • stats*

Future Enhancements

Potential Improvements

  1. Parallel API Calls

    • Use asyncio or concurrent.futures
    • Fetch all endpoints simultaneously
    • Reduce total execution time
  2. Incremental Updates

    • Check if note already exists
    • Only update if PR has new activity
    • Useful for re-runs
  3. Custom Templates

    • User-provided Jinja2 templates
    • Different formats (JSON, YAML, custom markdown)
  4. Filtering Options

    • Exclude bots
    • Minimum comment length
    • Specific file patterns
  5. Integration Tests

    • Test against real GitHub API (with VCR.py for recording)
    • End-to-end workflow tests
    • Docker-based testing
  6. Performance Optimization

    • Cache API responses
    • Incremental data collection
    • Batch operations

Lessons Learned

  1. Type Annotations Are Invaluable

    • Caught bugs before runtime
    • Made refactoring safer
    • Improved IDE autocomplete
  2. Separate Data from Logic

    • Models are pure data (easy to test)
    • Business logic in separate modules
    • Clear dependency graph
  3. Test with Real Tokens

    • Personal Access Tokens ≠ GITHUB_TOKEN
    • Different scopes, different behaviors
    • Always test in actual environment
  4. Git Notes Are Powerful

    • Underutilized feature
    • Perfect for metadata storage
    • Non-invasive to history
  5. Error Messages Vary

    • Don't rely on exact error strings
    • Use substring matching
    • Handle multiple error formats

Maintenance

Adding a New Data Field

  1. Add to data model in models.py
  2. Update collector to extract from API in collector.py
  3. Update formatter to display in formatter.py
  4. Add tests in corresponding test file
  5. Update documentation

Changing Output Format

  1. Modify formatter.py methods
  2. Add configuration options if needed
  3. Update tests in test_formatter.py
  4. Update example output in README.md

Adding New API Endpoints

  1. Add method to github_client.py
  2. Add to collector orchestration
  3. Update data models if needed
  4. Add tests with mocked responses

Contact & Contributing

This project was developed in a single Claude session. For questions or contributions:

  1. Check existing issues on GitHub
  2. Review this CLAUDE.md for design rationale
  3. Ensure tests pass before submitting PRs
  4. Follow existing code style (type annotations, docstrings)

Project Stats

  • Language: Python 3.11+
  • Lines of Code: ~800 (excluding tests)
  • Test Coverage: 80%
  • Tests: 100 tests, all passing
  • Dependencies: requests, python-dateutil
  • Dev Dependencies: pytest, pytest-cov, pytest-mock, mypy

Development Time: Single Claude session (~2-3 hours)

Files:

  • 14 source files
  • 7 test files
  • 3 configuration files
  • 2 workflow examples
  • Full documentation (README.md + CLAUDE.md)