All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- New `retrocast transcription` command group for audio-to-text transcription
- Multiple backend support with automatic detection:
  - MLX Whisper: Optimized for Apple Silicon (M1/M2/M3) Macs
  - faster-whisper: CUDA GPU and CPU support for Linux/Windows
- Whisper model support (tiny, base, small, medium, large)
- Multiple output formats: TXT, JSON, SRT (subtitles), VTT (WebVTT)
- Content-based deduplication using SHA256 hashing
- Full-text search across transcribed content using SQLite FTS5
- Rich CLI with progress bars and colored output
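As an illustration of the subtitle output formats listed above, the following is a minimal sketch of rendering timestamped segments as SRT and WebVTT. The `Segment` class and the function names are hypothetical and are not taken from retrocast's code; only the timestamp conventions (comma before milliseconds in SRT, period in WebVTT) come from the formats themselves.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timestamped segment; not retrocast's actual model."""

    start: float  # seconds from the beginning of the audio
    end: float  # seconds from the beginning of the audio
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, period before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        lines += [f"{start} --> {end}", seg.text.strip(), ""]
    return "\n".join(lines)
```

Both writers share one timestamp helper, so the two formats differ only in the separator, the cue numbering, and the header line.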
- `retrocast transcription process PATH...` - Transcribe audio files
- `retrocast transcription backends list` - List available backends with status
- `retrocast transcription backends test BACKEND` - Test specific backend availability
- `retrocast transcription search QUERY` - Full-text search across transcriptions
- `retrocast transcription summary` - Show overall transcription statistics
- `retrocast transcription podcasts list` - List podcasts with transcriptions
- `retrocast transcription podcasts summary [PODCAST]` - Detailed podcast statistics
- `retrocast transcription episodes list` - Paginated episode listing with filters
- `retrocast transcription episodes summary` - Aggregate episode statistics
- New `transcriptions` table for transcription metadata
- New `transcription_segments` table for timestamped text segments
- FTS5 full-text search enabled on segment text
- Content hash indexing for duplicate detection
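To make the hashing and search entries above concrete, here is a minimal sketch using only the standard library. It assumes a Python build whose SQLite includes FTS5. The column names, the `segments_fts` table, and the helper functions are hypothetical; the real schema is managed by retrocast and may differ.

```python
import hashlib
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified schema: a unique index on the content hash
    # and a standalone FTS5 table holding the segment text.
    conn.executescript(
        """
        CREATE TABLE transcriptions (
            id INTEGER PRIMARY KEY,
            content_hash TEXT NOT NULL,
            path TEXT NOT NULL
        );
        CREATE UNIQUE INDEX idx_transcriptions_hash
            ON transcriptions (content_hash);
        CREATE VIRTUAL TABLE segments_fts USING fts5(
            text, transcription_id UNINDEXED, start_time UNINDEXED
        );
        """
    )


def add_transcription(conn, path, audio, segments):
    """Store a transcription unless identical audio bytes were already seen.

    `segments` is a list of (start_time, text) tuples.
    Returns True if a row was inserted, False for a duplicate.
    """
    digest = hashlib.sha256(audio).hexdigest()
    existing = conn.execute(
        "SELECT id FROM transcriptions WHERE content_hash = ?", (digest,)
    ).fetchone()
    if existing is not None:
        return False  # same content, possibly under a different filename
    cur = conn.execute(
        "INSERT INTO transcriptions (content_hash, path) VALUES (?, ?)",
        (digest, path),
    )
    conn.executemany(
        "INSERT INTO segments_fts (text, transcription_id, start_time) "
        "VALUES (?, ?, ?)",
        [(text, cur.lastrowid, start) for start, text in segments],
    )
    return True


def search(conn, query):
    """Full-text search over segment text, best matches first."""
    return conn.execute(
        "SELECT transcription_id, start_time, text FROM segments_fts "
        "WHERE segments_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
```

Hashing the file contents rather than the path means a renamed or re-downloaded copy of the same audio is detected as a duplicate.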
- Comprehensive user guide: `docs/TRANSCRIPTION.md`
- Developer documentation: `docs/TRANSCRIPTION_DEVELOPER.md`
- Architecture diagrams and backend implementation guide
- Enhanced `pyproject.toml` with optional transcription dependencies:
  - `transcription-mlx` for Apple Silicon
  - `transcription-cuda` for NVIDIA GPU
  - `transcription-cpu` for CPU-only
  - `transcription-diarization` for speaker diarization (future)
- Added poe tasks for transcription backend installation:
  - `poe install:transcription-mlx`
  - `poe install:transcription-cuda`
  - `poe install:transcription-cpu`
- 126 tests passing with comprehensive coverage
- Type-checked with ty
- Formatted with ruff and black
- Class-based backend architecture with abstract base classes
- Strategy pattern for backend selection
- Platform-specific dependency management via PEP 508 markers
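The backend architecture entries above can be pictured with a short sketch: an abstract base class defines the interface, each backend reports its own availability, and a selector picks one at runtime. All class and function names here are hypothetical, and the availability checks (platform plus an importable `mlx_whisper` or `faster_whisper` module) are an assumption about what such a check might look like, not retrocast's actual logic.

```python
import importlib.util
import platform
from abc import ABC, abstractmethod


class TranscriptionBackend(ABC):
    """Hypothetical common interface shared by all backends."""

    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if this backend can run on the current machine."""

    @abstractmethod
    def transcribe(self, path: str) -> str:
        """Return the transcript text for the audio file at `path`."""


class MLXBackend(TranscriptionBackend):
    name = "mlx-whisper"

    def is_available(self) -> bool:
        # Assumed check: Apple Silicon and the package is importable.
        return (
            platform.system() == "Darwin"
            and platform.machine() == "arm64"
            and importlib.util.find_spec("mlx_whisper") is not None
        )

    def transcribe(self, path: str) -> str:
        raise NotImplementedError("sketch only; model call omitted")


class FasterWhisperBackend(TranscriptionBackend):
    name = "faster-whisper"

    def is_available(self) -> bool:
        # Assumed check: the package is importable (CUDA or CPU).
        return importlib.util.find_spec("faster_whisper") is not None

    def transcribe(self, path: str) -> str:
        raise NotImplementedError("sketch only; model call omitted")


def select_backend(backends, preferred=None):
    """Return the preferred backend, or the first available one."""
    if preferred is not None:
        for backend in backends:
            if backend.name == preferred:
                if not backend.is_available():
                    raise RuntimeError(f"backend not available: {preferred}")
                return backend
        raise ValueError(f"unknown backend: {preferred}")
    for backend in backends:
        if backend.is_available():
            return backend
    raise RuntimeError("no transcription backend available")
```

Because callers depend only on `TranscriptionBackend`, a new backend can be added by defining one more subclass and appending it to the list passed to `select_backend`.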
- Core Overcast data extraction pipeline
- SQLite database storage with sqlite-utils
- Authentication with Overcast API
- OPML import/export
- Feed and episode metadata extraction
- Transcript download from podcast feeds
- Episode download database with full-text search
- HTML output generation
- Integration with podcast-archiver for downloads
- Chapter marker extraction
- Datasette compatibility