Name: faster-whisper
Author: cmdop

Overview

Local speech-to-text with faster-whisper, a CTranslate2 reimplementation of OpenAI's Whisper. It transcribes roughly 4-6x faster than the reference implementation at the same accuracy; with GPU acceleration, expect about 20x real-time transcription.

Key Features

Transcribe audio/video files
Generate subtitles (SRT, VTT, ASS, LRC, TTML)
Identify speakers (diarization labels)
Transcribe from URLs (YouTube links and direct audio URLs)
Batch process files (glob patterns, directories, skip-existing support)
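
As an illustration of the subtitle feature, here is a minimal sketch of turning timed segments into SRT text. It is not the skill's own code: the helper names srt_timestamp and to_srt are hypothetical, and segments are assumed to be plain (start_seconds, end_seconds, text) tuples.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, then the cue text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

The other listed formats (VTT, ASS, LRC, TTML) use different timestamp and cue syntax, so each would need its own renderer.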

How It Works

This skill uses faster-whisper to transcribe audio files locally, with no API costs and no dependence on online services. It supports 99+ languages, automatic language detection, and multilingual transcription.
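
For readers who want to see what local transcription looks like at the library level, the following is a rough sketch that calls the faster-whisper Python package directly. It is not the skill's implementation: the function name transcribe_file and the default model size, device, and compute type are assumptions chosen for a CPU-only machine. The library fetches model weights on first use, after which transcription runs offline.

```python
from typing import List, Optional, Tuple


def transcribe_file(
    path: str,
    model_size: str = "small",
    device: str = "cpu",
    compute_type: str = "int8",
    language: Optional[str] = None,
) -> Tuple[str, float, List[Tuple[float, float, str]]]:
    """Transcribe one audio file and return (language, probability, segments)."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # language=None lets the model detect the spoken language automatically.
    segments, info = model.transcribe(path, beam_size=5, language=language)

    # `segments` is a generator: decoding happens as it is consumed.
    results = [(seg.start, seg.end, seg.text) for seg in segments]
    return info.language, info.language_probability, results
```

Calling transcribe_file("meeting.mp3") would return the detected language code, its confidence, and a list of timed text segments. On a machine with a supported GPU, passing device="cuda" and compute_type="float16" is the usual way to get the faster path.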

Use Cases

Transcribe meetings, interviews, podcasts, lectures, and YouTube videos
Generate subtitles in broadcast-standard formats
Identify speakers in audio files
Transcribe podcast feeds and YouTube links
Batch process files with ETA shown automatically
Transcribe audio containing domain-specific terms or heavy jargon
Preprocess noisy audio before transcription
Stream output and clip time ranges
Search the transcript and detect chapters
Export speaker audio and spreadsheet output
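
The batch use case with skip-existing behavior can be sketched as below. This is an assumption about how such a loop might look, not the skill's actual logic: batch_transcribe is a hypothetical helper, and the transcribe argument stands in for whatever function produces the transcript text for a file.

```python
from pathlib import Path
from typing import Callable, List


def batch_transcribe(
    input_dir: str,
    pattern: str,
    transcribe: Callable[[Path], str],
    suffix: str = ".txt",
) -> List[Path]:
    """Transcribe every file matching `pattern`, skipping existing outputs."""
    written = []
    for audio in sorted(Path(input_dir).glob(pattern)):
        out = audio.with_suffix(suffix)
        if out.exists():
            # Skip-existing: never redo or overwrite a finished transcript.
            continue
        out.write_text(transcribe(audio), encoding="utf-8")
        written.append(out)
    return written
```

Because finished outputs are skipped, an interrupted batch can be rerun with the same arguments and will pick up where it left off.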
