Batch Downloading and Transcribing Podcast Episodes

Aug. 5, 2025

I recently wanted to collect and peruse the transcripts of some older podcast episodes. While Pocket Casts, my go-to podcatcher, supports bulk downloading of audio, exporting the files for further processing is an extra, cumbersome step. This led me to look for a command-line workflow to automate the task.

After evaluating a few open-source projects, I settled on poddl. It’s been consistently updated and has no unnecessary features.

>> Downloading Episodes

The developer of poddl provides pre-compiled binaries for Windows and Linux. On macOS, you need to compile it from source, which is likely because the developer offers a separate paid Mac application (a fair approach).

First, clone the repository and build the executable:

# Clone the source code
git clone https://github.com/freshe/poddl.git
cd poddl

# Compile using g++, linking the curl library
g++ *.cpp -O2 -std=c++11 -lcurl -o poddl

# Make sure the output file is executable (g++ normally sets this already)
chmod +x poddl

Next, you need the podcast’s RSS feed URL. The quickest way to find this is to use the export feature in your podcatcher of choice, which typically generates an OPML file. Inside this file, find the xmlUrl attribute for the podcast you want.
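
If the OPML export lists many shows, pulling out the feed URLs by hand gets tedious. Here is a minimal sketch that prints every xmlUrl value in a file, one per line. The function name list_feed_urls is made up for this example, and it assumes each xmlUrl attribute is double-quoted and sits on a single line, which is common in podcatcher exports but not guaranteed.

```shell
# List the feed URLs stored in an OPML export, one per line.
# Assumption: every xmlUrl attribute is double-quoted and not split across lines.
list_feed_urls() {
    grep -o 'xmlUrl="[^"]*"' "$1" |
        sed -e 's/^xmlUrl="//' -e 's/"$//' -e 's/&amp;/\&/g'   # strip the attribute wrapper, unescape &amp;
}
```

Call it as list_feed_urls podcasts.opml, where podcasts.opml stands for whatever file your podcatcher exported.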

With the URL, you can list all available episodes. The following command lists them in reverse chronological order (-r):

./poddl URL -l -r

The output will be a numbered list of episodes:

...
Fetching URL: https://example.org/podcast.xml
Listing 12 files

[12] S02 Episode 4
[11] S02 Episode 3
[10] S02 Episode 2
[9] S02 Episode 1
...

Once you’ve identified the episodes to download, use the indices (-n) to grab them. For example, this command downloads episodes 9 through 12 into a specified directory (-o):

./poddl URL -n 9-12 -o ~/Downloads

Drop -n if you want to download all episodes.

(poddl names files based on the <title> tag in the RSS feed. If the podcast you’re downloading doesn’t number its episodes, you can use -z to automatically prefix filenames with a zero-padded index. For a full list of available options, see the project documentation.)

A limitation of poddl is that it downloads files sequentially. If you’re in a hurry, you can open multiple terminal windows and run the command with different episode ranges in parallel.
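
The same effect is possible from a single terminal by running each range as a background job. The sketch below is one way to do it, not something poddl provides: download_ranges and the PODDL variable are names invented for this example, and it only uses the -n and -o options shown above.

```shell
# Download several episode ranges at once, one background poddl process per range.
# PODDL is an assumed variable pointing at the binary built earlier.
PODDL="${PODDL:-./poddl}"

download_ranges() {
    url=$1
    outdir=$2
    shift 2
    for range in "$@"; do
        "$PODDL" "$url" -n "$range" -o "$outdir" &   # start this range in the background
    done
    wait   # block until every background download has finished
}
```

For example, download_ranges URL ~/Downloads 1-4 5-8 9-12 starts three downloads in parallel. Keep the number of ranges small to avoid hammering the podcast host.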

>> Transcribing Audio to Text with Whisper

With the audio files ready, the next step is transcription. For batch processing, I used mlx-whisper, an implementation of OpenAI’s Whisper model optimized for Apple Silicon, which is reportedly 30–40% faster than the original.

First, install the mlx-whisper package. I prefer uv, a faster and more robust alternative to pip:

# Using uv
uv tool install mlx-whisper
# Or with pip
# pip install mlx-whisper

Then, navigate to the directory containing your audio files and run the following command to process all MP3 files in the current directory:

mlx_whisper --model mlx-community/whisper-large-v3-turbo *.mp3

(Note that the name of the executable is different from the name of the project — there’s an underscore instead of a hyphen.)

Here, --model mlx-community/whisper-large-v3-turbo specifies the model. You can find more models in the MLX Community’s Hugging Face collection. The first time you run a transcription, the model will be downloaded and cached to ~/.cache/huggingface/hub.

(Optionally, use -f FORMAT to select an output format other than the default txt, such as srt, vtt, or json.)
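
When a long batch gets interrupted, rerunning the command on *.mp3 starts over from the first file. A small wrapper can skip episodes that already have a transcript. This is only a sketch: it assumes mlx_whisper writes NAME.txt into the current directory for NAME.mp3, which appears to be its default behavior, and transcribe_missing, WHISPER, and MODEL are names made up for this example.

```shell
# Transcribe every MP3 in the current directory that has no transcript yet.
# Assumption: mlx_whisper writes NAME.txt into the current directory for NAME.mp3.
WHISPER="${WHISPER:-mlx_whisper}"
MODEL="mlx-community/whisper-large-v3-turbo"

transcribe_missing() {
    for f in *.mp3; do
        [ -e "$f" ] || continue              # no MP3s here: the glob stayed literal
        if [ -e "${f%.mp3}.txt" ]; then      # a transcript already exists
            echo "skip: $f"
            continue
        fi
        "$WHISPER" --model "$MODEL" "$f"
    done
}
```

Run transcribe_missing from the directory holding the audio files. Files are processed one at a time, so an interrupted run can be resumed by calling it again.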

On my MacBook Air (M2), transcribing a 48-minute episode using this model took 4 minutes and 56 seconds (about 9.8x faster than real time). To estimate the total time for a large batch, first calculate the total duration of your audio files in seconds with ffprobe (included in the ffmpeg package), then divide it by that speed factor:

for f in *.mp3; do
    # -v error: suppress all output except errors
    # -of default=noprint_wrappers=1:nokey=1: keep only the raw number, no labels or formatting
    ffprobe -v error \
            -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 \
            "$f"
done | awk '{
    sum += $1
} END {
    # printf avoids scientific notation for large totals
    printf "%d\n", sum
}'
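
To turn that total into a rough wall-clock estimate, divide it by the speed factor. The helper below is a sketch with a made-up name, estimate_time. Its default factor of 9.8 is just the single measurement quoted above, so expect different numbers on other hardware or with other models.

```shell
# Convert a total audio duration in seconds into an estimated transcription time.
# The default speed factor (9.8) comes from one run on an M2 MacBook Air;
# pass your own measurement as the second argument.
estimate_time() {
    awk -v total="$1" -v speed="${2:-9.8}" 'BEGIN {
        t = int(total / speed + 0.5)   # round to the nearest second
        printf "%02d:%02d:%02d\n", int(t / 3600), int((t % 3600) / 60), t % 60
    }'
}
```

For example, estimate_time 86400 (24 hours of audio) prints 02:26:56.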

>> Refining the Transcript with an LLM

The raw text from Whisper is functional but often contains filler words, repetitions, and other conversational artifacts. A final pass with an LLM can clean this up effectively. With recent models, a simple prompt is usually sufficient:

Format the provided audio transcript into a clean, well-paragraphed written text. You may remove filler words and repetitions or correct grammatical errors, but do not otherwise summarize or delete content.

Once you have a collection of clean transcripts, you can drop them into a summarizer tool like NotebookLM. This allows you to search, ask questions, and generate summaries across the entire archive.