YouTube Transcript MCP pulls captions from a YouTube URL and returns them as plain text the agent can reason about. Works with auto-generated and uploaded captions, supports any language YouTube has captioned, and skips the “I’d need to download the video first” friction.
What it produces: a single tool call that returns a timestamped or plain-text transcript for a YouTube video. Supports language selection with fallback: for example, it fetches fr if available, falls back to en, then to the auto-generated track.
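The fallback order can be sketched in a few lines of Python. This is a minimal illustration of one reading of that order (uploaded captions in preference order, then auto-generated), not the server's actual code. `CaptionTrack` and `pick_track` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaptionTrack:
    """Hypothetical description of one caption track on a video."""
    language: str          # language code such as "fr" or "en"
    auto_generated: bool   # True for YouTube's automatic captions


def pick_track(
    tracks: Sequence[CaptionTrack],
    preferred: Sequence[str] = ("fr", "en"),
) -> Optional[CaptionTrack]:
    """Return the chosen track, or None when nothing in the fallback chain matches."""
    # First pass: uploaded captions, in preference order (fr, then en).
    for lang in preferred:
        for track in tracks:
            if track.language == lang and not track.auto_generated:
                return track
    # Second pass: auto-generated captions, preferred languages first.
    auto = [t for t in tracks if t.auto_generated]
    for lang in preferred:
        for track in auto:
            if track.language == lang:
                return track
    # Assumption: any auto-generated track beats no transcript at all.
    return auto[0] if auto else None
```

A `None` result is the signal to reach for a speech-to-text tool instead.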
Best for: turning a 60-minute interview into a structured summary in 30 seconds. Also good for pulling quotes from competitor podcasts, briefing yourself on a 90-minute keynote before a call, and repurposing your own video content into blog posts or threads.
Skip if: the video has no captions at all (for example, lecture-style content with neither uploaded nor auto-generated captions). For those, you’d need Whisper or a paid transcription API, which is a different tool.
Setup gotchas: zero-config install. The catch: YouTube auto-captions are noisy (“um”, missing punctuation, occasional word salad). Don’t paste them straight into a polished blog post — always have the model clean them up first (“rewrite this transcript as readable prose, fix punctuation, keep the speaker’s voice”).
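If you run the cleanup step often, it helps to wrap the instruction in a small helper. The sketch below assumes the transcript arrives as one string with caption line breaks. `build_cleanup_prompt` is a hypothetical name, and the whitespace flattening is an added convenience rather than something the tool requires.

```python
import re

# The cleanup instruction quoted above, kept in one place for reuse.
CLEANUP_INSTRUCTION = (
    "Rewrite this transcript as readable prose, fix punctuation, "
    "keep the speaker's voice."
)


def build_cleanup_prompt(raw_transcript: str) -> str:
    """Flatten caption line breaks and prepend the cleanup instruction."""
    # Captions come as many short lines; collapse all whitespace runs
    # so the model sees one continuous passage.
    flattened = re.sub(r"\s+", " ", raw_transcript).strip()
    return f"{CLEANUP_INSTRUCTION}\n\nTranscript:\n{flattened}"
```

The helper only prepares the prompt. The actual cleanup still happens in the model, and the output still needs a human read before publishing.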
Real-world workflow: every Anthropic / OpenAI announcement video gets fed in within hours. The agent extracts the 3-5 key claims with timestamps, then drafts a Twitter thread and a 500-word blog post. That is about 10 minutes of work vs. 60 minutes of video-watching plus writing.
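Timestamped claims are easier to get when each transcript line carries a readable time marker. Here is a small sketch, assuming segments come back as `(start_seconds, text)` pairs. That shape and both function names are assumptions for illustration, not the tool's documented output.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the video passes an hour."""
    total = int(seconds)  # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def to_timestamped_lines(segments: Iterable[Tuple[float, str]]) -> str:
    """Join segments into '[MM:SS] text' lines the model can cite."""
    return "\n".join(
        f"[{format_timestamp(start)}] {text.strip()}"
        for start, text in segments
    )
```

With markers in this form, a prompt like "list the key claims and the timestamp where each is made" gives the model something concrete to quote.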
Compatible alternatives: Fetch MCP for the article that usually accompanies a launch video, Firecrawl MCP for the surrounding announcement post.
Use it for first-draft repurposing, never for final copy without editing.