Media Processing

What it does

Processes video, audio, and image files through a multi-phase pipeline: ingest, analyze with AI (Gemini for vision, Claude for reasoning), and generate clips or summaries.

Setup required

Requires a Gemini API key for visual analysis.

Permissions

- Gemini API key required for keyframe/video analysis
- File access permissions for media files

Common prompts

| You say... | What happens |
| --- | --- |
| “Analyze this video and tell me what happens” | Full video analysis pipeline |
| “Extract the key moments from this recording” | Keyframe extraction and analysis |
| “Find the part where they discuss pricing” | Query-based video search |
| “Generate a 30-second clip of the product demo” | Video clip extraction |
| “Transcribe and analyze this podcast episode” | Audio processing |

Configuration

- Three-phase pipeline: preprocess (ingest, deduplicate), map (Gemini-powered visual analysis), reduce (Claude-powered reasoning)
- Supports keyframe extraction, dead time detection, and cost tracking
- Resumable if interrupted

Tips & gotchas

- Automatic chunking. Large media files are handled automatically: video is split into keyframes or chunks.
- Cost tracking. Shows you how much API usage each analysis requires.
- Resumable. If processing is interrupted, it picks up where it left off.
- Simple transcription? For transcription without visual analysis, use the Transcribe skill instead.
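
To make the preprocess / map / reduce flow, cost tracking, and resume behavior described under Configuration more concrete, here is a minimal Python sketch. It is an illustration only and does not show the skill's actual internals: `PipelineState`, `run_pipeline`, the JSON checkpoint file, and the flat `cost_per_call` figure are all hypothetical, and the `analyze` and `summarize` callables stand in for the Gemini and Claude calls.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class PipelineState:
    """Everything needed to resume a run (hypothetical layout)."""
    chunks: list[str] = field(default_factory=list)         # preprocess output
    analyses: dict[str, str] = field(default_factory=dict)  # chunk -> map result
    summary: Optional[str] = None                           # reduce output
    cost_usd: float = 0.0                                   # running API cost


def load_state(path: Path) -> PipelineState:
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if path.exists():
        return PipelineState(**json.loads(path.read_text()))
    return PipelineState()


def save_state(path: Path, state: PipelineState) -> None:
    path.write_text(json.dumps(asdict(state)))


def run_pipeline(
    frames: list[str],
    analyze: Callable[[str], str],          # stand-in for the vision model call
    summarize: Callable[[list[str]], str],  # stand-in for the reasoning model call
    checkpoint: Path,
    cost_per_call: float = 0.01,            # assumed flat price, for illustration
) -> PipelineState:
    state = load_state(checkpoint)

    # Phase 1: preprocess. Ingest and drop exact duplicates, keeping order.
    if not state.chunks:
        state.chunks = list(dict.fromkeys(frames))
        save_state(checkpoint, state)

    # Phase 2: map. Analyze each chunk once; skip chunks already checkpointed.
    for chunk in state.chunks:
        if chunk in state.analyses:
            continue
        state.analyses[chunk] = analyze(chunk)
        state.cost_usd += cost_per_call
        save_state(checkpoint, state)

    # Phase 3: reduce. Combine the per-chunk analyses into one summary.
    if state.summary is None:
        state.summary = summarize([state.analyses[c] for c in state.chunks])
        state.cost_usd += cost_per_call
        save_state(checkpoint, state)

    return state
```

Saving the state after every model call is what makes the sketch resumable: if a run stops partway through the map phase, the next call to `run_pipeline` with the same checkpoint path skips the chunks that were already analyzed, so their cost is not incurred twice.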