How to extract metadata from a video file
FFprobe, MediaInfo, and yt-dlp — three tools that cover every format from MP4 to MKV to a YouTube URL. What each one is best at, and what you can pull out.
Video metadata splits into two layers. The container (MP4, MOV, MKV, AVI) carries duration, codecs, bitrate, and any custom tags. The streams inside carry per-stream metadata: video resolution, framerate, color space, audio sample rate, language tags. Both matter, and there's a right tool for each.
FFprobe: the reference tool
Comes with FFmpeg. The one-liner that gives you everything as JSON:
ffprobe -v quiet -print_format json -show_format -show_streams video.mp4
Pipe to jq for filtering. -show_chapters adds chapter markers if present. The output covers duration, bitrate, codec name and parameters, color profile, audio channel layout — every field a media pipeline cares about.
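The JSON shape is regular enough to filter in code as well as with jq. A minimal sketch, using a trimmed stand-in for ffprobe's output (field names are the ones ffprobe emits; the values here are invented):

```python
import json

# Trimmed example of ffprobe -show_format -show_streams JSON (values invented).
sample = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264",
     "width": 1920, "height": 1080, "avg_frame_rate": "30000/1001"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac",
     "sample_rate": "48000", "channels": 2, "tags": {"language": "eng"}}
  ],
  "format": {"duration": "734.567000", "bit_rate": "4521000"}
}
"""

meta = json.loads(sample)

# Pick out the video stream, like `jq '.streams[] | select(.codec_type=="video")'`.
video = next(s for s in meta["streams"] if s["codec_type"] == "video")
print(video["codec_name"], video["width"], video["height"])

# Duration and bitrate live on the container ("format"), not on the streams.
duration_s = float(meta["format"]["duration"])
print(duration_s)
```

Note that ffprobe reports most numbers as strings (duration, bit_rate, sample_rate), so expect to cast.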
MediaInfo: nicer output for humans
Cross-platform GUI and CLI. mediainfo --Output=JSON video.mp4 produces output similar to ffprobe but with friendlier field names. The GUI is good for one-off inspection; the CLI is good for batch.
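MediaInfo's JSON groups everything under media.track, one entry per track with an @type discriminator. A sketch of pulling fields out of that shape, using a trimmed sample with invented values:

```python
import json

# Trimmed example of `mediainfo --Output=JSON` output (values invented).
sample = """
{
  "media": {
    "track": [
      {"@type": "General", "Duration": "734.567", "OverallBitRate": "4521000"},
      {"@type": "Video", "Format": "AVC", "Width": "1920", "Height": "1080"},
      {"@type": "Audio", "Format": "AAC", "SamplingRate": "48000"}
    ]
  }
}
"""

tracks = json.loads(sample)["media"]["track"]
by_type = {t["@type"]: t for t in tracks}

# Friendlier names than ffprobe's: Format/Width/Height vs codec_name/width/height.
# Everything arrives as strings here too.
print(by_type["Video"]["Format"], by_type["Video"]["Width"])
print(by_type["General"]["Duration"])
```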
yt-dlp: metadata from a URL without downloading
yt-dlp -j "https://youtube.com/watch?v=..." returns a single JSON blob with title, description, upload date, duration, view count, like count, channel info, and every available format/quality/codec combination. The --write-info-json flag saves it alongside a download.
Useful for pipelines that catalog videos before deciding whether to download them, or that just need YouTube/Vimeo metadata for a search index.
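The -j blob is large; a cataloging pipeline usually keeps only a handful of fields. A sketch of that reduction, using an invented, heavily trimmed stand-in for the info dict (real output has dozens more keys per format):

```python
# Heavily trimmed stand-in for a `yt-dlp -j` info dict (values invented).
info = {
    "id": "abc123",
    "title": "Example talk",
    "upload_date": "20240115",
    "duration": 1825,
    "view_count": 10432,
    "formats": [
        {"format_id": "18",  "ext": "mp4",  "height": 360,  "vcodec": "avc1.42001E"},
        {"format_id": "137", "ext": "mp4",  "height": 1080, "vcodec": "avc1.640028"},
        {"format_id": "251", "ext": "webm", "height": None, "vcodec": "none"},
    ],
}

# A catalog record: enough to decide later whether the video is worth downloading.
# Audio-only formats report height as None, so filter before taking the max.
record = {
    "id": info["id"],
    "title": info["title"],
    "duration_s": info["duration"],
    "best_height": max(f["height"] for f in info["formats"] if f["height"]),
}
print(record)
```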
Python: ffmpeg-python or pymediainfo
import json, subprocess

out = subprocess.check_output([
    "ffprobe", "-v", "quiet",
    "-print_format", "json",
    "-show_format", "-show_streams",
    "video.mp4",
])
meta = json.loads(out)
# "format" holds container-level fields; "streams" is ordered, so index 0
# is usually — but not guaranteed to be — the video stream.
print(meta["format"]["duration"], meta["streams"][0]["width"])
Wrapping ffprobe in subprocess is usually simpler than the dedicated bindings. If you're processing thousands of files, ffmpeg-python's ffmpeg.probe() helper is a bit tidier — it runs the same command and returns the parsed dict.
What's worth extracting
- Duration, framerate, resolution — for catalog and search
- Codec and bitrate — for transcode planning
- Audio language tracks — for subtitle/dub workflows
- Creation date — sometimes carried in QuickTime/MOV containers, often not in MP4
- Camera make/model — present in iPhone/Android-recorded video, accessible via -show_entries format_tags:stream_tags (on iPhone footage it lives in the container tags)
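The last three bullets all live in tag dictionaries rather than top-level fields. A sketch of collecting them from ffprobe's JSON, using invented tag values (the com.apple.quicktime.* keys are what iPhone MOV files use; footage from other sources carries different keys or none, hence the .get() calls):

```python
# Trimmed ffprobe JSON with invented tag values. Container-level tags sit
# under format.tags; per-stream tags (like language) under each stream.
meta = {
    "format": {"tags": {
        "creation_time": "2024-01-15T10:23:00.000000Z",
        "com.apple.quicktime.make": "Apple",
        "com.apple.quicktime.model": "iPhone 14",
    }},
    "streams": [
        {"codec_type": "video", "tags": {}},
        {"codec_type": "audio", "tags": {"language": "eng"}},
        {"codec_type": "audio", "tags": {"language": "spa"}},
    ],
}

fmt_tags = meta["format"].get("tags", {})
created = fmt_tags.get("creation_time")
camera = (fmt_tags.get("com.apple.quicktime.make"),
          fmt_tags.get("com.apple.quicktime.model"))
languages = [s["tags"]["language"]
             for s in meta["streams"]
             if s["codec_type"] == "audio" and "language" in s.get("tags", {})]
print(created, camera, languages)
```

Any of these can be absent, so a catalog schema should treat them all as optional.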
What you can't get
Speech transcripts aren't metadata — they're content. For those, see the YouTube transcript post or run the audio through Whisper. Same for visual content: object detection, OCR on visible frames, scene changes — these all require separate processing pipelines on the actual video stream.