How to extract metadata from a video file
FFprobe, MediaInfo, and yt-dlp — three tools that cover every format from MP4 to MKV to a YouTube URL. What each one is best at, and what you can pull out.
Video metadata splits into two layers. The container (MP4, MOV, MKV, AVI) carries duration, codecs, bitrate, and any custom tags. The streams inside carry per-stream metadata: video resolution, framerate, color space, audio sample rate, language tags. Both matter, and there's a right tool for each.
FFprobe: the reference tool
Comes with FFmpeg. The one-liner that gives you everything as JSON:
ffprobe -v quiet -print_format json -show_format -show_streams video.mp4
Pipe to jq for filtering. -show_chapters adds chapter markers if present. The output covers duration, bitrate, codec name and parameters, color profile, audio channel layout — every field a media pipeline cares about.
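The JSON shape is regular enough to filter in code as well as with jq. A minimal sketch, using a trimmed stand-in for ffprobe's output (field names are the ones ffprobe emits; the values here are invented):

```python
import json

# Trimmed example of ffprobe -show_format -show_streams JSON (values invented).
sample = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264",
     "width": 1920, "height": 1080, "avg_frame_rate": "30000/1001"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac",
     "sample_rate": "48000", "channels": 2, "tags": {"language": "eng"}}
  ],
  "format": {"duration": "734.567000", "bit_rate": "4521000"}
}
"""

meta = json.loads(sample)

# Pick out the video stream, like `jq '.streams[] | select(.codec_type=="video")'`.
video = next(s for s in meta["streams"] if s["codec_type"] == "video")
print(video["codec_name"], video["width"], video["height"])

# Duration and bitrate live on the container ("format"), not on the streams.
duration_s = float(meta["format"]["duration"])
print(duration_s)
```

Note that ffprobe reports most numbers as strings (duration, bit_rate, sample_rate), so expect to cast.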
MediaInfo: nicer output for humans
Cross-platform GUI and CLI. mediainfo --Output=JSON video.mp4 produces output similar to ffprobe but with friendlier field names. The GUI is good for one-off inspection; the CLI is good for batch.
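MediaInfo's JSON groups everything under media.track, one entry per track with an @type discriminator. A sketch of pulling fields out of that shape, using a trimmed sample with invented values:

```python
import json

# Trimmed example of `mediainfo --Output=JSON` output (values invented).
sample = """
{
  "media": {
    "track": [
      {"@type": "General", "Duration": "734.567", "OverallBitRate": "4521000"},
      {"@type": "Video", "Format": "AVC", "Width": "1920", "Height": "1080"},
      {"@type": "Audio", "Format": "AAC", "SamplingRate": "48000"}
    ]
  }
}
"""

tracks = json.loads(sample)["media"]["track"]
by_type = {t["@type"]: t for t in tracks}

# Friendlier names than ffprobe's: Format/Width/Height vs codec_name/width/height.
# Everything arrives as strings here too.
print(by_type["Video"]["Format"], by_type["Video"]["Width"])
print(by_type["General"]["Duration"])
```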
yt-dlp: metadata from a URL without downloading
yt-dlp -j "https://youtube.com/watch?v=..." returns a single JSON blob with title, description, upload date, duration, view count, like count, channel info, and every available format/quality/codec combination. The --write-info-json flag saves it alongside a download.
Useful for pipelines that catalog videos before deciding whether to download them, or that just need YouTube/Vimeo metadata for a search index.
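The -j blob is large; a cataloging pipeline usually keeps only a handful of fields. A sketch of that reduction, using an invented, heavily trimmed stand-in for the info dict (real output has dozens more keys per format):

```python
# Heavily trimmed stand-in for a `yt-dlp -j` info dict (values invented).
info = {
    "id": "abc123",
    "title": "Example talk",
    "upload_date": "20240115",
    "duration": 1825,
    "view_count": 10432,
    "formats": [
        {"format_id": "18",  "ext": "mp4",  "height": 360,  "vcodec": "avc1.42001E"},
        {"format_id": "137", "ext": "mp4",  "height": 1080, "vcodec": "avc1.640028"},
        {"format_id": "251", "ext": "webm", "height": None, "vcodec": "none"},
    ],
}

# A catalog record: enough to decide later whether the video is worth downloading.
# Audio-only formats report height as None, so filter before taking the max.
record = {
    "id": info["id"],
    "title": info["title"],
    "duration_s": info["duration"],
    "best_height": max(f["height"] for f in info["formats"] if f["height"]),
}
print(record)
```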
Python: ffmpeg-python or pymediainfo
import json, subprocess

out = subprocess.check_output([
    "ffprobe", "-v", "quiet",
    "-print_format", "json",
    "-show_format", "-show_streams",
    "video.mp4",
])
meta = json.loads(out)
# "format" holds container-level fields; "streams" is ordered, so index 0
# is usually — but not guaranteed to be — the video stream.
print(meta["format"]["duration"], meta["streams"][0]["width"])
Wrapping ffprobe in subprocess is usually simpler than the dedicated bindings. If you're processing thousands of files, ffmpeg-python's ffmpeg.probe() helper is a bit tidier — it runs the same command and returns the parsed dict.
What's worth extracting
- Duration, framerate, resolution — for catalog and search
- Codec and bitrate — for transcode planning
- Audio language tracks — for subtitle/dub workflows
- Creation date — sometimes carried in QuickTime/MOV containers, often not in MP4
- Camera make/model — present in iPhone/Android-recorded video, accessible via -show_entries format_tags:stream_tags (on iPhone footage it lives in the container tags)
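The last three bullets all live in tag dictionaries rather than top-level fields. A sketch of collecting them from ffprobe's JSON, using invented tag values (the com.apple.quicktime.* keys are what iPhone MOV files use; footage from other sources carries different keys or none, hence the .get() calls):

```python
# Trimmed ffprobe JSON with invented tag values. Container-level tags sit
# under format.tags; per-stream tags (like language) under each stream.
meta = {
    "format": {"tags": {
        "creation_time": "2024-01-15T10:23:00.000000Z",
        "com.apple.quicktime.make": "Apple",
        "com.apple.quicktime.model": "iPhone 14",
    }},
    "streams": [
        {"codec_type": "video", "tags": {}},
        {"codec_type": "audio", "tags": {"language": "eng"}},
        {"codec_type": "audio", "tags": {"language": "spa"}},
    ],
}

fmt_tags = meta["format"].get("tags", {})
created = fmt_tags.get("creation_time")
camera = (fmt_tags.get("com.apple.quicktime.make"),
          fmt_tags.get("com.apple.quicktime.model"))
languages = [s["tags"]["language"]
             for s in meta["streams"]
             if s["codec_type"] == "audio" and "language" in s.get("tags", {})]
print(created, camera, languages)
```

Any of these can be absent, so a catalog schema should treat them all as optional.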
What you can't get
Speech transcripts aren't metadata — they're content. For those, see the YouTube transcript post or run the audio through Whisper. Same for visual content: object detection, OCR on visible frames, scene changes — these all require separate processing pipelines on the actual video stream.