Extract metadata from a video file with FFprobe, MediaInfo, or yt-dlp

How to extract video metadata as JSON: duration, codecs, bitrate, resolution, framerate, chapters, stream tags, and YouTube metadata with FFprobe, MediaInfo, and yt-dlp.

To extract metadata from a video file, run ffprobe -v quiet -print_format json -show_format -show_streams video.mp4. The JSON includes duration, bitrate, container tags, video codec, resolution, framerate, audio codec, sample rate, channel layout, and stream-level metadata.

Video metadata splits into two layers. The container (MP4, MOV, MKV, AVI) carries duration, overall bitrate, format name, and any custom tags. The streams inside carry per-stream metadata: codec, video resolution, framerate, color space, audio sample rate, language tags. Both matter, and ffprobe's -show_format and -show_streams flags map onto the two layers directly.

FFprobe: the reference tool

Comes with FFmpeg. The one-liner that gives you everything as JSON:

ffprobe -v quiet -print_format json -show_format -show_streams video.mp4

Pipe the output to jq to filter it. Adding -show_chapters includes chapter markers when the file has them. The output covers duration, bitrate, codec name and parameters, color information, and audio channel layout, which is most of what a media pipeline needs.
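If the pipeline already lives in Python, the same filtering can be done on the parsed JSON instead of in jq. The sketch below assumes the dictionary came from the command above with -show_chapters added. The helper names frame_rate and chapter_list are made up for illustration, and the sample data is hand-written rather than real ffprobe output.

```python
from fractions import Fraction


def frame_rate(stream):
    """Convert ffprobe's fractional rate string (e.g. "30000/1001") to a float."""
    raw = stream.get("avg_frame_rate") or stream.get("r_frame_rate") or "0/0"
    num, _, den = raw.partition("/")
    den = den or "1"  # tolerate a plain "25" with no denominator
    if int(den) == 0:
        return None  # ffprobe reports "0/0" when the rate is unknown
    return float(Fraction(int(num), int(den)))


def chapter_list(meta):
    """Return chapters as title/start/end records, or [] if there are none."""
    return [
        {
            "title": ch.get("tags", {}).get("title", ""),
            "start": float(ch["start_time"]),
            "end": float(ch["end_time"]),
        }
        for ch in meta.get("chapters", [])
    ]


# Hand-written sample shaped like ffprobe's JSON (values are illustrative).
sample = {
    "streams": [{"codec_type": "video", "avg_frame_rate": "30000/1001"}],
    "chapters": [
        {
            "id": 0,
            "time_base": "1/1000",
            "start": 0,
            "start_time": "0.000000",
            "end": 61500,
            "end_time": "61.500000",
            "tags": {"title": "Intro"},
        }
    ],
}

video = next(s for s in sample["streams"] if s["codec_type"] == "video")
print(round(frame_rate(video), 3))  # 29.97
print(chapter_list(sample))
```

Frame rates are kept as fractions in the ffprobe output because values such as 30000/1001 have no exact decimal form, so it is usually safest to convert only at the point of display.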

MediaInfo: nicer output for humans

Cross-platform GUI and CLI. mediainfo --Output=JSON video.mp4 produces output similar to ffprobe but with friendlier field names. The GUI is good for one-off inspection; the CLI is good for batch.
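MediaInfo's JSON nests everything under media.track, one entry per track, with an "@type" key that separates the General (container) track from Video, Audio, and the rest. A minimal sketch of indexing that output in Python follows. The tracks_by_type helper is hypothetical, the sample is a trimmed, hand-written stand-in for real output, and the assumption that numeric values arrive as strings reflects commonly seen output and may differ between MediaInfo versions.

```python
import json
from collections import defaultdict


def tracks_by_type(mediainfo_json):
    """Group MediaInfo tracks by their "@type" (General, Video, Audio, ...)."""
    doc = json.loads(mediainfo_json)
    grouped = defaultdict(list)
    for track in doc["media"]["track"]:
        grouped[track["@type"]].append(track)
    return dict(grouped)


# Hand-written sample shaped like `mediainfo --Output=JSON` output.
sample = json.dumps({
    "media": {
        "@ref": "video.mp4",
        "track": [
            {"@type": "General", "Format": "MPEG-4", "Duration": "10.000"},
            {"@type": "Video", "Format": "AVC", "Width": "1920", "Height": "1080"},
            {"@type": "Audio", "Format": "AAC", "Channels": "2"},
        ],
    }
})

tracks = tracks_by_type(sample)
general = tracks["General"][0]
video = tracks["Video"][0]

# Values are strings in this sample, so convert before doing arithmetic.
print(general["Format"], float(general["Duration"]))
print(int(video["Width"]), int(video["Height"]))
```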

yt-dlp: metadata from a URL without downloading

yt-dlp -j "https://youtube.com/watch?v=..." returns a single JSON blob with title, description, upload date, duration, view count, like count, channel info, and every available format/quality/codec combination. The --write-info-json flag saves it alongside a download.

Useful for pipelines that catalog videos before deciding whether to download them, or that just need YouTube/Vimeo metadata for a search index.
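The catalog-then-decide pattern can be sketched with yt-dlp's Python API, where YoutubeDL.extract_info(url, download=False) returns the same information as -j in dictionary form. The helpers available_heights and worth_downloading, the thresholds, and the sample dictionary are all illustrative assumptions. The yt_dlp import sits inside fetch_info so the rest of the sketch runs without the package installed.

```python
def fetch_info(url):
    """Return yt-dlp's metadata dict for a URL without downloading media."""
    import yt_dlp  # third-party package: pip install yt-dlp

    with yt_dlp.YoutubeDL({"quiet": True, "skip_download": True}) as ydl:
        return ydl.extract_info(url, download=False)


def available_heights(info):
    """Distinct video heights on offer, tallest first (audio-only formats skipped)."""
    heights = {
        f["height"]
        for f in info.get("formats", [])
        if f.get("vcodec") not in (None, "none") and f.get("height")
    }
    return sorted(heights, reverse=True)


def worth_downloading(info, min_height=720, max_seconds=3600):
    """Hypothetical policy: require a minimum resolution and cap the duration."""
    heights = available_heights(info)
    duration = info.get("duration") or 0
    return bool(heights) and heights[0] >= min_height and duration <= max_seconds


# Hand-written sample using a few of the keys yt-dlp puts in its info dict.
sample_info = {
    "title": "Example",
    "duration": 212,
    "formats": [
        {"format_id": "140", "vcodec": "none", "acodec": "mp4a.40.2", "height": None},
        {"format_id": "136", "vcodec": "avc1.4d401f", "acodec": "none", "height": 720},
        {"format_id": "137", "vcodec": "avc1.640028", "acodec": "none", "height": 1080},
    ],
}

print(available_heights(sample_info))   # [1080, 720]
print(worth_downloading(sample_info))   # True
```

In a real pipeline, fetch_info(url) would supply the dictionary that sample_info stands in for here.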

Python: ffmpeg-python or pymediainfo

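import json
import subprocess

# Ask ffprobe for container and stream metadata as JSON.
out = subprocess.check_output([
    "ffprobe", "-v", "quiet",
    "-print_format", "json",
    "-show_format", "-show_streams",
    "video.mp4",
])
meta = json.loads(out)

# streams[0] is not guaranteed to be video, so pick the first video stream.
video = next(s for s in meta["streams"] if s["codec_type"] == "video")
print(meta["format"]["duration"], video["width"])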

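Wrapping ffprobe in subprocess is usually simpler than the dedicated bindings. If you're processing thousands of files, ffmpeg-python's ffmpeg.probe() function is a bit tidier: it runs ffprobe for you and returns the parsed dictionary. pymediainfo plays the same role for MediaInfo.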
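When the file count gets large, most of the time goes to waiting on ffprobe processes, so a thread pool is one reasonable way to overlap them. The following is a minimal sketch, not a tuned implementation: probe_file and probe_many are hypothetical names, the worker count of 8 is an arbitrary default, and the probe function is a parameter so that a stub can be swapped in for testing.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def probe_file(path):
    """Run ffprobe on one file and return the parsed JSON."""
    out = subprocess.check_output([
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        str(path),
    ])
    return json.loads(out)


def probe_many(paths, probe=probe_file, workers=8):
    """Probe files concurrently; returns {path: metadata, or the exception raised}."""

    def safe(path):
        # One unreadable file should not abort the whole batch.
        try:
            return probe(path)
        except Exception as exc:
            return exc

    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(paths, pool.map(safe, paths)))
```

A call such as probe_many(["a.mp4", "b.mkv"]) returns one entry per path. Checking isinstance(value, Exception) separates the failures from the successful probes.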

What's worth extracting

  • Duration, framerate, resolution: for catalog and search
  • Codec and bitrate: for transcode planning
  • Audio language tracks: for subtitle/dub workflows
  • Creation date: the creation_time tag, sometimes carried in QuickTime/MOV containers, often not in MP4
  • Camera make/model: often present in phone-recorded video. iPhone files typically carry them as container-level tags (com.apple.quicktime.make and com.apple.quicktime.model), so check -show_entries format_tags as well as stream_tags
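The fields above can be pulled out of ffprobe's JSON into one flat record. This is a sketch under stated assumptions: catalog_record and first_tag are hypothetical helpers, the sample is hand-written, and the tag names looked up for camera make and model (the com.apple.quicktime.* keys plus generic make/model) are guesses that will not match every device. Tags are searched at the container level first and then per stream, because writers differ on where they put them.

```python
def first_tag(meta, *names):
    """Find the first tag whose key matches any of names (case-insensitive),
    searching container tags first, then each stream's tags."""
    wanted = {n.lower() for n in names}
    tag_sets = [meta.get("format", {}).get("tags", {})]
    tag_sets += [s.get("tags", {}) for s in meta.get("streams", [])]
    for tags in tag_sets:
        for key, value in tags.items():
            if key.lower() in wanted:
                return value
    return None


def catalog_record(meta):
    """Flatten ffprobe JSON into the fields a catalog typically needs."""
    fmt = meta.get("format", {})
    streams = meta.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    return {
        # ffprobe emits numbers in "format" as strings, so convert them.
        "duration_s": float(fmt["duration"]) if "duration" in fmt else None,
        "bit_rate": int(fmt["bit_rate"]) if "bit_rate" in fmt else None,
        "video_codec": video.get("codec_name"),
        "resolution": (video.get("width"), video.get("height")),
        # "und" (undetermined) stands in when a track has no language tag.
        "audio_languages": [s.get("tags", {}).get("language", "und") for s in audio],
        "creation_time": first_tag(meta, "creation_time"),
        "camera_make": first_tag(meta, "com.apple.quicktime.make", "make"),
        "camera_model": first_tag(meta, "com.apple.quicktime.model", "model"),
    }


# Hand-written sample shaped like ffprobe output for a phone recording.
sample_meta = {
    "format": {
        "duration": "12.500000",
        "bit_rate": "8000000",
        "tags": {
            "creation_time": "2024-01-01T00:00:00.000000Z",
            "com.apple.quicktime.make": "Apple",
        },
    },
    "streams": [
        {"codec_type": "video", "codec_name": "hevc", "width": 1920, "height": 1080},
        {"codec_type": "audio", "codec_name": "aac", "tags": {"language": "eng"}},
    ],
}

print(catalog_record(sample_meta))
```

Missing fields come back as None rather than raising, which keeps a batch run going when a file lacks a creation date or camera tags.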

What you can't get

Speech transcripts aren't metadata — they're content. For those, see the YouTube transcript post or run the audio through Whisper. Same for visual content: object detection, OCR on visible frames, scene changes — these all require separate processing pipelines on the actual video stream.
