Tutorial · April 24, 2026 · 5 min read

How to extract images from a website (or a URL)

Browser tools, wget, gallery-dl, and the legal lines around scraping images from sites like Instagram, Pinterest, and stock photo libraries. What's safe, what's gray, and what to skip.

By Dawid Sibinski

Scraping images from a website is a one-liner technically and a minefield legally. The technical part is well-trodden; the legal and ethical part decides whether you should run the one-liner.

One image: browser tools

Right-click → Save image as — the simplest case. For images loaded as CSS backgrounds or via canvas (where right-click won't surface a file), open DevTools → Network → filter by Img, find the request, right-click → Open in new tab → save.

Every image on a page

Browser extensions (Image Downloader, DownThemAll) walk the DOM, collect every <img> source, and offer a multi-select download. Quick, no install beyond the extension.

Command line: wget -r -l 1 -A jpg,jpeg,png,webp -nd -P out/ https://example.com/page recursively pulls images linked from the page into out/. -l 1 limits recursion depth, -A filters by extension (files that don't match are deleted after download), -nd flattens the directory structure, and -P sets the destination directory.
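If wget's recursion grabs more than you want, the same idea works in two explicit steps: list the <img> URLs first, then download only the ones you care about. A minimal sketch; the inline HTML and CDN URLs below are invented stand-ins for what curl -s would actually return from a real page:

```shell
#!/bin/sh
# Stand-in for: html=$(curl -s https://example.com/page)
html='<div><img src="https://cdn.example.com/a.jpg"><img src="/img/b.png"></div>'

# List every src="..." attribute value, one per line
printf '%s\n' "$html" | grep -o 'src="[^"]*"' | sed 's/^src="//; s/"$//'
```

Relative paths like /img/b.png need the site's origin prefixed before downloading, and a real page also loads images via srcset and CSS backgrounds, which this naive grep won't see.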

Galleries and social platforms: gallery-dl

gallery-dl is the canonical tool for downloading image sets from sites with structured galleries — Imgur, Flickr, Pinterest, Reddit, DeviantArt, plenty of others. It can reuse your browser cookies for sites that require login, respects rate limits, and names files predictably:

gallery-dl https://www.flickr.com/photos/username/albums/...

Instagram, specifically

Instagram is the case where the legal/operational risk dwarfs the technical complexity. They actively block scrapers, change their internal API regularly, and their terms explicitly prohibit automated collection.

If you have a legitimate reason to download a few images you have permission to use: open the post, view the source HTML, find the og:image meta tag — the URL points to the full-resolution version. Right-click that URL → save.
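That view-source step can be scripted for any page that publishes an og:image tag. A sketch with a hardcoded sample head standing in for the fetched HTML; the scontent URL is invented for illustration:

```shell
#!/bin/sh
# Stand-in for: html=$(curl -s "$POST_URL")
html='<head><meta property="og:image" content="https://scontent.example/full-res.jpg"/></head>'

# Pull out the content="" value of the og:image meta tag
img=$(printf '%s' "$html" | sed -n 's/.*property="og:image" content="\([^"]*\)".*/\1/p')
echo "$img"
# Then save the full-resolution file with: curl -sLO "$img"
```

Real pages may order the attributes differently (content before property), so for anything beyond a quick check a tolerant HTML parser beats sed.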

For volume work, the only sustainable path is the official Instagram Graph API for business accounts you own, or a paid third-party data provider that wears the operational risk. Don't build a long-running Instagram scraper from scratch unless you enjoy losing access.

From a single URL

Sometimes the URL points to a page that contains an image, not the image itself. Two patterns:

  • The page is an article with a hero image — the og:image meta tag almost always points to the right URL at the right resolution.
  • The page is an image hosting service (Imgur, i.imgur.com) — strip query parameters and the resulting URL is usually the file directly.
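Stripping the query string is a one-liner in POSIX shell; the Imgur-style URL here is made up:

```shell
#!/bin/sh
url='https://i.imgur.com/AbC123.jpg?fb&size=large'   # hypothetical example URL
# Parameter expansion: remove the first ? and everything after it
direct="${url%%\?*}"
echo "$direct"
```

The result, https://i.imgur.com/AbC123.jpg, usually resolves to the file itself and can be saved directly.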

Legal and ethical lines

  • Copyright applies to images on the public web. Downloading for personal reference is generally fine; republishing is not.
  • Stock photo sites (Getty, Shutterstock) watermark thumbnails specifically because they expect scraping. Removing watermarks adds copyright violation on top of unauthorized use.
  • robots.txt and the site's terms of service — both worth reading before bulk operations. Neither makes scraping illegal on its own, but both shape your legal exposure if a dispute happens.

When you actually want the data, not the pixels

If the goal is what's in the image (text, structured data, product info) rather than the image file itself, ExtractFox's website extractor reads the rendered page and returns structured fields. Skips the "download then OCR then parse" step.
