How to extract data from a website to Excel automatically
Power Query, Make/Zapier, and a scheduled AI-extraction route: three ways to set up an automated pipeline from a website to an Excel file that updates on its own.
Manual web-to-Excel is fine once. Manual web-to-Excel every Monday is a different problem. The right automation depends on whether the source has a stable HTML table, an API, or neither.
1. Power Query (built into Excel)
If the page has an HTML table or a JSON endpoint, Power Query is the right tool and it's already installed. Data → Get Data → From Web → paste the URL. Power Query inspects the page, lists every table it can find, and lets you preview before loading.
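If you're not sure whether the page exposes a JSON endpoint, the browser's network tab will usually reveal one; a quick script can confirm it returns the data you expect before you point Power Query at it. A minimal sketch, with a placeholder URL:

```python
import requests

# Placeholder endpoint: in practice, open the browser's network tab and
# copy the request the page makes for its data.
URL = "https://example.com/api/listings?page=1"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
data = resp.json()

# A top-level list of flat records loads into Power Query as one row per
# item with almost no transformation steps.
print(type(data).__name__)
print(data[:2] if isinstance(data, list) else list(data)[:5])
```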
Once loaded, the query lives in the workbook. Right-click the table → Refresh to re-pull. Set the query's properties to refresh when the file opens (query Properties → "Refresh data when opening the file"), or wire the workbook to a Power Automate flow for hourly or daily refreshes.
Limits: Power Query doesn't run JavaScript, so single-page apps that render data client-side are invisible to it. Its authentication also stops at basic credentials, API keys, and organizational (OAuth) accounts, so sites that require scripted login flows or session cookies won't work.
2. Make, Zapier, or n8n + Google Sheets / Excel
For sites that don't play nice with Power Query, the common pattern is a scraping service (Apify, ScrapingBee, Browserless) feeding rows into Make/Zapier, which writes them to a Google Sheet or Excel Online. Total monthly cost is usually $20–$60 depending on volume.
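The glue between the scraper and the sheet is usually just an HTTP POST to a webhook. A sketch of that handoff, assuming a Make-style custom webhook (the URL and row fields are placeholders):

```python
import requests

# Placeholder URL: Make and Zapier both hand you one when you add a
# "Custom webhook" / "Catch Hook" trigger.
WEBHOOK_URL = "https://hook.example.make.com/your-webhook-id"

# Example rows; the field names are whatever your scenario maps to columns.
rows = [
    {"product": "Widget A", "price": 19.99, "in_stock": True},
    {"product": "Widget B", "price": 24.50, "in_stock": False},
]

for row in rows:
    # One POST per row keeps the Make/Zapier side trivial: each incoming
    # request becomes one "add a row" step in the scenario.
    requests.post(WEBHOOK_URL, json=row, timeout=30).raise_for_status()
```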
The pattern works well for ecommerce price tracking, job board monitoring, and listing aggregation: anything where a structured page changes regularly and you need the updates in a sheet.
3. AI-based extraction on a schedule
The newer pattern: a scheduled job hits the page, screenshots it (or grabs the rendered HTML), runs it through a multimodal extraction API, and appends rows to a sheet. Wins when the page layout changes frequently or you want to extract from many pages with slightly different structures.
ExtractFox's website extractor handles the extraction step as a manual tool today; for full automation, call the API from a Vercel cron job or a GitHub Action and write to Excel via the Microsoft Graph API. A few dozen lines of code, sketched below.
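Here's roughly what those lines look like. Everything named here is an assumption: the extraction endpoint and its request shape are illustrative stand-ins (check your provider's docs for the real contract), and the Graph token is obtained separately by whatever auth flow fits your tenant.

```python
import os
import requests

# Placeholders throughout: EXTRACT_API is a hypothetical endpoint, and the
# tokens are assumed to be injected as secrets by the scheduler.
EXTRACT_API = "https://api.example.com/extract"      # hypothetical endpoint
EXTRACT_KEY = os.environ["EXTRACT_API_KEY"]
GRAPH_TOKEN = os.environ["GRAPH_TOKEN"]
WORKBOOK_ID = os.environ["WORKBOOK_ITEM_ID"]         # drive item id of the .xlsx
TABLE_NAME = "Prices"                                # Excel table to append to

# 1. Ask the extraction API for structured rows from the target page.
extract = requests.post(
    EXTRACT_API,
    headers={"Authorization": f"Bearer {EXTRACT_KEY}"},
    json={"url": "https://example.com/listings",
          "fields": ["name", "price", "updated"]},
    timeout=120,
)
extract.raise_for_status()
rows = extract.json()["rows"]  # assumed shape: 2-D list, one inner list per row

# 2. Append the rows to the Excel table via Microsoft Graph; workbook
#    tables expose a rows/add action that takes a 2-D "values" array.
graph_url = (
    "https://graph.microsoft.com/v1.0/me/drive/items/"
    f"{WORKBOOK_ID}/workbook/tables/{TABLE_NAME}/rows/add"
)
resp = requests.post(
    graph_url,
    headers={"Authorization": f"Bearer {GRAPH_TOKEN}"},
    json={"values": rows},
    timeout=60,
)
resp.raise_for_status()
print(f"Appended {len(rows)} rows to {TABLE_NAME}")
```

In a GitHub Action, the schedule is a `schedule:` cron trigger and the environment variables come from repository secrets; on Vercel, a cron entry in vercel.json hits the same logic as a serverless function.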
Choosing
- Stable HTML table → Power Query. Free, no extra services.
- Dynamic JS-heavy site, recurring schedule → Apify + Make → Google Sheets.
- Variable layouts, multiple sources → AI extraction on a cron.
What to watch for
Whatever route you pick: respect robots.txt, throttle requests, and check the site's terms of service. Automated scraping at scale draws attention even when the site doesn't block you on day one.
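The first two points are a few lines of standard-library code in whatever script does the fetching. A minimal sketch, with a placeholder target:

```python
import time
import urllib.robotparser

import requests

BASE = "https://example.com"  # placeholder target site
PAGES = [f"{BASE}/listings?page={i}" for i in range(1, 6)]
AGENT = "my-sheet-sync-bot"   # identify yourself honestly

# Check robots.txt before fetching; urllib.robotparser is standard library.
rp = urllib.robotparser.RobotFileParser()
rp.set_url(f"{BASE}/robots.txt")
rp.read()

for url in PAGES:
    if not rp.can_fetch(AGENT, url):
        print(f"robots.txt disallows {url}, skipping")
        continue
    resp = requests.get(url, headers={"User-Agent": AGENT}, timeout=30)
    resp.raise_for_status()
    # ... parse and store rows ...
    time.sleep(2)  # throttle: a couple of seconds between requests as a floor
```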