How to Do Web Scraping on Claude Code (Practical Guide)
Web scraping on Claude Code means using its built-in bash execution and tool-use capabilities to fetch, parse, and extract data from websites directly inside your terminal workflow. It's best suited for developers who want to automate data extraction without switching to a separate notebook or browser environment. The main trade-off: Claude Code shares its usage limits with the rest of your Claude usage, so scraping-heavy sessions burn through your window fast.
- Claude Code can run curl, wget, Python (requests, BeautifulSoup), and Node.js scrapers natively via its bash tool
- Heavy scraping workloads can push you into a usage limit mid-session
- Knowing your reset window before you start a scraping job prevents surprise lockouts
What is web scraping in the context of Claude Code?
Claude Code is a terminal-native AI coding agent. It can read files, edit code, and run shell commands directly on your machine. Web scraping fits into this naturally: you describe what data you need, and Claude Code writes and executes the scraper for you.
Unlike ChatGPT with browser plugins, Claude Code scraping runs locally. The HTTP requests come from your machine, the files land in your project directory, and every step is transparent in your terminal. There's no cloud sandbox you can't inspect.
How to scrape a website using Claude Code: step by step
1. Describe the target and desired output
Start a Claude Code session in your project folder and tell it exactly what you want:
Scrape the product names, prices, and URLs from https://example.com/products and save them to products.csv
Being specific about the output format (CSV, JSON, SQLite) upfront avoids iterative back-and-forth that eats into your usage window.
2. Let Claude Code scaffold the scraper
Claude Code will typically generate a Python script using requests and BeautifulSoup, or a Node.js script with cheerio, depending on what's already in your environment. For JavaScript-heavy pages, it may reach for playwright or puppeteer.
A typical generated Python scraper looks like:
```python
import requests
from bs4 import BeautifulSoup
import csv

url = "https://example.com/products"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(resp.text, "html.parser")

with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price", "url"])
    for item in soup.select(".product-card"):
        name = item.select_one(".product-name").text.strip()
        price = item.select_one(".price").text.strip()
        link = item.select_one("a")["href"]
        writer.writerow([name, price, link])
```
3. Run it via the bash tool
Claude Code executes the script using its bash tool, shows you stdout/stderr in real time, and iterates on errors automatically. If BeautifulSoup isn't installed, it runs pip install beautifulsoup4 first. You stay in the terminal the whole time.
4. Handle pagination and rate limiting
Ask Claude Code to handle pagination by detecting "next page" links or looping over URL patterns. For rate limiting, describe the delay you want:
Add a 1-second delay between requests and handle 429 responses with exponential backoff
Claude Code will add time.sleep() and retry logic without you having to write it manually.
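The generated retry logic usually looks something like this sketch (the function name and delay values are illustrative, not Claude Code's exact output):

```python
import time
import requests

def fetch_with_backoff(url, max_retries=5, base_delay=1.0):
    """Fetch a URL, backing off exponentially on HTTP 429 responses."""
    for attempt in range(max_retries):
        resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if the server sends one, else back off exponentially
        delay = float(resp.headers.get("Retry-After", base_delay * 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts: {url}")

# Polite crawl loop: fixed 1-second delay between page fetches
# for page_url in page_urls:
#     resp = fetch_with_backoff(page_url)
#     time.sleep(1)
```

Respecting a server-supplied Retry-After header, when present, is generally better etiquette than a fixed backoff schedule.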
5. Use Playwright for JavaScript-rendered pages
Static HTML scrapers won't work on React or Vue SPAs. Tell Claude Code the page requires JavaScript rendering and it will scaffold a Playwright script instead:
The page at https://example.com uses React. Use Playwright to scrape the product list after it renders.
Claude Code installs Playwright, writes the async script, and runs it. The only prerequisite is that Playwright's browser binaries are available on your system (playwright install chromium).
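A minimal sketch of such a script, using Playwright's synchronous Python API (the function name and `.product-card` selector are illustrative assumptions):

```python
def scrape_rendered_products(url, selector=".product-card"):
    """Render a JS-heavy page in headless Chromium and return the text of
    each matching element. Assumes `pip install playwright` followed by
    `playwright install chromium` has been run."""
    # Imported lazily so a module mixing static and rendered scrapers
    # still loads when Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.wait_for_selector(selector)  # block until the framework has rendered the list
        texts = [el.inner_text() for el in page.query_selector_all(selector)]
        browser.close()
        return texts
```

`wait_for_selector` is the key difference from a static scraper: it waits for the client-side render to finish before extraction.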
Using the /web-fetch slash command for quick single-page scraping
For one-off extractions, Claude Code's /web-fetch slash command is the fastest path. It fetches a URL and returns the content directly into the conversation context:
/web-fetch https://news.ycombinator.com
From there you can ask Claude Code to extract specific data, summarize, or reformat the content. No script required. See the full Claude Code slash commands list for other shortcuts that speed up your workflow.
The limitation: /web-fetch is best for static pages and single URLs. For bulk scraping or pagination, a dedicated script is more reliable.
Web scraping patterns that work well with Claude Code
- E-commerce price monitoring: Scrape competitor pricing into a CSV or SQLite database on a cron schedule
- Job board aggregation: Pull listings from multiple boards into a unified JSON feed
- Documentation extraction: Scrape API docs or changelogs and pipe them into a local vector store for RAG
- Research data collection: Pull structured tables from Wikipedia, government data portals, or academic sites
- Link checking: Crawl a site and flag 404s or broken internal links
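The link-checking pattern, for example, can be sketched in a few lines of Python (the crawl limit is arbitrary, and a quick href regex stands in for full HTML parsing in this sketch):

```python
import re
import requests
from urllib.parse import urljoin, urlparse

def find_broken_links(start_url, max_pages=50):
    """Breadth-first crawl of same-host pages, returning URLs that respond 404."""
    host = urlparse(start_url).netloc
    queue, seen, broken = [start_url], set(), []
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        resp = requests.get(url, timeout=30)
        if resp.status_code == 404:
            broken.append(url)
            continue
        if urlparse(url).netloc != host:
            continue  # flag external 404s above, but don't crawl off-site
        for href in re.findall(r'href="([^"#]+)"', resp.text):
            queue.append(urljoin(url, href))
    return broken
```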
Watch your usage window during long scraping sessions
This is where most developers get caught out. Web scraping sessions are verbose: Claude Code reads the page structure, writes the script, debugs selectors, handles errors, and iterates. That's a lot of tokens per page scraped.
If you're mid-scrape when your usage limit hits, you're locked out for up to 5 hours. You can check your current usage with the /usage command inside Claude Code, or visit claude.ai/settings/usage. Neither gives you proactive alerts before the limit hits.
Usagebar sits in your macOS menu bar and shows a live usage gauge with smart alerts at 50%, 75%, and 90% of your limit. You see the warning before you're blocked, not after. It uses your macOS Keychain to store credentials securely and shows exactly when your usage window resets so you can plan your scraping jobs accordingly.
Pricing is pay-what-you-want, with a free option for students. Get Usagebar before your next big scraping session.
| Usage check method | Real-time? | Proactive alerts? | Reset time visible? |
|---|---|---|---|
| /usage command | Yes | No | No |
| claude.ai/settings/usage | Yes | No | No |
| Usagebar | Yes | Yes (50/75/90%) | Yes |
Common issues and how Claude Code handles them
Cloudflare and bot protection
Many sites use Cloudflare or similar WAFs. Claude Code can't bypass these automatically. For protected sites, ask Claude Code to use Playwright with stealth mode plugins (playwright-stealth), or to scrape via a proxy service. Understand and comply with each site's robots.txt and terms of service before scraping.
Selector drift
Sites redesign their HTML and break selectors. Claude Code is good at diagnosing selector failures when you paste the current error output and ask it to fix the script. Keep your scrapers in version control so you can diff changes when a scraper breaks.
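One defensive habit worth asking Claude Code for: null-safe selector access, so a redesigned page produces an obvious gap in the output instead of an AttributeError. A sketch (the helper name is illustrative):

```python
def safe_text(parent, selector, default=""):
    """Return stripped text for the first element matching `selector`,
    or `default` when the element is missing -- e.g. after a redesign."""
    el = parent.select_one(selector)
    return el.get_text(strip=True) if el is not None else default
```

In the CSV loop from earlier, `safe_text(item, ".price", "N/A")` keeps the scraper running and makes the broken column easy to spot in the output.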
Missing dependencies
Claude Code checks your environment and installs missing packages via pip or npm. If you're in a restricted environment (no internet, corporate proxy), tell Claude Code upfront so it generates requirements.txt or package.json for offline install.
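A minimal requirements.txt covering the scrapers in this guide might look like this (version pins are illustrative):

```
requests>=2.31
beautifulsoup4>=4.12
playwright>=1.40
```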
Key takeaways
- Describe your target URL and desired output format clearly at the start of the session
- Use /web-fetch for quick single-page extractions; ask for a Python or Playwright script for bulk scraping
- Request rate limiting and pagination handling explicitly
- Always respect robots.txt and site terms of service
- Monitor your usage with Usagebar before a long scraping session to avoid mid-job lockouts
- Keep scraper scripts in version control to handle selector drift over time