
How to Do Web Scraping on Claude Code (Practical Guide)

Web scraping on Claude Code means using its built-in bash execution and tool-use capabilities to fetch, parse, and extract data from websites directly inside your terminal workflow. It's best suited for developers who want to automate data extraction without switching to a separate notebook or browser environment. The main trade-off: Claude Code shares its usage window with everything else you do on your Claude plan, so scraping-heavy sessions burn through tokens fast.

  • Claude Code can run curl, wget, Python (requests, BeautifulSoup), and Node.js scrapers natively via its bash tool
  • Heavy scraping workloads can push you into a usage limit mid-session
  • Knowing your reset window before you start a scraping job prevents surprise lockouts

What is web scraping in the context of Claude Code?

Claude Code is a terminal-native AI coding agent. It can read files, edit code, and run shell commands directly on your machine. Web scraping fits into this naturally: you describe what data you need, and Claude Code writes and executes the scraper for you.

Unlike ChatGPT with browser plugins, Claude Code scraping runs locally. The HTTP requests come from your machine, the files land in your project directory, and every step is transparent in your terminal. There's no cloud sandbox you can't inspect.

How to scrape a website using Claude Code: step by step

1. Describe the target and desired output

Start a Claude Code session in your project folder and tell it exactly what you want:

Scrape the product names, prices, and URLs from https://example.com/products and save them to products.csv

Being specific about the output format (CSV, JSON, SQLite) upfront avoids iterative back-and-forth that eats into your usage window.

2. Let Claude Code scaffold the scraper

Claude Code will typically generate a Python script using requests and BeautifulSoup, or a Node.js script with cheerio, depending on what's already in your environment. For JavaScript-heavy pages, it may reach for playwright or puppeteer.

A typical generated Python scraper looks like:

import csv

import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(resp.text, "html.parser")

with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price", "url"])
    for item in soup.select(".product-card"):
        # select_one returns None when a selector misses, so guard each field
        name_el = item.select_one(".product-name")
        price_el = item.select_one(".price")
        link_el = item.select_one("a")
        if not (name_el and price_el and link_el):
            continue
        writer.writerow([name_el.text.strip(), price_el.text.strip(), link_el["href"]])

3. Run it via the bash tool

Claude Code executes the script using its bash tool, shows you stdout/stderr in real time, and iterates on errors automatically. If BeautifulSoup isn't installed, it runs pip install beautifulsoup4 first. You stay in the terminal the whole time.

4. Handle pagination and rate limiting

Ask Claude Code to handle pagination by detecting "next page" links or looping over URL patterns. For rate limiting, describe the delay you want:

Add a 1-second delay between requests and handle 429 responses with exponential backoff

Claude Code will add time.sleep() and retry logic without you having to write it manually.
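The retry logic it generates typically looks something like this sketch (the function name, defaults, and injectable session are illustrative, not a fixed output):

```python
import time

import requests


def fetch_with_backoff(url, max_retries=5, base_delay=1.0, session=None):
    """GET a URL, retrying 429 responses with exponential backoff."""
    sess = session or requests.Session()
    delay = base_delay
    for _ in range(max_retries):
        resp = sess.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Honor the server's Retry-After header if present, else back off exponentially
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"Gave up on {url} after {max_retries} retries")
```

Passing a `session` in makes the retry logic testable without touching the network, which also helps when Claude Code iterates on the script.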

5. Use Playwright for JavaScript-rendered pages

Static HTML scrapers won't work on React or Vue SPAs. Tell Claude Code the page requires JavaScript rendering and it will scaffold a Playwright script instead:

The page at https://example.com uses React. Use Playwright to scrape the product list after it renders.

Claude Code installs Playwright, writes the async script, and runs it. The only prerequisite is that Playwright's browser binaries are available on your system (playwright install chromium).
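A minimal version of the kind of Playwright script it produces might look like this sketch (URL and selector are placeholders; the import is deferred so the file still loads before `playwright install` has run):

```python
import asyncio


async def scrape_rendered(url: str, selector: str) -> list[str]:
    # Deferred import: the module loads even if Playwright isn't installed yet
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # Wait for the network to go quiet so the SPA has finished rendering
        await page.goto(url, wait_until="networkidle")
        texts = await page.locator(selector).all_inner_texts()
        await browser.close()
        return texts


if __name__ == "__main__":
    print(asyncio.run(scrape_rendered("https://example.com", ".product-name")))
```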

Using the built-in WebFetch tool for quick single-page scraping

For one-off extractions, Claude Code's built-in WebFetch tool is the fastest path. Give it a URL and describe what you need, and it fetches the page and returns the content directly into the conversation context:

Fetch https://news.ycombinator.com and extract the top story titles

From there you can ask Claude Code to extract specific data, summarize, or reformat the content. No script required. See the full Claude Code slash commands list for other shortcuts that speed up your workflow.

The limitation: WebFetch is best for static pages and single URLs. For bulk scraping or pagination, a dedicated script is more reliable.

Web scraping patterns that work well with Claude Code

  • E-commerce price monitoring: Scrape competitor pricing into a CSV or SQLite database on a cron schedule
  • Job board aggregation: Pull listings from multiple boards into a unified JSON feed
  • Documentation extraction: Scrape API docs or changelogs and pipe them into a local vector store for RAG
  • Research data collection: Pull structured tables from Wikipedia, government data portals, or academic sites
  • Link checking: Crawl a site and flag 404s or broken internal links
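For the cron-scheduled patterns above, the schedule itself lives outside Claude Code. A crontab entry like this sketch (project path and script name are placeholders) runs a scraper daily:

```shell
# crontab -e -- run the scraper every day at 06:00, logging output for debugging
# (path and script name are illustrative)
0 6 * * * cd /path/to/project && python3 scrape_prices.py >> scrape.log 2>&1
```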

Watch your usage window during long scraping sessions

This is where most developers get caught out. Web scraping sessions are verbose: Claude Code reads the page structure, writes the script, debugs selectors, handles errors, and iterates. That's a lot of tokens per page scraped.

If you're mid-scrape when your usage limit hits, you're locked out until the window resets, which can be up to 5 hours away. You can check your current usage with the /usage command inside Claude Code, or visit claude.ai/settings/usage. Neither gives you a proactive alert before the limit hits.

Usagebar sits in your macOS menu bar and shows a live usage gauge with smart alerts at 50%, 75%, and 90% of your limit. You see the warning before you're blocked, not after. It uses your macOS Keychain to store credentials securely and shows exactly when your usage window resets so you can plan your scraping jobs accordingly.

Pricing is pay-what-you-want, with a free option for students. Get Usagebar before your next big scraping session.

| Usage check method | Real-time? | Proactive alerts? | Reset time visible? |
| --- | --- | --- | --- |
| /usage command | Yes | No | No |
| claude.ai/settings/usage | Yes | No | No |
| Usagebar | Yes | Yes (50/75/90%) | Yes |

Common issues and how Claude Code handles them

Cloudflare and bot protection

Many sites use Cloudflare or similar WAFs. Claude Code can't bypass these automatically. For protected sites, ask Claude Code to use Playwright with stealth mode plugins (playwright-stealth), or to scrape via a proxy service. Understand and comply with each site's robots.txt and terms of service before scraping.

Selector drift

Sites redesign their HTML and break selectors. Claude Code is good at diagnosing selector failures when you paste the current error output and ask it to fix the script. Keep your scrapers in version control so you can diff changes when a scraper breaks.
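One way to soften selector drift is a fallback chain: keep old selectors in an ordered list so the scraper survives a redesign until you update it. A sketch (the selector names are hypothetical):

```python
from bs4 import BeautifulSoup

# Ordered from current markup to older fallbacks (names are illustrative)
NAME_SELECTORS = [".product-name", ".product-title", "h2.title"]


def select_first(soup, selectors):
    """Return the matches for the first selector that finds anything."""
    for sel in selectors:
        nodes = soup.select(sel)
        if nodes:
            return nodes
    return []
```

When a scrape comes back empty, the diff between the live HTML and the selectors in version control is usually enough for Claude Code to propose the fix.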

Missing dependencies

Claude Code checks your environment and installs missing packages via pip or npm. If you're in a restricted environment (no internet, corporate proxy), tell Claude Code upfront so it generates requirements.txt or package.json for offline install.
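For an offline install, a pinned requirements.txt like this sketch is what you'd ask Claude Code to generate (the version numbers are examples, not recommendations):

```
# requirements.txt -- pin exact versions for reproducible offline installs
requests==2.32.3
beautifulsoup4==4.12.3
```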

Key takeaways

  1. Describe your target URL and desired output format clearly at the start of the session
  2. Use the built-in WebFetch tool for quick single-page extractions; ask for a Python or Playwright script for bulk scraping
  3. Request rate limiting and pagination handling explicitly
  4. Always respect robots.txt and site terms of service
  5. Monitor your usage with Usagebar before a long scraping session to avoid mid-job lockouts
  6. Keep scraper scripts in version control to handle selector drift over time
