
Using Claude Code for Web Scraping: A Practical Developer Guide

Claude Code can write, run, and debug web scraping scripts end-to-end inside your terminal, making it one of the fastest ways to go from "I need this data" to a working scraper. It works best for developers who want to skip boilerplate and stay in the flow. The main trade-off: scraping sessions are token-heavy, and hitting a usage limit mid-crawl locks you out until your 5-hour window resets. Track limits with Usagebar to avoid interruptions.

  • Claude Code can scaffold a full Playwright or Puppeteer scraper in minutes
  • Claude Pro and Max plans share a usage window that resets every 5 hours
  • Scraping tasks with large HTML payloads can consume tokens quickly

What is Claude Code's role in a web scraping workflow?

Claude Code is Anthropic's agentic CLI that runs directly in your terminal. It can read your project files, write new scripts, execute shell commands, and iterate based on errors it sees in real time. For web scraping, this means you can describe what data you want and Claude Code will generate the scraper, install dependencies, run it, and fix selector issues as they appear.

Unlike a chatbot where you copy-paste code back and forth, Claude Code stays in your project context. It knows your existing file structure, can commit changes to a branch, and can chain multiple steps together: fetch the page, parse the HTML, clean the data, write to a CSV or database.

How to set up web scraping with Claude Code

Open your project directory in the terminal and start a Claude Code session. From there you can direct Claude in plain language.

Step 1: Choose your scraping library

Tell Claude Code which tool you prefer or let it recommend one based on the target site:

  • Playwright: Best for JavaScript-rendered pages, handles headless Chromium/Firefox/WebKit
  • Puppeteer: Similar to Playwright, Node.js native, Chromium-focused, slightly smaller API surface
  • BeautifulSoup + Requests: Best for static HTML pages, faster and lighter on tokens
  • Scrapy: Best for large-scale crawl pipelines where you need middleware and pipelines built in

For most one-off scraping tasks, Playwright (Python or Node) is the safest default because it handles modern single-page applications without extra configuration.

Step 2: Scaffold the scraper

In your Claude Code session, a prompt like the following is enough to get started:

Write a Playwright Python scraper that extracts product names and prices
from [target URL]. Save results to products.csv. Handle pagination.

Claude Code will create the file, install Playwright via pip if needed (including running playwright install to download the browser binaries), and run a test crawl. If selectors fail, it reads the error output and corrects them automatically.
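The scraper Claude Code generates for a prompt like that might look roughly like the sketch below. The URL, CSS selectors, and "next page" link are hypothetical stand-ins for whatever the real site uses:

```python
import csv


def save_csv(rows, path):
    """Write a list of {'name': ..., 'price': ...} dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(rows)


def scrape(url):
    # Imported inside the function so the CSV helper works even
    # without Playwright installed.
    from playwright.sync_api import sync_playwright

    rows = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        while True:
            page.wait_for_selector(".product")  # hypothetical selector
            for card in page.locator(".product").all():
                rows.append({
                    "name": card.locator(".product-name").inner_text(),
                    "price": card.locator(".product-price").inner_text(),
                })
            next_btn = page.locator("a.next")  # hypothetical "next page" link
            if next_btn.count() == 0:
                break
            next_btn.first.click()
            page.wait_for_load_state("networkidle")
        browser.close()
    return rows


# Usage (requires `pip install playwright` and `playwright install`):
# save_csv(scrape("https://example.com/products"), "products.csv")
```

The pagination loop clicks "next" until the link disappears, which is the pattern Claude Code typically writes when you ask it to "handle pagination".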

Step 3: Iterate on selectors and edge cases

Real scraping work is mostly edge cases: pages with lazy-loaded content, infinite scroll, login walls, or rate limiting. Claude Code can inspect the HTML it fetched, rewrite the selector logic, and add retry/backoff logic in the same session. You stay in the terminal rather than switching between browser DevTools, a text editor, and the docs.
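The retry/backoff logic Claude Code adds in a session like this usually boils down to a small wrapper such as the sketch below; the retry count and delay values are arbitrary defaults, not anything the tool prescribes:

```python
import random
import time


def fetch_with_retry(fetch, retries=4, base_delay=1.0):
    """Call fetch() with exponential backoff and jitter on failure.

    fetch: a zero-argument callable that raises on transient errors
    (timeouts, HTTP 429/503, dropped connections).
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the real error
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            # Jitter keeps parallel workers from retrying in lockstep.
            time.sleep(delay + random.uniform(0, delay / 2))
```

Wrapping each page fetch in a helper like this also keeps rate-limit handling in one place, so Claude Code can tune the backoff without touching the selector logic.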

Step 4: Add data cleaning and export

Once the scraper returns raw data, ask Claude Code to clean it: strip whitespace, normalise price formats, deduplicate rows, and write to your target format (CSV, JSON, SQLite, Postgres). The entire pipeline can live in a single Claude Code session.
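The cleaning step usually compiles down to a few small transforms. A sketch of what Claude Code might write for the CSV case (the field names and price format are hypothetical):

```python
import re


def clean_rows(rows):
    """Strip whitespace, normalise prices like ' $1,299.00 ' to floats,
    and drop duplicate rows, preserving first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        # Keep only digits and the decimal point: "$1,299.00" -> 1299.0
        price = float(re.sub(r"[^\d.]", "", row["price"]))
        key = (name, price)
        if key not in seen:
            seen.add(key)
            cleaned.append({"name": name, "price": price})
    return cleaned
```

Because these are pure functions over the scraped rows, Claude Code can test them against a handful of sample records without re-running the crawl.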

Useful Claude Code slash commands for scraping tasks

Claude Code's slash commands help you manage longer scraping sessions efficiently:

  • /clear: Clears the conversation context. Use this between scraping tasks to avoid carrying irrelevant context that burns tokens
  • /compact: Compresses context to a summary, useful mid-session when you've iterated through many selector fixes
  • /usage: Shows your current usage against your plan's limits so you know how much headroom you have before the next window

You can see a full list of Claude Code slash commands on the Usagebar blog, including ones for memory, configuration, and project management.

Why usage limits matter more for scraping than for other tasks

Web scraping sessions tend to be token-intensive for a few reasons:

  • Claude Code reads raw HTML to identify selectors, and HTML pages can be large
  • Iterative debugging (run, fail, fix, re-run) multiplies context size fast
  • If you paste example HTML or ask Claude to analyse a page structure, that payload counts against your limit

According to Anthropic's usage limit guidance, Pro and Max plan users share a usage window that resets on a 5-hour rolling basis. If you hit the cap mid-crawl, you're locked out until the window resets. That's especially frustrating when you're 80% through a scrape and the data isn't written yet.

This is the exact problem that Usagebar was built to solve. It sits in your macOS menu bar and shows your live Claude Code usage with alerts at 50%, 75%, and 90% of your limit. You see exactly when your window resets, so you can time heavy scraping sessions to start at the beginning of a fresh window rather than discovering mid-task that you're out of tokens.

For a deeper look at how usage is calculated across sessions, see how Claude Code usage affects Pro limits.

How to reduce token usage during scraping sessions

A few practical habits keep scraping sessions efficient:

  • Scope your HTML input: Instead of pasting a full page, extract just the relevant container element (a product grid, a table, a list) and give Claude that fragment
  • Use static parsers first: If the page doesn't require JavaScript, tell Claude to use BeautifulSoup + Requests rather than Playwright. Faster, fewer round-trips, less context
  • Clear between tasks: Use /clear once you've confirmed the scraper works before moving to the data-cleaning phase
  • Write scripts to files early: Ask Claude to write the script to disk and run it as a subprocess rather than keeping all logic in the conversation context
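The last habit, writing scripts to disk and running them as subprocesses, can look like this sketch. Here scraper.py is a hypothetical script that prints a one-line summary instead of dumping page HTML back into the conversation:

```python
import subprocess
import sys
from pathlib import Path

# Persist the scraper so iterating on it doesn't keep page HTML in context.
script = Path("scraper.py")
script.write_text(
    # Hypothetical scraper body: do the crawl, print only a short summary.
    'print("scraped 120 rows -> products.csv")\n'
)

# Only this one-line summary returns to the session, not the raw HTML.
result = subprocess.run([sys.executable, str(script)],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Claude Code only needs the summary line to decide what to do next; the full output stays on disk.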

The Usagebar blog has a full guide on reducing Claude Code token usage that covers these and other patterns in more depth.

Common web scraping patterns Claude Code handles well

Here are specific scraping scenarios where Claude Code adds the most leverage:

| Scraping task | Recommended library | Claude Code advantage |
| --- | --- | --- |
| Static HTML extraction | BeautifulSoup + Requests | Fast scaffold, low token cost |
| JavaScript-rendered pages | Playwright | Handles waitForSelector and scroll logic automatically |
| Paginated results | Any | Writes loop logic and handles "next page" selectors |
| Login-gated content | Playwright | Can write session/cookie management code |
| Large-scale crawls | Scrapy | Scaffolds pipelines, middleware, and settings files |
| API reverse-engineering | Requests + DevTools | Reads network responses and builds direct API calls |
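For the API reverse-engineering case, the payoff is that once the DevTools Network tab reveals the JSON endpoint behind a page, you can skip HTML parsing entirely. A stdlib sketch of rebuilding such a call; the endpoint, parameters, and header are hypothetical stand-ins for whatever the Network tab shows:

```python
import urllib.parse
import urllib.request


def build_api_request(base_url, params, headers=None):
    """Recreate the XHR call a page makes, instead of scraping its HTML."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=headers or {})


req = build_api_request(
    "https://example.com/api/products",   # hypothetical endpoint
    {"page": 1, "per_page": 50},
    headers={"Accept": "application/json"},
)

# To actually fetch (not done in this sketch, no network needed):
# with urllib.request.urlopen(req) as resp:
#     payload = resp.read()
```

Direct API calls return structured JSON, so they are both more reliable than selectors and far cheaper in tokens than feeding rendered HTML to Claude Code.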

Monitoring usage during long scraping sessions

For long scraping projects spanning multiple sessions, the native ways to check your remaining usage are the /usage slash command inside Claude Code and the dashboard at claude.ai/settings/usage. Both interrupt your flow: the dashboard pulls you out of the terminal entirely, and /usage only reports at the moment you stop to run it.

Usagebar keeps that data in your menu bar at all times, without a context switch. It stores your credentials securely in the macOS Keychain and displays a live usage indicator that updates continuously. When you hit the 75% alert, you know to wrap up the current crawl and save your data before the lockout. When the window resets, you see it immediately and can start the next session right away.

The tool uses a pay-what-you-want model and is free for students. You can get Usagebar with an instant download, no account required.

You can also check when Claude Code usage resets to plan the timing of heavy scraping work around your usage windows.

Key takeaways

  1. Claude Code can scaffold, run, and debug full web scrapers end-to-end in your terminal
  2. Playwright is the best default for modern sites; BeautifulSoup is better for static pages and token efficiency
  3. Use /clear between phases and scope your HTML input to keep token usage low
  4. Scraping sessions are token-heavy. Hitting a usage limit mid-crawl means waiting up to 5 hours for your window to reset
  5. Monitor your usage with /usage in Claude Code or with Usagebar for always-on menu bar visibility
  6. Plan heavy sessions to start at the beginning of a fresh usage window
