
How to Reduce Claude Code Token Usage (Without Slowing Down)

The fastest way to reduce Claude Code token usage is to use /compact to compress conversation history, scope your context to relevant files only, and break large tasks into smaller prompts. These habits can cut token burn by 40-60% in typical sessions. This guide is for developers on Pro or Max plans who want to stay in flow without hitting the 5-hour usage lockout at the worst possible moment.

  • Claude Code tracks usage in rolling 5-hour windows, not calendar days
  • Context window size is the single biggest driver of token consumption per message
  • The /compact command summarizes history and can recover thousands of tokens mid-session

Why token usage spikes faster than you expect

Every message you send to Claude Code costs more than the tokens in your prompt. Each turn resends the entire conversation history, including the contents of any files read earlier in the session. A session that starts lean can balloon quickly once you've pulled in a few large files, run a few tool calls, and gone back and forth on implementation details.
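A rough model makes the growth concrete. This sketch assumes every turn resends the full history, and it ignores the system prompt, prompt caching, and however Anthropic actually weights usage. The exchange sizes are made-up numbers for illustration, and `tokens_sent_per_turn` is a hypothetical helper.

```python
from itertools import accumulate


def tokens_sent_per_turn(exchange_sizes: list[int]) -> list[int]:
    """Input tokens sent on each turn when the whole history is resent.

    exchange_sizes[k] is the (hypothetical) token size of exchange k:
    your prompt, Claude's reply, and any tool output or file contents
    that exchange added. Turn k resends exchanges 0..k, so its input
    size is the running total.
    """
    return list(accumulate(exchange_sizes))


if __name__ == "__main__":
    # Hypothetical sessions: five small exchanges, with and without one large file.
    lean = [2_000] * 5
    with_file = [2_000, 22_000, 2_000, 2_000, 2_000]  # 20k-token file read on turn 2

    for label, session in (("lean", lean), ("with file", with_file)):
        per_turn = tokens_sent_per_turn(session)
        print(f"{label:>9}: per turn {per_turn}, total {sum(per_turn):,}")
```

With these numbers, the lean session sends 30,000 input tokens in total, while the session that read one 20,000-token file on turn 2 sends 110,000, because that file rides along on every later turn.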

According to Anthropic's usage limit best practices, usage is measured across all messages in a rolling window. Agentic tasks, which involve multiple back-and-forth tool calls, consume significantly more than single-turn questions.
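The rolling-window idea can be sketched as a sliding sum over timestamped usage events. Anthropic does not spell out its exact accounting, so treat this as a mental model only: the 5-hour length comes from this post, while the token counts and the `RollingUsage` class are hypothetical. It assumes events are recorded in time order.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # window length as described in this post


class RollingUsage:
    """Sums usage events that fall inside a sliding time window.

    Illustrative model only, not Anthropic's actual accounting.
    Assumes record() is called in chronological order.
    """

    def __init__(self, window: timedelta = WINDOW) -> None:
        self.window = window
        self.events: deque[tuple[datetime, int]] = deque()
        self.total = 0

    def record(self, when: datetime, tokens: int) -> None:
        self.events.append((when, tokens))
        self.total += tokens

    def used(self, now: datetime) -> int:
        # Drop events that have aged out of the window, then report the rest.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.total -= tokens
        return self.total


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    usage = RollingUsage()
    usage.record(start, 30_000)                        # hypothetical early burst
    usage.record(start + timedelta(hours=2), 50_000)   # hypothetical agentic task
    print(usage.used(start + timedelta(hours=3)))             # both events count
    print(usage.used(start + timedelta(hours=5, minutes=1)))  # first event aged out
```

In this model, usage frees up gradually as old messages age out, rather than all at once at midnight.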

How to use /compact to recover token headroom

The most effective single action you can take mid-session is running /compact. This is a built-in Claude Code slash command that replaces the full conversation history with a concise summary. It keeps Claude aware of what you've done without re-sending every exchange.

When to run it:

  • After completing a discrete subtask (feature implemented, bug fixed)
  • Before starting a new section of work in a long session
  • When you notice responses slowing down or becoming less focused

You can also pass a custom summary instruction: /compact Focus on the auth module changes only. This tells Claude what context actually matters going forward, rather than letting it summarize everything equally.
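The effect of compaction on cumulative input can be approximated with the same resend-everything model. This is a hypothetical sketch: the 3,000-token exchanges and 2,000-token summary are invented numbers, real /compact summaries vary in size, and the tokens spent generating the summary are not counted.

```python
from typing import Optional


def total_input_tokens(exchange_sizes: list[int],
                       compact_after: Optional[int] = None,
                       summary_size: int = 0) -> int:
    """Total input tokens for a session, optionally compacting once.

    Each turn resends the current history. If compact_after is set, the
    history is replaced by a summary of summary_size tokens right after
    that (0-based) turn, a rough stand-in for running /compact.
    """
    history = 0
    sent = 0
    for turn, size in enumerate(exchange_sizes):
        history += size   # this exchange joins the history
        sent += history   # and the whole history is sent this turn
        if turn == compact_after:
            history = summary_size  # history collapses to the summary
    return sent


if __name__ == "__main__":
    session = [3_000] * 10  # invented: ten exchanges of 3,000 tokens each
    print(total_input_tokens(session))
    print(total_input_tokens(session, compact_after=4, summary_size=2_000))
```

Here ten 3,000-token exchanges send 165,000 input tokens without compaction and 100,000 with a single compaction after the fifth exchange. The same arithmetic is why splitting a large task into smaller, separately scoped prompts helps: each piece starts from a short history.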

How to scope context to reduce tokens per message

The second-biggest lever is controlling which files Claude Code pulls into context. When you reference a large codebase without scoping, Claude may load entire directories. Here's how to keep context tight:

Use specific file paths, not broad globs

Instead of asking Claude to "look at the project," reference the exact file: src/auth/login.ts. This makes Claude far less likely to load adjacent files that aren't relevant to the task.

Add a CLAUDE.md with project conventions

A CLAUDE.md file in your repo root lets you pre-define project conventions, architecture decisions, and coding standards, so you don't have to re-explain them in every prompt. Claude Code reads it into context at the start of each session, so keep it concise: a short, well-written CLAUDE.md can eliminate repeated clarification exchanges in every session.

Split large tasks into smaller prompts

A single prompt asking Claude to "refactor the entire API layer" will use far more tokens than three sequential prompts each targeting one endpoint. Smaller prompts also produce better output: Claude stays focused, makes fewer assumptions, and needs less correction.

Prompt habits that compound into big savings

Token efficiency comes from consistent habits, not one-time fixes. These practices reduce waste across every session:

  • Be explicit about output format. If you want only the changed function, say so, and Claude won't wrap it in explanation you'll discard.
  • Avoid open-ended exploration prompts. "What's wrong with my code?" invites a broad scan. "Why is getUserById returning null when the user exists?" targets the problem.
  • Use headless mode for scripted tasks. For repetitive or automated operations, Claude Code's non-interactive mode (claude -p) runs a single prompt without an ongoing conversation, which can be more token-efficient than an interactive session.
  • Clear completed context before switching tasks. Start a new session or run /compact when moving from one feature to a completely different area of the codebase.
  • Avoid re-explaining context Claude already knows. If you covered architecture decisions earlier in the session, reference them by name rather than re-explaining.
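For the headless-mode habit, here is a minimal sketch of scripting a one-off task from Python, assuming the `claude` CLI is installed and on your PATH. `-p` (print mode) and `--output-format json` are Claude Code CLI flags; the helper names and the prompt are hypothetical, and flags can change between releases, so confirm with `claude --help` on your version.

```python
import json
import shutil
import subprocess


def build_command(prompt: str) -> list[str]:
    # -p runs Claude Code once in non-interactive ("print") mode;
    # --output-format json requests machine-readable output.
    return ["claude", "-p", prompt, "--output-format", "json"]


def run_headless(prompt: str) -> dict:
    """Run one scripted prompt and return the parsed JSON output (hypothetical helper)."""
    result = subprocess.run(build_command(prompt), capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical, tightly scoped prompt with an explicit output format.
    prompt = "List the exported functions in src/auth/login.ts. Reply with names only."
    if shutil.which("claude") is None:
        print("claude CLI not found on PATH; command would be:", build_command(prompt))
    else:
        print(run_headless(prompt))
```

Each `claude -p` call starts a fresh session, so nothing from earlier scripted runs is carried into the next one's context.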

How to monitor usage before you hit the limit

Reducing token usage is more effective when you can see where you stand in real time. Without visibility, you're guessing whether your efficiency habits are working, and you won't know you're close to the limit until Claude stops responding.

You can check your current usage with the /usage command inside Claude Code, or by visiting claude.ai/settings/usage. Both are manual checks that interrupt what you're doing and break focus.

Usagebar puts this data directly in your macOS menu bar. It shows your live usage percentage, fires smart alerts at 50%, 75%, and 90%, and shows exactly when your 5-hour window resets. No context switching, no manual checking. If you're wrapping up a PR and at 88%, you'll know before Claude cuts out mid-review.
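The alerting behavior described above comes down to a small amount of logic. This is a hypothetical sketch of how a monitor could decide when to notify, not Usagebar's actual code; it assumes you already have a usage percentage and the time the current window started, and that the window resets a fixed 5 hours after it starts.

```python
from datetime import datetime, timedelta

# Hypothetical sketch, not Usagebar's implementation.
ALERT_THRESHOLDS = (50, 75, 90)  # percentages mentioned above
WINDOW = timedelta(hours=5)


def crossed_thresholds(previous_pct: float, current_pct: float) -> list[int]:
    """Thresholds passed since the last reading, so each alert fires only once."""
    return [t for t in ALERT_THRESHOLDS if previous_pct < t <= current_pct]


def reset_time(window_start: datetime) -> datetime:
    # Assumption: the window resets exactly 5 hours after it starts.
    return window_start + WINDOW


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 30)
    print(crossed_thresholds(40, 80))   # [50, 75]
    print(crossed_thresholds(80, 88))   # []
    print(crossed_thresholds(88, 91))   # [90]
    print(reset_time(start).strftime("%H:%M"))  # 14:30
```

Comparing the previous reading with the current one, rather than checking the current value alone, keeps an alert from repeating on every poll once a threshold is passed.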

Usagebar uses macOS Keychain to store credentials securely and runs entirely on your machine. It's available on a pay-what-you-want model, with a free tier for students. Get Usagebar for an instant download.

Tracking usage in real time also helps you calibrate which habits are actually working. If a session stays under 60% after adopting scoped context and regular /compact runs, that's signal worth building on.

Comparison: token-heavy vs. token-efficient patterns

| Habit | Token-heavy pattern | Token-efficient pattern |
|---|---|---|
| Context scope | "Look at my project" | "Look at src/auth/login.ts" |
| Task size | Refactor entire module in one prompt | Break into one function per prompt |
| History management | Long session with no compacting | /compact after each subtask |
| Prompt style | "What's wrong with my code?" | "Why does X return null when Y?" |
| Output scope | Full file rewrites for small changes | Request only the changed function |
| Usage monitoring | Check manually when something feels slow | Live menu bar alerts via Usagebar |

Key takeaways

  1. Run /compact between subtasks to compress history and recover token headroom.
  2. Scope file references precisely. Never ask Claude to load the whole project when one file will do.
  3. Write a CLAUDE.md once; save explanation tokens on every session going forward.
  4. Break large refactors into sequential, single-concern prompts.
  5. Check your rolling usage window proactively. The /usage command or Usagebar helps you avoid getting locked out mid-task.
  6. Efficient prompting compounds: each good habit reduces the baseline for every subsequent message.


Track Your Claude Code Usage

Never hit your usage limits unexpectedly. Usagebar lives in your menu bar and shows your 5-hour and weekly limits at a glance.

Get Usagebar