The first time you add an AI feature to a Laravel app (summarize a ticket, classify an inbound email, draft a reply), the obvious move is to call the model right there in the controller. Don't. The way I build AI features that actually ship in Laravel apps is to treat the model call like any other slow, rate-limited third-party API: call the Claude API from a queued job, store the result, and notify the user when it's ready. That keeps your request fast, keeps your API key off the client, and gives you a place to handle retries and caching. This is the shape I land on every time.
Why not just call the API in the request?
A model call is slow. A summary of a long document can take anywhere from a few seconds to well over a minute, and a synchronous controller action that blocks that long will run into your PHP-FPM timeout, tie up a worker, and leave the browser waiting until it gives up. It's also rate-limited: the Anthropic API returns a 429 when you exceed your per-minute request or token limits, and you don't want a user-facing request failing because three other users hit submit at the same moment.
So the request does almost nothing: it validates input, dispatches a job, and returns immediately. The job does the actual work on a queue worker, off the request lifecycle entirely.
- Controller: validate, dispatch the job, return a 202 or redirect with a 'processing' state.
- Queued job: call the Claude API, parse the response, persist it.
- Notification: broadcast an event, send an email, or flip a status column the frontend polls.
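If you go the polling route, it helps to pin the status values down up front. Here's a minimal sketch, assuming a hypothetical SummaryStatus backed enum cast on a status column; the names and the allowed transitions are illustrative, not something Laravel provides.

```php
<?php
declare(strict_types=1);

// Hypothetical values for a "status" column the frontend polls.
enum SummaryStatus: string
{
    case Pending = 'pending';       // controller saved the row and dispatched the job
    case Processing = 'processing'; // a queue worker picked the job up
    case Done = 'done';             // summary persisted
    case Failed = 'failed';         // retries exhausted

    // The frontend can stop polling once the status is terminal.
    public function isTerminal(): bool
    {
        return $this === self::Done || $this === self::Failed;
    }

    // Guard against out-of-order updates (e.g. a late write after failure).
    public function canMoveTo(self $next): bool
    {
        return match ($this) {
            self::Pending => $next === self::Processing,
            self::Processing => in_array($next, [self::Done, self::Failed], true),
            self::Done, self::Failed => false,
        };
    }
}
```

The controller would save Pending before dispatching, handle() would set Processing and then Done, and the job's failed() method (which Laravel calls once retries are exhausted) would set Failed. The polling endpoint returns the value, and the frontend stops once isTerminal() is true.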
If you don't already have queue workers running under Supervisor in production, set that up first — a dispatched job that nothing processes just sits in the table forever. I wrote up the worker and Supervisor config in running Laravel queue workers in production.
How do you actually call the Claude API from Laravel?
Two options. You can pull in the official PHP SDK (composer require anthropic-ai/sdk) for a typed client, or — since this is Laravel — you can hit the endpoint with the built-in Http client. For a single endpoint I reach for Http first; it's one less dependency and the request shape is simple. You POST to https://api.anthropic.com/v1/messages with two headers, x-api-key and anthropic-version, and a JSON body of model, max_tokens, and messages.
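To make that request shape concrete, here's a framework-free sketch of the pieces the job sends. buildMessagesRequest is a hypothetical helper, not part of Laravel or the SDK; the endpoint, the two Anthropic headers, and the three body fields are the ones described above, and the content-type header is what any JSON POST needs.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: assembles a Messages API call without sending it,
// so the shape can be unit-tested or handed to any HTTP client.
function buildMessagesRequest(
    string $apiKey,
    string $model,
    string $prompt,
    int $maxTokens = 1024,
): array {
    if ($apiKey === '') {
        // Fail fast: an empty key would only come back as a 401 from the API.
        throw new InvalidArgumentException('ANTHROPIC_API_KEY is not set.');
    }

    return [
        'method'  => 'POST',
        'url'     => 'https://api.anthropic.com/v1/messages',
        'headers' => [
            'x-api-key'         => $apiKey,
            'anthropic-version' => '2023-06-01',
            'content-type'      => 'application/json',
        ],
        // The three body fields: model, max_tokens, messages.
        'body' => json_encode([
            'model'      => $model,
            'max_tokens' => $maxTokens,
            'messages'   => [
                ['role' => 'user', 'content' => $prompt],
            ],
        ], JSON_THROW_ON_ERROR),
    ];
}
```

Laravel's Http::post() does the JSON encoding and sets content-type for you when you pass an array, which is why the job below only sets the two Anthropic headers.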
The API key comes from config, which reads it from .env. Never inline it, never ship it to the browser — the whole point of doing this server-side is that the key stays on the server.
```php
<?php

// config/services.php
return [
    // ...existing services...
    'anthropic' => [
        'key' => env('ANTHROPIC_API_KEY'),
        'model' => env('ANTHROPIC_MODEL', 'claude-opus-4-8'),
    ],
];
```

```php
<?php

// app/Jobs/SummarizeDocument.php
namespace App\Jobs;

use App\Models\Document;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

class SummarizeDocument implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Up to 4 attempts; any thrown exception (including a 429 or 5xx
    // surfaced by ->throw()) re-queues the job after 10s, 30s, then 60s.
    public int $tries = 4;
    public array $backoff = [10, 30, 60];

    // The worker kills a job that runs past $timeout (60s by default),
    // so leave room for the 120s HTTP timeout below. Keep the queue
    // connection's retry_after higher than this value.
    public int $timeout = 150;

    public function __construct(public Document $document) {}

    public function handle(): void
    {
        $response = Http::withHeaders([
                'x-api-key' => config('services.anthropic.key'),
                'anthropic-version' => '2023-06-01',
            ])
            ->timeout(120)
            ->post('https://api.anthropic.com/v1/messages', [
                'model' => config('services.anthropic.model'),
                'max_tokens' => 1024,
                'messages' => [[
                    'role' => 'user',
                    'content' => "Summarize this document in 3 bullet points:\n\n"
                        . $this->document->body,
                ]],
            ])
            ->throw(); // raises on 4xx/5xx so the job retries

        // content is an array of blocks; the text lives in the first one.
        $summary = $response->json('content.0.text');

        $this->document->update([
            'summary' => $summary,
            'summarized_at' => now(),
        ]);
    }
}
```

A note on the model: claude-opus-4-8 is the top of the Opus tier, around $5 per million input tokens and $25 per million output. For high-volume classification where you don't need the smartest model, drop to claude-sonnet-4-6 ($3 / $15) or claude-haiku-4-5 ($1 / $5): at those rates Sonnet costs 60% of what Opus does per token, and Haiku 20%. Set the model per job if different features want different tiers.
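To put numbers on the tier choice, here's a rough cost estimator using the per-million-token prices quoted above. The PRICES_PER_MTOK table and estimateCostUsd are hypothetical, and the prices are copied from this post rather than fetched, so check Anthropic's pricing page before budgeting with them.

```php
<?php
declare(strict_types=1);

// USD per million tokens as [input, output], copied from the prices above.
const PRICES_PER_MTOK = [
    'claude-opus-4-8'   => [5.0, 25.0],
    'claude-sonnet-4-6' => [3.0, 15.0],
    'claude-haiku-4-5'  => [1.0, 5.0],
];

// Cost of one call: (tokens / 1,000,000) * price, summed over input and output.
function estimateCostUsd(string $model, int $inputTokens, int $outputTokens): float
{
    if (!array_key_exists($model, PRICES_PER_MTOK)) {
        throw new InvalidArgumentException("No price listed for {$model}.");
    }
    [$inPrice, $outPrice] = PRICES_PER_MTOK[$model];

    return ($inputTokens / 1_000_000) * $inPrice
         + ($outputTokens / 1_000_000) * $outPrice;
}
```

The real token counts come back on every response under usage.input_tokens and usage.output_tokens. As a worked example, 10,000 calls a day at roughly 2,000 input and 300 output tokens each comes to about $175 a day on Opus, $105 on Sonnet, and $35 on Haiku at these rates.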
What about retries, backoff, and caching?
When you call the endpoint with the raw Http client, you own the retry logic, and the job above leans on Laravel's own machinery for it. The $tries and $backoff properties give you a few attempts with growing delays, and ->throw() turns a 429 or 500 into an exception so the job gets re-queued instead of silently persisting a broken result. That covers the common case without a dependency. The catch is that ->throw() raises on every 4xx too, so a malformed request or a missing key also burns all four attempts before the job is marked failed.
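Here's a framework-free sketch of a smarter retry decision, assuming you'd rather not spend attempts on errors that can't succeed. backoffSeconds mirrors how, as far as I can tell, Laravel's worker indexes a job's $backoff array (the last entry repeats once attempts run past the end); isRetryable and retryDelaySeconds are hypothetical helpers. Anthropic signals overload with a 529 and can send a retry-after header alongside a 429, which the sketch honors.

```php
<?php
declare(strict_types=1);

// Delay after attempt N is $backoff[N - 1]; the last entry repeats
// once attempts run past the end of the array.
function backoffSeconds(int $attempt, array $backoff): int
{
    if ($backoff === []) {
        return 0;
    }
    return (int) ($backoff[$attempt - 1] ?? $backoff[array_key_last($backoff)]);
}

// 429 (rate limited) and 5xx (including 529 "overloaded") are worth retrying;
// other 4xx errors (bad request, bad key) will fail the same way again.
function isRetryable(int $status): bool
{
    return $status === 429 || $status >= 500;
}

// If the API sent a numeric retry-after header, wait at least that long.
function retryDelaySeconds(int $attempt, array $backoff, ?string $retryAfter): int
{
    $delay = backoffSeconds($attempt, $backoff);
    if ($retryAfter !== null && ctype_digit($retryAfter)) {
        $delay = max($delay, (int) $retryAfter);
    }
    return $delay;
}
```

Inside handle() you could catch Illuminate\Http\Client\RequestException from ->throw(), read $e->response->status() and $e->response->header('retry-after'), then call $this->release(retryDelaySeconds($this->attempts(), $this->backoff, ...)) for retryable errors and $this->fail($e) for the rest.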
Caching matters more than people expect. If two users submit the same document, or the same support macro runs against an identical ticket, you're paying twice for the same tokens. Key a cache entry on a hash of the prompt and short-circuit before the call. The mechanics are the same as any external response, and I covered them in caching third-party API responses in Laravel.
```php
// Inside handle(); also import Illuminate\Support\Facades\Cache in the class.
$cacheKey = 'claude:summary:' . hash('sha256', $this->document->body);

$summary = Cache::remember($cacheKey, now()->addDays(7), function () {
    $response = Http::withHeaders([
        'x-api-key' => config('services.anthropic.key'),
        'anthropic-version' => '2023-06-01',
    ])
        ->timeout(120) // the Http client's default timeout is 30s
        ->post('https://api.anthropic.com/v1/messages', [
            'model' => config('services.anthropic.model'),
            'max_tokens' => 1024,
            'messages' => [/* ... */],
        ])
        ->throw();

    return $response->json('content.0.text');
});
```

The API key in .env, the call in a queued job, the result in the database: get those three boundaries right and everything else is just prompt tuning.
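One gap in the cache snippet above: the key hashes only the document body, so changing ANTHROPIC_MODEL or rewording the prompt would keep serving summaries produced under the old settings for a week. Here's a sketch of a key that covers everything that shapes the answer; promptCacheKey is a hypothetical helper.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: hash the model, max_tokens, and the full messages
// array, so any change to the request produces a different cache entry.
function promptCacheKey(string $model, int $maxTokens, array $messages): string
{
    $payload = json_encode(
        ['model' => $model, 'max_tokens' => $maxTokens, 'messages' => $messages],
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE,
    );

    return 'claude:' . hash('sha256', $payload);
}
```

You'd call it with the same model, max_tokens, and messages you're about to send, and pass the result to Cache::remember() in place of $cacheKey.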
How do you get machine-parseable results instead of prose?
Summaries are fine as free text. Classification isn't — if you're routing a ticket to a department or extracting fields from an invoice, you need a value your code can switch on, not a paragraph you have to regex. Claude's tool use is built for this. You pass a tools array where each tool has a name, a description, and an input_schema written as JSON Schema. The model replies with stop_reason of tool_use and a tool_use block whose input matches your schema. You read input straight off the block — no parsing of natural language.
```php
// $ticket is the support ticket model being routed.
$response = Http::withHeaders([
    'x-api-key' => config('services.anthropic.key'),
    'anthropic-version' => '2023-06-01',
])->post('https://api.anthropic.com/v1/messages', [
    'model' => 'claude-sonnet-4-6', // cheaper tier is plenty for classification
    'max_tokens' => 256,
    'tools' => [[
        'name' => 'route_ticket',
        'description' => 'Assign a support ticket to the correct team.',
        'input_schema' => [
            'type' => 'object',
            'properties' => [
                'department' => [
                    'type' => 'string',
                    'enum' => ['billing', 'technical', 'sales', 'abuse'],
                ],
                'priority' => [
                    'type' => 'string',
                    'enum' => ['low', 'normal', 'urgent'],
                ],
            ],
            'required' => ['department', 'priority'],
        ],
    ]],
    // Force this tool so the reply is a tool_use block, not prose.
    'tool_choice' => ['type' => 'tool', 'name' => 'route_ticket'],
    'messages' => [[
        'role' => 'user',
        'content' => $ticket->body,
    ]],
])->throw();

// Find the tool_use block and read its typed input.
$block = collect($response->json('content'))
    ->firstWhere('type', 'tool_use');

if ($block === null) {
    // e.g. the reply hit max_tokens before the tool call was complete.
    throw new \RuntimeException('No tool_use block in the Claude response.');
}

$ticket->update([
    'department' => $block['input']['department'],
    'priority' => $block['input']['priority'],
]);
```

Setting tool_choice to force the tool means the reply comes back as a tool_use block rather than the model deciding to chat; the main exception is a reply cut off by max_tokens, which is what the null check catches. The enum in the schema tells the model which departments your application actually knows how to handle, which keeps made-up categories out of your routing; checking the value before you save it is still cheap insurance. Sharpening the descriptions in that schema is where most of the accuracy gains come from, the same instinct as the rest of prompt engineering for developers.
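If you want that validation step outside the framework, for instance to unit-test it against recorded responses, here's a sketch. extractRouting is a hypothetical helper that takes the decoded response ($response->json() with no argument) plus the allowed values from the schema, and rejects anything outside them instead of trusting the enum blindly.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: find the route_ticket tool_use block in a decoded
// Messages API response and check each field against its allowed values.
function extractRouting(array $response, array $allowed): array
{
    foreach ($response['content'] ?? [] as $block) {
        if (($block['type'] ?? null) !== 'tool_use' || ($block['name'] ?? null) !== 'route_ticket') {
            continue;
        }
        $input = $block['input'] ?? [];
        foreach ($allowed as $field => $values) {
            // Strict comparison: only the exact enum strings pass.
            if (!in_array($input[$field] ?? null, $values, true)) {
                throw new UnexpectedValueException("Invalid {$field} in tool input.");
            }
        }
        return $input;
    }

    // No tool call at all, e.g. stop_reason "max_tokens" cut the reply off.
    throw new UnexpectedValueException(
        'No route_ticket tool_use block (stop_reason: ' . ($response['stop_reason'] ?? 'unknown') . ').'
    );
}
```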
When the output is long, stream it
For a short summary or a classification, a single blocking response inside the job is fine. For long generations (a full draft, a multi-section report), add 'stream' => true to the request body so the API sends the reply back as server-sent events. You don't risk an HTTP timeout waiting for the whole thing, and you can surface partial output. In a queued-job world that usually means reading the stream server-side and pushing chunks to the client over websockets as they arrive. You still keep the call in the job; you're just consuming the response incrementally instead of all at once.
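Consuming the stream means reading server-sent events: each data: line carries a JSON event, and the text arrives in content_block_delta events whose delta has type text_delta. Here's a sketch of an accumulator that tolerates chunks splitting mid-line; SseTextAccumulator is a hypothetical class, and it ignores every other event type (message_start, ping, message_stop, and so on).

```php
<?php
declare(strict_types=1);

// Hypothetical helper: feed it raw chunks of a streamed Messages API response
// as they arrive; it buffers partial lines and passes each text delta to $onText.
final class SseTextAccumulator
{
    private string $buffer = '';
    private string $text = '';

    public function __construct(private Closure $onText) {}

    public function feed(string $chunk): void
    {
        $this->buffer .= $chunk;
        // Only process complete lines; keep a trailing partial line for later.
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = rtrim(substr($this->buffer, 0, $pos), "\r");
            $this->buffer = substr($this->buffer, $pos + 1);
            $this->handleLine($line);
        }
    }

    public function text(): string
    {
        return $this->text;
    }

    private function handleLine(string $line): void
    {
        // "event:" lines and blank separators carry nothing we need here;
        // each "data:" line holds a JSON event with its own "type" field.
        if (!str_starts_with($line, 'data:')) {
            return;
        }
        $event = json_decode(ltrim(substr($line, 5)), true);
        if (!is_array($event) || ($event['type'] ?? null) !== 'content_block_delta') {
            return;
        }
        $delta = $event['delta'] ?? [];
        if (($delta['type'] ?? null) === 'text_delta') {
            $this->text .= $delta['text'];
            ($this->onText)($delta['text']);
        }
    }
}
```

With Laravel's Http client you would likely pass withOptions(['stream' => true]) so Guzzle doesn't buffer the body, read $response->toPsrResponse()->getBody() in a loop, and feed each read() into the accumulator, broadcasting from the callback. Treat that wiring as a starting point to verify against your Laravel version.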
That's the whole pattern. The web request stays under a second, the API key never leaves the server, failed calls retry on their own, identical prompts hit the cache instead of your wallet, and anything you need to act on comes back as structured data. Start with one feature behind one job, watch it run through Supervisor for a week, and only then reach for streaming and tool use where they earn their keep. AI features in Laravel aren't special infrastructure — they're a slow external API, handled the way you'd handle any other.

