A resilient API client retries only transient errors and fails fast on the rest, and the difference shows up the first time a third-party provider has a bad afternoon. Third-party APIs time out, return 503s, and rate-limit you with no warning. A naive client that just calls Http::get() and trusts the result turns the provider's outage into yours: workers pile up waiting on a hung socket, a retry loop hammers an already-struggling endpoint, and a retried payment POST quietly charges the customer twice. I have cleaned up all three. The fix is a client that retries only transient errors (429, 5xx, connection failures), backs off with jitter, honors Retry-After, sets hard timeouts, and makes retried writes idempotent. Here is the Laravel HTTP client setup I actually ship.
Why retry only transient errors?
The most common retry bug I see is retrying everything. If the provider returns 422 because your request body is malformed, retrying it five times changes nothing except the load you put on them and the latency you inflict on your own user. A 401 means your token is wrong; retrying will not fix it. Those are your bugs, not transient blips, and they should fail immediately so you see them. The errors worth retrying are the ones likely to succeed on a second attempt: a 429 (rate limited), 5xx server errors, and low-level connection failures (DNS, refused connection, read timeout). Laravel's Http::retry() takes a when callback so you decide exactly which failures are eligible.

    <?php

    namespace App\Services;

    use Illuminate\Http\Client\ConnectionException;
    use Illuminate\Http\Client\RequestException;
    use Illuminate\Support\Facades\Http;

    class ProviderClient
    {
        public function getCustomer(string $id): array
        {
            return Http::retry(
                times: 4,
                sleepMilliseconds: 200,
                // Only retry transient failures. A 4xx like 422/401 is OUR bug; fail fast.
                when: function (\Throwable $e) {
                    if ($e instanceof ConnectionException) {
                        return true; // DNS, refused, read timeout
                    }

                    if ($e instanceof RequestException) {
                        $status = $e->response->status();

                        return $status === 429 || $status >= 500;
                    }

                    return false;
                },
                throw: true,
            )
                ->connectTimeout(3)
                ->timeout(10)
                ->acceptJson()
                ->withToken(config('services.provider.token'))
                ->get(config('services.provider.url') . "/customers/{$id}")
                ->throw()
                ->json();
        }
    }

The when closure receives the exception for each failed attempt. A 4xx/5xx response reaches it as a RequestException, which is what gives the closure a status code to inspect; a ConnectionException covers the cases where there is no response at all. With throw: true (the default), Http::retry() already raises once the attempts are exhausted, so the trailing ->throw() is deliberate belt and braces: if someone later flips that flag to false, the method still raises instead of handing back a Response with ok() === false that a caller might ignore. The when closure checks instanceof ConnectionException first and returns before touching ->response, because a connection-level failure has no response object to read.
How do you back off without creating a thundering herd?
A fixed retry delay is better than nothing, but if your provider blips and a thousand of your requests all fail at once, a fixed delay means all thousand retry at the same instant, hitting the recovering provider with a synchronized wall of traffic. That is the thundering-herd problem, and it can keep an upstream pinned down well after it would otherwise have recovered. The answer is exponential backoff with jitter: each attempt waits longer than the last, and a random component spreads the retries out so they do not align. In Laravel you get this by passing a closure as the sleep argument; it receives the attempt number and the exception, and returns the milliseconds to wait.

    use Illuminate\Http\Client\ConnectionException;
    use Illuminate\Http\Client\RequestException;

    Http::retry(
        times: 4,
        sleepMilliseconds: function (int $attempt, \Throwable $e) {
            // Honor Retry-After on a 429 instead of guessing.
            if ($e instanceof RequestException && $e->response->status() === 429) {
                $retryAfter = $e->response->header('Retry-After');

                if (is_numeric($retryAfter)) {
                    return ((int) $retryAfter) * 1000; // header is in seconds
                }
            }

            // Exponential backoff: 200ms, 400ms, 800ms, 1600ms ... ($attempt is 1-based)
            $base = 200 * (2 ** ($attempt - 1));

            // Full jitter: random point between 0 and the backoff ceiling.
            return random_int(0, $base);
        },
        when: fn (\Throwable $e) => $e instanceof RequestException
            ? ($e->response->status() === 429 || $e->response->status() >= 500)
            : $e instanceof ConnectionException,
    );

Two things matter here. First, when the provider tells you how long to wait via a Retry-After header on a 429, use it. Guessing a backoff when the server has handed you the exact number is how you get throttled again. Retry-After can be either a number of seconds or an HTTP date; the snippet handles the common numeric form and falls back to the exponential delay otherwise, which you would want to extend to parse the date form in production. Second, the jitter. I use full jitter (a random value between zero and the exponential ceiling) because AWS's backoff-and-jitter analysis found it spreads competing retries well under contention; equal jitter performs about as well and is also fine. The point is that no two clients retry in lockstep.
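The Retry-After handling and the two jitter variants described above can be sketched as plain PHP helpers that are easy to unit test outside Laravel. This is a minimal sketch under stated assumptions, not Laravel API: the function names (retryAfterToMillis, backoffCeilingMs, fullJitterMs, equalJitterMs) are hypothetical, the 300-second cap is an arbitrary choice of mine so a hostile or buggy header cannot park a worker for hours, and the date branch leans on strtotime(), which is more permissive than a strict HTTP-date parser.

```php
<?php

declare(strict_types=1);

/**
 * Convert a Retry-After header value to milliseconds (hypothetical helper).
 * Returns null when the header is absent or unparseable, so the caller
 * can fall back to exponential backoff.
 */
function retryAfterToMillis(?string $header, ?int $now = null, int $maxSeconds = 300): ?int
{
    $header = trim((string) $header);
    if ($header === '') {
        return null;
    }

    if (preg_match('/^\d+$/', $header) === 1) {
        // Form 1: delay-seconds, a non-negative integer such as "120".
        $seconds = (int) $header;
    } else {
        // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        $timestamp = strtotime($header);
        if ($timestamp === false) {
            return null;
        }

        // A date already in the past means "retry now", never a negative sleep.
        $seconds = max(0, $timestamp - ($now ?? time()));
    }

    // Assumption: cap the wait so one header cannot exceed the caller's budget.
    return min($seconds, $maxSeconds) * 1000;
}

/** Exponential ceiling for a 1-based attempt: 200, 400, 800, ... */
function backoffCeilingMs(int $attempt, int $baseMs = 200): int
{
    return $baseMs * (2 ** max(0, $attempt - 1));
}

/** Full jitter: anywhere between 0 and the ceiling. */
function fullJitterMs(int $attempt, int $baseMs = 200): int
{
    return random_int(0, backoffCeilingMs($attempt, $baseMs));
}

/** Equal jitter: half the ceiling is guaranteed, the other half is random. */
function equalJitterMs(int $attempt, int $baseMs = 200): int
{
    $half = intdiv(backoffCeilingMs($attempt, $baseMs), 2);

    return $half + random_int(0, $half);
}
```

Inside the sleepMilliseconds closure you would call retryAfterToMillis() on the header first and use one of the jitter helpers only when it returns null. The optional $now argument exists purely so the date branch can be tested with a fixed clock.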
Why do timeouts matter more than retries?
Retries get the attention, but the failure that takes down production is almost always a missing or overly generous timeout. If a provider accepts your connection and then just stops responding, the PHP-FPM worker handling that request is stuck until the client gives up. Laravel's HTTP client defaults to a 30-second timeout, which is far too long on a hot path, and max_execution_time is not a dependable backstop because on Linux it does not count time spent waiting on the network. Get a few hundred of those during a provider stall and every worker in your pool is blocked, your own healthy endpoints stop responding, and the provider's slowness has become your full outage. You need two separate limits, because they fail differently.
- connectTimeout: how long to wait to establish the TCP connection. Keep this short (2 to 3 seconds). A provider that cannot even accept a connection quickly is already in trouble, and you want to fail and move on.
- timeout: the total request budget, including the response body. This is the one that protects your workers from a hung upstream. Set it to the slowest response you are actually willing to wait for, not the slowest the provider might ever take.
- Retry interaction: the timeout applies per attempt, so with four attempts and a 10-second timeout your worst case is roughly 40 seconds plus backoff. Size the total against your queue or request budget so a retry storm cannot outlive the worker's own limit.
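The worst-case arithmetic in the last bullet is worth writing down once so you can compare it against a queue or request budget. A minimal sketch, assuming every attempt runs to the full timeout and every sleep hits its exponential ceiling; worstCaseMillis is a hypothetical helper name, and a server-supplied Retry-After can push the real figure higher than this bound.

```php
<?php

declare(strict_types=1);

/**
 * Upper bound on wall-clock time for one retried call, in milliseconds.
 * Assumes each attempt uses its whole timeout and each backoff sleep
 * reaches its ceiling (200ms, 400ms, 800ms, ... by default).
 */
function worstCaseMillis(int $attempts, int $timeoutSeconds, int $baseBackoffMs = 200): int
{
    $total = $attempts * $timeoutSeconds * 1000;

    // One sleep sits between consecutive attempts, so there are attempts - 1 sleeps.
    for ($attempt = 1; $attempt < $attempts; $attempt++) {
        $total += $baseBackoffMs * (2 ** ($attempt - 1));
    }

    return $total;
}

// Four attempts at 10 seconds each, plus sleeps of up to 200 + 400 + 800 ms.
$budgetMs = worstCaseMillis(4, 10); // 41400
```

At 41.4 seconds that single call already exceeds a typical 30-second web request budget, which is the argument for either trimming the attempts and timeout or moving the call onto a queue.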
Timeouts cap the damage from one slow call. A circuit breaker caps the damage from a provider that is down entirely. Once you have seen, say, ten consecutive failures, there is no point letting every new request pay the full timeout to discover the provider is still down. Open the circuit: short-circuit calls for a cool-off window, then let one probe through to see if it recovered. You can build this with a cache counter, and if the work is happening in a queued job, lean on the queue's own retry and backoff so a down provider does not stall your synchronous request path at all.

    <?php

    namespace App\Services;

    use Illuminate\Support\Facades\Cache;
    use RuntimeException;

    class CircuitBreaker
    {
        private int $threshold = 10; // failures before opening
        private int $cooldown = 60;  // seconds the circuit stays open

        public function __construct(private string $service) {}

        /** @throws RuntimeException when the circuit is open */
        public function call(callable $request): mixed
        {
            if (Cache::get($this->key('open'))) {
                throw new RuntimeException("Circuit open for {$this->service}; failing fast.");
            }

            try {
                $result = $request();
                Cache::forget($this->key('failures')); // success resets the count

                return $result;
            } catch (\Throwable $e) {
                $failures = Cache::increment($this->key('failures'));

                if ($failures >= $this->threshold) {
                    Cache::put($this->key('open'), true, $this->cooldown);
                }

                throw $e;
            }
        }

        private function key(string $suffix): string
        {
            return "breaker:{$this->service}:{$suffix}";
        }
    }

Retries are for the provider's brief stumbles. Timeouts and circuit breakers are for the provider's bad days. You need all three, because they protect you from different failures.
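To make the closed, open, and probe transitions easier to reason about, here is the same idea as a dependency-free sketch with an injectable clock. InMemoryBreaker is a hypothetical name, and its state lives only inside one PHP process, so treat it as a way to test the state machine rather than a replacement for the cache-backed class: in production the counters have to sit in a shared store, and limiting the probe to a single concurrent request would need some form of atomic lock.

```php
<?php

declare(strict_types=1);

final class InMemoryBreaker
{
    private int $failures = 0;
    private ?int $openedAt = null; // null means the circuit is closed

    public function __construct(
        private int $threshold = 10, // consecutive failures before opening
        private int $cooldown = 60,  // seconds to fail fast once open
    ) {}

    /** @throws RuntimeException when the circuit is open and still cooling off */
    public function call(callable $request, int $now): mixed
    {
        if ($this->openedAt !== null && ($now - $this->openedAt) < $this->cooldown) {
            throw new RuntimeException('Circuit open; failing fast.');
        }

        // Either closed, or the cooldown has elapsed and this call is the probe.
        try {
            $result = $request();
        } catch (Throwable $e) {
            $this->failures++;

            // Open on reaching the threshold, or re-open at once after a failed probe.
            if ($this->openedAt !== null || $this->failures >= $this->threshold) {
                $this->openedAt = $now;
            }

            throw $e;
        }

        // Any success closes the circuit and resets the count.
        $this->failures = 0;
        $this->openedAt = null;

        return $result;
    }
}
```

Passing $now in explicitly keeps the sketch deterministic under test; a real caller would pass time().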
How do you retry a POST without creating duplicates?
This is the part people forget, and it is the one that costs real money. Retrying a GET is safe; it has no side effect. Retrying a POST is dangerous, because the failure you retried on may have happened after the provider already processed the request. Picture this: you POST a charge, the provider creates it, then the response times out on the way back to you. Your client sees a timeout, retries, and now there are two charges. The fix is an idempotency key: a unique value you generate once per logical operation and send on every retry of it. The provider records the key against the first result and returns that same result for any repeat, so retries become safe. Stripe and most serious payment APIs support an Idempotency-Key header for exactly this.

    use Illuminate\Http\Client\ConnectionException;
    use Illuminate\Http\Client\RequestException;

    public function createCharge(array $payload, string $operationId): array
    {
        // ONE key per logical operation, reused across every retry of it.
        // Derive it from your own record id so a re-run of the job uses the same key.
        $idempotencyKey = 'charge-' . $operationId;

        return Http::retry(
            times: 3,
            sleepMilliseconds: 250,
            when: fn (\Throwable $e) =>
                $e instanceof ConnectionException
                || ($e instanceof RequestException
                    && ($e->response->status() === 429 || $e->response->status() >= 500)),
            throw: true,
        )
            ->connectTimeout(3)
            ->timeout(15)
            ->withHeaders(['Idempotency-Key' => $idempotencyKey])
            ->withToken(config('services.provider.token'))
            ->post(config('services.provider.url') . '/charges', $payload)
            ->throw()
            ->json();
    }

The critical detail is that the key must be stable across retries of the same operation, which is why I derive it from a record id (the order or invoice you are charging for) rather than calling Str::uuid() inside the method, where every retry would generate a fresh key and defeat the whole point. The same discipline applies on the receiving side of an integration; when you consume webhooks, you need idempotent handling there too, which I cover in handling payment webhooks with idempotency and retries.
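To see why key stability matters, it helps to model the provider's side of the contract. The following is a toy sketch, not any real provider's behavior in detail: FakeProvider is a hypothetical in-memory stand-in that stores the first result per key and replays it for repeats, and the key and amount values are made up for illustration.

```php
<?php

declare(strict_types=1);

/** Toy stand-in for a payment API that honors idempotency keys (hypothetical). */
final class FakeProvider
{
    /** @var array<string, array{id: string, amount: int}> */
    private array $resultsByKey = [];

    public int $chargesCreated = 0;

    public function charge(string $idempotencyKey, int $amount): array
    {
        // A repeated key replays the stored result instead of charging again.
        if (isset($this->resultsByKey[$idempotencyKey])) {
            return $this->resultsByKey[$idempotencyKey];
        }

        $this->chargesCreated++;

        return $this->resultsByKey[$idempotencyKey] = [
            'id' => 'ch_' . $this->chargesCreated,
            'amount' => $amount,
        ];
    }
}

// Stable key derived from the order: the retry is absorbed by the provider.
$stable = new FakeProvider();
$first = $stable->charge('charge-order-1001', 4999);
$retry = $stable->charge('charge-order-1001', 4999);

// Fresh random key per attempt: the provider sees two unrelated requests.
$fresh = new FakeProvider();
$fresh->charge(bin2hex(random_bytes(16)), 4999);
$fresh->charge(bin2hex(random_bytes(16)), 4999);
```

With the stable key, $stable->chargesCreated stays at 1 and both calls return the same charge; with a fresh key per attempt, $fresh->chargesCreated reaches 2, which is the duplicate charge the section warns about.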
Where does the resilient client fit in your stack?
Retries and backoff are one layer of a reliable integration, not the whole thing. They pair with a few neighbours worth wiring up at the same time so you are not solving the same provider's flakiness in five different controllers.
- Cache the responses you can. The cheapest retry is the one you never make, because the answer was already in your cache.
- Run the call from a queued job. A queue worker with its own retry and backoff isolates a slow provider from your web request path entirely, so a user never waits on it.
- Wrap it once. Put all of this behind a single client class so the timeout, retry policy, and idempotency logic live in one place, not copy-pasted into every caller.
The cleanest way to keep this consistent is to wrap the whole thing in one place, which is the case I make in building a reusable API wrapper package for Laravel, and to lean on caching where you can, covered in caching third-party API responses in Laravel. For the exact retry and timeout method signatures, Laravel's HTTP Client documentation is the canonical reference.
None of this is exotic. A resilient API client is just a naive one plus four habits: retry only transient errors, back off with jitter and honor Retry-After, set both a connect and a total timeout, and make retried writes idempotent. The naive version works perfectly right up until the day the provider does not, and on that day it is the difference between a logged warning and a 2am incident with duplicate charges to refund. Write the client once, the right way, route every integration through it, and the next provider outage becomes something your system absorbs instead of something it amplifies.

