Graceful retries in Python with backoff
Use this helper when a dependency is mostly reliable but occasionally flaky:
- HTTP APIs under moderate load
- internal services during deploys
- third‑party integrations with rate limits
Failed HTTP calls are normal; silent failures are not. This pattern adds retries with exponential backoff and jitter, logs each retry and the final failure, and keeps the code compact.
Core helper
```python
import random
import time
import logging
from typing import Callable, TypeVar, Iterable

import requests

T = TypeVar("T")
logger = logging.getLogger(__name__)


def with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base: float = 0.4,
    factor: float = 2.0,
    jitter: float = 0.25,
    retry_on: Iterable[int] = (500, 502, 503, 504),
) -> T:
    for i in range(1, attempts + 1):
        try:
            return fn()
        except requests.HTTPError as exc:
            status = exc.response.status_code
            # Give up on non-retryable statuses, or once attempts are exhausted.
            if status not in retry_on or i == attempts:
                logger.error("giving up", extra={"status": status, "attempt": i})
                raise
            # Exponential backoff with multiplicative jitter around the base delay.
            delay = base * (factor ** (i - 1))
            delay = delay * (1 + random.uniform(-jitter, jitter))
            logger.warning("retrying", extra={"status": status, "attempt": i, "sleep": round(delay, 3)})
            time.sleep(delay)
    raise RuntimeError("exhausted retries")  # unreachable; kept as a safety net
```

Using it
```python
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

API = "https://api.example.com/health"


def fetch_health() -> dict:
    resp = requests.get(API, timeout=3)
    resp.raise_for_status()
    return resp.json()


result = with_backoff(fetch_health)
print("service status:", result["status"])
```
Why this shape works
- Keep it small: pure function, no decorators or globals.
- Control backoff: jitter reduces thundering herd; `factor` controls how fast delays grow between attempts (with the defaults, roughly 0.4 s, 0.8 s, and 1.6 s before the final attempt).
- Log with structure: the logging is JSON-friendly via `extra`, ready for log pipelines (see the sketch after this list).
- Client-agnostic: swap `requests` for any client by adjusting the `retry_on` logic.
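To make the `extra` fields visible in the output, a stdlib-only formatter is enough. A minimal sketch, assuming the field names used above ("status", "attempt", "sleep"); the `JsonFormatter` class and `EXTRA_FIELDS` tuple are illustrative names:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render records as JSON lines, including the fields passed via `extra`."""

    EXTRA_FIELDS = ("status", "attempt", "sleep")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Fields passed via `extra` become attributes on the LogRecord.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


# Attach in place of the basicConfig call above so records go out as JSON lines.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```

With this handler installed, the "retrying" and "giving up" lines carry the status code and attempt number as machine-readable fields.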
Extension ideas
- Add circuit-breaking after repeated failures (a rough sketch follows this list).
- Expose metrics for attempts and durations.
- Move retry policy to config so CI can run with fewer retries.
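For the first idea, a rough sketch of a breaker layered on top of `with_backoff`; it reuses `Callable`, `T`, and `time` from the helper module, and the `CircuitBreaker` and `CircuitOpenError` names plus the threshold and cooldown defaults are illustrative assumptions:

```python
class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and calls are rejected without trying."""


class CircuitBreaker:
    """Stops calling a dependency for a while after repeated consecutive failures."""

    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = 0.0  # monotonic timestamp of the last trip; 0.0 means closed

    def call(self, fn: Callable[[], T], **backoff_kwargs) -> T:
        # While the cooldown window is running, fail fast without touching the dependency.
        if self.opened_at and time.monotonic() - self.opened_at < self.cooldown:
            raise CircuitOpenError("circuit open, skipping call")
        try:
            result = with_backoff(fn, **backoff_kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = 0.0
        return result


# Usage: breaker = CircuitBreaker(); breaker.call(fetch_health, attempts=2)
```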