Python Requests: Retry Failed Requests
In this guide for The Python Web Scraping Playbook, we will look at how to configure the Python Requests library to retry failed requests so you can build a more reliable system.
There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests and show you how to use them with Python Requests:

- Retry Failed Requests Using Sessions & HTTPAdapter
- Build Your Own Retry Logic Wrapper
Let's begin...
Retry Failed Requests Using Sessions & HTTPAdapter
If you are okay with using requests Sessions, then you can define the retry logic with an HTTPAdapter.
Here is an example:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()

retries = Retry(total=5,
                status_forcelist=[429, 500, 502, 503, 504])

s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))

s.get('http://quotes.toscrape.com/')
```
Here we:

- Create a retry strategy with urllib3's `Retry` utility, telling it how many retries to make (`total`) and which status codes to retry on (`status_forcelist`).
- Add this retry strategy to an `HTTPAdapter` and mount it on the session.
We can also define a backoff strategy using the `backoff_factor` parameter (`backoff_factor` is set to 0 by default, meaning no delay between retries).
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()

retries = Retry(total=5,
                backoff_factor=1,
                status_forcelist=[429, 500, 502, 503, 504])

s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))

s.get('http://quotes.toscrape.com/')
```
Using the `backoff_factor` we can configure our script to exponentially increase the delay between each retry.

Here is the backoff algorithm:

{backoff_factor} * (2 ** ({number_of_retries} - 1))
Here are some example sleep sequences different backoff factors will produce:

## backoff_factor = 1
0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256

## backoff_factor = 2
1, 2, 4, 8, 16, 32, 64, 128, 256, 512

## backoff_factor = 10
5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560
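As a quick sanity check, the sleep sequences above can be reproduced directly from the formula (`backoff_sleeps` below is just a throwaway helper for illustration, not part of requests or urllib3):

```python
def backoff_sleeps(backoff_factor, num_retries=10):
    ## Apply the backoff formula for each retry: factor * (2 ** (n - 1))
    return [backoff_factor * (2 ** (n - 1)) for n in range(num_retries)]

print(backoff_sleeps(1))   # [0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256]
print(backoff_sleeps(10))  # [5.0, 10, 20, 40, 80, 160, 320, 640, 1280, 2560]
```

Note that urllib3 also caps each individual sleep (`Retry.BACKOFF_MAX`, 120 seconds by default), so the larger values in these sequences would be clamped in practice.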
Build Your Own Retry Logic Wrapper
Another method of retrying failed requests with Python Requests is to build your own retry logic around your request functions.
```python
import requests

NUM_RETRIES = 3

response = None
for _ in range(NUM_RETRIES):
    try:
        response = requests.get('http://quotes.toscrape.com/')
        if response.status_code in [200, 404]:
            ## Escape for loop if it returns a successful response
            break
    except requests.exceptions.ConnectionError:
        pass

## Do something with successful response
if response is not None and response.status_code == 200:
    pass
```
The advantage of this approach is that you have a lot of control over what counts as a failed response.

Above we only look at the status code to decide whether to retry the request; however, we could adapt this so that we also check the response body to make sure the HTML is valid.
Below we will add an additional check to make sure the HTML response doesn't contain a ban page.
```python
import requests

NUM_RETRIES = 3

response = None
for _ in range(NUM_RETRIES):
    try:
        response = requests.get('http://quotes.toscrape.com/')
        if response.status_code in [200, 404] and '<title>Robot or human?</title>' not in response.text:
            ## Escape for loop if the response is successful and isn't a ban page
            break
    except requests.exceptions.ConnectionError:
        pass

## Do something with successful response
if response is not None and response.status_code == 200:
    pass
```
We could also wrap this logic into our own `request_retry` function if we like:
```python
import requests

def request_retry(url, num_retries=3, success_list=(200, 404), **kwargs):
    for _ in range(num_retries):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in success_list:
                ## Return the response if it is successful
                return response
        except requests.exceptions.ConnectionError:
            pass
    ## Return None if every attempt failed
    return None

response = request_retry('http://quotes.toscrape.com/')
```
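Since the retry loop itself doesn't depend on requests, the same pattern can be factored into a generic helper and tested without hitting the network. Here is a minimal sketch (`retry_call` and `flaky` are hypothetical names, not part of any library):

```python
def retry_call(func, num_retries=3, exceptions=(ConnectionError,)):
    ## Generic version of the retry loop above: call func() up to
    ## num_retries times, swallowing the listed exceptions in between
    for _ in range(num_retries):
        try:
            return func()
        except exceptions:
            pass
    ## Return None if every attempt failed
    return None

## Simulate a flaky endpoint that fails twice, then succeeds
attempts = {'count': 0}

def flaky():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ConnectionError('temporary failure')
    return 'ok'

print(retry_call(flaky))  # ok
```

This makes it easy to unit test the retry behaviour itself, separately from any particular request.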
More Web Scraping Tutorials
So that's how you can configure Python Requests to automatically retry failed requests.
If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.
Or check out one of our more in-depth guides: