Python hRequests: Retry Failed Requests
In this guide for The Python Web Scraping Playbook, we will look at how to configure the Python Hrequests library to retry failed requests so you can build a more reliable system.
There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests and show you how to use them with Python Hrequests:

- Retry Failed Requests Using Retry Library
- Build Your Own Retry Logic Wrapper
Let's begin...
Retry Failed Requests Using Retry Library
Here we use the Retry library to define the retry logic and trigger retries on failed requests:
```python

import hrequests
from retry.api import retry_call

# a list of status codes that should trigger a retry
failed_statuses = [429, 500, 502, 503, 504]

def make_request():
    # unsafe code that is likely to throw an exception
    response = hrequests.get('http://quotes.toscrape.com/')
    if response.status_code in failed_statuses:
        print("bad status code - retrying...")
        raise Exception(response.status_code)
    return response

# use retry_call to make the request with retry logic
response = retry_call(make_request, tries=5, delay=0, backoff=2, exceptions=Exception)

# process the response
print(response.status_code)
print(response.text)

```
In the provided Python script, we use the `hrequests` library together with the `retry` library to make a HTTP request with retry functionality.

First, we initialize the `failed_statuses` variable to a list of status codes that should trigger a retry. Then, inside the `make_request` function, we trigger a retry by raising an exception if `response.status_code` happens to be in the `failed_statuses` list.

Finally, we call the `retry_call` function to handle the retry logic. This function manages the execution of the `make_request` function based on the following parameters:

- `tries`: The maximum number of times to attempt the request. The default is `-1`, which means keep retrying until the request succeeds.
- `delay`: The initial delay (in seconds) before the first retry.
- `backoff`: The backoff factor for subsequent retries. A backoff factor greater than 1 results in progressively longer waits between retries. This helps prevent overwhelming a server with repeated requests in rapid succession and gives the system more time to recover from temporary issues.
- `exceptions`: The type of exception that should trigger a retry. Here we set it to `Exception` to trigger a retry in case of any exception.
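To make the interaction between `delay` and `backoff` concrete, here is a small helper that computes the wait schedule between attempts. Note that `backoff_schedule` is our own illustrative function, not part of the `retry` library:

```python
# backoff_schedule is a hypothetical helper for illustration only; it mirrors
# the wait applied before retry n, which is delay * backoff ** (n - 1).
def backoff_schedule(tries, delay, backoff):
    """Return the list of waits (in seconds) between the `tries` attempts."""
    return [delay * backoff ** i for i in range(tries - 1)]

# With tries=5, delay=1 and backoff=2 the waits grow exponentially:
print(backoff_schedule(5, 1, 2))   # [1, 2, 4, 8]

# With delay=0, as in the snippet above, every wait is zero regardless of backoff:
print(backoff_schedule(5, 0, 2))   # [0, 0, 0, 0]
```

In a production scraper a non-zero `delay` is usually a better choice, since retrying instantly against a rate-limited server will often just trigger the same `429` again.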
Build Your Own Retry Logic Wrapper
Another method of retrying failed requests with Python Hrequests is to build your own retry logic around your request functions.
```python

import hrequests

NUM_RETRIES = 3

response = None
for _ in range(NUM_RETRIES):
    try:
        response = hrequests.get('http://quotes.toscrape.com/')
        if response.status_code in [200, 404]:
            ## Escape the for loop if the request returns a successful response
            break
    except hrequests.exceptions.ClientException:
        ## Handle connection errors
        pass

## Do something with the successful response
if response is not None and response.status_code == 200:
    pass

```
In the above code, we use `hrequests.get` to send HTTP requests. We first initialize a `response` variable to `None` so it can store the response from a successful request.

We then use a `for` loop with a maximum of `NUM_RETRIES` iterations. Inside the loop, we make a GET request to the specified URL using `hrequests.get`. If the response status code is either `200` or `404`, we break out of the loop.

If a network error occurs, we catch it in the `except` block and continue to the next iteration.

Finally, after the loop, we check that the `response` variable is not `None` and has a status code of `200`. If these conditions are met, you can perform actions with the successful response.
The advantage of this approach is that you have a lot of control over what is a failed response.
Above, we only looked at the response code to decide whether to retry the request. However, we could adapt this so that we also check the response body to make sure the HTML is valid.
Below we will add an additional check to make sure the HTML response doesn't contain a ban page.
```python

import hrequests

NUM_RETRIES = 3

response = None
for _ in range(NUM_RETRIES):
    try:
        response = hrequests.get('http://quotes.toscrape.com/')
        if response.status_code in [200, 404]:
            ## Only accept a 200 response if it isn't a ban page
            if response.status_code == 404 or '<title>Robot or human?</title>' not in response.text:
                break
    except hrequests.exceptions.ClientException:
        ## Handle connection errors
        pass

## Do something with the successful response
if response is not None and response.status_code == 200:
    pass

```
We could also wrap this logic in our own `request_retry` function if we like:

```python

import hrequests

def request_retry(url, num_retries=3, success_list=(200, 404), **kwargs):
    for _ in range(num_retries):
        try:
            response = hrequests.get(url, **kwargs)
            if response.status_code in success_list:
                ## Return the response if successful
                return response
        except hrequests.exceptions.ClientException:
            pass
    ## All retries failed
    return None

response = request_retry('http://quotes.toscrape.com/')

```
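If we also want the exponential backoff behaviour from the first section in our own wrapper, we can sleep between attempts. The sketch below is a library-agnostic variation: it accepts any request callable (so it can be exercised without a network connection), and the names `retry_with_backoff`, `FakeResponse` and `flaky_request` are our own inventions, not part of Hrequests:

```python
import time

def retry_with_backoff(make_request, num_retries=3, delay=1, backoff=2,
                       success_list=(200, 404)):
    """Call make_request() until its response status is in success_list,
    sleeping delay * backoff ** attempt seconds between failed attempts."""
    for attempt in range(num_retries):
        try:
            response = make_request()
            if response.status_code in success_list:
                return response
        except Exception:
            pass  # treat any exception as a failed attempt
        if attempt < num_retries - 1:
            time.sleep(delay * backoff ** attempt)
    return None

# Quick demonstration with a stand-in request that fails twice, then succeeds:
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

attempts = []
def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network error")
    return FakeResponse(200)

result = retry_with_backoff(flaky_request, num_retries=5, delay=0)
print(result.status_code, len(attempts))  # 200 3
```

With Hrequests, you would call it as `retry_with_backoff(lambda: hrequests.get('http://quotes.toscrape.com/'))`.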
More Web Scraping Tutorials
So that's how you can configure Python Hrequests to automatically retry failed requests.
If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.
Or check out one of our more in-depth guides: