NodeJs SuperAgent: Retry Failed Requests

In this guide for The NodeJs Web Scraping Playbook, we will look at how to configure the NodeJS SuperAgent library to retry failed requests so you can build a more reliable system.

There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests and show you how to use them with the NodeJS SuperAgent library:

Retry Failed Requests Using Retry Library
Build Your Own Retry Logic Wrapper

Let's begin...

Retry Failed Requests Using Retry Library

Here we use the Retry package to define the retry logic and trigger any retries on failed requests.

Here is an example:

const request = require('superagent');
const retry = require('retry');

// Backoff settings understood by the retry package
const retryOptions = {
  retries: 5,
  factor: 2,
  minTimeout: 1000,
  maxTimeout: 10000,
  randomize: true
};

const retryOperation = retry.operation(retryOptions);

const url = 'http://quotes.toscrape.com/';

retryOperation.attempt(async function(currentAttempt) {
  try {
    const response = await request.get(url);
    const html = response.text;
    console.log(html);
  } catch (error) {
    // SuperAgent rejects on network errors and on 4xx/5xx responses,
    // so every failed request ends up here.
    // retry() returns true if another attempt has been scheduled.
    if (retryOperation.retry(error)) {
      console.log(`Attempt ${currentAttempt} failed, retrying...`);
      return;
    }
    console.error(`Maximum number of retries reached. Error: ${error}`);
  }
});

In the above code, we use the SuperAgent library to send the HTTP request and the retry package to control the retry behavior.

The retry package has no option for filtering by status code, so every error SuperAgent throws triggers a retry. The retry options set the maximum number of retries, the factor by which the retry timeout grows, the minimum and maximum timeout values, and whether the timeouts are randomized:

retries: The maximum number of times to retry the operation. Default is 10. Setting this to 1 means do it once, then retry it once.
factor: The exponential factor to use. Default is 2.
minTimeout: The number of milliseconds before starting the first retry. Default is 1000.
maxTimeout: The maximum number of milliseconds between two retries. Default is Infinity.
randomize: Randomizes the timeouts by multiplying them by a factor between 1 and 2. Default is false.

The formula used to calculate the individual timeouts is:

Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)
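
To make the formula concrete, the sketch below computes the delay before each retry for the options used in the example. `backoffDelays` is a hypothetical helper written for illustration, not part of the retry package, and it assumes `random` is 1 when `randomize` is false and a value between 1 and 2 otherwise.

```javascript
// Hypothetical helper (not part of the retry package): applies the
// timeout formula above to each retry attempt, starting at attempt 0.
function backoffDelays({ retries, factor, minTimeout, maxTimeout, randomize }) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Assumption: random is 1 without randomize, otherwise between 1 and 2
    const random = randomize ? 1 + Math.random() : 1;
    const delay = Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout);
    delays.push(Math.round(delay));
  }
  return delays;
}

// Same options as the example, with randomize off so the output is repeatable
const delays = backoffDelays({
  retries: 5,
  factor: 2,
  minTimeout: 1000,
  maxTimeout: 10000,
  randomize: false
});

console.log(delays); // [ 1000, 2000, 4000, 8000, 10000 ]
```

The last delay would be 16000 milliseconds without the cap, so maxTimeout trims it to 10000. With randomize set to true, each delay is stretched by up to 2x before the cap is applied, which helps stop many clients from retrying at the same moment.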

The retryOperation.attempt function handles the retry logic. Inside the function, we make the GET request to the specified URL using the request.get method (from superagent). If an error occurs, we pass it to retryOperation.retry, which returns true when another attempt has been scheduled.

In that case, we log the failed attempt number and return, and the retry package calls the function again after the backoff delay. Once the maximum number of retries is reached, retryOperation.retry returns false and we log an error message.
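
The retry package's options only cover how many times and how long to wait, so deciding which failures deserve a retry has to happen in your own code. The sketch below shows one way to make that decision. `RETRY_STATUS_CODES` and `isRetryable` are hypothetical names chosen for this example, and the check assumes SuperAgent-style errors, where HTTP errors carry a `status` field and network errors have no `response` field.

```javascript
// Hypothetical list of statuses worth retrying (rate limits and server errors)
const RETRY_STATUS_CODES = [429, 500, 502, 503, 504];

// Hypothetical helper: returns true when a failed request should be retried.
// Assumes SuperAgent-style errors: HTTP errors expose error.status, while
// network errors (DNS failure, connection refused, ...) have no error.response.
function isRetryable(error) {
  if (!error.response) {
    return true; // no response at all, so treat it as a network error
  }
  return RETRY_STATUS_CODES.includes(error.status);
}

// Plain objects standing in for SuperAgent errors
console.log(isRetryable({ status: 503, response: { status: 503 } })); // true
console.log(isRetryable({ status: 404, response: { status: 404 } })); // false
console.log(isRetryable(new Error('socket hang up')));                // true
```

Inside the catch block, the condition could then become `isRetryable(error) && retryOperation.retry(error)`, so that a 404 fails immediately instead of using up the remaining attempts.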

Build Your Own Retry Logic Wrapper

Another method of retrying failed requests with NodeJS SuperAgent is to build your own retry logic around your request functions.

const request = require('superagent');
const NUM_RETRIES = 3;

(async () => {
  let response;

  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      response = await request.get('http://quotes.toscrape.com/');
      if (response.status === 200) {
        // Escape the loop if a successful response is returned
        break;
      }
    } catch (error) {
      // Absence of response field in error object indicates network error
      const networkError = error.response === undefined;
      if (networkError) {
        // Handle connection errors
        continue;
      }
      response = error.response;
      if (response.status === 404) {
        break;
      }
    }
  }

  // Do something with the successful response
  if (response && response.status === 200) {
    // Perform actions with the successful response
    console.log(response.text);
  }
})();

In the above code, we use the request.get method from the SuperAgent library to send the HTTP request, and a loop to handle the retries. We declare a variable response to hold the most recent response.

We then use a for loop with a maximum of NUM_RETRIES iterations. Inside the loop, we make a GET request using request.get (superagent) to the specified URL. If the response status code is 200, we break out of the loop. SuperAgent throws an error for 4xx and 5xx responses, so a 404 is handled in the catch block, where we also break out of the loop because retrying a missing page will not help.

If a connection error occurs, there is no response on the error object, so we continue to the next iteration. Any other error status, such as a 500, also moves on to the next attempt.

Finally, after the loop, we check that the response variable is set and has a status code of 200. If these conditions are met, you can perform actions with the successful response.

The advantage of this approach is that you have a lot of control over what counts as a failed response.

Above, we only look at the response code to decide whether to retry the request. However, we could adapt this so that we also check the response body to make sure the HTML is valid.

Below we will add an additional check to make sure the HTML response doesn't contain a ban page.

const request = require('superagent');

const NUM_RETRIES = 3;

(async () => {
  let response;
  let validResponse = false;

  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      response = await request.get('http://quotes.toscrape.com/');
      const html = response.text;
      if (response.status === 200 && !html.includes('<title>Robot or human?</title>')) {
        // Break the loop if a successful response is returned and it is not a ban page
        validResponse = true;
        break;
      }
    } catch (error) {
      // Absence of response field in error object indicates network error
      const networkError = error.response === undefined;
      if (networkError) {
        // Handle connection errors
        continue;
      }
      response = error.response;
      if (response.status === 404) {
        // The page does not exist, so retrying will not help
        validResponse = true;
        break;
      }
    }
  }

  // Do something with the successful response
  if (response && validResponse && response.status === 200) {
    console.log(response.text);
  }
})();

In this example, we also check responses with a 200 status code to make sure they don't contain a ban page, identified here by the following title tag:

"<title>Robot or human?</title>"

If the response contains it, the code retries the request.
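
The retry loop and the validity check can be folded into one reusable function. The sketch below is one possible shape for it, not a definitive implementation: `requestWithRetries`, `sendRequest`, `isValid`, `delayMs`, and `notBanned` are hypothetical names, the fixed pause between attempts is an added assumption, and the request is passed in as a function so the helper does not depend on any particular URL.

```javascript
// Pause helper used between attempts
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: calls sendRequest() up to numRetries times and returns
// the first response that passes isValid(), or null if none does.
async function requestWithRetries(sendRequest, { numRetries = 3, delayMs = 1000, isValid } = {}) {
  for (let attempt = 1; attempt <= numRetries; attempt++) {
    try {
      const response = await sendRequest();
      if (!isValid || isValid(response)) {
        return response;
      }
    } catch (error) {
      // A 404 will not fix itself, so stop retrying and report failure
      if (error.response && error.response.status === 404) {
        return null;
      }
      // Network errors and other statuses fall through to the next attempt
    }
    if (attempt < numRetries) {
      await sleep(delayMs); // wait before the next attempt
    }
  }
  return null;
}

// Assumed validity check: a 200 response that is not the ban page
const notBanned = (response) =>
  response.status === 200 && !response.text.includes('<title>Robot or human?</title>');

// Usage with SuperAgent (requires the superagent package):
// const request = require('superagent');
// requestWithRetries(() => request.get('http://quotes.toscrape.com/'), { isValid: notBanned })
//   .then((response) => console.log(response ? response.text : 'All attempts failed'));
```

Returning null when every attempt fails keeps the caller's check to a single condition, and swapping in a different `isValid` function changes what counts as a failed response without touching the loop.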
