
Node-Fetch: Retry Failed Requests

In this guide for The NodeJS Web Scraping Playbook, we will look at how to configure Node-Fetch to retry failed requests so you can build a more reliable system.

There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests with the NodeJS Node-Fetch library: using the retry package, and building your own retry logic wrapper.

Let's begin...

Need help scraping the web?

Then check out ScrapeOps, the complete toolkit for web scraping.


Retry Failed Requests Using Retry Library

Here we use the Retry package to define the retry logic and trigger any retries on failed requests.
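If you don't already have both packages in your project, you can install them with npm (the examples below use import syntax, since node-fetch v3 is ESM-only):

npm install node-fetch retry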

Here is an example:


import fetch from 'node-fetch';
import retry from 'retry';

// Status codes that should trigger a retry. The retry package itself does not
// inspect HTTP responses, so we check these ourselves and throw below.
const RETRY_STATUS_CODES = [429, 500, 502, 503, 504];

const retryOptions = {
  retries: 5,        // maximum number of retries
  factor: 2,         // exponential backoff factor
  minTimeout: 1000,  // wait at least 1 second before the first retry
  maxTimeout: 10000, // never wait more than 10 seconds between retries
  randomize: true    // add jitter to the timeouts
};

const retryOperation = retry.operation(retryOptions);

const url = 'http://quotes.toscrape.com/';

retryOperation.attempt(async function (currentAttempt) {
  try {
    const response = await fetch(url);

    // node-fetch does not throw on HTTP error statuses, so throw manually
    // for any status code we want to retry
    if (RETRY_STATUS_CODES.includes(response.status)) {
      throw new Error(`Request failed with status code: ${response.status}`);
    }

    const html = await response.text();
    console.log(html);
  } catch (error) {
    if (retryOperation.retry(error)) {
      console.log(`Retry attempt: ${currentAttempt}`);
      return;
    }
    console.error(`Maximum number of retries reached. Error: ${error}`);
  }
});

In the above code, we use the node-fetch library to send HTTP requests with retry functionality. We also utilize the retry package to control the retry behavior.

We define the retry options, including the maximum number of retries, the exponential backoff factor, and the minimum and maximum timeout values. We also define the list of status codes that should trigger a retry (the retry package doesn't inspect HTTP responses itself, so we throw an error for those status codes inside the request handler):

  • retries: The maximum number of times to retry the operation. Default is 10. Setting this to 1 means do it once, then retry it once.
  • factor: The exponential factor to use. Default is 2.
  • minTimeout: The number of milliseconds before starting the first retry. Default is 1000.
  • maxTimeout: The maximum number of milliseconds between two retries. Default is Infinity.
  • randomize: Randomizes the timeouts by multiplying with a factor between 1 and 2. Default is false.

The formula used to calculate the individual timeouts is:


Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)
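For example, with the options used above (minTimeout of 1000ms, factor of 2, maxTimeout of 10000ms) and ignoring the randomization, the waits before each retry work out as follows. This is a quick illustration only, not output produced by the retry package:

// Backoff delays produced by the formula above, with randomization ignored
const minTimeout = 1000;
const maxTimeout = 10000;
const factor = 2;

for (let attempt = 0; attempt < 5; attempt++) {
  const timeout = Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout);
  console.log(`Wait before retry ${attempt + 1}: ${timeout}ms`);
}
// Wait before retry 1: 1000ms
// Wait before retry 2: 2000ms
// Wait before retry 3: 4000ms
// Wait before retry 4: 8000ms
// Wait before retry 5: 10000ms (capped by maxTimeout)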

The retryOperation.attempt function handles the retry logic. Inside the function, we make the GET request using the fetch method (node-fetch) to the specified URL and throw an error for any retryable status code. If an error occurs, we check whether a retry should be attempted using retryOperation.retry.

If a retry is required, we log the attempt number and the retry package schedules another attempt after the backoff delay. If the maximum number of retries is reached, we log an error message.


Build Your Own Retry Logic Wrapper

Another method of retrying failed requests with NodeJS Node-Fetch is to build your own retry logic around your request functions.


import fetch from 'node-fetch';

const NUM_RETRIES = 3;

(async () => {
  let response;

  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      response = await fetch('http://quotes.toscrape.com/');
      if (response.status === 200 || response.status === 404) {
        // Break out of the loop if a usable response is returned
        break;
      }
    } catch (error) {
      // node-fetch throws a FetchError for network-level failures
      if (error.name === 'FetchError') {
        // Handle connection errors by moving on to the next attempt
        continue;
      }
    }
  }

  // Do something with the successful response
  if (response && response.status === 200) {
    // Perform actions with the successful response
    console.log(await response.text());
  }
})();


In the above code, we use the fetch method from the node-fetch library to send HTTP requests and handle retries. We initialize a variable response to store the response from the successful request.

We then use a for loop with a maximum of NUM_RETRIES iterations. Inside the loop, we make a GET request using fetch (node-fetch) to the specified URL. If the response status code is either 200 or 404, we break out of the loop.

If a connection error occurs, we catch the error and continue to the next iteration.

Finally, after the loop, we check if the response variable is not null and has a status code of 200. If these conditions are met, you can perform actions with the successful response.
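If you use this pattern in more than one place, you could wrap it in a small reusable helper. The sketch below is one way to do that; the fetchWithRetry name, the retries and delayMs parameters, and the fixed pause between attempts are our own additions rather than anything provided by node-fetch:

import fetch from 'node-fetch';

// Hypothetical helper: retries the request with a short pause between attempts
async function fetchWithRetry(url, retries = 3, delayMs = 1000) {
  for (let i = 0; i < retries; i++) {
    try {
      const response = await fetch(url);
      if (response.status === 200 || response.status === 404) {
        // Return as soon as we get a usable response
        return response;
      }
    } catch (error) {
      // node-fetch throws a FetchError for network-level failures;
      // rethrow anything else so real bugs aren't silently swallowed
      if (error.name !== 'FetchError') throw error;
    }
    // Wait before the next attempt
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return null;
}

// Usage
(async () => {
  const response = await fetchWithRetry('http://quotes.toscrape.com/');
  if (response && response.status === 200) {
    console.log(await response.text());
  }
})();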

The advantage of this approach is that you have a lot of control over what counts as a failed response.

Above we only look at the response code to decide whether to retry the request. However, we could adapt this so that we also check the response body to make sure the HTML is valid.

Below we will add an additional check to make sure the HTML response doesn't contain a ban page.


import fetch from 'node-fetch';

const NUM_RETRIES = 3;

(async () => {
  let response;
  let responseText;
  let validResponse = false;

  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      response = await fetch('http://quotes.toscrape.com/');
      responseText = await response.text();
      if (response.status === 200 && !responseText.includes('<title>Robot or human?</title>')) {
        // Break out of the loop if a successful response is returned
        // and the ban page content is not present
        validResponse = true;
        break;
      }

      if (response.status === 404) {
        // Break out of the loop if a 404 page is returned
        validResponse = true;
        break;
      }
    } catch (error) {
      // node-fetch throws a FetchError for network-level failures
      if (error.name === 'FetchError') {
        // Handle connection errors by moving on to the next attempt
        continue;
      }
    }
  }

  // Do something with the successful response
  if (response && validResponse && response.status === 200) {
    // Perform actions with the successful response
    console.log(responseText);
  }
})();


In this example, we also check the successful 200 status code responses to make sure they don't contain a ban page.


"<title>Robot or human?</title>"

If it does, then the code will retry the request.
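If your target site can return more than one kind of block page, you could move this check into a small helper and test for several markers at once. A minimal sketch follows; the isBanPage name and the list of markers are examples rather than values taken from any specific site:

// Hypothetical helper: returns true if the HTML looks like a ban/block page
function isBanPage(html) {
  const banMarkers = [
    '<title>Robot or human?</title>',
    'Access Denied',
    'captcha'
  ];
  const lowerHtml = html.toLowerCase();
  return banMarkers.some((marker) => lowerHtml.includes(marker.toLowerCase()));
}

You would then replace the inline responseText.includes(...) check in the loop above with !isBanPage(responseText).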


More Web Scraping Tutorials

So that's how you can configure NodeJS Node-Fetch to automatically retry failed requests.

If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.

Or check out one of our more in-depth guides: