NodeJS Got: Retry Failed Requests
In this guide for The NodeJs Web Scraping Playbook, we will look at how to configure the NodeJS Got library to retry failed requests so you can build a more reliable system.
There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests with the NodeJS Got library: using the retry library, and building your own retry logic wrapper.
Let's begin...
Retry Failed Requests Using Retry Library
Here we use the Retry package to define the retry logic and trigger any retries on failed requests.
Here is an example:
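import got from 'got';
import retry from 'retry';
// Options understood by the retry package
const retryOptions = {
  retries: 5,        // retry up to 5 times after the first attempt
  factor: 2,         // double the wait between retries
  minTimeout: 1000,  // wait 1 second before the first retry
  maxTimeout: 10000, // never wait more than 10 seconds
  randomize: true,   // multiply each wait by a random factor between 1 and 2
};
// The retry package has no status code option, so we filter on these ourselves
const RETRY_STATUS_CODES = [429, 500, 502, 503, 504];
const retryOperation = retry.operation(retryOptions);
const url = 'http://quotes.toscrape.com/';
retryOperation.attempt(async (currentAttempt) => {
  try {
    // Disable Got's built-in retries so only the retry package schedules them
    const response = await got.get(url, { retry: { limit: 0 } });
    console.log(response.body);
  } catch (error) {
    // Network errors have no response; HTTP errors carry the status code
    const statusCode = error.response?.statusCode;
    const retryable = statusCode === undefined || RETRY_STATUS_CODES.includes(statusCode);
    // retry() returns true (and schedules the next attempt) while retries remain
    if (retryable && retryOperation.retry(error)) {
      console.log(`Attempt ${currentAttempt} failed, retrying...`);
      return;
    }
    console.error(`Request failed after ${currentAttempt} attempt(s). Error: ${error}`);
  }
});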
In the above code, we use the Node.js Got library to send the HTTP request, and the retry package to control the retry behavior.
We define the retry options, which set the maximum number of retries, the factor by which the delay between retries grows, the minimum and maximum delay values, and whether the delays are randomized. Note that the retry package itself has no option for choosing which status codes trigger a retry, so that filtering has to be done inside the attempt function by inspecting error.response.statusCode.
retries: The maximum number of times to retry the operation. Default is 10. Setting this to 1 means do it once, then retry it once.
factor: The exponential factor to use. Default is 2.
minTimeout: The number of milliseconds before starting the first retry. Default is 1000.
maxTimeout: The maximum number of milliseconds between two retries. Default is Infinity.
randomize: Randomizes the timeouts by multiplying them with a factor between 1 and 2. Default is false.
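The formula used to calculate the individual timeouts is:
Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)
Here random is 1 when randomize is false (otherwise a value between 1 and 2), and attempt counts from 0 for the first retry.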
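To see what that formula produces for the options used above, here is a small sketch that reproduces it in plain JavaScript. computeTimeouts is a hypothetical helper, not part of the retry package, and it leaves out randomize (that is, random is fixed at 1) so the output is deterministic.

```javascript
// Hypothetical helper reproducing the retry package's documented timeout formula,
// with randomize left out (random = 1) so the schedule is deterministic.
function computeTimeouts({ retries, factor, minTimeout, maxTimeout }) {
  const timeouts = [];
  // attempt is 0 for the first retry, 1 for the second, and so on
  for (let attempt = 0; attempt < retries; attempt++) {
    timeouts.push(Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout));
  }
  return timeouts;
}

const schedule = computeTimeouts({ retries: 5, factor: 2, minTimeout: 1000, maxTimeout: 10000 });
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

The fifth delay would be 16000 ms, but maxTimeout caps it at 10000 ms. With randomize: true, as in the example above, each delay is multiplied by a random factor between 1 and 2 before the cap is applied.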
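The retryOperation.attempt function handles the retry logic. Inside the function, we make the GET request to the specified URL using the got.get method. If an error occurs, we pass it to retryOperation.retry, which returns true and schedules another attempt if retries remain, and returns false once they are used up.
If a retry is scheduled, we log the attempt number and return; the retry package then calls the function again after the computed timeout. If the maximum number of retries is reached, we log an error message.
Keep in mind that Got also retries some failed requests on its own by default, so when you wrap it with the retry package it is worth passing retry: { limit: 0 } to got.get. Otherwise each attempt counted by the retry package can hide several Got retries.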
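The retryable-or-not decision inside the catch block can be pulled out into a small predicate, which keeps the attempt function short and makes the rule easy to test. This is a minimal sketch: isRetryableError and DEFAULT_RETRY_STATUS_CODES are hypothetical names, and it assumes Got's convention that network errors carry no response while HTTP errors expose the server's reply on error.response. The objects passed in below are simulated stand-ins for real Got errors.

```javascript
// Hypothetical predicate deciding whether a Got error is worth retrying.
// Assumes network errors have no `response`, while HTTP errors expose
// the server's reply on `error.response`.
const DEFAULT_RETRY_STATUS_CODES = [429, 500, 502, 503, 504];

function isRetryableError(error, statusCodes = DEFAULT_RETRY_STATUS_CODES) {
  const statusCode = error?.response?.statusCode;
  // No response at all: connection reset, DNS failure, timeout, etc.
  if (statusCode === undefined) return true;
  // Otherwise only retry the status codes we treat as transient
  return statusCodes.includes(statusCode);
}

// Simulated errors shaped like the objects Got throws
console.log(isRetryableError(new Error('ECONNRESET')));            // true
console.log(isRetryableError({ response: { statusCode: 503 } })); // true
console.log(isRetryableError({ response: { statusCode: 404 } })); // false
```

Inside the attempt function, the check would then read if (isRetryableError(error) && retryOperation.retry(error)), so a non-retryable error such as a 404 fails immediately instead of using up the retry budget.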
Build Your Own Retry Logic Wrapper
Another method of retrying failed requests with NodeJS Got is to build your own retry logic around your request functions.
import got from 'got';
const NUM_RETRIES = 3;
(async () => {
  let response;
  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      // Disable Got's built-in retries so this loop controls the attempts
      response = await got.get('http://quotes.toscrape.com', { retry: { limit: 0 } });
      if (response.statusCode === 200) {
        // Escape the loop if a successful response is returned
        break;
      }
    } catch (error) {
      // Absence of response field in error object indicates a network error
      const networkError = error.response === undefined;
      if (networkError) {
        // Network error: try again on the next iteration
        continue;
      }
      // HTTP error: keep the response so its status code can be inspected
      response = error.response;
      if (response.statusCode === 404) {
        // The page does not exist, so retrying will not help
        break;
      }
    }
  }
  // Do something with the successful response
  if (response && response.statusCode === 200) {
    // Perform actions with the successful response
    console.log(response.body);
  }
})();
In the above code, we use the got.get method to send HTTP requests and handle the retries ourselves. We declare a variable response to store the response from the request.
We then use a for loop with a maximum of NUM_RETRIES iterations. Inside the loop, we make a GET request to the specified URL using got.get. If the response status code is either 200 or 404, we break out of the loop.
If a connection error occurs, we catch the error and continue to the next iteration.
Note that, by default, got throws an HTTPError when the response status code is greater than 299, so the 404 case has to be handled in the catch block. In this case, we also set response to error.response. If you would rather inspect every status code in the try block, got also accepts a throwHttpErrors: false option, which makes it resolve with the response instead of throwing.
Finally, after the loop, we check that the response variable is defined and has a status code of 200. If these conditions are met, you can perform actions with the successful response.
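If you need this loop in several places, it can be generalized into a reusable wrapper. The sketch below is one way to do it, under a few assumptions: requestWithRetries, sleep and their options are hypothetical names, the request function is passed in (in real code it would call got.get with retry: { limit: 0 }), and unlike the loop above it waits between attempts using exponential backoff. The fake request simulates two network failures before succeeding, so the example runs without network access.

```javascript
// Hypothetical helper: calls requestFn until isSuccess() accepts the result,
// waiting with exponential backoff between attempts. `retries` counts the
// retries after the first attempt, so requestFn runs at most retries + 1 times.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function requestWithRetries(
  requestFn,
  { retries = 3, baseDelayMs = 1000, isSuccess = () => true, shouldRetry = () => true } = {},
) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    if (attempt > 0) {
      // Wait baseDelayMs, then 2x, 4x, ... before each retry
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
    try {
      const result = await requestFn(attempt);
      if (isSuccess(result)) return result;
      // e.g. a 200 response that turned out to be a ban page
      lastError = new Error(`Attempt ${attempt + 1} returned an unacceptable result`);
    } catch (error) {
      // Errors such as a 404 can be marked as not worth retrying
      if (!shouldRetry(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}

// Usage: a fake request that fails twice with a network-style error, then succeeds
let calls = 0;
const fakeRequest = async () => {
  calls += 1;
  if (calls < 3) throw new Error('ECONNRESET');
  return { statusCode: 200, body: '<html>ok</html>' };
};

requestWithRetries(fakeRequest, { retries: 3, baseDelayMs: 10 })
  .then((response) => console.log(response.statusCode, calls)); // 200 3
```

Passing shouldRetry: (error) => error.response?.statusCode !== 404 reproduces the loop's behavior of giving up immediately on a 404, and isSuccess is where a content check can plug in.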
The advantage of this approach is that you have a lot of control over what counts as a failed response.
Above, we only look at the response status code to decide whether to retry the request. However, we could adapt this to also check that the HTML in the response is valid.
Below we will add an additional check to make sure the HTML response doesn't contain a ban page.
import got from 'got';
const NUM_RETRIES = 3;
(async () => {
  let response;
  let validResponse = false;
  for (let i = 0; i < NUM_RETRIES; i++) {
    try {
      // Disable Got's built-in retries so this loop controls the attempts
      response = await got.get('http://quotes.toscrape.com', { retry: { limit: 0 } });
      const html = response.body;
      if (response.statusCode === 200 && !html.includes('<title>Robot or human?</title>')) {
        // Break the loop if a 200 response is returned and it is not the ban page
        validResponse = true;
        break;
      }
      // Otherwise (for example, the ban page was served) the loop retries the request
    } catch (error) {
      // Absence of response field in error object indicates a network error
      const networkError = error.response === undefined;
      if (networkError) {
        // Network error: try again on the next iteration
        continue;
      }
      response = error.response;
      if (response.statusCode === 404) {
        // The page does not exist, so retrying will not help
        validResponse = true;
        break;
      }
    }
  }
  // Do something with the successful response
  if (response && validResponse && response.statusCode === 200) {
    // Perform actions with the successful response
    console.log(response.body);
  }
})();
In this example, we also check successful 200 responses to make sure they don't contain a ban page, which we identify here by the string "<title>Robot or human?</title>". If the ban page is detected, the code retries the request.
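If the sites you scrape serve more than one kind of block page, the inline html.includes check can be moved into a small predicate that takes a list of markers. This is a minimal sketch, assuming ban pages can be recognized by fixed strings in the HTML; isValidResponse and BAN_PAGE_MARKERS are hypothetical names, and the marker list should be extended with whatever your target sites actually return.

```javascript
// Hypothetical helper: a response is valid only if it has a 200 status and
// none of the known ban-page markers appear in its HTML.
const BAN_PAGE_MARKERS = ['<title>Robot or human?</title>'];

function isValidResponse(response, markers = BAN_PAGE_MARKERS) {
  if (!response || response.statusCode !== 200) return false;
  const html = String(response.body ?? '');
  return !markers.some((marker) => html.includes(marker));
}

console.log(isValidResponse({ statusCode: 200, body: '<title>Quotes to Scrape</title>' })); // true
console.log(isValidResponse({ statusCode: 200, body: '<title>Robot or human?</title>' }));  // false
console.log(isValidResponse({ statusCode: 503, body: '' }));                               // false
```

Inside the loop, the success condition would then become if (isValidResponse(response)), keeping the retry loop itself unchanged when new ban-page markers are added.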
More Web Scraping Tutorials
So that's how you can configure NodeJS Got to automatically retry failed requests.
If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.