Java Apache HttpClient: Retry Failed Requests
In this guide for The Java Web Scraping Playbook, we will look at how to configure the Java Apache HttpClient library to retry failed requests so you can build a more reliable system.
There are a couple of ways to approach this, so in this guide we will walk you through the 2 most common ways to retry failed requests and show you how to use them with the Java Apache HttpClient:
- Retry Failed Requests Using Retry Library
- Build Your Own Retry Logic Wrapper
Let's begin...
Retry Failed Requests Using Retry Library
Here we use the Retry4j package to define the retry logic and trigger any retries on failed requests.
Here is an example:
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import com.evanlennick.retry4j.CallExecutor;
import com.evanlennick.retry4j.CallExecutorBuilder;
import com.evanlennick.retry4j.Status;
import com.evanlennick.retry4j.config.RetryConfig;
import com.evanlennick.retry4j.config.RetryConfigBuilder;

public class RetryFailedRequests {
    // Status codes that should be treated as failures and retried
    public static final List<Integer> badStatusCodes = Arrays.asList(429, 500, 502, 503, 504);

    public static void main(String[] args) throws Exception {
        CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
        client.start();

        // The request logic; throwing an exception marks this try as failed for retry4j
        Callable<Void> makeRequest = () -> {
            SimpleHttpRequest request = SimpleRequestBuilder.get("https://quotes.toscrape.com").build();
            SimpleHttpResponse response = client.execute(request, null).get();
            int statusCode = response.getCode();
            if (badStatusCodes.contains(statusCode)) {
                throw new Exception("Bad status code: " + statusCode);
            }
            System.out.println("Response body: " + response.getBodyText());
            return null;
        };

        // Up to 5 tries in total, 10 second base delay, growing exponentially
        RetryConfig config = new RetryConfigBuilder()
                .retryOnAnyException()
                .withMaxNumberOfTries(5)
                .withDelayBetweenTries(10, ChronoUnit.SECONDS)
                .withExponentialBackoff()
                .build();

        CallExecutor callExecutor = new CallExecutorBuilder()
                .config(config)
                .onFailureListener((Status s) -> {
                    System.out.println("Maximum number of retries reached.");
                })
                .afterFailedTryListener((Status s) -> {
                    System.out.println("Total tries: " + s.getTotalTries());
                })
                .onCompletionListener((Status s) -> {
                    // clean up: close the client once the retry logic has finished
                    try {
                        client.close();
                    } catch (Exception e) {
                        // ignore errors while closing
                    }
                })
                .build();

        callExecutor.execute(makeRequest);
    }
}
In the above code, we use the Java Apache HttpClient 5 library to send HTTP requests and the retry4j package to control the retry behavior.
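First we create a Callable named makeRequest to contain our request logic. It sends a GET request and throws an exception if the response status code is one of the badStatusCodes (429, 500, 502, 503 or 504). Throwing is what tells retry4j that the try failed; without it, a 503 response would be treated as a success.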
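To see the status check on its own, here is a minimal stdlib-only sketch that turns a "bad" HTTP status code into an exception, just as makeRequest does. The class name StatusCheck and the helper failIfBadStatus are hypothetical; the list of codes is copied from badStatusCodes above.

```java
import java.util.List;

public class StatusCheck {
    // Same codes as badStatusCodes in the example above
    static final List<Integer> BAD_STATUS_CODES = List.of(429, 500, 502, 503, 504);

    // Hypothetical helper: throws for a "bad" status so a retry library sees a failed try
    static void failIfBadStatus(int statusCode) throws Exception {
        if (BAD_STATUS_CODES.contains(statusCode)) {
            throw new Exception("Bad status code: " + statusCode);
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 404, 503}) {
            try {
                failIfBadStatus(code);
                System.out.println(code + " -> accepted");
            } catch (Exception e) {
                System.out.println(code + " -> retry (" + e.getMessage() + ")");
            }
        }
    }
}
```

Note that 404 is not in the list, so with this configuration retry4j would treat a "page not found" response as a success and not retry it.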
Next, we use RetryConfigBuilder to define our retry config, including the maximum number of tries, the condition that triggers a retry, the delay between retries, and the backoff function used to calculate each successive delay:
- retryOnAnyException(): Triggers a retry for any exception thrown during the operation, including the exception makeRequest throws for a bad status code.
- withMaxNumberOfTries(5): Sets the maximum number of tries to 5. This count includes the first attempt, so the request is retried at most 4 times.
- withDelayBetweenTries(10, ChronoUnit.SECONDS): Sets the delay between tries to 10 seconds. Because exponential backoff is also enabled, this value acts as the base delay rather than a fixed one.
- withExponentialBackoff(): Uses an exponential backoff function to calculate the delay between tries, so each wait is longer than the previous one. This helps avoid overwhelming a service with rapid retries when it is already experiencing issues.
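retry4j computes the actual waits internally. To get a feel for how quickly exponential backoff grows, here is a small stdlib-only sketch that assumes the common doubling formula, where the wait after the n-th failed try is the base delay multiplied by 2^(n - 1). This formula is an assumption for illustration, not retry4j's code, and BackoffSchedule is a hypothetical name.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {
    // Assumption: wait after the n-th failed try = baseDelay * 2^(n - 1)
    static List<Duration> exponentialDelays(Duration baseDelay, int maxTries) {
        List<Duration> delays = new ArrayList<>();
        // There is one wait between each pair of consecutive tries
        for (int failedTries = 1; failedTries < maxTries; failedTries++) {
            long multiplier = 1L << (failedTries - 1);
            delays.add(baseDelay.multipliedBy(multiplier));
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Duration> delays = exponentialDelays(Duration.ofSeconds(10), 5);
        for (int i = 0; i < delays.size(); i++) {
            // prints 10s, 20s, 40s and 80s for tries 2 to 5
            System.out.println("Wait before try " + (i + 2) + ": " + delays.get(i).getSeconds() + "s");
        }
    }
}
```

Under this assumption, the example config could spend up to 10 + 20 + 40 + 80 = 150 seconds waiting before it gives up, on top of the time taken by the requests themselves.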
Then we use CallExecutorBuilder to create a callExecutor, which executes our operation and controls its retries based on the config we just defined. We also configure callExecutor with these listeners:
- onFailureListener: Called after the maximum number of tries is reached and the operation still hasn't succeeded.
- afterFailedTryListener: Called after each failed try. Here we print the total number of tries made so far.
- onCompletionListener: Called once the retry logic has finished running. Here we perform clean-up, specifically calling the client.close() method.
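retry4j runs the loop, the delays and the listener calls for you. To make the order of those callbacks concrete, here is a hypothetical stdlib-only sketch of a tiny retry executor that calls equivalents of afterFailedTryListener, onFailureListener and onCompletionListener. MiniRetryExecutor is an illustrative name, not part of retry4j, and the doubling backoff is an assumption that may differ from retry4j's internals.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.IntConsumer;

public class MiniRetryExecutor {
    // Hypothetical sketch mirroring the listener order described above (not retry4j's code)
    static <T> T execute(Callable<T> call, int maxTries, Duration baseDelay,
                         IntConsumer afterFailedTry, Runnable onFailure, Runnable onCompletion) {
        try {
            for (int tries = 1; tries <= maxTries; tries++) {
                try {
                    return call.call();                 // success: stop retrying
                } catch (Exception e) {
                    afterFailedTry.accept(tries);       // like afterFailedTryListener
                }
                if (tries < maxTries) {
                    try {
                        // exponential backoff: base, 2x base, 4x base, ...
                        Thread.sleep(baseDelay.multipliedBy(1L << (tries - 1)).toMillis());
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break;                          // stop retrying if interrupted
                    }
                }
            }
            onFailure.run();                            // like onFailureListener
            return null;
        } finally {
            onCompletion.run();                         // like onCompletionListener, runs either way
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String result = execute(() -> {
                    attempts[0]++;
                    if (attempts[0] < 3) {
                        throw new Exception("Bad status code: 503");   // simulated failure
                    }
                    return "ok";
                }, 5, Duration.ofMillis(100),
                tries -> System.out.println("Total tries: " + tries),
                () -> System.out.println("Maximum number of retries reached."),
                () -> System.out.println("Cleaning up"));
        System.out.println("Result: " + result);
    }
}
```

Here the simulated request fails twice, so the sketch prints "Total tries: 1" and "Total tries: 2", then "Cleaning up" and "Result: ok"; the failure callback never runs because the third try succeeds.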
Finally, we use callExecutor.execute(makeRequest) to run our request through the retry mechanism.
Build Your Own Retry Logic Wrapper
Another method of retrying failed requests with Java Apache HttpClient is to build your own retry logic around your request functions.
import java.util.concurrent.ExecutionException;
import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.core5.http.ConnectionClosedException;
import org.apache.hc.core5.http.ConnectionRequestTimeoutException;

public class CustomRetryLogic {
    // Maximum number of tries (the first attempt included)
    final public static int NUM_RETRIES = 3;

    public static void main(String[] args) throws Exception {
        // initialization
        SimpleHttpResponse response = null;
        // try-with-resources closes the client even if something goes wrong
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            for (int i = 0; i < NUM_RETRIES; i++) {
                try {
                    SimpleHttpRequest request = SimpleRequestBuilder.get("https://quotes.toscrape.com").build();
                    response = client.execute(request, null).get();
                    int status = response.getCode();
                    if (status == 200 || status == 404) {
                        // escape the loop if a successful response is returned or the page is not found
                        break;
                    }
                } catch (Exception e) {
                    // Future.get() wraps failures in an ExecutionException, so inspect the cause
                    Throwable cause = (e instanceof ExecutionException && e.getCause() != null) ? e.getCause() : e;
                    boolean connectionError = cause instanceof ConnectionClosedException
                            || cause instanceof ConnectionRequestTimeoutException;
                    if (connectionError) {
                        // handle connection exceptions (here we just log them and try again)
                        System.out.println("Connection error: " + cause.getMessage());
                    }
                } finally {
                    System.out.println("Total tries: " + (i + 1));
                }
            }
        }
        if (response != null && response.getCode() == 200) {
            // do something with the successful response
            System.out.println("Response body: " + response.getBodyText());
        } else {
            System.out.println("No valid response after maximum number of tries");
        }
    }
}
In the above code, we use the Apache HttpClient library to send HTTP requests and our own custom retry wrapper logic to handle retries. We first initialize a variable response to store the response returned by the request.
We then use a for loop with a maximum of NUM_RETRIES iterations. Inside the loop, we make a GET request to the specified URL. If the response status code is either 200 or 404, we break out of the loop; any other status code (for example 429 or 503) leads to another try.
If an exception occurs, such as a connection error, we catch it and continue to the next iteration. Note that the loop retries immediately, without waiting between tries.
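One pitfall worth knowing when checking for connection errors: client.execute(request, null).get() goes through java.util.concurrent.Future, and Future.get() reports a failure by throwing an ExecutionException that wraps the original error. An instanceof check on the caught exception itself therefore never matches a connection exception, so check getCause() instead. The sketch below demonstrates this with the JDK's CompletableFuture, using an IOException as a stand-in for HttpClient's connection exceptions; rootOf is a hypothetical helper name.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class UnwrapCause {
    // Returns the wrapped cause if the exception came from Future.get(), otherwise the exception itself
    static Throwable rootOf(Exception e) {
        if (e instanceof ExecutionException && e.getCause() != null) {
            return e.getCause();
        }
        return e;
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        // Simulate an async request that failed with an I/O error
        future.completeExceptionally(new IOException("Connection closed"));
        try {
            future.get();
        } catch (Exception e) {
            System.out.println("Caught: " + e.getClass().getSimpleName());        // ExecutionException
            System.out.println("Cause: " + rootOf(e).getClass().getSimpleName());  // IOException
        }
    }
}
```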
Finally, after the loop, we check whether the response variable is not null and has a status code of 200. If both conditions are met, you can perform actions with the successful response.
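Because the loop above retries immediately, a common refinement is to wait between tries and to move the loop into a reusable wrapper. Below is a minimal stdlib-only sketch of that idea: retry is a hypothetical helper that calls any Callable, accepts the result only when a caller-supplied Predicate says it is valid, and doubles the wait after each failed try. The fake status codes stand in for real HttpClient responses.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetryWrapper {
    // Hypothetical generic wrapper: retries `request` until `isValid` accepts the result
    static <T> Optional<T> retry(Callable<T> request, Predicate<T> isValid,
                                 int maxTries, long baseDelayMillis) {
        for (int i = 0; i < maxTries; i++) {
            try {
                T result = request.call();
                if (isValid.test(result)) {
                    return Optional.of(result);
                }
            } catch (Exception e) {
                System.out.println("Try " + (i + 1) + " failed: " + e.getMessage());
            }
            if (i < maxTries - 1) {
                try {
                    Thread.sleep(baseDelayMillis << i);   // wait 1x, 2x, 4x, ... the base delay
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;                                // stop retrying if interrupted
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] codes = {503, 429, 200};   // fake status codes standing in for real responses
        int[] call = {0};
        Optional<Integer> result = retry(() -> codes[call[0]++],
                code -> code == 200 || code == 404, 3, 100);
        System.out.println(result.map(c -> "Got status " + c)
                .orElse("No valid response after maximum number of tries"));
    }
}
```

With the real client, the Callable would send the request and return the SimpleHttpResponse, and the Predicate would hold the status (and, if needed, content) checks.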
The advantage of this approach is that you have a lot of control over what counts as a failed response.
Above, we only look at the response code to decide whether to retry the request. However, we could adapt this so that we also check the response body to make sure the HTML is valid.
Below we will add an additional check to make sure the HTML response doesn't contain a ban page.
import java.util.concurrent.ExecutionException;
import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.core5.http.ConnectionClosedException;
import org.apache.hc.core5.http.ConnectionRequestTimeoutException;
import org.jsoup.Jsoup;

public class CustomRetryLogic {
    // Maximum number of tries (the first attempt included)
    final public static int NUM_RETRIES = 3;

    public static void main(String[] args) throws Exception {
        // initialization
        SimpleHttpResponse response = null;
        boolean validResponse = false;
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            for (int i = 0; i < NUM_RETRIES; i++) {
                try {
                    SimpleHttpRequest request = SimpleRequestBuilder.get("https://quotes.toscrape.com").build();
                    response = client.execute(request, null).get();
                    // parse the HTML title; a missing body gives an empty title
                    String body = response.getBodyText();
                    String pageTitle = body == null ? "" : Jsoup.parse(body).title();
                    int status = response.getCode();
                    boolean validStatus = status == 200 || status == 404;
                    if (validStatus && !pageTitle.contains("Robot or human?")) {
                        // escape the loop if a valid status code is returned and the ban page is not present
                        validResponse = true;
                        break;
                    }
                } catch (Exception e) {
                    // Future.get() wraps failures in an ExecutionException, so inspect the cause
                    Throwable cause = (e instanceof ExecutionException && e.getCause() != null) ? e.getCause() : e;
                    boolean connectionError = cause instanceof ConnectionClosedException
                            || cause instanceof ConnectionRequestTimeoutException;
                    if (connectionError) {
                        // handle connection exceptions (here we just log them and try again)
                        System.out.println("Connection error: " + cause.getMessage());
                    }
                } finally {
                    System.out.println("Total tries: " + (i + 1));
                }
            }
        }
        if (response != null && validResponse && response.getCode() == 200) {
            // do something with the successful response
            System.out.println("Response body: " + response.getBodyText());
        } else {
            System.out.println("No valid response after maximum number of tries");
        }
    }
}
In this example, we also check responses that have a valid status code to make sure they are not a ban page, which we identify by its page title: "<title>Robot or human?</title>". If the response is a ban page, the code retries the request.
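If you would rather not pull in Jsoup just for this check, the same validation can be sketched with the JDK's regex support. This is a rough stand-in, not a replacement for a real HTML parser: the regular expression only finds the first title element and may miss unusual markup. extractTitle and isValidResponse are hypothetical names; the status and title rules mirror the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BanPageCheck {
    // Case-insensitive, and DOTALL so the title text may span several lines
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String extractTitle(String html) {
        if (html == null) {
            return "";
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Hypothetical validator combining the status check and the ban-page check
    static boolean isValidResponse(int status, String html) {
        boolean validStatus = status == 200 || status == 404;
        return validStatus && !extractTitle(html).contains("Robot or human?");
    }

    public static void main(String[] args) {
        String banPage = "<html><head><title>Robot or human?</title></head></html>";
        String normalPage = "<html><head><title>Quotes to Scrape</title></head></html>";
        System.out.println(isValidResponse(200, banPage));     // false -> retry
        System.out.println(isValidResponse(200, normalPage));  // true
        System.out.println(isValidResponse(503, normalPage));  // false -> retry
    }
}
```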
More Web Scraping Tutorials
So that's how you can configure Java Apache HttpClient to automatically retry failed requests.
If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.