Java Apache HttpClient: How to Use & Rotate Proxies

Using Proxies With Apache HttpClient

To use a proxy with the Java Apache HttpClient, first create a new HttpHost instance from your proxy's scheme, hostname, and port, and assign it to a proxyHost variable. Then create an HttpAsyncClientBuilder instance by calling the HttpAsyncClients.custom factory method, and pass proxyHost to its setProxy method.


import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.core5.http.HttpHost;

public class Proxy {
    public static HttpHost proxyHost = new HttpHost("http", "proxy.example.com", 80);

    public static void main(String[] args) throws Exception {
        CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                .setProxy(proxyHost)
                .build();
        client.start();

        SimpleHttpRequest request = SimpleRequestBuilder.get("http://httpbin.org/ip").build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}

In this guide for The Apache HttpClient Web Scraping Playbook, we will look at how to integrate the 3 most common types of proxies into our Apache HttpClient based web scraper.

Using proxies with the Apache HttpClient library allows you to spread your requests over multiple IP addresses making it harder for websites to detect & block your web scrapers.

In this guide we will walk you through the 3 most common proxy integration methods and show you how to use each of them with Apache HttpClient.

Let's begin...

Need help scraping the web?

Then check out ScrapeOps, the complete toolkit for web scraping.


Using Proxy IPs With Apache HttpClient

Using a proxy with Apache HttpClient is very straightforward. To configure your client to use a proxy, simply call the setProxy method of the client builder and pass your proxyHost as an argument.


import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.core5.http.HttpHost;

public class Proxy {
    public static HttpHost proxyHost = new HttpHost("http", "proxy.example.com", 80);

    public static void main(String[] args) throws Exception {
        CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                .setProxy(proxyHost)
                .build();
        client.start();

        SimpleHttpRequest request = SimpleRequestBuilder.get("http://httpbin.org/ip").build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}

Proxy Authentication With Apache HttpClient

Some proxy IPs require authentication in the form of a username and password to use the proxy.

First, create a CredentialsProvider instance called credsProvider using CredentialsProviderBuilder. The credsProvider stores the authentication credentials (username and password) for a specific authentication scope (proxyHostname and proxyPort), allowing them to be used for authenticated requests within that scope.

Then, to configure your client to use an authenticated proxy, call the setProxy method of the client builder with proxyHost, and the setDefaultCredentialsProvider method with credsProvider.


import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.CredentialsProvider;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.client5.http.impl.auth.CredentialsProviderBuilder;
import org.apache.hc.core5.http.HttpHost;

public class AuthProxy {
    public static String proxyHostname = "example.com";
    public static int proxyPort = 80;
    public static String username = "USERNAME";
    public static String password = "PASSWORD";

    public static void main(String[] args) throws Exception {
        HttpHost proxyHost = new HttpHost("http", proxyHostname, proxyPort);
        CredentialsProvider credsProvider = CredentialsProviderBuilder.create()
                .add(new AuthScope(proxyHostname, proxyPort), username, password.toCharArray())
                .build();
        CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                .setProxy(proxyHost)
                .setDefaultCredentialsProvider(credsProvider)
                .build();
        client.start();

        SimpleHttpRequest request = SimpleRequestBuilder.get("https://httpbin.org/ip").build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}

The 3 Most Common Proxy Formats

That covers the basics of integrating a proxy into Apache HttpClient. In the next sections we will show you how to integrate Apache HttpClient with the 3 most common proxy formats:

  • Rotating Through List of Proxy IPs
  • Using Proxy Gateways
  • Using Proxy APIs

A couple of years ago, proxy providers would sell you a list of proxy IP addresses and you would configure your scraper to rotate through these IP addresses, using a new one with each request.

However, today more and more proxy providers don't sell raw lists of proxy IP addresses anymore. Instead they provide access to their proxy pools via proxy gateways or proxy API endpoints.

We will look at how to integrate with all 3 proxy formats.

Finding Proxy Providers

If you are looking to find a good proxy provider then check out our web scraping proxy comparison tool where you can compare the plans of all the major proxy providers.


Proxy Integration #1: Rotating Through Proxy IP List

Here, a proxy provider will normally give you a list of proxy IP addresses, and you will need to configure your scraper to rotate through them and select a new IP address for every request.

The proxy list you receive will look something like this:

[
    {
        "scheme": "http",
        "host": "85.237.57.198",
        "username": "Username",
        "password": "Password",
        "port": 20000
    },
    {
        "scheme": "http",
        "host": "85.237.57.198",
        "username": "Username",
        "password": "Password",
        "port": 21000
    },
    {
        "scheme": "http",
        "host": "85.237.57.198",
        "username": "Username",
        "password": "Password",
        "port": 22000
    },
    {
        "scheme": "http",
        "host": "85.237.57.198",
        "username": "Username",
        "password": "Password",
        "port": 23000
    }
]

To integrate them into our scrapers, we need to configure our code to pick a random proxy from this list every time we make a request.

In our Java Apache HttpClient scraper we could do it like this:


import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.CredentialsProvider;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.client5.http.impl.auth.CredentialsProviderBuilder;
import org.apache.hc.core5.http.HttpHost;
import org.json.JSONArray;
import org.json.JSONObject;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ProxyRotation {
    public static String proxyListString = """
            [
                {
                    "host": "85.237.57.198",
                    "port": 20000,
                    "username": "Username",
                    "password": "Password"
                },
                {
                    "host": "85.237.57.198",
                    "port": 21000,
                    "username": "Username",
                    "password": "Password"
                },
                {
                    "host": "85.237.57.198",
                    "port": 22000,
                    "username": "Username",
                    "password": "Password"
                },
                {
                    "host": "85.237.57.198",
                    "port": 23000,
                    "username": "Username",
                    "password": "Password"
                }
            ]
            """;

    public static List<Map<String, Object>> parseProxies(String proxyListString) {
        JSONArray proxyListJson = new JSONArray(proxyListString);
        List<Map<String, Object>> proxyList = new ArrayList<>();

        for (int i = 0; i < proxyListJson.length(); i++) {
            JSONObject proxyJson = proxyListJson.getJSONObject(i);
            Map<String, Object> proxy = new HashMap<>();
            proxy.put("host", proxyJson.getString("host"));
            proxy.put("port", proxyJson.getInt("port"));
            proxy.put("username", proxyJson.getString("username"));
            proxy.put("password", proxyJson.getString("password"));
            proxyList.add(proxy);
        }
        return proxyList;
    }

    public static Map<String, Object> getRandomProxy(List<Map<String, Object>> proxyList) {
        int rnd = new Random().nextInt(proxyList.size());
        return proxyList.get(rnd);
    }

    public static void main(String[] args) throws Exception {
        // parse proxy json data into a List of proxy data Maps
        List<Map<String, Object>> proxyList = parseProxies(proxyListString);

        // pick a random proxy from the list of proxies
        Map<String, Object> proxy = getRandomProxy(proxyList);

        String proxyHostname = (String) proxy.get("host");
        int proxyPort = (int) proxy.get("port");
        String proxyUsername = (String) proxy.get("username");
        String proxyPassword = (String) proxy.get("password");

        HttpHost proxyHost = new HttpHost("http", proxyHostname, proxyPort);
        CredentialsProvider credsProvider = CredentialsProviderBuilder.create()
                .add(new AuthScope(proxyHostname, proxyPort), proxyUsername, proxyPassword.toCharArray())
                .build();
        CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                .setProxy(proxyHost)
                .setDefaultCredentialsProvider(credsProvider)
                .build();
        client.start();

        SimpleHttpRequest request = SimpleRequestBuilder.get("https://httpbin.org/ip").build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}


This is a simplistic example; when scraping at scale we would also need to build a mechanism to monitor the performance of each individual IP address and remove it from the proxy rotation if it gets banned or blocked.
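A hedged sketch of such a mechanism (the ProxyHealthTracker class and its method names are illustrative, not part of any library) might track consecutive failures per proxy and drop any proxy that exceeds a failure threshold:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch: tracks consecutive failures per proxy and
// removes a proxy from rotation once it hits a failure threshold.
public class ProxyHealthTracker {
    private final List<String> proxies;  // e.g. "host:port" keys
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();
    private final int maxFailures;
    private final Random random = new Random();

    public ProxyHealthTracker(List<String> proxies, int maxFailures) {
        this.proxies = new CopyOnWriteArrayList<>(proxies);
        this.maxFailures = maxFailures;
    }

    // Pick a random proxy that is still in rotation.
    public String getRandomProxy() {
        if (proxies.isEmpty()) {
            throw new IllegalStateException("No healthy proxies left");
        }
        return proxies.get(random.nextInt(proxies.size()));
    }

    // Call after a successful request: reset the failure counter.
    public void reportSuccess(String proxy) {
        failures.put(proxy, 0);
    }

    // Call after a failed or blocked request: remove the proxy
    // from rotation once it fails too many times in a row.
    public void reportFailure(String proxy) {
        int count = failures.merge(proxy, 1, Integer::sum);
        if (count >= maxFailures) {
            proxies.remove(proxy);
        }
    }

    public int healthyCount() {
        return proxies.size();
    }
}
```

In the rotation example above, you would call reportFailure whenever a request through a proxy returns a ban or block response (e.g. 403 or 429), and reportSuccess on a 200, so dead proxies naturally drop out of the pool.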


Proxy Integration #2: Using Proxy Gateway

Increasingly, a lot of proxy providers aren't selling lists of proxy IP addresses anymore. Instead, they give you access to their proxy pools via a proxy gateway.

Here, you only have to integrate a single proxy into your Apache HttpClient scraper and the proxy provider will manage the proxy rotation, selection, cleaning, etc. on their end for you.

This is the most common way to use residential and mobile proxies, and is becoming increasingly common when using datacenter proxies too.

Here is an example of how to integrate BrightData's residential proxy gateway into our Apache HttpClient scraper:

import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.CredentialsProvider;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.client5.http.impl.auth.CredentialsProviderBuilder;
import org.apache.hc.core5.http.HttpHost;

public class BrightDataProxy {
    /**
     * Global variables for the BrightData proxy URL parameters.
     * A typical URL looks like this: http://USERNAME:PASSWORD@zproxy.lum-superproxy.io:22225
     */
    public static String BRIGHTDATA_USERNAME = "USERNAME";
    public static String BRIGHTDATA_PASSWORD = "PASSWORD";
    public static String BRIGHTDATA_HOSTNAME = "zproxy.lum-superproxy.io";
    public static int BRIGHTDATA_PORT = 22225;

    public static void main(String[] args) throws Exception {
        HttpHost proxyHost = new HttpHost("http", BRIGHTDATA_HOSTNAME, BRIGHTDATA_PORT);
        // The auth scope must match the proxy's hostname and port so the
        // credentials are sent to the proxy, not the target site
        CredentialsProvider credsProvider = CredentialsProviderBuilder.create()
                .add(new AuthScope(BRIGHTDATA_HOSTNAME, BRIGHTDATA_PORT), BRIGHTDATA_USERNAME, BRIGHTDATA_PASSWORD.toCharArray())
                .build();
        CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                .setProxy(proxyHost)
                .setDefaultCredentialsProvider(credsProvider)
                .build();
        client.start();

        SimpleHttpRequest request = SimpleRequestBuilder.get("https://httpbin.org/ip").build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}

As you can see, it is much easier to integrate than using a proxy list, as you don't have to worry about implementing the proxy rotation logic yourself.


Proxy Integration #3: Using Proxy API Endpoint

Recently, a lot of proxy providers have started offering smart proxy APIs that take care of managing your proxy infrastructure, rotating proxies and headers for you, so you can focus on extracting the data you need.

Here you typically send the URL you want to scrape to their API endpoint and then they will return the HTML response to you.

Although every proxy API provider has a slightly different API integration, they are all very similar and are very easy to integrate with.

Here is an example of how to integrate with the ScrapeOps Proxy Manager:


import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.async.methods.SimpleRequestBuilder;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ScrapeOpsProxy {
    public static String SCRAPEOPS_API_KEY = "your_api_key";

    public static void main(String[] args) throws Exception {
        CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
        client.start();

        // URL-encode the target URL so any special characters it contains
        // are not misinterpreted as part of the proxy API URL
        String targetUrl = URLEncoder.encode("http://httpbin.org/ip", StandardCharsets.UTF_8);
        String proxyAPIUrl = String.format("https://proxy.scrapeops.io/v1?api_key=%s&url=%s", SCRAPEOPS_API_KEY, targetUrl);

        SimpleHttpRequest request = SimpleRequestBuilder.get()
                .setUri(proxyAPIUrl)
                .build();
        SimpleHttpResponse response = client.execute(request, null).get();
        System.out.println("Response body: " + response.getBodyText());
        client.close();
    }
}

Here you simply send the targetUrl you want to scrape to the ScrapeOps API endpoint in the url query parameter, along with your SCRAPEOPS_API_KEY in the api_key query parameter, and ScrapeOps will deal with finding the best proxy for that domain and return the HTML response to you.
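One detail worth noting: if the target URL contains query parameters of its own, it must be URL-encoded before being placed in the url parameter, otherwise its ? and & characters will be parsed as parameters of the proxy API URL itself. A minimal sketch using only the standard library (the example target URL is illustrative):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeTargetUrl {
    public static void main(String[] args) {
        String targetUrl = "http://httpbin.org/get?foo=bar&baz=qux";

        // Encode ':', '/', '?', '&' and '=' so the proxy API receives
        // the whole target URL as a single query parameter value
        String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        String proxyAPIUrl = "https://proxy.scrapeops.io/v1?api_key=your_api_key&url=" + encoded;
        System.out.println(proxyAPIUrl);
    }
}
```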

Note that we did not have to configure any proxy parameters (proxy host and auth scope). Using the ScrapeOps Proxy API Aggregator simplifies our code a lot.

You can get your own free API key with 1,000 free requests by signing up here.


More Web Scraping Tutorials

So that's how you can integrate proxies into your Apache HttpClient scrapers.

If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.

Or check out one of our more in-depth guides: