
Java OkHttp: Make Concurrent Requests

In this guide for The Java Web Scraping Playbook, we will look at how to configure the Java OkHttp library to make concurrent requests so that you can increase the speed of your scrapers.

The more concurrent threads you have, the more requests you can have active in parallel, and the faster you can scrape.
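As a quick illustration of this speedup (a stdlib-only sketch with made-up timings and a hypothetical ParallelSpeedup class, not code from this guide): five tasks that each block for 100 ms finish in roughly 100 ms on a five-thread pool, but take roughly 500 ms on a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelSpeedup {
    // Run `count` tasks that each sleep `sleepMillis`, using a pool of
    // `threads` workers, and return the total wall-clock time in milliseconds.
    static long timeTasks(int count, int threads, long sleepMillis) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tasks.add(() -> { Thread.sleep(sleepMillis); return null; });
        }
        long start = System.nanoTime();
        executor.invokeAll(tasks); // blocks until every task has completed
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        executor.shutdown();
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:  " + timeTasks(5, 1, 100) + " ms");
        System.out.println("5 threads: " + timeTasks(5, 5, 100) + " ms");
    }
}
```

The same pattern applies when the work inside each task is a blocking HTTP request rather than a sleep.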

So in this guide we will walk you through the best way to send concurrent requests with OkHttp.

Let's begin...


Make Concurrent Requests Using Java util.concurrent Package

The first approach to making concurrent requests with OkHttp is to use the ExecutorService class from the java.util.concurrent package to execute our requests concurrently.

Here is an example:


import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.jsoup.Jsoup;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.*;

public class ConcurrentThreads {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();

        String[] requestUris = new String[] {
            "http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/",
            "http://quotes.toscrape.com/page/3/",
            "http://quotes.toscrape.com/page/4/",
            "http://quotes.toscrape.com/page/5/"
        };
        // Synchronized list so multiple threads can add results safely
        List<String> outputData = Collections.synchronizedList(new ArrayList<>());

        int numberOfThreads = 5; // Number of threads to use for making requests

        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        List<Callable<Void>> tasks = new ArrayList<>();

        for (String requestUri : requestUris) {
            Callable<Void> task = () -> {
                Request request = new Request.Builder()
                    .url(requestUri)
                    .build();
                // Close the response when done to avoid leaking connections
                try (Response response = client.newCall(request).execute()) {
                    String html = response.body().string();
                    String title = Jsoup.parse(html).title();
                    outputData.add(title);
                }
                return null;
            };
            tasks.add(task);
        }
        executor.invokeAll(tasks);
        outputData.forEach(System.out::println);

        executor.shutdown();
    }
}

Here:

  1. We import the necessary classes from the okhttp3, jsoup and java.util.concurrent packages: okhttp3 is used for making HTTP requests, jsoup is used for parsing the HTML response, and Executors from java.util.concurrent is used for executing requests concurrently and configuring the number of concurrent threads.

  2. We define the numberOfThreads variable, which represents the maximum number of concurrent threads we want to allow for scraping.

  3. We create an array requestUris containing the URIs we want to scrape.

  4. We define an empty list called tasks to store the functions used to make our requests.

  5. Then we loop through the requestUris array, and for each requestUri we create a task and add it to the tasks list. A task is an instance of Callable and is just a lambda function encapsulating the logic for making a request and handling the response. Inside this task function:

  • We create an instance of Request using Request.Builder and assign it to the request variable. Then we send this request with client.newCall(request).execute() and keep track of the resulting response.
  • We read the response body with response.body().string() and store it in the html variable. Next we parse the html and get its title using Jsoup.parse(html).title(). We then add the title to the outputData list.
  6. We call the executor.invokeAll method with the tasks list as an argument to run our scraping tasks concurrently.

  7. Finally we print out the outputData, which contains the scraped data from all the URLs.
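One caveat worth noting: if a task throws an exception (for example, a request fails), invokeAll does not rethrow it directly. The exception is captured in the corresponding Future in the list invokeAll returns, and only surfaces when you call get() on that Future. Since the scraper above discards the returned futures, a failed request would pass silently unless you check them. A stdlib-only sketch (the CheckFutures class and its dummy tasks are illustrative, not part of the guide's scraper):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class CheckFutures {
    // Run one succeeding and one failing task, then count how many
    // failed by calling get() on each returned Future.
    static int countFailures() throws Exception {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "ok");
        tasks.add(() -> { throw new IllegalStateException("request failed"); });

        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<String>> futures = executor.invokeAll(tasks);
        executor.shutdown();

        int failures = 0;
        for (Future<String> future : futures) {
            try {
                // get() rethrows anything the task threw, wrapped in ExecutionException
                future.get();
            } catch (ExecutionException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countFailures() + " task(s) failed");
    }
}
```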

Overall, the code uses the java.util.concurrent package to scrape multiple web pages concurrently, and utilizes OkHttp and Jsoup for making HTTP requests and parsing the HTML response, respectively.

Using this approach we can significantly increase the speed at which we can make requests with the OkHttp library.
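One more detail: a plain ArrayList is not safe for concurrent writes, so when several worker threads append to a shared results list it should be wrapped with Collections.synchronizedList, otherwise results can be lost or the list corrupted. A stdlib-only sketch (the SafeCollection class is illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.*;

public class SafeCollection {
    // Have `count` concurrent tasks each append one result to a shared
    // synchronized list, and return how many results actually landed.
    static int collect(int count) throws Exception {
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(8);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int n = i;
            tasks.add(() -> { results.add(n); return null; });
        }
        executor.invokeAll(tasks);
        executor.shutdown();
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("collected " + collect(1000) + " results");
    }
}
```

With the synchronized wrapper, every one of the appends survives, no matter how the threads interleave.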


Adding Concurrency To ScrapeOps Scrapers

The following is an example of sending requests to the ScrapeOps Proxy API Aggregator, which lets you use as many concurrent threads as your proxy plan allows.

Just set SCRAPEOPS_API_KEY to your ScrapeOps API key, and set the numberOfThreads value to the number of concurrent threads your proxy plan allows.


// same imports as the previous code block, plus java.net.URLEncoder

public class ConcurrentThreads {
    final public static String SCRAPEOPS_API_KEY = "your_api_key";

    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();

        String[] requestUris = new String[] {
            "http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/",
            "http://quotes.toscrape.com/page/3/",
            "http://quotes.toscrape.com/page/4/",
            "http://quotes.toscrape.com/page/5/"
        };
        // Synchronized list so multiple threads can add results safely
        List<String> outputData = Collections.synchronizedList(new ArrayList<>());

        int numberOfThreads = 5; // Number of threads to use for making requests

        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        List<Callable<Void>> tasks = new ArrayList<>();

        for (String requestUri : requestUris) {
            // construct ScrapeOps proxy URL out of SCRAPEOPS_API_KEY and requestUri,
            // URL-encoding the target URL so its query characters survive
            String proxyUrl = String.format("https://proxy.scrapeops.io/v1?api_key=%s&url=%s",
                SCRAPEOPS_API_KEY, URLEncoder.encode(requestUri, "UTF-8"));
            Callable<Void> task = () -> {
                Request request = new Request.Builder()
                    .url(proxyUrl)
                    .build();
                // Close the response when done to avoid leaking connections
                try (Response response = client.newCall(request).execute()) {
                    String html = response.body().string();
                    String title = Jsoup.parse(html).title();
                    outputData.add(title);
                }
                return null;
            };
            tasks.add(task);
        }
        executor.invokeAll(tasks);
        outputData.forEach(System.out::println);

        executor.shutdown();
    }
}


You can get your own free API key with 1,000 free requests by signing up here.


More Web Scraping Tutorials

So that's how you can configure OkHttp to send requests concurrently.

If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.

Or check out one of our more in-depth guides: