Java OkHttp: Make Concurrent Requests
In this guide for The Java Web Scraping Playbook, we will look at how to use the Java OkHttp library to make concurrent requests so that you can increase the speed of your scrapers.
Because a scraper spends most of its time waiting for servers to respond, more concurrent threads mean more requests in flight at once and a faster scrape, up to the limits of your network, your proxy plan, and the target site.
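To make that relationship concrete, here is a minimal sketch that uses only the JDK. The ThreadCountDemo class and its timeRun helper are hypothetical names chosen for illustration, and Thread.sleep stands in for the time a real request would spend waiting on a server, so no network access is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadCountDemo {

    // Runs taskCount simulated requests on a pool of `threads` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeRun(int threads, int taskCount, long latencyMillis) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> {
                    Thread.sleep(latencyMillis); // stand-in for waiting on a server response
                    return null;
                });
            }
            long start = System.nanoTime();
            executor.invokeAll(tasks); // blocks until every task has finished
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:  " + timeRun(1, 5, 200) + " ms");
        System.out.println("5 threads: " + timeRun(5, 5, 200) + " ms");
    }
}
```

With one thread the five simulated requests run back to back, while with five threads they overlap, so the second figure should be roughly a fifth of the first. Real scrapes will scale less cleanly, since servers and proxies impose their own limits.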
So in this guide we will walk you through how to send concurrent requests with OkHttp.
Let's begin...
Make Concurrent Requests Using the java.util.concurrent Package
The first approach to making concurrent requests with OkHttp is to use an ExecutorService from the java.util.concurrent package to execute our requests concurrently.
Here is an example:
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.jsoup.Jsoup;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ConcurrentThreads {

    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();

        String[] requestUris = new String[] {
            "http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/",
            "http://quotes.toscrape.com/page/3/",
            "http://quotes.toscrape.com/page/4/",
            "http://quotes.toscrape.com/page/5/"
        };

        // CopyOnWriteArrayList is safe to append to from several threads at once
        List<String> outputData = new CopyOnWriteArrayList<>();

        int numberOfThreads = 5; // Number of threads to use for making requests
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);

        List<Callable<Void>> tasks = new ArrayList<>();
        for (String requestUri : requestUris) {
            Callable<Void> task = () -> {
                Request request = new Request.Builder()
                    .url(requestUri)
                    .build();
                // try-with-resources closes the response and releases its connection
                try (Response response = client.newCall(request).execute()) {
                    String html = response.body().string();
                    String title = Jsoup.parse(html).title();
                    outputData.add(title);
                }
                return null;
            };
            tasks.add(task);
        }

        // invokeAll blocks until every task has finished
        executor.invokeAll(tasks);
        outputData.forEach(System.out::println);
        executor.shutdown();
    }
}
Here:
- We import the necessary classes from these packages: jsoup, okhttp3 and java.util.concurrent. okhttp3 is used for making HTTP requests, jsoup is used for parsing the HTML response, and Executors from java.util.concurrent is used for executing requests concurrently and configuring the number of concurrent threads.
- We define the numberOfThreads variable, which represents the maximum number of concurrent threads we want to allow for scraping.
- We create an array, requestUris, containing the URIs we want to scrape.
- We define an empty list called tasks to store the functions used to make our requests.
- Then we loop through the requestUris array. For each requestUri, we create a task and add it to the tasks list. A task is an instance of Callable, written as a lambda that encapsulates the logic for making a request and handling the response. Inside this task function:
  - We create an instance of Request using Request.Builder and assign it to the request variable. Then we send this request with the client.newCall(request).execute() call and keep the resulting response.
  - We read the response body with response.body().string() and store it in the html variable. Next we parse the html and get its title using Jsoup.parse(html).title(). We then add the title to the outputData list.
- We call the executor.invokeAll method with the tasks list as an argument to run our scraping tasks concurrently. This call blocks until every task has finished.
- Finally we print out outputData, which contains the scraped data from all the URLs. Because the tasks finish in no fixed order, the titles may not appear in the same order as the URLs.
Overall, the code uses the java.util.concurrent package to scrape multiple web pages concurrently and utilizes OkHttp and Jsoup for making HTTP requests and parsing the HTML response, respectively.
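Using this approach, we can keep up to numberOfThreads requests in flight at once instead of waiting for each response in turn, which significantly increases the speed at which we can make requests with the OkHttp library.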
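A variation worth considering is to have each task return its result rather than append to a shared list. invokeAll returns one Future per task, in the same order as the task list, so results stay aligned with the input URLs and a failed request surfaces as an ExecutionException instead of disappearing. The sketch below uses only the JDK. FutureResults, fetchAll and fetchTitle are hypothetical names, and fetchTitle is a stand-in for the OkHttp request plus Jsoup parsing shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureResults {

    // Hypothetical stand-in for the OkHttp request plus Jsoup title extraction.
    static String fetchTitle(String url) {
        if (url.isEmpty()) {
            throw new IllegalArgumentException("empty URL");
        }
        return "Title for " + url;
    }

    // Runs one task per URL and returns the results in the same order as the input.
    static List<String> fetchAll(List<String> urls, int numberOfThreads) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetchTitle(url));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the task list.
            for (Future<String> future : executor.invokeAll(tasks)) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // The task threw; record the failure instead of losing it silently.
                    results.add("FAILED: " + e.getCause().getMessage());
                }
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = List.of(
                "http://quotes.toscrape.com/page/1/",
                "",
                "http://quotes.toscrape.com/page/2/");
        fetchAll(urls, 3).forEach(System.out::println);
    }
}
```

In a real scraper, the body of fetchTitle would be the request and parsing code from the task lambda, and the failure branch is a natural place to log or retry the URL.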
Adding Concurrency To ScrapeOps Scrapers
The following example sends requests through the ScrapeOps Proxy API Aggregator, so that you can use all of the concurrent threads your proxy plan allows.
Just set SCRAPEOPS_API_KEY to your ScrapeOps API key, and change the numberOfThreads value to the number of concurrent threads your proxy plan allows.
// same imports as the previous code block, plus:
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ConcurrentThreads {

    public static final String SCRAPEOPS_API_KEY = "your_api_key";

    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();

        String[] requestUris = new String[] {
            "http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/",
            "http://quotes.toscrape.com/page/3/",
            "http://quotes.toscrape.com/page/4/",
            "http://quotes.toscrape.com/page/5/"
        };

        // CopyOnWriteArrayList is safe to append to from several threads at once
        List<String> outputData = new CopyOnWriteArrayList<>();

        int numberOfThreads = 5; // Number of threads to use for making requests
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);

        List<Callable<Void>> tasks = new ArrayList<>();
        for (String requestUri : requestUris) {
            // construct ScrapeOps proxy URL out of SCRAPEOPS_API_KEY and the
            // URL-encoded requestUri (this encode overload needs Java 10+)
            String proxyUrl = String.format("https://proxy.scrapeops.io/v1?api_key=%s&url=%s",
                    SCRAPEOPS_API_KEY, URLEncoder.encode(requestUri, StandardCharsets.UTF_8));
            Callable<Void> task = () -> {
                Request request = new Request.Builder()
                    .url(proxyUrl)
                    .build();
                // try-with-resources closes the response and releases its connection
                try (Response response = client.newCall(request).execute()) {
                    String html = response.body().string();
                    String title = Jsoup.parse(html).title();
                    outputData.add(title);
                }
                return null;
            };
            tasks.add(task);
        }

        // invokeAll blocks until every task has finished
        executor.invokeAll(tasks);
        outputData.forEach(System.out::println);
        executor.shutdown();
    }
}
You can get your own free API key with 1,000 free requests by signing up here.
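One detail to watch when building the proxy URL: the target URL travels as a query parameter, so if it contains its own "?", "&" or "=" characters it needs to be percent-encoded, or part of it may be read as parameters of the proxy request. The following is a minimal sketch using the JDK's URLEncoder (the Charset overload requires Java 10 or later). ProxyUrlBuilder and buildProxyUrl are hypothetical names, and the example.com target is only an illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ProxyUrlBuilder {

    // Builds the proxy URL, percent-encoding the target so that its own
    // "?", "&" and "=" characters are not read as proxy parameters.
    static String buildProxyUrl(String apiKey, String targetUrl) {
        return "https://proxy.scrapeops.io/v1?api_key="
                + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&url="
                + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical target URL with its own query string
        System.out.println(buildProxyUrl("your_api_key", "http://example.com/search?q=a&page=2"));
        // prints https://proxy.scrapeops.io/v1?api_key=your_api_key&url=http%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%26page%3D2
    }
}
```

The returned string can be passed straight to Request.Builder's url method in place of the unencoded String.format result.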
More Web Scraping Tutorials
So that's how you can configure OkHttp to send requests concurrently.
If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.