Using Proxies With NodeJS Playwright

Playwright is a powerful browser automation library that is primarily used for automated testing of websites, but it is equally useful for building bots and scrapers that can load and interact with web pages in the browser like a real user. As a result, Playwright is very popular amongst the NodeJS web scraping community.

In this guide for The NodeJS Playwright Web Scraping Playbook, we will look at how to integrate proxies into our NodeJS Playwright based web scraper.

There are a number of different types of proxies, each of which you need to integrate differently with Playwright, so we will walk through how to integrate each type:

- Simple proxies that don't require authentication
- Authenticated proxies that require a username and password
- Proxy APIs such as the ScrapeOps Proxy Aggregator



Using Proxies With Playwright

The first and simplest type of proxy to integrate with NodeJS Playwright is a simple HTTP proxy (in the form of an IP address and port) that doesn't require authentication. For example:


"11.456.448.110:8080"

To integrate this proxy IP into a Playwright scraper, first add a proxy object to your browser launchOptions. Inside the proxy object, set the server param to your proxy URL. Then simply launch the browser by calling the chromium.launch function, providing the launchOptions object you just defined.


const { chromium } = require('playwright');

const proxyUrl = '11.456.448.110:8080';

const launchOptions = {
  proxy: {
    server: proxyUrl
  }
};

(async () => {
  const browser = await chromium.launch(launchOptions);
  const page = await browser.newPage();

  // httpbin.org/ip returns the IP address the request came from,
  // so we can verify the proxy is being used.
  await page.goto('https://httpbin.org/ip');

  const pageContent = await page.textContent('body');
  console.log(pageContent);

  await browser.close();
})();

Now when we run the script we can see that Playwright is using the defined proxy IP:


{
  "origin": "11.456.448.110"
}

tip

If you want to use Firefox, all you have to do is import firefox from playwright instead of chromium and launch the browser with the firefox.launch method.
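
For example, here is a minimal sketch of the Chromium script above switched over to Firefox (reusing the same placeholder proxy address):


const { firefox } = require('playwright');

const launchOptions = {
  proxy: {
    server: '11.456.448.110:8080'
  }
};

(async () => {
  // Same flow as the Chromium example, just launched with firefox.launch.
  const browser = await firefox.launch(launchOptions);
  const page = await browser.newPage();

  await page.goto('https://httpbin.org/ip');
  console.log(await page.textContent('body'));

  await browser.close();
})();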


Using Authenticated Proxies With Playwright

It is very common for commercial proxy providers to sell access to their proxy pools by giving you a single proxy endpoint that you send your requests to, authenticating your account with a username and password.

Using proxies that require username and password authentication isn't much different from using proxies without authentication. Simply set username to your proxy username and password to your proxy password inside the launchOptions.proxy object.


const { chromium } = require('playwright');

const proxyUrl = '11.456.448.110:8080';
const username = 'PROXY_USERNAME';
const password = 'PROXY_PASSWORD';

const launchOptions = {
  proxy: {
    server: proxyUrl,
    username: username,
    password: password
  }
};

(async () => {
  const browser = await chromium.launch(launchOptions);
  const page = await browser.newPage();

  await page.goto('https://httpbin.org/ip');

  const pageContent = await page.textContent('body');
  console.log(pageContent);

  await browser.close();
})();

Now when we run the script we can see that Playwright is using a proxy IP:


{
  "origin": "201.88.548.330"
}

Integrating Proxy APIs

Over the last few years there has been a huge surge in proxy providers offering smart proxy solutions that handle all the proxy rotation, header selection, ban detection and retries on their end. These smart providers typically offer their proxy services in an API endpoint format.

However, these proxy API endpoints don't integrate well with headless browsers when the website uses relative links, as Playwright will try to attach the relative URL onto the proxy API endpoint instead of the website's root URL, resulting in some pages not loading correctly, as the short sketch below illustrates.
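
To see why, consider how the browser resolves a relative link. A minimal sketch (using a hypothetical proxy endpoint URL purely for illustration):


// Hypothetical proxy API endpoint URL, for illustration only.
const proxiedPage = 'https://proxy.example.com/v1/?url=https://example.com/page1';

// A relative link like "/page2" on the target site resolves against the
// proxy endpoint's origin, not against https://example.com:
console.log(new URL('/page2', proxiedPage).href);
// => https://proxy.example.com/page2  (wrong host, so the page fails to load)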

As a result, when integrating your Playwright scrapers it is recommended that you use a provider's proxy port integration over their API endpoint integration when available (not all providers offer a proxy port integration).

For example, in the case of the ScrapeOps Proxy Aggregator we offer a proxy port integration for situations like this.

The proxy port integration is a light front-end for the API. It has all the same functionality and performance as sending requests to the API endpoint, but allows you to integrate our proxy aggregator as you would any normal proxy.

The following is an example of how to integrate the ScrapeOps Proxy Aggregator into your Playwright scraper:


const { chromium } = require('playwright');

const proxyUrl = 'proxy.scrapeops.io:5353';
const SCRAPEOPS_API_KEY = 'YOUR_API_KEY';

const launchOptions = {
  proxy: {
    server: proxyUrl,
    username: 'scrapeops',
    password: SCRAPEOPS_API_KEY
  }
};

(async () => {
  const browser = await chromium.launch(launchOptions);
  const page = await browser.newPage();

  await page.goto('https://httpbin.org/ip');

  const pageContent = await page.textContent('body');
  console.log(pageContent);

  await browser.close();
})();

Here we set username to 'scrapeops' and password to our SCRAPEOPS_API_KEY inside the launchOptions.proxy object.

SSL CERTIFICATE VERIFICATION

Note: So that we can properly direct your requests through the API, your code must be configured to ignore SSL certificate verification errors by setting ignoreHTTPSErrors: true.
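
For example, here is a minimal sketch of the script above with that option applied. In Playwright, ignoreHTTPSErrors is a context option, so it is passed when creating the page rather than in launchOptions:


const { chromium } = require('playwright');

const launchOptions = {
  proxy: {
    server: 'proxy.scrapeops.io:5353',
    username: 'scrapeops',
    password: 'YOUR_API_KEY'
  }
};

(async () => {
  const browser = await chromium.launch(launchOptions);

  // ignoreHTTPSErrors is a context option, so it is passed to newPage
  // (which creates a fresh context) rather than to chromium.launch.
  const page = await browser.newPage({ ignoreHTTPSErrors: true });

  await page.goto('https://httpbin.org/ip');
  console.log(await page.textContent('body'));

  await browser.close();
})();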


Full integration docs for NodeJS Playwright and the ScrapeOps Proxy Aggregator can be found here.

tip

To use the ScrapeOps Proxy Aggregator, you first need an API key, which you can get by signing up for a free account here. A free account gives you 1,000 free API credits.


More Web Scraping Tutorials

So that's how you can use both authenticated and unauthenticated proxies with Playwright to scrape websites without getting blocked.

If you would like to learn more about Web Scraping with Playwright, then be sure to check out The Playwright Web Scraping Playbook.

Or check out one of our more in-depth guides: