
Playwright Integration

How to Integrate ScrapeOps Residential Proxies with Playwright

Playwright is a headless browser automation tool similar to Puppeteer or Selenium. From 2017 to 2019, Microsoft contributed to Google's Puppeteer headless browser project. In 2020, Microsoft decided to launch a cross-platform alternative, and Playwright was born.

Playwright gives us the ability to automate all major browser engines: Chromium, WebKit, and Firefox. This is a huge leap forward in both automated testing and web scraping.

In this tutorial, we're going to go over proxy integration using Playwright. We'll use ScrapeOps Residential Proxies to provide a stable, reliable proxy connection.


Introduction

Playwright

Since its release, Playwright has been one of the major players in the headless browser space. Whether you're looking to automate tests, or scrape the web, Playwright has you covered.

Playwright's origin, like Puppeteer's, lies in Chrome's DevTools Protocol, which allows external services to control Chromium-based browsers. Microsoft contributed to Puppeteer for several years and saw the need for more robust browser automation.

While Puppeteer is great at what it does, it gives you no ability to control Firefox or Safari, which are also very popular browsers. Microsoft used this as an opportunity to expand and created Playwright. With Playwright, you get a very similar experience to Puppeteer, but you can automate virtually any browser.
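
If you'd like to see this in action, here's a minimal sketch that runs the same task in all three engines. We use https://example.com as a stand-in target; swap in any site you like.

const { chromium, firefox, webkit } = require("playwright");

(async () => {
  // The same API drives every engine Playwright ships with.
  for (const engine of [chromium, firefox, webkit]) {
    const browser = await engine.launch();
    const page = await browser.newPage();
    await page.goto("https://example.com");
    console.log(`${browser.browserType().name()}: ${await page.title()}`);
    await browser.close();
  }
})();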

However, a headless browser alone doesn't cover all of our scraping needs. When you use one, your IP address and location are still exposed, and the browser gives itself away through the user agent header it sends. These signals alone will trip up just about any anti-bot system.
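
You can see exactly what your browser is broadcasting with a quick check. In headless Chromium, the default user agent typically contains the telltale string HeadlessChrome, which is exactly the kind of signal anti-bot systems look for.

const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  // Print the user agent the browser sends with every request.
  console.log(await page.evaluate(() => navigator.userAgent));
  await browser.close();
})();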

ScrapeOps Residential Proxies

To blend in with normal traffic, you should use proxies with Playwright. Datacenter proxies tend to work, but it immediately looks suspicious when HTTP requests come from a datacenter... a place where humans don't actually live. ScrapeOps Residential Proxies give you a stable connection to a real residential or mobile network.

When you connect using our Residential Proxies, all of your HTTP requests are handled through our proxy server. The proxy server acts as a middleman between you and the site you're trying to scrape. You make a request to our proxy server. The server pings your target URL and waits for a response. If it doesn't get one, it retries the request. Once it has your response, the server sends it back to you.

Without a proxy, if you get IP blocked, it's over. If you get an IP block on our proxy network, we throw out the IP address and get you a new one. You always show up inside a real residential or mobile network, so you should have no problem blending in with normal network traffic.


Prerequisites

If you'd like to follow along with us, you need to have NodeJS installed. NodeJS allows you to run JavaScript code outside of the browser. You also need a code editor and an internet connection; chances are, you're already reading this article on one. Even a plain old text editor will work, but a modern code editor makes the experience much more tolerable. We really like VSCode.

You can use the links below to install NodeJS and VSCode. Follow the instructions for your specific operating system.

  • NodeJS: https://nodejs.org
  • VSCode: https://code.visualstudio.com

Once you have NodeJS and a code editor, you're ready to get started.


Setting Up ScrapeOps Residential Proxies

We'll start by creating a new project folder and we'll initialize a new JavaScript project. Then we'll install Playwright.

Before we start writing code, you'll need a ScrapeOps account and you'll need to make sure you have some Residential Proxy bandwidth.

This might sound like a lot, but we'll walk you through step by step.

Setting Up Our NodeJS Project

Create a new project folder and move into it.

mkdir playwright-residential
cd playwright-residential

Initialize a new JavaScript project. The -y flag tells npm to use the quick, default setup for our NodeJS project. You should see a package.json file appear in your new folder; a package-lock.json will show up once we install Playwright.

npm init -y

Now, we need to download Playwright.

npm install playwright

Next, run Playwright's install script in order to download the available browsers.

npx playwright install
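
This downloads Chromium, Firefox, and WebKit. Since this tutorial only uses Chromium, you can optionally install just that one engine instead:

npx playwright install chromium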

Creating a ScrapeOps Account

You need bandwidth for your proxy connection to work. Head on over to our registration page to create your account. We don't ask for much information, just what you see in the screenshot below. You'll also need to complete a CAPTCHA to prove that you're a human.

ScrapeOps Registration Page

Next, you'll be taken to our dashboard. Click on Residential Proxy Aggregator and you should see your bandwidth used and your total bandwidth. As you can see in the image below, I've used 9.88MB of my 3GB allotment for the month.

ScrapeOps Dashboard

If you scroll down and click on Request Builder, your API key will be displayed. This API key is very important; you'll need it in order to access your proxy bandwidth. Back this key up somewhere safe and guard it with your life.

In the command below, replace <YOUR_API_KEY> with your actual API key. You can use this to test out your account and make sure everything is working.

curl -x "http://scrapeops:<YOUR_API_KEY>@residential-proxy.scrapeops.io:8181" "https://lumtest.com/myip.json"

If your account is working correctly, you should see a response similar to what you see below.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   266  100   266    0     0     69      0  0:00:03  0:00:03 --:--:--    69
{"country":"ID","asn":{"asnum":7713,"org_name":"PT Telekomunikasi Indonesia"},"geo":{"city":"Surabaya","region":"JI","region_name":"East Java","postal_code":"60112","latitude":-7.2484,"longitude":112.7419,"tz":"Asia/Jakarta","lum_city":"surabaya","lum_region":"ji"}}

Testing the Integration

Now that we've tested our API key using cURL, it's time to run a real test with Playwright.

The code below sets up the exact same proxy connection we just used. Then, it navigates to Browserscan, a site that reveals all sorts of information about our browser. We wait for the site to finish scanning our browser and then take a screenshot.

const { chromium } = require("playwright");

const proxyServer = "residential-proxy.scrapeops.io:8181";
const proxyUsername = "scrapeops";
const proxyPassword = "your-super-secret-api-key";

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: proxyServer
    }
  });

  const context = await browser.newContext({
    httpCredentials: {
      username: proxyUsername,
      password: proxyPassword
    }
  });

  const page = await context.newPage();
  await page.goto("https://browserscan.net", { waitUntil: "networkidle", timeout: 0 });

  await page.screenshot({ path: "proxy-test.png" });

  await browser.close();
})();
  • First, we create all of our configuration variables: proxyServer, proxyUsername, and proxyPassword.
  • Then, we create an IIFE (immediately invoked function expression). This is JavaScript's version of a main function.
  • We open a new browser instance with await chromium.launch(). Pay attention to the proxy object we pass into the launch() function. It holds the proxyServer we created earlier and tells Playwright to talk to the ScrapeOps proxy server.
  • Next, we create a new context with await browser.newContext(). We pass an httpCredentials object into this function. The credentials object contains our proxyUsername and proxyPassword (API key).
  • Finally, we open a new page, navigate to Browserscan, take a screenshot, and close the browser.
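
As an aside, Playwright's launch() options can also take the credentials directly inside the proxy object. If you prefer that style, you could replace the launch() and newContext() calls above with something like the sketch below; newPage() creates a context for you implicitly.

const browser = await chromium.launch({
  proxy: {
    server: proxyServer,
    username: proxyUsername,
    password: proxyPassword
  }
});

const page = await browser.newPage();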

If you run the code, you should receive a screenshot similar to what you see below. As you can see, we're showing up in Egypt. Our proxy connection is working.

Location Test


Configuring Playwright for Proxy Integration

Above, you learned how to create a basic proxy connection. There are actually things we can do to tweak and fine-tune this connection. We can use geotargeting to control our location.

We can also use Static Proxies (Sticky Sessions) to reuse proxies. Sticky Sessions are very useful for pages that use tracking cookies or require you to log in for a session.

We accomplish both of these things by adding flags to our username. We'll explain each flag in the sections below.

Geotargeting

The script below is very similar to the one we just ran. There are just a few differences.

We have another configuration variable, country, and our username now has the country flag added to it. When using this flag, you pass in a two-letter country code. For example, scrapeops.country=us will route us through a proxy in the US.

You can see our list of fully supported countries here. We do support many additional countries, but we only offer permanent support for the list you see in the ScrapeOps documentation.

If you want to use a country that's not listed in our documentation, feel free to look it up on the list here.

For example, we don't officially support Portugal, but if you pass in pt as a country code, quite often, you'll get routed through a proxy in Portugal.

const { chromium } = require("playwright");

const country = "us";
const proxyServer = "residential-proxy.scrapeops.io:8181";
const proxyUsername = `scrapeops.country=${country}`;
const proxyPassword = "your-super-secret-api-key";

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: proxyServer
    }
  });

  const context = await browser.newContext({
    httpCredentials: {
      username: proxyUsername,
      password: proxyPassword
    }
  });

  const page = await context.newPage();
  await page.goto("https://browserscan.net", { waitUntil: "networkidle", timeout: 0 });

  await page.screenshot({ path: "geotarget.png" });

  await browser.close();
})();

If you look at the screenshot below, our IP address is showing up in the US. Our ISP is Verizon Business. We're showing up in the IP Time Zone of America/New York. Our Postal Code is 19023.

Geotargeting Screenshot

Using Static Proxies

Static Proxies (Sticky Sessions) allow us to reuse a proxy connection. This allows us to make multiple requests all through the same machine. This is useful when you need to stay logged in and keep a browsing session intact between requests.

Each sticky session needs to be given a session number. It can be any number between 0 and 10000. Sessions can last up to 10 minutes. Every 10 minutes, we rotate IP addresses regardless of your session. This helps us maintain healthy proxy pools and keep your connection safe.

In the code below, we use sticky_session=1000. This pins our proxy IP address to session number 1000. This time, we take a screenshot, and then we reload the page.

After reloading the page, we take another screenshot to verify that our connection is still the same. Our full username is scrapeops.sticky_session=1000. Everything else in the code remains virtually the same.

const { chromium } = require("playwright");

const sessionNumber = 1000;
const proxyServer = "residential-proxy.scrapeops.io:8181";
const proxyUsername = `scrapeops.sticky_session=${sessionNumber}`;
const proxyPassword = "your-super-secret-api-key";

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: proxyServer
    }
  });

  const context = await browser.newContext({
    httpCredentials: {
      username: proxyUsername,
      password: proxyPassword
    }
  });

  const page = await context.newPage();
  await page.goto("https://browserscan.net", { waitUntil: "networkidle", timeout: 0 });

  await page.screenshot({ path: "sticky-1.png" });

  await page.reload({ waitUntil: "networkidle", timeout: 0 });
  await page.screenshot({ path: "sticky-2.png" });

  await browser.close();
})();

Here is our first screenshot. As you can see below, we're showing up in Taiwan with an IP address of 111.242.146.75. Remember this number so we can compare it to the next screenshot.

Sticky Session: First Screenshot

Here is our next screenshot after reloading the page. We're still showing up in Taiwan. Our IP address is still 111.242.146.75. Through tons and tons of network requests, we're still maintaining a stable proxy connection through a single IP address.

Sticky Session: Second Screenshot
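
Screenshots make for an easy visual check, but you can also verify the session programmatically. Below is a minimal sketch meant to slot into the sticky session script above in place of the goto, screenshot, and reload steps. It fetches the lumtest endpoint we used earlier and compares the response body across a reload, assuming the endpoint returns identical output while the IP stays the same.

const page = await context.newPage();
await page.goto("https://lumtest.com/myip.json");
const before = await page.textContent("body");

await page.reload();
const after = await page.textContent("body");

// If the sticky session held, both responses describe the same IP.
console.log("Same IP across requests:", before === after);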


Common Issues and Troubleshooting

Take a look at the sections below and you should be able to handle most problems that might come your way.

No Proxy Connection

If you don't have a proxy connection, your script should crash, and you should get an error similar to what you see below.

node:internal/process/promises:289
triggerUncaughtException(err, true /* fromPromise */);
^

page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://browserscan.net/
Call log:
- navigating to "https://browserscan.net/", waiting until "networkidle"

at /home/nultinator/clients/ahmet/playwright-residential/scraper-test.js:22:16 {
name: 'Error'
}

Node.js v20.14.0

The part you really need to pay attention to here is net::ERR_TUNNEL_CONNECTION_FAILED. This means that the scraper failed to create the tunnel through the proxy. To resolve this issue, you should double-check your entire proxy configuration.

Your configuration should be as follows (if it already matches and failures persist, see the retry sketch after this list):

  • const proxyServer = "residential-proxy.scrapeops.io:8181"
  • const proxyUsername = "scrapeops"
  • const proxyPassword = "your-super-secret-api-key"
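
Transient tunnel failures can also happen on an otherwise correct setup. Here's a minimal, hypothetical retry wrapper around page.goto(); the retry count and delay are arbitrary assumptions, not ScrapeOps recommendations.

async function gotoWithRetries(page, url, retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await page.goto(url, { waitUntil: "networkidle", timeout: 0 });
    } catch (err) {
      console.log(`Attempt ${attempt} failed: ${err.message}`);
      if (attempt === retries) throw err;
      // Give the proxy network a moment before retrying.
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }
}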

Incorrect Location

If you're passing in a custom country and it's not working, first double-check your username. It should be scrapeops.country=<your-country-code>. If this is correct, check to see if your country is on our list of permanently supported countries.

While we do often support countries off this list, it's not guaranteed. The countries listed in our documentation are permanently supported.

Static Proxy Issues

If you're experiencing problems with a sticky session, first double-check your configuration. If any of your proxyServer, proxyPassword, proxyUsername, or sessionNumber values are incorrect, fix them. If your configuration is correct and you're still having problems, you're probably running into our automated proxy rotation.

Our static proxies get rotated every 10 minutes. This is for the good of everyone (including you). This keeps our proxy pools healthy and allows you to maintain your anonymity on the web.

If you're running into session expiration, you need to speed up your scraper so it can finish within the allotted 10 minutes.
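
If speeding up isn't an option, you can also split your work into batches and start a fresh session for each one. Below is a minimal sketch of that idea. The newSessionUsername() helper is hypothetical (not a ScrapeOps API); it just draws a random session number from the 0 to 10000 range described above.

const { chromium } = require("playwright");

const proxyServer = "residential-proxy.scrapeops.io:8181";
const proxyPassword = "your-super-secret-api-key";

// Hypothetical helper: builds a username with a fresh random session number.
function newSessionUsername() {
  const sessionNumber = Math.floor(Math.random() * 10001); // 0 to 10000
  return `scrapeops.sticky_session=${sessionNumber}`;
}

(async () => {
  const browser = await chromium.launch({ proxy: { server: proxyServer } });

  // One context (and therefore one sticky session) per batch of URLs.
  const batches = [["https://browserscan.net"], ["https://lumtest.com/myip.json"]];
  for (const batch of batches) {
    const context = await browser.newContext({
      httpCredentials: {
        username: newSessionUsername(),
        password: proxyPassword
      }
    });
    const page = await context.newPage();
    for (const url of batch) {
      await page.goto(url, { waitUntil: "networkidle", timeout: 0 });
    }
    await context.close();
  }

  await browser.close();
})();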


Conclusion

You've reached the end! You should now have a decent understanding of how to create a new Playwright project and how to integrate our Residential Proxies with it. With our Residential Proxies, you can effectively use geotargeting and sticky sessions. You might even have a free trial with ScrapeOps!

Take this new knowledge and put it in your scraping toolbox. Whether you're scraping the web for a living, or just doing it for fun, proxies are a must. A carpenter wouldn't do very well without a hammer, and an extraction specialist won't do very well without proxies.

If you'd like to know more about the tech we used in this article, take a look at their docs below.


More Playwright Web Scraping Guides

We wrote the playbook on web scraping with Playwright. Whether you're brand new, or just looking to learn a new skill, take a look at it. We've got something for you.

The articles below should give you a good taste of the playbook.