
Puppeteer Integration

How to Integrate ScrapeOps Residential Proxies with Puppeteer

Since its creation, Puppeteer has been an indispensable web scraping tool. With Puppeteer, you can control an actual browser from inside your NodeJS environment. You can do all sorts of things, such as rendering dynamic content and interacting with the page. If you want to scrape using JavaScript, it's hard to find a more balanced tool than Puppeteer... Not too big, not too small... just right.

In today's guide, we're going to learn how to use ScrapeOps Residential Proxies with Puppeteer. When you use our proxies with Puppeteer, you get anonymity, geotargeting, and static proxies all at a really reasonable price.


Introduction

Puppeteer

Puppeteer is a library for controlling a headless browser. With a headless browser, you get all of the benefits of running an actual browser without the additional overhead of rendering it on your screen. This allows you to control the browser directly from your coding environment and automate all sorts of tasks with speed and efficiency.

However, Puppeteer alone isn't a complete solution. Many websites employ anti-bot software to block any bot that comes in contact with the site. This software is designed to protect against malware. While a scraper isn't malware, it is a bot, and it will trip these systems.

To get full functionality out of Puppeteer, we need to be able to control our location and cycle through IP addresses to avoid getting blocked.

ScrapeOps Residential Proxies

Our proxies pick up where Puppeteer leaves off. When you use our proxies, your request gets routed through a new IP address. Unless you're using a sticky session, each request goes through a different IP address. If you try to access a page and get blocked, we retry it for you using a different IP address.

When you make a request using a proxy, the proxy server acts as a middleman. Your request goes to the proxy server. The proxy server forwards your request to the website through a healthy proxy and waits for a response. If it doesn't get a response, it will retry using a different proxy.

When you use ScrapeOps Residential Proxies, all of your requests get forwarded to a proxy on someone's actual home network. This allows you to blend in completely with the rest of the web.
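
If you want to see this middleman behavior before writing any code, you can send a single request through the proxy from the command line. This is just a quick sanity check using the residential endpoint you'll see later in this guide (it assumes you have curl installed and your API key handy; https://httpbin.org/ip simply echoes back the IP address it sees):

curl -x "http://scrapeops:your-super-secret-api-key@residential-proxy.scrapeops.io:8181" "https://httpbin.org/ip"

Run it a couple of times and the IP address in the response should change, because each request gets routed through a different proxy.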

With geotargeting and sticky sessions, not only do you get the benefits mentioned above, you get to choose where you're located and you have the ability to stay signed in. If you want shopping and shipping data from a specific country, simply pick your country! If you want to stay logged in, use a sticky session!


Prerequisites

To follow along with our guide, you need a basic understanding of JavaScript and a ScrapeOps account. Previous experience with Puppeteer is helpful, but not a requirement. You should also have NodeJS installed. NodeJS is the runtime environment that executes our JavaScript code.

  • NodeJS: We use NodeJS in order to run JavaScript code. You can install NodeJS here. Simply follow the instructions for your operating system and you're good to go.

  • ScrapeOps Account: ScrapeOps gives us the bandwidth we need in order to use our proxies. If you don't have a ScrapeOps account, don't worry, we'll walk you through the process and you'll even get a free trial... no credit card required!


Setting Up Puppeteer and ScrapeOps Residential Proxies

If you've got NodeJS installed, you're ready to create your project. Follow the steps below to create a new project folder and install Puppeteer.

After you're finished setting up your project, we'll walk you through creating a ScrapeOps account.

Creating the Project

Create a new project folder.

mkdir puppeteer-residential

Move into the folder and create a new JavaScript project.

cd puppeteer-residential
npm init -y

Install Puppeteer.

npm install puppeteer
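
If you want to confirm that Puppeteer installed correctly, you can ask npm to show you what's in the project (this just prints the installed Puppeteer version):

npm list puppeteer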

Creating a ScrapeOps Account

Before we continue on, you need a ScrapeOps API key. To get your API key, you need a ScrapeOps account. Follow the steps below, and you'll be ready to go in no time!

First, head on over to our registration page. Enter your details to create the account.

ScrapeOps Registration

Once you've got an account, you'll be taken to your account dashboard. On the left side of the page, select Proxy Aggregator. Then, click the Residential Proxy Aggregator box. You should be able to see your plan information and the bandwidth you've used versus your allotted bandwidth.

ScrapeOps Account Dashboard

Click on Request Builder. This should reveal your API key. Save this key somewhere safe; you're going to need it.

ScrapeOps Request Builder

Remember to save your API key. This is what you'll use to access our proxies.


Testing the Integration

Now that you've got everything you need, create a new JavaScript file inside the project folder you created.

The code below uses Puppeteer to open a new browser, and page.goto() takes us to Browserscan. Browserscan is a really useful tool that scans your browser to discover all sorts of information about your connection.

Feel free to copy and paste the code below into your new JavaScript file. It takes a screenshot of our Browserscan results in order to examine our proxy.

const puppeteer = require("puppeteer");

const username = "scrapeops";
const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();

    await page.authenticate({
        username: username,
        password: password
    });

    await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 0 });
    await page.screenshot({ path: "browser.png" });

    await browser.close();
})();

  • When we use puppeteer.launch(), we pass the following options: ignoreHTTPSErrors: true and args: [ `--proxy-server=http://${serverUrl}:${port}` ]. These tell Puppeteer where to find our proxy server and to ignore any HTTPS errors that might interfere with our proxy connection.

  • We open a new page with browser.newPage().

  • We use our credentials with the page.authenticate() method to log in to our proxy:

    • username: username assigns our username (scrapeops) to the username of the proxy connection.
    • password: password uses our API key as the password for the connection.
  • await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 0 }); takes us to Browserscan. networkidle2 waits until the page has had no more than two network connections for at least 500 ms. timeout: 0 might look weird, but we actually use this to disable timeouts. Proxies can slow things down, and we don't want Puppeteer to throw an error while it's still waiting for a response.

  • await page.screenshot({ path: "browser.png" }); is pretty self explanatory. It takes a screenshot and saves it as browser.png.

If you run the code, it should output a screenshot similar to what you see below. As you can see in the image, our location is Brazil with an IP time zone of America/Sao Paulo. There are a number of other details we can view as well, but most importantly, this is proof that our proxy is working correctly.

Testing the Connection
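
If you'd rather verify the proxy without opening a screenshot, here's a minimal variation of the script above. Instead of Browserscan, it visits https://httpbin.org/ip (a public endpoint that echoes back the IP address it sees) and prints the response straight to your terminal:

const puppeteer = require("puppeteer");

const username = "scrapeops";
const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();
    await page.authenticate({ username, password });

    // httpbin echoes back the IP address your request came from
    await page.goto("https://httpbin.org/ip", { waitUntil: "networkidle2", timeout: 0 });
    const body = await page.evaluate(() => document.body.innerText);
    console.log(body);

    await browser.close();
})();

Run it twice and you should see two different IP addresses, since we rotate the proxy on every request.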


Configuring Puppeteer and ScrapeOps for Proxy Integration

When you want to do certain things with your proxy connection, configuration is key. If you want to appear in a certain location, you need to configure it. If you want to use a sticky session, you'll need to configure that as well.

To configure these things, we add different flags to our username.
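
Because every option is just a flag appended to the username, it can be handy to build the string in one place. The helper below is purely illustrative (buildUsername is our own name, not part of any ScrapeOps API), and it assumes flags are chained onto the username with dots; check our documentation before combining multiple flags in a single session:

// Hypothetical helper: builds a ScrapeOps proxy username from options
function buildUsername({ country, stickySession } = {}) {
    let username = "scrapeops";
    if (country) {
        username += `.country=${country}`;
    }
    if (stickySession !== undefined) {
        username += `.sticky_session=${stickySession}`;
    }
    return username;
}

console.log(buildUsername());                     // scrapeops
console.log(buildUsername({ country: "us" }));    // scrapeops.country=us
console.log(buildUsername({ stickySession: 7 })); // scrapeops.sticky_session=7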

Geotargeting

To use geotargeting, you need to append your username with the country flag. If you want to appear in the US, your username will be scrapeops.country=us.

We support a large number of country codes. If you're curious about whether or not we support a certain country, try it! If your country of interest isn't supported, the countries on this list always are.

In the code below, the only things we change are our username and the name of our screenshot. Our full username is now scrapeops.country=us and our screenshot is now named geotarget.png.

const puppeteer = require("puppeteer");

const username = "scrapeops.country=us";
const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();

    await page.authenticate({
        username: username,
        password: password
    });

    await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 0 });
    await page.screenshot({ path: "geotarget.png" });

    await browser.close();
})();

If you look at the screenshot below, our location is now showing up in the United States. Geotargeting is working correctly.

Geotargeting Screenshot

To use geotargeting, add the country flag and a country code to your username. scrapeops.country=us will tell ScrapeOps that we want to show up in the US. Change the country code to any country you desire.
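
If you want to compare several locations in one run, you can loop over a list of country codes and take a screenshot for each. This sketch simply reuses the script above; the country codes are examples, so swap in whichever ones you need:

const puppeteer = require("puppeteer");

const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

// Example country codes; replace with the countries you need
const countries = ["us", "gb", "de"];

(async () => {
    for (const country of countries) {
        // A fresh browser per country keeps each run fully isolated
        const browser = await puppeteer.launch({
            ignoreHTTPSErrors: true,
            args: [`--proxy-server=http://${serverUrl}:${port}`]
        });

        const page = await browser.newPage();
        await page.authenticate({
            username: `scrapeops.country=${country}`,
            password: password
        });

        await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 0 });
        await page.screenshot({ path: `geotarget-${country}.png` });

        await browser.close();
    }
})();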

Using Static Proxies

Static proxies (sticky sessions) follow the same principle as geotargeting. All we need to do is add a flag to our username. This time, we'll use the sticky_session flag. You need to give your session a number. It can be anything between 0 and 10000.

In the code below, we use sticky_session=7. Our full username is scrapeops.sticky_session=7.

const puppeteer = require("puppeteer");

const username = "scrapeops.sticky_session=7";
const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();

    await page.authenticate({
        username: username,
        password: password
    });

    await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 0 });
    await page.screenshot({ path: "sticky1.png" });

    await page.reload({ waitUntil: "networkidle2", timeout: 0 });
    await page.screenshot({ path: "sticky2.png" });

    await browser.close();
})();

In our first resulting screenshot, we show up in Bahrain with an IP address of 46.184.249.253. Remember these details so we can compare them with our second screenshot. If you run this on your own machine, you'll get a different IP address.

First Screenshot

As you can see, our IP is still 46.184.249.253, and we're still showing up in Bahrain. Our proxy is being reused across multiple requests. Our sticky session is working.

Second Screenshot

To use sticky sessions, add sticky_session to your username and give it any number between 0 and 10000. scrapeops.sticky_session=7 tells ScrapeOps that we're creating a sticky session with session number 7.
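
Sticky sessions really shine when you run several of them side by side. The sketch below is one way to do that (session numbers 1 and 2 are arbitrary choices): it opens one browser per session, and each browser should hold its own stable IP address for the life of the session:

const puppeteer = require("puppeteer");

const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

async function checkSession(sessionNumber) {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();
    await page.authenticate({
        username: `scrapeops.sticky_session=${sessionNumber}`,
        password: password
    });

    // httpbin echoes back the IP address the request came from
    await page.goto("https://httpbin.org/ip", { waitUntil: "networkidle2", timeout: 0 });
    const body = await page.evaluate(() => document.body.innerText);
    console.log(`Session ${sessionNumber}:`, body);

    await browser.close();
}

(async () => {
    // Each session number should pin its own IP address
    await Promise.all([checkSession(1), checkSession(2)]);
})();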


Common Issues and Troubleshooting

Tech wouldn't be tech without errors and basic troubleshooting. Luckily, with this basic setup, there aren't a whole lot of errors we can run into.

Take a look at the sections below and you should be able to handle most problems that might come your way.

No Proxy Connection

If your proxy connection isn't working at all, you need to double-check your code and make sure you're using page.authenticate(). If you are using page.authenticate(), make sure that your username and password are correct.

Your username and password should go as follows:

  • username: scrapeops
  • password: This should be your ScrapeOps API key.

If all of these are correct, head back to the account dashboard and make sure that you have bandwidth available on your account.
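
To see exactly what's failing, you can wrap the navigation in a try/catch and log the error. This is a small diagnostic sketch; we set a finite timeout here so a dead connection fails fast instead of hanging, and the exact error message you'll see depends on your Chromium version (proxy failures usually surface as net::ERR_... messages):

const puppeteer = require("puppeteer");

const username = "scrapeops";
const password = "your-super-secret-api-key";
const serverUrl = "residential-proxy.scrapeops.io";
const port = 8181;

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [`--proxy-server=http://${serverUrl}:${port}`]
    });

    const page = await browser.newPage();
    await page.authenticate({ username, password });

    try {
        // Finite timeout: fail fast instead of waiting forever
        await page.goto("https://browserscan.net", { waitUntil: "networkidle2", timeout: 60000 });
        console.log("Proxy connection OK");
    } catch (error) {
        // net::ERR_... messages usually point at the proxy itself,
        // while a plain timeout points at a slow page or exhausted bandwidth
        console.error("Navigation failed:", error.message);
    }

    await browser.close();
})();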

Geotargeting Issues

If you're running into issues with geotargeting, you need to make sure that your username includes the country flag: scrapeops.country={your_country_code}.

If all of this is correct and you're still not getting your desired location results, consult the list here and choose a country listed in our documentation. Countries on this list are always supported.

Static Proxy Issues

Similar to geotargeting, you should first double-check your username. It should be laid out as follows: scrapeops.sticky_session={your_session_number}. If your username is correct, you might be running into expiration issues. At ScrapeOps, all of our proxies get rotated eventually.

We rotate static proxies every 10 minutes. This might seem like a bit of an inconvenience, but we do this to maintain the health and safety of our proxy pools.

If your sticky session keeps expiring mid-run, it might be inconvenient, but the fix is to make your scraper faster: it should be able to perform all of your required actions within that 10-minute window.
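
If you can't shorten the work, another option is to track how long your current session has been alive and move to a fresh session number before the 10-minute rotation catches you. This is just a sketch of the idea; the 9-minute threshold and the way we pick the next session number are our own choices, not ScrapeOps requirements:

const SESSION_LIFETIME_MS = 9 * 60 * 1000; // renew just before the 10-minute rotation

let sessionNumber = Math.floor(Math.random() * 10000);
let sessionStartedAt = Date.now();

// Returns a username, rotating to a fresh session number when the old one is near expiry
function currentUsername() {
    if (Date.now() - sessionStartedAt > SESSION_LIFETIME_MS) {
        sessionNumber = (sessionNumber + 1) % 10000;
        sessionStartedAt = Date.now();
    }
    return `scrapeops.sticky_session=${sessionNumber}`;
}

console.log(currentUsername()); // e.g. scrapeops.sticky_session=4821

Remember to call page.authenticate() again with the new username whenever it changes, since credentials are set per page.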


Conclusion

Congratulations! You now have one of the most important web scraping skills around. You know how to control a headless browser, and you know how to integrate it with a residential proxy.

With Puppeteer and ScrapeOps Residential Proxies, you can access pretty much any website in the world without running into issues. Puppeteer allows you to handle JavaScript and dynamic content. ScrapeOps gets you past anti-bots and geoblocking.

To learn more about the tools we used in this article, take a look at their documentation at the links below.


More Puppeteer Web Scraping Guides

We love web scraping and Puppeteer. We like web scraping so much, we wrote the Puppeteer Playbook on scraping the web. This playbook contains all sorts of useful information for both new and experienced developers.

Whether you simply want to learn more about using Puppeteer, or if you want to scrape an extremely difficult site, we've got you covered.

Check out the links below if you're interested in learning more.