What is Playwright Extra - A Web Scrapers Guide

If you're serious about web scraping, privacy, or automation, Playwright Extra is well worth learning. Through its plugins, it can help you avoid bot detection, anonymize identifying details such as your User-Agent, and handle CAPTCHAs automatically.

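In this guide, we cover:

  • What Is Playwright Extra?
  • Integrating playwright-extra
  • Best Playwright Extra Plugins for Web Scraping
  • Advanced Playwright Extra Integrations
  • More Web Scraping Tutorials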


What Is Playwright Extra?

Playwright Extra is an open-source framework that extends the functionality of Playwright by offering a rich ecosystem of plugins and custom features that empower developers to overcome various challenges encountered during web scraping and automation tasks.

This package is inspired by Puppeteer Extra and supports most of the plugins made for the latter out of the box.

These plugins provide solutions for common issues like bypassing anti-scraping measures, handling captchas, and interacting with websites in a more human-like manner.

Playwright Extra, as an advanced web scraping framework, offers several advantages and disadvantages, which are important to consider when evaluating its suitability for specific projects.

Advantages of Playwright Extra

  1. Enhanced Functionality: Playwright Extra extends Playwright beyond its standard features. It provides a rich collection of plugins for tasks such as bypassing anti-scraping measures, handling captchas, and simulating human interactions, enabling more comprehensive and sophisticated scraping processes.

  2. Flexibility: The framework's modular design allows users to customize their scraping workflows by integrating specific plugins according to their requirements, making it adaptable to a wide range of scraping scenarios and target websites.

  3. Community Support: Playwright Extra benefits from an active community of developers who contribute plugins, share insights, and provide assistance, creating a robust support network for users encountering challenges or seeking guidance. Because plugins are interoperable between Playwright Extra and Puppeteer Extra, Playwright developers can also use plugins developed by the Puppeteer community.

  4. Integration with Playwright: Playwright Extra is designed to work seamlessly with Playwright, which itself is a powerful browser automation library. Users can take advantage of Playwright's features alongside the extended functionality provided by Playwright Extra.

Disadvantages of Playwright Extra

  1. Learning Curve: The additional features and plugins that Playwright Extra introduces may result in a steeper learning curve, especially for users who are new to Playwright or browser automation.

  2. Maintenance and Updates: The stability and reliability of plugins can vary, as they are developed and maintained by the community. Some plugins might be actively maintained, while others may not receive regular updates. This could potentially lead to compatibility issues or unexpected behavior.

  3. Overhead and Complexity: The inclusion of multiple plugins and custom features may introduce additional overhead and complexity to your codebase. Depending on the requirements of your project, a simpler solution without additional plugins might be more suitable.

  4. Compatibility Risks: While Playwright Extra supports most Puppeteer Extra plugins, it's important to note that Playwright and Puppeteer are separate projects. There may be cases where plugin updates for one library do not seamlessly translate to the other, leading to compatibility risks.


Integrating playwright-extra

In order to use Playwright Extra, you first need to install it. You can use npm (Node Package Manager) to install Playwright Extra.

Run the following command on your preferred terminal or command prompt:

npm install playwright playwright-extra

This command will install both Playwright and Playwright Extra, allowing you to leverage the extended functionalities provided by Playwright Extra.

We also need to install playwright-chromium, which can be done with the following command:

npm install playwright-chromium

playwright-chromium enables us to use Playwright with the Chromium web browser.

Before we jump into exploring different Playwright Extra plugins, let's do a simple demonstration of how to use Playwright to automate browser tasks, such as navigation and taking screenshots.

// Import the Chromium browser type from the Playwright library
const { chromium } = require("playwright");

// Asynchronously launch a Chromium browser in non-headless mode
(async () => {
  const browser = await chromium.launch({
    headless: false,
  });

  // Create a new Playwright page
  const page = await browser.newPage();

  // Navigate the Playwright page to the website
  await page.goto('https://quotes.toscrape.com/');

  // Take a screenshot of the current page and save it
  await page.screenshot({
    path: 'screenshot.png',
  });

  // Close the Playwright browser
  await browser.close();
})();
  • The script above navigates to Quotes to Scrape (https://quotes.toscrape.com/) and captures a screenshot of the current page using the page.screenshot method.

  • This sample uses plain Playwright, without Playwright Extra.

Now, let's integrate Playwright Extra with the Stealth plugin from puppeteer-extra to avoid bot detection on websites. Remember that most plugins designed for puppeteer-extra also work with playwright-extra.

If you need specific plugins for Playwright Extra, you can install them separately using npm. Since we’re using one of the plugins of puppeteer-extra (i.e. puppeteer-extra-plugin-stealth), install it as well.

npm install puppeteer-extra-plugin-stealth

Here's a script with Playwright Extra functionality similar to the above.

// Import Playwright Extra's Chromium launcher and the Stealth plugin
const { chromium } = require('playwright-extra');
const stealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the Stealth plugin before launching the browser
chromium.use(stealthPlugin());

(async () => {
  const browser = await chromium.launch({
    headless: false,
    args: ['--no-sandbox'],
  });

  // Create a new Playwright page
  const page = await browser.newPage();

  // Navigate to the website
  await page.goto('https://quotes.toscrape.com/');

  // Take a screenshot of the current page and save it
  await page.screenshot({
    path: 'screenshot.png'
  });

  // Close the Playwright browser
  await browser.close();
})();

We did the same thing but used the Puppeteer Extra Stealth Plugin as well.

The integration of the Puppeteer Extra Stealth Plugin in this code demonstrates how to enhance web scraping capabilities by implementing stealth measures to avoid detection and blocking by websites.

To refresh your Playwright fundamentals, check out our NodeJS Playwright Guide.


Best Playwright Extra Plugins for Web Scraping

Playwright Extra was developed to allow Playwright developers to take advantage of Puppeteer Extra plugins.

With that in mind, below are a few Puppeteer Extra plugins that are commonly used for web scraping, have already been successfully tested, and are compatible with playwright-extra.

Choosing the best plugin will depend on your specific requirements and the website you wish to target.

puppeteer-extra-plugin-stealth

puppeteer-extra-plugin-stealth is a plugin for Puppeteer Extra to prevent detection by anti-bots and other systems designed to detect web scrapers.

This plugin applies various techniques to make the detection of Playwright harder. The use of Playwright can easily be detected by a target website, and the goal of this plugin is to avoid detection; otherwise, your requests will be flagged as a bot.

The puppeteer-extra-plugin-stealth is particularly beneficial when dealing with websites that actively employ anti-scraping measures or those that are sensitive to automated data extraction, enabling smoother and more efficient scraping operations while minimizing the risk of detection and interference.
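
To make the idea of "detection" more concrete, here is a minimal sketch of the kind of checks a website might run against a browser's navigator object. It is purely illustrative: the function name automationSignals, the signal labels, and the two sample objects are assumptions made for this example, not the logic of any real anti-bot product or of the Stealth plugin itself.

```javascript
// Illustrative only: a toy version of checks a site might run.
// automationSignals and its signal labels are hypothetical names.
function automationSignals(nav) {
  const signals = [];
  // Automated browsers typically report navigator.webdriver === true.
  if (nav.webdriver === true) signals.push('webdriver-flag');
  // Headless Chromium has historically advertised itself in the User-Agent.
  if (/HeadlessChrome/.test(nav.userAgent || '')) signals.push('headless-ua');
  // An empty language list is unusual for a real user's browser.
  if (!Array.isArray(nav.languages) || nav.languages.length === 0) {
    signals.push('no-languages');
  }
  return signals;
}

// Plain objects standing in for window.navigator in two environments.
const headlessLike = {
  webdriver: true,
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/109.0.0.0 Safari/537.36',
  languages: [],
};
const regularLike = {
  webdriver: false,
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
  languages: ['en-US', 'en'],
};

console.log(automationSignals(headlessLike)); // all three signals fire
console.log(automationSignals(regularLike));  // no signals fire
```

A stealth-style plugin tries to make an automated browser look like the second object rather than the first, although real detection systems generally combine many more signals than these.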

If this is your first plugin, install playwright-extra and the puppeteer-extra-plugin-stealth plugin using the following command:

npm install playwright-extra puppeteer-extra-plugin-stealth

In the following example, learn how to use Playwright Extra with the Stealth plugin.

// Import Playwright Extra's Chromium launcher and the Stealth plugin
const { chromium } = require('playwright-extra');
const stealthPlugin = require('puppeteer-extra-plugin-stealth');

// Enable the Stealth plugin with all evasions
chromium.use(stealthPlugin());

(async () => {
  // Launch the browser in headless mode
  const browser = await chromium.launch({
    args: ['--no-sandbox'],
    headless: true
  });
  const page = await browser.newPage();

  // Navigate to the page
  const testUrl = 'https://quotes.toscrape.com/';
  await page.goto(testUrl);

  // Save a screenshot
  const screenshotPath = 'screenshot.png';
  await page.screenshot({
    path: screenshotPath
  });

  console.log('Screenshot saved.');

  // Close the browser
  await browser.close();
})();
  • We start by importing the chromium launcher from Playwright Extra, along with the Stealth plugin.
  • Then, we register the Stealth plugin with its default options, which enables all of its evasion modules.
  • Next, we launch a headless Chromium browser with the plugin applied.
  • Finally, as in our basic Playwright script, we create a new page, navigate to the target website, and take a screenshot.

puppeteer-extra-plugin-anonymize-ua

User agents (UAs) are strings that a user's browser sends to the server. The UA contains information such as the browser type and version, as well as the operating system. Anonymizing the User-Agent string can help you avoid detection by websites.

A UA string looks like this:


"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"

The puppeteer-extra-plugin-anonymize-ua plugin anonymizes the user-agent header sent by Playwright.

You can visit useragentstring.com to see the UA for your web browsing environment.
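
As a rough sketch of what anonymizing a UA string can involve, the helper below removes the "Headless" marker and swaps the platform token for a common Windows one. The function name anonymizeUserAgent and its makeWindows option are hypothetical and written for this example; this is not the plugin's actual source, only an assumption about the general approach.

```javascript
// Hypothetical helper: a sketch of the idea, not the plugin's real code.
function anonymizeUserAgent(ua, { makeWindows = true } = {}) {
  // Drop the "Headless" marker while keeping the real browser version.
  let result = ua.replace('HeadlessChrome/', 'Chrome/');
  if (makeWindows) {
    // Replace only the first parenthesised group (the platform token).
    result = result.replace(/\(([^)]+)\)/, '(Windows NT 10.0; Win64; x64)');
  }
  return result;
}

const headlessUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/109.0.0.0 Safari/537.36';

// Prints the Windows/Chrome UA string shown earlier in this section.
console.log(anonymizeUserAgent(headlessUa));
```

Keeping the real version number means the string stays plausible as the browser updates, while the platform swap hides that the script is running on, say, a Linux server.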

To use the puppeteer-extra-plugin-anonymize-ua plugin, first, install it using the following command:

npm install puppeteer-extra-plugin-anonymize-ua

Once the plugin is registered with chromium.use(anonymizeUaPlugin()), all requests made by Playwright will have their User-Agent (UA) strings anonymized.

// Import Playwright Extra's Chromium launcher and the Anonymize UA plugin
const { chromium } = require('playwright-extra');
const anonymizeUaPlugin = require('puppeteer-extra-plugin-anonymize-ua');

// Use the Anonymize UA plugin
chromium.use(anonymizeUaPlugin());

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Navigate to the target website
  await page.goto('https://quotes.toscrape.com/');

  // Take a screenshot of the website
  await page.screenshot({
    path: 'screenshot.png'
  });

  // Close the browser
  await browser.close();
})();

This code will launch Playwright with the anonymize-ua plugin enabled. This will anonymize the User-Agent (UA) string of all requests made by Playwright.

The puppeteer-extra-plugin-anonymize-ua plugin serves as a valuable tool for maintaining anonymity, preventing detection, and ensuring the integrity of data collection and testing processes, enabling users to conduct operations discreetly and without interference from target websites.

puppeteer-extra-plugin-recaptcha

CAPTCHAs are an obstacle designed to keep scrapers and bots out. However, Playwright Extra can help you overcome this issue.

To solve CAPTCHAs with Playwright, you can use the puppeteer-extra-plugin-recaptcha plugin, which can solve reCAPTCHAs and hCaptchas automatically. We'll be using 2Captcha, an API-based CAPTCHA-solving service.

First, install the plugin using the following command:

npm install puppeteer-extra-plugin-recaptcha

In the following example, we demonstrate how to use Playwright along with the Puppeteer Extra Recaptcha Plugin to automate the login process on a webpage that includes a reCAPTCHA verification.

// Import necessary modules
const { chromium } = require('playwright-extra');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

// Configure the reCAPTCHA solving provider
chromium.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'API Key', // Replace with your own 2Captcha API key
    },
    visualFeedback: true, // Colorize reCAPTCHAs (violet = detected, green = solved)
  })
);

// Define the login function
const logIn = async () => {
  // Launch a headful browser instance
  const browser = await chromium.launch({
    headless: false,
  });

  // Create a new page
  const page = await browser.newPage();

  // Navigate to the login page
  await page.goto('https://app.scrapingbee.com/account/login');

  // Fill in the email and password fields, typing with a small delay per key
  await page.waitForSelector('#email');
  await page.type('#email', 'Your Email', {
    delay: 100
  });
  await page.waitForSelector('#password');
  await page.type('#password', 'Your Password', {
    delay: 100
  });

  // Solve reCAPTCHAs on the page
  await page.solveRecaptchas();

  // Click the login button and wait for the resulting navigation
  await Promise.all([
    page.waitForNavigation(),
    page.click('[type="submit"]'),
  ]);

  // Close the browser once the login flow has finished
  await browser.close();
};

// Call the login function
logIn();
  • The script first fills in the login credentials.
  • Then, it calls the page.solveRecaptchas() method to automatically solve any reCAPTCHA challenges present on the page.
  • Finally, it clicks the submit button and waits for the resulting navigation.
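
Rather than pasting the API key into the script, one option is to read it from an environment variable. The sketch below builds the same options object that the example passes to RecaptchaPlugin. The helper name recaptchaOptionsFromEnv and the variable name TWOCAPTCHA_TOKEN are assumptions chosen for this example, not something the plugin defines.

```javascript
// Hypothetical helper; TWOCAPTCHA_TOKEN is an assumed variable name.
function recaptchaOptionsFromEnv(env = process.env) {
  const token = (env.TWOCAPTCHA_TOKEN || '').trim();
  if (!token) {
    // Fail early instead of sending unauthenticated solve requests.
    throw new Error('Set TWOCAPTCHA_TOKEN before starting the scraper');
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true,
  };
}

// Demo with an explicit object so the snippet runs without a real key.
const demoOptions = recaptchaOptionsFromEnv({ TWOCAPTCHA_TOKEN: 'demo-key' });
console.log(demoOptions.provider.id); // 2captcha

// In a real script (assumed usage):
// chromium.use(RecaptchaPlugin(recaptchaOptionsFromEnv()));
```

This keeps the key out of version control and lets the same script run against different accounts by changing only the environment.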

Advanced Playwright Extra Integrations

You can use various other advanced integrations with Playwright Extra. We'll be discussing two of them.

Using TypeScript with Playwright Extra Plugin

Using TypeScript with Playwright Extra improves the code readability and productivity.

TypeScript is a superset of JavaScript that adds type safety and other features, making your code more robust and easier to maintain.

Leverage TypeScript features such as code completion, type checking, and enhanced readability to build reliable and maintainable automation scripts.

To enable TypeScript, follow these steps:

Step 1: Install TypeScript to add it to your project.

npm install typescript

Step 2: To initialize TypeScript, run the following command. This will create a tsconfig.json file in your project root, which contains TypeScript configurations.

npx tsc --init

Step 3: To enable TypeScript in Playwright Extra, rename your script from .js to .ts and update the imports accordingly. Replace all require() statements with import statements. For example, replace const { chromium } = require('playwright-extra') with import { chromium } from 'playwright-extra'.

Using Multiple Playwrights with Different Plugins

Using multiple Playwright instances with different plugins is a useful way to organize large-scale scraping operations.

For example, you can use one Playwright instance to scrape pages that require high stealth, and another Playwright instance to scrape pages that contain captchas.

To use multiple Playwrights with different plugins, you can use the addExtra() function from Playwright Extra to create different Playwright instances, each representing a distinct browser environment.

Then, add the required plugins to each instance by calling that instance's use() method.

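const { addExtra } = require("playwright-extra");
const { chromium: vanillaChromium } = require("playwright");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const RecaptchaPlugin = require("puppeteer-extra-plugin-recaptcha");

// Each addExtra() call wraps the vanilla launcher in a separate instance
const stealthChromium = addExtra(vanillaChromium);
stealthChromium.use(StealthPlugin());

const captchaChromium = addExtra(vanillaChromium);
captchaChromium.use(RecaptchaPlugin());

(async () => {
  // Launch one browser per instance; each one runs only its own plugins
  const stealthBrowser = await stealthChromium.launch();
  const captchaBrowser = await captchaChromium.launch();

  // Do stuff

  await stealthBrowser.close();
  await captchaBrowser.close();
})();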
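
With more than one instance available, a script also needs some way to decide which one handles a given URL. The following is a minimal sketch of one such approach, a lookup by hostname. The helper pickLauncher, the example.com hostnames, and the string labels are all hypothetical; in a real script the map values would be the instances created with addExtra() rather than strings.

```javascript
// Hypothetical routing helper; hostnames and labels are placeholders.
function pickLauncher(url, launchersByHost, fallback) {
  const { hostname } = new URL(url);
  // Use the launcher registered for this host, or the fallback otherwise.
  return launchersByHost.get(hostname) ?? fallback;
}

// Strings stand in for launcher instances so the sketch runs on its own.
const launchers = new Map([
  ['shop.example.com', 'stealthChromium'],
  ['login.example.com', 'captchaChromium'],
]);

console.log(pickLauncher('https://shop.example.com/items?page=2', launchers, 'vanillaChromium'));
console.log(pickLauncher('https://login.example.com/', launchers, 'vanillaChromium'));
console.log(pickLauncher('https://quotes.toscrape.com/', launchers, 'vanillaChromium'));
```

Keeping the routing in a single table makes it easy to see which sites get which plugins and to move a site to a different instance later.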

More Web Scraping Tutorials

In this guide, you learned about Playwright Extra, some of the best Puppeteer Extra plugins to use with it for web scraping, and two advanced integrations.

If you would like to learn more about Web Scraping with Playwright, then be sure to check out The Playwright Web Scraping Playbook.

Or check out one of our more in-depth guides: