The NodeJS Playwright Guide

Playwright is a Node.js automation library built by Microsoft. It offers a high-level, user-friendly API for automating tasks and interacting with dynamic web pages.

Playwright communicates directly with the browser (Chromium, WebKit, and Firefox are supported), providing a smooth experience for tasks such as DOM interaction and navigation. It gives you programmatic control over these browsers, including in headless mode, which typically results in faster execution.

Headless browsers are those without a graphical user interface (GUI). They operate in the background and render pages without a visible display, thereby making them faster and more efficient.

In this tutorial, we'll take you through:

How To Install NodeJS Playwright
How To Use Playwright
How To Scrape Pages With Playwright
How To Wait For The Page To Load
How To Click On Buttons With Playwright
How To Scroll The Page With Playwright
How To Take Screenshots With Playwright
How to Use A Proxy With Playwright
More Playwright Functionality
More Web Scraping Tutorials

How To Install NodeJS Playwright

Before you install Playwright, make sure Node.js is installed on your system. To install Node.js, go to the Node.js website and install the most recent version. Now let's install and configure Playwright.

Open the terminal and create a new folder for your project with any name (in our case, playwright_guide).

mkdir playwright_guide

Now use the cd command to move into the directory you just created.

cd playwright_guide

Great, you're now in the right directory. Run the following command to initialize the package.json file:

npm init -y

Next, install the latest version of Playwright using the following command:

npm install playwright@latest

You also need to install a browser binary for Playwright to drive. Here's how to install Chromium:

npx playwright install chromium

This is how the installation process looks.

NodeJs Playwright Guide: Playwright Installation

Next, open the package.json file and add "type": "module". This tells Node.js to treat .js files as ES modules, which is what allows the import syntax (and the top-level await) used in the examples in this guide.

NodeJs Playwright Guide: Package JSON File

How To Use Playwright

We'll use the Quotes to Scrape website (quotes.toscrape.com) to understand Playwright. This website is designed for practicing web scraping and is easy to use and navigate.

Before we jump into the code, create a JavaScript file (index.js) in the directory we created above, add the following code, and run it with node index.js:

// index.js

// Import chromium from Playwright module
import { chromium } from "playwright";

// Define a function to scrape quotes from a website
const scrapeData = async () => {

    // Launch a new chromium browser instance
    const browser = await chromium.launch({
        headless: false // Set to true to run in headless mode
    });

    // Open a new page in the browser
    const page = await browser.newPage();

    // Navigate to the URL of the website you want to scrape
    await page.goto("http://quotes.toscrape.com/");

    // Take a screenshot of the webpage
    await page.screenshot({ path: 'screenshot.png' });

    // Close the browser instance
    await browser.close();
};

// Call the scrapeData function to initiate the scraping process
scrapeData();

Here’s the code result:

Quotes to Scrape Page Screenshot

This code takes a screenshot of a web page. The scrapeData() function launches a new chromium browser instance with chromium.launch() and sets the headless mode to false so that you can see the web pages in your browser.

Next, the function creates a new page in the browser using browser.newPage(). It then passes the webpage URL to the page.goto() function to navigate there. The function then captures a screenshot of the page using the page.screenshot() function.

Finally, the function closes the browser instance by calling the browser.close() method.
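
If any step between launch and close throws (a failed navigation, for example), the browser process can be left running. One way to guard against that is a small try/finally wrapper. The sketch below is only an illustration: withBrowser is a hypothetical helper name, not part of Playwright, and it assumes you pass in a browser type such as chromium.

```javascript
// Hypothetical helper (not part of Playwright): run a task with a browser
// and make sure the browser is closed even if the task throws.
async function withBrowser(browserType, task, launchOptions = {}) {
    const browser = await browserType.launch(launchOptions);
    try {
        // Hand the browser to the caller's task and return its result
        return await task(browser);
    } finally {
        // Runs on success and on failure
        await browser.close();
    }
}

// Example usage (assumes: import { chromium } from "playwright"):
// const title = await withBrowser(chromium, async (browser) => {
//     const page = await browser.newPage();
//     await page.goto("http://quotes.toscrape.com/");
//     return page.title();
// });
```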

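Remember, if you want to use Playwright with a different browser, say Firefox, you need to install that browser's binary first (for example, with npx playwright install firefox) and import firefox instead of chromium. You can learn more in the official Playwright documentation.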
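
To make the browser choice configurable, a small lookup can map a name to the matching browser type. This is a minimal sketch: pickBrowserType is a hypothetical helper, and it assumes the chromium, firefox, and webkit objects exported by the playwright package are passed in, with the chosen browser's binary already installed.

```javascript
// Hypothetical helper: choose a browser type by name.
// `browserTypes` is assumed to be an object such as { chromium, firefox, webkit }
// built from the named exports of the "playwright" package.
function pickBrowserType(browserTypes, name = "chromium") {
    const isKnown = Object.prototype.hasOwnProperty.call(browserTypes, name);
    if (!isKnown) {
        const supported = Object.keys(browserTypes).join(", ");
        throw new Error(`Unsupported browser "${name}". Choose one of: ${supported}`);
    }
    return browserTypes[name];
}

// Example usage (assumes: import { chromium, firefox, webkit } from "playwright"):
// const browserType = pickBrowserType({ chromium, firefox, webkit }, "firefox");
// const browser = await browserType.launch();
```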

How To Scrape Pages With Playwright

Playwright is commonly used for web scraping. Let's scrape the first quote from the website. As shown in the image below, there is a parent element with the class quote, which contains child elements with the classes text (class="text"), author (class="author"), and tags (class="tags").

Quotes to Scrape DOM Tree

Let's walk through the code. Inside page.evaluate(), which runs in the browser context, document.querySelector() selects the first element on the page that matches the .quote class. We then extract the text, author, and tags by passing .text, .author, and .tags to querySelector() on that element.

// Import chromium from Playwright module
import { chromium } from "playwright";

// Define a function to handle web scraping
const scrapeData = async () => {
    // Launch a new browser instance
    const browser = await chromium.launch({
        headless: false // Set to true to run in headless mode
    });

    // Create a new page in the browser
    const page = await browser.newPage();

    // Navigate to the target URL
    await page.goto("http://quotes.toscrape.com/");

    // Extract data from the web page
    const quotes = await page.evaluate(() => {
        // Use querySelector to select an element on the web page based on its CSS selector
        const quote = document.querySelector(".quote");

        // Extract the text of the quote
        const text = quote.querySelector(".text").innerText;

        // Extract the author of the quote
        const author = quote.querySelector(".author").innerText;

        // Extract the tags associated with the quote
        const tags = quote.querySelector(".tags").innerText;

        return { text, author, tags };
    });

    // Print the scraped data (quote, author, tags)
    console.log(quotes);

    // Close the browser instance
    await browser.close();
};

// Call the scrapeData function to initiate the scraping process
scrapeData();

Here’s the code result:

{
  text: '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
  author: 'Albert Einstein',
  tags: 'Tags: change deep-thoughts thinking world'
}
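
The same idea extends from one quote to every quote on the page. The sketch below uses page.$$eval(), which runs a callback in the browser against all elements matching a selector. It is an illustration under a few assumptions: scrapeAllQuotes is a hypothetical helper name, page is a Playwright page already on quotes.toscrape.com, and each individual tag is assumed to be an element with the class tag inside the .tags container.

```javascript
// Sketch: collect every quote on the current page.
// `page` is assumed to be a Playwright Page that has already navigated
// to http://quotes.toscrape.com/.
async function scrapeAllQuotes(page) {
    // $$eval passes all elements matching ".quote" to the callback,
    // which executes inside the browser
    return page.$$eval(".quote", (quoteElements) =>
        quoteElements.map((quote) => ({
            text: quote.querySelector(".text").innerText,
            author: quote.querySelector(".author").innerText,
            // Assumption: each tag is an element with class "tag" inside ".tags"
            tags: Array.from(quote.querySelectorAll(".tags .tag"), (tag) => tag.innerText),
        }))
    );
}

// Example usage inside scrapeData(), after page.goto():
// const allQuotes = await scrapeAllQuotes(page);
// console.log(allQuotes.length, allQuotes[0]);
```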

How To Wait For The Page To Load

When using headless browsers, a common requirement is to make sure that the page is fully loaded and ready for interaction. Here are some common methods:

Wait for a specific amount of time.
Wait for a page element to appear.

Wait Specific Amount of Time

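To wait for a specific amount of time before carrying out the next steps in the script, use the page.waitForTimeout() method and pass the time in milliseconds. Fixed waits are simple, but they can make scripts slow or flaky, so the Playwright documentation recommends them mainly for debugging.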

import { chromium } from "playwright";

const scrapeData = async () => {
    const browser = await chromium.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto("http://quotes.toscrape.com/");

    // Wait for 5 seconds
    await page.waitForTimeout(5000);

    await browser.close();
};

scrapeData();

Wait For Page Element To Appear

The other approach is to wait for a page element to appear on the page before moving on. You can do this using the page.waitForSelector() method.

import { chromium } from "playwright";

const scrapeData = async () => {
    const browser = await chromium.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto("http://quotes.toscrape.com/");

    // Wait for an element with the class "quote" to be visible on the page
    await page.waitForSelector('.quote', { state: 'visible' });

    await browser.close();
};

scrapeData();
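
When an element might never appear, it can be useful to get a yes/no answer instead of an exception. The sketch below wraps locator.waitFor() for that purpose. It is an illustration, not the only way to do this: waitForElement is a hypothetical helper name, the 10-second default is arbitrary, and it assumes the timeout error carries the name TimeoutError, as in current Playwright releases.

```javascript
// Sketch: wait for a selector, but report a timeout as `false` instead of throwing.
// `page` is assumed to be a Playwright Page.
async function waitForElement(page, selector, timeoutMs = 10000) {
    try {
        // locator.waitFor() resolves once the element reaches the requested state
        await page.locator(selector).waitFor({ state: "visible", timeout: timeoutMs });
        return true;
    } catch (error) {
        // Assumption: Playwright's timeout error is named "TimeoutError"
        if (error.name === "TimeoutError") {
            return false;
        }
        throw error; // Anything else is unexpected, so surface it
    }
}

// Example usage inside scrapeData(), after page.goto():
// if (!(await waitForElement(page, ".quote", 5000))) {
//     console.log("No quotes appeared within 5 seconds");
// }
```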

How To Click On Buttons With Playwright

Clicking a button with Playwright is quite simple. You only need to locate the button using a selector and then tell Playwright to click on it with the click() method. On the http://quotes.toscrape.com/ website, the Next button matches the CSS selector .next > a.

NodeJs Playwright Guide: Quotes to Scrape Locate Button

Here, we're using waitForSelector() so that the button is present before we pass its selector to the click() method. We're not calling close() here, so that you can see the result of the click: the browser navigates to the next page of quotes.

import { chromium } from "playwright";

const scrapeData = async () => {
    const browser = await chromium.launch({ headless: false });

    // viewport: null makes the page follow the browser window size
    const page = await browser.newPage({ viewport: null });
    await page.goto("http://quotes.toscrape.com/");

    // Click on the Next button
    const button_query = ".next > a";
    await page.waitForSelector(button_query);
    await page.click(button_query);

    // The browser is intentionally left open so you can see the result
};

scrapeData();

Here’s the code result:

Click to Next Page with Playwright
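
Clicking once generalizes to paging through a whole listing: keep clicking Next until the link is no longer there. The following is a minimal sketch under stated assumptions: clickThroughPages is a hypothetical helper name, page is a Playwright page already on the first listing page, and the maxPages cap of 20 is an arbitrary safety limit.

```javascript
// Sketch: follow the "Next" link until it disappears or a page limit is hit.
// `page` is assumed to be a Playwright Page on the first listing page.
async function clickThroughPages(page, maxPages = 20) {
    const nextLink = page.locator(".next > a");
    let visited = 1; // The page we start on counts as the first one

    while (visited < maxPages && (await nextLink.count()) > 0) {
        await nextLink.click();
        // Wait until the new page's DOM has been parsed before continuing
        await page.waitForLoadState("domcontentloaded");
        visited += 1;
        // Scrape the current page here if needed
    }
    return visited;
}

// Example usage inside scrapeData(), after page.goto():
// const pagesVisited = await clickThroughPages(page);
// console.log(`Visited ${pagesVisited} pages`);
```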

How To Scroll The Page With Playwright

Many websites use infinite scrolling to load more results onto the page as you scroll down. You can scroll to the bottom of the page using the code below, repeating until no new content loads, and then scrape all the data you need.

import { chromium } from "playwright";

const scrapeData = async () => {
    const browser = await chromium.launch({ headless: false });

    const page = await browser.newPage();
    await page.goto("http://quotes.toscrape.com/scroll");

    async function scrollToBottom() {
        let previousHeight;
        while (true) {
            // Get the current scroll height
            const { scrollHeight } = await page.evaluate(() => ({
                scrollHeight: document.documentElement.scrollHeight,
            }));

            // Scroll to the bottom
            await page.evaluate(() => {
                window.scrollTo(0, document.documentElement.scrollHeight);
            });
            // Wait for a short time to allow the page to load other content
            await page.waitForTimeout(1000);

            // Check if the scroll height has not increased
            if (previousHeight === scrollHeight) {
                break;
            }

            previousHeight = scrollHeight;
        }
    }

    await scrollToBottom();

    // Uncomment to close the browser once scrolling has finished
    // await browser.close();
};

scrapeData();
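
An alternative stopping condition is to watch the number of loaded items rather than the page height, and to cap the number of scroll rounds so the loop cannot run forever. The sketch below illustrates that variation. The helper name scrollUntilStable and its maxRounds and pauseMs defaults are assumptions made for this example, and page is assumed to be a Playwright page on an infinite-scroll listing.

```javascript
// Sketch: scroll to the bottom repeatedly until the item count stops growing.
// `page` is assumed to be a Playwright Page on an infinite-scroll listing.
async function scrollUntilStable(page, itemSelector, { maxRounds = 30, pauseMs = 1000 } = {}) {
    const items = page.locator(itemSelector);
    let previousCount = await items.count();

    for (let round = 0; round < maxRounds; round += 1) {
        // Jump to the current bottom of the document
        await page.evaluate(() => {
            window.scrollTo(0, document.documentElement.scrollHeight);
        });
        // Give the page time to fetch and render more items
        await page.waitForTimeout(pauseMs);

        const currentCount = await items.count();
        if (currentCount === previousCount) {
            break; // Nothing new appeared, so assume we reached the end
        }
        previousCount = currentCount;
    }
    return previousCount;
}

// Example usage inside scrapeData(), after page.goto():
// const total = await scrollUntilStable(page, ".quote");
// console.log(`Loaded ${total} quotes`);
```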

How To Take Screenshots With Playwright

Another common use case is taking screenshots, which Playwright makes very easy. To capture a screenshot with Playwright, you only need to use page.screenshot() and specify the path to save the file.

To capture the entire webpage, use the fullPage option. When this option is set to true, it takes a screenshot of the entire scrollable page.

import { chromium } from "playwright";

const browser = await chromium.launch({ headless: false });

const page = await browser.newPage();
await page.goto('http://quotes.toscrape.com/');

// full-page screenshot 
await page.screenshot({ path: 'scrapeops.jpeg', fullPage: true });

await browser.close()

You can also control how much of the page is captured by setting the viewport size before taking the screenshot:

import { chromium } from "playwright";

const browser = await chromium.launch({ headless: false });

const page = await browser.newPage();
await page.goto('http://quotes.toscrape.com/');

// specified viewport
await page.setViewportSize({ width: 800, height: 600 });
await page.screenshot({ path: 'scrapeops.png' });

await browser.close()

Playwright can save screenshots as either jpeg or png. One thing to keep in mind is that taking screenshots in jpeg format is generally faster than in png format.

Note: If no path is specified, the image is not saved to disk. The page.screenshot() call still returns the image data as a Buffer.
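
Besides the full page and the viewport, a single element can be captured through its locator. The sketch below shows one way to do that, assuming page is a Playwright page that has already loaded the site. The helper name screenshotElement and the file name in the usage comment are made up for this example.

```javascript
// Sketch: screenshot only the first element that matches a selector.
// `page` is assumed to be a Playwright Page; `path` is optional.
async function screenshotElement(page, selector, path) {
    const element = page.locator(selector).first();
    // With a path the image is also written to disk;
    // in both cases the image data is returned as a Buffer
    const options = path ? { path } : {};
    return element.screenshot(options);
}

// Example usage after page.goto():
// await screenshotElement(page, ".quote", "first-quote.png");
```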

How to Use A Proxy With Playwright

If you're scraping web pages, you'll likely want to route your traffic through a proxy. With Playwright, you can set the proxy when you launch the browser.

If the proxy requires authentication, pass the username and password along with the server address, like this:

import { chromium } from "playwright";

async function scrapeInfo() {
    // Replace with your residential proxy aggregator details
    const PROXY_SERVER = "http://residential-proxy.scrapeops.io:8181";
    const PROXY_USERNAME = "scrapeops";
    const PROXY_PASSWORD = "YOUR-SCRAPEOPS-RESIDENTIAL-PROXY-API-KEY";

    // Launch browser with proxy settings
    const browser = await chromium.launch({
        headless: false,
        proxy: {
            server: PROXY_SERVER, // Proxy server URL
            username: PROXY_USERNAME, // Proxy authentication username
            password: PROXY_PASSWORD  // Proxy authentication password
        },
    });

    // Create a browser context that ignores HTTPS errors (if needed)
    const context = await browser.newContext({ ignoreHTTPSErrors: true });

    // Open a new page in the current browser context
    const page = await context.newPage();

    // Navigate to the target page
    await page.goto("https://quotes.toscrape.com", { waitUntil: "domcontentloaded" });

    // Log page title
    console.log("Page Title:", await page.title());

    // Wait for 5 seconds to observe the result
    await page.waitForTimeout(5000);

    // Close the browser when done
    await browser.close();
}

// Call the scrapeInfo function
scrapeInfo();
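
Hard-coding credentials in the script makes them easy to leak, so a common alternative is to read them from environment variables. The sketch below builds the proxy launch option that way. It is only an illustration: buildProxyConfig is a hypothetical helper, and the variable names PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD are made up for this example.

```javascript
// Hypothetical helper: build the `proxy` launch option from environment
// variables. The variable names are assumptions made for this sketch.
function buildProxyConfig(env = process.env) {
    const server = env.PROXY_SERVER;
    if (!server) {
        return undefined; // No proxy configured: launch without one
    }
    const proxy = { server };
    // Only attach credentials when both parts are present
    if (env.PROXY_USERNAME && env.PROXY_PASSWORD) {
        proxy.username = env.PROXY_USERNAME;
        proxy.password = env.PROXY_PASSWORD;
    }
    return proxy;
}

// Example usage (assumes chromium is imported from "playwright"):
// const browser = await chromium.launch({ proxy: buildProxyConfig() });
```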

More Playwright Functionality

Playwright has a huge amount of functionality and is highly customizable, but it is difficult to cover everything properly in a single guide.

So if you would like to learn more about Playwright, check out the official documentation. It covers everything from emulating dark mode:

await page.emulateMedia({ colorScheme: 'dark' });

To running the browser in headless mode:

const browser = await chromium.launch({
    headless: true
});
