Python Selenium vs NodeJS Playwright Compared
Python Selenium and NodeJS Playwright are both widely used automation frameworks that enable developers and testers to automate web browser interactions.
Both Selenium and Playwright are incredibly similar frameworks for scraping. They both use browsers, they both offer great support for JavaScript and dynamic content, and they are both relatively beginner friendly.
In this comprehensive comparison, we delve into the key features, performance metrics, and use cases of Python Selenium and NodeJS Playwright, providing an insightful analysis to help you make informed decisions when choosing between these two powerful automation tools.
- TLDR NodeJS Playwright vs Python Selenium
- What is Playwright?
- What is Selenium?
- Detailed Comparison of Features
- Case Study: Side by Side Comparison of Playwright and Selenium
- Additional Resources
TLDR: Summary Comparison
- NodeJS Playwright: is a browser automation library, originally designed for website testing but can be very useful for scraping in situations when a real browser is required. It is the more performant of the two.
- Python Selenium: is a browser automation library, also built originally for website testing but widely used for scraping. It is bulkier and does require more setup/maintenance, but allows the user to automate the browser without thinking about async/await.
Feature | Playwright with JavaScript | Selenium with Python |
---|---|---|
Programming Language | JavaScript (Node.js) | Python |
Async/Await Support | Requires understanding of async/await | Does not natively support async/await |
Performance | Faster | Slower |
Community & Support | Growing community with up-to-date documentation | Large community but some outdated community documentation |
Parallel Execution | Supports parallel execution out of the box | Requires additional tooling |
When to Choose | When speed and async/await are important | When speed and async programming are not a concern |
Selenium and Playwright are two scraping frameworks built for running a browser and automating actions inside it. They are also both used for creating automated tests for websites.
Playwright is used with Node.js and is excellent for browser-based scraping in JavaScript. Selenium is available in many different languages, but this article focuses primarily on Selenium in Python.
What is Playwright?
Playwright is a popular Node.js library for browser automation. It has great support for async/await programming and is very easy to set up and use.
Advantages of Playwright
Playwright offers several key advantages that make it a popular choice for web automation tasks, particularly in the context of modern web applications. Some of the notable advantages of Playwright include:
- Dynamic content handling:
- Playwright excels at handling dynamic content, such as single-page applications, where the content changes without requiring a full page reload.
- It can efficiently interact with and manipulate dynamic elements, enabling seamless automation of complex web applications that rely heavily on dynamic content for user interactions.
- Comes with a browser:
- Playwright includes its own browser instances, providing a built-in solution for browser automation tasks.
- This feature streamlines the setup process, allowing users to get started quickly without the need for additional browser installations.
- It also ensures a consistent testing environment, as the automation scripts run within the Playwright-provided browser.
- Full support for async/await programming:
- Playwright fully supports the async/await programming paradigm, allowing developers to handle asynchronous operations effectively.
- This feature simplifies the management of asynchronous tasks, making it easier to write and maintain clean, readable, and efficient automation scripts without complex callback structures.
- Screenshots:
- Playwright offers built-in capabilities for capturing screenshots of web pages during automation tasks.
- This feature is particularly useful for visual validation, debugging, and generating reports, allowing users to monitor the state of the web pages at specific points during the automation process.
- Simulating user interactions:
- Playwright provides comprehensive functionality for simulating user interactions, including clicking buttons, filling out forms, selecting dropdown options, and performing various actions on web pages.
- This capability enables the automation of user-driven tasks, such as form submissions and navigation, ensuring that the automation scripts can accurately mimic real user behavior on the web page.
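To make the last two advantages above concrete, here is a minimal sketch that fills out the login form on quotes.toscrape.com (the same demo site used later in this article) and captures a screenshot of the result. The /login path, the #username and #password field IDs, and the submit-button selector are assumptions based on that site's current markup and may change.
//a minimal sketch combining user interaction and screenshots
const playwright = require("playwright");

async function demo() {
    //launch the browser
    const browser = await playwright.chromium.launch({ headless: true });
    const page = await browser.newPage();
    //navigate to the demo login page
    await page.goto("https://quotes.toscrape.com/login");
    //simulate user interactions: fill out the form and click the submit button
    await page.fill("#username", "test-user");
    await page.fill("#password", "test-password");
    await page.click("input[type='submit']");
    //capture a screenshot of the resulting page
    await page.screenshot({ path: "after-login.png" });
    await browser.close();
}

demo();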
Disadvantages of Playwright
Playwright does come with its own set of disadvantages as well. Playwright can be difficult to learn because you need to be familiar with async JavaScript. It is also resource intensive, because running Playwright means running a full browser.
- Learning Curve:
- Playwright heavily relies on asynchronous programming techniques, which can pose a challenge for developers who are not familiar with this programming paradigm.
- Understanding and effectively managing asynchronous operations, callbacks, and promises are crucial for creating efficient and reliable automation scripts with Playwright.
- Resource intensive:
- Playwright's approach of running its own browser instances can consume significant system resources, including memory and processing power.
- This resource-intensive nature can impact the overall performance of the system, especially when running multiple instances or conducting extensive automation tasks, potentially leading to slower execution times and increased resource utilization.
- Not suitable for scraping at scale:
- While Playwright offers robust capabilities for web automation, it may not be the ideal choice for large-scale web scraping tasks that involve the extraction of a vast amount of data from multiple sources.
- The resource-intensive nature of running a browser for each instance and the associated overhead may limit the scalability and efficiency of Playwright for extensive web scraping operations that require high throughput and rapid data extraction.
When Should You Use Playwright over Selenium?
Playwright is noticeably faster than Selenium and has a much simpler setup process. Playwright code, however, tends to be more difficult to write because of the async/await learning curve.
You should choose Playwright when speed matters and when you understand async programming. The easy setup and faster performance are great reasons for more experienced developers to choose Playwright over Selenium.
Installing Playwright
Assuming you already have Node.js and a package manager installed, setting up Playwright is a breeze.
First, let's make a new folder for our first Playwright project (you can name it whatever you like, this one will be called playwright-tutorial).
From inside this new folder, run the following commands to create a new JavaScript project and to add Playwright to our new project:
npm init --yes
npm install playwright
- npm init --yes creates a new package.json for our new JavaScript project
- npm install playwright adds Playwright to our package.json and allows us to use it as a dependency
- Playwright does not require us to install a browser or browser driver. We get Chromium access right out of the box
After adding Playwright, we can make a new script, playwright_tutorial.js.
//import playwright
const playwright = require("playwright");
//create an asynchronous main function
async function main() {
    //launch the browser
    const browser = await playwright.chromium.launch({
        headless: false // setting this to true will not run the UI
    });
    //open a new page in the browser
    const page = await browser.newPage();
    //navigate to the page
    await page.goto("https://quotes.toscrape.com");
    const pageTitle = await page.title();
    //log the page title to our console
    console.log(`Title: ${pageTitle}`);
    //wait for 5 seconds so that we can view the page
    await page.waitForTimeout(5000);
    //close the browser
    await browser.close();
}
//now that we've defined our main function, run it
main();
In the code above we do the following:
- import Playwright with require("playwright")
- create and open a browser asynchronously with const browser = await playwright.chromium.launch()
- open a new page with browser.newPage()
- navigate to quotes.toscrape.com with page.goto()
- get the title of the page with page.title()
- wait 5 seconds (so we can look at the page and allow things to load) with page.waitForTimeout(5000)
Launching a browser with Playwright is very simple and straightforward. We can open a page with newPage() and navigate to a new site with goto(). From there we can easily select and interact with elements because of JavaScript's excellent browser support.
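As a quick illustration of element selection, here is a small sketch that grabs the first quote on quotes.toscrape.com with page.$() and reads its text. The .text class name is an assumption based on that site's current markup.
//a small sketch of selecting an element and reading its text
const playwright = require("playwright");

async function readFirstQuote() {
    const browser = await playwright.chromium.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto("https://quotes.toscrape.com");
    //select the first element matching the CSS selector ".text"
    const firstQuote = await page.$(".text");
    //read and print its visible text
    console.log(await firstQuote.textContent());
    await browser.close();
}

readFirstQuote();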
What is Selenium?
Selenium is another library for automating the browser. It also offers first class support for dealing with JavaScript and dynamic content.
Advantages of Selenium
Selenium is another great tool for browser automation and it allows us to click, scroll, navigate pages and take screenshots.
- JavaScript handling:
- Selenium offers robust support for JavaScript execution within the browser, enabling the automation of tasks that involve complex JavaScript interactions and functionalities.
- It can handle dynamic content, execute JavaScript functions, and respond to asynchronous events, ensuring comprehensive testing and interaction capabilities for web applications that heavily rely on JavaScript for their functionality.
- Browser simulation:
- Selenium provides a comprehensive browser simulation environment, allowing users to emulate various browser behaviors and configurations during testing and automation tasks.
- This feature enables developers and testers to replicate specific browser settings, preferences, and behaviors, ensuring a more accurate representation of the user's browsing experience and facilitating in-depth testing of web applications across different browser environments.
- High level interaction:
- Selenium facilitates high-level interaction with web elements, enabling users to perform a wide range of interactions, such as clicking buttons, filling out forms, selecting dropdown options, and navigating through web pages.
- Its intuitive interface and rich set of functionalities simplify the process of simulating user actions, ensuring efficient and effective automation of user-driven tasks within the web application.
- Taking screenshots:
- Selenium allows for the capturing of screenshots during the testing and automation process, providing a visual representation of the web pages at specific points in the testing workflow.
- This capability is valuable for visual validation, error debugging, and generating reports, enabling users to monitor the state of the web application interface and identify any visual discrepancies or issues that may arise during the testing phase.
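As a rough sketch of the JavaScript handling and screenshot features above, the snippet below runs a small script inside the page and saves a screenshot. It assumes Chrome and a matching driver are already installed (covered in the installation section below); the .quote selector and the quotes.png filename are illustrative assumptions.
#a rough sketch of executing JavaScript and taking a screenshot with Selenium
from selenium import webdriver

#open chrome
driver = webdriver.Chrome()
#navigate to the page
driver.get("https://quotes.toscrape.com")
#run JavaScript inside the page and return a value to Python
quote_count = driver.execute_script("return document.querySelectorAll('.quote').length;")
print(f"Quotes on page: {quote_count}")
#save a screenshot of the current page
driver.save_screenshot("quotes.png")
#close the browser
driver.quit()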
Disadvantages of Selenium
Similar to Playwright, Selenium has its own set of disadvantages, most of which stem from the same powerful feature set it comes with.
- Resource intensive:
- Selenium is known to be resource-intensive, consuming significant system resources, including memory and processing power, particularly when executing automation tasks that involve running multiple instances or conducting extensive test suites.
- This resource-intensive nature can impact the overall performance of the system, leading to potential slowdowns and increased resource utilization.
- Runs a browser:
- Selenium's approach involves running real browser instances for automation tasks, which can contribute to higher resource consumption and slower execution times compared to headless solutions or frameworks that operate without the need for an actual browser.
- This reliance on browsers can add overhead and increase the complexity of the automation process.
- Dependency updates:
- Managing dependency updates and compatibility issues with different browser versions and configurations can be a potential drawback of using Selenium.
- As browsers undergo updates and changes, maintaining compatibility and ensuring the seamless functioning of Selenium with the latest browser versions may require continuous monitoring and updates, which can add complexity to the testing and automation process.
- Slower Than Browserless Frameworks:
- Selenium's reliance on running actual browser instances for automation can result in slower execution times compared to frameworks that operate in a browserless or headless mode.
- This speed difference can be particularly notable when performing large-scale automation tasks or when targeting high throughput and rapid test execution, making Selenium comparatively slower in these scenarios.
When Should You Use Selenium over Playwright?
Selenium is very easy to use and does not require the use of async and await. This alone would be reason enough for many developers to choose Selenium over Playwright.
Selenium was about 15% slower than Playwright in our test and does require more setup. If runtime speed and quick setup are what you need, you should not choose Selenium.
Installing Selenium
Selenium installation requires a bit more overhead. First, you need to make sure you have a browser; the one used in this tutorial is Chrome. You can check your version of Chrome with the following command:
google-chrome --version
Once you know which version of Chrome you're using, head over to the ChromeDriver downloads page and download the driver that matches your browser.
Afterward, we are ready to install Selenium with pip:
pip install selenium
Now let's create a new folder called selenium_tutorial and place a file inside of it called selenium_tutorial.py.
Place the following code inside the file and we can do exactly what we did with Playwright earlier.
#import webdriver
from selenium import webdriver
#import sleep
from time import sleep
#open chrome
driver = webdriver.Chrome()
#navigate to the page
driver.get("https://quotes.toscrape.com")
#save the title as a variable
title = driver.title
#print the title in the terminal
print(f"Title: {title}")
#wait 5 seconds so that we can view the page
sleep(5)
#close the browser
driver.quit()
In the code above, we:
- import webdriver and sleep
- open an instance of Chrome with webdriver.Chrome()
- navigate to a new page with driver.get()
- find the title of the page with driver.title and print it
- wait five seconds with sleep() so we can view the page
Once you have everything set up, Selenium is very similar to Playwright. It launches a browser with webdriver.Chrome() and we can navigate to different sites with driver.get(). The only big differences so far are the dependencies and the language syntax (Python vs. JavaScript).
Detailed Comparison of Features
Below is a detailed comparison of Selenium with Python and NodeJS Playwright with JavaScript
Feature | Selenium with Python | Playwright with JavaScript |
---|---|---|
Cross-Browser Support | Supports multiple browsers, requires browser-specific drivers | Supports multiple browsers with no driver installation needed |
Installation | Requires separate installation of Selenium and browser drivers | Requires installation of Playwright package |
Programming Language | Python | JavaScript (Node.js) |
Syntax & API | Somewhat complex, does not require async/await | Requires basic understanding of async/await |
Async/Await Support | Limited, can be used with external libraries | Native support for async/await |
Element Locators | Supports various locators (XPath, CSS selectors, Class Name, Tag Name) | Supports CSS and XPath selectors, find pretty much anything with page.$() |
Headless Mode | Supported but still runs browser under the hood | Supported but still runs browser under the hood |
Multiple Tabs/Windows | Supported | Supported |
Page Navigation | Navigate back, forward, refresh, etc. | Provides easier navigation methods. |
Interacting with Elements | Supports interacting with browser elements | Supports interacting with browser elements |
Parallel Execution | Requires additional tools | Supports parallel execution right out of the box |
Performance | Slower due to the use of WebDriver | Significantly faster and more efficient |
Community & Support | Large and active community, but much of the community documentation is out of date | Growing community with robust and up-to-date documentation |
Browser Automation | Automates actions inside the browser | Automates actions inside the browser |
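To illustrate the parallel-execution row above: Playwright's dedicated test runner ships with parallel workers, and even in a plain script you can run several pages concurrently with Promise.all. Below is a minimal sketch of the latter approach; the two URLs are simply pages from the demo site used earlier and are illustrative only.
//a minimal sketch of loading two pages concurrently with Promise.all
const playwright = require("playwright");

async function getTitle(browser, url) {
    const page = await browser.newPage();
    await page.goto(url);
    const title = await page.title();
    await page.close();
    return title;
}

async function main() {
    const browser = await playwright.chromium.launch({ headless: true });
    //both pages are fetched at the same time
    const titles = await Promise.all([
        getTitle(browser, "https://quotes.toscrape.com"),
        getTitle(browser, "https://quotes.toscrape.com/tag/inspirational/")
    ]);
    console.log(titles);
    await browser.close();
}

main();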
Playwright
Playwright is an excellent framework. It handles navigation very well and allows us to find single elements using page.$(), and lists of all elements of a certain type using page.$$().
Playwright is slightly faster than Selenium but also requires some knowledge of async and await. While the learning curve for Playwright itself is not very steep, developers who are new to asynchronous programming can definitely struggle with it, in particular with knowing when to use the await keyword.
If you decide to choose Playwright with Node.js, your project should be one that requires JavaScript, so you should be familiar with JavaScript and how to properly use async and await. You should also be comfortable setting timeouts, because the browser in Playwright will throw timeout errors if a page takes a long time to load.
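Here is a brief sketch of what handling those timeouts can look like: page.goto() accepts a timeout option (in milliseconds) and throws if the page is not ready in time, so wrapping navigation in try/catch keeps the script from crashing. The 10-second value is an arbitrary example.
//a brief sketch of setting and handling a navigation timeout
const playwright = require("playwright");

async function loadWithTimeout() {
    const browser = await playwright.chromium.launch({ headless: true });
    const page = await browser.newPage();
    try {
        //give the page 10 seconds to load before Playwright throws a timeout error
        await page.goto("https://quotes.toscrape.com", { timeout: 10000 });
        console.log(`Title: ${await page.title()}`);
    } catch (error) {
        //a slow or unreachable page surfaces here as a TimeoutError
        console.log(`Navigation failed: ${error.message}`);
    } finally {
        await browser.close();
    }
}

loadWithTimeout();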
You should not choose Playwright with JavaScript if you don't understand when to await, or if you are not comfortable writing JavaScript code. Playwright is also not suitable for scraping a long list of webpages, because it needs to load each of those pages into the browser.
Selenium
Selenium is another great tool to start with. While it is a little bit slower and does require more setup, Selenium makes up for this in several ways. Selenium does not require you to understand asynchronous programming (it handles most of this for you under the hood), you don't have to wait for timeouts, and many people tend to think that Python's syntax is much easier to grasp than JavaScript's.
You should choose Selenium with Python when you are more comfortable writing Python code and when you don't want to (or are not ready to) think about which portions of your code to await.
You should not choose Selenium with Python if you are more comfortable writing JavaScript or if you have issues with dependency management. Dealing with browser drivers can be quite the hassle. Similar to Playwright, Selenium also struggles to scrape content from a long list of pages. If you have a large amount of content to scrape, do not choose Selenium.
Case Study: Side by Side Comparison of Playwright and Selenium
Scraping Recent Home Sales With Playwright
In this section, we're going to scrape recent home sales listed on Realtor with Playwright.
Since we're scraping a production site, it is best practice to use a proxy, and ours will be the ScrapeOps Proxy.
Many websites try to detect and block scrapers; this proxy allows us to go about our task undetected. Create a new Playwright project and add the following code to a JavaScript file.
Please note that the specific HTML structure, element locators, and class or ID attributes used in the code samples are based on the current state of the web page as of 30/10/2023.
Due to possible updates or modifications to the webpage's design and structure, the element locators and CSS selectors mentioned in the examples may no longer be valid or effective.
Please leave a comment on the article if you encounter any discrepancies, changes, or issues with the provided code samples.
//import playwright
const playwright = require("playwright");
const API_KEY = "your-super-secret-api-key"
//create a function to convert regular urls into proxy urls
function getScrapeOpsUrl(url) {
    let payload = {
        "api_key": API_KEY,
        "url": url
    };
    //convert our payload into a query string
    const queryString = new URLSearchParams(payload).toString();
    //combine the query string with the base proxy url
    const proxy_url = `https://proxy.scrapeops.io/v1/?${queryString}`;
    //return the url of the proxied site
    return proxy_url;
}
//create an asynchronous main function
async function main() {
    //launch the browser
    const browser = await playwright.chromium.launch({
        headless: false // setting this to true will not run the UI
    });
    //open a new page in the browser
    const page = await browser.newPage();
    //navigate to the page, make sure to use a long timeout
    //we are waiting for the proxy to fetch the page and return it back to us
    await page.goto(
        getScrapeOpsUrl("https://www.realtor.com/realestateandhomes-search/Detroit_MI/show-recently-sold/sby-6"),
        { timeout: 30000000 });
    //find all elements with the class name "card-anchor"
    const elements = await page.$$(".card-anchor");
    //iterate through the elements
    for (const element of elements) {
        //await the href
        const href = await element.evaluate((el) => el.getAttribute("href"));
        //split the href into an array
        const array = href.split("/");
        //print the last element of the array, the home address
        console.log(array[array.length-1]);
    }
    await browser.close();
}
main();
The code above does the following:
- Create a function that converts a regular url into one using the ScrapeOps Proxy, getScrapeOpsUrl()
- Open an instance of Chromium using await playwright.chromium.launch()
- Create a new page with browser.newPage()
- Navigate to the website with page.goto(); we set a long timeout because we're pinging one server that pings another server and sends us the response after receiving it
- Find all the card-anchor elements with page.$$(".card-anchor") (card-anchor is the class name of each listing)
- Iterate through each of the elements and await its href using element.evaluate((el) => el.getAttribute("href"))
- Once we have the href, split it into an array
- After getting the array, print the last element (the home address) with console.log(array[array.length-1])
This process took 47 seconds on a Lenovo Ideapad 1i. The vast majority of the time was spent waiting for the page to finish loading and finding the elements. Once the elements were found, they were printed to the console in under a second. While Playwright is great for interacting with the page, it is not suited for scraping at scale.
Scraping Recent Home Sales With Selenium
Now, we'll scrape recently sold homes with Selenium. Inside our selenium_tutorial folder, let's create a new file and call it home_sales.py.
Once again, since we're scraping a production website, it is best practice to always use a proxy. We'll continue to make full use of the ScrapeOps Proxy. Create a new Selenium project and add a Python file with the following code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import urlencode
API_KEY= "your-super-secret-api-key"
#converts a url to a ScrapeOps url
def get_scrapeops_url(url):
    payload = {
        "api_key": API_KEY,
        "url": url
    }
    #encode the url you want to scrape, and combine it with the url of the proxy
    proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
    #return the url of the proxied site
    return proxy_url
#open chrome
driver = webdriver.Chrome()
#navigate to the page
driver.get(get_scrapeops_url("https://www.realtor.com/realestateandhomes-search/Detroit_MI/show-recently-sold/sby-6"))
#find our list of elements and save it as a variable
elements = driver.find_elements(By.CLASS_NAME, "card-anchor")
#iterate through the list
for element in elements:
    #find the link attached to the element
    link = element.get_attribute("href")
    #split the address into an array
    array = link.split("/")
    #print the last element in the array, the street address
    print(array[-1])
driver.quit()
In this script, we do the following:
- Define a function to connect to sites with the ScrapeOps proxy, get_scrapeops_url()
- Open an instance of Chrome with webdriver.Chrome()
- Navigate to the proxied url of recently sold homes
- Find all elements with the CLASS_NAME of card-anchor and return this list as a variable, elements
- For each element in our list, find its url by getting its href attribute with element.get_attribute("href")
- Once we have the url, split it into an array of strings with link.split("/")
- Since the address of each home is the last element of the array, print it to the terminal with print(array[-1])
- Once we're finished, close the browser with driver.quit()
On a Lenovo Ideapad 1i, this process took approximately 55 seconds. Selenium does just fine when opening and navigating to the page. Once the page is open and loaded, Selenium takes an extremely long time to find the elements. After Selenium successfully found all of the elements, it ran through the list and printed them in just a few seconds.
The Results: Which Framework Wins?
Playwright and Selenium are both great at what they're designed to do. Playwright's setup is easier and it runs faster, but coding with Selenium tends to be easier because it abstracts away much of the async programming that goes on when interacting with the page.
If you are looking for speed and async support, Playwright is our clear winner. If you are looking for ease of use and to stay away from async programming, Selenium is the obvious choice.
Additional Resources
If you are looking to learn more about either of these frameworks, take a look at the guides below.
More Web Scraping Guides
Now you should have a basic understanding of how both Playwright and Selenium work. They are both suited for very similar tasks, and whichever one you choose is largely a matter of preference. Playwright is faster and easier to set up, but Selenium is easier to code with once it is set up properly.
Looking to learn more about scraping in JavaScript? Take a look at the Node.js Web Scraping Playbook. Here are some great places to start:
Also take a look at the Python Web Scraping Playbook. Any of the following pieces are great places to start: