Playwright Guide: How To Find Elements by CSS Selector
Whenever we automate actions with a web browser, we need to locate and select elements. Without the ability to select an element, we can't click, we can't fill form data, and we can't even scroll to the element because Playwright doesn't know where it is.
If Playwright can't find page elements, it can't really do much other than open new pages and close the browser!
In this guide, we're going to learn how to find elements using their CSS selector.
- TLDR: How To Find Elements by CSS Selector
- Understanding CSS Selectors
- Basic CSS Selectors
- Advanced CSS Selectors
- Debugging and Troubleshooting
- Conclusion
- More Web Scraping Guides
TLDR: How To Find Elements by CSS Selector
To find elements by CSS selector in Node.js using Playwright, follow the script below:
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com");
//find the first h1 element
const h1 = page.locator("h1").first();
const h1Text = await h1.textContent();
console.log(`H1 element: ${h1Text}`);
//find all elements with the class "quote"
const quotes = page.locator(".quote");
//get the count of our quotes list
const quoteCount = await quotes.count();
//iterate through quotes
for (let i=0; i<quoteCount; i++) {
//get the text of the quote
const text = await quotes.nth(i).textContent();
//log it to the console
console.log(`Quote: ${text}`);
}
//close the browser
await browser.close();
}
main()
- First, we initiate a Chromium browser instance using Playwright's launch() function.
- Within the browser, we create a new page via browser.newPage() and navigate to Quotes to Scrape (https://quotes.toscrape.com).
- We locate the first h1 element on the page using page.locator("h1").first().
  - Find By Tag Name: Pass the tag name into locator() ... page.locator("h1")
  - Find By Class Name: Pass .name-of-class into locator() ... page.locator(".quote")
  - Find By ID: Pass #id into locator() ... page.locator("#username")
- After, we retrieve the text content of the found h1 element using h1.textContent() and print the text content of the h1 element to the console.
- We locate all elements with the class "quote" on the page using page.locator(".quote").
- We print the text content of each quote to the console and finally, we close the browser instance to release system resources using browser.close().
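If any step in the script throws (a navigation timeout, for example), browser.close() is never reached and the browser process can be left running. Here is a minimal sketch of one way to guard against that. The helper name withPage is hypothetical and not part of Playwright; it simply wraps the same launch(), newPage(), goto() and close() calls in try/finally.

```javascript
// Hypothetical helper (not part of Playwright): opens a page, runs `task`,
// and always closes the browser, even if `task` throws.
// `browserType` is expected to be e.g. require("playwright").chromium.
async function withPage(browserType, url, task) {
  const browser = await browserType.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // hand the ready page to the caller and pass its result back
    return await task(page);
  } finally {
    // runs on success and on error alike
    await browser.close();
  }
}

// Example usage (requires the playwright package and network access):
// const { chromium } = require("playwright");
// withPage(chromium, "https://quotes.toscrape.com", (page) =>
//   page.locator("h1").first().textContent()
// ).then((text) => console.log(`H1 element: ${text}`));
```

The callback receives the page, so the selector logic stays separate from the setup and cleanup code.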
Understanding CSS Selectors
CSS selectors are patterns that tell the browser which elements a style rule applies to. Any page with more than a white background and black text uses them. When styling a webpage, a developer will often use a class to apply the same styles to several elements.
CSS selectors extend much further than just simple classes, but let's take a look at a CSS class just to show how CSS works on a webpage.
For starters, we can create a new HTML file.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Our Demo Page</title>
<link rel="stylesheet" type="text/css" href="demo.css">
</head>
<body>
<h1>Hello I am an HMTL File</h1>
<p>These are some smaller words.</p>
</body>
</html>
If you open this file in your browser, it will look like the image below... Pretty boring right?
Next, we're going to create a class. Since our HTML file already links to "demo.css", let's create a new CSS file, "demo.css". Here is our new class. It's not much, just enough to show you what CSS does.
.our-new-class {
background-color: black;
color: white;
}
Next, we update our file to use our new class. We do one thing: add class="our-new-class" to the body element.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Our Demo Page</title>
<link rel="stylesheet" type="text/css" href="demo.css">
</head>
<body class="our-new-class">
<h1>Hello I am an HMTL File</h1>
<p>These are some smaller words.</p>
</body>
</html>
Here is a screenshot of the page after giving it new style.
However, there is far more to CSS than just classes. Take a look at the table below.
When using CSS selectors on a page, we can select elements by:
- tag,
- id,
- class,
- attribute,
- descendant,
- child,
- adjacent sibling,
- general sibling,
- pseudo-element and
- pseudo-class.
Selector Type | CSS Syntax | Description |
---|---|---|
Tag Name | tag | Selects elements by their tag. |
ID | #id | Selects an element by its ID. |
Class | .class | Selects elements by their class. |
Attribute | [attribute=value] | Selects elements with a specific attribute. |
Descendant | ancestor descendant | Selects an element within another element. |
Child | parent > child | Selects direct children of an element. |
Adjacent Sibling | previous + next | Selects an element directly after another. |
General Sibling | sibling ~ sibling | Selects all siblings that follow a specified element. |
Pseudo-class | element:pseudo-class | Selects elements in a certain state. |
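To make the table concrete, here is a small sketch that pairs each selector type with one example selector and counts how many elements it matches. The example selectors are assumptions chosen with quotes.toscrape.com in mind, and countMatches is a hypothetical helper name. It only relies on page.locator() and locator.count().

```javascript
// One illustrative selector per row of the table (assumed examples,
// written with quotes.toscrape.com in mind).
const exampleSelectors = {
  tag: "h1",
  id: "#username",
  class: ".quote",
  attribute: "[id='username']",
  descendant: "div span",
  child: "body > div",
  adjacentSibling: "div + div",
  generalSibling: "div ~ div",
  pseudoClass: "div:first-child",
};

// Hypothetical helper: returns an object of { selectorType: numberOfMatches }.
async function countMatches(page, selectors) {
  const counts = {};
  for (const [type, selector] of Object.entries(selectors)) {
    counts[type] = await page.locator(selector).count();
  }
  return counts;
}

// Example usage inside an async function, after page.goto():
// console.log(await countMatches(page, exampleSelectors));
```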
Basic CSS Selectors in Playwright
Let's dive into some basic CSS selectors and their usage in Playwright:
Find by Class Name
To locate an element using its class name, we use the . operator. If I want to find a class called my-custom-class, I'd tell Playwright I'm looking for .my-custom-class.
The example below shows us how to find an element using its class name.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com");
//find the FIRST element with the class "tag"
const firstTag = page.locator(".tag").first();
//get the text
const text = await firstTag.textContent();
//log the text
console.log("First tag:", text);
//close the browser
await browser.close();
}
main();
This example does the following:
- Open a browser with await playwright.chromium.launch()
- Open a new page with browser.newPage()
- Go to the website with page.goto()
- Find the first element with the tag class with page.locator(".tag").first()
- Get the text from the tag with firstTag.textContent()
- Log the text to the console
- Close the browser with browser.close()
Find by ID
Now that we know how to find an element using its class name, let's try something else. Here, we'll scrape the same site, but this time we'll find an element using its ID.
In this example we head over to the "login" page. We find the username box by using its id with the # operator, #username. We then fill it with text and take a screenshot of the result.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/login");
//find the FIRST element with the id "username"
const username = page.locator("#username").first();
//fill the box with text
await username.fill("ScrapeOps");
//take a screenshot
await page.screenshot({ path: "find-by-id.png" });
//close the browser
await browser.close();
}
main();
In this example, we:
- Open a browser with playwright.chromium.launch()
- Open a new page with browser.newPage()
- Navigate to the site with page.goto()
- Find the first username element using its id with page.locator("#username").first()
- We then fill the username box with text using username.fill()
- We take a screenshot with page.screenshot()
- Close the browser
Here is the resulting screenshot:
As you can see above, we found the username box using its ID. After finding it, we were able to successfully fill the box with text.
Find by Tag Name
Now, let's find an element using its tag name. Tags are very simple. To find all <h1> elements, we would search for the tag, h1.
The example below is almost identical to our first. See if you can spot the difference.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com");
//find the FIRST element with the tag "h1"
const firstTag = page.locator("h1").first();
//get the text
const text = await firstTag.textContent();
//log the text
console.log("First h1 element by tag name:", text);
//close the browser
await browser.close();
}
main();
This example does the following:
- Open a browser with playwright.chromium.launch()
- Open a new page with browser.newPage()
- Go to the site with page.goto()
- Find the first <h1> using its tag, h1, using the following code: page.locator("h1").first()
- After finding the element, get its text content with firstTag.textContent()
- We then log the text to the console and close the browser
Advanced CSS Selector Techniques
Now that you know how to perform some basic CSS selector operations, let's try some more advanced methods. The methods below combine basic selectors and operators to filter through elements in a more powerful and efficient way.
Attribute Presence
The code below finds all elements containing the id attribute, and fills them with text.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/login");
//find all elements that have an ID attribute
const itemsWithId = page.locator("[id]");
const itemsCount = await itemsWithId.count();
//fill the items with text
for (let i=0; i < itemsCount; i++) {
await itemsWithId.nth(i).fill("scrapeops");
}
//take a screenshot
await page.screenshot({ path: "find-by-attribute.png" });
//close the browser
await browser.close();
}
main();
You should notice a few differences from our previous examples here:
- itemsWithId = page.locator("[id]") returns all items that have the id attribute
- Our result is not a normal array. To iterate through it, we need its count, itemsWithId.count()
- Once we have its count, we iterate through it and fill each one with text: itemsWithId.nth(i).fill("scrapeops");
- After these steps, we take a screenshot and close the browser like the previous example when we filled the username box
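The count-then-nth loop recurs throughout this guide, so it can be wrapped in a small helper. This is a minimal sketch, assuming a hypothetical function named mapMatches. It relies only on locator.count() and locator.nth(), the same calls used in the example.

```javascript
// Hypothetical helper: runs `action` on every element a locator matches
// and collects the results in a plain array.
async function mapMatches(locator, action) {
  const count = await locator.count();
  const results = [];
  for (let i = 0; i < count; i++) {
    // nth(i) narrows the locator to the i-th match (zero-based)
    results.push(await action(locator.nth(i), i));
  }
  return results;
}

// Example usage inside main(), after page.goto():
// await mapMatches(page.locator("[id]"), (item) => item.fill("scrapeops"));
// const texts = await mapMatches(page.locator(".quote"), (q) => q.textContent());
```

When only the text is needed, Playwright's locator.allTextContents() returns the text of every match in a single call.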
Here is the resulting screenshot:
Attribute Value
In this example, we'll find all elements with a specific attribute value. This code finds all elements with an id of username.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/login");
//find all elements that have an ID attribute with the value "username"
const itemsWithIdUsername = page.locator("[id='username']");
const itemsCount = await itemsWithIdUsername.count();
//fill the items with text
for (let i=0; i < itemsCount; i++) {
await itemsWithIdUsername.nth(i).fill("scrapeops");
}
//take a screenshot
await page.screenshot({ path: "find-by-attribute-value.png" });
//close the browser
await browser.close();
}
main();
The only difference in this example:
- Instead of using page.locator("[id]"), we use page.locator("[id='username']")
Here is the screenshot:
As you can see from the screenshot, there is only one element with the ID of username that we're able to fill with the text.
Attribute Contains
Next, we're going to find elements by text contained in an attribute's value. The code below finds all the elements with an href, but only if the href contains the word "author".
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all elements with an href containing the word "author"
const itemsWithAuthor = page.locator("[href*='author']");
const itemsCount = await itemsWithAuthor.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsWithAuthor.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Here is what you should notice from this example:
- page.locator("[href*='author']") finds all elements with an href that contains the word author
- The *= operator specifies that the attribute contains the text, 'author'
Attribute Starts With
In this example, we'll find all the elements with a class attribute that begins with the letter "q".
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all elements with a class attribute starting with "q"
const itemsWithQuote = page.locator("[class^='q']");
const itemsCount = await itemsWithQuote.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsWithQuote.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Important details to consider here:
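- page.locator("[class^='q']") tells Playwright to find only elements whose class attribute begins with "q". The match is made against the whole attribute string, so an element with class="text quote" would not match.
- The ^= operator tells our locator to look only for items whose attribute value begins with the following text, 'q'.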
Attribute Ends With
Next, we'll select elements by what an attribute's value ends with. In this example we'll use the $= operator.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all elements with a class attribute ending with "e"
const itemsWithQuote = page.locator("[class$='e']");
const itemsCount = await itemsWithQuote.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsWithQuote.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Key points in this example:
- page.locator("[class$='e']") finds only elements whose class attribute ends with the letter "e".
- We use the $= operator to tell Playwright that we only want elements whose attribute value ends with a specific string.
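The attribute operators covered so far differ by a single character, which makes them easy to mix up. The sketch below builds attribute selectors from named match modes. attributeSelector is a hypothetical helper, not a Playwright API. It only produces a string that can be passed to page.locator().

```javascript
// Maps a readable match mode to its CSS attribute operator.
const ATTRIBUTE_OPERATORS = new Map([
  ["equals", "="],
  ["contains", "*="],
  ["startsWith", "^="],
  ["endsWith", "$="],
]);

// Hypothetical helper: builds e.g. [href*="author"].
// Omitting `mode` and `value` yields a presence selector such as [id].
function attributeSelector(name, mode, value) {
  if (mode === undefined) {
    return `[${name}]`;
  }
  const operator = ATTRIBUTE_OPERATORS.get(mode);
  if (!operator) {
    throw new Error(`Unknown match mode: ${mode}`);
  }
  // escape backslashes and double quotes so the value stays one CSS string
  const escaped = String(value).replace(/["\\]/g, "\\$&");
  return `[${name}${operator}"${escaped}"]`;
}

console.log(attributeSelector("id"));                         // [id]
console.log(attributeSelector("href", "contains", "author")); // [href*="author"]
console.log(attributeSelector("class", "startsWith", "q"));   // [class^="q"]
console.log(attributeSelector("class", "endsWith", "e"));     // [class$="e"]
```

The result is used exactly like the hand-written selectors above, for example page.locator(attributeSelector("href", "contains", "author")).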
Descendant Selector
Now we'll search for elements using a descendant selector. The code below finds all div elements descended from at least four other div elements.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements nested within at least four other divs
const itemsFromDivs = page.locator("div div div div div");
const itemsCount = await itemsFromDivs.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsFromDivs.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
In this example:
- page.locator("div div div div div") tells Playwright to find all div elements that are nested inside at least four other div elements.
Child Selector
Next, we'll search for elements that are direct children of another element. The example below looks for all div elements that are direct children of the body element.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements that are direct children of the body element
const divsFromBody = page.locator("body > div");
const itemsCount = await divsFromBody.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await divsFromBody.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
What you should learn from this example:
- page.locator("body > div") tells Playwright that we want all div items that are direct children of the body element.
- When searching for elements this way, always specify them in the following fashion: parentElement > childElement
Adjacent Sibling Selector
Next we'll use the adjacent sibling selector to find elements. We'll use the + operator to find our elements.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements that immediately follow a sibling div
const divsAdjacent = page.locator("div + div");
const itemsCount = await divsAdjacent.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await divsAdjacent.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Key point of this example:
- page.locator("div + div") tells Playwright that we want only div elements that immediately follow another div with the same parent
General Sibling Selector
The code below finds all div elements that are general siblings of other div elements.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements that follow a sibling div
const divsGeneral = page.locator("div ~ div");
const itemsCount = await divsGeneral.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await divsGeneral.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Pay attention to the following line of code:
- page.locator("div ~ div") tells Playwright that we want all div elements that come after another div with the same parent, whether or not they sit directly next to it.
Pseudo-Classes and Pseudo-Elements
Now, we'll learn how to select items by pseudo-class. Pseudo-elements are not actually present in the DOM tree, so they cannot be found using Playwright's API.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements that are first children of any element in the DOM
const itemsList = page.locator("div:first-child");
const itemsCount = await itemsList.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsList.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
Key points here:
- Find all divs that are the first child of their parent using page.locator("div:first-child")
- When searching by pseudo-class, use the following syntax: element:pseudo-class
Combining Selectors
In our final example, we'll find some elements by combining multiple selectors. In this example, we'll find all div elements that are first children of the body element.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://quotes.toscrape.com/");
//find all div elements that are first children of the body element
const itemsList = page.locator("body > div:first-child");
const itemsCount = await itemsList.count();
//log the items to the console
for (let i=0; i < itemsCount; i++) {
const text = await itemsList.nth(i).textContent();
console.log("Text Content:", text);
}
//close the browser
await browser.close();
}
main();
What you should notice here:
- page.locator("body > div:first-child") tells Playwright to find all div elements that are first children of the body element.
- When combining multiple selectors, we simply write them together as one selector string and pass it into page.locator()... It's that easy!
Debugging and Troubleshooting
The selectors should accurately target the desired elements on the webpage. Here are some tips to help with this process:
Inspecting with DevTools
Sometimes you'll need to inspect elements from within the browser. To inspect the page, simply right-click the page, and choose inspect from the dropdown menu.
Tips and Mistakes to Avoid
When working with CSS selectors in Playwright, here are some tips to consider and common mistakes to avoid:
Write Maintainable Selectors
When we write selectors, we need to find a balance between efficiency and maintainability. When writing selectors inside of a Playwright script, try to use selectors that are easy to understand and always use comments when necessary.
Good comments often make it much easier to read code you didn't write or don't remember.
Find a Balance Between Specific and Flexible
Overly specific selectors can be great...at first. If you're writing code that scrapes a constantly changing page, sometimes you'll run your selector and the element that was there yesterday won't be there today!
To handle this issue, write selectors that are easy to change so they can adapt to the pages they're used to scrape.
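One way to keep selectors easy to change is to list a few candidates in order of preference and use the first one that matches. This is a minimal sketch, assuming a hypothetical helper named firstMatching. The candidate selectors in the usage comment are illustrative only.

```javascript
// Hypothetical helper: returns the first selector (and its locator) that
// matches at least one element, or null if none of the candidates match.
async function firstMatching(page, candidates) {
  for (const selector of candidates) {
    const locator = page.locator(selector);
    if ((await locator.count()) > 0) {
      return { selector, locator };
    }
  }
  return null;
}

// Example usage inside main(), after page.goto():
// const match = await firstMatching(page, ["#username", "[name='username']"]);
// if (match) await match.locator.first().fill("ScrapeOps");
```

Note that count() does not wait for elements to appear, so on pages that render content late this check belongs after a wait such as the ones described in the next section.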
Don't Select Elements for No Reason
Running code uses resources, and selectors are no exception to this rule. Only select relevant items on the page. You don't want Playwright to crash or slow down just because you wanted to show off by selecting a bunch of extra stuff!
How to Wait for Dynamic Elements to Appear
To wait for dynamic elements to appear before interacting with them using CSS selectors in Playwright, you can employ different waiting methods.
Hardcoded Waiting
In the example below, we use page.waitForTimeout(1000) to perform a hardcoded wait of one second. We wait for exactly one second and then take a screenshot of the page.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://www.espn.com/");
//wait one second
await page.waitForTimeout(1000);
//take a screenshot
await page.screenshot({ path: "hardcoded-wait.png"});
//close the browser
await browser.close();
}
main();
This code produced the following screenshot:
Network Waits
In this example, we perform our wait based on the network conditions. We use page.waitForLoadState() and pass "networkidle" to it as a parameter. This tells Playwright to wait until the network has gone idle before continuing the script.
const playwright = require("playwright");
async function main() {
//open a browser
const browser = await playwright.chromium.launch();
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://www.espn.com/");
//wait until the network is idle
await page.waitForLoadState("networkidle");
//take a screenshot
await page.screenshot({ path: "network-wait.png"});
//close the browser
await browser.close();
}
main();
This is the screenshot produced by a network wait.
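Neither wait shown in this section targets a particular element. Playwright locators also provide waitFor(), which pauses until the element reaches a given state. Below is a minimal sketch, assuming a hypothetical helper named waitForElement and assuming that a timeout surfaces as an error whose name is "TimeoutError".

```javascript
// Hypothetical helper: waits until the first match of `selector` is visible.
// Returns the locator on success, or null if the wait times out.
async function waitForElement(page, selector, timeoutMs = 5000) {
  const locator = page.locator(selector).first();
  try {
    await locator.waitFor({ state: "visible", timeout: timeoutMs });
    return locator;
  } catch (error) {
    // assumption: Playwright timeouts carry the name "TimeoutError"
    if (error.name === "TimeoutError") {
      return null;
    }
    throw error; // anything else is unexpected, so surface it
  }
}

// Example usage inside main(), after page.goto():
// const quote = await waitForElement(page, ".quote", 3000);
// if (quote) console.log(await quote.textContent());
```

Waiting on a selector ties the script to the element it actually needs, rather than to a fixed delay or to overall network activity.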
Conclusion
You've made it to the end! At this point you should have a solid grasp of both Playwright basics and CSS selectors. You can definitely take this knowledge and build your first scraper with Playwright.
Want to learn more? Check the links below.
More Playwright Web Scraping Guides
If you would like to learn more about Web Scraping with Playwright, then be sure to check out The Playwright Web Scraping Playbook.