Playwright Guide - How to Scroll Pages with Playwright
Quite often, in order to interact with a page, its elements need to be visible. If we need to take screenshots of a page, our target content needs to be visible. No matter what you are doing, the ability to scroll the page is vital to your scraping toolbox.
When using Playwright (or any other automated browser for that matter), scrolling the page is something everyone should know how to do.
In this guide, we'll explore how to use Playwright to scroll pages effectively. By the end, you'll have a solid understanding of how to leverage Playwright's scrolling functionality to enhance your web automation scripts.
- Ways To Scroll Using Playwright
- Method 1: Scrolling to a Specific Element
- Method 2: Scrolling by a Certain Amount
- Method 3: Scrolling to the Bottom of the Page
- Method 4: Scrolling with Keyboard Shortcuts
- Method 5: Scrolling with Mouse
- Method 6: Scrolling with Touchscreen
- Dealing With Lazy Loading
- Challenges When Scrolling With Playwright
- Conclusion
- More Web Scraping Guides
Ways To Scroll Using Playwright
Scrolling using Playwright can be accomplished in several ways, each tailored to specific requirements and scenarios. The most common ways of scrolling include:
- Scrolling to a Specific Element
- Scrolling by a Certain Amount
- Scrolling to the Bottom of the Page
- Scrolling with Keyboard Shortcuts
- Scrolling with Mouse
- Scrolling with Touchscreen
Each of these methods has its own use cases, and each is appropriate in different situations.
Method # | Description | How It Works | Notes |
---|---|---|---|
1 | Scrolling to a Specific Element | Use element.scrollIntoViewIfNeeded() | Scrolls to a chosen DOM element. |
2 | Scrolling by a Certain Amount | Use window.scrollBy(x, y) with page.evaluate. | Scrolls a specific distance horizontally (x) and/or vertically (y). |
3 | Scrolling to the Bottom of the Page | Use window.scrollBy(0, document.body.scrollHeight) with page.evaluate. | Scrolls to the very bottom of the page. |
4 | Scrolling with Keyboard Shortcuts | Use page.keyboard.press('Space') or other keys. | Simple method, but not precise. |
5 | Scrolling with Mouse | Use page.mouse.wheel for mouse wheel actions. | Simulates mouse wheel scrolling, lacks precision. |
6 | Scrolling with Touchscreen | The touchscreen API has no native swipe, but we can scroll with the methods above and then tap() on elements. | Suitable for simulating interaction on touchscreen devices. |
Method #1: Scrolling to a Specific Element
When scrolling to a specific element, we need to identify the element, and then scroll until it is visible on the page. We can use page.$(ourCriteriaGoesHere) in order to identify elements on the page.
Once we've found the element, we use element.scrollIntoViewIfNeeded() to scroll to the element. We can then interact with it using element.click().
//import playwright
const playwright = require("playwright");
//async main function
async function main() {
//launch the browser
const browser = await playwright.chromium.launch({
headless: false
});
//open a new page in the browser
const page = await browser.newPage();
//navigate to the site
await page.goto("https://books.toscrape.com");
//set a counter for pages left to view
let pageCount = 5;
//while we still have pages to view
while (pageCount > 0) {
//find the next button on the page
const nextButton = await page.$("text='next'");
//scroll until the button is visible
await nextButton.scrollIntoViewIfNeeded();
//wait 3 seconds
await page.waitForTimeout(3000);
//click the button
await nextButton.click();
//wait 3 more seconds
await page.waitForTimeout(3000);
//decrement the counter
pageCount--;
}
//we've viewed all the pages in the counter, close the browser
await browser.close();
}
main();
In the code above, we:
- Import Playwright using const playwright = require("playwright");
- Create an async function, main(), which holds the runtime of our scraper.
- Launch the browser with playwright.chromium.launch()
- Open a new page with browser.newPage()
- Navigate to the website with page.goto()
- Create a counter that keeps track of the clicks we have left
- While the counter is greater than zero:
  - Find the "next" button with page.$("text='next'")
  - Scroll to the next button with nextButton.scrollIntoViewIfNeeded()
  - Wait for 3 seconds with page.waitForTimeout(3000);
  - Click on the button with nextButton.click()
  - Wait 3 more seconds
  - Decrement the counter
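The same steps can also be written with Playwright's locator API, which looks the element up again on every call instead of holding on to a handle from the previous page. The sketch below is one possible variant, not the code used above: the helper name scrollToAndClick is made up for illustration, and it accepts any Playwright page object.

```javascript
// Hypothetical helper: scroll the first match of `selector` into view and click it.
// Returns false instead of throwing when nothing matches (e.g. on the last page).
async function scrollToAndClick(page, selector) {
  const target = page.locator(selector).first();
  // count() resolves to 0 when the selector matches nothing
  if ((await target.count()) === 0) {
    return false;
  }
  await target.scrollIntoViewIfNeeded();
  await target.click();
  return true;
}

// Example usage against the demo site from this guide (call main() to run it).
async function main() {
  const playwright = require("playwright");
  const browser = await playwright.chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://books.toscrape.com");
  let pagesLeft = 5;
  while (pagesLeft > 0 && (await scrollToAndClick(page, "text='next'"))) {
    // pause so the next page has time to render, as in the example above
    await page.waitForTimeout(3000);
    pagesLeft--;
  }
  await browser.close();
}
```

Returning a boolean rather than throwing lets the loop stop cleanly when there is no "next" button left to click.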
Method #2: Scrolling by a Certain Amount
We can also scroll by an arbitrary amount. This can be very useful when trying to appear human, or if an actual human is watching the scraper in real time. Scrolling by a specific amount gives us the ability to scroll down, look at the page, and repeat until we get to the bottom of the page.
The example below scrolls the page by a specific amount. If the "next" button isn't visible yet, it waits a couple seconds and scrolls again. It repeats this process until the next button is visible. Then it clicks on the next button just like our previous example did.
//import playwright
const playwright = require("playwright");
//async main function
async function main() {
//launch the browser
const browser = await playwright.chromium.launch({
headless: false
});
//open a new page in the browser
const page = await browser.newPage();
//navigate to the site
await page.goto("https://books.toscrape.com");
//set a counter for pages left to view
let pageCount = 5;
//while we still have pages to view
while (pageCount > 0) {
//scroll down by 250
await page.evaluate(() => window.scrollBy(0, 250));
//wait 3 seconds
await page.waitForTimeout(3000);
//find the next button on the page
const nextButton = await page.$("text='next'");
//if the next button is visible
if (nextButton && await nextButton.isVisible()) {
//click on it
await nextButton.click();
//decrement the counter
pageCount--;
//if the next button is not visible
} else {
//scroll downward by another 250
await page.evaluate(() => window.scrollBy(0, 250));
}
}
//we've viewed all the pages in the counter, close the browser
await browser.close();
}
main();
While this example does basically the same thing, you should notice some important differences in our loop.
- await page.evaluate(() => window.scrollBy(0, 250)); tells Playwright to scroll down by 250 pixels.
- await page.waitForTimeout(3000); waits 3 seconds before performing the next action so we can look at the page.
- While we find the next button the same way we did before, we handle it differently. If nextButton.isVisible() returns true, we click the next button.
- If nextButton.isVisible() returns false, we scroll down by another 250 pixels and restart at the beginning of the loop.
In this example, we scroll by small amounts and continue until nextButton is visible. Once it is visible, we click on it just like we did in the previous example. Keep in mind that isVisible() returns a promise, so it must be awaited, and that it reports whether the element is rendered on the page (a non-empty bounding box and no visibility:hidden), not whether it has been scrolled into the viewport.
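If we want to stop scrolling only once an element is actually on screen, one option is to compare its bounding box with the window height inside page.evaluate(). The sketch below is a hedged illustration: the helper names isInViewport and scrollUntilInViewport are made up, the defaults are arbitrary, and because it relies on document.querySelector() it expects a plain CSS selector (for example "li.next a", which is our assumption about the demo site's markup) rather than a Playwright text selector.

```javascript
// Hypothetical helper: true when the element's box overlaps the visible viewport.
async function isInViewport(page, selector) {
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    if (!el) return false;
    const rect = el.getBoundingClientRect();
    return rect.bottom > 0 && rect.top < window.innerHeight;
  }, selector);
}

// Scroll down `step` pixels at a time until the element is on screen.
// Gives up after `maxScrolls` attempts and returns whether it succeeded.
async function scrollUntilInViewport(page, selector, step = 250, maxScrolls = 40) {
  for (let i = 0; i < maxScrolls; i++) {
    if (await isInViewport(page, selector)) return true;
    await page.evaluate((dy) => window.scrollBy(0, dy), step);
    // short pause so each scroll step can be observed
    await page.waitForTimeout(500);
  }
  return isInViewport(page, selector);
}
```

Called as await scrollUntilInViewport(page, "li.next a"), this returns true as soon as the element enters the viewport, and the maxScrolls cap keeps the loop from running forever if the element never shows up.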
Method #3: Scrolling to the Bottom of the Page
What if we want to immediately scroll to the bottom of the page? JavaScript's DOM manipulation gives us a very easy way to do this. We simply get the height of the document (our page), and scroll by that amount!
Take a look at the example below. It follows the same basic principle as the previous two methods: find the next button and then click on it.
You should notice, though, that it is much faster than the previous two methods. This sort of tactic is best employed on sites that aren't actively trying to block bots, because performing actions this quickly is likely to get you spotted by bot protection.
//import playwright
const playwright = require("playwright");
//async main function
async function main() {
//launch the browser
const browser = await playwright.chromium.launch({
headless: false
});
//open a new page in the browser
const page = await browser.newPage();
//navigate to the site
await page.goto("https://books.toscrape.com");
//set a counter for pages left to view
let pageCount = 5;
//while we still have pages to view
while (pageCount > 0) {
//scroll down by the height of the document
await page.evaluate(() => window.scrollBy(0, document.body.scrollHeight));
const nextButton = await page.$("text='next'");
//if the next button is visible
if (nextButton && await nextButton.isVisible()) {
//click on it
await nextButton.click();
//decrement the counter
pageCount--;
}
}
//we've viewed all the pages in the counter, close the browser
await browser.close();
}
main();
If you pay attention to this example, you should notice that it's actually much shorter. Key differences here:
- While we still use window.scrollBy(), instead of passing 250 pixels in as an argument, we pass in the height of the entire document, document.body.scrollHeight.
- We skip over the rest of the scrolling logic because we're already at the bottom of the page.
- Once we're at the bottom of the page, as with the other examples, if the next button is visible, we click on it.
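window.scrollBy() moves relative to the current position, while window.scrollTo() jumps to an absolute offset, which can be a little easier to reason about when the goal is simply "go to the bottom". The following is a minimal sketch under that assumption; scrollToBottom is a hypothetical helper name, and taking the larger of the two scrollHeight values is our own precaution for layouts that report the full height on only one of the two elements.

```javascript
// Hypothetical helper: jump straight to the bottom and return the new scroll offset.
async function scrollToBottom(page) {
  return page.evaluate(() => {
    // Some layouts report the full height on <html>, others on <body>
    const height = Math.max(
      document.body.scrollHeight,
      document.documentElement.scrollHeight
    );
    window.scrollTo(0, height);
    return window.scrollY;
  });
}
```

The returned offset can be logged or compared between calls to check whether the page actually moved.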
Method #4: Scrolling with Keyboard Shortcuts
We can also scroll using the keyboard API. This API is quite simple to use but also inaccurate. Here is an example using the keyboard:
const playwright = require("playwright");
async function main() {
const browser = await playwright.chromium.launch({
headless: false
});
const page = await browser.newPage();
await page.goto("https://books.toscrape.com");
let pageCount = 5;
while (pageCount > 0) {
// Use the keyboard to scroll down with the space bar
await page.keyboard.press("Space");
//wait a second to see the page
await page.waitForTimeout(1000);
//find the next button
const nextButton = await page.$("text='next'");
//if it's visible, click on it
if (nextButton && await nextButton.isVisible()) {
await nextButton.click();
pageCount--;
await page.waitForTimeout(1000);
}
}
//close the browser
await browser.close();
}
main();
If you run and watch the example above, you'll notice that it is rather inconsistent. Sometimes, when the space bar is pressed, we move further than other times. Although we do eventually find the bottom of each page and click the next button, you've probably noticed that a couple of the previous methods work better.
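The space bar is not the only key that scrolls. Playwright's page.keyboard.press() also accepts key names such as "PageDown", "ArrowDown" and "End". As a rough sketch (the helper names isAtBottom and pressUntilBottom and the default values are our own, not part of Playwright), we can keep pressing a key until the viewport reaches the end of the document:

```javascript
// True when the viewport's bottom edge has reached the end of the document.
async function isAtBottom(page) {
  return page.evaluate(
    () =>
      window.scrollY + window.innerHeight >=
      document.documentElement.scrollHeight - 1
  );
}

// Hypothetical helper: press `key` until the bottom is reached or we run out
// of attempts. Returns the number of key presses that were sent.
async function pressUntilBottom(page, key = "PageDown", maxPresses = 50) {
  let presses = 0;
  while (presses < maxPresses && !(await isAtBottom(page))) {
    await page.keyboard.press(key);
    presses++;
    // give the browser a moment to finish the scroll
    await page.waitForTimeout(300);
  }
  return presses;
}
```

Because each key press can move the page by a different amount, the loop checks the scroll position after every press instead of assuming a fixed distance.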
Method #5: Scrolling with Mouse
We can also use the mouse API to scroll. Similar to the keyboard API, it is both inconsistent and somewhat inaccurate, but can still be useful if you're looking to simulate a real user.
The example below uses page.mouse.wheel() to scroll and page.mouse.move() to create some cursor movements on the screen. As previously stated, this method is not very accurate or efficient, but it can be quite useful in simulating a user.
const playwright = require("playwright");
async function main() {
const browser = await playwright.chromium.launch({
headless: false
});
const page = await browser.newPage();
await page.goto("https://books.toscrape.com");
let pageCount = 5;
while (pageCount > 0) {
//use the mouse wheel to scroll down by 100 pixels
await page.mouse.wheel(0, 100);
//move the mouse a bit too
await page.mouse.move(20, 40);
//wait a second to see the page
await page.waitForTimeout(1000);
//find the next button
const nextButton = await page.$("text='next'");
//if it's visible, click on it
if (nextButton && await nextButton.isVisible()) {
await nextButton.click();
pageCount--;
await page.waitForTimeout(1000);
}
}
//close the browser
await browser.close();
}
main();
Key differences from the last example:
- Instead of scrolling with page.keyboard.press("Space"), we scroll with page.mouse.wheel() and we once again pass desired X and Y values into our scrolling function.
- page.mouse.move() takes X and Y coordinates as well. Instead of scrolling the page, it gives us some seemingly random screen motion, similar to what you'd get from a real user.
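If the goal is to look less mechanical, a fixed 100 pixel step repeated on a fixed timer is easy to spot. One way to vary it, sketched below with made-up helper names (randomInt, humanWheelScroll) and arbitrary step and pause ranges, is to split the distance into uneven wheel movements with uneven pauses:

```javascript
// Random integer between min and max, inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Hypothetical helper: scroll `distance` pixels down in uneven wheel steps.
// Returns the list of step sizes that were used.
async function humanWheelScroll(page, distance, minStep = 60, maxStep = 180) {
  const steps = [];
  let remaining = distance;
  while (remaining > 0) {
    // never scroll past the requested distance
    const step = Math.min(remaining, randomInt(minStep, maxStep));
    await page.mouse.wheel(0, step);
    steps.push(step);
    remaining -= step;
    // uneven pause between wheel movements
    await page.waitForTimeout(randomInt(100, 400));
  }
  return steps;
}
```

For example, await humanWheelScroll(page, 1000) scrolls roughly one screen in several irregular steps. The ranges here are guesses and would need tuning for a real site.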
Method #6: Scrolling with Touchscreen
While the touchscreen method doesn't support a native swipe, we can use window.scrollBy() to control the amount we'd like to scroll, and then we can tap the next button the way we would on a touchscreen. You can learn more in the documentation for Playwright's touchscreen API.
The example below uses window.scrollBy() in order to scroll and then taps the button with nextButton.tap(). Note that the browser context is created with hasTouch: true, which tap() requires.
const playwright = require("playwright");
async function main() {
const browser = await playwright.chromium.launch({ headless: false });
const context = await browser.newContext({ hasTouch: true });
const page = await context.newPage();
await page.goto("https://books.toscrape.com");
let pageCount = 5;
while (pageCount > 0) {
//scroll down 10 pixels
await page.evaluate(() => window.scrollBy(0, 10));
//wait one second
await page.waitForTimeout(1000);
//find the next button
const nextButton = await page.$("text='next'");
//if it is visible, tap it and decrement the counter
if (nextButton && await nextButton.isVisible()) {
await nextButton.tap();
pageCount--;
await page.waitForTimeout(1000);
}
}
await browser.close();
}
main();
Dealing With Lazy Loading
Sometimes, content is dynamically loaded as you scroll. This is called "Lazy Loading". When dealing with lazy loaded images, we need to wait for them to load before we continue our scroll.
When waiting for content to load,
- we can either hardcode our wait times, or
- we can wait until the network is idle.
The utility of each method will vary depending on the site that you're scraping. It's often best to actually use a combination of the two.
Hardcoded Wait Times
The code example below uses await page.waitForTimeout() in order to hardcode our wait times into the script. page.waitForTimeout() takes an argument in milliseconds, and when we await it, the script pauses until that many milliseconds have passed.
const playwright = require('playwright');
async function main() {
const browser = await playwright.chromium.launch({
headless: false
});
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://reddit.com");
//wait for a hardcoded 2 seconds
await page.waitForTimeout(2000);
//scrolls remaining before the script exits
var scrollsRemaining = 30;
//while we have scrolls remaining
while (scrollsRemaining > 0) {
//scroll down by 10,000 pixels
await page.evaluate(()=> window.scrollBy(0, 10000));
//use a hardcoded wait time of one second for content to load
await page.waitForTimeout(1000);
//decrement the scrolls remaining
scrollsRemaining--;
}
//out of scrolls, close the browser
await browser.close();
}
main();
After launching the browser and navigating to the page, in the code above, we:
- Wait for 2 seconds with await page.waitForTimeout(2000);
- After our first wait, set a counter variable, scrollsRemaining
- While scrollsRemaining is greater than zero, we:
  - Scroll down the page by 10,000 pixels
  - Wait for one more second with await page.waitForTimeout(1000);
  - Decrement our scrolls remaining
This method tends to work just fine for sites that can load their content quickly, but occasionally, if we don't set our wait times for long enough, we can create problems elsewhere in our code when we need to find elements that haven't loaded yet.
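A fixed number of scrolls either stops too early or keeps going after the page has run out of content. A common variation, sketched here as an assumption rather than as part of the example above, is to keep the hardcoded pause but stop as soon as the document height stops growing. The helper name scrollUntilStable and its defaults are hypothetical.

```javascript
// Hypothetical helper: scroll to the bottom, pause, and repeat until the page
// height stops changing. Returns how many scrolls were performed.
async function scrollUntilStable(page, pauseMs = 1000, maxScrolls = 30) {
  let previousHeight = await page.evaluate(() => document.body.scrollHeight);
  for (let scrolls = 1; scrolls <= maxScrolls; scrolls++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // hardcoded wait for lazy-loaded content to arrive
    await page.waitForTimeout(pauseMs);
    const currentHeight = await page.evaluate(() => document.body.scrollHeight);
    if (currentHeight === previousHeight) {
      // nothing new was added, so we assume we've reached the end
      return scrolls;
    }
    previousHeight = currentHeight;
  }
  return maxScrolls;
}
```

The pause still has to be long enough for the site to respond: if content arrives after pauseMs has passed, the helper will stop one scroll too early.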
Network Waits
The other way we can wait for content to load is to wait for the network. More specifically, we wait until the network is idle. This means that we wait until there are no more communications going on between our browser and the server. Let's re-implement the previous example, but this time, we'll use network waits instead.
const playwright = require('playwright');
async function main() {
const browser = await playwright.chromium.launch({
headless: false
});
//open a new page
const page = await browser.newPage();
//navigate to the site
await page.goto("https://reddit.com");
//wait until the network is idle
await page.waitForLoadState("networkidle");
//scrolls remaining before the script exits
var scrollsRemaining = 30;
//while we have scrolls remaining
while (scrollsRemaining > 0) {
//scroll down by 10,000 pixels
await page.evaluate(()=> window.scrollBy(0, 10000));
//wait (up to 10 seconds) for the network to be idle
await page.waitForLoadState("networkidle", {
timeout: 10000
});
//decrement the scrolls remaining
scrollsRemaining--;
}
//out of scrolls, close the browser
await browser.close();
}
main();
The code above is virtually the same as the example preceding it with two exceptions:
- page.waitForTimeout() becomes page.waitForLoadState().
- Instead of passing an arbitrary amount of time into our waiting function, we pass the argument "networkidle".
If you run the code above, it will appear to scroll down once and then exit almost immediately. The reason: waitForLoadState("networkidle") resolves right away when the page has already reached that load state, so our network counts as idle even though we're still waiting for new content to load. The scraper scrolls to the bottom of the page, and then continues scrolling before the new content can load!
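Another way to wait is to watch the page itself rather than the network. page.waitForFunction() polls a function inside the browser until it returns a truthy value, so we can wait until the document has actually grown. This is a sketch under our own assumptions: scrollAndWaitForGrowth is a hypothetical helper, and "the page got taller" is only a stand-in for "new content loaded", which may not hold on every site.

```javascript
// Hypothetical helper: scroll to the bottom, then wait until the document gets
// taller. Returns true if new content appeared, false if the wait timed out.
async function scrollAndWaitForGrowth(page, timeoutMs = 10000) {
  const before = await page.evaluate(() => document.body.scrollHeight);
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
  try {
    await page.waitForFunction(
      (previous) => document.body.scrollHeight > previous,
      before,
      { timeout: timeoutMs }
    );
    return true;
  } catch (error) {
    // a timeout just means nothing new loaded; anything else is a real error
    if (error.name === "TimeoutError") return false;
    throw error;
  }
}
```

In a loop such as while (scrollsRemaining > 0 && await scrollAndWaitForGrowth(page)) { scrollsRemaining--; }, each scroll waits for content to arrive instead of racing ahead of it.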
Challenges When Scrolling With Playwright
In the previous section, we scrolled the mother of all infinite scrollers, Reddit. When scrolling with hardcoded wait times, Reddit loaded just fine without any issues whatsoever. We then remade the implementation by waiting until the network was idle. You should recall that Playwright continued to scroll the page, even though the network was idle. The script then exited without any errors! This means that Playwright executed all of those scrolls before the content was loaded.
Here is a list of common issues you should be aware of when scrolling with Playwright:
- Dynamic Content Loading: This issue was best explained by our Reddit example. Content can continue loading even when the network is idle. To address this, you should get to know the site you are scraping and use a combination of network waits and hardcoded waits when your project requires it.
- Timing and Network Activity: As you just learned, you cannot solely rely on hardcoded wait times or the state of the network activity. You need to be aware of both and use them in tandem when scraping.
- Performance Problems: When scrolling pages that use a large number of resources, it becomes an intense workload on the browser. Because the browser relies on your system memory, it can bog down your entire system. If you're not careful in these situations, you could crash your browser, or even your system! To handle these issues properly, make sure that you're not scrolling by too much and that Playwright doesn't have to load too much content at once. Slow down if you need to!
- Scrolling Mechanisms: You've learned multiple methods of scrolling. Not all of them are meant for the same uses. Sometimes it can actually be better to scroll with a mouse or keyboard. While not exactly accurate, some sites actually require user interaction in order to continue loading content.
- Screenshots: When capturing screenshots on extremely long sites, getting the whole thing in one shot can be nigh impossible. To handle your screenshots properly, you should:
  - Scroll
  - Take a screenshot
  - Scroll some more
  Don't overload your system by trying to take a full shot of an infinite scroller.
- Finding Elements: As you probably remember from our earlier examples in this tutorial, we would only click the "next" button when we could find it and it was visible. When interacting with elements, try to always make sure that they're visible to Playwright. The last thing you need is to finish a super long crawl only to find out that your scraper hit the wrong buttons because the right ones weren't visible.
- Handling Pop-Ups: While they usually don't get in the way, pop-ups can occasionally cause problems for scrapers. When you run into issues with a pop-up, inspect it manually from your browser so you know how to find it with Playwright. Once you can find your pop-up, all you need is a click() in your script to remove it.
- Browser Compatibility: Different browsers use different drivers. Because of this, different types of scrolling, such as the mouse wheel or the space bar, might scroll the page by different amounts in each browser. When designing a Playwright script for Chromium, you can't just assume it will work with other browsers. If you want the script to be compatible with another browser, you will need to test it with the other browser and you will probably need to tweak it.
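The scroll, screenshot, scroll routine from the Screenshots item can be sketched as a small loop. This is only an illustration: captureSegments is a made-up helper name, the file naming scheme is arbitrary, and we deliberately capture just the viewport with page.screenshot() rather than a full-page image.

```javascript
// Hypothetical helper: capture `segments` viewport-sized screenshots, scrolling
// down one viewport height between shots. Returns the file paths it wrote.
async function captureSegments(page, segments, prefix = "segment") {
  const files = [];
  for (let i = 0; i < segments; i++) {
    const path = `${prefix}-${String(i + 1).padStart(2, "0")}.png`;
    // capture only what is currently on screen
    await page.screenshot({ path });
    files.push(path);
    // move down by exactly one viewport height
    await page.evaluate(() => window.scrollBy(0, window.innerHeight));
    // brief pause so lazy-loaded content has a chance to render
    await page.waitForTimeout(500);
  }
  return files;
}
```

Capturing in segments keeps each image small and puts a hard limit on how much of an infinite scroller we try to hold in memory at once.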
Conclusion
Congratulations! You've made it to the end of this tutorial and now you can employ a myriad of different scrolling methods next time you need to scrape a page with Playwright.
You know how to:
- Scroll to a Specific Element
- Scroll by a Certain Amount
- Scroll to the Bottom of the Page
- Scroll with Keyboard Shortcuts
- Scroll with the Mouse
- Scroll with a Touchscreen
More Playwright Web Scraping Guides
Want to learn more? Check out the links below: