
Puppeteer Guide - How to Scroll Pages

Scrolling pages is an important skill to learn when using Puppeteer. Some pages load content dynamically as you scroll, and for pages like this the only way to access all of the content is to scroll the page.

In this article you will learn a variety of ways to scroll pages in Puppeteer and how to use them.

TLDR: How to Scroll Pages using Puppeteer

The most basic way to scroll to the bottom of the page is using the evaluate method.

// Scroll to the bottom of the page
await page.evaluate(() => {
  window.scrollTo(0, document.body.scrollHeight);
});

The most basic way to scroll in general is using the Mouse class. This also allows you to scroll horizontally using deltaX.

// Scroll 1000 pixels
await page.mouse.wheel({
  deltaY: 1000,
});
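Since deltaX is mentioned above but not shown, here is a sketch of horizontal scrolling, wrapped in a hypothetical helper so it can be dropped into any script (the function name and parameters are illustrative, not part of Puppeteer's API):

```javascript
// Hypothetical helper: scroll horizontally with deltaX.
// A negative value for pixels would scroll left instead of right.
async function scrollRight(page, pixels) {
  await page.mouse.wheel({
    deltaX: pixels, // horizontal scroll distance
    deltaY: 0, // no vertical movement
  });
}
```

You would call it as `await scrollRight(page, 500);` after navigating to a page.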

Either of these methods is likely to be combined with waiting to allow resources to load. The most common way to do this is by checking for network idle:

// Scroll to the bottom of the page
await page.evaluate(() => {
  window.scrollTo(0, document.body.scrollHeight);
});

// Check if network is idle after scrolling
await page.waitForNetworkIdle();

Scrolling with Mouse

The first way you can scroll a page is by using the Mouse class. This class provides the wheel method, which dispatches mousewheel events, allowing you to scroll. See the following code as an example of using the mouse.wheel method.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto("https://scrapeops.io/");

  // Scroll 1000 pixels
  await page.mouse.wheel({
    deltaY: 1000,
  });

  // Take a screenshot
  await page.screenshot({
    path: "mouse.png",
  });

  // Close the browser
  await browser.close();
})();

In the above code:

  • We open the browser, navigate to ScrapeOps.io and scroll the mouse wheel down 1000 pixels.

  • Then we take a screenshot and close the browser.

  • You can see in the screenshot the page has been scrolled down further than where it starts at normally.

Mouse Wheel Scrolled Page

Scroll Using Evaluate

Another way to scroll with Puppeteer is using the page.evaluate method. This method allows you to execute JavaScript to scroll the page rather than dispatching events directly from the mouse class.

With this approach there are two ways to achieve scrolling, suited to different scenarios.

  1. The first is to scroll to a specific element using scrollIntoView.
  2. The second is to use scroll, scrollBy, or scrollTo on the window object.

Scrolling an Element Into View

To scroll an element into view you can use scrollIntoView on the element.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto("https://scrapeops.io/");

  // Scroll the .demo-image element into view
  await page.evaluate(() => {
    document.querySelector(".demo-image").scrollIntoView();
  });

  // Take a screenshot
  await page.screenshot({
    path: "evaluate-intoView.png",
  });

  // Close the browser
  await browser.close();
})();

  • The above code opens a browser and navigates to ScrapeOps.

  • Then we use the evaluate method to execute querySelector and scrollIntoView.

  • We select an element with the demo-image class and then use the scrollIntoView method to scroll the page until that element is in view.

You can see in the screenshot below that once again the page has been scrolled from its original starting position.

Scroll with Evaluate scrollIntoView

Scrolling the Window

If you do not want to scroll to a specific element, you can use scroll, scrollBy, and scrollTo to scroll the entire window, similar to how the Mouse class in Puppeteer works.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto("https://scrapeops.io/");

  // Scroll using evaluate
  await page.evaluate(() => {
    // Use scrollBy to scroll Y by 1000 pixels
    window.scrollBy(0, 1000);

    // Use scroll to scroll Y by another 100 pixels
    // because scroll sets the current scroll coordinates
    window.scroll(0, 1100);

    // Scroll to the bottom of the page
    window.scrollTo(0, document.body.scrollHeight);
  });

  // Take a screenshot
  await page.screenshot({
    path: "evaluate-window.png",
  });

  // Close the browser
  await browser.close();
})();

  • The above code launches the browser and navigates to ScrapeOps.

  • Then, in our evaluate statement, we first use scrollBy to scroll from our current position by 1000 pixels in the Y plane.

  • Then we use scroll to set our current scroll coordinates to (0, 1100) for the x and y respectively.

  • Note, we have only actually scrolled 100 pixels from our first scroll because we are setting the value to 1100 after scrolling 1000 pixels already.

  • Finally, we use scrollTo to scroll to a specific set of coordinates. In this case, we scroll to document.body.scrollHeight which is the bottom of the page. Then we take a screenshot, showing the bottom of the page, and close the browser.

Scroll with Evaluate using Window
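The difference between the relative and absolute calls above can be sketched with a tiny model that tracks only the vertical scroll position (the variables here are illustrative, not browser APIs):

```javascript
// Minimal model of the scroll position to illustrate the calls above
let y = 0;

// scrollBy is relative: it adds to the current position
const scrollBy = (dy) => {
  y += dy;
};

// scroll/scrollTo are absolute: they set the position directly
const scrollTo = (newY) => {
  y = newY;
};

scrollBy(1000); // y is now 1000
scrollTo(1100); // y is now 1100 -- only 100 pixels further, as noted above
```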

Waiting While Scrolling

For some pages, you will need to wait before or during scrolling. This is because, as you scroll, new network requests will be made to fetch new content, or other computation will be performed to dynamically change and load the page. If you try to scrape this data without waiting, it will not have loaded yet.

There are a couple ways to do this. The ones covered here will be:

  1. waiting for selectors,
  2. network idle and
  3. using hard coded sleep values.

Wait for Selector

If you are looking for a specific element, or an element that can otherwise be narrowed down to a unique CSS selector, you can scroll until an element matching the query selector is found. See the following example:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto(
    "https://scrollmagic.io/examples/advanced/infinite_scrolling.html"
  );

  // Loop until querySelector is no longer null
  while ((await page.$("div.box1:nth-child(89)")) == null) {
    // Scroll by 200 pixels each time
    await page.mouse.wheel({ deltaY: 200 });
    // Small delay to give loading time and prevent CPU spikes
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  // Take a screenshot
  await page.screenshot({
    path: "wait-selector.png",
  });

  // Close the browser
  await browser.close();
})();

  • In the above code we load a website that infinitely loads colored squares as you scroll.

  • The element we are targeting is the 89th square specifically. That is why we use the :nth-child(89) query selector. After loading the page we start a while loop using page.$.

  • This method will return null if no elements match the query selector. This way we can run our loop until an element matching the selector is found.

  • Inside the loop we use page.mouse.wheel to scroll by 200 pixels every 100 milliseconds. This loop will run until the query selector is found.

  • At the end, you can see in the screenshot that we have scrolled a fair bit down the page.

Scroll while looking for selector

  • Best practice for this situation would be to also add a timeout or failsafe of some kind so that you do not run indefinitely.
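One way to sketch such a failsafe is a hypothetical helper that wraps the loop above with a deadline. Here check and scrollStep are async callbacks you would supply (for example, wrapping page.$ and page.mouse.wheel); all names and defaults are illustrative:

```javascript
// Hypothetical helper: keep calling scrollStep until check() returns true,
// or give up once timeoutMs has elapsed so we never loop indefinitely.
async function scrollUntil(check, scrollStep, { timeoutMs = 30000, delayMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (!(await check())) {
    if (Date.now() > deadline) {
      return false; // failsafe triggered: target never appeared
    }
    await scrollStep();
    // Small delay to give loading time and prevent CPU spikes
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return true; // target found before the deadline
}
```

In the example above you might call it as `await scrollUntil(async () => (await page.$("div.box1:nth-child(89)")) != null, () => page.mouse.wheel({ deltaY: 200 }));`.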

Waiting for Network Idle

If you are not looking for a specific element, you can instead scroll until the browser's network requests idle, which likely means no more data can be fetched.

The following example will repeatedly scroll to the bottom of the page, wait for the network to idle, and then scroll to the bottom again.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto("https://infinite-scroll.com/demo/masonry/");

  // Function to check if we've reached the bottom of the page
  const isBottom = async () => {
    return await page.evaluate(() => {
      return window.innerHeight + window.scrollY >= document.body.offsetHeight;
    });
  };

  // Loop until we've reached the bottom of the page.
  // This is because as new elements load we will no longer
  // be at the bottom of the page.
  while (!(await isBottom())) {
    // Scroll to the bottom of the page
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });

    // Check if network is idle after scrolling
    await page.waitForNetworkIdle();
  }

  // Take a screenshot
  await page.screenshot({
    path: "wait-networkidle.png",
  });

  // Close the browser
  await browser.close();
})();

  • In the above code, we load a website that has a dynamically loading image grid. More images will load as we scroll until we reach the end of the images.

  • First we load the website.

  • Then, we define a function that uses page.evaluate to determine if we have reached the bottom of the page. Now we can begin looping until we reach the bottom of the page.

  • This loop works because the bottom of the page continues to move while we load more images.

  • In the loop we scrollTo the current bottom of the page and then wait for the network to idle.

  • This causes our loop to check again where we are no longer at the bottom of the page because new elements have loaded.

  • When our loop finally breaks, we take a screenshot and close the browser.

Scroll while waiting for network idle
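Note that waitForNetworkIdle accepts options: idleTime (how long, in milliseconds, the network must be quiet before the promise resolves) and timeout (the overall cap before it throws). A sketch wrapped in a hypothetical helper so the values are explicit (the numbers and the helper name are examples, not recommendations):

```javascript
// Hypothetical helper around page.waitForNetworkIdle with explicit options
async function settle(page, idleTime = 750, timeout = 30000) {
  // idleTime: ms of network silence required; timeout: give up after this long
  await page.waitForNetworkIdle({ idleTime, timeout });
}
```

Raising idleTime can help on pages that fire requests in short bursts.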

Waiting for Hard Coded Values

The final way to wait for scrolling that we will cover is using hard-coded sleep values. This method is less dynamic but is likely more than enough for a large number of websites. We can use a promise with setTimeout for this.

See the following example:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Navigate to page
  await page.goto("https://infinite-scroll.com/demo/masonry/");

  // Function to check if we've reached the bottom of the page
  const isBottom = async () => {
    return await page.evaluate(() => {
      return window.innerHeight + window.scrollY >= document.body.offsetHeight;
    });
  };

  // Loop until we've reached the bottom of the page.
  // This is because as new elements load we will no longer
  // be at the bottom of the page.
  while (!(await isBottom())) {
    // Scroll to the bottom of the page
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });

    // Wait 1s (1000ms) before scrolling again
    await new Promise((r) => setTimeout(r, 1000));
  }

  // Take a screenshot
  await page.screenshot({
    path: "wait-hardcode.png",
  });

  // Close the browser
  await browser.close();
})();

This example is very similar to the network idle example.

  • We load the same website and use the same isBottom helper function.

  • The difference is instead of waiting for network to idle we just wait 1 second before checking the loop condition again.

  • We achieve the same result in a more rudimentary way, as seen in the screenshot.

Scroll with hardcoded value
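The Promise/setTimeout pattern used above can be wrapped in a small helper so the intent reads clearly inside scraping loops (the name sleep is our own, not a Puppeteer API):

```javascript
// Small helper wrapping the Promise/setTimeout pattern
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

The wait in the loop above then becomes `await sleep(1000);`.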

Real World Example

For the finale of this article, we will use some of the above methods to scrape images from the Imgur homepage.

See the following code:

const puppeteer = require("puppeteer");

// Create an auto-running async function
(async () => {
  // Create a browser instance
  const browser = await puppeteer.launch({
    defaultViewport: {
      width: 1920,
      height: 1080,
    },
    headless: false,
  });

  // Open a new page in the browser
  const page = await browser.newPage();

  // Set up request interception to block requests
  // that interfere with our waitForNetworkIdle
  await page.setRequestInterception(true);
  page.on("request", (req) => {
    // Block XHR events like the /events polling endpoint
    if (req.resourceType() === "xhr") {
      return req.abort();
    }

    // Block third-party requests that are not imgur
    if (!req.url().includes("imgur")) {
      console.log("abort " + req.url());
      return req.abort();
    }

    return req.continue();
  });

  // Navigate to Imgur.com
  await page.goto("https://imgur.com/");

  // Set a start time
  const start = Date.now();
  let count = 0;
  // Loop for 20 seconds
  while (start + 20000 > Date.now()) {
    // Scroll to the bottom of the page
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });

    // Check if network is idle after scrolling
    await page.waitForNetworkIdle();

    // Save all images on the page
    const urls = await page.evaluate(() => {
      const images = document.querySelectorAll("img");
      const urls = [];
      images.forEach((image) => {
        urls.push(image.src);
      });
      return urls;
    });

    // Log URLs
    count += urls.length;
    console.log(urls);
  }

  // Log the total number of images
  console.log("count: " + count);

  // Close the browser
  await browser.close();
})();

  • This example builds heavily on the Waiting for Network Idle example, but there is a little extra going on.

  • Before loading the imgur page we set up request interception.

  • That is because Imgur uses a number of third-party services and trackers that will stop our network from being idle.

  • Once request interception is set up, we can navigate to Imgur and start scraping. We start a while loop that will run for 20 seconds.

  • Inside the loop, we scroll to the bottom of the page, which triggers more images to load.

  • Then we use waitForNetworkIdle to make sure all images are finished loading.

  • Next, we use page.evaluate to querySelectorAll for all img elements. Then we grab the src for the images and put them in an array.

  • Finally, we log the images on each iteration and output the total count at the end.

Notes:

  • This is a very naive example to showcase the use of scrolling on a real website. This program would not be appropriate for actually scraping Imgur because it only queries img elements (not GIFs, videos, etc.).
  • We are also prone to collecting a large number of duplicate URLs because we do nothing to prevent that.
  • And of course, we are not actually downloading the images in this example.
  • Nonetheless, it serves as a great example of dynamically scraping infinite scrolling websites and social media feeds.
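To address the duplicates, one simple sketch is to accumulate URLs in a Set across iterations, assuming urls is the array returned by page.evaluate on each pass (the helper name is illustrative):

```javascript
// Hypothetical dedup: a Set keeps only distinct image URLs across iterations
const seen = new Set();

function addUnique(urls) {
  for (const url of urls) {
    seen.add(url); // duplicates are ignored by the Set
  }
  return seen.size; // running total of distinct URLs
}
```

Inside the loop you would replace `count += urls.length;` with `count = addUnique(urls);`.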

Conclusion

In conclusion, mastering Puppeteer's scrolling techniques is crucial for effective web scraping. This article covered diverse methods, from using the Mouse class for precise scrolling to executing JavaScript with page.evaluate.

The examples demonstrated how to scroll to specific elements, wait intelligently during scrolling, and scrape images from dynamic pages.

Overall, this guide equips you with the skills to efficiently navigate and extract content from a variety of web platforms using Puppeteer.

More Web Scraping Guides

If you would like to learn more about Web Scraping with Puppeteer, then be sure to check out The Puppeteer Web Scraping Playbook.

Or check out one of our more in-depth guides: