Puppeteer Guide: Waiting For Page or Element To Load
Before taking a screenshot or interacting with a page in Puppeteer, you need to know when the browser has actually finished loading and rendering it.
In this Puppeteer guide, we review the waiting strategies that Puppeteer offers for pages and elements to load, covering:
- Why Do We Care About Page Load in Puppeteer?
- How To Wait For Page To Load With Puppeteer
- Methods For Waiting for a Page or Element to Load in Puppeteer
- Common Situations for Waiting for a Page or Element to Load in Puppeteer
- Combining Waiting Strategies
- Best Practices for Waiting in Puppeteer
- Conclusion
- More Web Scraping Tutorials
Why Do We Care About Page Load in Puppeteer?
Many websites exhibit dynamic behaviour, continuously loading new content asynchronously, with elements appearing and disappearing in the process.
Automated scripts may execute prematurely or cause errors due to elements that are not fully loaded yet or have been changed dynamically.
The following detailed points elaborate on why we care about page load in Puppeteer:
-
Filling Forms: Efficiently waiting for forms to load is critical for accurate input and submission. Puppeteer provides strategies to synchronize with form elements, ensuring a seamless automation process.
-
Pop-ups & Modals: Waiting for the appearance of pop-ups and modals is essential for interacting with these dynamic elements. Puppeteer offers specialized methods to handle these scenarios effectively.
-
Waiting for a Specific Element: In scenarios where specific elements are pivotal to the automation process, Puppeteer provides methods to precisely wait for their full loading, preventing premature interactions.
-
Resource Management: Efficient resource management is crucial for optimizing page load times. Puppeteer equips users with tools to manage resources effectively, ensuring a streamlined automation experience.
-
Avoiding Detection: To navigate web scraping without detection, Puppeteer provides methods to wait intelligently, minimizing the risk of being flagged by anti-bot mechanisms.
How To Wait For Page To Load With Puppeteer
There are several methods available to wait for a page to load, each serving a specific purpose.
Let's delve into the various options:
Method | Description |
---|---|
page.waitForSelector() | Waits until the specified CSS selector is present on the page. This is often the preferred method, as it ensures a specific element is loaded before proceeding. |
page.waitForFunction() | Waits until the provided function returns a truthy value. Useful for custom conditions based on evaluating JavaScript expressions. |
page.waitForNavigation() | Waits for a navigation event to occur, such as clicking a link or submitting a form. |
page.waitForResponse() | Waits for a network response matching the provided criteria. Useful for scenarios where waiting for a specific API call or resource is necessary. |
page.waitForRequest() | Similar to waitForResponse() , but waits for a network request to be initiated. Useful for scenarios where you want to ensure a request is made before proceeding. |
page.waitForXPath() | Waits until an element matching the specified XPath is present on the page. Similar to waitForSelector() but uses XPath expressions for element selection. Available in older Puppeteer versions only. |
page.waitForTimeout() | Introduces a static delay by waiting for a specified amount of time in milliseconds. Deprecated and generally not recommended, though a fixed delay can be useful in specific scenarios. |
page.waitForNetworkIdle() | Waits until the network has been idle for a configurable amount of time. Offers more control over when to consider the page fully loaded. |
Note that page.waitForEvent() and page.waitForLoadState(), which are sometimes listed alongside these methods, belong to Playwright's API rather than Puppeteer's. In Puppeteer, load states such as load, domcontentloaded and networkidle are selected through the waitUntil option covered below.
We'll delve into the specifics of each waiting method shortly.
For now, let's see a simple example demonstrating a 15-second delay using page.waitForTimeout().
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://twitter.com/ScrapeOps");
await page.waitForTimeout(15000);
await page.screenshot({ path: `twitter.png` });
await browser.close();
})()
In this example, page.waitForTimeout(15000) pauses the execution for 15 seconds, providing time for the page to load.
Although static delays are generally not recommended for waiting for page loads, they can be useful in specific scenarios.
As we explore other waiting methods, we'll find more dynamic and reliable ways to ensure the page is fully loaded before proceeding with further actions.
Methods For Waiting for a Page or Element to Load in Puppeteer
Now, let's explore in detail the methods that Puppeteer provides for waiting on page load:
goto Method Options
The page.goto(url, options) method is usually the first waiting strategy to reach for.
While primarily employed for navigating to a web page, it accepts options that make it wait for specific events, or give up after a specified duration, before progressing to subsequent actions.
Two options frequently used in the context of page loading, particularly for screenshot capture, are waitUntil and timeout.
- waitUntil: The waitUntil option in the page.goto(url, options) method accepts four lifecycle events: load, domcontentloaded, networkidle0, and networkidle2.
More than one waitUntil value can be used by passing them as an array, in which case navigation is considered complete only after all of them have fired.
-
domcontentloaded:
- This option instructs Puppeteer to wait until the DOMContentLoaded event is fired.
- This event occurs when the initial HTML document has been completely loaded and parsed.
- It indicates that the DOM tree is available to the browser, excluding external resources like stylesheets and images.
Let's see an example:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://finance.yahoo.com", { waitUntil: "domcontentloaded" });
await page.screenshot({ path: `yahoo-domcontentloaded.png` });
await browser.close();
})()
Observing the page, it's evident that certain elements, such as advertisement banners and specific styles, are not rendered correctly.
This discrepancy is expected, as the domcontentloaded event guarantees only that the HTML has been loaded and parsed; it gives no assurance that external resources like images, fonts and certain styles have finished rendering.
-
load (Default):
- If no waitUntil option is provided, the default behavior is to wait until the load event is fired.
- The load event signifies that the entire page, including the DOM tree, CSS styles, fonts, and images, has finished loading.
- This is a more comprehensive wait condition, encompassing all resources associated with the page.
Let's modify our earlier example to use the load event instead of domcontentloaded and observe whether the advertisement banners now render correctly:
const page = await browser.newPage();
await page.goto("https://finance.yahoo.com", { waitUntil: "load" });
await page.screenshot({ path: `yahoo-load.png` });
Success! Employing the load event provides greater assurance that the page has been fully rendered, including images, styles, and fonts.
This completeness was not guaranteed when relying solely on domcontentloaded.
- timeout: This option specifies the maximum navigation time in milliseconds. The default is 30 seconds, and passing 0 disables the timeout.
If the navigation events (like load, domcontentloaded, etc.) are not completed within this time, the page.goto() method will throw an error.
It sets a time limit for the entire page navigation process.
await page.goto('https://twitter.com/ScrapeOps', { timeout: 10000 });
// Set timeout to 10 seconds
The timeout option in the page.goto(url, options) method and the page.waitForTimeout() function serve different purposes. The waitForTimeout() method is not directly related to page navigation or waiting for specific events on the page.
It simply pauses the execution of the script for the specified duration.
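To make the timeout behavior concrete, here is a minimal sketch of a helper that treats a navigation timeout as a recoverable outcome instead of a crash. The name gotoOrTimeout is hypothetical, and the check on error.name assumes that Puppeteer's timeout errors report the name "TimeoutError", which is worth verifying against your installed version.

```javascript
// Hypothetical helper: navigate with a time limit and report whether the
// navigation finished. `page` is a Puppeteer Page (or anything exposing goto()).
async function gotoOrTimeout(page, url, timeoutMs = 10000) {
  try {
    await page.goto(url, { waitUntil: "load", timeout: timeoutMs });
    return true;
  } catch (error) {
    // Assumption: Puppeteer's timeout errors carry the name "TimeoutError".
    if (error.name === "TimeoutError") return false;
    throw error; // anything else (DNS failure, bad URL, ...) is re-thrown
  }
}
```

It could be called as `const loaded = await gotoOrTimeout(page, "https://twitter.com/ScrapeOps", 10000);`, letting the script decide whether to retry, skip the page, or take a partial screenshot.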
Wait for Network Idle
When navigating to a web page, various components such as HTML, CSS files, images, fonts, and API calls don't load simultaneously but rather through multiple network requests.
Waiting for these requests to settle provides a useful signal for determining when the page has fully loaded. This waiting strategy is available through the networkidle values of the waitUntil option.
Network idle simply refers to the period when the browser ceases to make any network requests to the server.
In Puppeteer, there are two variants of networkidle:
-
networkidle0:
- This setting waits until there have been no network connections for at least 500 milliseconds.
- It demands absolute network idleness, ensuring no active connections.
- This condition is particularly strict and is well-suited for static sites, where the entire website is fetched up front and no further content is loaded afterwards.
- It is ideal when certainty is needed that all network activity has come to a complete halt.
-
networkidle2:
- This setting waits until there have been no more than 2 network connections for at least 500 milliseconds.
- It is slightly more lenient than networkidle0, allowing for up to two ongoing connections.
- It is apt in situations where the browser keeps sending requests, such as websites using sockets (e.g., lichess.org, a chess website).
- It ensures that the majority of network requests have concluded while permitting a minor amount of ongoing network activity.
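The idea behind both variants can be sketched in plain JavaScript. The function below is not how Puppeteer implements networkidle internally; it is an illustrative approximation, under the assumption that counting the page's request, requestfinished and requestfailed events is a fair stand-in for "network connections". The name waitForQuietNetwork and its options are hypothetical, and requests that were already in flight before the call are not counted.

```javascript
// Sketch only: resolves once no more than `maxInflight` requests have been
// in flight for `idleTime` ms. `page` needs on()/off() and must emit
// 'request', 'requestfinished' and 'requestfailed', as a Puppeteer Page does.
function waitForQuietNetwork(page, { maxInflight = 0, idleTime = 500, timeout = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    let inflight = 0;
    let idleTimer = null;

    const onRequest = () => { inflight += 1; check(); };
    const onDone = () => { inflight = Math.max(0, inflight - 1); check(); };

    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(deadline);
      page.off("request", onRequest);
      page.off("requestfinished", onDone);
      page.off("requestfailed", onDone);
    };

    // Start the idle countdown when quiet enough; cancel it when too busy.
    const check = () => {
      if (inflight > maxInflight) {
        clearTimeout(idleTimer);
        idleTimer = null;
      } else if (idleTimer === null) {
        idleTimer = setTimeout(() => { cleanup(); resolve(); }, idleTime);
      }
    };

    const deadline = setTimeout(() => {
      cleanup();
      reject(new Error(`Network not idle after ${timeout} ms`));
    }, timeout);

    page.on("request", onRequest);
    page.on("requestfinished", onDone);
    page.on("requestfailed", onDone);
    check(); // the network may already be quiet
  });
}
```

With maxInflight set to 0 this mirrors the spirit of networkidle0, and with 2 that of networkidle2. In real scripts the built-in waitUntil values are the better choice; the sketch only shows why a page that keeps a socket or polling request open may never satisfy the stricter condition.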
Let's examine a code example where we won't utilize networkidle and observe the resulting screenshot:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://twitter.com/ScrapeOps", { waitUntil: "load" });
await page.screenshot({ path: `twitter-without-networkidle.png` });
await browser.close();
})()
As you can see, the screenshot is limited to Twitter's "X" icon, as the initially loaded HTML only includes this logo. Other content is fetched later through events and APIs.
Now, let's employ networkidle2 and observe the resulting screenshot:
const page = await browser.newPage();
await page.goto("https://twitter.com/ScrapeOps", { waitUntil: ["load", "networkidle2"] });
await page.screenshot({ path: `twitter-with-networkidle.png` });
Observing the screenshot, it's evident that the page was populated with all its content and images.
This outcome occurred because we captured the screenshot only once no more than 2 network connections remained active, indicating that the content had most likely been rendered before the screenshot was taken.
In the very first example within this guide, a similar outcome was achieved using waitForTimeout() with a hardcoded 15-second delay.
However, the advantage of networkidle2 is that it waits only as long as the page actually needs to load, making the process more efficient.
Custom Wait Conditions
As we have seen, both networkidle0 and networkidle2 use a fixed network idle time of 500 milliseconds: the script pauses until the network has stayed idle for that long.
However, there are situations where a customized idle time may be more appropriate. This becomes crucial in scenarios where certain requests may take longer to resolve due to external factors, such as server-side processing or intermittent network fluctuations.
In these cases, relying on a fixed 500-millisecond idle time might result in premature script execution or unnecessary delays.
To address such cases, Puppeteer offers the page.waitForNetworkIdle(options) method. Its options parameter includes idleTime, which lets you define a custom idle time in milliseconds.
This accommodates scenarios where a longer idle time is required due to prolonged network activity, and also lets you speed up scripts where a shorter idle time is sufficient.
In essence, waitForNetworkIdle() provides a higher-level abstraction, encapsulating the logic required to wait until there is no ongoing network activity.
await page.goto("https://bbc.co.uk");
await page.waitForNetworkIdle({idleTime: 750});
await page.screenshot({ path: `wait-for-network-idle.png` });
Wait for Selector
Up to this point, our discussion has revolved around page loading concerning the capture of viewport or full-sized screenshots.
However, there are instances when your focus is solely on a specific element, such as a crypto candlestick chart or a Power BI dashboard.
In these cases, rather than waiting for the entire page to load along with all associated events and API calls, a more efficient approach is to wait for that particular element to load and render.
This not only saves time but also ensures the proper rendering of the targeted element.
Puppeteer facilitates this process through its page.waitForSelector() method, which takes a CSS selector as a parameter.
Notably, this method accepts a {visible: true} option, instructing Puppeteer to wait until the element is present in the DOM tree and is not hidden by CSS such as display: none or visibility: hidden.
Let's see an example where we await the loading of a crypto graph and capture its screenshot using the waitForSelector() method:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://www.tradingview.com/markets/cryptocurrencies/");
const element = await page.waitForSelector(".tv-lightweight-charts", {visible: true});
await element.screenshot({ path: `crypto-graph.png` });
await browser.close();
})()
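The counterpart of {visible: true} is {hidden: true}, which waits for an element to disappear. A minimal sketch, assuming a page that shows a loading spinner before the real content: the helper name waitForContent and both selectors are hypothetical and would be replaced with ones from the target site.

```javascript
// Hypothetical helper: wait for a loading indicator to go away, then for the
// real content to become visible. Both selectors are supplied by the caller.
async function waitForContent(page, { spinner, content, timeout = 10000 }) {
  // hidden: true resolves once the element is absent from the DOM or hidden
  await page.waitForSelector(spinner, { hidden: true, timeout });
  // visible: true resolves with a handle once the element is rendered
  return page.waitForSelector(content, { visible: true, timeout });
}
```

A call such as `const chart = await waitForContent(page, { spinner: ".loading", content: ".tv-lightweight-charts" });` would return an element handle that can be passed straight to `chart.screenshot()`.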
Wait for Page Navigation
Page navigation involves events such as clicking on links, submitting forms, or any action that triggers a change in the page's URL.
In Puppeteer, page.waitForNavigation() is designed for scenarios where a script needs to wait for the completion of page navigation before proceeding with further actions.
This method accepts the same timeout and waitUntil options as page.goto(), offering the same functionality for the page being navigated to.
Some of the use cases for page navigation can be summarized as:
-
Form Submissions: When automating form submissions, waiting for navigation ensures that subsequent interactions are performed on the fully loaded page, preventing premature actions.
-
Link Clicks: After triggering a click on a link, waiting for navigation becomes crucial to guarantee that the new page has fully loaded before executing additional steps.
-
Single Page Applications (SPAs): In SPAs where page content changes without a full page reload, page.waitForNavigation() also resolves when the application changes the URL through the History API, which lets the script stay in sync with the application's state.
Here's an example wherein we navigate to the login page, input the username and password, and subsequently await the navigation and loading of the next page using the waitForNavigation() method:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://practicetestautomation.com/practice-test-login/");
await page.type('#username', 'student');
await page.type('#password', 'Password123');
await Promise.all([
page.click('#submit'),
page.waitForNavigation({ waitUntil: "load" }),
]);
await page.screenshot({ path: `login.png` });
await browser.close();
})()
Wrapping both calls in Promise.all() starts waiting for the navigation at the same time as the click is issued, rather than after it. This avoids a race condition in which the navigation completes before the script begins waiting for it.
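Because this click-then-navigate pattern recurs in most scripts, it can be wrapped in a small helper. This is a sketch under the assumption that the click always triggers a navigation; the name clickAndWaitForNavigation is hypothetical.

```javascript
// Hypothetical helper: click an element and wait for the navigation it triggers.
async function clickAndWaitForNavigation(page, selector, options = { waitUntil: "load" }) {
  const [response] = await Promise.all([
    // Registered first, so even a very fast navigation cannot be missed.
    page.waitForNavigation(options),
    page.click(selector),
  ]);
  return response; // main resource response, as resolved by waitForNavigation()
}
```

The login example then reduces to `await clickAndWaitForNavigation(page, "#submit");`. If the click does not cause a navigation, the helper rejects once the navigation timeout elapses.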
Wait for Timeout
To introduce a pause in script execution, allowing sufficient time for proper page loading, the page.waitForTimeout() method was previously employed.
However, it has been deprecated in recent versions of Puppeteer and is no longer recommended. The alternative is to wrap setTimeout() in a Promise, which serves the same purpose.
Here is an example, where setTimeout() ensures a delay of 15 seconds before capturing a screenshot of the Twitter page.
await page.goto("https://twitter.com/ScrapeOps");
await new Promise(resolve => setTimeout(resolve, 15000));
await page.screenshot({ path: `twitter.png` });
Wait for Function
Puppeteer's page.waitForFunction() pauses script execution until a specified function, evaluated within the page's context, returns a truthy value.
This proves valuable when waiting requires a custom condition, providing a more tailored approach than relying solely on Puppeteer's built-in methods and events.
Let's explore an example where we wait for a specific DOM element to become visible on the screen before capturing its screenshot:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://www.tradingview.com/markets/cryptocurrencies/");
await page.waitForFunction(() => {
const element = document.querySelector('.tv-lightweight-charts');
return element && element.offsetHeight > 0 && element.offsetWidth > 0;
});
await page.screenshot({ path: `crypto-market.png` });
await browser.close();
})()
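waitForFunction() also accepts an options object and extra arguments that are passed into the page function. The sketch below uses both to wait until a minimum number of elements match a selector; the helper name waitForElementCount is hypothetical, and the choice of "mutation" polling is just one reasonable option.

```javascript
// Hypothetical helper: wait until at least `minCount` elements match `selector`.
function waitForElementCount(page, selector, minCount, timeout = 10000) {
  return page.waitForFunction(
    // Runs inside the page; `sel` and `min` arrive as serialized arguments.
    (sel, min) => document.querySelectorAll(sel).length >= min,
    { timeout, polling: "mutation" }, // re-check whenever the DOM changes
    selector,
    minCount
  );
}
```

This is handy for lists that fill in gradually, for example `await waitForElementCount(page, "table tr", 20);` before scraping a table that is populated row by row.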
Wait for XPath
In Puppeteer, page.waitForXPath() waits for an element matching an XPath expression to appear on the page before proceeding with further actions.
XPath can match elements by structure or by their text content. Note that newer Puppeteer releases have removed this method; there, the same expression can be passed to page.waitForSelector() with an xpath/ prefix.
Here is an example of how to wait for an element with specific text content:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: { width: 1920, height: 1080 }
});
const page = await browser.newPage();
await page.goto("https://example.com")
await page.waitForXPath('//a[contains(text(), "More information...")]')
console.log("Success!");
await browser.close();
})();
// Success!
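For scripts that must run on both older and newer Puppeteer versions, a small compatibility shim is one option. This sketch assumes that versions without waitForXPath() accept XPath expressions through waitForSelector() with an xpath/ prefix; the helper name waitForXPathCompat is hypothetical, and the assumption should be checked against the documentation for your installed version.

```javascript
// Hypothetical compatibility helper. Assumption: newer Puppeteer versions
// accept XPath through waitForSelector() with an "xpath/" prefix.
function waitForXPathCompat(page, expression, options = {}) {
  if (typeof page.waitForXPath === "function") {
    return page.waitForXPath(expression, options); // older Puppeteer
  }
  return page.waitForSelector(`xpath/${expression}`, options);
}
```

The example above would then read `await waitForXPathCompat(page, '//a[contains(text(), "More information...")]');` regardless of the Puppeteer version in use.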
Common Situations for Waiting for a Page or Element to Load in Puppeteer
Now that we've covered the most common waiting methods, let's delve into some typical scenarios where applying a combination of these methods and options can yield optimal solutions to address challenges related to page and element loading:
1. Waiting for a Specific Element to be Visible on the Page
In certain scenarios, capturing screenshots of specific DOM elements rather than the entire page is crucial:
-
Product Thumbnails in E-commerce: In an e-commerce site, you might want to capture screenshots of individual product thumbnails for quality assurance or visual documentation.
-
Form Submissions: After submitting a form, you may want to take a screenshot of a specific confirmation message or result element to verify the success of the form submission.
-
Dashboard Widgets: In a dashboard application, you might want to capture screenshots of individual widgets or components to monitor their appearance and updates independently.
-
User Profile Sections: In a social media platform, you might want to capture screenshots of specific user profile sections, such as profile pictures or bio information.
-
Graphs and Charts: When dealing with data visualization, capturing screenshots of specific graphs or charts allows for detailed inspection and monitoring of data trends.
The page.waitForSelector() method with the {visible: true} option is instrumental in waiting for the presence and visibility of a specific element before taking a screenshot.
2. Waiting for Page to be Ready for Button Clicks and Form Submissions
In scenarios where you need to interact with buttons or submit forms on a web page, it's crucial to ensure that the page has completed its navigation and is ready for the subsequent actions.
page.waitForNavigation() proves invaluable in such cases, providing a means to pause the script until the page has fully loaded after a button click or form submission.
3. Waiting for Page Load Before Taking Screenshot
When you inspect the Network tab in your browser's DevTools, you'll notice two essential events occurring during the page load, load and DOMContentLoaded, each timestamped.
- The first event, DOMContentLoaded, occurs after the initial HTML has been loaded and parsed.
await page.goto(url, {waitUntil: "domcontentloaded"});
- The subsequent event, load, fires when additional resources like styles, fonts, and images have been fetched and integrated into the webpage.
await page.goto(url, {waitUntil: "load"});
These events serve as metrics, indicating when the page has finished loading, providing insights into the time required for a specific page to complete its loading process.
When Page Load is not Enough
Relying solely on domcontentloaded and load is not enough when a website keeps requesting new data from the server to update the page dynamically.
In such cases networkidle0 and networkidle2 prove helpful.
- networkidle0: Waits until there have been no network connections for at least 500 ms. It is ideal when ensuring complete network idleness is crucial.
await page.goto(url, {waitUntil: ["load", "networkidle0"]});
- networkidle2: Waits until there have been no more than 2 network connections for at least 500 ms. It waits until the majority of requests have settled, allowing a small number of connections for tasks such as sockets.
await page.goto(url, {waitUntil: ["load", "networkidle2"]});
- waitForNetworkIdle(): Helpful when an idle time other than 500 ms is required, for example when the gaps between network requests are shorter or longer than 500 ms.
await page.goto(url, {waitUntil: "load"});
await page.waitForNetworkIdle({idleTime: 250});
// Wait until the network has been idle for 250 milliseconds
4. Waiting for an API Call to Populate the Page Content
In scenarios where you expect a website to be fully loaded only after a specific API request or response, Puppeteer's page.waitForRequest() and page.waitForResponse() methods can be employed.
These methods accept a URL, such as an API endpoint, or a predicate function. The predicate function allows you to evaluate specific expressions.
For example, you can use this function to verify whether the desired data has been successfully received through the API request.
This approach is particularly useful when waiting for dynamic content to be populated on the page as a result of asynchronous API calls.
By incorporating these Puppeteer methods, you can synchronize your script with the completion of API requests, ensuring that the page is fully loaded and ready for capturing a screenshot.
Here are examples illustrating the use of these methods:
- waitForRequest(urlOrPredicate): Waits for a particular request to take place. Start waiting before triggering the navigation so that the request is not missed.
const requestPromise = page.waitForRequest("https://example.com/some/resource");
await page.goto("https://example.com");
await requestPromise;
await page.screenshot({path: "image.png"});
- waitForResponse(urlOrPredicate): Waits for a matching response to be received by the browser following the initiation of a request.
// The predicate is async because res.text() returns a Promise
const responsePromise = page.waitForResponse(async res =>
res.url().includes("example.com") && (await res.text()).includes("<html")
);
await page.goto("https://example.com");
await responsePromise;
await page.screenshot({path: "image.png"});
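When the goal is to read the data an API call returns, rather than only to know that it happened, the response can be captured and parsed. The following is a minimal sketch: waitForJsonResponse, the urlPart argument and the trigger callback are all hypothetical names, and it assumes the endpoint returns JSON.

```javascript
// Hypothetical helper: run `trigger` (e.g. a click or goto) and return the
// parsed JSON body of the first successful response whose URL contains `urlPart`.
async function waitForJsonResponse(page, urlPart, trigger, timeout = 15000) {
  const [response] = await Promise.all([
    // Start listening before the trigger runs, so the response is not missed.
    page.waitForResponse(
      (res) => res.url().includes(urlPart) && res.ok(),
      { timeout }
    ),
    trigger(),
  ]);
  return response.json();
}
```

For instance, `const data = await waitForJsonResponse(page, "/api/items", () => page.click("#load-more"));` would click a button and hand back the payload that populates the page, assuming such an endpoint and button exist on the target site.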
Combining Waiting Strategies
In the realm of web automation, achieving robust and reliable scripts often demands a thoughtful combination of waiting strategies. This is crucial to address diverse scenarios and ensure precise synchronization with the dynamic behaviors of webpages.
A strategic approach involves adapting to variable loading times, optimizing wait durations, and effectively handling timeouts and exceptions.
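One way such a combination might look is sketched below: navigate quickly, wait for the element that matters, then give the network a bounded chance to settle. The helper name loadAndSettle and all of its option names are this sketch's own, and treating a network-idle timeout as non-fatal is a design assumption that will not suit every script.

```javascript
// Hypothetical helper combining three waits. The option names on the third
// argument belong to this sketch, not to Puppeteer.
async function loadAndSettle(page, url, { selector, navTimeout = 30000, idleTime = 500, idleTimeout = 5000 } = {}) {
  // 1. Navigate, waiting only for the HTML to be parsed.
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: navTimeout });
  // 2. Wait for the element we actually care about, if one was given.
  if (selector) {
    await page.waitForSelector(selector, { visible: true, timeout: navTimeout });
  }
  // 3. Give the network a bounded chance to go quiet; a busy page is not fatal.
  try {
    await page.waitForNetworkIdle({ idleTime, timeout: idleTimeout });
    return { settled: true };
  } catch (error) {
    // Assumption: Puppeteer's timeout errors carry the name "TimeoutError".
    if (error.name !== "TimeoutError") throw error;
    return { settled: false };
  }
}
```

The returned flag lets the caller decide whether a screenshot taken on a still-busy page is acceptable or should be retried.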
Best Practices for Waiting in Puppeteer
When it comes to waiting in Puppeteer, there are several best practices to keep in mind:
-
Optimize Wait Times: Striking the appropriate balance in wait times is crucial to optimize automation efficiency. A useful strategy involves using the Network tab of the browser's DevTools to investigate how long a specific site takes to load. Additionally, the networkidle strategy, as discussed earlier, proves valuable for further optimizing waiting times during the automation process.
-
Avoid excessive waiting to improve efficiency: Waiting too long can slow down your automation and waste resources. To avoid this, you can use the page.setDefaultTimeout() method to set a maximum timeout for all wait methods.
-
Handling exceptions: All the waiting methods we covered operate asynchronously and may fail due to network issues or server-side errors. It is advisable to wrap these methods in try...catch blocks to handle exceptions. Here's a code example demonstrating robust error handling:
try {
await Promise.all([
page.click('#submit'),
page.waitForNavigation({ waitUntil: "load" }),
]);
} catch (error) {
console.error('Navigation Unsuccessful!', error.message);
}
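Catching the error is often only the first step; transient failures are usually worth a retry. The helper below is a generic sketch, not a Puppeteer feature: the name withRetries and its attempts and delayMs options are hypothetical, and a fixed pause between attempts is the simplest possible policy.

```javascript
// Hypothetical helper: retry an async action a few times with a fixed pause.
async function withRetries(action, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await action(attempt);
    } catch (error) {
      lastError = error;
      // Pause before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A navigation could then be wrapped as `await withRetries(() => page.goto(url, { waitUntil: "load" }), { attempts: 3, delayMs: 2000 });`, with the surrounding try...catch handling the case where all attempts fail.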
Conclusion
Puppeteer offers a variety of waiting methods and options that let a script synchronize with the page before proceeding with subsequent web scraping actions, such as capturing screenshots.
While these built-in methods suffice for most websites, there may be instances where a custom waiting strategy or function becomes necessary for reliable synchronization before capturing a screenshot.
More Web Scraping Tutorials
If you would like to learn more about Web Scraping with Puppeteer, then be sure to check out The Puppeteer Web Scraping Playbook.