Playwright Guide: Managing Cookies
Cookies are small pieces of data stored in the browser that contain information about user sessions and preferences. When automating browsers with Playwright, properly managing cookies is essential for simulating real user interactions.
This guide covers how to get, set, modify, and delete cookies using Playwright's APIs. It also provides tips for handling cookies in real-world automation scenarios.
- TLDR - How to Manage Cookies in Playwright
- Understanding Cookies
- How to Get Cookies with Playwright
- How to Accept Cookie Consent Prompts
- How to Save Cookies Locally with Playwright
- How to Load Cookies with Playwright
- How to Delete Cookies with Playwright
- Working with Session Cookies in Playwright
- Why Handle Cookies In Playwright
- Best Practices for Managing Cookies with Playwright
- Troubleshooting Cookie-Related Issues
- Cookie Handling in Production
- Limitations of Playwright in Managing Cookies
- Why Managing Cookies is Important in Web Scraping
- Conclusion
TLDR - How to Manage Cookies in Playwright
Here is a quick code snippet to get, modify, and clear cookies with Playwright:
// Get all cookies
const cookies = await context.cookies();
// Set a new cookie
await context.addCookies([
{
name: "session",
value: "1234",
url: "https://www.example.com",
},
]);
// Clear cookies
await context.clearCookies();
Playwright provides several methods to manage cookies programmatically:
- cookies(): Get the cookies stored in the current browser context
- addCookies(): Set one or more cookies by passing an array of cookie objects
- clearCookies(): Delete all cookies in the current context; Playwright 1.43 and later also accept name, domain, and path filters to delete specific cookies (there is no separate deleteCookies() method)
- storageState(): Persist cookies and local storage across sessions
Understanding Cookies
Cookies are small text objects stored in the browser containing data about a user's session. Websites use cookies to remember information between page visits, like items in a shopping cart or whether a user is logged in.
When a server sends a cookie, the browser may store it and send it back with later requests to the same server. This is how the server knows that two requests come from the same browser, allowing it to maintain a consistent user session.
Here are some key points about cookies:
- An HTTP cookie is a small piece of data that a server sends to a user's web browser.
- It is used to tell if two requests come from the same browser — allowing the server to maintain a consistent user session.
- A single cookie should not exceed 4 KB in size. However, a domain can store multiple cookies, each up to this size.
The anatomy of a cookie includes the name, value, expiry date, the path, and the domain.
- The name is the identifier for the cookie,
- the value holds the actual data,
- the expiry date tells when the cookie will be deleted,
- the path defines the scope within the site, and
- the domain tells which site the cookie belongs to.
Why Managing Cookies is Important in Web Scraping
Cookies play a crucial role in web applications. They help maintain session state and user preferences, and they can also be used for tracking and personalization.
In automated testing, managing cookies is important for testing different scenarios and user states. For example, you might want to test how your application behaves for a logged-in user versus a guest user.
There are several scenarios where cookie management is critical. These include testing login functionality, testing personalized content, and ensuring that session data is being handled correctly.
How are Cookies Stored and Accessed by the Browser?
You can view cookies in your browser's developer tools. For example, in Chrome, right-click the page and click Inspect, then navigate to the Application tab. Open the Cookies dropdown to see them sorted by domain.
To delete cookies from your browser, go to the same location. You can right-click a domain to clear all of its cookies or right-click a single cookie to delete it.
Types of Cookies
There are two primary types, session cookies and persistent cookies. Session cookies expire when the browser is closed while persistent cookies have an expiration date.
- Session Cookies: These are temporary cookies that are erased when you close your browser. They do not collect information from your computer and are typically used to track your movements among the pages of a website, allowing you to keep items in your shopping cart during a session, for example.
- Persistent Cookies: These cookies remain on your hard drive until they expire or are deleted. They can be used to recognize you when you return to the website and to remember your preferences.
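In the cookie objects Playwright returns, this distinction shows up in the expires field: session cookies are reported with expires set to -1, while persistent cookies carry a Unix timestamp in seconds. The sketch below illustrates one way to tell them apart. The helper names isSessionCookie and isExpired and the sample cookies are assumptions made for illustration, not part of Playwright.

```javascript
// Hypothetical helpers (not part of Playwright) for cookie objects shaped
// like the ones returned by context.cookies().

// Session cookies have no expiry; Playwright reports them with expires === -1.
function isSessionCookie(cookie) {
  return cookie.expires === undefined || cookie.expires === -1;
}

// Persistent cookies store their expiry as Unix time in seconds.
function isExpired(cookie, nowMs = Date.now()) {
  if (isSessionCookie(cookie)) return false;
  return cookie.expires * 1000 <= nowMs;
}

// Illustrative sample data, not taken from a real site.
const sampleCookies = [
  { name: "locale", value: "en", expires: -1 },
  { name: "remember_me", value: "abc", expires: 1700000000 },
];
console.log(sampleCookies.filter(isSessionCookie).map((c) => c.name));
```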
How to Get Cookies with Playwright
To get cookies with Playwright, we use the context.cookies() method, which returns an array of cookie objects for the current context.
For example:
await page.goto("https://www.example.com/");
const cookies = await context.cookies();
console.log(cookies); // prints array of ALL cookie objects
Playwright manages cookies through the browser contexts. Each browser context has its own storage for cookies that are separate from other contexts.
The context object provides methods like cookies(), addCookies(), and clearCookies() to get, set, and delete the cookies for that particular context.
Note that multiple pages can belong to a single context object, so the cookies from that context may belong to any of the opened pages.
How To Get & Filter Cookies With Playwright
After retrieving the cookies, you can filter them based on your needs (e.g., by name, domain, etc.), for example:
const cookies = await context.cookies();
const sessionCookie = cookies.find((c) => c.name === "session_id");
How To Get Specific Cookies From Url With Playwright
To get cookies for a specific domain, pass the URL to cookies():
const googleCookies = await context.cookies(["https://www.google.com"]);
An Example of getting cookies.
As a full example, consider GoodReads.com. The following code navigates to their website, gets all cookies from the context, searches for the locale cookie, which contains the user's language, and finally requests only the cookies for goodreads.com so that third-party and tracking cookies are filtered out.
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://www.goodreads.com/");
// Get all cookies
const cookies = await context.cookies();
console.log(cookies);
// Find the locale cookie by name
const localeCookie = cookies.find((c) => c.name === "locale");
console.log(localeCookie);
// Get only the cookies that apply to goodreads.com
const goodreadsCookies = await context.cookies(["https://www.goodreads.com/"]);
console.log(goodreadsCookies);
await browser.close();
})();
You should see three sets of output: first all cookies, then the locale cookie specifically, and finally the cookies filtered to goodreads.com only. Each cookie follows the format below:
{
name: 'locale',
value: 'en',
domain: 'www.goodreads.com',
path: '/',
expires: -1,
httpOnly: false,
secure: false,
sameSite: 'Lax'
}
How To Get & Filter Cookies With Playwright
After retrieving the cookies, you can filter them based on your needs (e.g., by name, domain, etc.). Unlike find(), which returns the first match, filter() returns an array of every matching cookie:
const allCookies = await context.cookies();
const specificCookie = allCookies.filter(
(cookie) => cookie.name === "session_id"
);
console.log(specificCookie);
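Filtering by domain needs a little more care than filtering by name, because cookie domains may carry a leading dot and may belong to subdomains. The sketch below shows one way to keep only first-party cookies. The helper name cookiesForSite and the sample cookie names are assumptions made for illustration.

```javascript
// Hypothetical helper: keep only cookies whose domain matches a site,
// including its subdomains. Cookie domains may start with a leading dot.
function cookiesForSite(cookies, site) {
  const target = site.toLowerCase();
  return cookies.filter((cookie) => {
    const domain = cookie.domain.replace(/^\./, "").toLowerCase();
    return domain === target || domain.endsWith("." + target);
  });
}

// Illustrative sample data; only the locale cookie comes from this guide.
const exampleCookies = [
  { name: "locale", domain: "www.goodreads.com" },
  { name: "site_pref", domain: ".goodreads.com" },
  { name: "tracker", domain: ".ads.example.net" },
];
console.log(cookiesForSite(exampleCookies, "goodreads.com").map((c) => c.name));
```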
How to Accept Cookie Consent Prompts
Websites seek user consent for storing cookies primarily due to privacy regulations. Here are the two main regulations:
- General Data Protection Regulation (GDPR):
- This is a regulation in EU law on data protection and privacy in the European Union and the European Economic Area. It also addresses the transfer of personal data outside these areas.
- The GDPR aims to give control to individuals over their personal data and to simplify the regulatory environment for international business.
- Under GDPR, websites must inform users about the data they collect, why they collect it, and how they use it. They must also get explicit consent from users to collect this data.
- The ePrivacy Directive:
- Also known as the Cookie Law, it's an EU directive that requires websites to get consent from visitors to store or retrieve any information on a computer, smartphone, or tablet.
- It was designed to protect online privacy, by making consumers aware of how information about them is collected and used online, and give them a choice to allow it or not.
How you handle these consent prompts depends on the website, so you will usually have to develop a custom solution for each one.
As an example, we will be giving consent to the European Union website.
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://european-union.europa.eu/index_en");
const acceptBtn = await page.waitForSelector("[href='#accept']");
await acceptBtn.click();
console.log(await context.cookies());
await browser.close();
})();
The above code navigates to the EU website, waits for the accept button (identified by the [href='#accept'] selector), and clicks it. The cookie saved by accepting is then printed to the console.
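Because every site marks up its banner differently, it can help to keep a list of candidate selectors and try them in turn. The following is a minimal sketch of that idea, not a universal solution. The helper name acceptCookieConsent and the default timeout are assumptions; the page argument is expected to be a Playwright Page.

```javascript
// Hypothetical helper: try a list of candidate selectors and click the first
// consent button that appears. The selectors must be adapted per site.
async function acceptCookieConsent(page, selectors, timeoutMs = 3000) {
  for (const selector of selectors) {
    try {
      // click() waits up to timeoutMs for the element to become actionable.
      await page.locator(selector).first().click({ timeout: timeoutMs });
      return selector; // report which selector worked
    } catch (error) {
      // Not found in time or not clickable: try the next candidate.
    }
  }
  return null; // no consent banner matched
}
```

For the EU site above, a call such as acceptCookieConsent(page, ["[href='#accept']"]) would behave like the original example, and returning null lets the script continue when no banner is shown.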
How to Save Cookies Locally with Playwright
Now that you know how to get cookies from a Playwright context, it may be useful to save them as well. You can use the fs module in Node.js to write the cookie objects to a file as JSON.
For example, the following code will write our GoodReads cookies to a cookies.json file.
const { chromium } = require("playwright");
const fs = require("fs/promises");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://www.goodreads.com/");
// Get all cookies
const cookies = await context.cookies();
console.log(cookies);
// Save them to a json file
await fs.writeFile("cookies.json", JSON.stringify(cookies));
await browser.close();
})();
We fetch the cookies and then save them with the fs.writeFile method. Note that we use fs/promises in the require statement so that we can properly await it. Alternatively, you could use the writeFileSync method from the plain fs module to achieve the same result.
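If you save cookies in several scripts, the same steps can be wrapped in a small reusable function. This is a sketch under the assumption that context is a Playwright BrowserContext; the name saveCookies is hypothetical.

```javascript
const fs = require("fs/promises");

// Hypothetical helper: write a context's cookies to disk as readable JSON.
// `context` is expected to be a Playwright BrowserContext.
async function saveCookies(context, filePath) {
  const cookies = await context.cookies();
  // Indent the JSON so the file is easy to inspect by hand.
  await fs.writeFile(filePath, JSON.stringify(cookies, null, 2), "utf-8");
  return cookies.length; // number of cookies written
}
```

Returning the count makes it easy to log or assert that something was actually saved.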
How to Load Cookies with Playwright
Loading cookies is the reverse of saving them. Rather than getting cookies and writing them to a file, we read cookies from a file and write them to the browser.
To do this, we can use fs.readFile and context.addCookies() as shown in the example below.
const { chromium } = require("playwright");
const fs = require("fs/promises");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://www.example.com/");
// Read the saved cookies from the JSON file
const cookies = await fs.readFile("cookies.json", "utf-8");
// Parse them and add them to the browser context
await context.addCookies(JSON.parse(cookies));
console.log(await context.cookies());
await browser.close();
})();
The above code navigates to example.com and reads the contents of the cookies.json file. Then we use JSON.parse() to parse the file contents back into an array of cookie objects, which we can pass to addCookies. Finally, we print the previously saved GoodReads cookies even though this browser has only visited example.com.
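Saved cookies can go stale between runs, so it may be worth dropping expired entries before loading them. The sketch below assumes the file was produced by context.cookies(), where expires is a Unix timestamp in seconds and -1 marks a session cookie. The helper name loadCookies is hypothetical.

```javascript
const fs = require("fs/promises");

// Hypothetical helper: read cookies saved as JSON and add the ones that are
// still valid to a Playwright BrowserContext.
async function loadCookies(context, filePath, nowMs = Date.now()) {
  const saved = JSON.parse(await fs.readFile(filePath, "utf-8"));
  // Keep session cookies (no expiry or -1) and cookies that expire in the future.
  const fresh = saved.filter(
    (cookie) =>
      cookie.expires === undefined ||
      cookie.expires === -1 ||
      cookie.expires * 1000 > nowMs
  );
  if (fresh.length > 0) {
    await context.addCookies(fresh);
  }
  return fresh.length; // number of cookies loaded
}
```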
How to Delete Cookies with Playwright
There are many reasons you may need to delete cookies, for example:
- New Session: Clearing cookies allows you to start your automation with a clean slate, simulating a fresh user session.
- Different Login Credentials: Resetting specific user data, like login credentials, to scrape web pages with different login information.
You can choose to either delete all cookies or just specific ones.
Delete All Cookies
If you want to delete all cookies in the current context, you can simply clear the browser context's cookies:
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://www.goodreads.com/");
// Get all cookies
const cookies = await context.cookies();
console.log(cookies);
// Clear all cookies
await context.clearCookies();
// Check that cookies are cleared (this should print an empty array)
const cookiesAfterClear = await context.cookies();
console.log(cookiesAfterClear);
await browser.close();
})();
Delete a Specific Cookie
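One way to delete a specific cookie is to overwrite it with the expires field set to 0 or a time in the past, so that the browser treats it as expired. You can use the addCookies method to overwrite the existing cookie. (Playwright 1.43 and later also let you pass a name, domain, or path filter to clearCookies.) For example: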
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://www.goodreads.com/");
// Get all cookies
const cookies = await context.cookies();
console.log(cookies);
// Delete a cookie
const cookieToDelete = cookies.find((cookie) => cookie.name === "locale");
await context.addCookies([
{
...cookieToDelete,
expires: 0,
},
]);
// Check that cookie is deleted
const cookiesAfterDelete = await context.cookies();
console.log(cookiesAfterDelete);
await browser.close();
})();
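An alternative that does not rely on expiry handling is to clear the context and re-add every cookie except the one you want gone. The sketch below takes that approach; the helper name deleteCookieByName is hypothetical and context is assumed to be a Playwright BrowserContext. On Playwright 1.43 or later, calling context.clearCookies({ name: "locale" }) should achieve the same result in a single step.

```javascript
// Hypothetical helper: remove every cookie with the given name by clearing
// the context and re-adding the cookies that should survive.
async function deleteCookieByName(context, name) {
  const cookies = await context.cookies();
  const remaining = cookies.filter((cookie) => cookie.name !== name);
  await context.clearCookies();
  if (remaining.length > 0) {
    await context.addCookies(remaining);
  }
  return cookies.length - remaining.length; // how many cookies were removed
}
```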
Working with Session Cookies in Playwright
Session cookies are stored in the user's browser only until the browser is closed. They are used to track the user's activity for the duration of a visit. Unlike persistent cookies, they are not written to the user's disk and do not have an Expires or Max-Age attribute.
In Playwright, you can create a session cookie by omitting the expires property when you call the context.addCookies() method. (Playwright's cookie objects have an expires field but no maxAge field.)
Here's a code example:
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
// Add a session cookie
await context.addCookies([
{
name: "session_cookie",
value: "123456",
domain: "example.com",
path: "/",
httpOnly: true,
secure: true,
sameSite: "None",
},
]);
// Verify the cookie is set
const cookies = await context.cookies();
console.log(cookies);
await browser.close();
})();
Why Handle Cookies In Playwright
Handling cookies in Playwright is crucial for various reasons, primarily revolving around website functionality, user experience, and testing scenarios.
Here are some key reasons why handling cookies in Playwright is important:
- Session Persistence: Use cookies to manage user sessions in web apps during automation or testing with Playwright.
- Authentication: Retrieve cookies storing authentication tokens to replicate a logged-in user state.
- Testing Cookie-Based Features: Manipulate and verify cookies to test features that depend on them, like shopping carts or user preferences.
- Web Scraping: Manage cookies to access information or mimic user behavior on websites that use cookies for session tracking or user-specific data.
- Avoiding Repeated Logins: Save cookies between sessions to avoid repeated logins and reduce the risk of triggering anti-bot measures.
- Bypassing Rate Limits or IP Blocks: Manage cookies to avoid being rate-limited or blocked, as some websites track user visits or actions via cookies.
- Bypassing Anti-Bot Systems: Use Playwright to harvest valid session cookies to bypass website anti-bot systems when scraping pages or accessing internal API endpoints.
- Testing Different User Roles: Manage cookies to easily switch between user accounts when testing a web application from different user perspectives.
- Compliance Testing: Use Playwright to automate cookie usage and management checks, ensuring website compliance with regulations like GDPR or CCPA.
Best Practices for Managing Cookies with Playwright
When working with cookies in Playwright, adhering to best practices ensures efficient, secure, and reliable handling of user data. Here are some recommended best practices for managing cookies in Playwright:
- Update Cookies Regularly: When acquiring cookies, take their expiration date (the expires key) into account. If your saved cookies are a few days old, it is advisable to collect them anew before starting work on a target website.
- Consider Using Separate Browser Contexts: For more isolation, use separate browser contexts for different tasks. Each context has its own set of cookies, providing a cleaner separation of state.
- Test in Headless Mode: Test your Playwright scripts in headless mode, as this is closer to how your automation will run in production. This helps uncover potential issues early in the development process.
- Handle Errors: Implement error handling to manage potential issues when interacting with cookies. This helps prevent script failures and improves the robustness of your automation.
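The context isolation and error handling practices above can be combined in one small wrapper. This is a sketch only: the helper name withIsolatedContext is hypothetical, and browser is assumed to be a Playwright Browser.

```javascript
// Hypothetical helper: run a task in its own browser context so its cookies
// stay isolated, and always close the context, even if the task throws.
async function withIsolatedContext(browser, task) {
  const context = await browser.newContext();
  try {
    return await task(context);
  } finally {
    await context.close();
  }
}
```

A task might be an async function that opens a page, logs in, and returns context.cookies(). Any error it throws still propagates to the caller, but the context and its cookies are cleaned up first.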
Troubleshooting Cookie-Related Issues
Troubleshooting cookie-related issues in Playwright involves identifying and resolving issues that arise during cookie manipulation, storage, or usage.
- Use the Right Methods: Playwright provides several methods to manage cookies, such as context.cookies(), context.addCookies(), and context.clearCookies(). Use these methods appropriately to retrieve, set, or clear cookies.
- Handle Session Persistence: To maintain a consistent user session, you can retrieve cookies after a user logs in and then set these cookies in new browser contexts or pages.
- Manage Authentication: If a website uses cookies for authentication, retrieve these cookies after a user logs in. You can then use these cookies to replicate the user's authenticated state.
- Test Cookie-Based Features: To test features that depend on cookies, manipulate cookies using context.addCookies() and verify them using context.cookies().
- Bypass Anti-Bot Measures: To bypass anti-bot measures, try to mimic human-like behavior. For example, avoid clearing cookies too frequently, as this is a common behavior of bots.
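A frequent source of confusion is a cookie that exists in the context but does not apply to the page you are loading. The sketch below checks the usual suspects (domain, path, the secure flag, and expiry) for a cookie object and a URL. It is a simplified approximation of browser matching rules that ignores sameSite, and the helper name whyCookieNotSent is hypothetical.

```javascript
// Hypothetical debugging helper: list the reasons a cookie object would not
// be sent with a request to `url`. Simplified; it ignores sameSite rules.
function whyCookieNotSent(cookie, url, nowMs = Date.now()) {
  const { hostname, pathname, protocol } = new URL(url);
  const reasons = [];

  // A leading dot means the cookie also applies to subdomains.
  const domain = cookie.domain.toLowerCase();
  const domainMatches = domain.startsWith(".")
    ? hostname === domain.slice(1) || hostname.endsWith(domain)
    : hostname === domain;
  if (!domainMatches) {
    reasons.push(`domain ${cookie.domain} does not match ${hostname}`);
  }

  // The request path must equal the cookie path or sit underneath it.
  const cookiePath = cookie.path || "/";
  const prefix = cookiePath.endsWith("/") ? cookiePath : cookiePath + "/";
  if (pathname !== cookiePath && !pathname.startsWith(prefix)) {
    reasons.push(`path ${cookiePath} does not match ${pathname}`);
  }

  if (cookie.secure && protocol !== "https:") {
    reasons.push("secure cookie on a non-HTTPS URL");
  }

  // expires is Unix time in seconds; -1 (session cookie) never expires here.
  if (cookie.expires > 0 && cookie.expires * 1000 <= nowMs) {
    reasons.push("cookie has expired");
  }

  return reasons;
}
```

An empty array means none of these checks failed, which points the investigation elsewhere, for example at sameSite behavior or at the server rejecting the session.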
Cookie Handling in Production
As an example of handling cookies in the real world, we will log into a website on one browser then transfer the authentication cookie to a new browser so we can access protected content without logging in again.
const { chromium } = require("playwright");
(async () => {
// Open a new browser
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
// Navigate to the login page and log in
await page.goto("https://the-internet.herokuapp.com/login");
await page.fill("#username", "tomsmith");
await page.fill("#password", "SuperSecretPassword!");
await page.click(".fa-sign-in");
// Wait for navigation to ensure login is complete
await page.waitForURL("**/secure**");
// Get the authentication cookie
const cookies = await context.cookies();
const authCookie = cookies.find((cookie) => cookie.name === "rack.session");
// Close the current browser
await browser.close();
// Open a new browser
const newBrowser = await chromium.launch();
const newContext = await newBrowser.newContext();
const newPage = await newContext.newPage();
// Add the authentication cookie
await newContext.addCookies([authCookie]);
// Navigate to the protected content
await newPage.goto("https://the-internet.herokuapp.com/secure"); // replace with your protected content URL
console.log(await newPage.innerText("h4"));
// Close the new browser
await newBrowser.close();
})();
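When the login state needs to survive between separate script runs, Playwright's storageState API can write cookies and local storage to a file and restore them into a new context. The sketch below wraps both directions. The helper names and the idea of a statePath such as "auth.json" are assumptions made for illustration.

```javascript
// Hypothetical helpers around Playwright's storageState API.
// `statePath` is an assumed file location such as "auth.json".

// Save cookies and local storage from a logged-in context to a file.
async function saveAuthState(context, statePath) {
  await context.storageState({ path: statePath });
}

// Create a new context that starts with the saved cookies already loaded.
async function newAuthenticatedContext(browser, statePath) {
  return browser.newContext({ storageState: statePath });
}
```

In the example above, saveAuthState(context, "auth.json") could be called after the login completes, and a later run could open the secure page through newAuthenticatedContext(newBrowser, "auth.json") without logging in again, as long as the session is still valid on the server.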
Limitations of Playwright in Managing Cookies
While Playwright offers robust capabilities for managing cookies in browser automation, it also has certain limitations that you should be aware of:
- httpOnly Cookies: Unlike client-side scripts, Playwright's context.cookies() and page.context().cookies() methods can access both HTTP-only and non-HTTP-only cookies. HTTP-only cookies, which are intended to enhance security, are inaccessible to client-side scripts and are only sent to the server with HTTP requests, reducing the risk of cross-site scripting (XSS) attacks. However, Playwright operates at a deeper level within the browser, enabling it to retrieve HTTP-only cookies. This capability is particularly useful for automation and testing purposes, allowing verification and interaction with all types of cookies, including those marked as HTTP-only.
- No Cookie Events: Playwright does not provide built-in events to listen for changes in cookies. Developers need to manually handle scenarios where cookies might be modified during script execution.
- Cross-Browser Context Limitations: Cookies set in one browser context, like incognito mode, aren't automatically accessible in another context, leading to potential inconsistencies. This arises because each browser context maintains its distinct set of cookies, lacking automatic synchronization between them.
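One way to work around the lack of cookie events is to take snapshots with context.cookies() before and after an action and compare them yourself. The sketch below shows such a comparison; the helper name diffCookies is hypothetical, and it identifies a cookie by its name, domain, and path.

```javascript
// Hypothetical helper: compare two cookie snapshots taken with
// context.cookies() and report what changed between them.
function diffCookies(before, after) {
  // A cookie is identified by its name, domain, and path.
  const key = (c) => `${c.name}|${c.domain}|${c.path}`;
  const beforeMap = new Map(before.map((c) => [key(c), c]));
  const afterMap = new Map(after.map((c) => [key(c), c]));

  const added = after.filter((c) => !beforeMap.has(key(c)));
  const removed = before.filter((c) => !afterMap.has(key(c)));
  const changed = after.filter(
    (c) => beforeMap.has(key(c)) && beforeMap.get(key(c)).value !== c.value
  );
  return { added, removed, changed };
}
```

Calling it with the snapshot taken before a click and the one taken after shows which cookies the site set, removed, or rewrote in response.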
Conclusion
In this article, we've explored the concept of browser cookies, their types, and how they are stored and accessed by the browser. We've learned that cookies are small pieces of data stored on our devices by websites, and they play a crucial role in maintaining session state, storing user preferences, and tracking user behavior.
We've also delved into the importance of managing cookies in web scraping and automated testing. Efficient cookie management can help us test different user states and scenarios, such as logged-in vs guest users, and ensure that our applications handle session data correctly.
Understanding and managing cookies is a vital skill in web development and testing. Whether you're building a web application, testing a website, or scraping data from the web, a solid grasp of cookies and how to manage them can greatly enhance your efficiency and effectiveness.