
Using Fake User Agents with Playwright

User agents are strings that identify the browser and device making requests to websites. When automating browsers with Playwright, using custom or fake user agents can be useful for bypassing bot detection and accessing user agent-specific content.

This guide covers several methods for setting and managing fake user agents with Playwright.


How To Use Fake User-Agents In Playwright

Here is a simple example of directly setting a fake user agent string when creating a Playwright browser context:

const context = await browser.newContext({
  userAgent:
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
});

This approach gives you complete control over the user agent value. The drawback is that you have to manually manage a pool of strings yourself, as sketched below. Later sections cover more robust and automated methods.
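For illustration, here is a minimal sketch of what managing your own pool might look like, assuming a browser instance has already been launched as in the snippet above (the user agent strings are example values, not a maintained list):

// A hand-maintained pool of user agent strings (example values only).
const USER_AGENTS = [
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];

// Pick one at random for each new context.
const context = await browser.newContext({
  userAgent: USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)],
});

Every string in the pool has to be kept current by hand, which is why the automated options below are usually preferable.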


What Are Fake User-Agents?

A user agent string identifies details about the browser and operating system to servers. For example:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

The various components provide information on compatibility, browser engine, browser name and version, etc.

The server can use this information to tailor its response to the specific capabilities and preferences of the client, such as rendering web pages in a format suitable for the user's browser.

Fake user agents, also known as spoofed user agents or masqueraded user agents, are user agent strings that intentionally misrepresent the identity or characteristics of the client making the request.

The purpose of using fake user agents can vary, but some common reasons include:

  1. Bypassing restrictions: Some websites or online services may have restrictions or limitations based on the user agent, such as blocking access to certain content for specific browsers or operating systems. By using a fake user agent, users can attempt to bypass these restrictions.

  2. Testing and debugging: Web developers and testers may use fake user agents to simulate requests from different browsers or devices during testing and debugging processes.

  3. Web scraping: When scraping data from websites, using fake user agents can help disguise the automated nature of the requests, making it more difficult for the website to detect and block the scraping activity.

  4. Privacy and anonymity: In some cases, users may use fake user agents as part of efforts to maintain privacy or anonymity online by obfuscating their true browsing environment.

User-Agent String Components

  • Mozilla/5.0: This part represents the product token and version. In this case, it indicates compatibility with Mozilla, and the number (5.0) is a reference to the version.
  • (X11; Linux x86_64): This is the comment within parentheses. It provides additional information about the user's operating system and environment. In this example, it specifies that the browser is running on the X Window System on a 64-bit Linux system.
  • AppleWebKit/537.36 (KHTML, like Gecko): This part identifies the browser engine. In this case, it's the WebKit engine, which is used by browsers like Chrome and Safari. The "KHTML, like Gecko" is historical and indicates compatibility with KHTML (used by Konqueror) and Gecko (used by Firefox).
  • Chrome/120.0.0.0: This part specifies the browser and its version. In this example, it indicates that the browser is Chrome, and the version is 120.0.0.0.
  • Safari/537.36: This part further mentions compatibility, indicating that the browser is like Safari (as both Chrome and Safari use the WebKit engine). The version number is also provided.

You can see your own user agent string by visiting https://useragentstring.com.


User agents are one of the most commonly checked fields to detect bot requests. Fake user agents mimic real browser user agents to bypass anti-bot systems. By spoofing user agents, scrapers can access content as if from a real browser.

User-Agents & Anti-Bots

Websites can use the user agent string sent by the client's browser to identify the type of browser, operating system, and sometimes even the device being used to access the site.

Based on this information, the website can dynamically adjust the content it serves to optimize the user experience. For example, a website might serve a mobile-friendly version of its pages to users accessing it from a smartphone, while serving a desktop version to users on a computer.

Websites often check the user agent against lists of known bots and headless browsers to deny access. Similarly, not providing a user agent at all when using plain HTTP request libraries causes the same issue. By rotating fake user agents that mimic real ones, scrapers can bypass these protections.
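For example, here is a minimal sketch of setting the User-Agent header explicitly with axios (the header value is illustrative):

const axios = require("axios");

// Send an explicit User-Agent header instead of the library default.
axios
  .get("https://example.com", {
    headers: {
      "User-Agent":
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
  })
  .then((response) => console.log(response.status));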

Fake user agents can be valuable tools in web scraping and testing scenarios:

  • In web scraping, using fake user agents can help avoid detection by websites that attempt to block or limit automated scraping activities.
  • By mimicking the user agent strings of legitimate browsers, scraping bots can appear more like human visitors, reducing the likelihood of being detected and blocked by anti-scraping measures.
  • Similarly, in testing environments, fake user agents allow developers to simulate requests from various browsers or devices, ensuring that web applications function correctly across different environments.

Playwright uses predictable default user agents that can be easy to detect. Setting custom ones helps improve reliability when scraping sensitive sites.

By continuously rotating or randomizing user agent strings, scraping scripts can avoid detection by anti-scraping measures that specifically target static or well-known user agents.
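As a sketch, rotation in Playwright can be as simple as creating each new context with the next user agent from a pool (reusing a pool like the one shown earlier):

// Cycle through a pool so each session presents a different user agent.
let nextIndex = 0;

async function newRotatedContext(browser, userAgents) {
  const userAgent = userAgents[nextIndex % userAgents.length];
  nextIndex += 1;
  return browser.newContext({ userAgent });
}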


How Playwright Manages User-Agents

To understand default user-agents in Playwright, we can run the following script:

const { chromium } = require("playwright");

async function displayUserAgent() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  // Get the user-agent string in headless mode
  const headlessUserAgent = await page.evaluate(() => navigator.userAgent);
  console.log("Headless Mode User-Agent:", headlessUserAgent);

  await browser.close();

  const browser2 = await chromium.launch({ headless: false });
  const context2 = await browser2.newContext();
  const page2 = await context2.newPage();

  // Get the user-agent string in headful mode
  const headfulUserAgent = await page2.evaluate(() => navigator.userAgent);
  console.log("Headful Mode User-Agent:", headfulUserAgent);

  await browser2.close();
}

displayUserAgent();

This script outputs:

Headless Mode User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.6167.57 Safari/537.36
Headful Mode User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36

As you can see, the headless user agent includes HeadlessChrome, which can give away that the browser is automated rather than a real user.
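One common workaround, sketched below rather than guaranteed to defeat detection, is to read the default user agent and strip the Headless marker before creating the context you actually use:

const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({ headless: true });

  // Read the default user agent from a throwaway page...
  const probe = await browser.newPage();
  const defaultUserAgent = await probe.evaluate(() => navigator.userAgent);
  await probe.close();

  // ...then remove the "Headless" marker for the real context.
  const context = await browser.newContext({
    userAgent: defaultUserAgent.replace("HeadlessChrome", "Chrome"),
  });
  const page = await context.newPage();
  console.log(await page.evaluate(() => navigator.userAgent));

  await browser.close();
})();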


Use Random User-Agent for Each Session

Now that you've seen what the default user agents look like, you can see why it's worth providing your own.

But as mentioned, it is difficult to maintain and use your own pool of user agents, so instead we can use the random-useragent package to select random, real browser user agents.

Here is an example of that.

const { chromium } = require("playwright");
const randomUserAgent = require("random-useragent");

async function displayUserAgent() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: randomUserAgent.getRandom(),
  });
  const page = await context.newPage();

  // Get the user-agent string
  const userAgent = await page.evaluate(() => navigator.userAgent);
  console.log("User-Agent:", userAgent);

  await browser.close();
}

displayUserAgent();

Running that script a couple of times, you will see that it produces a new user agent on each run and handles user agent management for us.
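The package also accepts a filter function, so you can narrow the pool, for example to Chrome on Linux. This is a sketch that assumes the filter properties (browserName, osName) documented by random-useragent, and a browser instance already launched:

// Restrict the random pool with a filter over the package's UA metadata.
const userAgent = randomUserAgent.getRandom(
  (ua) => ua.browserName === "Chrome" && ua.osName === "Linux"
);

const context = await browser.newContext({ userAgent });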


Use Playwright Stealth Plugin

The Playwright Stealth Plugin is a utility tool designed to enhance web scraping and automation capabilities by mitigating detection and blocking mechanisms commonly employed by websites.

It aims to make automated browser interactions more stealthy and indistinguishable from human browsing behavior.

For comprehensive bot detection evasion, tools like puppeteer-extra-plugin-stealth provide user agent spoofing along with other tricks:

const { chromium } = require("playwright-extra");

const stealth = require("puppeteer-extra-plugin-stealth")();
chromium.use(stealth);

(async () => {
  const browser = await chromium.launch();
  // ... your scraping logic ...
  await browser.close();
})();

Stealth handles everything under the hood. But custom user agents can still be set if needed.
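For instance, here is a minimal sketch that keeps the stealth patches but overrides the user agent, reusing the chromium import from playwright-extra above (the string is illustrative):

(async () => {
  const browser = await chromium.launch();

  // Stealth patches apply automatically; the user agent can still be overridden.
  const context = await browser.newContext({
    userAgent:
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  });
  const page = await context.newPage();

  await browser.close();
})();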

For more information, read our article on the extra stealth plugin.


Obtaining User-Agent Strings

Several websites provide a vast collection of User-Agent strings. Here are a couple of them:

  • useragentstring.com: This website provides a comprehensive list of User-Agent strings for various browsers, operating systems, and devices. You can use it to find User-Agent strings for specific configurations.

  • whatismybrowser.com: This website shows your current User-Agent string. It's a handy tool for quickly finding out what User-Agent string your browser is sending with its requests.

ScrapeOps Fake User-Agent API

ScrapeOps provides a Fake User-Agent API that you can use to obtain User-Agent strings. This API provides a simple way to get random, valid User-Agent strings for your web scraping or testing needs.

Here's a simple example of how to use the API with Node.js:

const axios = require("axios");

async function getUserAgents() {
  const response = await axios.get(
    "http://headers.scrapeops.io/v1/user-agents?api_key=YOUR_API_KEY"
  );
  return response.data["results"];
}

getUserAgents().then((userAgentArray) => console.log(userAgentArray));

For more information about the ScrapeOps fake user agent API, please read the docs.
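As a sketch, the returned list can feed Playwright directly: fetch it once, then pick a random entry for each new context. This assumes the results field is an array of user agent strings and reuses the getUserAgents() helper above:

const { chromium } = require("playwright");

(async () => {
  // Fetch the list once and reuse it across sessions.
  const userAgents = await getUserAgents();

  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: userAgents[Math.floor(Math.random() * userAgents.length)],
  });
  const page = await context.newPage();
  console.log(await page.evaluate(() => navigator.userAgent));

  await browser.close();
})();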


Troubleshooting Guide

Setting user agents in Playwright can present challenges:

  • Bot Detection: Some sites can still identify and block headless browsers, even with a changed user agent.
  • Mismatched User Agent and Browser Features: A user agent that doesn't align with the browser's capabilities can raise suspicion.
  • Site-Specific Issues: Certain sites may need a specific user agent for correct rendering or functionality.
  • Inconsistent User Agents: User agents should stay consistent across requests within a session, especially when sharing authenticated state across multiple Playwright instances.
  • Ignoring Other Browser Signatures: Focusing only on the user agent and neglecting other headers can make your bot more detectable.
  • Extension or Plugin Problems: Changing the user agent might disrupt extensions or plugins that rely on specific browser characteristics.
  • Cache and Session Inconsistencies: Altering user agents within the same session can cause inconsistencies in cached data or session info, leading to unexpected website behavior.
  • Automation Testing Challenges: Different user agents can affect page rendering and interaction, potentially causing inaccuracies in tests.

To mitigate these issues, manage user agent strings carefully, consider other browser characteristics, maintain consistency in your sessions, and thoroughly test your Playwright scripts.
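For example, here is a minimal sketch of keeping the user agent consistent with the rest of the context's fingerprint when creating it (the values are illustrative):

// Keep the user agent consistent with other parts of the browser fingerprint.
const context = await browser.newContext({
  userAgent:
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  // locale keeps navigator.language and the Accept-Language header aligned.
  locale: "en-US",
  // viewport can be matched to the desktop platform the user agent claims.
  viewport: { width: 1366, height: 768 },
});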


Conclusion

Custom user agents are pivotal for evading bot detection when scraping. Playwright offers multiple techniques to manage user agents effectively.

For more information, check out the official Playwright Docs.


More Web Scraping Guides

To learn more about how to use Playwright for web scraping and how to avoid detection, read our NodeJS Playwright Guide.
