Selenium Undetected Chromedriver: Bypass Anti-Bots With Ease
In this guide for The Python Selenium Web Scraping Playbook, we will look at how to set up and use Selenium's Undetected ChromeDriver to bypass some of the most sophisticated anti-bot mechanisms on the market today, like DataDome, PerimeterX and Cloudflare.
One of the main reasons developers use headless browsers like Selenium is that they can help bypass the anti-bot mechanisms websites use to detect and block web scraping.
However, in the last number of years there has been a surge in the use of sophisticated anti-bot technologies that can detect headless browsers from the browser fingerprints they reveal to the server when making requests. As a result, developers need to fortify their browsers to hide these details and make their Selenium scrapers undetectable by anti-bot solutions.
So in this guide we will show you how to make your Selenium scrapers more undetectable (they can never be 100% undetectable) using the Undetected ChromeDriver:
- What Is Selenium's Undetected Chromedriver?
- Installing Selenium Undetected Chromedriver
- Using The Selenium Undetected Chromedriver
- Adding Chrome Options To Undetected Chromedriver
- Using Proxies With Undetected Chromedriver
- Issues With Undetected Chromedriver
- Alternatives To Selenium Undetected Chromedriver
What Is Selenium's Undetected Chromedriver?
The Selenium Undetected ChromeDriver is an optimized version of the standard ChromeDriver designed to bypass the detection mechanisms of most anti-bot solutions like DataDome, PerimeterX and Cloudflare.
The standard Selenium ChromeDriver leaks a lot of information that anti-bot systems can use to determine if it is an automated browser/scraper or a real user visiting the website.
The Selenium Undetected ChromeDriver fortifies the standard Selenium ChromeDriver by patching the vast majority of the ways anti-bot systems can detect your Selenium bot/scraper, making it much harder for anti-bot systems like DataDome, Imperva, PerimeterX, Botprotect.io and Cloudflare to detect and block it.
Installing Selenium Undetected Chromedriver
Installing Selenium's Undetected ChromeDriver is very simple.
We just need to install the undetected-chromedriver package via pip:
pip install undetected-chromedriver
Now with undetected-chromedriver installed, we can set up our scraper/bot to use it instead of the default Chromedriver.
Using The Selenium Undetected Chromedriver
Using the undetected-chromedriver in our scraper is actually pretty simple.
We just need to import undetected_chromedriver and then activate it using uc.Chrome().
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://distilnetworks.com')
From here we can use the chromedriver like we would use any other Selenium Chromedriver.
When we run uc.Chrome(), it downloads the latest Chromedriver binary and then patches it so that the vast majority of the ways anti-bot systems can detect your Selenium bot/scraper are fixed.
You could do this manually yourself; however, undetected-chromedriver runs this browser fortification process for you automatically once it has downloaded the Chromedriver.
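Because the patched driver exposes the regular Selenium API, the usual driver calls work unchanged. The following is a minimal sketch rather than a definitive pattern: fetch_title and main are hypothetical helper names, and the import sits inside main only so that fetch_title can be reused with any Selenium driver.

```python
def fetch_title(driver, url):
    """Load url with an already-created driver and return the page title."""
    driver.get(url)
    return driver.title


def main():
    # Imported here so fetch_title stays usable without the package installed.
    import undetected_chromedriver as uc

    driver = uc.Chrome()
    try:
        print(fetch_title(driver, "https://distilnetworks.com"))
    finally:
        # Always close the browser so no Chrome processes are left running.
        driver.quit()
```

Calling main() opens a patched Chrome window, prints the page title and then closes the browser, even if the page load raises an exception.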
Download Specific Chrome Versions
If your use case requires that you use a specific version of Chrome with your Selenium scraper, then you can tell Undetected Chromedriver to download and patch the driver for that Chrome version instead of the latest version by setting the uc.TARGET_VERSION attribute before creating the driver.
import undetected_chromedriver as uc
uc.TARGET_VERSION = 85
driver = uc.Chrome()
driver.get('https://distilnetworks.com')
Now when you run your scraper it will use Chrome version 85 as the browser.
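In the more recent 3.x releases of undetected-chromedriver, the major version is, to our knowledge, passed as the version_main argument of uc.Chrome() rather than through a module-level setting, so it is worth checking which form your installed release expects. The sketch below assumes that argument is available; chrome_major_version and create_pinned_driver are hypothetical helper names.

```python
import re


def chrome_major_version(version_string):
    """Extract the major version from text such as 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_string)
    if match is None:
        raise ValueError(f"No Chrome version found in {version_string!r}")
    return int(match.group(1))


def create_pinned_driver(major_version):
    """Create a patched driver for one Chrome major version."""
    # Imported lazily; requires undetected-chromedriver and Chrome to be installed.
    import undetected_chromedriver as uc

    # Assumption: the installed release accepts the version_main argument.
    return uc.Chrome(version_main=major_version)
```

For example, create_pinned_driver(chrome_major_version('Google Chrome 114.0.5735.90')) would request a driver patched for Chrome 114.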
Adding Chrome Options To Undetected Chromedriver
You can customize the undetected chromedriver like you would a normal chromedriver by passing it a uc.ChromeOptions() object.
In the example below we set the Chromedriver to not load images when loading the page:
import undetected_chromedriver as uc
## Set Chrome Options
options = uc.ChromeOptions()
options.add_argument('--blink-settings=imagesEnabled=false')
## Create Undetected Chromedriver with Options
driver = uc.Chrome(options=options)
driver.get('https://distilnetworks.com')
The undetected-chromedriver changes a lot of the default chromedriver settings to make it less detectable to anti-bot systems. As a result, overriding a setting that the undetected-chromedriver patch has already optimized can make your scraper more detectable.
Using Proxies With Undetected Chromedriver
You can also use the Undetected Chromedriver with proxies by setting the --proxy-server argument in the ChromeOptions, which can make your scraper even harder to detect as you can route your page loads through different IP addresses.
import undetected_chromedriver as uc
## Example Proxy
PROXY = "203.0.113.10:8080"
## Set Chrome Options
options = uc.ChromeOptions()
options.add_argument(f'--proxy-server={PROXY}')
## Create Undetected Chromedriver with Proxy
driver = uc.Chrome(options=options)
## Send Request
driver.get('https://distilnetworks.com')
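Chrome reads the --proxy-server argument once at launch, so using a different IP address typically means either pointing the argument at a rotating proxy endpoint or starting a new driver per proxy. The sketch below takes the second route; PROXY_POOL holds placeholder documentation addresses, and proxy_argument, create_driver_with_proxy and scrape_with_rotation are hypothetical helper names.

```python
import itertools

# Placeholder addresses from a documentation-only IP range; replace with real proxies.
PROXY_POOL = ["203.0.113.10:8080", "203.0.113.11:8080"]


def proxy_argument(proxy):
    """Return the Chrome command-line argument for a 'host:port' proxy string."""
    host, separator, port = proxy.rpartition(":")
    if not separator or not host or not port.isdigit():
        raise ValueError(f"Expected 'host:port', got {proxy!r}")
    return f"--proxy-server={proxy}"


def create_driver_with_proxy(proxy):
    """Start a new patched Chrome instance that routes traffic through proxy."""
    # Imported lazily; requires undetected-chromedriver and Chrome to be installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()  # a fresh options object for every driver
    options.add_argument(proxy_argument(proxy))
    return uc.Chrome(options=options)


def scrape_with_rotation(urls):
    """Load each URL through the next proxy in the pool and return the page sources."""
    pages = []
    for url, proxy in zip(urls, itertools.cycle(PROXY_POOL)):
        driver = create_driver_with_proxy(proxy)
        try:
            driver.get(url)
            pages.append(driver.page_source)
        finally:
            driver.quit()  # close the browser before moving to the next proxy
    return pages
```

Starting a browser for every page is slow, so for larger jobs a single rotating proxy endpoint passed to --proxy-server is usually the cheaper option.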
Using Authenticated Proxies With Undetected Chromedriver
The above method doesn't work if you need to use proxies that require username and password authentication.
It is very common for commercial proxy providers to provide access to their proxy pools by giving you a single proxy endpoint that you send your requests to, authenticating your account with a username and password.
"http://USERNAME:PASSWORD@proxy-server:8080"
There are a couple of ways to solve this, but one of the easiest is to use the Selenium Wire extension, which makes it very easy to use proxies with Selenium.
In the below example we will load undetected_chromedriver from seleniumwire instead of directly from the undetected-chromedriver package and pass the proxy settings into the seleniumwire_options argument of the Chromedriver.
import seleniumwire.undetected_chromedriver as uc
## Chrome Options
chrome_options = uc.ChromeOptions()
## Proxy Options
proxy_options = {
'proxy': {
'http': 'http://user:pass@ip:port',
'https': 'https://user:pass@ip:port',
'no_proxy': 'localhost,127.0.0.1'
}
}
## Create Chrome Driver
driver = uc.Chrome(
options=chrome_options,
seleniumwire_options=proxy_options
)
driver.get('https://distilnetworks.com')
Now when we run the script, Selenium will route the requests through the proxy URL using the undetected chromedriver.
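To keep credentials out of the script itself, the proxy settings can be assembled from environment variables. This is a sketch under assumptions: the PROXY_USERNAME, PROXY_PASSWORD, PROXY_HOST and PROXY_PORT variable names and the build_proxy_options helper are hypothetical, and the dictionary simply mirrors the layout used in the example above.

```python
import os


def build_proxy_options(env=None):
    """Build a seleniumwire_options dictionary from proxy settings in env."""
    env = os.environ if env is None else env
    # Hypothetical variable names; a missing one raises KeyError.
    endpoint = "{}:{}@{}:{}".format(
        env["PROXY_USERNAME"],
        env["PROXY_PASSWORD"],
        env["PROXY_HOST"],
        env["PROXY_PORT"],
    )
    return {
        "proxy": {
            "http": f"http://{endpoint}",
            "https": f"https://{endpoint}",
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def create_authenticated_driver():
    """Create a Selenium Wire backed undetected chromedriver using the proxy."""
    # Imported lazily; requires selenium-wire, undetected-chromedriver and Chrome.
    import seleniumwire.undetected_chromedriver as uc

    return uc.Chrome(
        options=uc.ChromeOptions(),
        seleniumwire_options=build_proxy_options(),
    )
```

With the four variables exported in the shell, create_authenticated_driver() returns a driver that sends its traffic through the authenticated endpoint.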
Issues With Undetected Chromedriver
One issue to be aware of with the undetected_chromedriver library is that if you use the GUI (keyboard or mouse) to navigate in the browser then you can make the chromedriver detectable.
Instead, you need to browse programmatically (i.e. using .get(url)) to ensure your bot or scraper isn't detectable. For example:
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
# Returns None, so IPython prints no Out[5]: Undetectable!
Another issue is creating new tabs. If you really need multiple tabs, then open the new tab with a blank page and carry on as usual from there.
If you follow these "Rules" (undetected chromedriver's default behaviour), then you should have no issues.
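A quick way to confirm this behaviour in a script is to read navigator.webdriver after navigating with .get(). In the sketch below, webdriver_flag_is_hidden and load_and_check are hypothetical helper names, and the flag is only one of many signals anti-bot systems inspect, so a passing check does not prove the browser is undetectable.

```python
def webdriver_flag_is_hidden(driver):
    """Return True when navigator.webdriver is not exposed as a truthy value."""
    value = driver.execute_script("return navigator.webdriver")
    # Selenium maps undefined/null to None and false to False; both count as hidden.
    return not value


def load_and_check(driver, url):
    """Navigate programmatically, then report whether the automation flag is hidden."""
    driver.get(url)
    return webdriver_flag_is_hidden(driver)
```

Both helpers accept any Selenium driver, so the same check can be run against a standard Chromedriver for comparison.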
Alternatives To Selenium Undetected Chromedriver
If you want to use the undetected_chromedriver to scrape websites protected by Cloudflare, DataDome, PerimeterX, etc. then it can work well; however, it does have its downsides. Namely:
- Detection: One of the issues with open source fortified headless browsers like undetected_chromedriver is that anti-bot companies can see how they bypass their anti-bot protection systems and easily patch the issues that they exploit. This means open source fortified browsers only have a couple of months of shelf life before they stop working and need to be patched again.
- Stability: When running headless browsers like undetected_chromedriver at scale it is common to run into stability issues as they consume a lot of memory and can easily crash servers. So managing them at scale can be quite a challenge.
- Bandwidth/Cost: Headless browsers can consume a lot of bandwidth as they render the full page which can really drive up your scraping costs if you are using a proxy service that prices based on the bandwidth consumed.
So if you are thinking of using undetected_chromedriver solely to bypass an anti-bot system on a website, then an alternative is to use smart proxies that develop and maintain their own private anti-bot bypasses.
These are typically more reliable, as it is harder for anti-bot companies like Cloudflare to develop detections for private bypasses, and the proxy companies that build them are financially motivated to stay one step ahead of anti-bot companies and fix their bypasses as soon as they stop working.
One of the best options is the ScrapeOps Proxy Aggregator as it integrates over 20 smart proxy providers into the same proxy API, and finds the best/cheapest proxy provider for your target domains.
You can activate ScrapeOps' Anti-Bot Bypasses by simply adding the bypass flag to your API request.
For example, in the below code we will use the Cloudflare bypass by adding bypass=cloudflare_level_1 to the request:
import requests
response = requests.get(
url='https://proxy.scrapeops.io/v1/',
params={
'api_key': 'YOUR_API_KEY',
'url': 'http://example.com/', ## Cloudflare protected website
'bypass': 'cloudflare_level_1',
},
)
print('Body: ', response.content)
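The same request can be wrapped in a small helper that builds the API URL and checks the response status before the body is used. This is a sketch: build_scrapeops_url and fetch_with_bypass are hypothetical helper names, the 120 second timeout is an arbitrary choice, and only the api_key, url and bypass parameters shown above are assumed.

```python
from urllib.parse import urlencode

SCRAPEOPS_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_scrapeops_url(api_key, target_url, bypass=None):
    """Return the proxy API URL, with the target URL safely percent-encoded."""
    params = {"api_key": api_key, "url": target_url}
    if bypass is not None:
        params["bypass"] = bypass
    return f"{SCRAPEOPS_ENDPOINT}?{urlencode(params)}"


def fetch_with_bypass(api_key, target_url, bypass="cloudflare_level_1"):
    """Fetch target_url through the proxy API and return the response body."""
    # Imported lazily; requires the requests package.
    import requests

    response = requests.get(
        build_scrapeops_url(api_key, target_url, bypass),
        timeout=120,  # arbitrary; bypassed requests can take a while
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.content
```

Checking the status first keeps blocked or failed responses from being parsed as if they were the target page.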
Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.
To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.
Security Level | Bypass | API Credits | Description |
---|---|---|---|
Low | cloudflare_level_1 | 10 | Use to bypass Cloudflare protected sites with low security settings enabled. |
Medium | cloudflare_level_2 | 35 | Use to bypass Cloudflare protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests. |
High | cloudflare_level_3 | 50 | Use to bypass Cloudflare protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests. |
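The credit figures in the table make it easy to estimate the cost of a crawl up front. The sketch below assumes the listed credits are charged per request and ignores the adjusted multiples on large plans mentioned in the table; BYPASS_CREDITS and estimate_credits are hypothetical names.

```python
# Credits per request, copied from the table above.
BYPASS_CREDITS = {
    "cloudflare_level_1": 10,
    "cloudflare_level_2": 35,
    "cloudflare_level_3": 50,
}


def estimate_credits(bypass, request_count):
    """Estimate the API credits needed for request_count requests with one bypass."""
    if bypass not in BYPASS_CREDITS:
        raise ValueError(f"Unknown bypass: {bypass!r}")
    if request_count < 0:
        raise ValueError("request_count must not be negative")
    return BYPASS_CREDITS[bypass] * request_count


# 1,000 requests at the medium level: 35 * 1,000 = 35,000 credits.
print(estimate_credits("cloudflare_level_2", 1000))
```

Starting with the lowest level that works for a site keeps the estimate, and the bill, as small as possible.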
The advantage of taking this approach is that you can use your normal HTTP client and don't have to worry about:
- Fortifying headless browsers
- Managing numerous headless browser instances & dealing with memory issues
- Reverse engineering the anti-bot protection systems
All of this is managed within the ScrapeOps Proxy Aggregator.
You can get a ScrapeOps API key with 1,000 free API credits by signing up here.
More Web Scraping Tutorials
So that's how to set up and use Selenium's undetected_chromedriver to scrape websites without getting blocked by anti-bot systems.
If you would like to learn more about Web Scraping with Selenium, then be sure to check out The Selenium Web Scraping Playbook.
Or check out one of our more in-depth guides: