Selenium Wire: Extending Python Selenium
Selenium Wire is a powerful extension for Python Selenium that gives you access to the underlying requests made by your Selenium browser.
It enables more advanced functionality that you might need when scraping or automating specific websites.
In this guide for The Python Selenium Web Scraping Playbook, we will walk through how to use Selenium Wire and its most common use cases:
- What Is Selenium Wire?
- Installing Selenium Wire
- Using Selenium Wire With Your Selenium Scrapers/Bots
- Using Proxies With Selenium Wire
- Selenium Wire Request/Response Objects
- Intercepting Requests and Responses With Selenium Wire
- Modifying Request Headers
- Modifying Response Headers
- Modifying Request Parameters
- Basic Authentication
- Blocking Requests
- Mocking Responses
- Bypassing Anti-Bots With Undetected Chromedriver
- More Selenium Wire Functionality & Options
What Is Selenium Wire?
Selenium Wire is an extension for Python Selenium that gives you access to the underlying requests made by the browser.
When enabled, Selenium Wire spins up a server in the background that intercepts all the requests and responses your Selenium browser makes. This gives you additional capabilities for inspecting and modifying requests and responses on the fly.
You write your Selenium code as you would normally do, and Selenium Wire gives you the additional functionality via the APIs it exposes to you.
Installing Selenium Wire
Installing Selenium Wire is very simple.
We just need to install the Selenium Wire package via pip:
pip install selenium-wire
Now, with selenium-wire installed, we can set it up with our Selenium driver.
Using Selenium Wire With Your Selenium Scrapers/Bots
To use Selenium Wire within your Selenium scrapers/bots, you just need to import the Selenium webdriver from seleniumwire instead of selenium.
With this one change you can use Selenium Wire as you would vanilla Selenium:
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
# Go to page
driver.get('https://www.example.com')
In the above example, we use seleniumwire instead of selenium; however, we don't use any of Selenium Wire's additional capabilities.
In the following sections we will show you the most common use cases for Selenium Wire.
Using Proxies With Selenium Wire
Probably the most common use case for Selenium Wire is that it enables the use of authenticated proxies with your Selenium bots/scrapers.
The default method of using proxies with vanilla Selenium doesn't work if the proxy needs to be authenticated (i.e. requires username and password authentication). For reference, the default method looks like this:
from selenium import webdriver
## Example Proxy
PROXY = "11.456.448.110:8080"
## Create WebDriver Options to Add Proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--proxy-server={PROXY}')
chrome = webdriver.Chrome(options=chrome_options)
## Make Request Using Proxy
chrome.get("https://httpbin.org/ip")
As a result, the most common fix for this is to use Selenium Wire's Proxy functionality as it allows for proxy authentication.
from seleniumwire import webdriver
## Define Your Proxy Endpoints
proxy_options = {
    'proxy': {
        'http': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'https': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'no_proxy': 'localhost,127.0.0.1'  # Comma-separated hosts that bypass the proxy
    }
}
## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)
## Send Request Using Proxy
driver.get('https://httpbin.org/ip')
Now when we run the script we can use authenticated proxies with Selenium webdriver.
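To avoid hardcoding credentials in the script, the proxy options can be assembled from configuration. The following is a minimal sketch, not part of Selenium Wire itself: the helper name `proxy_options_from_env` and the variable names `PROXY_USERNAME`, `PROXY_PASSWORD`, `PROXY_HOST` and `PROXY_PORT` are assumptions made up for this example.

```python
import os


def proxy_options_from_env(env=None):
    """Build a seleniumwire_options dict from environment-style settings.

    PROXY_USERNAME, PROXY_PASSWORD, PROXY_HOST and PROXY_PORT are
    hypothetical variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    proxy_url = (
        f"http://{env['PROXY_USERNAME']}:{env['PROXY_PASSWORD']}"
        f"@{env['PROXY_HOST']}:{env['PROXY_PORT']}"
    )
    return {
        'proxy': {
            'http': proxy_url,
            'https': proxy_url,
            'no_proxy': 'localhost,127.0.0.1',
        }
    }


# Example with an explicit mapping instead of the real environment
example_options = proxy_options_from_env({
    'PROXY_USERNAME': 'USERNAME',
    'PROXY_PASSWORD': 'PASSWORD',
    'PROXY_HOST': 'proxy-server',
    'PROXY_PORT': '8080',
})
print(example_options['proxy']['http'])
```

The resulting dictionary can then be passed to the driver in the same way as before, e.g. `webdriver.Chrome(seleniumwire_options=example_options)`.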
Selenium Wire Request/Response Objects
One of the core features of Selenium Wire is that it provides you with Request and Response objects.
Selenium Wire captures all HTTP/HTTPS traffic made by the browser and then provides access to it through the Request and Response objects.
For example, the driver.requests attribute will return a list of captured requests in chronological order.
From here you can access the data of each individual request. For example, here are the attributes of the Request object:
Attribute | Description |
---|---|
body | The request body as bytes. If the request has no body the value of body will be empty, i.e. b'' . |
cert | Information about the server SSL certificate in dictionary format. Empty for non-HTTPS requests. |
date | The datetime the request was made. |
headers | A dictionary-like object of request headers. |
host | The request host, e.g. www.example.com |
method | The HTTP method, e.g. GET or POST etc. |
params | A dictionary of request parameters. If a parameter with the same name appears more than once in the request, its value in the dictionary will be a list. |
path | The request path, e.g. /some/path/index.html |
querystring | The query string, e.g. foo=bar&spam=eggs |
response | The response object associated with the request. This will be None if the request has no response. |
url | The request URL, e.g. https://www.example.com/some/path/index.html?foo=bar&spam=eggs |
Here are the attributes of the Response object:
Attribute | Description |
---|---|
body | The response body as bytes. If the response has no body the value of body will be empty, i.e. b'' . |
date | The datetime the response was received. |
headers | A dictionary-like object of response headers. |
reason | The reason phrase, e.g. OK or Not Found etc. |
status_code | The status code of the response, e.g. 200 or 404 etc. |
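As a small illustration of these attributes, the sketch below collects the method, URL and status code of every captured request that received a response. The helper name `summarize_requests` is made up for this example, and the `SimpleNamespace` objects are stand-ins that only mimic the attribute names from the tables above, so the snippet can be tried without launching a browser.

```python
from types import SimpleNamespace


def summarize_requests(requests):
    """Return (method, url, status_code) tuples for requests that got a response."""
    summary = []
    for request in requests:
        # request.response is None when the request has no response
        if request.response is not None:
            summary.append(
                (request.method, request.url, request.response.status_code)
            )
    return summary


# Stand-in objects using the attribute names from the tables above
fake_requests = [
    SimpleNamespace(
        method='GET',
        url='https://www.example.com/',
        response=SimpleNamespace(status_code=200),
    ),
    SimpleNamespace(
        method='GET',
        url='https://www.example.com/pending',
        response=None,
    ),
]

print(summarize_requests(fake_requests))
```

With a real driver, the same helper would be called as `summarize_requests(driver.requests)` after `driver.get(...)` has loaded a page.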
Selenium Wire also lets you intercept and modify these Request and Response objects on the fly. This opens up some useful functionality, which we will walk through with examples next.
Intercepting Requests and Responses With Selenium Wire
Selenium Wire enables you to intercept and modify these Request and Response objects on the fly by setting the driver.request_interceptor and driver.response_interceptor attributes before you start using the driver.
- A request interceptor should accept a single argument for the request.
- A response interceptor should accept two arguments, one for the originating request and one for the response.
To use request and response interceptors, you just need to assign an interceptor function to either request_interceptor or response_interceptor.
def my_request_interceptor(request):
    ## Do something with the request
    pass
def my_response_interceptor(request, response):
    # A response interceptor takes two args (request, response)
    ## Do something
    pass
## Set Request Interceptor
driver.request_interceptor = my_request_interceptor
## Set Response Interceptor
driver.response_interceptor = my_response_interceptor
## Make requests
driver.get('https://www.example.com')
To remove an interceptor, use del:
del driver.request_interceptor
del driver.response_interceptor
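Because each interceptor is a single attribute on the driver, assigning a second function replaces the first. If you want several behaviours at once, one option is to wrap them in a single function. The sketch below is an assumption-laden illustration, not a Selenium Wire feature: `chain_request_interceptors`, `add_language` and `add_marker` are hypothetical names, and the fake request uses a plain dictionary for its headers.

```python
from types import SimpleNamespace


def chain_request_interceptors(*interceptors):
    """Combine several request interceptors into a single function."""
    def combined(request):
        # Call each interceptor in the order it was supplied
        for interceptor in interceptors:
            interceptor(request)
    return combined


def add_language(request):
    request.headers['Accept-Language'] = 'en-US'


def add_marker(request):
    request.headers['X-Marker'] = 'demo'


combined = chain_request_interceptors(add_language, add_marker)

# Stand-in request object so the sketch runs without a browser
fake_request = SimpleNamespace(headers={})
combined(fake_request)
print(fake_request.headers)
```

With a real driver you would then set `driver.request_interceptor = combined`.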
Next we will walk through some example use cases of using request and response interceptors.
Modifying Request Headers
Using the request_interceptor you can modify the headers of every request that your driver sends:
def interceptor(request):
    ## Add New Header
    request.headers['New-Header'] = 'Some Value'
    ## Change Existing Header
    del request.headers['Referer'] # Delete the header first
    request.headers['Referer'] = 'some_referer' # Add new header
## Set Request Interceptor
driver.request_interceptor = interceptor
# Make request --> all requests will use new header
driver.get('https://httpbin.org/headers')
Duplicate header names are permitted in an HTTP request, so setting a header that already exists would add a second header with the same name rather than replace it. To replace a header, first delete the existing one using del, then add the new header.
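The delete-then-set rule can be demonstrated without a browser. The sketch below uses the standard library's `http.client.HTTPMessage` as a stand-in, on the assumption that Selenium Wire's header object permits duplicates in the same way; `set_single_header` is a hypothetical helper name.

```python
from http.client import HTTPMessage


def set_single_header(headers, name, value):
    """Replace every existing `name` header with a single new value."""
    del headers[name]       # Removes all occurrences; no error if absent
    headers[name] = value   # Adds the one replacement


headers = HTTPMessage()
headers['Referer'] = 'https://old.example/'
headers['Referer'] = 'https://new.example/'   # Appended, not replaced
print(headers.get_all('Referer'))
# ['https://old.example/', 'https://new.example/']

set_single_header(headers, 'Referer', 'https://new.example/')
print(headers.get_all('Referer'))
# ['https://new.example/']
```

Inside a request interceptor the same helper would be called as `set_single_header(request.headers, 'Referer', 'some_referer')`.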
Modifying Response Headers
You can also modify the response headers returned by the server using the response_interceptor:
def interceptor(request, response):
    if request.url == 'https://httpbin.org/headers':
        response.headers['New-Header'] = 'Some Value'
## Set Response Interceptor
driver.response_interceptor = interceptor
## Make request --> the response from 'https://httpbin.org/headers' will now contain New-Header
driver.get('https://httpbin.org/headers')
Modifying Request Parameters
You can also modify the query parameters of a request using the request_interceptor:
def interceptor(request):
    params = request.params
    params['foo'] = 'bar'
    request.params = params
## Set Request Interceptor
driver.request_interceptor = interceptor
## Make request --> query parameter 'foo=bar' will be added to all requests
driver.get('https://www.example.com')
Request parameters work differently to headers in that they are calculated when they are set on the request. That means that you first have to read them, then update them, and then write them back.
Parameters are held in a regular dictionary, so parameters with the same name will be overwritten.
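The read, update, write-back pattern can be wrapped in a small factory so that extra parameters are only added for one host. This is a sketch under stated assumptions: `make_param_interceptor` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins exposing only the `host` and `params` attributes.

```python
from types import SimpleNamespace


def make_param_interceptor(host, extra_params):
    """Return a request interceptor that adds parameters for one host only."""
    def interceptor(request):
        if request.host != host:
            return
        params = request.params        # 1. Read
        params.update(extra_params)    # 2. Update (same-name keys are overwritten)
        request.params = params        # 3. Write back
    return interceptor


interceptor = make_param_interceptor('www.example.com', {'foo': 'bar'})

# Stand-in requests so the sketch runs without a browser
matching = SimpleNamespace(host='www.example.com', params={'page': '1'})
other = SimpleNamespace(host='cdn.example.com', params={})
interceptor(matching)
interceptor(other)
print(matching.params, other.params)
```

With a real driver, the returned function is assigned as usual: `driver.request_interceptor = interceptor`.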
Basic Authentication
If a site requires basic authentication (username and password), you can use a request_interceptor to add authentication credentials to each request. This will stop the browser from displaying a username/password pop-up.
import base64
auth = (
    base64.encodebytes('my_username:my_password'.encode())
    .decode()
    .strip()
)
def interceptor(request):
    if request.host == 'www.example.com':
        request.headers['Authorization'] = f'Basic {auth}'
## Set Request Interceptor
driver.request_interceptor = interceptor
## Make request --> Credentials will be transmitted with every request to "www.example.com"
driver.get('https://www.example.com')
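A reusable variant of this pattern is sketched below. It uses `base64.b64encode`, which does not append a trailing newline and so needs no `.strip()`. The names `basic_auth_header` and `make_auth_interceptor` are hypothetical, and `http.client.HTTPMessage` stands in for the request headers on the assumption that the real header object handles duplicates the same way.

```python
import base64
from http.client import HTTPMessage
from types import SimpleNamespace


def basic_auth_header(username, password):
    """Return the value of a Basic Authorization header."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8'))
    return f"Basic {token.decode('ascii')}"


def make_auth_interceptor(host, username, password):
    """Return a request interceptor that authenticates requests to one host."""
    header_value = basic_auth_header(username, password)

    def interceptor(request):
        if request.host == host:
            # Delete first so only one Authorization header is sent
            del request.headers['Authorization']
            request.headers['Authorization'] = header_value
    return interceptor


interceptor = make_auth_interceptor('www.example.com', 'my_username', 'my_password')

# Stand-in request so the sketch runs without a browser
fake_request = SimpleNamespace(host='www.example.com', headers=HTTPMessage())
interceptor(fake_request)
print(fake_request.headers['Authorization'])
```

Restricting the interceptor to a single host, as in the original example, keeps the credentials from being sent to third-party domains loaded by the page.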
Blocking Requests
To increase the speed of your scraping/automation, or to reduce the bandwidth consumed by your browser, you can block certain types of requests, such as images.
You can use request.abort() to block a request and send an immediate response back to the browser. An optional error code can be supplied. The default is 403 (forbidden).
def interceptor(request):
    # Block PNG, JPEG and GIF images
    if request.path.endswith(('.png', '.jpg', '.gif')):
        request.abort()
## Set Request Interceptor
driver.request_interceptor = interceptor
## Make request --> requests for PNG, JPEG and GIF images will be blocked
driver.get('https://www.example.com')
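The extension check can be pulled out into a separate function, which makes it easy to extend the list and to match upper-case file names. This is only a sketch: `BLOCKED_EXTENSIONS` and `should_block` are hypothetical names, and the extension list is an example rather than a recommendation.

```python
# Example list of extensions to block; adjust to your own needs
BLOCKED_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.webp')


def should_block(path, extensions=BLOCKED_EXTENSIONS):
    """Return True if the request path ends with a blocked extension."""
    # Lower-case the path so '/logo.PNG' is matched as well
    return path.lower().endswith(extensions)


def interceptor(request):
    if should_block(request.path):
        request.abort()


print(should_block('/static/img/LOGO.PNG'))  # True
print(should_block('/index.html'))           # False
```

As before, the interceptor is activated with `driver.request_interceptor = interceptor`.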
Mocking Responses
You can use the create_response() method to send a mock response back to the browser. No data will be sent to the remote server.
def interceptor(request):
    # The browser requests the site root with a trailing slash
    if request.url == 'https://www.example.com/':
        request.create_response(
            status_code=200,
            headers={'Content-Type': 'text/html'}, # Optional headers dictionary
            body='<html>Hello World!</html>' # Optional body
        )
## Set Request Interceptor
driver.request_interceptor = interceptor
## Make request --> will return mock response
driver.get('https://www.example.com')
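The same approach can be used to stub an API endpoint with a JSON payload, for example when testing how a page behaves with known data. The sketch below is illustrative only: `make_mock_interceptor` and `FakeRequest` are hypothetical names, the URL is a placeholder, and `FakeRequest` merely records the arguments passed to `create_response()` so the example runs without a browser.

```python
import json


def make_mock_interceptor(url, payload):
    """Return a request interceptor that answers `url` with a JSON payload."""
    body = json.dumps(payload)

    def interceptor(request):
        if request.url == url:
            request.create_response(
                status_code=200,
                headers={'Content-Type': 'application/json'},
                body=body,
            )
    return interceptor


class FakeRequest:
    """Stand-in that records what would be sent back to the browser."""

    def __init__(self, url):
        self.url = url
        self.mocked = None

    def create_response(self, status_code, headers=(), body=b''):
        self.mocked = (status_code, dict(headers), body)


interceptor = make_mock_interceptor(
    'https://www.example.com/api/items', {'items': [1, 2, 3]}
)

fake_request = FakeRequest('https://www.example.com/api/items')
interceptor(fake_request)
print(fake_request.mocked)
```

Requests to any other URL are left untouched and continue to the remote server as normal.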
Bypassing Anti-Bots With Undetected Chromedriver
The standard Selenium ChromeDriver leaks a lot of information that anti-bot systems can use to determine whether a visitor is an automated browser/scraper or a real user, and to ban your requests.
The Selenium Undetected ChromeDriver is an optimized version of the standard ChromeDriver designed to bypass the detection mechanisms of most anti-bot solutions like DataDome, PerimeterX and Cloudflare.
Selenium Wire provides an easy integration right out of the box.
Selenium Wire will integrate with undetected-chromedriver if it finds it in your environment. So to use it you must have undetected_chromedriver installed first:
pip install undetected-chromedriver
Then load undetected_chromedriver from seleniumwire instead of directly from the undetected-chromedriver package.
import seleniumwire.undetected_chromedriver as uc
## Chrome Options
chrome_options = uc.ChromeOptions()
## Create Chrome Driver
driver = uc.Chrome(
    options=chrome_options,
    seleniumwire_options={}
)
driver.get('https://distilnetworks.com')
More Selenium Wire Functionality & Options
The above examples cover only the main use cases and functionality of Selenium Wire. However, it has a lot more to offer.
If you would like to check out all the Selenium Wire options then check out the full list here.
More Web Scraping Tutorials
So that's a run through of Selenium Wire and what it has to offer.
If you would like to learn more about Web Scraping with Selenium, then be sure to check out The Selenium Web Scraping Playbook.