Selenium Wire: Extending Python Selenium

Selenium Wire is a powerful extension for Python Selenium that gives you access to the underlying requests made by your Selenium browser.

It enables more advanced functionality that you might need when scraping/botting specific websites.

In this guide for The Python Selenium Web Scraping Playbook, we will walk through how to use Selenium Wire and its most common use cases.


What Is Selenium Wire?

Selenium Wire is an extension for Python Selenium that gives you access to the underlying requests made by the browser.

When enabled, Selenium Wire spins up a server in the background that intercepts all the requests and responses your Selenium browser makes. This gives you additional capabilities for inspecting and modifying requests and responses on the fly.

You write your Selenium code as you would normally do, and Selenium Wire gives you the additional functionality via the APIs it exposes to you.


Installing Selenium Wire

Installing Selenium Wire is simple.

We just need to install the selenium-wire package via pip:


pip install selenium-wire

Now that selenium-wire is installed, we can set it up with our Selenium driver.


Using Selenium Wire With Your Selenium Scrapers/Bots

To use Selenium Wire within your Selenium scrapers/bots you just need to import the Selenium webdriver from seleniumwire instead of selenium.

With this one change you can use Selenium Wire as you would vanilla Selenium:

from seleniumwire import webdriver  # Import from seleniumwire

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to page
driver.get('https://www.example.com')

In the above example, we import webdriver from seleniumwire instead of selenium, but we don't use any of Selenium Wire's additional capabilities.

In the following sections we will show you the most common use cases for Selenium Wire.


Using Proxies With Selenium Wire

Probably the most common use case for Selenium Wire is that it enables the use of authenticated proxies with your Selenium bots/scrapers.

The default method of using proxies with vanilla Selenium, shown below, doesn't work if the proxy needs to be authenticated (i.e. requires username and password authentication).


from selenium import webdriver

## Example Proxy (placeholder, replace with a real IP:PORT)
PROXY = "11.456.448.110:8080"

## Create WebDriver Options to Add Proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--proxy-server={PROXY}')
chrome = webdriver.Chrome(options=chrome_options)

## Make Request Using Proxy
chrome.get("http://httpbin.org/ip")

As a result, the most common fix for this is to use Selenium Wire's proxy functionality, which supports proxy authentication.


from seleniumwire import webdriver

## Define Your Proxy Endpoints
proxy_options = {
    'proxy': {
        'http': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'https': 'http://USERNAME:PASSWORD@proxy-server:8080',
        'no_proxy': 'localhost,127.0.0.1'  # Comma-separated hosts that bypass the proxy
    }
}

## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)

## Send Request Using Proxy
driver.get('http://httpbin.org/ip')


Now when we run the script we can use authenticated proxies with Selenium webdriver.
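If your proxy username or password contains characters such as @ or :, embedding them directly in the proxy URL can break it. Here is a minimal sketch of a helper that percent-encodes the credentials before building the options dictionary. Note that build_proxy_options is a hypothetical helper name (it is not part of Selenium Wire), and the credentials and proxy-server host are placeholders:

```python
from urllib.parse import quote


def build_proxy_options(username, password, host, port):
    """Build a seleniumwire_options dict for an authenticated HTTP proxy.

    Hypothetical helper, not part of Selenium Wire. Credentials are
    percent-encoded so characters such as '@' or ':' don't break the URL.
    """
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    proxy_url = f'http://{user}:{pwd}@{host}:{port}'
    return {
        'proxy': {
            'http': proxy_url,
            'https': proxy_url,
            'no_proxy': 'localhost,127.0.0.1',
        }
    }


# Placeholder credentials and host, for illustration only
options = build_proxy_options('USERNAME', 'p@ss:word', 'proxy-server', 8080)
print(options['proxy']['http'])

# To use it (requires selenium-wire and a local Chrome install):
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=options)
# driver.get('http://httpbin.org/ip')
```

Keeping the dictionary construction in one function also makes it easier to swap proxies between runs without touching the driver setup.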


Selenium Wire Request/Response Objects

One of the core features of Selenium Wire is that it provides you with Request and Response objects.

Selenium Wire captures all HTTP/HTTPS traffic made by the browser and then provides access to it through the Request and Response objects.

For example, driver.requests returns a list of captured requests in chronological order.

From here you can access the data of each individual request. For example, here are the attributes of the Request object:

| Attribute | Description |
| --- | --- |
| body | The request body as bytes. If the request has no body the value of body will be empty, i.e. b''. |
| cert | Information about the server SSL certificate in dictionary format. Empty for non-HTTPS requests. |
| date | The datetime the request was made. |
| headers | A dictionary-like object of request headers. |
| host | The request host, e.g. www.example.com |
| method | The HTTP method, e.g. GET or POST etc. |
| params | A dictionary of request parameters. If a parameter with the same name appears more than once in the request, its value in the dictionary will be a list. |
| path | The request path, e.g. /some/path/index.html |
| querystring | The query string, e.g. foo=bar&spam=eggs |
| response | The response object associated with the request. This will be None if the request has no response. |
| url | The request URL, e.g. https://www.example.com/some/path/index.html?foo=bar&spam=eggs |

Here are the attributes of the Response object:

| Attribute | Description |
| --- | --- |
| body | The response body as bytes. If the response has no body the value of body will be empty, i.e. b''. |
| date | The datetime the response was received. |
| headers | A dictionary-like object of response headers. |
| reason | The reason phrase, e.g. OK or Not Found etc. |
| status_code | The status code of the response, e.g. 200 or 404 etc. |
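To make these attributes more concrete, here is a minimal sketch that summarizes a list of captured requests using only the method, url, response and status_code attributes described above. The summarize_requests function is a hypothetical helper, and the sample data uses stand-in objects so the sketch runs without a browser. With Selenium Wire you would pass driver.requests instead:

```python
from types import SimpleNamespace


def summarize_requests(requests):
    """Return (method, url, status_code) tuples for captured requests.

    status_code is None when a request has no response.
    """
    summary = []
    for request in requests:
        response = request.response
        status = response.status_code if response is not None else None
        summary.append((request.method, request.url, status))
    return summary


# Stand-in objects (assumption: they mimic the attributes listed above)
sample_requests = [
    SimpleNamespace(method='GET', url='https://www.example.com/',
                    response=SimpleNamespace(status_code=200)),
    SimpleNamespace(method='GET', url='https://www.example.com/logo.png',
                    response=None),
]

for method, url, status in summarize_requests(sample_requests):
    print(method, url, status)
```

Checking for a None response matters because a request may be captured before its response arrives, or may never receive one.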

Selenium Wire also lets you intercept and modify these Request and Response objects on the fly, which opens up some useful functionality. We will walk through some examples next.


Intercepting Requests and Responses With Selenium Wire

Selenium Wire enables you to intercept and modify Request and Response objects on the fly by setting the driver.request_interceptor and driver.response_interceptor attributes before you start using the driver.

  • A request interceptor should accept a single argument for the request.
  • A response interceptor should accept two arguments, one for the originating request and one for the response.

To use request and response interceptors, you just need to assign an interceptor function to either the request_interceptor or response_interceptor attribute.


from seleniumwire import webdriver

driver = webdriver.Chrome()

def my_request_interceptor(request):
    ## Do something with the request
    pass

def my_response_interceptor(request, response):
    # A response interceptor takes two args (request, response)
    ## Do something
    pass

## Set Request Interceptor
driver.request_interceptor = my_request_interceptor

## Set Response Interceptor
driver.response_interceptor = my_response_interceptor

## Make requests
driver.get('https://www.example.com')

To remove an interceptor, use del:


del driver.request_interceptor
del driver.response_interceptor
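Because request_interceptor is a single attribute, assigning a second function to it replaces the first. If you want several behaviors at once, one option is to combine them into a single function. The following is a minimal sketch of that idea. The chain_request_interceptors helper and the two example interceptors are hypothetical names, and the request is a stand-in object with a plain dictionary for headers so the sketch runs without a browser:

```python
from types import SimpleNamespace


def chain_request_interceptors(*interceptors):
    """Combine several request interceptors into one (hypothetical helper)."""
    def combined(request):
        # Run each interceptor in the order it was passed in
        for interceptor in interceptors:
            interceptor(request)
    return combined


def add_language_header(request):
    request.headers['Accept-Language'] = 'en-US'


def tag_request(request):
    request.headers['X-Example-Tag'] = 'demo'


combined = chain_request_interceptors(add_language_header, tag_request)

# Stand-in request so the sketch runs without a browser
fake_request = SimpleNamespace(headers={})
combined(fake_request)
print(fake_request.headers)

# With a real driver you would attach it like any other interceptor:
# driver.request_interceptor = combined
```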

Next we will walk through some example use cases of using request and response interceptors.


Modifying Request Headers

Using the request_interceptor you can modify the headers of every request that your driver sends:


def interceptor(request):
    ## Add New Header
    request.headers['New-Header'] = 'Some Value'

    ## Change Existing Header
    del request.headers['Referer']  # Delete the header first
    request.headers['Referer'] = 'some_referer'  # Add new header

## Set Request Interceptor
driver.request_interceptor = interceptor

# Make request --> all requests will use new header
driver.get('https://httpbin.org/headers')

Changing Header Values

Duplicate header names are permitted in an HTTP request, so assigning a header that already exists adds a second copy instead of replacing it. To change a header's value, first delete the existing header using del, then add the new one.
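The delete-then-set pattern can be demonstrated without a browser using the standard library's http.client.HTTPMessage, which also permits duplicate header names. This is a sketch under the assumption that Selenium Wire's headers object behaves the same way for assignment and deletion. The replace_header helper is a hypothetical name:

```python
from http.client import HTTPMessage


def replace_header(headers, name, value):
    """Replace a header instead of appending a duplicate (hypothetical helper)."""
    del headers[name]      # Removes every occurrence; no error if absent
    headers[name] = value  # Adds the single replacement


headers = HTTPMessage()
headers['Referer'] = 'https://old.example'
headers['Referer'] = 'https://new.example'  # Appends a second Referer header
duplicates = headers.get_all('Referer')

replace_header(headers, 'Referer', 'https://final.example')
print(headers.get_all('Referer'))
```

After the two plain assignments the message holds two Referer headers, while replace_header leaves exactly one.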


Modifying Response Headers

You can also modify the response headers returned by the server using the response_interceptor:


def interceptor(request, response):
    # Only modify responses for the URL we request below
    if request.url == 'https://httpbin.org/headers':
        response.headers['New-Header'] = 'Some Value'

## Set Response Interceptor
driver.response_interceptor = interceptor

## Make request --> the response from 'https://httpbin.org/headers' will now contain New-Header
driver.get('https://httpbin.org/headers')


Modifying Request Parameters

You can also modify the query parameters of a request using the request_interceptor:


def interceptor(request):
    params = request.params  # Read the current parameters
    params['foo'] = 'bar'    # Update them
    request.params = params  # Write them back

## Set Request Interceptor
driver.request_interceptor = interceptor

## Make request --> query parameter 'foo=bar' will be added to all requests
driver.get('https://www.example.com')

Request parameters work differently from headers in that they are calculated when they are set on the request. That means you first have to read them, then update them, and then write them back.

Unique Parameters

Parameters are held in a regular dictionary, so parameters with the same name will be overwritten.
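To illustrate why the read, update, write-back sequence is needed, here is a small sketch using a stand-in request class whose params are derived from its URL. FakeRequest and set_param are hypothetical names, and this is an illustration of the behavior described above rather than Selenium Wire's actual implementation:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class FakeRequest:
    """Stand-in request whose params are calculated from the URL.

    Illustration only, not Selenium Wire's implementation.
    """

    def __init__(self, url):
        self.url = url

    @property
    def params(self):
        # A fresh dict is built on every read
        return dict(parse_qsl(urlsplit(self.url).query))

    @params.setter
    def params(self, value):
        # The URL is only rewritten when params is assigned
        parts = urlsplit(self.url)
        self.url = urlunsplit(parts._replace(query=urlencode(value)))


def set_param(request, key, value):
    """Read the parameters, update them, then write them back."""
    params = request.params
    params[key] = value
    request.params = params


request = FakeRequest('https://www.example.com/search?q=selenium')

request.params['page'] = '2'     # Mutates a temporary dict; the URL is unchanged
url_after_mutation = request.url

set_param(request, 'page', '2')  # Read, update, write back
url_after_set = request.url
print(url_after_set)
```

In this stand-in, mutating the dictionary in place has no effect on the request, whereas assigning the updated dictionary back does.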


Basic Authentication

If a site requires basic authentication (username and password), you can use a request_interceptor to add authentication credentials to each request. This will stop the browser from displaying a username/password pop-up.


import base64

auth = (
    base64.encodebytes('my_username:my_password'.encode())
    .decode()
    .strip()
)

def interceptor(request):
    if request.host == 'www.example.com':
        request.headers['Authorization'] = f'Basic {auth}'

## Set Request Interceptor
driver.request_interceptor = interceptor

## Make request --> Credentials will be transmitted with every request to "www.example.com"
driver.get('https://www.example.com')
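If you need this for more than one site, the same idea can be wrapped in a small factory. The sketch below is one way to do it, with basic_auth_value and make_auth_interceptor as hypothetical helper names and stand-in request objects so it runs without a browser. It uses base64.b64encode, which, unlike encodebytes, does not insert line breaks into long values:

```python
import base64
from types import SimpleNamespace


def basic_auth_value(username, password):
    """Return the value for a Basic Authorization header."""
    token = base64.b64encode(f'{username}:{password}'.encode()).decode()
    return f'Basic {token}'


def make_auth_interceptor(host, username, password):
    """Create a request interceptor that adds credentials for one host."""
    value = basic_auth_value(username, password)

    def interceptor(request):
        # Only send credentials to the host they belong to
        if request.host == host:
            request.headers['Authorization'] = value

    return interceptor


# Placeholder credentials, for illustration only
interceptor = make_auth_interceptor('www.example.com', 'Aladdin', 'open sesame')

# Stand-in requests with plain dicts for headers
matching = SimpleNamespace(host='www.example.com', headers={})
other = SimpleNamespace(host='cdn.example.net', headers={})
interceptor(matching)
interceptor(other)

# With a real driver: driver.request_interceptor = interceptor
```

Restricting the header to a single host avoids leaking credentials to third-party domains that the page loads resources from.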


Blocking Requests

To speed up your scraping/automation or reduce the bandwidth consumed by your browser, you can block certain types of requests, such as images.

You can use request.abort() to block a request and send an immediate response back to the browser. An optional error code can be supplied. The default is 403 (forbidden).


def interceptor(request):
    # Block PNG, JPEG and GIF images
    if request.path.endswith(('.png', '.jpg', '.jpeg', '.gif')):
        request.abort()

## Set Request Interceptor
driver.request_interceptor = interceptor

## Make request --> requests for PNG, JPEG and GIF images will be blocked
driver.get('https://www.example.com')
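Image paths on real sites often use upper-case or less common extensions, so a slightly more forgiving check can help. Here is a minimal sketch, where BLOCKED_EXTENSIONS, should_block and block_images are hypothetical names, and FakeRequest is a stand-in that only records whether abort() was called:

```python
# Assumption: this list of extensions is illustrative, adjust it to your needs
BLOCKED_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.webp')


def should_block(path):
    """Return True if the path ends with a blocked extension (case-insensitive)."""
    return path.lower().endswith(BLOCKED_EXTENSIONS)


def block_images(request):
    """Request interceptor that aborts requests for image files."""
    if should_block(request.path):
        request.abort()


class FakeRequest:
    """Stand-in request that records whether it was aborted."""

    def __init__(self, path):
        self.path = path
        self.aborted = False

    def abort(self, error_code=403):
        self.aborted = True


image = FakeRequest('/static/Logo.PNG')
page = FakeRequest('/index.html')
block_images(image)
block_images(page)
print(image.aborted, page.aborted)

# With a real driver: driver.request_interceptor = block_images
```

Separating the decision (should_block) from the action (block_images) makes the rule easy to test on its own.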


Mocking Responses

You can use the create_response() method to send a mock response back to the browser. No data will be sent to the remote server.


def interceptor(request):
    # The browser requests the site root with a trailing slash
    if request.url == 'https://www.example.com/':
        request.create_response(
            status_code=200,
            headers={'Content-Type': 'text/html'},  # Optional headers dictionary
            body='<html>Hello World!</html>'  # Optional body
        )

## Set Request Interceptor
driver.request_interceptor = interceptor

## Make request --> will return mock response
driver.get('https://www.example.com')
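The same approach can be used to stub out an API endpoint with a JSON payload, for example while testing how a page behaves. The sketch below assumes a hypothetical /api/status endpoint and uses a stand-in FakeRequest class that records the arguments passed to create_response(), so it runs without a browser:

```python
import json


def mock_api(request):
    """Request interceptor that returns a canned JSON response."""
    # '/api/status' is a hypothetical endpoint used for illustration
    if request.path == '/api/status':
        request.create_response(
            status_code=200,
            headers={'Content-Type': 'application/json'},
            body=json.dumps({'status': 'ok'}),
        )


class FakeRequest:
    """Stand-in request that records the mocked response, if any."""

    def __init__(self, path):
        self.path = path
        self.mocked = None

    def create_response(self, status_code, headers=None, body=b''):
        self.mocked = {
            'status_code': status_code,
            'headers': headers or {},
            'body': body,
        }


api_request = FakeRequest('/api/status')
page_request = FakeRequest('/')
mock_api(api_request)
mock_api(page_request)
print(api_request.mocked)

# With a real driver: driver.request_interceptor = mock_api
```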


Bypassing Anti-Bots With Undetected Chromedriver

The standard Selenium ChromeDriver leaks a lot of information that anti-bot systems can use to determine whether it is an automated browser/scraper or a real user visiting the website, and then ban your requests.

The Selenium Undetected ChromeDriver is an optimized version of the standard ChromeDriver designed to evade the detection mechanisms of anti-bot solutions like DataDome, PerimeterX and Cloudflare.

Selenium Wire provides an easy integration right out of the box.

Selenium Wire will integrate with undetected-chromedriver if it finds it in your environment. So to use it you must have undetected-chromedriver installed first:


pip install undetected-chromedriver

Then load the undetected_chromedriver from seleniumwire instead of directly from the undetected-chromedriver package.


import seleniumwire.undetected_chromedriver as uc

## Chrome Options
chrome_options = uc.ChromeOptions()

## Create Chrome Driver
driver = uc.Chrome(
    options=chrome_options,
    seleniumwire_options={}
)

driver.get('https://distilnetworks.com')


More Selenium Wire Functionality & Options

The above examples cover just some of the main use cases and functionality of Selenium Wire. However, it has a lot more to offer.

If you would like to see everything Selenium Wire supports, check out the full list of options in its documentation.


More Web Scraping Tutorials

So that's a run through of Selenium Wire and what it has to offer.

If you would like to learn more about Web Scraping with Selenium, then be sure to check out The Selenium Web Scraping Playbook.
