Python Selenium Examples

The following code examples show how to integrate the ScrapeOps Residential Proxy Aggregator with your Python Selenium scrapers.


Authorisation - API Key

To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.

Your API key must be included with every request using the password proxy port parameter; otherwise, the proxy port will return a 403 Forbidden status code.


Residential Proxy Port

The following is the connection string to use when sending requests through the ScrapeOps Residential Proxy Port:


"http://scrapeops:YOUR_API_KEY@residential-proxy.scrapeops.io:8181"

Here are the individual connection details:

  • Proxy: residential-proxy.scrapeops.io
  • Port: 8181
  • Username: scrapeops
  • Password: YOUR_API_KEY
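
To quickly verify these connection details outside of Selenium, you can send a test request through the proxy port with the requests library. This is a minimal sketch; the target URL is just an example, and disabling SSL verification is an assumption that may or may not be needed in your setup:


import requests

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

## Build the proxy URL from the connection details above
proxy_url = f'http://scrapeops:{SCRAPEOPS_API_KEY}@residential-proxy.scrapeops.io:8181'

proxies = {
    'http': proxy_url,
    'https': proxy_url,
}

## Example target URL; verify=False is an assumption for proxies
## that intercept HTTPS traffic - adjust for your setup
response = requests.get('http://quotes.toscrape.com/page/1/', proxies=proxies, verify=False)
print(response.status_code)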

To enable extra/advanced functionality, you can pass parameters by adding them to the username, separated by periods.

For example, if you want to enable Country Geotargeting with a request, the username would be scrapeops.country=us.


"http://scrapeops.country=us:YOUR_API_KEY@residential-proxy.scrapeops.io:8181"

Multiple parameters can also be included by separating them with periods, as shown below.
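
For illustration, a connection string with two parameters would have the following shape (other_param=value is a placeholder, not a real parameter name; check the parameter documentation for the supported options):


"http://scrapeops.country=us.other_param=value:YOUR_API_KEY@residential-proxy.scrapeops.io:8181"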


Integrating With Selenium Scrapers

To integrate our proxy with your Selenium scraper, we recommend using the Selenium Wire extension, which makes it very easy to use proxies with Selenium.

First, you need to install Selenium Wire using pip:


pip install selenium-wire

Then update your scraper to use seleniumwire's webdriver instead of the default selenium webdriver.


from seleniumwire import webdriver
from bs4 import BeautifulSoup

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

## Define ScrapeOps Proxy Port Endpoint
proxy_options = {
    'proxy': {
        'http': f'http://scrapeops:{SCRAPEOPS_API_KEY}@residential-proxy.scrapeops.io:8181',
        'https': f'http://scrapeops:{SCRAPEOPS_API_KEY}@residential-proxy.scrapeops.io:8181',
        'no_proxy': 'localhost,127.0.0.1'  ## comma-separated hosts that bypass the proxy
    }
}

## Set Up Selenium Chrome driver (Selenium 4+ downloads the driver automatically)
driver = webdriver.Chrome(seleniumwire_options=proxy_options)

## Send Request Using ScrapeOps Proxy
driver.get('http://quotes.toscrape.com/page/1/')

## Retrieve HTML Response
html_response = driver.page_source

## Extract Data From HTML
soup = BeautifulSoup(html_response, "html.parser")
h1_text = soup.find('h1').text

print(h1_text)

## Close the browser when finished
driver.quit()

ScrapeOps will take care of the proxy selection and rotation for you, so you just need to send us the URL you want to scrape.


Response Format

After receiving a response from one of our proxy providers, the ScrapeOps Residential Proxy Aggregator will respond with the raw HTML content of the target URL along with a response status code:


<html>
<head>
...
</head>
<body>
...
</body>
</html>

Status Codes

The ScrapeOps Residential Proxy Aggregator will return the status code returned by the target website.

However, the proxy port will return the following status codes if there are specific errors in your request:

Status Code | Billed | Description
400 | No | Bad request. Either your url or query parameters are incorrectly formatted.
401 | No | You have consumed all your credits. Either turn off your scraper, or upgrade to a larger plan.
403 | No | Either no api_key included on request, or api_key is invalid.

Here is the full list of status codes the Proxy Port returns.
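
Since standard Selenium does not expose HTTP status codes, one way to inspect them is through the requests Selenium Wire captures on the driver. A minimal sketch, assuming the driver was created with seleniumwire_options as in the example above:


## Inspect the status codes of the requests captured by Selenium Wire
for request in driver.requests:
    if request.response:
        print(request.url, request.response.status_code)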

Limiting API Requests

When you use a headless browser to scrape a page, it can generate tens or hundreds of additional requests to download extra JS, CSS and image files and to query external APIs. Each of these counts as an additional request to our Proxy Port, so it is recommended that you configure your Selenium scraper to only make the requests that are absolutely necessary to retrieve the data you want to extract.
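
With Selenium Wire, one way to cut down these extra requests is a request interceptor that aborts unneeded resource types before they are sent. The file extension list below is just an example; adjust it to your target site:


## Block image requests so they don't count against your proxy usage
def interceptor(request):
    if request.path.endswith(('.png', '.jpg', '.jpeg', '.gif', '.svg')):
        request.abort()

driver.request_interceptor = interceptor

## Subsequent page loads will skip the blocked resource types
driver.get('http://quotes.toscrape.com/page/1/')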