
Getting Started

ScrapeOps Proxy Aggregator is an easy-to-use proxy that gives you access to the best-performing proxies via a single endpoint. We take care of finding the best proxies, so you can focus on the data.

Authorisation - API Key

To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.

Your API key must be included with every request using the api_key query parameter; otherwise, the API will return a 403 Forbidden Access status code.


Integration Method 1 - API Endpoint

To make requests, send the URL you want to scrape to the ScrapeOps Proxy endpoint https://proxy.scrapeops.io/v1/, adding your API key and the target URL to the request using the api_key and url query parameters:


curl "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything"

The ScrapeOps Proxy supports GET requests (POST requests coming soon).

The following is some example Python code to use with the Proxy API:


import requests

response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params={
        'api_key': 'YOUR_API_KEY',
        'url': 'http://httpbin.org/ip',
    },
)

print('Body: ', response.content)


ScrapeOps will take care of the proxy selection and rotation for you so you just need to send us the URL you want to scrape.
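
If you build the request URL by hand rather than through a params argument, the target URL needs to be percent-encoded so that its own query string is not read as parameters of the proxy request. The following is a minimal sketch using only the Python standard library; the helper name build_api_url is hypothetical and not part of any ScrapeOps SDK.

```python
from urllib.parse import urlencode

# Endpoint taken from the documentation above.
PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_api_url(api_key: str, target_url: str) -> str:
    """Return the proxy endpoint URL with api_key and url as query parameters.

    build_api_url is a hypothetical helper used for illustration only.
    """
    # urlencode percent-encodes the target URL, so characters such as
    # '?' and '&' inside it do not leak into the proxy's own query string.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{PROXY_ENDPOINT}?{query}"


print(build_api_url("YOUR_API_KEY", "http://httpbin.org/anything?a=1&b=2"))
```

Python Requests performs the same encoding automatically when you pass the values through params, as in the example above.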


Integration Method 2 - Proxy Port

Currently In Beta

This feature is currently in beta - please let us know if you see any issues with it and we'll have them fixed ASAP!

For those of you with existing proxy pools, we offer an easy-to-use proxy port that takes your requests and passes them through to the API endpoint, which then looks after proxy rotation, captchas, and retries.

The proxy port is a light front-end for the API and has all the same functionality and performance as sending requests to the API endpoint.

The username for the proxy is scrapeops and the password is your API key.


curl -x "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353" "http://httpbin.org/ip"


SSL Certificate Verification

Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.

To enable extra/advanced functionality, you can pass parameters by adding them to the username, separated by periods.

For example, if you want to enable JavaScript rendering with a request, the username would be scrapeops.render_js=true.

Also, multiple parameters can be included by separating them with periods, for example:


curl -x "http://scrapeops.render_js=true.country=us:YOUR_API_KEY@proxy.scrapeops.io:5353" "http://httpbin.org/ip"
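
The username format described above can also be assembled programmatically. This is a small sketch under the assumption that every parameter is written as key=value and joined to scrapeops with periods, as described above. The helper name build_proxy_url is hypothetical, and values that themselves contain periods are not handled because the documentation does not say how they should be escaped.

```python
# Host and port taken from the curl examples above.
PROXY_HOST = "proxy.scrapeops.io"
PROXY_PORT = 5353


def build_proxy_url(api_key: str, **params: str) -> str:
    """Return a proxy URL whose username carries the extra parameters.

    build_proxy_url is a hypothetical helper used for illustration only.
    """
    # The username always starts with 'scrapeops'; each key=value pair
    # is appended after a period, in the order the caller supplied it.
    parts = ["scrapeops"] + [f"{key}={value}" for key, value in params.items()]
    username = ".".join(parts)
    return f"http://{username}:{api_key}@{PROXY_HOST}:{PROXY_PORT}"


print(build_proxy_url("YOUR_API_KEY", render_js="true", country="us"))
```

The resulting string can be used wherever a proxy URL is expected, for example as a value in the proxies dictionary of Python Requests.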

Below we have an example of how you would use our proxy port with Python Requests.


import requests

proxies = {
    "http": "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353"
}
response = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print(response.text)

Scrapy users can likewise simply pass the proxy details via the meta object.


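import scrapy

# ...other scrapy setup code

class ProxySpider(scrapy.Spider):
    # Spider class name and name attribute are placeholders
    name = 'proxy_example'
    start_urls = ['http://httpbin.org/ip']
    meta = {
        "proxy": "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353"
    }

    def start_requests(self):
        # Attach the proxy details to every outgoing request
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, meta=self.meta)

    def parse(self, response):
        # ...your parsing logic here
        yield {'body': response.text}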

Scrapy & SSL Certificate Verification

Note: Scrapy skips SSL verification by default so you don't need to worry about switching it off.


Response Format

After receiving a response from one of our proxy providers, the ScrapeOps Proxy API responds with the raw HTML content of the target URL along with a status code:


<html>
<head>
...
</head>
<body>
...
</body>
</html>

The ScrapeOps Proxy API returns a 200 status code when it receives a response from the website that also passes response validation, or a 404 status code if the website itself responds with a 404 status code. Both of these status codes are considered successful requests.
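
In client code, this means a 404 should be treated as a valid answer about the target page rather than as a proxy failure. The sketch below shows one way to classify status codes; the function name and the returned labels are hypothetical, and only the 200, 404 and 403 cases come from this page. All other codes are simply reported as failed here.

```python
# Status codes this page describes as successful requests.
SUCCESS_CODES = {200, 404}


def classify_status(status_code: int) -> str:
    """Map a Proxy API status code to a coarse label.

    classify_status and its labels are hypothetical, for illustration only.
    """
    if status_code in SUCCESS_CODES:
        # 200: validated response; 404: the target site itself returned 404.
        return "success"
    if status_code == 403:
        # Returned when the api_key query parameter is missing.
        return "forbidden: check api_key"
    # Anything else is not covered on this page; treat it as a failure.
    return "failed"


for code in (200, 404, 403, 500):
    print(code, classify_status(code))
```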

Here is the full list of status codes the Proxy API returns.


Request Optimization

Certain domains are very hard to scrape and require you to use more advanced/expensive functionality to scrape them reliably at scale.

The ScrapeOps Proxy API provides automatic Request Optimization which, when enabled, tells the API to find the optimal request settings to give you the best performance at the lowest cost.

Instead of you having to decide which features and proxies to use, the ScrapeOps Proxy API will enable/disable the following features for you to give you the best performance at the lowest cost:

To enable Request Optimization, simply add optimize_request=true to your request and the Proxy API will take care of the rest.


curl "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&optimize_request=true"

For more details on how Request Optimization works, check out the documentation here.


Advanced Functionality

To manually enable other API functionality when using the Proxy API endpoint, add the appropriate query parameters to the ScrapeOps Proxy URL.

For example, if you want to enable JavaScript rendering with a request, then add render_js=true to the request:


curl "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&render_js=true"

The API will accept the following parameters:

Parameter             Description
optimize_request      Request with request optimization enabled. Example: optimize_request=true
max_request_cost      Used in conjunction with optimize_request to set the maximum number of API credits a request can use. Example: max_request_cost=30
bypass                Request with anti-bot bypass enabled. List of bypasses. Example: bypass=cloudflare
auto_extract          Use maintained parsers to automatically extract data from HTML and return data in JSON format. List of parsers. Example: auto_extract=amazon
render_js             Request with JavaScript rendering enabled. Example: render_js=true
wait                  Tell headless browser to wait a specific period of time before returning response. Example: wait=3000
wait_for              Tell headless browser to wait for a specific page element to appear before returning response. Example: wait_for=.loading-done
scroll                Tell headless browser to scroll the page down a defined number of pixels before returning the response. Example: scroll=5000
js_scenario           Send a sequence of commands to a headless browser before returning the response. Examples
premium               Request using premium proxy pools. Example: premium=true
residential           Request using residential proxy pools. Example: residential=true
mobile                Request using mobile proxy pools. Example: mobile=true
country               Make requests from a specific country. Example: country=us
keep_headers          Use your own custom headers when making the request. Example: keep_headers=true
device_type           Tell API to use desktop vs mobile user-agents when making requests. Default is desktop. Example: device_type=mobile
session_number        Enable sticky sessions that use the same IP address for multiple requests by setting a session_number. Example: session_number=7
follow_redirects      Tell API to not follow redirects by setting follow_redirects=false.
initial_status_code   Tell API to return the initial status code the website responds with in the headers by setting initial_status_code=true.
final_status_code     Tell API to return the final status code the website responds with in the headers by setting final_status_code=true.

Check out this guide to see the full list of advanced functionality available.
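
When several of these parameters are combined in Python, two details are easy to get wrong: a misspelled parameter name, and Python booleans, which serialise as True/False rather than the lowercase true/false used in the examples on this page. The sketch below guards against both. The helper name build_params is hypothetical, and the set of names is copied from the table above, so it would need updating if the API adds parameters.

```python
# Parameter names copied from the table above (may go stale over time).
KNOWN_PARAMETERS = {
    "optimize_request", "max_request_cost", "bypass", "auto_extract",
    "render_js", "wait", "wait_for", "scroll", "js_scenario", "premium",
    "residential", "mobile", "country", "keep_headers", "device_type",
    "session_number", "follow_redirects", "initial_status_code",
    "final_status_code",
}


def build_params(api_key: str, target_url: str, **options) -> dict:
    """Return a query-parameter dict for the Proxy API endpoint.

    build_params is a hypothetical helper used for illustration only.
    """
    unknown = set(options) - KNOWN_PARAMETERS
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")

    params = {"api_key": api_key, "url": target_url}
    for name, value in options.items():
        if isinstance(value, bool):
            # Write booleans as lowercase 'true'/'false', matching the docs.
            params[name] = str(value).lower()
        else:
            params[name] = str(value)
    return params


print(build_params(
    "YOUR_API_KEY",
    "http://httpbin.org/anything",
    optimize_request=True,
    max_request_cost=30,
))
```

The returned dictionary can be passed as the params argument of requests.get together with the endpoint URL.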


Timeout

The ScrapeOps proxy keeps retrying a request for up to 2 minutes before returning a failed response to you.

To use the Proxy correctly, set the timeout on your requests to at least 2 minutes. Otherwise you may be charged for a successful request that you timed out on your end before the Proxy API responded.
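
A client-side timeout of 120 seconds can be set as in the sketch below. It uses only the Python standard library so that it runs without extra packages; the function name fetch and the constant TIMEOUT_SECONDS are hypothetical. Note that urlopen applies the timeout to each blocking socket operation rather than to the request as a whole, and that a timeout raises an exception which this sketch does not catch.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"
# At least the 2-minute retry window described above.
TIMEOUT_SECONDS = 120


def fetch(api_key: str, target_url: str, timeout: float = TIMEOUT_SECONDS):
    """Request target_url through the proxy and return (status_code, body).

    fetch is a hypothetical helper used for illustration only.
    """
    query = urlencode({"api_key": api_key, "url": target_url})
    request_url = f"{PROXY_ENDPOINT}?{query}"
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; return the code and body
        # so the caller can still inspect them (for example a 404).
        return err.code, err.read()
```

With Python Requests, the equivalent is to pass timeout=120 to requests.get.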