Getting Started

ScrapeOps Proxy Aggregator is an easy-to-use proxy that gives you access to the best-performing proxies via a single endpoint. We take care of finding the best proxies, so you can focus on the data.

Authorisation - API Key

To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.

Your API key must be included with every request using the api_key query parameter, otherwise the API will return a 403 Forbidden status code.


Integration Method 1 - API Endpoint

To make requests, you need to send the URL you want to scrape to the ScrapeOps Proxy endpoint https://proxy.scrapeops.io/v1/, adding your API key and URL to the request using the api_key and url query parameters:


curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything"

The ScrapeOps Proxy supports GET and POST requests. For information on how to use POST requests, check out the documentation here.
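
As a rough sketch of a POST request in Python (the exact forwarding behaviour is covered in the POST documentation; the payload below is illustrative only, and we assume the POST body is passed on to the target URL):


import requests
from urllib.parse import urlencode

# Sketch only: assumes the POST body is forwarded to the target URL.
proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'http://httpbin.org/anything',
}

response = requests.post(
    url='https://proxy.scrapeops.io/v1/?' + urlencode(proxy_params),
    json={'example': 'payload'},  # illustrative body, sent on to the target
    timeout=120,
)

print(response.status_code)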

The following is some example Python code to use with the Proxy API:


import requests
from urllib.parse import urlencode

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'http://httpbin.org/ip',
    'render_js': True,
}

response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params=urlencode(proxy_params),  # pre-encode so the target URL is safely escaped
    timeout=120,
)

print('Body: ', response.content)

ScrapeOps will take care of proxy selection and rotation for you, so you just need to send us the URL you want to scrape.

URL Encoding

When using the ScrapeOps Proxy Aggregator API integration method, you should always encode your target URL.

This is because if you send an unencoded URL that contains query parameters, the API may mistake those query parameters for parameters meant for the API itself, rather than part of your URL.

Here is documentation on how to encode URLs in various programming languages.
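
For example, in Python you can encode the target URL with the standard library's urllib.parse module:


from urllib.parse import urlencode, quote

target_url = 'http://httpbin.org/get?foo=bar&baz=qux'

# Option 1: encode just the target URL with quote()
encoded_url = quote(target_url, safe='')

# Option 2: let urlencode() encode all the proxy parameters in one go
query_string = urlencode({
    'api_key': 'YOUR_API_KEY',
    'url': target_url,
})

print('https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=' + encoded_url)
print('https://proxy.scrapeops.io/v1/?' + query_string)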


Integration Method 2 - Proxy Port

For those of you with existing proxy pools, we offer an easy-to-use proxy port solution that takes your requests and passes them through to the API endpoint, which then looks after proxy rotation, captchas, and retries.

The proxy port is a light front-end for the API and has all the same functionality and performance as sending requests to the API endpoint.

The username for the proxy is scrapeops and the password is your API key.


curl -x "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353" "http://httpbin.org/ip"


Here are the individual connection details:

  • Proxy: proxy.scrapeops.io
  • Port: 5353
  • Username: scrapeops
  • Password: YOUR_API_KEY

SSL Certificate Verification

Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.

To enable extra/advanced functionality, you can pass parameters by adding them to the username, separated by periods.

For example, if you want to enable Javascript rendering with a request, the username would be scrapeops.render=true.

Also, multiple parameters can be included by separating them with periods, for example:


curl -x "http://scrapeops.country=us:YOUR_API_KEY@proxy.scrapeops.io:5353" "http://httpbin.org/ip"

Below we have an example of how you would use our proxy port with Python Requests.


import requests

# Route both HTTP and HTTPS traffic through the proxy port.
proxies = {
    "http": "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353",
    "https": "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353",
}

# verify=False stops requests verifying SSL certificates (see the note above).
response = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print(response.text)

Scrapy users can likewise simply pass the proxy details via the meta object.


# ...other scrapy setup code
start_urls = ['http://httpbin.org/ip']
meta = {
    "proxy": "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353"
}

def start_requests(self):
    # Attach the proxy details to every outgoing request via meta
    for url in self.start_urls:
        yield scrapy.Request(url, callback=self.parse, meta=self.meta)

def parse(self, response):
    # ...your parsing logic here
    pass

Scrapy & SSL Certificate Verification

Note: Scrapy skips SSL verification by default so you don't need to worry about switching it off.


Response Formats

The ScrapeOps Proxy Aggregator offers two possible formats:

  1. Target Server Response (Default)
  2. JSON Response

Target Server Response (Default)

The default response from our Proxy API endpoint and Proxy Port is the response returned by the target URL you request.

This response could be in HTML, JSON, XML, etc. format, depending on the response returned by the website's server.

Example response:


<html>
<head>
...
</head>
<body>
...
</body>
</html>

The response will contain the body (HTML, etc.) and any response headers. (Note: cookies aren't returned.)

JSON Response

If you add the parameter json_response=true to your request, then the proxy will return an extended JSON response with additional information about the request and response.

You can use this functionality when you would like to access additional response information such as cookies and XHR requests/responses.

The following parameters are returned:

  • successful: Boolean value indicating if the request was successful or not. true if successful.
  • body: The HTML, JSON, XML, etc. response from the target website.
  • url: The requested URL.
  • status_code: The ScrapeOps status code.
  • sops_api_credits: The number of ScrapeOps API credits consumed for the request.
  • content_type: The content type of the website's response.
  • headers: Any headers returned with the server response.
  • cookies: Any cookies returned with the server response.
  • xhr: An array of the XHR requests/responses made by the headless browser when making the request. Only populated when render_js=true is enabled.

The following is an example response:


{
    "successful": true,
    "url": "https://www.example.com/",
    "content_type": "text/html;charset=UTF-8",
    "sops_api_credits": 1,
    "status_code": 200,
    "headers": {
        "Accept-Ch": "ect,rtt,downlink,device-memory,sec-ch-device-memory,viewport-width,sec-ch-viewport-width,dpr,sec-ch-dpr,sec-ch-ua-platform,sec-ch-ua-platform-version",
        "Content-Language": "en-GB",
        "Content-Security-Policy": "upgrade-insecure-requests;report-uri https://metrics.media-amazon.com/",
        "Content-Type": "text/html;charset=UTF-8",
        "X-Amz-Rid": "C347MNWAT7SE9XS4MJCN",
        "X-Cache": "Miss from cloudfront",
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "SAMEORIGIN",
        "X-Ua-Compatible": "IE=edge",
        "X-Xss-Protection": "1;"
    },
    "xhr": null,
    "cookies": [
        {
            "session-id": "258-3775233-4286404;"
        }
    ],
    "body": "<html><head>...</head><body>...</body></html>"
}
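
For instance, building on the earlier Python example, you could enable json_response=true and read the extra fields like this (a minimal sketch using the documented response keys):


import requests
from urllib.parse import urlencode

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'http://httpbin.org/ip',
    'json_response': 'true',
}

response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params=urlencode(proxy_params),
    timeout=120,
)

data = response.json()
print('Successful: ', data['successful'])
print('Credits used: ', data['sops_api_credits'])
print('Cookies: ', data['cookies'])
html = data['body']  # the target site's response body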


Status Codes

The ScrapeOps Proxy API will return a 200 status code when it successfully gets a response from the website that also passes response validation, or a 404 status code if the website responds with a 404 status code. Both of these status codes are considered successful requests.

Here is the full list of status codes the Proxy API returns.
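
In practice, that means treating both 200 and 404 as final results and only retrying on other status codes. Here is a minimal sketch (the retry count is illustrative):


import requests
from urllib.parse import urlencode

def fetch(target_url, retries=3):
    # 200 and 404 both count as successful responses from the Proxy API,
    # so only retry when any other status code comes back.
    params = urlencode({'api_key': 'YOUR_API_KEY', 'url': target_url})
    response = None
    for _ in range(retries):
        response = requests.get(
            'https://proxy.scrapeops.io/v1/', params=params, timeout=120)
        if response.status_code in (200, 404):
            break
    return response

print(fetch('http://httpbin.org/ip').status_code)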


Request Optimization

Certain domains are very hard to scrape and require you to use more advanced/expensive functionality to scrape them reliably at scale.

The ScrapeOps Proxy API provides automatic Request Optimization functionality that, when enabled, tells the API to find the optimal request settings for your target domain.

Instead of you having to decide which features and proxies to use, the ScrapeOps Proxy API will enable/disable the relevant advanced features for you to give you the best performance at the lowest cost.

To enable Request Optimization, simply add optimize_request=true to your request and the Proxy API will take care of the rest.


curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&optimize_request=true"

For more details on how Request Optimization works then check out the documentation here.
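
If you also want to cap how much an optimized request can cost, the max_request_cost parameter (documented below) can be combined with optimize_request. A sketch in Python:


import requests
from urllib.parse import urlencode

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'http://httpbin.org/anything',
    'optimize_request': 'true',
    'max_request_cost': 30,  # cap this request at 30 API credits
}

response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params=urlencode(proxy_params),
    timeout=120,
)

print(response.status_code)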


Advanced Functionality

To manually enable other API functionality when using the Proxy API endpoint you need to add the appropriate query parameters to the ScrapeOps Proxy URL.

For example, if you want to enable Javascript rendering with a request, then add render_js=true to the request:


curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&render_js=true"

The API will accept the following parameters:

  • json_response: Return an extended JSON response with additional information about the request and response, such as cookies and XHR requests/responses. Example: json_response=true. More info in the Response Formats section above.
  • optimize_request: Request with request optimization enabled. Example: optimize_request=true
  • max_request_cost: Used in conjunction with optimize_request to set the maximum number of API credits a request can use. Example: max_request_cost=30
  • bypass: Request with anti-bot bypass enabled. List of bypasses. Example: bypass=cloudflare_level_1
  • auto_extract: Use maintained parsers to automatically extract data from HTML and return data in JSON format. List of parsers. Example: auto_extract=amazon
  • render_js: Request with Javascript rendering enabled. Example: render_js=true
  • wait: Tell the headless browser to wait a specific period of time before returning the response. Example: wait=3000
  • wait_for: Tell the headless browser to wait for a specific page element to appear before returning the response. Example: wait_for=.loading-done
  • scroll: Tell the headless browser to scroll the page down a defined number of pixels before returning the response. Example: scroll=5000
  • js_scenario: Send a sequence of commands to a headless browser before returning the response. Examples
  • premium: Request using premium proxy pools. Example: premium=true
  • residential: Request using residential proxy pools. Example: residential=true
  • mobile: Request using mobile proxy pools. Example: mobile=true
  • country: Make requests from a specific country. Example: country=us
  • keep_headers: Use your own custom headers when making the request. Example: keep_headers=true
  • device_type: Tell the API to use desktop vs mobile user-agents when making requests. Default is desktop. Example: device_type=mobile
  • session_number: Enable sticky sessions that use the same IP address for multiple requests by setting a session_number. Example: session_number=7
  • follow_redirects: Tell the API not to follow redirects by setting follow_redirects=false.
  • initial_status_code: Tell the API to return the initial status code the website responds with in the headers by setting initial_status_code=true.
  • final_status_code: Tell the API to return the final status code the website responds with in the headers by setting final_status_code=true.
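
Several of these parameters can be combined in a single request. For example, to render a Javascript-heavy page from a US IP address and wait for a specific element to appear:


import requests
from urllib.parse import urlencode

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'http://httpbin.org/anything',
    'render_js': 'true',          # enable Javascript rendering
    'country': 'us',              # route the request through a US IP
    'wait_for': '.loading-done',  # wait for this element before returning
}

response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params=urlencode(proxy_params),
    timeout=120,
)

print(response.status_code)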

Check out this guide to see the full list of advanced functionality available.


Timeout

The ScrapeOps proxy keeps retrying a request for up to 2 minutes before returning a failed response to you.

To use the Proxy correctly, you should set the timeout on your requests to at least 2 minutes, so that you aren't charged for successful requests that timed out on your end before the Proxy API responded.