Proxy API Aggregator Quick Start
ScrapeOps Proxy API Aggregator is an easy-to-use proxy that gives you access to the best-performing proxies via a single endpoint. We take care of finding the best proxies, so you can focus on the data.
Integration Guides
The following are quick start integration guides for the most popular programming languages:
- Python
- Python Scrapy
- Python Selenium
- NodeJs
- NodeJs Puppeteer
- NodeJs Playwright
- PHP
- Java
- Golang
- Go Colly
- Ruby
- C#
- R
Authorisation - API Key
To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.
Your API key must be included with every request using the `api_key` query parameter, otherwise the API will return a `403 Forbidden Access` status code.
Integration Method 1 - API Endpoint
To make requests, you need to send the URL you want to scrape to the ScrapeOps Proxy endpoint `https://proxy.scrapeops.io/v1/`, adding your API key and target URL to the request using the `api_key` and `url` query parameters:
curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything"
The ScrapeOps Proxy supports `GET` and `POST` requests. For information on how to use `POST` requests, check out the documentation here.
The following is some example Python code to use with the Proxy API:

```python
import requests

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'https://httpbin.org/ip',
    'render_js': 'true',
}

# requests encodes the params dict for you, including the target URL.
response = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params=proxy_params,
    timeout=120,
)

print('Body: ', response.content)
```
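For `POST` requests, the linked documentation has the full details; as a hedged sketch, the request body is sent to the proxy endpoint while `api_key` and `url` stay in the query string. Here we only build and inspect the request with `requests.Request(...).prepare()`, without sending it, and the JSON payload is purely illustrative:

```python
import requests

proxy_params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'https://httpbin.org/anything',
}

# Build (but do not send) the request to show how the pieces fit together.
request = requests.Request(
    'POST',
    'https://proxy.scrapeops.io/v1/',
    params=proxy_params,
    json={'name': 'example'},   # body forwarded on to the target URL
).prepare()

print(request.url)   # proxy endpoint with api_key and the encoded target url
print(request.body)  # the JSON payload
```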
ScrapeOps will take care of the proxy selection and rotation for you so you just need to send us the URL you want to scrape.
When using the ScrapeOps Proxy API Aggregator API integration method, you should always encode your target URL.
If you send an unencoded URL that contains query parameters, the API may interpret those parameters as being intended for the API itself rather than as part of your target URL.
Here is documentation on how to encode URLs in various programming languages.
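In Python, `urllib.parse.urlencode` handles this encoding for you. A minimal sketch (the helper name `build_proxy_url` is ours, not part of the ScrapeOps API):

```python
from urllib.parse import urlencode

SCRAPEOPS_ENDPOINT = 'https://proxy.scrapeops.io/v1/'

def build_proxy_url(api_key, target_url):
    # urlencode percent-encodes the target URL, so its own query
    # parameters (e.g. ?q=shoes&page=2) cannot be mistaken for
    # parameters aimed at the proxy API itself.
    return SCRAPEOPS_ENDPOINT + '?' + urlencode({
        'api_key': api_key,
        'url': target_url,
    })

print(build_proxy_url('YOUR_API_KEY', 'https://example.com/search?q=shoes&page=2'))
```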
Integration Method 2 - Proxy Port
For those of you with existing proxy pools, we offer an easy-to-use proxy port solution that takes your requests and passes them through to the API endpoint, which then looks after proxy rotation, captchas, and retries.
The proxy port is a light front-end for the API and has all the same functionality and performance as sending requests to the API endpoint.
The `username` for the proxy is scrapeops and the `password` is your API key.
curl -k -x "http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353" "https://httpbin.org/ip"
Here are the individual connection details:
- Proxy: proxy.scrapeops.io
- Port: 5353
- Username: scrapeops
- Password: YOUR_API_KEY
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra/advanced functionality, pass parameters by appending them to the username, separated by periods.
For example, if you want to enable JavaScript rendering with a request, the username would be `scrapeops.render=true`.
Multiple parameters can also be included by separating them with periods, for example:
curl -k -x "http://scrapeops.render=true.residential=true:YOUR_API_KEY@proxy.scrapeops.io:5353" "https://httpbin.org/ip"
Below is an example of how you would use the proxy port with Python Requests. Note that both the `http` and `https` keys are needed so HTTPS requests are also routed through the proxy:

```python
import requests

proxies = {
    'http': 'http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353',
    'https': 'http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353',
}

# verify=False disables SSL verification, which is required for
# requests routed through the proxy port.
response = requests.get('https://httpbin.org/ip', proxies=proxies, verify=False)
print(response.text)
```
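The period-separated username scheme above can be sketched as a small helper (the function name `proxy_port_url` is ours, not part of any ScrapeOps library):

```python
def proxy_port_url(api_key, **params):
    """Build a proxy-port URL, encoding optional parameters into the
    username as period-separated key=value pairs (e.g. render=true)."""
    username = 'scrapeops'
    for key, value in params.items():
        # booleans become the lowercase strings the API expects
        if isinstance(value, bool):
            value = str(value).lower()
        username += f'.{key}={value}'
    return f'http://{username}:{api_key}@proxy.scrapeops.io:5353'

print(proxy_port_url('YOUR_API_KEY', render=True, residential=True))
# -> http://scrapeops.render=true.residential=true:YOUR_API_KEY@proxy.scrapeops.io:5353
```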
Scrapy users can likewise pass the proxy details via the request's meta dict:

```python
import scrapy

class IpSpider(scrapy.Spider):
    name = 'ip'

    def start_requests(self):
        yield scrapy.Request(
            'https://httpbin.org/ip',
            callback=self.parse,
            meta={'proxy': 'http://scrapeops:YOUR_API_KEY@proxy.scrapeops.io:5353'},
        )

    def parse(self, response):
        # ...your parsing logic here
        pass
```
Note: Scrapy skips SSL verification by default so you don't need to worry about switching it off.
Response Format
After receiving a response from one of our proxy providers, the ScrapeOps Proxy API Aggregator will respond with the raw HTML content of the target URL along with a response code:

```html
<html>
<head>
...
</head>
<body>
...
</body>
</html>
```
The ScrapeOps Proxy API Aggregator will return a `200` status code when it successfully gets a response from the website that also passes response validation, or a `404` status code if the website itself responds with a `404`. Both of these status codes are considered successful requests.
The following is a list of possible status codes:
| Status Code | Billed | Description |
|---|---|---|
| 200 | Yes | Successful response. |
| 404 | Yes | Page requested does not exist. |
| 400 | No | Bad request. Either your url or query parameters are incorrectly formatted. |
| 401 | No | You have consumed all your credits. Either turn off your scraper, or upgrade to a larger plan. |
| 403 | No | Either no api_key was included on the request, the api_key is invalid, or you haven't validated your email address. |
| 429 | No | Exceeded your concurrency limit. |
| 500 | No | After retrying for up to 2 minutes, the API was unable to receive a successful response. |
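As an illustration of how a scraper might act on these codes (the helper `classify_status` and its action names are our own, not part of the API):

```python
def classify_status(status_code):
    """Map a ScrapeOps proxy status code to a suggested scraper action."""
    if status_code in (200, 404):
        return 'success'   # both are billed and considered successful
    if status_code == 429:
        return 'throttle'  # over your concurrency limit: slow down, retry later
    if status_code == 500:
        return 'retry'     # the proxy could not get a response within 2 minutes
    if status_code in (400, 401, 403):
        return 'stop'      # fix the request, credits, or API key first
    return 'unknown'

for code in (200, 404, 429, 500, 401):
    print(code, classify_status(code))
```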
Advanced Functionality
To enable other API functionality when using the Proxy API endpoint you need to add the appropriate query parameters to the ScrapeOps Proxy URL.
For example, if you want to enable JavaScript rendering with a request, add `render_js=true` to the request:
curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&render_js=true"
The API will accept the following parameters:
| Parameter | Description |
|---|---|
| render_js | Request with JavaScript rendering enabled. Example: render_js=true |
| residential | Request using residential proxy pools. Example: residential=true |
| country | Make requests from a specific country. Example: country=us |
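These parameters can be combined on a single API-endpoint request. A sketch in Python (the parameter values are illustrative):

```python
from urllib.parse import urlencode

params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'https://httpbin.org/ip',
    'render_js': 'true',    # enable JavaScript rendering
    'residential': 'true',  # use residential proxy pools
    'country': 'us',        # route the request through the US
}

request_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(params)
print(request_url)
```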
Timeout
The ScrapeOps proxy keeps retrying a request for up to 2 minutes before returning a failed response to you.
To use the Proxy correctly, set the timeout on your request to at least 2 minutes; otherwise you may be charged for a successful request that timed out on your end before the Proxy API responded.
Dashboard
You can monitor your scraping performance using the Proxy Dashboard.
Usage Endpoint
You can programmatically monitor your ScrapeOps Proxy API Aggregator credit consumption and concurrency usage using the usage endpoint.
curl "https://backend.scrapeops.io/v1/proxy/account/usage?api_key=YOUR_API_KEY"
Example response:
```json
{
  "plan_api_credits": 1000000,
  "used_api_credits": 455332,
  "plan_max_concurrency": 100,
  "active_concurrency": 15,
  "plan_renewal_date": "2024-08-24T10:41:04.000Z"
}
```
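Given a payload like the example above, remaining credits and spare concurrency fall out as simple differences. A sketch using the sample values:

```python
# Sample values taken from the example usage-endpoint response.
usage = {
    "plan_api_credits": 1000000,
    "used_api_credits": 455332,
    "plan_max_concurrency": 100,
    "active_concurrency": 15,
}

credits_left = usage["plan_api_credits"] - usage["used_api_credits"]
spare_concurrency = usage["plan_max_concurrency"] - usage["active_concurrency"]

print(f"Credits left: {credits_left}")            # 544668
print(f"Spare concurrency: {spare_concurrency}")  # 85
```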