Oxylabs Residential Proxies: Web Scraping Guide
Oxylabs is a leading provider of premium proxy services and web scraping solutions. They offer a vast pool of IP addresses to help businesses gather data without getting blocked. Their services include residential proxies, data center proxies, and web scraping tools.
This guide aims to help you understand and effectively use Oxylabs residential proxies for web scraping through detailed instructions, code examples, and practical tips.
- TLDR: How to Integrate Oxylabs Residential Proxy?
- Understanding Residential Proxies
- Oxylabs Residential Proxy Pricing
- Setting Up Oxylabs Residential Proxies
- Authentication
- Basic Request Using Oxylabs Residential Proxies
- Country Geotargeting
- City Geotargeting
- How to Use Static Proxies
- Error Codes
- KYC (Know-Your-Customer) Verification
- Implementing Oxylabs Residential Proxies in Web Scraping
- Case Study: Scrape Amazon Prices with Oxylabs Proxies
- Alternative: ScrapeOps Residential Proxy Aggregator
- Ethical Considerations and Legal Guidelines
- Conclusion
- More Web Scraping Guides
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
TLDR: How to Integrate Oxylabs Residential Proxy?
For a quick integration guide:
- Install Python Requests:

```bash
pip install requests
```

- Set Up Oxylabs Residential Proxy:

```python
import requests

username = "customer-USER"
password = "Your_password"
proxy = "pr.oxylabs.io:7777"

proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}

response = requests.get("https://example.com", proxies=proxies)
print(response.text)
```
In this script, we:
- Set Credentials and Proxy: We enter our Oxylabs credentials (`username` and `password`) and the proxy address (`pr.oxylabs.io:7777`).
- Configure Proxies: Next, we create the `proxies` dictionary to set up the HTTP and HTTPS proxies using our credentials.
- Send a Request: Then, we use the `requests.get` method to send a request to the target website (`https://example.com`) through the configured proxies.
- Print the Response: Finally, we print the response text from the target website.
Understanding Residential Proxies
Residential proxies are useful for various online activities where anonymity or geo-restricted access is important.
Here’s a breakdown of what residential proxies are and why they are valuable:
What Are Residential Proxies?
Residential proxies act as intermediaries between you and target websites, making your web traffic appear as if it’s coming from a real residential address.
When you use a residential proxy, websites see your requests as coming from a legitimate user, increasing your chances of bypassing blocks and avoiding detection.
Why Are Residential Proxies Important?
Residential proxies offer a layer of anonymity and security, essential for various online activities. They help you:
- Bypass Geo-Restrictions: Access content restricted to certain locations.
- Avoid IP Bans: Appear as different users, reducing the risk of being blocked.
- Improve Data Accuracy: Gather more accurate data for market research and analysis.
Types of Residential Proxies
To better understand the options available, let’s compare the two main types of residential proxies: rotating and static.
Rotating residential proxies and static residential proxies differ primarily in how they manage IP addresses.
- Rotating Residential Proxies: Rotating residential proxies automatically change the IP address at regular intervals or with each request. This means each request appears to come from a different IP address.
- Static Residential Proxies: Static residential proxies, on the other hand, maintain the same IP address for the duration of the session or until it is manually changed.
Here’s a breakdown of their features:
Feature | Rotating Residential Proxies | Static Residential Proxies |
---|---|---|
IP Address | Changes with each request | Remains the same for an extended period |
Anonymity | High | Moderate |
Speed | Slower due to rotation process | Generally faster |
Management | Complex | Simpler |
Risk of Detection | Lower | Higher |
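In practice, the difference shows up in how you authenticate each request. With Oxylabs, rotation is the default behavior, while sticky sessions are typically requested through a session flag in the username. The sketch below illustrates the pattern; the `sessid` flag is an assumption based on Oxylabs' documented username parameters, so check your dashboard docs for the exact syntax:

```python
import requests

username = "customer-USER"
password = "Your_password"
proxy = "pr.oxylabs.io:7777"

# Rotating (default): the pool may assign a new exit IP on every request.
rotating = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}',
}

# Sticky session (assumed `sessid` flag): requests that share a session ID
# keep the same exit IP while the session stays alive.
sticky_user = f"{username}-sessid-abc12345"
sticky = {
    'http': f'http://{sticky_user}:{password}@{proxy}',
    'https': f'http://{sticky_user}:{password}@{proxy}',
}

for label, proxies in [("rotating", rotating), ("sticky", sticky)]:
    for _ in range(2):
        r = requests.get("https://ip.oxylabs.io/location", proxies=proxies)
        print(label, r.text)
```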
Residential vs. Data Center Proxies
Now, let’s look at how residential proxies compare with data center proxies. This will help you decide which type suits your needs better.
Residential proxies and datacenter proxies differ significantly in their origins and how they are perceived by websites. Residential proxies use IP addresses assigned by Internet Service Providers (ISPs) to homeowners, meaning they are associated with real residential locations.
In contrast, datacenter proxies originate from data centers and cloud service providers rather than ISPs. These IP addresses are not tied to a physical location or residential address.
Feature | Residential Proxies | Data Center Proxies |
---|---|---|
IP Source | Real residential addresses from ISPs | Data centers and cloud service providers |
Anonymity | High | Lower |
Speed | Variable, often slower | Generally faster |
Cost | Higher | Lower |
Detection Risk | Lower | Higher |
Effectiveness for Geo-Access | High | Lower |
When Are Residential Proxies Useful?
Residential proxies are beneficial in various scenarios:
- Web Scraping and Data Collection: Ensure you get accurate data without being blocked by anti-scraping measures.
- SEO and SERP Analysis: Gather precise search engine results from different locations.
- Social Media Monitoring: Track social media activities and trends without being flagged.
- Ad Verification: Check if your ads are displayed correctly in different regions.
- Geo-Restricted Content Access: Stream content or access websites restricted to specific regions.
Why Choose Oxylabs Residential Proxies?
Oxylabs is a well-known provider of residential proxies, offering a reliable and high-performance solution for users who need access to a vast pool of real IP addresses.
Here are several reasons why Oxylabs' residential proxies stand out:
- Largest Residential Proxy Network: Oxylabs boasts one of the largest residential proxy networks in the world, with over 100 million IPs from real residential locations across 195 countries.
- High Anonymity and Security: With Oxylabs residential proxies, you can maintain a high level of anonymity as their IPs are sourced from legitimate residential devices.
- Global Coverage: Oxylabs provides residential proxies in nearly every country, ensuring that users have access to a broad range of geographic locations.
- Easy Integration and Use: Oxylabs’ residential proxies are easy to integrate into a wide range of applications and platforms.
Oxylabs is an excellent choice for anyone looking for reliable and high-quality residential proxies. With its massive proxy pool, global coverage, flexible pricing, and robust support, it caters to both small users and enterprises.
Let's check out the pricing of Oxylabs.
Oxylabs Residential Proxy Pricing
Oxylabs offers several pricing plans for their Residential Proxies, catering to different needs.
Pricing Table
Here is the updated pricing structure of Oxylabs residential proxies:
Plan | Price per GB | Monthly Price |
---|---|---|
7-day free trial (for verified company registrations) | FREE | FREE |
Pay as you go (Up to 50GB per month) | $8 /GB | No Commitment |
13 GB | $7.75 /GB | $99 + VAT / Billed monthly |
40 GB | $7.5 /GB | $300 + VAT / Billed monthly |
86 GB | $6.98 /GB | $600 + VAT / Billed monthly |
133 GB | $6.02 /GB | $800 + VAT / Billed monthly |
318 GB | $5.5 /GB | $1,750 + VAT / Billed monthly |
1 TB | $4 /GB | $4,000 + VAT / Billed monthly |
The above pricing structure shows that charges are primarily based on the bandwidth used. Each plan specifies a price per gigabyte (GB), allowing you to pay according to your data consumption.
For example, the "Pay as you go" plan costs $8 per GB, while larger plans like the 1 TB plan are priced at $4 per GB.
The Pay-As-You-Go plan offers flexibility, letting you pay based on the amount of data you use without committing to a fixed monthly fee. This is ideal for you if you have variable data needs or you prefer not to commit to a monthly subscription.
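Because billing is bandwidth-based, you can sanity-check what a scraping job will cost before committing to a plan. Here's a rough sketch; the rates are taken from the table above, and VAT and plan minimums are ignored:

```python
# Price per GB in USD, from the pricing table above
PLAN_RATES = {
    "pay_as_you_go": 8.00,
    "13gb": 7.75,
    "40gb": 7.50,
    "86gb": 6.98,
    "133gb": 6.02,
    "318gb": 5.50,
    "1tb": 4.00,
}

def estimate_cost(gb_used: float, plan: str) -> float:
    """Rough monthly cost estimate, ignoring VAT and plan minimums."""
    return gb_used * PLAN_RATES[plan]

# Example: ~200,000 pages at ~250 KB each is about 50 GB of traffic
gb = 200_000 * 250 / 1_000_000
print(f"{gb:.0f} GB on pay-as-you-go: ${estimate_cost(gb, 'pay_as_you_go'):,.2f}")
```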
Pricing Comparison
Generally speaking, proxy plans priced around $2-3 per GB are considered cheap, while smaller plans in the $6-8 per GB range sit at the more expensive end of the market.
For a detailed comparison of Oxylabs and other residential proxy providers, you can use our Proxy Comparison page. Check it out here.
Setting Up Oxylabs Residential Proxies
Creating an Oxylabs Account
To get started, visit the registration page. You can register with your Google account or with a username, email, and password. We'll go with option 2.
Fill in your details then click on the "Register" button.
After you complete the registration, Oxylabs will send a verification link to the email address you just registered.
Check your email inbox for an unread message titled "Activate your Oxylabs account", then click the "Activate your account" button in the body of the message.
And voila, you have completed the registration!
Now you can subscribe to residential proxies.
Purchase Residential Proxies from Oxylabs
Visit the pricing page. Select a proxy plan that suits you and click on the respective "Buy now" button. For example, let's go with the Pay-As-You-Go plan.
After clicking the "Buy now" button, we are redirected to another page to specify the type of user we are and if we have changed our mind, we can pick a different plan.
We are "Regular" user and will still go with the Pay-As-You-Go plan. Choose 8 GB and re-click the "Buy now" button.
Go with the default of "1GB" traffic and click on the "Continue" button.
Select payment method, agree with terms and conditions then click on the "Continue" button.
Fill in your payment details then click on the "Pay" button.
After successful payment, you will be redirected to a page to create your proxy user.
Alternatively, you can create or update your proxy users from your dashboard.
Now you can authenticate your requests as shown below.
Authentication
When using Oxylabs to access proxy services, you have two primary methods for authenticating your requests:
- username and password or
- whitelisting an IP address.
These methods ensure that your requests are securely routed through Oxylabs' proxy servers.
Method 1: Username & Password Authentication
This method involves providing your Oxylabs account's username and password as part of your proxy configuration. This is a straightforward approach and is ideal when you need to use the proxy from multiple locations or devices.
Here’s a step-by-step guide to using username and password authentication:
- Install the required packages: Ensure you have the `requests` library and `python-dotenv` to manage environment variables.

```bash
pip install requests python-dotenv
```

- Set up environment variables: Store your Oxylabs username and password in a `.env` file for security. Create a `.env` file in your project directory and store your proxy user credentials:

```
OXYLABS_USERNAME=your_oxylabs_username
OXYLABS_PASSWORD=your_oxylabs_password
```

- Load environment variables and configure the proxy: Use the `dotenv` library to load these variables into your script and configure the proxy settings.

```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()

username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")
proxy = "pr.oxylabs.io:7777"

proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}

response = requests.get("https://example.com", proxies=proxies)
print(response.text)
```

- Run your script: Execute your script to send a request through the Oxylabs proxy with authentication.

Note: Despite setting everything up correctly, you may encounter a 407 error the first time you send requests with a new proxy user's credentials.
If that happens, change the proxy user's password and rerun your script. For more error codes and their solutions, see the Error Codes section.
Method 2: Whitelisting an IP Address
Alternatively, you can whitelist your IP address with Oxylabs. This method is beneficial if you are accessing the proxy from a static IP address and prefer not to use credentials in your code.
Here’s how you can set up IP whitelisting:
- Log in to the Oxylabs Dashboard: First, log in to your Oxylabs account.
- Navigate to Whitelisting: On the left-hand side, select Residential Proxies and then Whitelist.
- Edit Whitelist: Click on Edit whitelist.
- Add IP Addresses: Enter up to 10 IP addresses in IPv4 format (xx.xx.xx.xx) and click "Submit".
Note: Ensure that these IP addresses are yours and that you are not using a proxy or VPN service when adding them. The SOCKS5 protocol does not support whitelisted IPs. Use other supported protocols for whitelisting.
Finding Your IP Address
Before whitelisting, you need to know your current IP address. Disconnect from any proxies or VPNs, then visit Oxylabs IP Location. The page will display your current IP address in JSON format.
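If you prefer to check programmatically, a short script works too. Here's a minimal sketch, assuming the `ip.oxylabs.io/location` endpoint behind the Oxylabs IP Location page linked above:

```python
import requests

# Fetch your current public IP and location; no proxy is configured on purpose,
# since this should return the IP you intend to whitelist.
response = requests.get("https://ip.oxylabs.io/location", timeout=10)
print(response.json())
```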
Code Example Using Whitelisted IP
Once you have whitelisted your IP addresses, you can configure your proxy without providing credentials:
```python
import requests

# Configure proxy settings with whitelisted IP
proxy = "pr.oxylabs.io:7777"
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}

# Make a request through the Oxylabs residential proxy
response = requests.get("https://example.com", proxies=proxies)

# Print the response content
print(response.text)
```
In the script above:
- We set up the `proxies` dictionary to configure the HTTP and HTTPS proxies using only the proxy address, as your IP is already whitelisted.
- We use the `requests.get` function to send a GET request to `https://example.com` through the configured proxy. The response from the server is printed to the console.
Basic Request Using Oxylabs Residential Proxies
To make requests using Oxylabs' residential proxies, you need to configure your request to route through their proxies.
Here’s a straightforward process to make requests using Oxylabs residential proxies:
- Set Up Your Environment: Ensure you have the `requests` library installed, which allows you to send HTTP requests easily.

```bash
pip install requests
```

- Configure Proxy Settings: Use your Oxylabs credentials (username and password) and proxy address to configure the `requests` library. You need to specify the proxy server in the request settings.
- Make a Request: Send HTTP requests through the configured proxy to access web resources.
Code Example Using Python Requests
Let’s look at an example of how we can configure and use Oxylabs residential proxies in a Python script with the `requests` library:
```python
import requests
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Retrieve Oxylabs credentials from environment variables
username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")
proxy = "pr.oxylabs.io:7777"

# Configure proxy settings
proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}

# Make a request through the Oxylabs residential proxy
response = requests.get("https://datadome.co/", proxies=proxies)

# Print the response content
print(response.text)
```
In the script above:
- We use the `dotenv` library to load our Oxylabs username and password from a `.env` file. This practice keeps our credentials secure and out of our code.
- Then, we set up the `proxies` dictionary to configure the HTTP and HTTPS proxies using our Oxylabs credentials and proxy address.
- After that, we use the `requests.get` function to send a GET request to `https://datadome.co/` through the configured proxy.
- Finally, we print the response from the server to the console.
Handling Proxy Errors
When using proxies, you may encounter errors such as connection timeouts or proxy failures. To handle these gracefully:
```python
try:
    response = requests.get("https://example.com", proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error occurred. Please check your proxy settings.")
except requests.exceptions.Timeout:
    print("The request timed out. Try increasing the timeout or check your internet connection.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```
This code catches common exceptions, providing informative messages that help diagnose and fix issues with proxy configurations.
Country Geotargeting
Geotargeting allows you to connect to proxy servers in specific geographic locations, enabling you to bypass geo-restrictions and access content as if you were a local user in that area.
This capability is particularly useful for tasks such as market research, competitor analysis, and testing localized content.
Oxylabs provides extensive support for country-level geotargeting with their residential proxies. Their network covers 195 countries, offering a wide range of options for accessing location-specific content.
Top 10 Countries Supported by Oxylabs
Here's a table showing 10 popular countries supported by Oxylabs, along with their specific entry nodes:
Country | Entry Node |
---|---|
USA | us-pr.oxylabs.io:10000 |
Canada | ca-pr.oxylabs.io:30000 |
Great Britain | gb-pr.oxylabs.io:20000 |
Germany | de-pr.oxylabs.io:30000 |
France | fr-pr.oxylabs.io:40000 |
Spain | es-pr.oxylabs.io:10000 |
Italy | it-pr.oxylabs.io:20000 |
Sweden | se-pr.oxylabs.io:30000 |
Greece | gr-pr.oxylabs.io:40000 |
Portugal | pt-pr.oxylabs.io:10000 |
Using Country-Specific Proxies
To use country-specific proxies with Oxylabs, you have two main options:
- Country-Specific Entry Nodes: You can connect to a specific country's proxy pool by using the country's dedicated entry node. For example, to use a proxy from the USA, you would connect to `us-pr.oxylabs.io:10000`.
- Country Code Parameter: Alternatively, you can add a `cc` flag to the authorization header, specifying the desired country code. This method allows you to use the main entry point while still targeting a specific country.
Let's examine both approaches:
Method A: Country-Specific Entry Nodes
```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()

username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")

# Target country (Germany)
proxy = "de-pr.oxylabs.io:30000"

proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}

response = requests.get("https://example.com", proxies=proxies)
print(response.text)
```
Method B: Country Code Parameter
```python
from dotenv import load_dotenv
import os
import requests

load_dotenv()

username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")

# Target country (Germany)
country = 'DE'
entry = f'http://customer-{username}-cc-{country}:{password}@pr.oxylabs.io:7777'

proxies = {
    'http': entry,
    'https': entry,
}

response = requests.get('https://example.com', proxies=proxies)
print(response.text)
```
Both methods achieve the same goal of routing requests through a proxy in a specific country (Germany in these examples). However, they differ in how they specify the target country:
- Method A uses a country-specific entry node (`de-pr.oxylabs.io:30000` for Germany).
- Method B uses the general entry point (`pr.oxylabs.io:7777`) and specifies the country in the username, where `cc` stands for "country code".
Both methods are valid and have their use cases. Method A might be preferred when consistently targeting the same country, while Method B offers more flexibility for frequently changing target countries.
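Method B also makes it easy to cycle through several target countries without changing endpoints. Here's a brief sketch of that pattern, reusing the Method B setup from above (the country list is just an example):

```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()
username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")

def proxies_for(country: str) -> dict:
    # Build the Method B proxy URL with the `cc` flag for the given country
    entry = f'http://customer-{username}-cc-{country}:{password}@pr.oxylabs.io:7777'
    return {'http': entry, 'https': entry}

for country in ['DE', 'FR', 'ES']:
    response = requests.get('https://example.com', proxies=proxies_for(country))
    print(country, response.status_code)
```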
City Geotargeting
City-level geotargeting allows you to connect to proxy servers in specific cities, enabling you to access hyper-local content and conduct precise market research.
This capability is particularly useful for tasks such as local SEO analysis, city-specific price monitoring, and testing localized advertising campaigns.
Oxylabs supports city-level targeting within their residential proxy network, offering an impressive level of granularity for your geotargeting needs.
Top 10 Cities Supported by Oxylabs
While Oxylabs supports every city in the world, here's a table showing 10 popular cities you can target, along with their country codes:
City | Country | Code Parameter |
---|---|---|
New York | US | cc-US-city-new_york |
London | GB | cc-GB-city-london |
Paris | FR | cc-FR-city-paris |
Berlin | DE | cc-DE-city-berlin |
Madrid | ES | cc-ES-city-madrid |
Rome | IT | cc-IT-city-rome |
Stockholm | SE | cc-SE-city-stockholm |
Athens | GR | cc-GR-city-athens |
Lisbon | PT | cc-PT-city-lisbon |
Toronto | CA | cc-CA-city-toronto |
It's important to note that while Oxylabs supports all cities worldwide, the availability of proxies in a specific city at any given time may vary due to the dynamic nature of residential proxies.
Using City-Specific Proxies
To use city-specific proxies with Oxylabs, you need to add both the country code (`cc`) and `city` parameters to your request. The format is as follows:

`cc-[COUNTRY_CODE]-city-[CITY_NAME]`

For example, to target Munich, Germany, you would use: `cc-DE-city-munich`
Here's a Python code example demonstrating how to use Oxylabs residential proxies with city-specific targeting:
```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()

username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")

# Target city (Munich, Germany)
country = 'DE'
city = 'munich'
entry = f'http://customer-{username}-cc-{country}-city-{city}:{password}@pr.oxylabs.io:7777'

proxies = {
    'http': entry,
    'https': entry,
}

response = requests.get('https://example.com', proxies=proxies)
print(response.text)
```
Explanation:
- We import the necessary libraries and load environment variables.
- We set up our Oxylabs credentials and specify the target country and city (Munich, Germany in this case).
- We configure the proxy URL, including both the country code and city in the username.
- We make a request to `https://example.com` using the configured proxy.
- Finally, we print the response.
By changing the `country` and `city` variables, you can easily target different cities for your requests. This allows you to access city-specific content or test your applications from various urban locations around the world.
Remember that while Oxylabs supports a vast number of cities, the availability of proxies in very specific locations may vary. For a complete list of supported cities, you can refer to the City_list.csv file provided by Oxylabs.
How to Use Static Proxies
Static proxies, also known as ISP proxies, offer a powerful combination of residential and datacenter proxies' strengths. These proxies give you a consistent IP address that remains the same across sessions, providing the high anonymity of residential proxies and the speed of datacenter proxies.
You'll find static proxies particularly useful when you need reliability and a stable IP for long-term tasks.
Key Benefits of Static Proxies
By using static proxies, you gain several advantages:
- High Speed: Since static proxies originate from datacenter infrastructure, you get fast and efficient connections.
- Enhanced Anonymity: With IPs assigned by ISPs, these proxies provide a higher level of legitimacy and anonymity.
- Reliability: You can count on static proxies for consistent performance, thanks to their ISP-backed infrastructure.
- No IP Rotation: You don't have to worry about rotating IPs, which simplifies long sessions or tasks that require a consistent IP.
- Private Access: You can set up static proxies as private, ensuring that only you use that specific IP address.
Common Use Cases for Static Proxies
Static proxies are particularly beneficial when IP rotation isn't an option or could cause disruptions:
- E-commerce Activities: When you're making purchases or managing accounts on e-commerce sites, IP rotation might lead to blocks or bans. Static proxies help you maintain a continuous session, preventing such issues.
- Social Media Management: If you need to create and manage multiple social media accounts, static proxies provide a stable IP that avoids re-authentication problems.
- Brand Protection: You can use static proxies to monitor the web for brand abuse, like copyright infringement, without being detected as a bot.
- Web Scraping: Static proxies enable you to scrape the web quickly and reliably, while appearing as a real user, making it less likely to be flagged or blocked.
Example of Using Static Proxies with Python
Here’s how we can set up and use static proxies in Python:
```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()

username = os.getenv("OXYLABS_USERNAME")
password = os.getenv("OXYLABS_PASSWORD")

# Static proxy server configuration
proxy = "isp.oxylabs.io:8001"

proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}

# Making a request through the static proxy
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
```
Explanation:
- We start by importing the necessary libraries and loading our environment variables.
- We then set up the static proxy configuration, using a proxy server (`isp.oxylabs.io:8001`). This server ensures that our IP remains consistent across all sessions. You can buy Oxylabs Static IP proxies here.
- Finally, we make a request to `https://example.com` through the static proxy and print the response.
By using static proxies in this way, we combine speed, anonymity, and reliability, making them a great choice for various online tasks.
Error Codes
When using a proxy, you might encounter various HTTP error codes that indicate issues with your connection.
Below is a list of common proxy error codes along with explanations and suggested solutions to help you manage and resolve these issues effectively:
Error Code | Explanation | Solution |
---|---|---|
100 - Continue | The server has received the request header, and you can proceed with sending the body of the request. | Typically, no action is needed unless additional instructions are given. |
101 - Switching Protocols | The server is switching communication protocols as requested by the client. | No action needed; the server has acknowledged the protocol switch. |
102 - Processing (WebDav) | The server is processing a complex request and has not yet completed it. | Wait for the server to complete the request processing. |
103 - Early Hints | The server is about to send a final response and provides preliminary information. | No action needed; the final response will follow. |
301 - Moved Permanently | The requested resource has been permanently moved to a new URL. | Follow the new URL provided by the server. |
305 - Use Proxy | The requested resource can only be accessed via a proxy. | Connect to the specified proxy server and retry the request. |
306 - Switch Proxy | The client should use a different proxy server for the request. | Connect using a different proxy server. |
307 - Temporary Redirect | The client is temporarily redirected to a different location. | Follow the redirect and make the request again. |
400 - Bad Request | The request contains errors or malformed syntax. | Review and correct the request, then try again. |
401 - Unauthorized | Authentication is required to access the resource. | Provide the necessary authorization details. |
403 - Forbidden | Access to the requested resource is forbidden. | Verify permissions and credentials, and ensure you are authorized to access the resource. |
404 - Not Found | The requested resource is not available at the specified URL. | Double-check the URL and try again. |
407 - Proxy Authentication Required | Authentication with the proxy server is required. | Update proxy server settings with correct credentials and whitelisted IPs. |
408 - Request Timeout | The server timed out waiting for the client’s request. | Check your internet connection and retry the request. |
429 - Too Many Requests | Too many requests have been sent in a short period from the same IP. | Rotate IP addresses and introduce time delays between requests. |
502 - Bad Gateway | The proxy or gateway received an invalid response from the upstream server. | Clear cache and cookies, and try changing DNS settings. |
503 - Service Unavailable | The server is currently unable to handle the request, possibly due to overload or maintenance. | Rotate your IP address or try using a different proxy server. |
Understanding these error codes and implementing the appropriate solutions will help ensure smoother operations and fewer interruptions during your scraping activities.
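Many of these errors are transient, so it often pays to wrap your requests in a simple retry policy. Below is a minimal sketch; the set of retried status codes and the backoff values are illustrative choices, not Oxylabs recommendations:

```python
import time
import requests

RETRYABLE_STATUS_CODES = {407, 408, 429, 502, 503}

def fetch_with_retries(url, proxies, max_retries=3):
    """Retry transient proxy errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            response = requests.get(url, proxies=proxies, timeout=10)
            if response.status_code not in RETRYABLE_STATUS_CODES:
                return response
            print(f"Got {response.status_code}, retrying...")
        except (requests.exceptions.ProxyError, requests.exceptions.Timeout) as e:
            print(f"Transient error: {e}")
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError(f"Giving up on {url} after {max_retries} retries")
```

With rotating residential proxies, each retry typically exits from a new IP, which is often enough to clear a 429 or 503 on its own.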
KYC (Know-Your-Customer) Verification
Oxylabs upholds strict Know-Your-Customer (KYC) standards to safeguard customers, consumers, and the internet from malicious use of their solutions.
Oxylabs uses a multi-layered KYC approach that routes each customer through several distinct stages, depending on an automated and manual initial assessment. The end goal, however, is the same: to create accountability and identify the person(s) intending to purchase their services.
- Initial Assessment: Customers provide personal and business information, verified through independent sources. The goal is to ensure accountability and prevent misuse.
- Verification: Depending on risk factors, ID verification, compliance calls, and risk questionnaires may be required.
- Post-onboarding: Continued due diligence is conducted to ensure that services are used as agreed. Regardless of the results gleaned from the KYC questionnaire, some use cases are outright forbidden and are not subject to assessment or negotiations.
Check out the KYC policy of Oxylabs to get more information about the process.
Implementing Oxylabs Residential Proxies in Web Scraping
In this section, we will demonstrate how to use Oxylabs residential proxies with various libraries. We'll focus on US geotargeting and rotating proxies, using the same example across different libraries to scrape the `title` text from `https://example.com`.
Python Requests
To integrate Oxylabs proxies with Python Requests, follow these steps:
- Set up the proxy and authentication:

```python
username = "customer-USER"
password = "Your_password"
proxy = "pr.oxylabs.io:7777"

proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}
```

- Use the proxy to scrape the title:

```python
"""
Run `pip install requests beautifulsoup4` to install the libraries.
"""
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, proxies=proxies)
soup = BeautifulSoup(response.content, "html.parser")
title = soup.title.string
print(title)
```

In this example, we first configure the proxy settings with the required authentication. Then, we use `requests.get` to fetch the webpage content and `BeautifulSoup` to extract the title text.
Python Selenium
To set up Oxylabs proxies with Selenium for browser automation, follow these steps:
- Download the Proxy Auth Extension: Get the authentication extension from GitHub.
- Configure the proxy and authentication:

```python
"""
Run `pip install selenium webdriver-manager` to install the libraries.
"""
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from proxy_auth_extension import create_proxy_auth_extension
import os

username = "customer-USER"
password = "Your_password"
proxy_host = "pr.oxylabs.io"
proxy_port = 7777

# Build the Chrome extension that handles proxy authentication
proxy_auth_extension = create_proxy_auth_extension(proxy_host, int(proxy_port), username, password)

# Set up Chrome options
chrome_options = Options()
chrome_options.add_extension(proxy_auth_extension)
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
driver.get('https://example.com')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "title")))

# Scrape the title
title = driver.title
print(f"Scraped Title: {title}")

# Close the browser before removing the extension file
driver.quit()
os.remove(proxy_auth_extension)
```
We configure the Selenium WebDriver to use Oxylabs proxies by adding the proxy authentication extension to Chrome options. We then navigate to the target URL, extract the title text, and remove the extension.
Python Scrapy
To integrate Oxylabs proxies with Scrapy for web scraping, follow these steps:
- Enable the proxy middleware in Scrapy settings (it is enabled by default; shown here for clarity):

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1,
}
```

- Create a Scrapy spider that sets the proxy per request. Note that `HttpProxyMiddleware` reads the proxy from each request's `meta` dictionary (or from environment variables), not from a custom settings variable:

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def start_requests(self):
        # Route every request through the Oxylabs residential proxy
        proxy = 'http://customer-USER:Your_password@pr.oxylabs.io:7777'
        for url in self.start_urls:
            yield scrapy.Request(url, meta={'proxy': proxy})

    def parse(self, response):
        title = response.xpath('//title/text()').get()
        self.log(f'Title: {title}')
```

In this example, we enable the proxy middleware in Scrapy's settings file and pass the proxy URL through each request's `meta`. The spider then fetches the target URL and extracts the title text using XPath.
Node.js Puppeteer
To set up Oxylabs proxies with Puppeteer for browser automation, use the following script:

```javascript
import puppeteer from 'puppeteer';

const run = async () => {
  try {
    const browser = await puppeteer.launch({
      args: [
        `--proxy-server=pr.oxylabs.io:7777`
      ]
    });
    const page = await browser.newPage();
    await page.authenticate({
      username: 'customer-USER',
      password: 'Your_password'
    });
    await page.goto('https://example.com');
    const title = await page.title();
    console.log('Page title:', title);
    await browser.close();
  } catch (error) {
    console.error('An error occurred:', error);
  }
};

run();
```
We configure Puppeteer to use Oxylabs proxies by launching the browser with the proxy server argument. We then authenticate the proxy, navigate to the target URL, and extract the title text.
Node.js Playwright
To set up Oxylabs proxies with Playwright for browser automation, use the following script:

```javascript
// Run `npx playwright install` to download headless browsers
import { chromium } from 'playwright';

const run = async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://pr.oxylabs.io:7777',
      username: 'customer-USER',
      password: 'Your_password'
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.title();
  console.log(title);
  await browser.close();
};

run();
```
We configure Playwright to use Oxylabs proxies by launching the browser with the `proxy` settings, which include the credentials. We then navigate to the target URL and extract the title text.
Case Study: Scrape Amazon Prices with Oxylabs Proxies
In this case study, we will scrape price information for a product on Amazon's Spanish and Portuguese websites using Puppeteer and Oxylabs proxies. This demonstrates how regional pricing strategies can be observed by changing IP addresses.
We will:
- Configure Puppeteer to use Oxylabs proxies by launching the browser with the proxy server argument,
- Authenticate the proxy,
- Navigate to the target URL, and
- Extract the product title and price.
Code Example
```javascript
import puppeteer from 'puppeteer';
import dotenv from 'dotenv';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
dotenv.config({ path: join(__dirname, '.env') });

const username = process.env.OXYLABS_USERNAME;
const password = process.env.OXYLABS_PASSWORD;

if (!username || !password) {
  console.error('Please set OXYLABS_USERNAME and OXYLABS_PASSWORD in your .env file');
  process.exit(1);
}

const spanish_proxy = "es-pr.oxylabs.io:10000";
const portuguese_proxy = "pt-pr.oxylabs.io:10000";

async function scrapeAmazonPrice(proxy, url) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy}`],
    headless: false, // Set to true for production
  });
  try {
    const page = await browser.newPage();
    await page.authenticate({ username, password });
    console.log(`Navigating to ${url} using proxy ${proxy}`);
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });

    // Handle the cookie consent pop-up
    try {
      console.log('Checking for pop-up...');
      await page.waitForSelector('input[id="sp-cc-accept"]', { timeout: 5000 });
      console.log('Pop-up found. Clicking "Aceptar" button...');
      await page.click('input[id="sp-cc-accept"]');
      await page.waitForNavigation({ waitUntil: 'networkidle2' });
      console.log('Pop-up handled successfully.');
    } catch (error) {
      console.log('No pop-up found or unable to click. Proceeding with scraping.');
    }

    // Wait for the price element to load
    await page.waitForSelector('.a-price', { timeout: 30000 });

    // Extract the price
    const price = await page.evaluate(() => {
      const priceElement = document.querySelector('.a-price .a-offscreen');
      if (!priceElement) {
        console.log('Price element not found. HTML:', document.body.innerHTML);
        return 'Price not found';
      }
      return priceElement.textContent.trim();
    });

    // Extract the product title
    const title = await page.evaluate(() => {
      const titleElement = document.querySelector('#productTitle');
      return titleElement ? titleElement.textContent.trim() : 'Title not found';
    });

    console.log(`Scraped data - Title: ${title}, Price: ${price}`);
    return { title, price };
  } catch (error) {
    console.error('An error occurred:', error);
    return { title: 'Error', price: 'Error', error: error.message };
  } finally {
    await browser.close();
  }
}

async function compareAmazonPrices(productUrl) {
  console.log('Scraping with Spanish IP...');
  const spanishResult = await scrapeAmazonPrice(spanish_proxy, productUrl);
  console.log('Scraping with Portuguese IP...');
  const portugueseResult = await scrapeAmazonPrice(portuguese_proxy, productUrl);

  console.log('\nResults:');
  console.log('Spanish IP:', spanishResult); // e.g., 21.20 EUR
  console.log('Portuguese IP:', portugueseResult); // e.g., 14.75 EUR

  // Check for scraping issues first so a missing price isn't
  // misreported as a regional price difference
  if (spanishResult.price === 'Price not found' || portugueseResult.price === 'Price not found') {
    console.log('\nUnable to compare prices due to scraping issues.');
  } else if (spanishResult.price !== portugueseResult.price) {
    console.log('\nThe price differs based on the IP address used.');
  } else {
    console.log('\nThe price is the same for both IP addresses.');
  }
}

// Example usage
const amazonProductUrl = 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870';
compareAmazonPrices(amazonProductUrl);
```
- Environment Setup:
  - We import the necessary modules and configure environment variables using `dotenv`.
  - Oxylabs credentials (username and password) are fetched from a `.env` file.
- Proxy and Target URL:
  - The `scrapeAmazonPrice` function is configured to scrape Amazon product data while using Spanish and Portuguese proxies.
  - Puppeteer handles proxy server arguments, allowing for requests from different locations.
- Data Extraction:
  - The product title and price are extracted from the Amazon product page.
  - Puppeteer interacts with pop-ups, which can often appear on Amazon pages.
Comparison of Prices: Spain vs Portugal
This example compares the price of the product Harry Potter Crochet Kit from Amazon's Spanish and Portuguese versions.
- Spanish Price: €21,20
- Portuguese Price: €14,75
The results show that the price may vary depending on the region, highlighting the importance of regional pricing strategies. Various factors can impact the pricing, such as:
- Regional Pricing Strategies: Like many companies, Amazon adjusts prices based on region to reflect local market conditions, taxes, and shipping costs.
- Local Competition and Demand: Local demand and competition can influence pricing, causing variations between regions.
- Currency and Economic Conditions: Exchange rates and the general economic conditions of a region can also lead to different pricing models.
Troubleshooting Tips
- Verify Regional Content: Ensure that you are accessing the correct regional content by checking the website's URL and any location settings.
- Use Accurate Proxies: For accurate results, use residential proxies specific to the country you are targeting.
- Monitor Element Changes: Website structure might differ across regions; ensure your selectors are correctly targeting the desired elements.
- Analyze HTTP Responses: Check for any region-specific redirects or content adjustments in the HTTP responses.
Alternative: ScrapeOps Residential Proxy Aggregator
The ScrapeOps Residential Proxy Aggregator offers a compelling alternative to traditional proxy providers.
With its unique features and competitive pricing, it stands out as a robust solution for web scraping needs. Here’s why you should consider using it:
- Competitive Pricing: ScrapeOps offers lower pricing, allowing you to maximize your budget while maintaining high quality.
- Flexible Plans: With ScrapeOps, you have access to a wider variety of plans, including smaller, more affordable options tailored to your needs. The best part? You can start using the proxies with a free trial account.
- Enhanced Reliability: By leveraging multiple proxy providers through a single port, ScrapeOps offers greater reliability. If one provider faces issues, your requests can seamlessly switch to another, ensuring continuous access.
Using ScrapeOps with Python Requests
Here is an example of how to use the ScrapeOps Residential Proxy Aggregator with Python Requests:
```python
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
import os

load_dotenv()

username = 'scrapeops'
api_key = os.getenv("SCRAPEOPS_API_KEY")
proxy = 'residential-proxy.scrapeops.io'
port = 8181

# Route both HTTP and HTTPS traffic through the proxy
proxies = {
    "http": f"http://{username}:{api_key}@{proxy}:{port}",
    "https": f"http://{username}:{api_key}@{proxy}:{port}",
}

response = requests.get('https://plainenglish.io/', proxies=proxies)

# Check if the request was successful
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the first 10 items with class 'mob-col-100'
    items = soup.find_all(class_='mob-col-100', limit=10)
    for i, item in enumerate(items, 1):
        print(f"Item {i}: {item.get_text(strip=True)}")
else:
    print(f"Failed to retrieve content. Status code: {response.status_code}")
```
Explanation:
Proxy Configuration:
- We set up the proxy configuration by specifying the username (`scrapeops`), the API key (`api_key`), the proxy server (`residential-proxy.scrapeops.io`), and the port (`8181`). This configuration allows us to route our HTTP and HTTPS requests through the ScrapeOps Residential Proxy.
- Our proxy dictionary `proxies` contains both `http` and `https` proxy settings, ensuring that all types of requests are routed through the proxy.
Authentication and Routing:
- We use basic authentication (`http://username:api_key@proxy:port`). We combine the `username` and `api_key` with the proxy server and port to create the authenticated proxy URL.
- This setup ensures that our requests made using the `requests` library are authenticated and routed through ScrapeOps’ residential proxies. This provides access to multiple IP addresses, making it harder for target websites to block our requests.
Request Execution:
- We use the `requests.get` function to make a GET request to `https://plainenglish.io/` with the `proxies` parameter. This means our request is sent via the ScrapeOps Residential Proxy.
- By using the proxy, our request benefits from automatic IP rotation, residential IPs, and other anti-bot measures provided by ScrapeOps. This enhances the success rate of our web scraping and reduces the risk of being blocked.
Handling the Response:
- We check the response from the target website for a successful status code (`200`). If the request is successful, we parse the response content using BeautifulSoup to extract and print the first 10 items with the class `mob-col-100`.
- If the request fails, we print the status code, indicating an issue with retrieving the content. This could be due to various reasons like network issues, proxy configuration problems, or target site restrictions.
Apart from enjoying the benefits of one of the most reliable and affordable residential proxy services, you can start using the ScrapeOps Residential Proxy Aggregator without paying anything: the free trial includes 100MB of free bandwidth.
For more details, visit the documentation.
Ethical Considerations and Legal Guidelines
When using residential proxies for web scraping, it's crucial to consider the ethical implications and legal responsibilities.
- Ethical Sourcing of Proxies: When choosing a proxy provider, it's crucial to consider whether their proxies are ethically sourced. This means ensuring that the underlying IP holder has opted in for their IP address to be used for data gathering.
- Oxylabs' Policies: Oxylabs is a strong advocate of ethical business practices, operating strictly within the capacities of an established legitimate proxy pool. This ensures that their residential proxies are ethically sourced and that end-users have given documented and explicit consent.
- Importance of Scraping Ethically: Oxylabs has established clear standards for residential proxy acquisition. They emphasize fairness and transparency in their operations, ensuring that residential proxies are obtained with the full consent of the IP holders. Their approach includes rewarding network participants and maintaining high standards of ethics and transparency throughout the procurement process.
- User Consent and Awareness: Oxylabs' policies require that people who choose to share their unused internet traffic are presented with clear information about their participation. The intention to share internet traffic with third parties must be explicitly stated in the Terms and Conditions.
- Supplier Vetting: Oxylabs has a strict vetting process for their residential proxy providers. They have set explicit contractual obligations to ensure that end-users are aware and that their consent is documented. They are committed to terminating collaborations with providers who fail to meet these high standards.
For more information, refer to Oxylabs' Residential Proxy Pool Handbook and their whitepaper on proxy procurement processes and policies.
Conclusion
Residential proxies are crucial for web scraping, offering anonymity and helping to bypass geo-restrictions and anti-bot defenses. Throughout this guide, we've explored the world of residential proxies, with a particular focus on Oxylabs' offerings.
Oxylabs provides a robust residential proxy service with global coverage and advanced targeting options, while practical examples showed how to integrate them with popular scraping tools.
As you embark on your web scraping projects, we encourage you to implement residential proxies to enhance your data gathering capabilities. By using residential proxies, you can effectively avoid IP bans, overcome rate limiting, and access geo-restricted content. However, always remember to scrape responsibly, respecting website terms of service and adhering to ethical guidelines.
More Web Scraping Guides
At ScrapeOps, we offer a wide range of learning resources for every skill level—whether you're just starting out or an experienced developer, we've got something for you.
If you would like to learn more about Web Scraping with Python, then be sure to check out Python Web Scraping Playbook or check out one of our more in-depth guides: