IPRoyal Residential Proxies: Web Scraping Guide
IPRoyal is a proxy provider offering a pool of over 34 million proxies. It specializes in residential proxies designed to provide anonymity and fast connections for a wide range of web scraping needs.
Its rotating and static residential proxies give you flexibility and control, helping businesses gather data efficiently while reducing the risk of being blocked. IPRoyal also offers data center and mobile proxies, and its proxies integrate with over 650 tools.
This guide will help you set up and integrate IPRoyal residential proxies into your web scraping scripts, highlighting their benefits and practical applications.
- TLDR: How to Integrate IPRoyal Residential Proxy?
- Understanding Residential Proxies
- Why Choose IPRoyal Residential Proxies?
- IPRoyal Residential Proxy Pricing
- Setting Up IPRoyal Residential Proxies
- Authentication
- Basic Request Using IPRoyal Residential Proxies
- Country Geotargeting
- City Geotargeting
- Error Codes
- KYC Verification
- Implementing IPRoyal Residential Proxies in Web Scraping
- Case Study: Scrape Amazon Prices
- Alternative: ScrapeOps Residential Proxy Aggregator
- Ethical Considerations and Legal Guidelines
- Conclusion
- More Web Scraping Guides
TLDR: How to Integrate IPRoyal Residential Proxy?
Here are straightforward steps to get you up and running in seconds:
- Install the Python requests package:
pip install requests
- Set Up IPRoyal Residential Proxy:
import requests
username = "your_proxy_username"
password = "your_proxy_password"
port = "your_proxy_port"
proxy = f"geo.iproyal.com:{port}"
proxies = {
    "http": f"http://{username}:{password}@{proxy}",
    "https": f"http://{username}:{password}@{proxy}"
}
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
In this script:
- Set Credentials and Proxy: Provide your IPRoyal credentials (username and password) along with the proxy address (geo.iproyal.com:<port>).
- Configure Proxies: Create a proxies dictionary that routes both HTTP and HTTPS traffic through the proxy using your credentials.
- Send a Request: Use the requests.get method to send a request to the target website (https://example.com) via the configured proxies.
- Print the Response: Finally, print the response text received from the target website.
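The f-string in the script above inserts the credentials verbatim, so a password containing characters such as @, : or / makes the proxy URL ambiguous. Below is a minimal sketch of a helper that percent-encodes the credentials first; build_proxies is a hypothetical name, and the host defaults to the geo.iproyal.com endpoint used above. requests decodes percent-encoded credentials in a proxy URL before sending them to the proxy.

```python
from urllib.parse import quote


def build_proxies(username: str, password: str, port: str,
                  host: str = "geo.iproyal.com") -> dict:
    """Return a requests-style proxies dict with URL-encoded credentials."""
    # safe="" also escapes '/', so '@', ':' and '/' cannot break the URL structure
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same HTTP proxy endpoint serves plain HTTP and HTTPS (via CONNECT)
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("your_proxy_username", "p@ss:w/rd", "your_proxy_port")
```

Pass the result as proxies= to requests.get() exactly as in the TLDR script.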
With that, you should be able to start using IPRoyal's residential proxies.
What if you want to find out more? Let's dive right into the next sections!
Understanding Residential Proxies
What Are Residential Proxies?
Residential proxies function as intermediaries between you and the target websites, making your web traffic appear as if it’s coming from a legitimate residential address.
When you use a residential proxy, the target website sees your requests as originating from a real user, increasing the chances of bypassing detection and avoiding IP bans.
Types of Residential Proxies
There are two main types of residential proxies: rotating and static.
-
Rotating Residential Proxies: These proxies automatically change the IP address with each request or at regular intervals, providing high anonymity but potentially slower speeds.
-
Static Residential Proxies: These proxies maintain the same IP address for the duration of a session, offering consistent performance but with a higher risk of detection.
Here's a comparison of these two proxy types:
Feature | Rotating Residential Proxies | Static Residential Proxies |
---|---|---|
IP Address | Changes with each request | Remains the same for an extended period |
Anonymity | High | Moderate |
Speed | Slower due to rotation process | Generally faster |
Management | Complex | Simpler |
Risk of Detection | Lower | Higher |
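On a rotating pool, static ("sticky") behaviour is usually requested per session rather than per account. The sketch below assumes IPRoyal's convention of appending _session-<id>_lifetime-<duration> to the proxy password, with an 8-character alphanumeric session ID; treat the suffix names and the ID format as assumptions and compare them with the proxy string generated in your IPRoyal dashboard. Under that assumption, reusing one session ID should keep the same exit IP for the lifetime, while a new ID requests a fresh one.

```python
import secrets
import string


def new_session_id(length: int = 8) -> str:
    """Generate a random lowercase alphanumeric session ID."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def sticky_password(password: str, session_id: str, lifetime: str = "30m") -> str:
    """Append session parameters to the proxy password.

    ASSUMPTION: IPRoyal reads sticky-session settings from
    "_session-<id>_lifetime-<duration>" suffixes on the password.
    """
    return f"{password}_session-{session_id}_lifetime-{lifetime}"


# Reuse session_id for every request that should share one exit IP
session_id = new_session_id()
password = sticky_password("your_proxy_password", session_id)
proxy_url = f"http://your_proxy_username:{password}@geo.iproyal.com:your_proxy_port"
```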
Residential vs. Data Center Proxies
Residential proxies are distinct from data center proxies, primarily in their origin and how they are perceived by websites.
Residential proxies use IP addresses that ISPs assign to real households, so they appear more authentic to websites. Data center proxies, by contrast, use IP addresses owned by data centers and cloud providers rather than consumer ISPs; because these address ranges are easy to identify as non-residential, such proxies are easier to detect and block.
Here's a comparison table for residential and data center proxies to better understand the differences:
Feature | Residential Proxies | Data Center Proxies |
---|---|---|
IP Source | Real residential addresses from ISPs | Data centers and cloud service providers |
Anonymity | High | Lower |
Speed | Variable, often slower | Generally faster |
Cost | Higher | Lower |
Detection Risk | Lower | Higher |
Effectiveness for Geo-Access | High | Lower |
When Are Residential Proxies Useful?
Residential proxies are highly beneficial for various tasks:
- Web Scraping and Data Collection: Reduces the chance of being blocked by anti-scraping mechanisms, making data collection more reliable.
- SEO and SERP Analysis: Gathers precise search engine results from different locations.
- Social Media Monitoring: Tracks trends and activities on social media platforms without being flagged.
- Ad Verification: Checks the correct display of ads in different regions.
- Geo-Restricted Content Access: Allows access to content limited to specific geographical areas.
Why Choose IPRoyal Residential Proxies?
IPRoyal’s residential proxies stand out with competitive pricing and a vast network of over 32 million IPs across 195 countries. This extensive pool supports efficient web scraping, SEO research, and data aggregation.
Here are the top three reasons to choose IPRoyal residential proxies:
- Exceptional Coverage: Access a diverse range of IP addresses worldwide, reducing detection risks and enhancing data collection accuracy.
- Precise Targeting: Select proxies from any country, state, or city effortlessly, ensuring your data is as specific as needed without additional costs.
- Flexible and Reliable: Enjoy pay-as-you-go pricing with no contracts or expiration on purchased traffic, combined with 24/7 support and advanced technical features for seamless integration.
Before looking at the different ways to use them, let's review IPRoyal's residential proxy pricing.
IPRoyal Residential Proxy Pricing
IPRoyal offers flexible and competitive pricing for its residential proxies, catering to various usage needs and budgets. Their pricing structure primarily revolves around bandwidth used, rather than charging per individual IP address or concurrency.
They provide a Pay-As-You-Go plan, with costs varying depending on the amount of bandwidth purchased. Here's a detailed look at their pricing:
Pricing Table
Plan Name | Plan Size (GB) | Cost per GB | Price |
---|---|---|---|
1GB Plan | 1 | $7.00 | $7.00 |
2GB Plan | 2 | $5.95 | $11.90 |
10GB Plan | 10 | $5.25 | $52.50 |
50GB Plan | 50 | $4.90 | $245.00 |
100GB Plan | 100 | $4.55 | $455.00 |
250GB Plan | 250 | $4.20 | $1,050.00 |
500GB Plan | 500 | $3.50 | $1,750.00 |
1000GB Plan | 1000 | $3.15 | $3,150.00 |
3000GB Plan | 3000 | $2.80 | $8,400.00 |
5000GB Plan | 5000 | $2.45 | $12,250.00 |
10000+GB Plan | 10000+ | Reach Out for a Special Deal! | Talk to Sales |
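Because billing is per GB, a quick estimate of the bandwidth a job needs helps you pick a plan. The sketch below copies the tiers from the table (in US cents, to avoid floating-point rounding), takes an average transferred page size that you supply, and only considers buying a single plan. It ignores retries, redirects and protocol overhead, so treat the result as a rough lower bound; estimate_gb and smallest_plan_for are hypothetical helper names.

```python
# (plan size in GB, price per GB in US cents), taken from the pricing table above
PLANS = [
    (1, 700), (2, 595), (10, 525), (50, 490), (100, 455),
    (250, 420), (500, 350), (1000, 315), (3000, 280), (5000, 245),
]


def estimate_gb(pages: int, avg_page_kb: float) -> float:
    """Rough bandwidth estimate in decimal GB (1 GB = 1,000,000 KB)."""
    return pages * avg_page_kb / 1_000_000


def smallest_plan_for(gb_needed: float):
    """Return (plan_size_gb, total_price_cents) for the smallest single plan
    covering gb_needed, or None if it exceeds the largest listed plan."""
    for size_gb, cents_per_gb in PLANS:
        if size_gb >= gb_needed:
            return size_gb, size_gb * cents_per_gb
    return None  # 10,000+ GB: custom pricing via sales


needed = estimate_gb(pages=100_000, avg_page_kb=500)
plan = smallest_plan_for(needed)
print(f"~{needed:.1f} GB -> {plan[0]} GB plan at ${plan[1] / 100:,.2f}")
# prints: ~50.0 GB -> 50 GB plan at $245.00
```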
Pricing Comparison
IPRoyal's pricing is generally competitive compared to other residential proxy providers:
- Cheap Providers: Providers offering plans around $2-3 per GB are considered cheaper.
- Expensive Providers: Those with smaller plans priced in the $6-8 per GB range are more expensive.
For a comprehensive comparison of different residential proxy providers, including IPRoyal, you can visit our Proxy Comparison page. This resource will help you evaluate various options and find the best fit for your needs.
Setting Up IPRoyal Residential Proxies
Creating an IPRoyal Account
To get started, visit the registration page. You can register with LinkedIn, Google, or an email address.
If you choose Google, click the "Login with Google" button, enter your login credentials, and add a phone number if prompted.
After that, you will be redirected to your dashboard, from where you can buy Royal Residential proxies.
Selecting and Purchasing Residential Proxies
After logging in, navigate to the residential proxy pricing page.
Select a proxy plan that fits your needs (for example, the $7 / 1 GB package), then scroll down the page and click the "Continue" button.
Follow the prompts to complete your purchase. Once your payment is processed, you will be redirected to the residential proxies page, where you can start setting up your residential proxies.
Scroll down the page to see or update your proxy credentials (e.g., username, password, and port).
Authentication
When using IPRoyal proxies, you have two primary methods to authenticate your requests:
- username and password
- IP whitelisting
Method 1: Username & Password Authentication
This method involves adding your residential proxy’s username and password into the proxy configuration.
Here’s how you can set it up:
- Install Required Packages:
pip install requests python-dotenv
- Set Up Environment Variables:
Create a .env file in your project directory to securely store your IPRoyal credentials:
IPROYAL_USERNAME=your_username
IPROYAL_PASSWORD=your_password
IPROYAL_PORT=your_proxy_port
Loading and Configuring Proxy Settings
To securely load your credentials from the .env file and configure your proxy settings, follow these steps:
- Load Environment Variables:
Use the python-dotenv package to load your credentials:
from dotenv import load_dotenv
import os
load_dotenv()
username = os.getenv("IPROYAL_USERNAME")
password = os.getenv("IPROYAL_PASSWORD")
port = os.getenv("IPROYAL_PORT")
proxy = f"geo.iproyal.com:{port}"
- Set Up Proxies Dictionary:
Create a dictionary that contains your proxy settings:
proxies = {
    "http": f"http://{username}:{password}@{proxy}",
    "https": f"http://{username}:{password}@{proxy}"
}
- Send a Request Through the Proxy:
Use the requests library to send a request via the configured proxy:
import requests
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
This setup ensures that your requests are authenticated and routed through IPRoyal's residential proxies.
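One weakness of the snippet above: if a variable is missing from .env, os.getenv() returns None and the proxy URL silently contains the text "None", which usually surfaces later as an authentication error from the proxy. A small guard catches this early. This is a sketch; proxies_from_env is a hypothetical helper, and it expects load_dotenv() to have been called already if your values live in a .env file.

```python
import os
from urllib.parse import quote

REQUIRED_VARS = ("IPROYAL_USERNAME", "IPROYAL_PASSWORD", "IPROYAL_PORT")


def proxies_from_env(host: str = "geo.iproyal.com") -> dict:
    """Build a requests proxies dict from environment variables."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        # Fail early with a clear message instead of sending "None" as a credential
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous
    username = quote(os.environ["IPROYAL_USERNAME"], safe="")
    password = quote(os.environ["IPROYAL_PASSWORD"], safe="")
    proxy_url = f"http://{username}:{password}@{host}:{os.environ['IPROYAL_PORT']}"
    return {"http": proxy_url, "https": proxy_url}
```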
Method 2: IP Whitelisting Authentication
An alternative method for authenticating with IPRoyal residential proxies is IP whitelisting. This method allows you to authorize specific IP addresses to use the proxies without needing to include a username and password in your requests.
Here's how you can set it up:
- Log in to Your IPRoyal Account:
  - Navigate to the Royal Residential proxies configuration page in your IPRoyal dashboard.
- Add Your IP Address:
  - Scroll down to find the authentication options and click the "Whitelist" button.
  - On the IP whitelist configuration page, click "Add".
  - Configure your proxies as needed (country, state, proxy type, session type).
  - Enter the IP address you want to whitelist.
  - Click "Create" to add the IP to your whitelist.
- Configure Proxy Without Username and Password:
Once your IP is whitelisted, you can set up your proxies without including credentials. In the "Formatted proxy list" section of your dashboard, select your whitelisted IP and copy the IP:PORT information.
Then use it like this:
proxy = "copied_ip:copied_port"
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}
- Send a Request:
You can now send requests through the proxy as usual:
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
This method simplifies your code and is particularly useful when managing multiple users or devices.
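If you copy more than one IP:PORT entry from the "Formatted proxy list", a small parser can turn them into proxies dictionaries that you rotate through. This sketch assumes one IP:PORT entry per line; proxies_from_list is a hypothetical helper, and the addresses below are documentation examples, not real IPRoyal endpoints.

```python
from itertools import cycle


def proxies_from_list(text: str) -> list:
    """Convert 'IP:PORT' lines into requests-style proxies dicts."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"Expected IP:PORT, got {line!r}")
        url = f"http://{host}:{port}"
        result.append({"http": url, "https": url})
    return result


# Pass next(pool) as proxies= on each request to spread load across entries
pool = cycle(proxies_from_list("192.0.2.10:8080\n\n192.0.2.11:8080\n"))
first = next(pool)
```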
Basic Request Using IPRoyal Residential Proxies
Let’s dive into a practical example that demonstrates how to make a basic web request using IPRoyal residential proxies:
Suppose you want to scrape the homepage of a website using IPRoyal residential proxies. Here’s how you would do it:
- Install Dependencies:
Ensure you have the required packages installed:
pip install requests beautifulsoup4
- Set Up the Proxy:
Configure the proxy settings as demonstrated earlier in the Authentication section.
- Send the Request and Parse the HTML:
Use the requests library to send a request and the BeautifulSoup library to parse the HTML content:
import requests
from bs4 import BeautifulSoup
# Proxy configuration (username, password and proxy are defined as in the Authentication section)
proxies = {
    'http': f'http://{username}:{password}@{proxy}',
    'https': f'http://{username}:{password}@{proxy}'
}
# Send a request through the proxy
response = requests.get("https://example.com", proxies=proxies)
soup = BeautifulSoup(response.content, 'html.parser')
# Print the page title
print(soup.title.text)
In this example, we:
- Set Up Proxies: We configure the proxies dictionary with the necessary credentials and proxy address.
- Send a Request: We use requests.get() to fetch the content of the target website.
- Parse the HTML: We use BeautifulSoup to parse the HTML content and extract information like the page title.
Handling Proxy Errors
When using proxies, you may encounter errors such as connection timeouts or proxy failures. To handle these gracefully:
try:
    response = requests.get("https://example.com", proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error occurred. Please check your proxy settings.")
except requests.exceptions.Timeout:
    print("The request timed out. Try increasing the timeout or check your internet connection.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
This code catches common exceptions, providing informative messages that help diagnose and fix issues with proxy configurations.
Country Geotargeting
Country-level geotargeting allows you to connect to proxy servers in specific countries, enabling you to access location-restricted content and gather localized data as if you were in that country.
This is particularly beneficial for tasks like market research, competitor analysis, and testing localized content or services.
IPRoyal offers extensive country-level geotargeting capabilities. Their network spans numerous countries across the globe, with the ability to select proxies from different locations according to your needs.
Top 10 Countries Supported by IPRoyal
Below is a table showcasing 10 popular countries supported by IPRoyal, along with the number of IPs available in each:
Country | Number of IPs |
---|---|
United States | 1,450,886 |
Germany | 439,883 |
United Kingdom | 421,770 |
France | 418,633 |
Canada | 373,796 |
Brazil | 908,824 |
Spain | 781,766 |
Italy | 393,154 |
Vietnam | 460,712 |
Philippines | 545,729 |
IPRoyal supports proxies from a wide array of countries, ensuring that no matter where you need to connect, there’s likely a proxy available.
Using Country-Specific Proxies
To use country-specific proxies with IPRoyal, you can configure your requests by specifying the country codes in the X-Countries header.
Here’s how you can do this in Python:
import requests
from requests.auth import HTTPProxyAuth
username = 'your_username'
password = 'your_password'
countries = 'country-us,fr,de'
port = "your_proxy_port"
proxy = f"geo.iproyal.com:{port}"
url = 'http://example.com'
proxies = {
'http': f'http://{proxy}',
'https': f'http://{proxy}',
}
auth = HTTPProxyAuth(username, password)
headers = {
'X-Countries': countries
}
response = requests.get(url, proxies=proxies, auth=auth, headers=headers)
print(response.text)
In this implementation:
- We have set up country geotargeting by specifying the X-Countries header.
- The header carries a comma-separated list of country codes (us, fr, and de), so the proxy can select an exit IP in any of these countries: the United States, France, or Germany.
The X-Countries header tells the proxy server to route the request as if it originates from one of the listed countries, allowing us to test how content appears from different locations.
City Geotargeting
City-level geotargeting allows you to access hyper-local content by connecting to proxies in specific cities. This is especially useful for tasks like local SEO analysis, price monitoring, and testing localized advertisements.
IPRoyal provides the ability to target proxies at the city level, giving you granular control over your web scraping and data collection activities.
Top 10 Cities Supported by IPRoyal
Here’s a table of popular cities supported by IPRoyal, along with their corresponding country:
City | Country |
---|---|
New York | United States |
Berlin | Germany |
London | United Kingdom |
Paris | France |
Toronto | Canada |
São Paulo | Brazil |
Madrid | Spain |
Milan | Italy |
Ho Chi Minh | Vietnam |
Manila | Philippines |
Using City-Specific Proxies
To target a specific city, you need to provide both the country and the city in the X-Country and X-City headers.
Here’s an example using Python:
import requests
from requests.auth import HTTPProxyAuth
username = 'your_username'
password = 'your_password'
country = 'country-us'
city = 'city-newyork'
port = 'your_proxy_port'
proxy = f'geo.iproyal.com:{port}'
url = 'http://example.com'
proxies = {
'http': f'http://{proxy}',
'https': f'http://{proxy}',
}
auth = HTTPProxyAuth(username, password)
headers = {
'X-Country': country,
'X-City': city
}
response = requests.get(url, proxies=proxies, auth=auth, headers=headers)
print(response.text)
In this script:
- We have implemented city geotargeting by using both the X-Country and X-City headers.
- The X-Country header specifies the country (country-us, i.e., the United States), while the X-City header designates the city (city-newyork).
By including these headers, our request is routed through a proxy that simulates access from the specified city and country.
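One caveat about the header-based examples above: they request a plain http:// URL. For an https:// URL, headers passed to requests.get() travel inside the encrypted tunnel to the target site, so the proxy itself cannot read them. A common alternative is to encode targeting in the proxy credentials, which the proxy sees in both cases. The sketch below assumes IPRoyal accepts _country-<code> and _city-<name> suffixes on the proxy password; treat that format as an assumption and check it against the proxy string your IPRoyal dashboard generates. geo_proxy_url is a hypothetical helper name.

```python
from urllib.parse import quote


def geo_proxy_url(username: str, password: str, port: str,
                  country: str = "", city: str = "",
                  host: str = "geo.iproyal.com") -> str:
    """Build a proxy URL with geotargeting encoded in the password.

    ASSUMPTION: IPRoyal reads "_country-<code>" and "_city-<name>" suffixes
    from the password; verify against your dashboard's proxy string.
    """
    suffix = ""
    if country:
        suffix += f"_country-{country.lower()}"
    if city:
        suffix += f"_city-{city.lower()}"
    # '_' and '-' are unreserved, so quote() leaves the suffix readable
    user = quote(username, safe="")
    pwd = quote(password + suffix, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


url = geo_proxy_url("your_username", "your_password", "your_proxy_port",
                    country="US", city="newyork")
proxies = {"http": url, "https": url}
```

With this approach, pass proxies to requests.get() as in the earlier examples, without the X-Country or X-City headers.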
Error Codes
When using a proxy, you might encounter various HTTP error codes that indicate issues with your connection.
These error codes provide insights into what might be going wrong, whether it’s with the proxy server, your network, or the target website.
Below are some common HTTP proxy error codes and their meanings:
Error Name | Error Description | Solution |
---|---|---|
HTTP 400 Bad Request | This error happens when the client’s request to the proxy server is malformed or invalid. | Ensure the request is properly formatted with all required HTTP headers included. Double-check the syntax and structure. |
HTTP 403 Forbidden | The proxy server understands the request but refuses to fulfill it due to insufficient permissions or authentication. | Verify that you have the correct permissions and provide the necessary authentication credentials. |
HTTP 404 Not Found | The requested resource cannot be found on the proxy server or upstream server. | Ensure the URL is correct and that the resource hasn’t been moved or deleted. |
HTTP 407 Proxy Authentication Required | The proxy server requires authentication before allowing access to the requested resource. | Provide valid authentication credentials as required by the proxy server. |
HTTP 408 Request Timeout | The proxy server times out while waiting for the client’s request. | Check your network connection for stability and consider resending the request. |
HTTP 502 Bad Gateway | The proxy server received an invalid response from an upstream server. | Check the upstream server for any issues and ensure it is functioning correctly. |
HTTP 503 Service Unavailable | The proxy server or upstream server is temporarily unable to handle the request due to maintenance or overload. | Try again later or check if the server is undergoing maintenance. |
HTTP 504 Gateway Timeout | The proxy server does not receive a timely response from the upstream server, resulting in a timeout. | Investigate whether the upstream server is slow or experiencing network issues. |
HTTP 505 HTTP Version Not Supported | The proxy server does not support the HTTP version used in the client’s request. | Ensure that the HTTP version in your request is compatible with the proxy server and adjust it if necessary. |
Understanding these error codes can help you diagnose and troubleshoot issues with your proxy connection, ensuring smoother browsing and data retrieval.
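Of the codes above, 408, 502, 503, and 504 are usually transient, so retrying with a growing delay is reasonable, whereas 400, 403, 404, 407, and 505 tend to point to a problem with the request or credentials that a retry will not fix. A minimal sketch of that policy follows; fetch_with_retries is a hypothetical helper that accepts any zero-argument callable returning an object with a status_code attribute, such as a lambda wrapping requests.get().

```python
import time
from typing import Any, Callable

# Status codes from the table above that are usually worth retrying
RETRYABLE_STATUS = {408, 502, 503, 504}


def fetch_with_retries(fetch: Callable[[], Any], max_attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out.

    Returns the last response either way, so the caller can inspect it.
    """
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
            return response
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))


# Usage with requests (proxies defined as in the earlier examples):
# response = fetch_with_retries(
#     lambda: requests.get("https://example.com", proxies=proxies, timeout=10))
```

The sleep parameter exists so the delay can be swapped out, for example in tests.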
KYC (Know-Your-Customer) Verification
IPRoyal implements a strict KYC (Know-Your-Customer) policy for both new and existing customers to ensure the security and credibility of their proxy network.
- IPRoyal requires KYC validation prior to granting full access to their proxy services.
- New accounts have limited access to services until KYC verification is completed.
The KYC process involves:
- Identity Verification:
- Customers must provide an image of a government-issued ID (passport, driver's license, or national ID).
- A selfie is required to confirm the identity matches the provided document.
- Document Authentication:
- IPRoyal uses iDenfy, a third-party platform, to verify documents and conduct identity checks.
- Advanced algorithms detect anomalies or inconsistencies in the provided documents.
- Facial recognition technology compares the photo on the ID with the provided selfie.
- Information may be cross-referenced with government databases for accuracy.
- Ongoing Monitoring:
- IPRoyal continuously monitors their network to prevent abuse.
- They reserve the right to refuse service based on risk level assessment.
Use cases that are not allowed:
- IPRoyal states they will refuse service for suspicious, inappropriate, or unethical use cases.
The KYC process is designed to be quick and easy, typically taking just a few minutes to complete via phone or desktop. This verification allows customers to access all of IPRoyal's services, including API access and the ability to purchase as little as one proxy per order.
IPRoyal emphasizes that their KYC process complies with high data safety standards, including GDPR, ISO 27001, eIDAS, and ETSI, ensuring the privacy and security of customer information.
Unlike some providers (such as Bright Data) that require a call before allowing proxy use, IPRoyal's process is online and automated through their third-party verification platform (iDenfy).
Implementing IPRoyal Residential Proxies in Web Scraping
Let's explore how to use IPRoyal residential proxies with various libraries. We'll use the same example for each, targeting the US and using rotating proxies.
Python Requests
Here's how we can integrate IPRoyal proxies with Python Requests:
import requests
from requests.auth import HTTPProxyAuth
def get_with_proxy(url):
    username = 'your_username'
    password = 'your_password'
    port = 'your_proxy_port'
    proxy = f'geo.iproyal.com:{port}'
    country = 'country-us'
    proxies = {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}'
    }
    auth = HTTPProxyAuth(username, password)
    headers = {'X-Country': country}
    response = requests.get(url, proxies=proxies, auth=auth, headers=headers)
    return response.text
print(get_with_proxy('http://example.com'))
In this example, we've set up a function that sends a GET request through an IPRoyal proxy. We're using the 'X-Country' header to target US proxies, and HTTPProxyAuth for authentication.
Python Selenium
Now, let's set up IPRoyal proxies with Selenium for browser automation:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from proxy_auth_extension import create_proxy_auth_extension
def get_page_title_with_proxy():
    username = 'your_username'
    password = 'your_password'
    port = 'your_proxy_port'
    host = 'geo.iproyal.com'
    country = 'us'
    # Separate helper (not part of Selenium) that builds a Chrome extension with the proxy host and credentials
    proxy_auth_extension = create_proxy_auth_extension(host, int(port), username, password)
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_extension(proxy_auth_extension)
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    driver = webdriver.Chrome(options=chrome_options)
    try:
        # Chrome has no command-line switch for extra request headers,
        # so the X-Country header is set through the Chrome DevTools Protocol
        driver.execute_cdp_cmd('Network.enable', {})
        driver.execute_cdp_cmd('Network.setExtraHTTPHeaders', {'headers': {'X-Country': country}})
        driver.get('https://example.com')
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "title")))
        # Scrape the title
        title = driver.title
        return f"Scraped Title: {title}"
    finally:
        # Always close the browser, even if the page fails to load
        driver.quit()
page_title = get_page_title_with_proxy()
print(page_title)
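Here, we've configured Chrome to use the IPRoyal proxy: a helper extension supplies the proxy server and credentials, and the 'X-Country' header targets US proxies. The create_proxy_auth_extension function comes from a separate proxy_auth_extension helper module that you need to obtain on its own; it is not part of Selenium.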
Python Scrapy
To integrate IPRoyal proxies with Scrapy for web scraping, follow these steps:
Open your Scrapy project's settings.py file and add the following configuration:
# settings.py
IPROYAL_PROXY = 'http://your_username:your_password@geo.iproyal.com:your_port'
IPROYAL_COUNTRY = 'us'
Replace your_username, your_password, and your_port with your actual IPRoyal credentials and port number.
Next, open middlewares.py in your Scrapy project and add the following code:
# middlewares.py
# ... (existing imports and code)
class IPRoyalCountryMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)
    def __init__(self, settings):
        self.proxy = settings.get('IPROYAL_PROXY')
        self.country = settings.get('IPROYAL_COUNTRY', 'us')
    def process_request(self, request, spider):
        # HttpProxyMiddleware reads request.meta['proxy'] and converts the
        # credentials in the URL into a Proxy-Authorization header
        if self.proxy:
            request.meta['proxy'] = self.proxy
        request.headers['X-Country'] = self.country
# ... (rest of the existing code)
Update your settings.py file to register the new middleware:
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers run first, so meta['proxy'] is set before
    # HttpProxyMiddleware (default priority 750) processes the request
    'your_project.middlewares.IPRoyalCountryMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
}
Replace your_project with the actual name of your Scrapy project.
Now, create a spider to fetch the target URL and extract the title. Create a new file named example_spider.py in your project's spiders directory and add the following code:
# example_spider.py
import scrapy
class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]
    def parse(self, response):
        title = response.css('title::text').get()
        self.logger.info(f'Title: {title}')
        yield {'title': title}
To run your spider, use the following command in your terminal:
scrapy crawl example
In this example, we've configured the proxy settings in Scrapy's settings file and created a custom middleware for country targeting. We then created a spider that fetches the target URL and extracts the title text using CSS selectors.
NodeJs Puppeteer
Let's set up IPRoyal proxies with Puppeteer for browser automation:
import puppeteer from 'puppeteer';
const scrapeWithProxy = async () => {
const username = 'your_username';
const password = 'your_password';
const host = 'geo.iproyal.com';
const port = 'your_proxy_port';
const country = 'us';
const browser = await puppeteer.launch({
args: [
`--proxy-server=${host}:${port}`
],
});
const page = await browser.newPage();
await page.setExtraHTTPHeaders({
'X-Country': country,
});
await page.authenticate({username, password});
await page.goto('https://example.com');
const content = await page.content();
console.log(content);
await browser.close();
}
scrapeWithProxy();
In this example, we've launched a browser with the IPRoyal proxy settings. We set the proxy server in the launch arguments, authenticate with page.authenticate(), and add the 'X-Country' header for US targeting.
NodeJs Playwright
Finally, let's set up IPRoyal proxies with Playwright:
import { chromium } from 'playwright';
const scrapeWithProxy = async () => {
const username = 'your_username';
const password = 'your_password';
const host = 'geo.iproyal.com';
const port = 'your_proxy_port';
const country = 'us';
const browser = await chromium.launch({
proxy: {
server: `http://${host}:${port}`,
username: username,
password: password,
},
});
const context = await browser.newContext();
await context.setExtraHTTPHeaders({
'X-Country': country,
});
const page = await context.newPage();
await page.goto('https://example.com');
const content = await page.content();
console.log(content);
await browser.close();
}
scrapeWithProxy();
In this Playwright setup, we've configured the browser to use the IPRoyal proxy. We're setting the proxy server and authentication in the launch options, and adding the 'X-Country' header for US targeting in the context.
Each of these examples demonstrates how we can integrate IPRoyal residential proxies with different web scraping libraries, consistently using US geotargeting and rotating proxies.
Case Study: Scrape Amazon Prices
In this case study, we demonstrate how to use Puppeteer to scrape a product's price from Amazon Spain as seen by visitors in different countries.
We’ll configure the script to use IPRoyal proxies so that each request appears to come from a specific country.
Code Example
import puppeteer from 'puppeteer'; // npm i puppeteer
import dotenv from 'dotenv'; // npm i dotenv
import userAgent from 'random-useragent'; // npm i random-useragent
dotenv.config();
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
const scrapeAmazonPrice = async (country_code, url) => {
const username = process.env.IPROYAL_USERNAME;
const password = process.env.IPROYAL_PASSWORD;
const port = process.env.IPROYAL_PORT;
const host = 'geo.iproyal.com';
let browser;
let page;
try {
browser = await puppeteer.launch({
args: [
`--proxy-server=${host}:${port}`
],
headless: false
});
const context = await browser.createBrowserContext();
page = await context.newPage();
// Set proxy authentication
await page.authenticate({
username: username,
password: password
});
// Set user agent and headers
await page.setUserAgent(userAgent.getRandom());
await page.setExtraHTTPHeaders({
'X-Country': country_code,
});
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 90000 });
// Handle the cookie consent pop-up
try {
console.log('Checking for cookie consent pop-up...');
const acceptButton = await page.waitForSelector('#sp-cc-accept', { visible: true, timeout: 5000 });
if (acceptButton) {
console.log('Cookie consent pop-up found. Clicking "Accept" button...');
await acceptButton.click();
await page.waitForNetworkIdle({ timeout: 5000 });
console.log('Cookie consent handled successfully.');
}
} catch (error) {
console.log('No cookie consent pop-up found or unable to click. Proceeding with scraping.');
}
// Extract the price
const priceSelector = 'span.a-price-whole, span.a-price, span.a-color-price';
await page.waitForSelector(priceSelector, { visible: true, timeout: 10000 });
const price = await page.evaluate(selector => {
const element = document.querySelector(selector);
return element ? element.innerText.trim() : 'Price not found';
}, priceSelector);
console.log(`Scraped data - Price: ${price}`);
return price;
} catch (error) {
console.error('An error occurred:', error);
if (page) {
await page.screenshot({ path: `screenshots/${country_code}_error_page.png`, fullPage: true });
}
return { title: 'Error', price: 'Error', error: error.message };
} finally {
if (browser) {
await browser.close();
}
}
};
const scrapeMultipleCountries = async (url, countries, delayBetweenRequests=30000) => {
const results = {};
for (const country of countries) {
console.log(`Scraping for country: ${country}`);
results[country] = await scrapeAmazonPrice(country, url);
if (countries.indexOf(country) < countries.length - 1) {
console.log(`Waiting ${delayBetweenRequests / 1000} seconds before next request...`);
await delay(delayBetweenRequests);
}
}
return results;
};
const main = async () => {
const url = 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870';
const countries = ['es', 'pt'];
try {
const results = await scrapeMultipleCountries(url, countries);
console.log('Final results:', results);
} catch (error) {
console.error('An error occurred in the main execution:', error);
}
};
// Run the main function
main().catch(console.error);
Explanation
- Environment Setup: We load environment variables using dotenv to securely manage sensitive information like the IPRoyal username, password, and proxy port. Credentials are stored in a .env file.
- Scraping Logic: The scrapeAmazonPrice function uses Puppeteer to navigate Amazon websites through IPRoyal proxies. We authenticate the proxy using credentials and add a randomized user agent for each request.
- Handling Pop-ups: The script checks for cookie consent pop-ups and clicks "Accept" if found, ensuring uninterrupted scraping.
- Content Extraction: Once the page loads, the script waits up to 10 seconds for a price element matching the CSS selector and retrieves its text. If no price element appears in time, the error is caught, a screenshot of the page is saved, and an error object is returned instead of a price.
- Multiple Countries: scrapeMultipleCountries runs scrapeAmazonPrice once per country code, one after another, pausing 30 seconds (delayBetweenRequests) between requests.
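If you prefer Python, the sequential, rate-limited loop in scrapeMultipleCountries can be sketched roughly as below. This is a minimal sketch, not the guide's Puppeteer code: fetch_price is a hypothetical callback you would implement yourself (for example with requests or Selenium), and build_iproyal_proxy assumes that IPRoyal selects the exit country when "_country-<code>" is appended to the proxy password. Treat that format as an assumption and confirm it against IPRoyal's geotargeting documentation.

```python
import time
from typing import Callable, Dict, List


def build_iproyal_proxy(username: str, password: str, port: str, country: str) -> str:
    """Build a proxy URL that targets a single country.

    ASSUMPTION: country targeting is requested by appending
    "_country-<code>" to the password; verify this in IPRoyal's docs.
    """
    return f"http://{username}:{password}_country-{country}@geo.iproyal.com:{port}"


def scrape_multiple_countries(
    url: str,
    countries: List[str],
    fetch_price: Callable[[str, str], str],
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch one price per country, one after another, pausing between requests."""
    results: Dict[str, str] = {}
    for index, country in enumerate(countries):
        print(f"Scraping for country: {country}")
        # fetch_price is a hypothetical callback: (country_code, url) -> price text
        results[country] = fetch_price(country, url)
        # Pause between requests, but not after the last country
        if index < len(countries) - 1:
            print(f"Waiting {delay_seconds} seconds before next request...")
            sleep(delay_seconds)
    return results
```

Passing sleep in as a parameter keeps the 30-second pause configurable and lets you replace it with a no-op when testing the loop without network access.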
Initial Attempt: Scrape Harry Potter Crochet Kit from Amazon Spain
We start by scraping price information for the "Harry Potter Crochet Kit" from the Spanish Amazon website:
scrapeAmazonPrice('es', 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870');
Second Attempt: Scrape Harry Potter Crochet Kit from Amazon Spain via a Portuguese IP
Next, we request the same amazon.es product page, this time through a Portuguese residential IP:
scrapeAmazonPrice('pt', 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870');
Analysis: Price Differences
After scraping, we compare the results:
- Spain Price: 21,20 €
- Portugal Price: 14,75 €
Difference Analysis: The same amazon.es listing showed a price about 30% lower when requested from a Portuguese IP than from a Spanish one, illustrating how pricing for the same product can differ depending on the shopper's country.
The residential proxies let us view the page as a local shopper in each country would, which is what made these location-specific price differences visible.
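To put a number on the gap, the two scraped price strings can be converted and compared. This is a small illustrative sketch: parse_eur_price and compare_prices are hypothetical helpers, and the parser assumes the European format shown above (dot as thousands separator, comma as decimal separator). Decimal is used instead of float so that amounts like 21,20 are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple


def parse_eur_price(text: str) -> Decimal:
    """Convert a European-formatted price such as '21,20 €' or '1.299,00 €' to a Decimal."""
    cleaned = text.replace("€", "").strip()
    # Drop thousands separators ('.') and turn the decimal comma into a point
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def compare_prices(reference: str, other: str) -> Tuple[Decimal, Decimal]:
    """Return (absolute difference, percentage difference relative to the reference price)."""
    ref = parse_eur_price(reference)
    oth = parse_eur_price(other)
    diff = ref - oth
    pct = (diff / ref * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return diff, pct


diff, pct = compare_prices("21,20 €", "14,75 €")
print(f"Portugal is {diff} € ({pct}%) cheaper than Spain")
# prints: Portugal is 6.45 € (30.4%) cheaper than Spain
```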
Alternative: ScrapeOps Residential Proxy Aggregator
If you're looking for a more versatile and cost-effective solution, consider using the ScrapeOps Residential Proxy Aggregator. This service allows you to tap into multiple proxy providers through a single proxy port, offering a range of benefits compared to traditional proxy providers.
Top 3 Reasons to Choose ScrapeOps Residential Proxy Aggregator
- Competitive Pricing: ScrapeOps offers lower pricing, allowing you to maximize your budget while maintaining high quality.
- Flexible Plans: With ScrapeOps, you have access to a wider variety of plans, including smaller, more affordable options tailored to your needs. The best part? You can start using the proxies with a free trial account.
- Enhanced Reliability: By leveraging multiple proxy providers through a single port, ScrapeOps offers greater reliability. If one provider faces issues, your requests can seamlessly switch to another, ensuring continuous access.
Code Example Using Python Requests
Below is a code example showing how to scrape a product price from Zalando using the ScrapeOps Residential Proxy Aggregator with the Python requests and BeautifulSoup libraries:
import os

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

# Load SCRAPEOPS_API_KEY from the .env file into the environment
load_dotenv()

username = 'scrapeops'
api_key = os.getenv("SCRAPEOPS_API_KEY")
proxy = 'residential-proxy.scrapeops.io'
port = 8181

# requests picks the proxy by the target URL's scheme, so define both keys;
# the proxy itself is reached over plain HTTP in both cases
proxies = {
    "http": f"http://{username}:{api_key}@{proxy}:{port}",
    "https": f"http://{username}:{api_key}@{proxy}:{port}",
}

# Scrape "TECH TEE" price
product_url = 'https://www.zalando.be/heren/?q=tech+tee'

try:
    response = requests.get(product_url, proxies=proxies, timeout=60)
    # Check if the request was successful
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Locate the product by its exact title, then the price element that follows it.
        # These class names look auto-generated and may change over time.
        product_name = soup.find('h3', string='TECH TEE - T-shirt basic - black')
        price_tag = product_name.find_next('span', class_='sDq_FX lystZ1 FxZV-M HlZ_Tf') if product_name else None
        if price_tag:
            print(f"Price for 'TECH TEE - T-shirt basic - black': {price_tag.get_text(strip=True)}")
        else:
            print("Out of stock!")
    else:
        print(f"Failed to retrieve content. Status code: {response.status_code}")
except requests.RequestException as e:
    print(f"An error occurred: {e}")
Explanation:
- Environment Setup: We start by loading environment variables using dotenv to securely manage sensitive information like the ScrapeOps API key. This keeps credentials out of the codebase so you can configure them separately in a .env file.
- Proxy Configuration: We configure the ScrapeOps residential proxy by setting up the proxies dictionary, which holds the proxy server address, port, and authentication details so that our requests are routed through the residential proxy.
- Scraping Process: After setting up the proxy, we send a GET request to the Zalando search URL for "tech tee" products and check whether the request succeeded by verifying the status code.
- Content Parsing and Extraction: Once the page content is retrieved, we parse it with BeautifulSoup and search for the product with the exact name "TECH TEE - T-shirt basic - black" using the find method. If the product is found, we locate its price by finding the next span element with the specific class that holds the price information.
- Output: Finally, we print the price of the "TECH TEE" product to the console. If the product is not found on the page, we print "Out of stock!". A missing match does not always mean the item is unavailable; it can also mean Zalando has changed its markup or class names.
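One detail worth checking in any proxy setup: requests looks the proxy up by the scheme of the URL being fetched, so with only an "http" key an https:// page such as the Zalando search is requested without the proxy (unless a proxy is also set through environment variables). The hypothetical build_proxies helper below is one way to cover both schemes and to percent-encode the credentials, since an API key containing characters like @ or : would otherwise break the proxy URL; requests decodes percent-encoded proxy credentials before using them. It is a sketch, not part of the ScrapeOps API.

```python
from typing import Dict
from urllib.parse import quote


def build_proxies(username: str, api_key: str,
                  host: str = "residential-proxy.scrapeops.io", port: int = 8181) -> Dict[str, str]:
    """Return a requests-style proxies mapping that covers both http:// and https:// targets."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL
    user = quote(username, safe="")
    key = quote(api_key, safe="")
    proxy_url = f"http://{user}:{key}@{host}:{port}"
    # The same proxy endpoint handles both schemes; requests tunnels HTTPS via CONNECT
    return {"http": proxy_url, "https": proxy_url}
```

You would then pass proxies=build_proxies("scrapeops", api_key) to requests.get.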
Ready to see how ScrapeOps can elevate your scraping projects?
Sign up for a free trial today and get 100MB of free bandwidth to test it out.
Ethical Considerations and Legal Guidelines
When using residential proxies for web scraping, it's crucial to consider the ethical implications and legal responsibilities. IPRoyal, like many proxy providers, emphasizes the importance of ethical practices in their service.
- Compliance with Terms of Service: Adhere to IPRoyal's terms of service, which prohibit illegal activities and abuse of their proxy network.
- Data Protection: If your scraping activities involve collecting personal data, make sure you have a legal basis for doing so and handle the data in compliance with relevant data protection laws, such as GDPR. This includes proper storage, processing, and deletion.
- Respect Robots.txt: Always check and adhere to the target website's robots.txt file, which specifies which parts of the site can be crawled.
- Use User-Agents Responsibly: While it's common practice to rotate user-agents, ensure they are recent and commonly used. Misrepresenting your scraper as a different browser could be seen as unethical.
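Checking robots.txt can be automated with Python's standard-library urllib.robotparser. The sketch below parses rules that have already been fetched so it runs offline; the sample rules, URLs, and the is_allowed helper are hypothetical. In a live scraper you would typically call set_url() with the site's robots.txt address and then read() instead of parse().

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    # parse() takes the file's lines; for a live site use
    # parser.set_url("https://<site>/robots.txt") followed by parser.read()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for illustration only
sample_robots = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""

print(is_allowed(sample_robots, "https://www.example.com/products/tech-tee"))  # True
print(is_allowed(sample_robots, "https://www.example.com/checkout/cart"))  # False
```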
By adhering to these ethical considerations and legal guidelines, you can ensure that your use of IPRoyal's residential proxies for web scraping is both effective and responsible, maintaining a balance between your data needs and the health of the web ecosystem.
Conclusion
Throughout this guide, we've explored the powerful capabilities of IPRoyal Residential Proxies for web scraping and data collection.
We demonstrated integration with popular tools like Python Requests, Selenium, Scrapy, and more, while a case study on Amazon showcased the value of residential proxies for uncovering regional pricing differences.
We strongly encourage you to implement residential proxies in your web scraping projects to avoid IP bans, bypass rate limiting, access geo-restricted content, and improve data accuracy.
By leveraging IPRoyal Residential Proxies and following the best practices outlined in this guide, you're well-equipped to take your web scraping and data collection projects to the next level.
More Web Scraping Guides
At ScrapeOps, we've got tons of learning resources. It doesn't matter if you're brand new to scraping or a hardened developer, we have something for you.
If you would like to learn more about Web Scraping with Python, then be sure to check out the Python Web Scraping Playbook or one of our more in-depth guides: