IPRoyal Residential Proxies: Web Scraping Guide

IPRoyal is a top-tier provider of reliable proxy services, offering over 34 million proxies. They specialize in residential proxies that ensure anonymity and high-speed connections for various web scraping needs.

Their rotating and static residential proxies provide unmatched flexibility and control, helping businesses gather data efficiently without the risk of being blocked. IPRoyal also offers data center and mobile proxies, ensuring seamless integration with over 650 tools.

This guide will help you set up and integrate IPRoyal residential proxies into your web scraping scripts, highlighting their benefits and practical applications.


TL;DR: How to Integrate IPRoyal Residential Proxies

Here are straightforward steps to get you up and running in seconds:

  1. Install the Python requests package:

    pip install requests
  2. Set Up IPRoyal Residential Proxy:

    import requests

    username = "your_proxy_username"
    password = "your_proxy_password"
    port = "your_proxy_port"
    proxy = f"geo.iproyal.com:{port}"

    proxies = {
        "http": f"http://{username}:{password}@{proxy}",
        "https": f"http://{username}:{password}@{proxy}"
    }

    response = requests.get("https://example.com", proxies=proxies)
    print(response.text)

In this script:

  1. Set Credentials and Proxy: Provide your IPRoyal credentials (username and password) along with the proxy address (geo.iproyal.com:<port>).
  2. Configure Proxies: Create a proxies dictionary to set up HTTP and HTTPS proxies using your credentials.
  3. Send a Request: Use the requests.get method to send a request to the target website (https://example.com) via the configured proxies.
  4. Print the Response: Finally, print the response text received from the target website.

With that, you should be able to start using IPRoyal's residential proxies.

What if you want to find out more? Let's dive right into the next sections!


Understanding Residential Proxies

What Are Residential Proxies?

Residential proxies function as intermediaries between you and the target websites, making your web traffic appear as if it’s coming from a legitimate residential address.

When you use a residential proxy, the target website sees your requests as originating from a real user, increasing the chances of bypassing detection and avoiding IP bans.

Types of Residential Proxies

There are two main types of residential proxies: rotating and static.

  • Rotating Residential Proxies: These proxies automatically change the IP address with each request or at regular intervals, providing high anonymity but potentially slower speeds.

  • Static Residential Proxies: These proxies maintain the same IP address for the duration of a session, offering consistent performance but with a higher risk of detection.

Here's a comparison of these two proxy types:

| Feature | Rotating Residential Proxies | Static Residential Proxies |
| --- | --- | --- |
| IP Address | Changes with each request | Remains the same for an extended period |
| Anonymity | High | Moderate |
| Speed | Slower due to rotation process | Generally faster |
| Management | Complex | Simpler |
| Risk of Detection | Lower | Higher |

Residential vs. Data Center Proxies

Residential proxies are distinct from data center proxies, primarily in their origin and how they are perceived by websites.

Residential proxies use IP addresses assigned by ISPs to real households, making them appear more authentic to websites. In contrast, data center proxies are generated by data centers and are not tied to a physical location, making them easier to detect and block.

Here's a comparison table for residential and data center proxies to better understand the differences:

| Feature | Residential Proxies | Data Center Proxies |
| --- | --- | --- |
| IP Source | Real residential addresses from ISPs | Data centers and cloud service providers |
| Anonymity | High | Lower |
| Speed | Variable, often slower | Generally faster |
| Cost | Higher | Lower |
| Detection Risk | Lower | Higher |
| Effectiveness for Geo-Access | High | Lower |

When Are Residential Proxies Useful?

Residential proxies are highly beneficial for various tasks:

  • Web Scraping and Data Collection: Ensures accurate data without getting blocked by anti-scraping mechanisms.
  • SEO and SERP Analysis: Gathers precise search engine results from different locations.
  • Social Media Monitoring: Tracks trends and activities on social media platforms without being flagged.
  • Ad Verification: Checks the correct display of ads in different regions.
  • Geo-Restricted Content Access: Allows access to content limited to specific geographical areas.

Why Choose IPRoyal Residential Proxies?

IPRoyal’s residential proxies stand out with competitive pricing and a vast network of over 32 million IPs across 195 countries. This extensive pool supports efficient web scraping, SEO research, and data aggregation.

Here are the top 3 reasons to choose IPRoyal residential proxies:

  • Exceptional Coverage: Access a diverse range of IP addresses worldwide, reducing detection risks and enhancing data collection accuracy.
  • Precise Targeting: Select proxies from any country, state, or city effortlessly, ensuring your data is as specific as needed without additional costs.
  • Flexible and Reliable: Enjoy pay-as-you-go pricing with no contracts or expiration on purchased traffic, combined with 24/7 support and advanced technical features for seamless integration.

Next, let's look at IPRoyal's residential proxy pricing plans before learning how to apply the proxies in various ways.


IPRoyal Residential Proxy Pricing

IPRoyal offers flexible and competitive pricing for its residential proxies, catering to various usage needs and budgets. Their pricing structure primarily revolves around bandwidth used, rather than charging per individual IP address or concurrency.

They provide a Pay-As-You-Go plan, with costs varying depending on the amount of bandwidth purchased. Here's a detailed look at their pricing:

Pricing Table

| Plan Name | Plan Size (GB) | Cost per GB | Price |
| --- | --- | --- | --- |
| 1GB Plan | 1 | $7.00 | $7.00 |
| 2GB Plan | 2 | $5.95 | $11.90 |
| 10GB Plan | 10 | $5.25 | $52.50 |
| 50GB Plan | 50 | $4.90 | $245.00 |
| 100GB Plan | 100 | $4.55 | $455.00 |
| 250GB Plan | 250 | $4.20 | $1,050.00 |
| 500GB Plan | 500 | $3.50 | $1,750.00 |
| 1000GB Plan | 1000 | $3.15 | $3,150.00 |
| 3000GB Plan | 3000 | $2.80 | $8,400.00 |
| 5000GB Plan | 5000 | $2.45 | $12,250.00 |
| 10000+GB Plan | 10000+ | Reach Out for a Special Deal! | Talk to Sales |
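As a quick sanity check on the tiers above, here's a small Python sketch (a helper of my own, not an IPRoyal tool) that computes the plan price from the published per-GB rates:

```python
# Hypothetical helper: estimate IPRoyal Pay-As-You-Go plan cost from the
# per-GB rates listed in the pricing table above.
TIERS = [  # (minimum plan size in GB, price per GB)
    (5000, 2.45), (3000, 2.80), (1000, 3.15), (500, 3.50), (250, 4.20),
    (100, 4.55), (50, 4.90), (10, 5.25), (2, 5.95), (1, 7.00),
]

def plan_cost(gb):
    """Return the total price for a plan of `gb` gigabytes."""
    for minimum, rate in TIERS:
        if gb >= minimum:
            return round(gb * rate, 2)
    raise ValueError("Plans start at 1 GB")

print(plan_cost(10))   # 52.5
print(plan_cost(500))  # 1750.0
```

Note how the per-GB rate drops as plan size grows, so larger commitments cost markedly less per gigabyte.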

Pricing Comparison

IPRoyal's pricing is generally competitive compared to other residential proxy providers:

  • Cheap Providers: Providers offering plans around $2-3 per GB are considered cheaper.
  • Expensive Providers: Those with smaller plans priced in the $6-8 per GB range are more expensive.

For a comprehensive comparison of different residential proxy providers, including IPRoyal, you can visit our Proxy Comparison page. This resource will help you evaluate various options and find the best fit for your needs.


Setting Up IPRoyal Residential Proxies

Creating an IPRoyal Account

To get started, visit the registration page. You can register using your LinkedIn or Google account, or your email address.

IPRoyal Signup Page

Say you go with Google signup. Click on the "Login with Google" button, enter your login credentials, then add a phone number if prompted.

After that, you will be redirected to your dashboard, from where you can buy Royal Residential proxies.

Personal Dashboard

Selecting and Purchasing Residential Proxies

After logging in, navigate to the residential proxy pricing page.

Select a proxy plan that fits your needs, scroll down the page, and click the "Continue" button. For example, I have selected the $7/1GB package.

Buy Residential Proxies

Follow the prompts to complete your purchase. Once your payment is processed, you will be redirected to the residential proxies page, where you can start setting up your residential proxies.

Proxy Purchased

Scroll down the page to see or update your proxy credentials, e.g., username, password, and port.


Authentication

When using IPRoyal proxies, you have two primary methods to authenticate your requests:

  1. Username and password
  2. IP whitelisting

Method 1: Username & Password Authentication

This method involves adding your residential proxy’s username and password into the proxy configuration.

Here’s how you can set it up:

  1. Install Required Packages:

    pip install requests python-dotenv
  2. Set Up Environment Variables:

    Create a .env file in your project directory to securely store your IPRoyal credentials:

    IPROYAL_USERNAME=your_username
    IPROYAL_PASSWORD=your_password
    IPROYAL_PORT=your_proxy_port

Loading and Configuring Proxy Settings

To securely load your credentials from the .env file and configure your proxy settings, follow these steps:

  1. Load Environment Variables:

    Use the python-dotenv package to load your credentials:

    from dotenv import load_dotenv
    import os

    load_dotenv()

    username = os.getenv("IPROYAL_USERNAME")
    password = os.getenv("IPROYAL_PASSWORD")
    port = os.getenv("IPROYAL_PORT")
    proxy = f"geo.iproyal.com:{port}"
  2. Set Up Proxies Dictionary:

    Create a dictionary that contains your proxy settings:

    proxies = {
        "http": f"http://{username}:{password}@{proxy}",
        "https": f"http://{username}:{password}@{proxy}"
    }
  3. Send a Request Through the Proxy:

    Use the requests library to send a request via the configured proxy:

    import requests

    response = requests.get("https://example.com", proxies=proxies)
    print(response.text)

This setup ensures that your requests are authenticated and routed through IPRoyal's residential proxies.

Method 2: IP Whitelisting Authentication

An alternative method for authenticating with IPRoyal residential proxies is IP whitelisting. This method allows you to authorize specific IP addresses to use the proxies without needing to include a username and password in your requests.

Here's how you can set it up:

  1. Log in to Your IPRoyal Account:

    • Navigate to the Royal Residential proxies configuration page in your IPRoyal dashboard.
  2. Add Your IP Address:

    • Scroll down to find authentication options and click the "Whitelist" button.
    • On the IP whitelist configuration page, click "Add".
    • Configure your proxies as needed (country, state, proxy type, session type).
    • Enter the IP address you want to whitelist.
    • Click "Create" to add the IP to your whitelist.
  3. Configure Proxy Without Username and Password:

    Once your IP is whitelisted, you can set up your proxies without including credentials. In the "Formatted proxy list" section of your dashboard, select your whitelisted IP and copy the IP:PORT information.

    Then use it like this:

    proxy = "copied_ip:copied_port"

    proxies = {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}'
    }
  4. Send a Request:

    You can now send requests through the proxy as usual:

    response = requests.get("https://example.com", proxies=proxies)
    print(response.text)

This method simplifies your code and is particularly useful when managing multiple users or devices.


Basic Request Using IPRoyal Residential Proxies

Let’s dive into a practical example that demonstrates how to make a basic web request using IPRoyal residential proxies:

Suppose you want to scrape the homepage of a website using IPRoyal residential proxies. Here’s how you would do it:

  1. Install Dependencies:

    Ensure you have the required packages installed:

    pip install requests beautifulsoup4
  2. Set Up the Proxy:

    Configure the proxy settings as demonstrated earlier in the Authentication section.

  3. Send the Request and Parse the HTML:

    Use the requests library to send a request and the BeautifulSoup library to parse the HTML content:

    import requests
    from bs4 import BeautifulSoup

    # Proxy configuration
    proxies = {
        'http': f'http://{username}:{password}@{proxy}',
        'https': f'http://{username}:{password}@{proxy}'
    }

    # Send a request through the proxy
    response = requests.get("https://example.com", proxies=proxies)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Print the page title
    print(soup.title.text)

In this example, we:

  1. Set Up Proxies: We configure the proxies dictionary with the necessary credentials and proxy address.
  2. Send a Request: We use requests.get() to fetch the content of the target website.
  3. Parse the HTML: We utilize BeautifulSoup to parse the HTML content and extract information like the page title.

Handling Proxy Errors

When using proxies, you may encounter errors such as connection timeouts or proxy failures. To handle these gracefully:

try:
    response = requests.get("https://example.com", proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error occurred. Please check your proxy settings.")
except requests.exceptions.Timeout:
    print("The request timed out. Try increasing the timeout or check your internet connection.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

This code catches common exceptions, providing informative messages that help diagnose and fix issues with proxy configurations.
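Building on the error handling above, you may also want automatic retries for transient failures. Below is a minimal sketch of my own (not part of requests or IPRoyal's tooling) that retries proxy errors and timeouts with exponential backoff; the `session` parameter simply lets you swap out the transport, e.g. for testing:

```python
import time

import requests

def get_with_retries(url, proxies, retries=3, backoff=2.0, session=requests):
    """Retry transient proxy failures with exponential backoff (a sketch)."""
    for attempt in range(1, retries + 1):
        try:
            response = session.get(url, proxies=proxies, timeout=10)
            response.raise_for_status()
            return response
        except (requests.exceptions.ProxyError, requests.exceptions.Timeout) as exc:
            if attempt == retries:
                raise  # Out of retries; surface the last error
            wait = backoff ** attempt
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {wait:.0f}s")
            time.sleep(wait)
```

Only proxy errors and timeouts are retried here; permanent failures such as authentication errors are re-raised immediately so you can fix the configuration instead of looping.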


Country Geotargeting

Country-level geotargeting allows you to connect to proxy servers in specific countries, enabling you to access location-restricted content and gather localized data as if you were in that country.

This is particularly beneficial for tasks like market research, competitor analysis, and testing localized content or services.

IPRoyal offers extensive country-level geotargeting capabilities. Their network spans numerous countries across the globe, with the ability to select proxies from different locations according to your needs.

Top 10 Countries Supported by IPRoyal

Below is a table showcasing 10 popular countries supported by IPRoyal, along with the number of IPs available in each:

| Country | Number of IPs |
| --- | --- |
| United States | 1,450,886 |
| Germany | 439,883 |
| United Kingdom | 421,770 |
| France | 418,633 |
| Canada | 373,796 |
| Brazil | 908,824 |
| Spain | 781,766 |
| Italy | 393,154 |
| Vietnam | 460,712 |
| Philippines | 545,729 |

IPRoyal supports proxies from a wide array of countries, ensuring that no matter where you need to connect, there’s likely a proxy available.

Using Country-Specific Proxies

To use country-specific proxies with IPRoyal, you can configure your requests by specifying the country code using the X-Countries header.

Here’s how you can do this in Python:

import requests
from requests.auth import HTTPProxyAuth

username = 'your_username'
password = 'your_password'
countries = 'country-us,fr,de'
port = "your_proxy_port"
proxy = f"geo.iproyal.com:{port}"
url = 'http://example.com'

proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}',
}

auth = HTTPProxyAuth(username, password)
headers = {
    'X-Countries': countries
}

response = requests.get(url, proxies=proxies, auth=auth, headers=headers)

print(response.text)

In this implementation:

  • We have set up country geo-targeting by specifying the X-Countries header.
  • By including this header with a comma-separated list of country codes, such as us, fr, and de, we direct the request through the specified countries: United States, France and Germany, respectively.

The X-Countries header ensures that our proxy server routes the request as if it's originating from the listed countries, allowing us to test how content appears from different locations.


City Geotargeting

City-level geotargeting allows you to access hyper-local content by connecting to proxies in specific cities. This is especially useful for tasks like local SEO analysis, price monitoring, and testing localized advertisements.

IPRoyal provides the ability to target proxies at the city level, giving you granular control over your web scraping and data collection activities.

Top 10 Cities Supported by IPRoyal

Here’s a table of popular cities supported by IPRoyal, along with their corresponding country:

| City | Country |
| --- | --- |
| New York | United States |
| Berlin | Germany |
| London | United Kingdom |
| Paris | France |
| Toronto | Canada |
| São Paulo | Brazil |
| Madrid | Spain |
| Milan | Italy |
| Ho Chi Minh | Vietnam |
| Manila | Philippines |

Using City-Specific Proxies

To target a specific city, you need to provide both the country and city names in the X-Country and X-City headers.

Here’s an example using Python:

import requests
from requests.auth import HTTPProxyAuth

username = 'your_username'
password = 'your_password'
country = 'country-us'
city = 'city-newyork'
port = 'your_proxy_port'
proxy = f'geo.iproyal.com:{port}'
url = 'http://example.com'

proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}',
}

auth = HTTPProxyAuth(username, password)
headers = {
    'X-Country': country,
    'X-City': city
}

response = requests.get(url, proxies=proxies, auth=auth, headers=headers)

print(response.text)

In this script:

  • We have implemented city geo-targeting by using both X-Country and X-City headers.
  • The X-Country header specifies the country code (e.g., 'us' for the United States), while the X-City header designates the city (e.g., 'newyork').

By including these headers, our request is routed through a proxy that simulates access from the specified city and country.
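To avoid hand-assembling these header values, you can wrap the convention in a small helper. This is a hypothetical convenience function of my own, not part of IPRoyal's API; it only reproduces the `country-`/`city-` prefix format used in the examples above:

```python
def geo_headers(country=None, city=None, countries=None):
    """Build IPRoyal-style geotargeting headers (hypothetical helper).

    `countries` takes a list of codes for multi-country targeting via
    X-Countries; `country` (optionally with `city`) targets one location.
    """
    headers = {}
    if countries:
        headers["X-Countries"] = "country-" + ",".join(countries)
    elif country:
        headers["X-Country"] = f"country-{country}"
        if city:
            headers["X-City"] = f"city-{city}"
    return headers

print(geo_headers(country="us", city="newyork"))
# {'X-Country': 'country-us', 'X-City': 'city-newyork'}
```

The returned dictionary can be passed directly as the `headers` argument of `requests.get`, exactly as in the scripts above.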


Error Codes

When using a proxy, you might encounter various HTTP error codes that indicate issues with your connection.

These error codes provide insights into what might be going wrong, whether it’s with the proxy server, your network, or the target website.

Below are some common HTTP proxy error codes and their meanings:

| Error Name | Error Description | Solution |
| --- | --- | --- |
| HTTP 400 Bad Request | This error happens when the client’s request to the proxy server is malformed or invalid. | Ensure the request is properly formatted with all required HTTP headers included. Double-check the syntax and structure. |
| HTTP 403 Forbidden | The proxy server understands the request but refuses to fulfill it due to insufficient permissions or authentication. | Verify that you have the correct permissions and provide the necessary authentication credentials. |
| HTTP 404 Not Found | The requested resource cannot be found on the proxy server or upstream server. | Ensure the URL is correct and that the resource hasn’t been moved or deleted. |
| HTTP 407 Proxy Authentication Required | The proxy server requires authentication before allowing access to the requested resource. | Provide valid authentication credentials as required by the proxy server. |
| HTTP 408 Request Timeout | The proxy server times out while waiting for the client’s request. | Check your network connection for stability and consider resending the request. |
| HTTP 502 Bad Gateway | The proxy server received an invalid response from an upstream server. | Check the upstream server for any issues and ensure it is functioning correctly. |
| HTTP 503 Service Unavailable | The proxy server or upstream server is temporarily unable to handle the request due to maintenance or overload. | Try again later or check if the server is undergoing maintenance. |
| HTTP 504 Gateway Timeout | The proxy server does not receive a timely response from the upstream server, resulting in a timeout. | Investigate whether the upstream server is slow or experiencing network issues. |
| HTTP 505 HTTP Version Not Supported | The proxy server does not support the HTTP version used in the client’s request. | Ensure that the HTTP version in your request is compatible with the proxy server and adjust it if necessary. |

Understanding these error codes can help you diagnose and troubleshoot issues with your proxy connection, ensuring smoother browsing and data retrieval.
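For logging, the table can be collapsed into a small lookup. The retry hints below are my own reading of the solutions column, not official IPRoyal guidance:

```python
# Map proxy error codes to a short name and whether a retry is worthwhile
# (retryable = the solution amounts to "try again later").
PROXY_ERRORS = {
    400: ("Bad Request", False),
    403: ("Forbidden", False),
    404: ("Not Found", False),
    407: ("Proxy Authentication Required", False),
    408: ("Request Timeout", True),
    502: ("Bad Gateway", True),
    503: ("Service Unavailable", True),
    504: ("Gateway Timeout", True),
    505: ("HTTP Version Not Supported", False),
}

def describe_proxy_error(status_code):
    name, retryable = PROXY_ERRORS.get(status_code, ("Unknown Error", False))
    action = "retry later" if retryable else "fix the request or credentials"
    return f"HTTP {status_code} {name}: {action}"

print(describe_proxy_error(503))  # HTTP 503 Service Unavailable: retry later
```

A helper like this keeps scraper logs consistent and makes it easy to decide programmatically whether a failed request is worth resending.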


KYC (Know-Your-Customer) Verification

IPRoyal implements a strict KYC (Know-Your-Customer) policy for both new and existing customers to ensure the security and credibility of their proxy network.

  • IPRoyal requires KYC validation prior to granting full access to their proxy services.
  • New accounts have limited access to services until KYC verification is completed.

The KYC process involves:

  1. Identity Verification:
    • Customers must provide an image of a government-issued ID (passport, driver's license, or national ID).
    • A selfie is required to confirm the identity matches the provided document.
  2. Document Authentication:
    • IPRoyal uses iDenfy, a third-party platform, to verify documents and conduct identity checks.
    • Advanced algorithms detect anomalies or inconsistencies in the provided documents.
    • Facial recognition technology compares the photo on the ID with the provided selfie.
    • Information may be cross-referenced with government databases for accuracy.
  3. Ongoing Monitoring:
    • IPRoyal continuously monitors their network to prevent abuse.
    • They reserve the right to refuse service based on risk level assessment.

Use cases that are not allowed:

  • IPRoyal states they will refuse service for suspicious, inappropriate, or unethical use cases.

The KYC process is designed to be quick and easy, typically taking just a few minutes to complete via phone or desktop. This verification allows customers to access all of IPRoyal's services, including API access and the ability to purchase as little as one proxy per order.

IPRoyal emphasizes that their KYC process complies with high data safety standards, including GDPR, ISO 27001, eIDAS, and ETSI, ensuring the privacy and security of customer information.

Unlike some providers (such as Bright Data) that require a call before allowing proxy use, IPRoyal's process is online and automated through their third-party verification platform (iDenfy).


Implementing IPRoyal Residential Proxies in Web Scraping

Let's explore how to use IPRoyal residential proxies with various libraries. We'll use the same example for each, targeting the US and using rotating proxies.

Python Requests

Here's how we can integrate IPRoyal proxies with Python Requests:

import requests
from requests.auth import HTTPProxyAuth

def get_with_proxy(url):
    username = 'your_username'
    password = 'your_password'
    port = 'your_proxy_port'
    proxy = f'geo.iproyal.com:{port}'
    country = 'country-us'

    proxies = {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}'
    }
    auth = HTTPProxyAuth(username, password)
    headers = {'X-Country': country}

    response = requests.get(url, proxies=proxies, auth=auth, headers=headers)
    return response.text

print(get_with_proxy('http://example.com'))

In this example, we've set up a function that sends a GET request through an IPRoyal proxy. We're using the 'X-Country' header to target US proxies, and HTTPProxyAuth for authentication.

Python Selenium

Now, let's set up IPRoyal proxies with Selenium for browser automation:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from proxy_auth_extension import create_proxy_auth_extension

def get_page_title_with_proxy():
    username = 'your_username'
    password = 'your_password'
    port = 'your_proxy_port'
    host = 'geo.iproyal.com'
    country = 'us'

    proxy_auth_extension = create_proxy_auth_extension(host, int(port), username, password)

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_extension(proxy_auth_extension)
    chrome_options.add_argument(f'--header=X-Country:{country}')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')

    driver = webdriver.Chrome(options=chrome_options)

    driver.get('https://example.com')

    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "title")))

    # Scrape the title, then close the browser
    title = driver.title
    driver.quit()
    return f"Scraped Title: {title}"

page_title = get_page_title_with_proxy()
print(page_title)

Here, we've configured Chrome options to use the IPRoyal proxy. We're adding the proxy server, authentication, and the 'X-Country' header to target US proxies. Get the proxy extension proxy_auth_extension here.

IPRoyal Selenium

Python Scrapy

To integrate IPRoyal proxies with Scrapy for web scraping, follow these steps:

Open your Scrapy project's settings.py file and add the following configuration:

# settings.py
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 350,
}

IPROYAL_PROXY = 'http://your_username:your_password@geo.iproyal.com:your_port'
IPROYAL_COUNTRY = 'us'

Replace your_username, your_password, and your_port with your actual IPRoyal credentials and port number.

Next, open middlewares.py in your Scrapy project and add the following code:

# middlewares.py

# ... (existing imports and code)

class IPRoyalCountryMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def __init__(self, settings):
        self.proxy = settings.get('IPROYAL_PROXY')
        self.country = settings.get('IPROYAL_COUNTRY', 'us')

    def process_request(self, request, spider):
        # Route the request through the IPRoyal proxy and set the
        # country-targeting header
        request.meta['proxy'] = self.proxy
        request.headers['X-Country'] = self.country

# ... (rest of the existing code)

Update your settings.py file to include the new middleware:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 350,
    'your_project.middlewares.IPRoyalCountryMiddleware': 351,
}

Replace your_project with the actual name of your Scrapy project.

Now, create a spider to fetch the target URL and extract the title. Create a new file named example_spider.py and add the following code:

# example_spider.py
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        title = response.css('title::text').get()
        self.logger.info(f'Title: {title}')
        yield {'title': title}

To run your spider, use the following command in your terminal:

scrapy crawl example

In this example, we've configured the proxy settings in Scrapy's settings file and created a custom middleware for country targeting. We then created a spider that fetches the target URL and extracts the title text using CSS selectors.

Node.js Puppeteer

Let's set up IPRoyal proxies with Puppeteer for browser automation:

import puppeteer from 'puppeteer';

const scrapeWithProxy = async () => {
  const username = 'your_username';
  const password = 'your_password';
  const host = 'geo.iproyal.com';
  const port = 'your_proxy_port';
  const country = 'us';

  const browser = await puppeteer.launch({
    args: [
      `--proxy-server=${host}:${port}`
    ],
  });

  const page = await browser.newPage();
  await page.setExtraHTTPHeaders({
    'X-Country': country,
  });

  await page.authenticate({ username, password });

  await page.goto('https://example.com');
  const content = await page.content();
  console.log(content);

  await browser.close();
};

scrapeWithProxy();

In this example, we've launched a browser with the IPRoyal proxy settings. We're setting the proxy server and authentication in the launch arguments, and adding the 'X-Country' header for US targeting.

IPRoyal Puppeteer

Node.js Playwright

Finally, let's set up IPRoyal proxies with Playwright:

import { chromium } from 'playwright';

const scrapeWithProxy = async () => {
  const username = 'your_username';
  const password = 'your_password';
  const host = 'geo.iproyal.com';
  const port = 'your_proxy_port';
  const country = 'us';

  const browser = await chromium.launch({
    proxy: {
      server: `http://${host}:${port}`,
      username: username,
      password: password,
    },
  });

  const context = await browser.newContext();
  await context.setExtraHTTPHeaders({
    'X-Country': country,
  });

  const page = await context.newPage();
  await page.goto('https://example.com');
  const content = await page.content();
  console.log(content);

  await browser.close();
};

scrapeWithProxy();

In this Playwright setup, we've configured the browser to use the IPRoyal proxy. We're setting the proxy server and authentication in the launch options, and adding the 'X-Country' header for US targeting in the context.

IPRoyal Playwright

Each of these examples demonstrates how we can integrate IPRoyal residential proxies with different web scraping libraries, consistently using US geotargeting and rotating proxies.


Case Study: Scrape Amazon Prices

In this case study, we demonstrate how to scrape price information for a product from different regional Amazon websites using Puppeteer.

We’ll configure the script to use IPRoyal proxies to access content from various country-specific Amazon sites.

Code Example

import puppeteer from 'puppeteer'; // npm i puppeteer
import dotenv from 'dotenv'; // npm i dotenv
import userAgent from 'random-useragent'; // npm i random-useragent

dotenv.config();

const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

const scrapeAmazonPrice = async (country_code, url) => {
  const username = process.env.IPROYAL_USERNAME;
  const password = process.env.IPROYAL_PASSWORD;
  const port = process.env.IPROYAL_PORT;
  const host = 'geo.iproyal.com';

  let browser;
  let page;

  try {
    browser = await puppeteer.launch({
      args: [
        `--proxy-server=${host}:${port}`
      ],
      headless: false
    });

    const context = await browser.createBrowserContext();
    page = await context.newPage();

    // Set proxy authentication
    await page.authenticate({
      username: username,
      password: password
    });

    // Set user agent and headers
    await page.setUserAgent(userAgent.getRandom());
    await page.setExtraHTTPHeaders({
      'X-Country': country_code,
    });

    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 90000 });

    // Handle the cookie consent pop-up
    try {
      console.log('Checking for cookie consent pop-up...');
      const acceptButton = await page.waitForSelector('#sp-cc-accept', { visible: true, timeout: 5000 });
      if (acceptButton) {
        console.log('Cookie consent pop-up found. Clicking "Accept" button...');
        await acceptButton.click();
        await page.waitForNetworkIdle({ timeout: 5000 });
        console.log('Cookie consent handled successfully.');
      }
    } catch (error) {
      console.log('No cookie consent pop-up found or unable to click. Proceeding with scraping.');
    }

    // Extract the price
    const priceSelector = 'span.a-price-whole, span.a-price, span.a-color-price';
    await page.waitForSelector(priceSelector, { visible: true, timeout: 10000 });
    const price = await page.evaluate(selector => {
      const element = document.querySelector(selector);
      return element ? element.innerText.trim() : 'Price not found';
    }, priceSelector);

    console.log(`Scraped data - Price: ${price}`);
    return price;
  } catch (error) {
    console.error('An error occurred:', error);
    if (page) {
      await page.screenshot({ path: `screenshots/${country_code}_error_page.png`, fullPage: true });
    }
    return { title: 'Error', price: 'Error', error: error.message };
  } finally {
    if (browser) {
      await browser.close();
    }
  }
};

const scrapeMultipleCountries = async (url, countries, delayBetweenRequests = 30000) => {
  const results = {};
  for (const country of countries) {
    console.log(`Scraping for country: ${country}`);
    results[country] = await scrapeAmazonPrice(country, url);
    if (countries.indexOf(country) < countries.length - 1) {
      console.log(`Waiting ${delayBetweenRequests / 1000} seconds before next request...`);
      await delay(delayBetweenRequests);
    }
  }
  return results;
};

const main = async () => {
  const url = 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870';
  const countries = ['es', 'pt'];

  try {
    const results = await scrapeMultipleCountries(url, countries);
    console.log('Final results:', results);
  } catch (error) {
    console.error('An error occurred in the main execution:', error);
  }
};

// Run the main function
main().catch(console.error);

Explanation

  • Environment Setup: We load environment variables using dotenv to securely manage sensitive information like the IPRoyal username, password, and proxy port. Credentials are stored in a .env file.
  • Scraping Logic: The scrapeAmazonPrice function uses Puppeteer to navigate Amazon websites through IPRoyal proxies. We authenticate the proxy using credentials and add a randomized user agent for each request.
  • Handling Pop-ups: The script checks for cookie consent pop-ups and clicks "Accept" if found, ensuring uninterrupted scraping.
  • Content Extraction: Once the page loads, the script looks for the price using a CSS selector and retrieves it. If the price isn’t found, an error message is returned.

Initial Attempt: Scrape Harry Potter Crochet Kit from Amazon Spain

We start by scraping price information for the "Harry Potter Crochet Kit" from the Spanish Amazon website:

scrapeAmazonPrice('es', 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870');

Second Attempt: Scrape Harry Potter Crochet Kit from Amazon Portugal

Next, we scrape the same product from the Portuguese Amazon site:

scrapeAmazonPrice('pt', 'https://www.amazon.es/Harry-Potter-Crochet-Kits/dp/1684128870');

Analysis: Price Differences

After scraping, we compare the results:

  • Spain Price: 21,20 €
  • Portugal Price: 14,75 €

Amazon Price Difference

Difference Analysis: This demonstrates how product prices can vary significantly between regional Amazon sites, illustrating different pricing strategies for the same product across countries.

The use of proxies allowed us to gather these region-specific data points, highlighting price discrepancies in different markets.
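
The size of that gap is easy to quantify once both prices are scraped. A small sketch, parsing the European-formatted price strings from the results above:

```python
def parse_eur(price_text):
    """Convert a European-formatted price like '21,20 €' to a float."""
    cleaned = price_text.replace("€", "").strip()
    return float(cleaned.replace(".", "").replace(",", "."))

spain = parse_eur("21,20 €")      # price on amazon.es via the Spanish proxy
portugal = parse_eur("14,75 €")   # same listing via the Portuguese proxy

difference = spain - portugal
percent = difference / spain * 100
print(f"Difference: {difference:.2f} € ({percent:.1f}% cheaper via Portugal)")
# -> Difference: 6.45 € (30.4% cheaper via Portugal)
```

A roughly 30% spread on a single listing is exactly the kind of signal that makes geo-targeted price monitoring worthwhile.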


Alternative: ScrapeOps Residential Proxy Aggregator

If you're looking for a more versatile and cost-effective solution, consider using the ScrapeOps Residential Proxy Aggregator. This service allows you to tap into multiple proxy providers through a single proxy port, offering a range of benefits compared to traditional proxy providers.

Top 3 Reasons to Choose ScrapeOps Residential Proxy Aggregator

  1. Competitive Pricing: ScrapeOps offers lower pricing, allowing you to maximize your budget while maintaining high quality.
  2. Flexible Plans: With ScrapeOps, you have access to a wider variety of plans, including smaller, more affordable options tailored to your needs. The best part? You can start using the proxies with a free trial account.
  3. Enhanced Reliability: By leveraging multiple proxy providers through a single port, ScrapeOps offers greater reliability. If one provider faces issues, your requests can seamlessly switch to another, ensuring continuous access.

Code Example Using Python Requests

Below is a code example demonstrating how to scrape a product listing (a "TECH TEE" t-shirt from Zalando) using the ScrapeOps Residential Proxy Aggregator with the Python requests and BeautifulSoup libraries:

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
import os

load_dotenv()

username = 'scrapeops'
api_key = os.getenv("SCRAPEOPS_API_KEY")
proxy = 'residential-proxy.scrapeops.io'
port = 8181

# Route both HTTP and HTTPS traffic through the ScrapeOps proxy
proxies = {
    "http": f"http://{username}:{api_key}@{proxy}:{port}",
    "https": f"http://{username}:{api_key}@{proxy}:{port}"
}

# Scrape "TECH TEE" price
product_url = 'https://www.zalando.be/heren/?q=tech+tee'

try:
    response = requests.get(product_url, proxies=proxies)

    # Check if the request was successful
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')

        # Example: Locate the product and extract price information
        product_name = soup.find('h3', string='TECH TEE - T-shirt basic - black')
        if product_name:
            price = product_name.find_next('span', class_='sDq_FX lystZ1 FxZV-M HlZ_Tf').get_text(strip=True)
            print(f"Price for 'TECH TEE - T-shirt basic - black': {price}")
        else:
            print("Out of stock!")

    else:
        print(f"Failed to retrieve content. Status code: {response.status_code}")

except Exception as e:
    print(f"An error occurred: {e}")

Explanation:

  • Environment Setup: We start by loading environment variables using dotenv to securely manage sensitive information like the ScrapeOps API key. This setup allows you to keep credentials out of the codebase and configure them separately in a .env file.
  • Proxy Configuration: We configure the ScrapeOps residential proxy by setting up the proxies dictionary. This includes the proxy server address, port, and necessary authentication details, allowing us to route requests through the residential proxy.
  • Scraping Process: After setting up the proxy, we send a GET request to the specified Zalando URL, targeting a search page for "tech tee" products. We then check if the request was successful by verifying the status code.
  • Content Parsing and Extraction: Once the page content is retrieved, we parse it using BeautifulSoup. We search for the product with the exact name "TECH TEE - T-shirt basic - black" using the find method. If the product is found, we locate its associated price by finding the next span element that matches the specific class containing the price information.
  • Output: Finally, we print the price of the "TECH TEE" product to the console. If the product is not available or not found on the page, we print "Out of stock!" to inform you that the item is currently unavailable.
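
Residential proxy requests occasionally fail mid-session (timeouts, bad exit IPs), so production scrapers usually wrap the GET call in a small retry loop. A minimal sketch under that assumption; the back-off values and the commented usage line are illustrative, not part of the example above:

```python
import time

def fetch_with_retries(fetch, max_attempts=3, backoff_seconds=2):
    """Call `fetch` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise
            # Back off before retrying, giving the proxy pool time to rotate.
            time.sleep(backoff_seconds * attempt)

# Hypothetical usage with the requests setup shown earlier:
# response = fetch_with_retries(
#     lambda: requests.get(product_url, proxies=proxies, timeout=30)
# )
```

Passing the request as a callable keeps the retry logic independent of any particular HTTP library or proxy configuration.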

ScrapeOps Residential Proxies

Ready to see how ScrapeOps can elevate your scraping projects?

Sign up for a free trial today and get 500MB of free bandwidth to test it out.


Ethical Considerations and Legal Guidelines

When using residential proxies for web scraping, it's crucial to consider the ethical implications and legal responsibilities. IPRoyal, like many proxy providers, emphasizes the importance of ethical practices in their service.

  1. Compliance with Terms of Service: Adhere to IPRoyal's terms of service, which prohibit illegal activities and abuse of their proxy network.
  2. Handle Personal Data Carefully: If your scraping activities involve collecting personal data, ensure you have a legal basis for doing so, and store, process, and delete that data in compliance with regulations such as GDPR.
  3. Respect Robots.txt: Always check and adhere to the target website's robots.txt file, which specifies which parts of the site can be crawled.
  4. Use User-Agents Responsibly: While it's common practice to rotate user-agents, ensure they are recent and commonly used. Misrepresenting your scraper as a different browser could be seen as unethical.
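
The robots.txt check described above can be automated with Python's standard library. A minimal sketch, using an inline sample robots.txt (in practice you would fetch the file from the target site's `/robots.txt`):

```python
from urllib import robotparser

# Sample robots.txt content for illustration -- in practice, fetch it
# from https://<target-site>/robots.txt before scraping.
sample_robots = """
User-agent: *
Disallow: /checkout/
Allow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(sample_robots)

# Check whether a given URL may be crawled by any user agent ("*").
print(parser.can_fetch("*", "https://example.com/products/123"))   # True (allowed)
print(parser.can_fetch("*", "https://example.com/checkout/cart"))  # False (disallowed)
```

Running this check before each crawl keeps your scraper within the boundaries the site operator has published.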

By adhering to these ethical considerations and legal guidelines, you can ensure that your use of IPRoyal's residential proxies for web scraping is both effective and responsible, maintaining a balance between your data needs and the health of the web ecosystem.


Conclusion

Throughout this guide, we've explored the powerful capabilities of IPRoyal Residential Proxies for web scraping and data collection.

We demonstrated integration with popular tools like Python Requests, Selenium, Scrapy, and more, while a case study on Amazon showcased the value of residential proxies for uncovering regional pricing differences.

We strongly encourage you to implement residential proxies in your web scraping projects to avoid IP bans, bypass rate limiting, access geo-restricted content, and improve data accuracy.

By leveraging IPRoyal Residential Proxies and following the best practices outlined in this guide, you're well-equipped to take your web scraping and data collection projects to the next level.


More Web Scraping Guides

At ScrapeOps, we've got tons of learning resources. It doesn't matter if you're brand new to scraping or a hardened developer, we have something for you.

If you would like to learn more about Web Scraping with Python, then be sure to check out Python Web Scraping Playbook or check out one of our more in-depth guides: