IPRoyal Residential Proxies: Web Scraping Guide
IPRoyal is a top-tier provider of reliable proxy services, offering over 34 million proxies. They specialize in residential proxies that ensure anonymity and high-speed connections for various web scraping needs.
Their rotating and static residential proxies provide unmatched flexibility and control, helping businesses gather data efficiently without the risk of being blocked. IPRoyal also offers data center and mobile proxies, ensuring seamless integration with over 650 tools.
This guide will help you set up and integrate IPRoyal residential proxies into your web scraping scripts, highlighting their benefits and practical applications.
TLDR: How to Integrate IPRoyal Residential Proxy?
Here are straightforward steps to get you up and running in seconds:
- Install the Python `requests` package:

  ```bash
  pip install requests
  ```
- Set up the IPRoyal residential proxy:

  ```python
  import requests

  username = "your_proxy_username"
  password = "your_proxy_password"
  port = "your_proxy_port"
  proxy = f"geo.iproyal.com:{port}"

  proxies = {
      "http": f"http://{username}:{password}@{proxy}",
      "https": f"http://{username}:{password}@{proxy}"
  }

  response = requests.get("https://example.com", proxies=proxies)
  print(response.text)
  ```
In this script:
- Set Credentials and Proxy: Provide your IPRoyal credentials (`username` and `password`) along with the proxy address (`geo.iproyal.com:<port>`).
- Configure Proxies: Create a `proxies` dictionary to set up HTTP and HTTPS proxies using your credentials.
- Send a Request: Use the `requests.get` method to send a request to the target website (`https://example.com`) via the configured proxies.
- Print the Response: Finally, print the response text received from the target website.
With that, you should be able to start using IPRoyal's residential proxies.
What if you want to find out more? Let's dive right into the next sections!
Understanding Residential Proxies
What Are Residential Proxies?
Residential proxies function as intermediaries between you and the target websites, making your web traffic appear as if it’s coming from a legitimate residential address.
When you use a residential proxy, the target website sees your requests as originating from a real user, increasing the chances of bypassing detection and avoiding IP bans.
Types of Residential Proxies
There are two main types of residential proxies: rotating and static.
- Rotating Residential Proxies: These proxies automatically change the IP address with each request or at regular intervals, providing high anonymity but potentially slower speeds.
- Static Residential Proxies: These proxies maintain the same IP address for the duration of a session, offering consistent performance but with a higher risk of detection.
Here's a comparison of these two proxy types:
| Feature | Rotating Residential Proxies | Static Residential Proxies |
| --- | --- | --- |
| IP Address | Changes with each request | Remains the same for an extended period |
| Anonymity | High | Moderate |
| Speed | Slower due to rotation process | Generally faster |
| Management | Complex, but easy with services like IPRoyal | Simpler |
| Risk of Detection | Lower | Higher |
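One way to see the difference in practice is to fire several requests through a rotating endpoint and compare the exit IPs. The sketch below uses placeholder credentials, and `build_proxies`/`show_rotation` are helper names of our own; on a rotating plan, each request should surface a different IP, while a static proxy should keep the same one:

```python
def build_proxies(username, password, host="geo.iproyal.com", port=12321):
    """Build a requests-style proxies dict for an authenticated proxy."""
    url = f"http://{username}:{password}@{host}:{port}"
    return {"http": url, "https": url}

def show_rotation(username, password, n=3):
    """Print the exit IP for n consecutive requests.

    On a rotating plan, each line should show a different IP; on a
    static (sticky) proxy, the IP should stay the same.
    """
    import requests  # imported here so build_proxies stays dependency-free
    proxies = build_proxies(username, password)
    for _ in range(n):
        print(requests.get("https://httpbin.org/ip",
                           proxies=proxies, timeout=10).text)

# show_rotation("your_username", "your_password")  # requires live credentials
```
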
Residential vs. Data Center Proxies
Residential proxies are distinct from data center proxies, primarily in their origin and how they are perceived by websites.
Residential proxies use IP addresses assigned by ISPs to real households, making them appear more authentic to websites. In contrast, data center proxies are generated by data centers and are not tied to a physical location, making them easier to detect and block.
Here's a comparison table for residential and data center proxies to better understand the differences:
| Feature | Residential Proxies | Data Center Proxies |
| --- | --- | --- |
| IP Source | Real residential addresses from ISPs | Data centers and cloud service providers |
| Anonymity | High | Lower |
| Speed | Variable, often slower | Generally faster |
| Cost | Higher | Lower |
| Detection Risk | Lower | Higher |
| Effectiveness for Geo-Access | High | Lower |
When Are Residential Proxies Useful?
Residential proxies are highly beneficial for various tasks:
- Web Scraping and Data Collection: Ensures accurate data without getting blocked by anti-scraping mechanisms.
- SEO and SERP Analysis: Gathers precise search engine results from different locations.
- Social Media Monitoring: Tracks trends and activities on social media platforms without being flagged.
- Ad Verification: Checks the correct display of ads in different regions.
- Geo-Restricted Content Access: Allows access to content limited to specific geographical areas.
Why Choose IPRoyal Residential Proxies?
IPRoyal’s residential proxies stand out with competitive pricing and a vast network of over 32 million IPs across 195 countries. This extensive pool supports efficient web scraping, SEO research, and data aggregation.
Here are the top three reasons to choose IPRoyal residential proxies:
- Exceptional Coverage: Access a diverse range of IP addresses worldwide, reducing detection risks and enhancing data collection accuracy.
- Precise Targeting: Select proxies from any country, state, or city effortlessly, ensuring your data is as specific as needed without additional costs.
- Flexible and Reliable: Enjoy pay-as-you-go pricing with no contracts or expiration on purchased traffic, combined with 24/7 support and advanced technical features for seamless integration.
Next, let's look at IPRoyal's residential proxy pricing plans before learning how to apply these proxies in various ways.
IPRoyal Residential Proxy Pricing
IPRoyal offers flexible and competitive pricing for its residential proxies, catering to various usage needs and budgets. Their pricing structure primarily revolves around bandwidth used, rather than charging per individual IP address or concurrency.
They provide a Pay-As-You-Go plan, with costs varying depending on the amount of bandwidth purchased. Here's a detailed look at their pricing:
Pricing Table
| Plan | Bandwidth (GB) | Price per GB | Total Price |
| --- | --- | --- | --- |
| 1GB Plan | 1 | $7.00 | $7.00 |
| 2GB Plan | 2 | $5.95 | $11.90 |
| 10GB Plan | 10 | $5.25 | $52.50 |
| 50GB Plan | 50 | $4.90 | $245.00 |
| 100GB Plan | 100 | $4.55 | $455.00 |
| 250GB Plan | 250 | $4.20 | $1,050.00 |
| 500GB Plan | 500 | $3.50 | $1,750.00 |
| 1000GB Plan | 1000 | $3.15 | $3,150.00 |
| 3000GB Plan | 3000 | $2.80 | $8,400.00 |
| 5000GB Plan | 5000 | $2.45 | $12,250.00 |
| 10000+GB Plan | 10000+ | Reach Out for a Special Deal! | Talk to Sales |
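Each total in the table is simply the bandwidth multiplied by the per-GB rate, so it's easy to sanity-check what a plan will cost. The `plan_total` helper below is our own illustration of that arithmetic, not an IPRoyal API:

```python
# Per-GB rates from the pricing table above, keyed by plan size in GB.
RATES = {1: 7.00, 2: 5.95, 10: 5.25, 50: 4.90, 100: 4.55,
         250: 4.20, 500: 3.50, 1000: 3.15, 3000: 2.80, 5000: 2.45}

def plan_total(gb):
    """Total price of a plan: bandwidth (GB) times the per-GB rate."""
    return round(gb * RATES[gb], 2)

print(plan_total(50))    # 245.0
print(plan_total(3000))  # 8400.0
```
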
Pricing Comparison
IPRoyal's pricing is generally competitive compared to other residential proxy providers:
- Cheaper Providers: Providers offering plans at around $2-3 per GB sit at the low end of the market.
- More Expensive Providers: Those whose smaller plans are priced in the $6-8 per GB range sit at the high end.
For a comprehensive comparison of different residential proxy providers, including IPRoyal, you can visit our Proxy Comparison page. This resource will help you evaluate various options and find the best fit for your needs.
Setting Up IPRoyal Residential Proxies
Creating an IPRoyal Account
To get started, visit the registration page. You can register using your LinkedIn, Google or email.
Say you go with Google signup. Click the "Login with Google" button, enter your login credentials, then add a phone number if prompted.
After that, you will be redirected to your dashboard, from where you can buy Royal Residential proxies.
Selecting and Purchasing Residential Proxies
After logging in, navigate to the residential proxy pricing page.
Select a proxy plan that fits your needs, scroll down the page, and click the "Continue" button. For example, I have selected the $7/1GB package.
Follow the prompts to complete your purchase. Once your payment is processed, you will be redirected to residential proxies page and you can start setting up your residential proxies.
Scroll down the page to view or update your proxy credentials, e.g., username, password, and port.
Authentication
When using IPRoyal proxies, you have two primary methods to authenticate your requests:
- Username and password
- IP whitelisting
Method 1: Username & Password Authentication
This method involves adding your residential proxy’s username and password into the proxy configuration.
Here’s how you can set it up:
- Install Required Packages:

  ```bash
  pip install requests python-dotenv
  ```

- Set Up Environment Variables: Create a `.env` file in your project directory to securely store your IPRoyal credentials:

  ```
  IPROYAL_USERNAME=your_username
  IPROYAL_PASSWORD=your_password
  IPROYAL_PORT=your_proxy_port
  ```
Loading and Configuring Proxy Settings

To securely load your credentials from the `.env` file and configure your proxy settings, follow these steps:
- Load Environment Variables: Use the `python-dotenv` package to load your credentials:

  ```python
  from dotenv import load_dotenv
  import os

  load_dotenv()

  username = os.getenv("IPROYAL_USERNAME")
  password = os.getenv("IPROYAL_PASSWORD")
  port = os.getenv("IPROYAL_PORT")
  proxy = "geo.iproyal.com"
  ```

- Set Up Proxies Dictionary: Create a dictionary that contains your proxy settings:

  ```python
  proxies = {
      "http": f"http://{username}:{password}@{proxy}:{port}",
      "https": f"http://{username}:{password}@{proxy}:{port}"
  }
  ```
- Send a Request Through the Proxy: Use the `requests` library to send a request via the configured proxy:

  ```python
  import requests
  from dotenv import load_dotenv
  import os

  load_dotenv()

  username = os.getenv("IPROYAL_USERNAME")
  password = os.getenv("IPROYAL_PASSWORD")
  port = os.getenv("IPROYAL_PORT")
  proxy = "geo.iproyal.com"

  proxies = {
      "http": f"http://{username}:{password}@{proxy}:{port}",
      "https": f"http://{username}:{password}@{proxy}:{port}"
  }

  response = requests.get("https://httpbin.org/ip", proxies=proxies)
  print(response.text)
  ```
This setup ensures that your requests are authenticated and routed through IPRoyal's residential proxies.
Method 2: IP Whitelisting Authentication
An alternative method for authenticating with IPRoyal residential proxies is IP whitelisting. This method allows you to authorize specific IP addresses to use the proxies without needing to include a username and password in your requests.
Here's how you can set it up:
- Log in to Your IPRoyal Account:
  - Navigate to the Royal Residential proxies configuration page in your IPRoyal dashboard.
- Add Your IP Address:
  - Scroll down to find the authentication options and click the "Whitelist" button.
  - On the IP whitelist configuration page, click "Add".
  - Configure your proxies as needed (country, state, proxy type, session type).
  - Enter the IP address you want to whitelist.
  - Click "Create" to add the IP to your whitelist.
- Configure Proxy Without Username and Password: Once your IP is whitelisted, you can set up your proxies without including credentials. In the "Formatted proxy list" section of your dashboard, select your whitelisted IP and copy the IP:PORT information. Then use it like this:

  ```python
  proxy = "copied_ip:copied_port"

  proxies = {
      'http': f'http://{proxy}',
      'https': f'http://{proxy}'
  }
  ```
- Send a Request: You can now send requests through the proxy as usual:

  ```python
  import requests

  proxy = "copied_ip:copied_port"

  proxies = {
      'http': f'http://{proxy}',
      'https': f'http://{proxy}'
  }

  response = requests.get("https://example.com", proxies=proxies)
  print(response.text)
  ```
This method simplifies your code and is particularly useful when managing multiple users or devices.
Basic Request Using IPRoyal Residential Proxies
Let’s dive into a practical example that demonstrates how to make a basic web request using IPRoyal residential proxies:
Suppose you want to scrape the homepage of a website using IPRoyal residential proxies. Here’s how you would do it:
- Install Dependencies: Ensure you have the required packages installed:

  ```bash
  pip install requests beautifulsoup4
  ```

- Set Up the Proxy: Configure the proxy settings as demonstrated earlier in the Authentication section.

- Send the Request and Parse the HTML: Use the `requests` library to send a request and the `BeautifulSoup` library to parse the HTML content:

  ```python
  import requests
  from bs4 import BeautifulSoup

  proxy = 'geo.iproyal.com'
  port_number = 12321
  proxy_auth = 'YOUR-USERNAME:YOUR-PASSWORD'
  proxies = {
      'http': f'http://{proxy_auth}@{proxy}:{port_number}',
      'https': f'http://{proxy_auth}@{proxy}:{port_number}'
  }

  response = requests.get("https://quotes.toscrape.com", proxies=proxies)
  soup = BeautifulSoup(response.content, 'html.parser')
  print(soup.title.text)
  ```
In this example, we:
- Set Up Proxies: We configure the `proxies` dictionary with the necessary credentials and proxy address.
- Send a Request: We use `requests.get()` to fetch the content of the target website.
- Parse the HTML: We utilize `BeautifulSoup` to parse the HTML content and extract information like the page title.
Handling Proxy Errors
When using proxies, you may encounter errors such as connection timeouts or proxy failures. To handle these gracefully:
```python
try:
    response = requests.get("https://example.com", proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error occurred. Please check your proxy settings.")
except requests.exceptions.Timeout:
    print("The request timed out. Try increasing the timeout or check your internet connection.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```
This code catches common exceptions, providing informative messages that help diagnose and fix issues with proxy configurations.
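For transient failures, it often makes sense to retry before giving up. Below is a minimal retry sketch; the `TransientProxyError` class is a stand-in of our own so the helper stays self-contained, and in a real script you would catch `requests.exceptions.ProxyError` and `requests.exceptions.Timeout` instead:

```python
import time

class TransientProxyError(Exception):
    """Stand-in for requests' ProxyError / Timeout exceptions."""

def with_retries(fetch, attempts=3, delay=0.0):
    """Call fetch(); retry up to `attempts` times on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientProxyError:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            time.sleep(delay)  # back off before the next try
```

Wrapping the `requests.get` call in a small function and passing it to `with_retries` keeps the retry policy in one place instead of duplicating try/except blocks.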
Country Geotargeting
Country-level geotargeting allows you to connect to proxy servers in specific countries, enabling you to access location-restricted content and gather localized data as if you were in that country.
This is particularly beneficial for tasks like market research, competitor analysis, and testing localized content or services.
IPRoyal offers extensive country-level geotargeting capabilities. Their network spans numerous countries across the globe, with the ability to select proxies from different locations according to your needs.
Top 10 Countries Supported by IPRoyal
Below is a table showcasing 10 popular countries supported by IPRoyal, along with the number of IPs available in each:
| Country | Number of IPs |
| --- | --- |
| United States | 1,450,886 |
| Germany | 439,883 |
| United Kingdom | 421,770 |
| France | 418,633 |
| Canada | 373,796 |
| Brazil | 908,824 |
| Spain | 781,766 |
| Italy | 393,154 |
| Vietnam | 460,712 |
| Philippines | 545,729 |
IPRoyal supports proxies from a wide array of countries, ensuring that no matter where you need to connect, there’s likely a proxy available.
Using Country-Specific Proxies
To use country-specific proxies with IPRoyal, you can configure your requests by appending the `_country-<code>` flag to your proxy credentials.

Here's how you can do this in Python:

```python
import requests

url = 'https://ipv4.icanhazip.com'
proxy = 'geo.iproyal.com:12321'
proxy_auth = 'YOUR-USERNAME:YOUR-PASSWORD'
country = 'us'
proxies = {
    'http': f'http://{proxy_auth}_country-{country}@{proxy}',
    'https': f'http://{proxy_auth}_country-{country}@{proxy}'
}

response = requests.get(url, proxies=proxies)
print(response.text)
```
In this implementation:
- We have set up country geotargeting by adding the `_country` flag to our proxy credentials.
- This flag allows us to route our request through the specified country (in this case, `"us"`).
The supported country codes are listed in IPRoyal's documentation.
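The flag-based format above is easy to wrap in a small helper so the country code isn't hand-spliced into each URL. The `country_proxy` function below is our own convenience wrapper, not part of IPRoyal's tooling:

```python
def country_proxy(proxy_auth, country, host="geo.iproyal.com", port=12321):
    """Build a proxies dict that targets a country via the _country flag."""
    url = f"http://{proxy_auth}_country-{country}@{host}:{port}"
    return {"http": url, "https": url}

print(country_proxy("YOUR-USERNAME:YOUR-PASSWORD", "us")["http"])
# http://YOUR-USERNAME:YOUR-PASSWORD_country-us@geo.iproyal.com:12321
```
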
City Geotargeting
City-level geotargeting allows you to access hyper-local content by connecting to proxies in specific cities. This is especially useful for tasks like local SEO analysis, price monitoring, and testing localized advertisements.
IPRoyal provides the ability to target proxies at the city level, giving you granular control over your web scraping and data collection activities.
Top 10 Cities Supported by IPRoyal
Here’s a table of popular cities supported by IPRoyal, along with their corresponding country:
| City | Country |
| --- | --- |
| New York | United States |
| Berlin | Germany |
| London | United Kingdom |
| Paris | France |
| Toronto | Canada |
| São Paulo | Brazil |
| Madrid | Spain |
| Milan | Italy |
| Ho Chi Minh | Vietnam |
| Manila | Philippines |
Using City-Specific Proxies
To target a specific city, you need to provide both the country and city flags inside your proxy URL.
Here’s an example using Python:
```python
import requests

url = 'https://ipv4.icanhazip.com'
proxy = 'geo.iproyal.com:12321'
proxy_auth = 'YOUR-USERNAME:YOUR-PASSWORD'
country = 'us'
city = 'detroit'
proxies = {
    'http': f'http://{proxy_auth}_country-{country}_city-{city}@{proxy}',
    'https': f'http://{proxy_auth}_country-{country}_city-{city}@{proxy}'
}

response = requests.get(url, proxies=proxies)
print(response.text)
```
The code gives us the following output.
Now, it's time to verify that this IP address is actually correct. Our location should be Detroit, MI. If you look at the screenshot below, you can see that we are in fact showing up in Detroit.
In this script:
- We have implemented city geotargeting by using both the `_country` and `_city` flags.
- The `_country` flag specifies the country code (e.g., `'us'` for the United States), while the `_city` flag designates the city (e.g., `'detroit'`).
By including these flags, our request is routed through a proxy that simulates access from the specified city and country.
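Since the city flag simply extends the country flag, both can be handled by one small builder function. This is our own sketch following the flag format shown above, not an official IPRoyal helper:

```python
def geo_proxy(proxy_auth, country, city=None,
              host="geo.iproyal.com", port=12321):
    """Build a proxies dict with country and optional city geotargeting."""
    flags = f"_country-{country}"
    if city:
        flags += f"_city-{city}"
    url = f"http://{proxy_auth}{flags}@{host}:{port}"
    return {"http": url, "https": url}

print(geo_proxy("user:pass", "us", "detroit")["http"])
# http://user:pass_country-us_city-detroit@geo.iproyal.com:12321
```
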
Error Codes
When using a proxy, you might encounter various HTTP error codes that indicate issues with your connection.
These error codes provide insights into what might be going wrong, whether it’s with the proxy server, your network, or the target website.
Below are some common HTTP proxy error codes and their meanings:
| Error Code | Description | Solution |
| --- | --- | --- |
| HTTP 400 Bad Request | This error happens when the client's request to the proxy server is malformed or invalid. | Ensure the request is properly formatted with all required HTTP headers included. Double-check the syntax and structure. |
| HTTP 403 Forbidden | The proxy server understands the request but refuses to fulfill it due to insufficient permissions or authentication. | Verify that you have the correct permissions and provide the necessary authentication credentials. |
| HTTP 404 Not Found | The requested resource cannot be found on the proxy server or upstream server. | Ensure the URL is correct and that the resource hasn't been moved or deleted. |
| HTTP 407 Proxy Authentication Required | The proxy server requires authentication before allowing access to the requested resource. | Provide valid authentication credentials as required by the proxy server. |
| HTTP 408 Request Timeout | The proxy server times out while waiting for the client's request. | Check your network connection for stability and consider resending the request. |
| HTTP 502 Bad Gateway | The proxy server received an invalid response from an upstream server. | Check the upstream server for any issues and ensure it is functioning correctly. |
| HTTP 503 Service Unavailable | The proxy server or upstream server is temporarily unable to handle the request due to maintenance or overload. | Try again later or check if the server is undergoing maintenance. |
| HTTP 504 Gateway Timeout | The proxy server does not receive a timely response from the upstream server, resulting in a timeout. | Investigate whether the upstream server is slow or experiencing network issues. |
| HTTP 505 HTTP Version Not Supported | The proxy server does not support the HTTP version used in the client's request. | Ensure that the HTTP version in your request is compatible with the proxy server and adjust it if necessary. |
Understanding these error codes can help you diagnose and troubleshoot issues with your proxy connection, ensuring smoother browsing and data retrieval.
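The table above maps naturally onto a small lookup you can consult when logging failed requests. The hint strings are condensed from the table, and `diagnose` is a helper name of our own:

```python
# Condensed troubleshooting hints from the error-code table above.
PROXY_ERROR_HINTS = {
    400: "Malformed request: check syntax and required headers.",
    403: "Refused: verify permissions and authentication credentials.",
    404: "Resource not found: confirm the URL is correct.",
    407: "Proxy authentication required: supply valid credentials.",
    408: "Request timed out: check connectivity and resend.",
    502: "Bad gateway: upstream returned an invalid response.",
    503: "Service unavailable: retry later.",
    504: "Gateway timeout: upstream is slow or unreachable.",
    505: "HTTP version not supported: adjust the request's HTTP version.",
}

def diagnose(status_code):
    """Return a short troubleshooting hint for a proxy-related status code."""
    return PROXY_ERROR_HINTS.get(status_code, f"Unrecognized status {status_code}")

print(diagnose(407))  # Proxy authentication required: supply valid credentials.
```
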
KYC (Know-Your-Customer) Verification
IPRoyal implements a strict KYC (Know-Your-Customer) policy for both new and existing customers to ensure the security and credibility of their proxy network.
- IPRoyal requires KYC validation prior to granting full access to their proxy services.
- New accounts have limited access to services until KYC verification is completed.
The KYC process involves:
- Identity Verification:
- Customers must provide an image of a government-issued ID (passport, driver's license, or national ID).
- A selfie is required to confirm the identity matches the provided document.
- Document Authentication:
- IPRoyal uses iDenfy, a third-party platform, to verify documents and conduct identity checks.
- Advanced algorithms detect anomalies or inconsistencies in the provided documents.
- Facial recognition technology compares the photo on the ID with the provided selfie.
- Information may be cross-referenced with government databases for accuracy.
- Ongoing Monitoring:
- IPRoyal continuously monitors their network to prevent abuse.
- They reserve the right to refuse service based on risk level assessment.
Use cases that are not allowed:
- IPRoyal states they will refuse service for suspicious, inappropriate, or unethical use cases.
The KYC process is designed to be quick and easy, typically taking just a few minutes to complete via phone or desktop. This verification allows customers to access all of IPRoyal's services, including API access and the ability to purchase as little as one proxy per order.
IPRoyal emphasizes that their KYC process complies with high data safety standards, including GDPR, ISO 27001, eIDAS, and ETSI, ensuring the privacy and security of customer information.
Unlike some providers (such as Bright Data) that require a call before allowing proxy use, IPRoyal's process is online and automated through their third-party verification platform (iDenfy).
Implementing IPRoyal Residential Proxies in Web Scraping
Let's explore how to use IPRoyal residential proxies with various libraries. We'll use the same example for each, targeting the US and using rotating proxies.
Python Requests
Here's how we can integrate IPRoyal proxies with Python Requests:
```python
import requests
from dotenv import load_dotenv
import os

load_dotenv()

# Credentials come from the .env file created earlier
username = os.getenv("IPROYAL_USERNAME")
password = os.getenv("IPROYAL_PASSWORD")
port = os.getenv("IPROYAL_PORT")
proxy = "geo.iproyal.com"

proxies = {
    "http": f"http://{username}:{password}@{proxy}:{port}",
    "https": f"http://{username}:{password}@{proxy}:{port}"
}

response = requests.get("https://httpbin.org/ip", proxies=proxies)
print(response.text)
```
You've seen this example before; it's basically just here to ensure consistency. Most importantly, we use a dict to create a `proxies` object. Then, we pass our proxies into the `proxies` argument: `requests.get("https://httpbin.org/ip", proxies=proxies)`.
Python Selenium
SeleniumWire has always been a tried and true method for using authenticated proxies with Selenium. As you may or may not know, vanilla Selenium does not support authenticated proxies. Even worse, SeleniumWire has been deprecated! This being said, it is still technically possible to integrate IPRoyal Residential Proxies via SeleniumWire, but we highly advise against it.
When you decide to use SeleniumWire, you are vulnerable to the following risks:
- Security: Browsers are updated with security patches regularly. Without these patches, your browser will have security holes that have already been fixed in up-to-date browsers and drivers such as ChromeDriver or GeckoDriver.
- Dependency Issues: SeleniumWire is no longer maintained. In time, it may not be able to keep up with its dependencies as they get updated. Broken dependencies can be a source of unending headaches for anyone in software development.
- Compatibility: As the web itself gets updated, SeleniumWire doesn't. Regular browsers are updated all the time. Since SeleniumWire no longer receives updates, you may experience broken functionality and unexpected behavior.
As time goes on, the probability of all these problems increases. If you understand the risks but still wish to use SeleniumWire, you can view a guide on that here.
Depending on when you're reading this, the code example below may or may not work. As mentioned above, we strongly recommend against using SeleniumWire because of its deprecation, but if you decide to do so anyway, here you go. We are not responsible for any damage that this may cause to your machine or your privacy.
```python
from seleniumwire import webdriver

username = "your-username"
password = "your-password"
proxy = "geo.iproyal.com"
port = 12321

proxy_options = {
    "proxy": {
        "http": f"http://{username}:{password}@{proxy}:{port}",
        "https": f"http://{username}:{password}@{proxy}:{port}",
        "no_proxy": "localhost,127.0.0.1"
    }
}

driver = webdriver.Chrome(seleniumwire_options=proxy_options)
driver.get('https://httpbin.org/ip')
```
- We set up our URL the same way we did with Python Requests: `f"http://{username}:{password}@{proxy}:{port}"`.
- We assign this URL to both the `http` and `https` protocols of our proxy settings.
- `driver = webdriver.Chrome(seleniumwire_options=proxy_options)` tells `webdriver` to open Chrome with our custom `seleniumwire_options`.
Python Scrapy
There are many ways to integrate a proxy with Scrapy. The example we'll go through here is just one of the ways you can do it.
First, create a new Scrapy project.
```bash
scrapy startproject iproyal_scraper
```
Then, from inside the `spiders` folder, create a new Python file and add the following code. Make sure to replace the `proxy_auth` variable with your own username and password.
```python
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        proxy = 'geo.iproyal.com:12321'
        proxy_auth = 'YOUR-USERNAME:YOUR-PASSWORD'
        country = 'us'
        proxy = f'http://{proxy_auth}_country-{country}_city-detroit@{proxy}'
        urls = [
            'https://httpbin.org/ip',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse, meta={'proxy': proxy})

    def parse(self, response):
        self.log(f"Visited {response.url}, response IP: {response.text}")
```
To run your spider, use the following command in your terminal:

```bash
scrapy crawl myspider
```
In this example, we've configured our proxy settings inside of our spider. You can view some other proxy examples of Scrapy here.
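Rather than attaching `meta={'proxy': ...}` to every request, you can centralize the proxy in a downloader middleware. The sketch below is our own (class name and hardcoded proxy string included); you would register it in `settings.py` under `DOWNLOADER_MIDDLEWARES` so Scrapy applies it to all outgoing requests:

```python
class IPRoyalProxyMiddleware:
    """Downloader middleware that routes every request through one proxy."""

    PROXY = "http://YOUR-USERNAME:YOUR-PASSWORD_country-us@geo.iproyal.com:12321"

    def process_request(self, request, spider):
        # Scrapy reads the 'proxy' key from request.meta when downloading.
        request.meta["proxy"] = self.PROXY
        return None  # returning None lets processing continue normally
```
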
NodeJS Puppeteer
Let's set up IPRoyal proxies with Puppeteer for browser automation:
First, we need a new project folder:

```bash
mkdir puppeteer-scraper
```
Next, we'll `cd` into that folder and initialize a new NodeJS project:

```bash
cd puppeteer-scraper
npm init --y
```
Next, we install Puppeteer:

```bash
npm install puppeteer
```

Once you've got your JavaScript project set up, go ahead and paste the code below into a new JavaScript file. The code below checks your IP and takes a screenshot of the result.
```javascript
const puppeteer = require('puppeteer');

const PROXY_USERNAME = 'YOUR-USERNAME';
const PROXY_PASSWORD = 'YOUR-PASSWORD';
const PROXY_SERVER = 'geo.iproyal.com';
const PROXY_SERVER_PORT = '12321';

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [
            `--proxy-server=http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`
        ]
    });
    const page = await browser.newPage();
    await page.authenticate({
        username: PROXY_USERNAME,
        password: PROXY_PASSWORD,
    });

    try {
        await page.goto('https://httpbin.org/ip', {timeout: 180000});
        await page.screenshot({path: "ip-1.png"});
    } catch (err) {
        console.log(err);
    }
    await browser.close();
})();
```
In this example, we've launched a browser with the IPRoyal proxy settings. We then take a screenshot of the resulting IP address.
NodeJs Playwright
Finally, let's set up IPRoyal proxies with Playwright:
Make a new project folder:

```bash
mkdir playwright-scraper
```

`cd` into the folder and initialize a NodeJS project:

```bash
cd playwright-scraper
npm init --y
```
Download Playwright as a dependency:

```bash
npm install playwright
```

Then go through the Playwright installation process to download its browsers:

```bash
npx playwright install
```
The code below once again checks our IP address and takes a screenshot of the result.
```javascript
const playwright = require('playwright');

const PROXY_USERNAME = 'YOUR-USERNAME';
const PROXY_PASSWORD = 'YOUR-PASSWORD';
const PROXY_SERVER = 'geo.iproyal.com';
const PROXY_SERVER_PORT = '12321';

(async () => {
    const browser = await playwright.chromium.launch({
        headless: true,
        proxy: {
            server: `http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`,
            username: PROXY_USERNAME,
            password: PROXY_PASSWORD
        }
    });
    const context = await browser.newContext({ignoreHTTPSErrors: true});
    const page = await context.newPage();

    try {
        await page.goto('https://httpbin.org/ip', {timeout: 180000});
        await page.screenshot({path: "ip.png"});
    } catch (err) {
        console.log(err);
    }
    await browser.close();
})();
```
Here is the resulting screenshot.
In this Playwright setup, we've configured the browser to use the IPRoyal proxy. We then take a screenshot of our IP address to verify the results.
Each of these examples demonstrates how we can integrate IPRoyal residential proxies with different web scraping libraries, consistently using US geotargeting and rotating proxies.
Case Study: Scrape Amazon Prices
In this case study, we demonstrate how to scrape price information for a product from different regional Amazon websites using Python Requests.
We'll configure the script to use IPRoyal proxies to access content from various country-specific Amazon sites. First, we set our country to Portugal (`"pt"`). After we've scraped from Portugal, we then perform the scrape from Spain (`"es"`).
Code Example
```python
import requests
from bs4 import BeautifulSoup

country = "pt"
proxy_auth = 'YOUR-USERNAME:YOUR-PASSWORD'

proxies = {
    "http": f"http://{proxy_auth}_country-{country}@geo.iproyal.com:12321",
    "https": f"http://{proxy_auth}_country-{country}@geo.iproyal.com:12321"
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
}

print("----------------------PT-------------------")

response = requests.get('https://www.amazon.es/s?k=port%C3%A1til', proxies=proxies, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

location = soup.select_one("span[id='glow-ingress-line2']")
print("Location:", location.text)

first_price_holder = soup.select_one("span[class='a-price']")
first_price = first_price_holder.select_one("span[class='a-offscreen']")
print("First Price:", first_price.text)

print("----------------------ES---------------------")

country = "es"

proxies = {
    "http": f"http://{proxy_auth}_country-{country}@geo.iproyal.com:12321",
    "https": f"http://{proxy_auth}_country-{country}@geo.iproyal.com:12321"
}

response = requests.get('https://www.amazon.es/s?k=port%C3%A1til', proxies=proxies, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

location = soup.select_one("div[id='glow-ingress-block']")
print("Location:", location.text.strip())

first_price_holder = soup.select_one("span[class='a-price']")
first_price = first_price_holder.select_one("span[class='a-offscreen']")
print("First Price:", first_price.text)
```
Explanation
- Scraping Logic: We extract our location with `soup.select_one("div[id='glow-ingress-block']")`. We extract the first price holder on the page with `soup.select_one("span[class='a-price']")`. Then we extract the price with `first_price_holder.select_one("span[class='a-offscreen']")`.
Initial Attempt: Scrape Location and Price from Amazon.es
We start by scraping location and price information for "portátil" (laptop) from the Spanish Amazon website. At the very beginning of the script, we set our `country` to `"pt"` (Portugal).
Second Attempt: Scrape Location and Price from Amazon.es
Next, we scrape the same product from the same site using a Spanish location instead. Near the end of our script, we simply change our `country` variable and rerun the scrape:
Analysis: Price Differences
You can view the output from our scrape below.
```
----------------------PT-------------------
Location: Portugal
First Price: 406,60 €
----------------------ES---------------------
Location: Entrega en Barcelona 08035
Actualizar ubicación
First Price: 399,99 €
```
After scraping, we compare the results:
- On our first run, our location was detected as `Portugal` and our first price on the page was `406,60 €`.
- After we changed the location, we were detected in `Entrega en Barcelona 08035` (Barcelona, Spain) and our first price on the page was `399,99 €`.
Difference Analysis: This demonstrates how product prices can vary significantly even on the same Amazon site, illustrating different pricing strategies for the same product across countries.
The use of proxies allowed us to gather these region-specific data points, highlighting price discrepancies in different markets.
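Note that the scraped prices arrive as European-formatted strings such as `406,60 €`. To compare them numerically across regions, a small normalizer helps; the `parse_eu_price` helper below is our own addition, not part of the script above:

```python
def parse_eu_price(text):
    """Convert a European-formatted price string like '406,60 €' to a float."""
    cleaned = text.replace("€", "").strip()
    cleaned = cleaned.replace(".", "")   # drop thousands separators
    cleaned = cleaned.replace(",", ".")  # decimal comma -> decimal point
    return float(cleaned)

print(parse_eu_price("406,60 €"))    # 406.6
print(parse_eu_price("1.399,99 €"))  # 1399.99
```
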
Alternative: ScrapeOps Residential Proxy Aggregator
If you're looking for a more versatile and cost-effective solution, consider using the ScrapeOps Residential Proxy Aggregator. This service allows you to tap into multiple proxy providers through a single proxy port, offering a range of benefits compared to traditional proxy providers.
Top 3 Reasons to Choose ScrapeOps Residential Proxy Aggregator
- Competitive Pricing: ScrapeOps offers lower pricing, allowing you to maximize your budget while maintaining high quality.
- Flexible Plans: With ScrapeOps, you have access to a wider variety of plans, including smaller, more affordable options tailored to your needs. The best part? You can start using the proxies with a free trial account.
- Enhanced Reliability: By leveraging multiple proxy providers through a single port, ScrapeOps offers greater reliability. If one provider faces issues, your requests can seamlessly switch to another, ensuring continuous access.
Code Example Using Python Requests
Here's a basic example of how to get started with the ScrapeOps Residential Proxy Aggregator. There are a few small differences, but it's more or less the same as the connections we've made in this article.
import requests

API_KEY = "YOUR-SUPER-SECRET-API-KEY"

proxies = {
    "http": f"http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181",
    "https": f"http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181",
}

response = requests.get("https://httpbin.org/ip", proxies=proxies, verify=False)
print(response.text)
Explanation:
- Proxy Configuration: We configure the ScrapeOps residential proxy by setting up the `proxies` dictionary. This includes the proxy server address, port, and authentication details, allowing us to route our requests through the residential proxy.
- Output: We print our IP address to the terminal so we can verify that the proxy is configured properly.
Ready to see how ScrapeOps can elevate your scraping projects?
Sign up for a free trial today and get 500MB of free bandwidth to test it out.
Ethical Considerations and Legal Guidelines
When using residential proxies for web scraping, it's crucial to consider the ethical implications and legal responsibilities. IPRoyal, like many proxy providers, emphasizes the importance of ethical practices in their service.
- Compliance with Terms of Service: Adhere to IPRoyal's terms of service, which prohibit illegal activities and abuse of their proxy network.
- Data Protection: Handle any collected data in compliance with relevant data protection laws, such as GDPR. This includes proper storage, processing, and deletion of personal data if collected.
- Handle Personal Data Carefully: If your scraping activities involve collecting personal data, ensure you have a legal basis for doing so and that you're complying with relevant data protection regulations like GDPR.
- Respect Robots.txt: Always check and adhere to the target website's robots.txt file, which specifies which parts of the site can be crawled.
- Use User-Agents Responsibly: While it's common practice to rotate user-agents, ensure they are recent and commonly used. Misrepresenting your scraper as a different browser could be seen as unethical.
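As a quick illustration of the robots.txt point above, Python's standard library includes `urllib.robotparser` for checking crawl permissions before you scrape. The robots.txt content and bot name below are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; for a live site you would instead call
# rp.set_url("https://example.com/robots.txt") followed by rp.read()
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check specific paths before scraping them
print(rp.can_fetch("MyScraperBot", "https://example.com/products"))    # True
print(rp.can_fetch("MyScraperBot", "https://example.com/private/x"))   # False
```

Running this check up front lets your scraper skip disallowed paths automatically rather than relying on manual inspection.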
By adhering to these ethical considerations and legal guidelines, you can ensure that your use of IPRoyal's residential proxies for web scraping is both effective and responsible, maintaining a balance between your data needs and the health of the web ecosystem.
Conclusion
Throughout this guide, we've explored the powerful capabilities of IPRoyal Residential Proxies for web scraping and data collection.
We demonstrated integration with popular tools like Python Requests, Scrapy, Puppeteer, and Playwright. A case study on Amazon showcased the value of residential proxies for uncovering regional pricing differences.
We strongly encourage you to implement residential proxies in your web scraping projects to avoid IP bans, bypass rate limiting, access geo-restricted content, and improve the accuracy of your scraped data.
By leveraging IPRoyal Residential Proxies and following the best practices outlined in this guide, you're well-equipped to take your web scraping and data collection projects to the next level.
More Web Scraping Guides
At ScrapeOps, we've got tons of learning resources. It doesn't matter if you're brand new to scraping or a seasoned developer, we have something for you.
If you would like to learn more about Web Scraping with Python, then be sure to check out Python Web Scraping Playbook or check out one of our more in-depth guides: