Smartproxy Residential Proxies: Web Scraping Guide
Smartproxy is a leading proxy provider, recognised for its exceptional quality products and security standards. One of Smartproxy's leading products is its residential proxies, which are uniquely tied to physical devices, making them extremely difficult for websites to detect and block.
In this article, we'll take a deep dive into the key features and benefits of Smartproxy's residential proxies, and provide a step-by-step guide on how to set up and integrate these proxies with web scraping scripts.
- TLDR: How to Integrate Smartproxy Residential Proxy?
- Understanding Residential Proxies
- Smartproxy Residential Proxy Pricing
- Setting Up Smartproxy Residential Proxies
- Authentication
- Basic Request Using Smartproxy Residential Proxies
- Country Geotargeting
- City Geotargeting
- How to Use Static Proxies
- Error Codes
- Implementing Smartproxy Residential Proxies in Web Scraping
- Case Study: Scrape Amazon
- Alternative: ScrapeOps Residential Proxy Aggregator
- Ethical Considerations and Legal Guidelines
- Conclusion
- More Web Scraping Guides
TLDR: How to Integrate Smartproxy Residential Proxy?
Setting up Smartproxy is user-friendly and quick. After creating an account and answering a couple of simple questions, you’ll be directed to a dashboard that suggests plans based on your needs.
Once you choose a plan and provide your payment information, you gain access to your credentials, which can be easily incorporated into your web scraping scripts to ensure seamless and uninterrupted data extraction.
Here's a Python web scraping example for extracting Amazon data regarding the category Camera & Photo from Portugal:
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as pw:
    browser = pw.chromium.launch(
        proxy={
            'server': "pt.smartproxy.com:20001",
            'username': 'spwtkblz1o',
            'password': '<my_password>'
        },
        headless=False,
        timeout=10000000
    )
    # creates a new browser page (tab) within the browser instance
    page = browser.new_page()
    # go to amazon
    page.goto(
        ("https://www.amazon.com/s?i=specialty-"
         "aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A502394"),
        timeout=1000000000)
    # get HTML content
    html = page.inner_html('body')
    # make parser object
    soup = BeautifulSoup(html, 'html.parser')
    # select cards
    cards = soup.select(
        ".a-section.a-spacing-small.puis-"
        "padding-left-small.puis-padding-right-small")
    # get a list of products
    products = [card.select_one('h2').text for card in cards]
    prices = [card.select_one('span.a-price').text.split("$")[1]
              for card in cards]
    num_reviews = [card.select_one(
        'span.a-size-base.s-underline-text').text.split(' ')[0]
        for card in cards]
    data_combined = zip(products, prices, num_reviews)
    result = [
        {
            "product": product,
            "price": price,
            "n_reviews": n_review}
        for product, price, n_review in data_combined]
    print(result)
This Python script uses Playwright to automate web scraping from Amazon through a proxy server. Here's a breakdown:
- Proxy Setup: The proxy server and credentials are passed to the browser launched through sync_playwright.
- Open Browser: The script launches a Chromium browser and opens a specified Amazon search results page.
- HTML Parsing: The page content is retrieved and parsed with BeautifulSoup.
- Data Extraction: The script collects product names, prices, and review counts from Amazon search results.
- Output: The data is combined into a list of dictionaries containing the product name, price, and number of reviews. Finally, the results are printed as structured data.
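The list comprehensions in the script assume that every result card contains a title, a price, and a review count. BeautifulSoup's select_one returns None when an element is missing, so a single card without a price would raise an AttributeError. Here is a minimal sketch of a more defensive extraction step; the helper names text_or_none and parse_card are hypothetical, and the selectors are simply the ones used in the script above.

```python
def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_card(card):
    """Build one result dict from a BeautifulSoup card, tolerating missing fields."""
    price = text_or_none(card, "span.a-price")
    reviews = text_or_none(card, "span.a-size-base.s-underline-text")
    return {
        "product": text_or_none(card, "h2"),
        # Same split as the original script: keep what follows the first "$".
        "price": price.split("$")[1] if price and "$" in price else None,
        # Same split as the original script: keep the leading count.
        "n_reviews": reviews.split(" ")[0] if reviews else None,
    }


# Usage inside the script (assumption: `cards` comes from soup.select(...)):
# result = [parse_card(card) for card in cards]
```

Cards with missing fields then show up as None values in the output instead of aborting the whole run, which makes it easier to see how often the selectors fail.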
Smartproxy offers integration examples for other popular web scraping libraries such as Python Selenium, Python Scrapy, and NodeJS Playwright, ensuring that users have the tools they need regardless of their preferred scraping framework.
Understanding Residential Proxies
What Are Residential Proxies?
A residential proxy routes your internet traffic through an IP address provided by an Internet Service Provider (ISP) to a homeowner. Since the IP address belongs to an actual physical location, it appears more authentic and trustworthy to websites.
This makes it harder for websites to detect that traffic is coming through a proxy, in contrast to datacenter proxies, which use IPs from cloud services and can be easily flagged.
Why Are Residential Proxies Important?
Residential proxies have become the leading choice for web scraping activities, due to their ability to mimic the behaviour of real users.
Their decentralised nature gives users access to a vast range of IP addresses worldwide while masking their location. This matters because it helps preserve anonymity and makes it possible to scrape data at scale with fewer interruptions.
Users can leverage web scraping activities with residential proxies in the following ways:
- Bypass geo-restrictions and access content not available in your region
- Protect their online identity and maintain privacy
- Conduct market research and gather data without being detected
- Monitor social media and online trends without being flagged as a bot
- Conduct several website requests without interruptions
Types of Residential Proxies
There are two main types of residential proxies: Rotating and Static.
Let’s take a look at the main differences between these two and the pros and cons of each.
- Rotating Residential Proxies
Rotating residential proxies are a pool of IP addresses that are rotated periodically, usually every few minutes. Providers of this type of proxy often have their own proxy management system that is in charge of assigning new IPs and closing the ones that are not in use.
This rotation ensures that the IP address used to access a website changes frequently, making it difficult for websites to detect and block the proxy. These proxies are particularly suitable for eCommerce, trend analysis, and business intelligence.
Pros:
- High level of anonymity
- Reduced risk of IP blocking
- Suitable for high-volume scraping and data collection
Cons:
- Can be more expensive than static proxies
- Slower than datacenter and static residential proxies
- Lower bandwidth
- Static Residential Proxies
Static residential proxies, also known as ISP proxies, combine the qualities of datacenter and residential IPs: they are fast and difficult to detect.
They borrow the IPs of mobile and desktop devices, relying on people’s Wi-Fi connectivity, so the residential IPs belong to an actual physical device with an internet connection issued by an Internet Service Provider (ISP).
Pros:
- High speed and stability
- Often less expensive than rotating proxies
- Suitable for tasks that require a consistent IP address
Cons:
- More likely to be detected
- Smaller range of IPs worldwide
Here’s a breakdown of their features:
Feature | Rotating Residential Proxies | Static Residential Proxies |
---|---|---|
IP Address | Changes with each request | Remains the same for an extended period |
Anonymity | High | Moderate |
Speed | Slower due to rotation process | Generally faster |
Management | Complex | Simpler |
Risk of Detection | Lower | Higher |
Residential vs. Datacenter Proxies
Residential proxies are frequently compared to datacenter proxies, which are IP addresses supplied by datacenters instead of physical devices.
Datacenter proxies are hosted on centralized servers, providing high speed and stability but offering less anonymity than rotating and static residential proxies.
Consequently, datacenter proxies are ideal for tasks needing high-speed connections. However, they are less suitable for large-scale web scraping, as websites can detect these IPs, potentially breaking your ability to continue data extraction.
Pros of Data Center Proxies
- Faster speeds and stability due to datacenter infrastructure
- Often less expensive than residential proxies
- Suitable for tasks that require high-speed connections
Cons of Data Center Proxies
- Easier to detect and block by websites
- Lower level of anonymity
When Are Residential Proxies Useful?
We’ve covered the importance of residential proxies, the differences between static and rotating and why they are more suitable for web scraping activities than datacenter proxies.
We’ve also mentioned some use cases, but let’s now delve deeper into the applications of residential proxies (both static and rotating):
- Web Scraping and Data Collection: As mentioned previously in this article, the most common application is web scraping, where rotating residential proxies allow for high-volume scraping and data collection without being detected or blocked.
- SEO and SERP Analysis: Residential proxies provide a high level of anonymity, enabling SEO professionals to analyze search engine results without being flagged.
- Social Media Monitoring: Residential proxies make it possible to monitor social media and to manage Facebook, Instagram, X and other social media accounts from different locations.
- Ad Verification: Static residential proxies are often preferred for ad verification, which lets marketers confirm that ads are shown to the right audience.
- Geo-Restricted Content Access: Residential proxies provide access to geo-restricted content by masking the user's IP address.
Why Use Smartproxy Residential Proxies?
Smartproxy’s residential proxies stand out because of their great balance between performance and affordability. Smartproxy Residential Proxies offer several advantages, including:
- Global Coverage: With a vast network of over 55 million ethically-sourced IP addresses across 195 countries, users can target specific locations with precision - from global to ZIP code level.
- High Success Rates: These proxies excel at evading IP blocks, CAPTCHAs, geo-restrictions, and anti-bot measures, boasting an impressive 99.68% success rate and lightning-fast speeds of under 0.5 seconds.
- Ease of Integration: Integration is seamless with various programming languages and software infrastructures, backed by 24/7 support.
- Anonymity: For added flexibility, these proxies can rotate with each connection request or maintain sticky sessions (which act like static residential proxies). Sessions can last up to 30 minutes, with automatic IP address rotation every 30 minutes or when the connection drops.
These features make Smartproxy ideal for those needing reliable, scalable proxy solutions.
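To make the difference between per-request rotation and sticky sessions concrete, here is a toy model of a sticky session that keeps one exit IP until 30 minutes have passed. It is only a sketch of the idea, not Smartproxy's implementation: the StickySession class, the round-robin pool, and the IP addresses (taken from the 203.0.113.0/24 documentation range) are all assumptions made for illustration.

```python
import itertools
import time


class StickySession:
    """Keeps the same exit IP until the session is older than ttl_seconds."""

    def __init__(self, ip_pool, ttl_seconds=30 * 60, clock=time.monotonic):
        self._pool = itertools.cycle(ip_pool)  # hypothetical round-robin pool
        self._ttl = ttl_seconds
        self._clock = clock
        self._current = None
        self._started = None

    def current_ip(self):
        now = self._clock()
        expired = self._started is not None and now - self._started >= self._ttl
        if self._current is None or expired:
            # First use, or the session expired: move to the next IP.
            self._current = next(self._pool)
            self._started = now
        return self._current


session = StickySession(["203.0.113.10", "203.0.113.11"])
print(session.current_ip())  # same IP on every call within 30 minutes
print(session.current_ip())
```

Setting ttl_seconds to 0 makes every call pick a new IP, which corresponds to rotating on each request.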
Smartproxy Residential Proxy Pricing
Smartproxy’s residential proxies have different pricing alternatives. There are two main plans:
- A regular plan, which is focused on individuals and small companies, and
- An enterprise plan which comes with much higher bandwidth and a dedicated account manager.
The regular plan is designed for smaller users, with a pay-as-you-go option at $7 per GB, ideal for those just starting with web scraping or wanting to test the product.
Alternatively, you can choose from a range of plans with increasing bandwidth, starting from 2 GB at $6 per GB, with the price per GB decreasing as the bandwidth increases, up to 100 GB at $4.5 per GB.
For larger needs, the enterprise plan offers up to 5000 GB at $2.2 per GB, and custom plans are available upon request.
Pricing Table
Here's a comparison of pricing between different plans:
Plan | GB | Price per GB | Free trial |
---|---|---|---|
Pay-as-you-go | Unlimited | $7 per GB | No |
Regular | From 2 GB up to 100 GB | Starting at $4.5 up to $6 | Yes |
Enterprise | From 250 GB to 5000 GB | Starting at $2.2 up to $4 | No |
To provide further flexibility, Smartproxy allows users to top up their plans at the same rate per GB, up to 80% of the plan's value, making it easy to scale up without being locked into a specific tier.
Once you reach 80%, it's more cost-effective to upgrade to a larger plan.
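The arithmetic behind these plans can be sketched in a few lines of Python. The per-GB rates below are the ones quoted in this article; intermediate tiers are not listed here, so they are left out. The function names are hypothetical, and the top-up cap is read as 80% of the plan's dollar value, which is an assumption about how the limit is applied.

```python
# Per-GB rates quoted in this article (USD); intermediate tiers are omitted.
RATES_PER_GB = {
    "pay_as_you_go": 7.0,
    "regular_2gb": 6.0,
    "regular_100gb": 4.5,
    "enterprise_5000gb": 2.2,
}


def plan_cost(plan, gigabytes):
    """Total cost of buying `gigabytes` at the plan's per-GB rate."""
    return round(RATES_PER_GB[plan] * gigabytes, 2)


def max_top_up_value(plan, gigabytes, cap=0.8):
    """Largest top-up, assuming the cap is 80% of the plan's dollar value."""
    return round(plan_cost(plan, gigabytes) * cap, 2)


print(plan_cost("regular_2gb", 2))              # 12.0
print(plan_cost("regular_100gb", 100))          # 450.0
print(max_top_up_value("regular_100gb", 100))   # 360.0
```

Comparing plan_cost for the same number of gigabytes across plans shows quickly when pay-as-you-go stops being the cheaper option.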
Pricing Comparison
Smartproxy's pay-as-you-go plan is expensive compared to other alternatives on the market, but the quality of its residential proxies is also above average, so this plan can suit small projects that need high-quality IPs.
Smartproxy’s proxies get really affordable when opting for very high bandwidth plans, with prices between $2.2 and $4 per GB.
Generally speaking, when proxy providers offer plans around $2-3 per GB, they are considered cheap. If they offer smaller plans in the $6-8 per GB range, they are more expensive.
The comparison can be more difficult regarding other bandwidth packages, therefore you can use our Proxy Comparison tool to decide which proxy provider has residential proxies that are tailored for your needs.
Setting Up Smartproxy Residential Proxies
To set up Smartproxy Residential Proxies, follow these steps:
Creating a Smartproxy Account
The registration process with Smartproxy is straightforward and quick.
Simply provide a valid email address and password, answer a two-question survey, and then confirm your email address to complete the sign-up process. You also have the option to register using your existing Google account.
There’s no KYC, so once connected you automatically have access to a user-friendly dashboard, where you can:
- Add funds to your Smartproxy wallet
- Purchase and upgrade plans to suit your needs
- Grant access to authorized users
- Configure your proxy server settings
- Create sub-accounts for team members
- Monitor your traffic usage in real-time
- Reach out to customer support for assistance
Purchase Residential Proxies from Smartproxy
On the main page of the dashboard, you can specify a use case, and you’ll automatically get recommendations. Residential proxies are one of the most popular products for eCommerce, and from the recommendation you can go to the different pricing plans.
You can also see the option Residential Proxies on the left side of the page. From there you can pick the plan that best suits your needs. Let’s start with a three-day free trial of 100 MB of bandwidth.
Click on Continue to checkout and you will be asked to add your credit/debit card followed by your contact information.
Once done, you’re ready to start using Smartproxy’s residential proxies and explore more functionalities of the dashboard.
Authentication
Smartproxy supports residential proxy authentication via
- username and password or
- IP-based whitelisting.
To manage your proxy users and authorized IPs, simply navigate to the Authentication section within the Residential Proxies feature.
Upon purchasing a plan or subscription, a unique proxy username and password will be automatically generated and available in the Authentication section.
Additionally, you will need to manually add any whitelisted IPs to complete the setup. You can choose between username and password authentication or whitelisted IP authentication by clicking on either Users or Whitelisted IPs, under the Authentication window.
Method 1: Username and Password Authentication
This method involves using a unique username and password to authenticate your proxy requests.
Here's an example of how you might use username and password authentication in Python using the requests library:
import requests
proxy_username = "your_username"
proxy_password = "your_password"
# Host and port only; the scheme is added when the URL is assembled below
proxy_host = "gate.smartproxy.com:7000"
proxies = {
    "http": f"http://{proxy_username}:{proxy_password}@{proxy_host}",
    "https": f"http://{proxy_username}:{proxy_password}@{proxy_host}"
}
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
Method 2: Whitelisted IPs
This method involves adding your IP address to a whitelist, allowing only authorized IPs to access the proxy.
Here's an example of how you might use IP-based whitelisting in Python using the requests library.
import requests
# No credentials are needed: the proxy recognises your whitelisted IP
proxy_url = "http://gate.smartproxy.com:7000"
proxies = {
"http": proxy_url,
"https": proxy_url
}
response = requests.get("https://example.com", proxies=proxies)
print(response.text)
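Both authentication methods differ only in whether credentials appear in the proxy URL, so they can share one helper. The sketch below is an illustration under assumptions: build_proxies is a hypothetical name, and the host and port are the gate.smartproxy.com:7000 values used elsewhere in this article. It percent-encodes the credentials, which matters when a password contains characters such as @ or :.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a requests-style proxies dict for either authentication method."""
    if username and password:
        # Percent-encode credentials so "@" or ":" cannot break the URL.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        # No credentials: relies on your IP being whitelisted in the dashboard.
        auth = ""
    proxy = f"http://{auth}{host}:{port}"
    return {"http": proxy, "https": proxy}


# Username and password authentication
print(build_proxies("gate.smartproxy.com", 7000, "user", "p@ss:word"))
# Whitelisted IP authentication
print(build_proxies("gate.smartproxy.com", 7000))
```

The returned dictionary can be passed directly as the proxies argument of requests.get.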
Basic Request Using Smartproxy Residential Proxies
The most common way to make basic requests with Python is by using the requests library, which you can install using the following pip command:
pip install requests
To use Smartproxy’s residential IPs in your script, you must define the proxy configuration in your request.
import requests
# Set up your Smartproxy credentials and proxy host
proxy_username = "your_username"
proxy_password = "your_password"
# Host and port only; the scheme is added in the proxy configuration below
proxy_host = "gate.smartproxy.com:7000"
# Set up the proxy configuration
proxies = {
    "http": f"http://{proxy_username}:{proxy_password}@{proxy_host}",
    "https": f"http://{proxy_username}:{proxy_password}@{proxy_host}"
}
In the example above, replace proxy_username and proxy_password with the credentials you have under the Residential Proxies window, which are automatically generated when you buy a product.
Now you can also specify the headers and finally make a GET request:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
# Make the request
response = requests.get("https://example.com", proxies=proxies, headers=headers)
# Print the response content
print(response.text)
With the above configuration, you should be able to make a simple Python request to your target website.
Handling Proxy Errors
When using proxies, you may encounter errors such as connection timeouts or proxy failures. To handle these gracefully:
try:
    response = requests.get("https://example.com", proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error occurred. Please check your proxy settings.")
except requests.exceptions.Timeout:
    print("The request timed out. Try increasing the timeout or check your internet connection.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
This code catches common exceptions, providing informative messages that help diagnose and fix issues with proxy configurations.
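Printing a message is enough for debugging, but a scraper usually wants to retry transient failures. The following is a minimal sketch of retrying with exponential backoff. The get_with_retries helper is hypothetical, the set of retryable statuses is an assumption based on the codes for which the Error Codes section of this article suggests retrying, and the helper takes a zero-argument callable so it does not depend on any particular HTTP library. It catches OSError because the exceptions raised by requests derive from it.

```python
import time

# Statuses for which the Error Codes section suggests retrying (assumption).
RETRYABLE_STATUS = {408, 500, 502, 503, 504, 522}


def get_with_retries(send, attempts=3, base=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is any zero-argument callable returning an object with a
    .status_code attribute, for example a lambda wrapping requests.get.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            response = send()
            if response.status_code not in RETRYABLE_STATUS:
                return response
            last_error = RuntimeError(f"retryable status {response.status_code}")
        except OSError as exc:  # requests' exceptions are OSError subclasses
            last_error = exc
        if attempt < attempts - 1:
            # Wait base, 2*base, 4*base, ... seconds between attempts.
            sleep(base * 2 ** attempt)
    raise last_error


# Usage with requests (assumes `requests` and `proxies` from the examples above):
# response = get_with_retries(
#     lambda: requests.get("https://example.com", proxies=proxies, timeout=10))
```

Non-retryable responses such as 404 or 407 are returned immediately, so the caller can still inspect them or call raise_for_status.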
Country Geotargeting
Country-level geotargeting allows users to target specific countries or regions to access location-specific content, bypass geo-restrictions, or gather data from specific regions.
This is particularly useful for market research, competitor analysis, or social media monitoring. For instance, a business might want to scrape data from a specific country's e-commerce website or access content only available in that country.
Top 5 Countries Supported by Smartproxy
Smartproxy allows you to use proxies from different countries, with a vast range of IPs for countries such as the United States, United Kingdom, India, Germany, Canada, Japan, Netherlands, France, Israel, Australia and many more, up to 195 countries in total.
Below is a table showcasing 5 popular countries supported by Smartproxy, along with the number of IPs available in each:
Country | Number of IPs |
---|---|
India | 9,454,840 |
United States | 7,198,166 |
Germany | 2,332,378 |
United Kingdom | 1,727,716 |
Canada | 540,925 |
Using Country-Specific Proxies
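To use country-specific proxies with Smartproxy, you need to specify the country code in your proxy endpoint. In the example below, the country code is used as the subdomain of the proxy host.
For example, to use a proxy from the United States, the host becomes us.smartproxy.com. The example carries no credentials, so it assumes your IP is whitelisted, and the port shown is illustrative: confirm the exact endpoint and port for your plan in the dashboard's proxy settings.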
Here's an example of how to use a country-specific proxy with Smartproxy using Python:
import requests
# Set the country code and proxy URL
country_code = 'us'
proxy_url = f'http://{country_code}.smartproxy.com:8080'
# Set the target URL
target_url = 'https://www.example.com'
# Create a proxy dictionary
proxies = {
'http': proxy_url,
'https': proxy_url
}
# Make a request using the proxy
response = requests.get(target_url, proxies=proxies)
# Print the response
print(response.text)
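When a scraper targets several countries, it helps to build the endpoint in one place and reject malformed codes early. The helper below is a small sketch that follows the same host pattern as the example above; country_proxy_url is a hypothetical name, and the default port is just the one used in that example, not a confirmed value.

```python
import re


def country_proxy_url(country_code, port=8080, domain="smartproxy.com"):
    """Build a country-specific proxy URL from a two-letter country code."""
    code = country_code.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"http://{code}.{domain}:{port}"


for code in ("us", "DE", "in"):
    url = country_proxy_url(code)
    # Same proxies dictionary shape as in the example above.
    proxies = {"http": url, "https": url}
    print(code, proxies)
```

A typo such as "usa" then fails immediately with a ValueError rather than producing a proxy host that does not resolve.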
City Geotargeting
City-level geotargeting allows users to target specific cities or metropolitan areas, which is particularly useful for localized advertising, as it allows businesses to target ads to specific city audiences.
Smartproxy’s residential proxies support city-level targeting in cities worldwide, including New York, Los Angeles, Chicago, Houston, Miami, London, Berlin, Moscow and more.
Using City-Specific Proxies
To use city-specific proxies with Smartproxy, you need to specify the city in your proxy request.
In the example below, the city code for New York City, nyc, is placed in the proxy host. The exact syntax for city targeting depends on your endpoint configuration, so confirm the format for your plan in the dashboard's proxy settings.
Here's an example of how to use a city-specific proxy with Smartproxy using Python:
import requests
# Set the city code and proxy URL
city_code = 'nyc'
proxy_url = f'http://{city_code}.smartproxy.com:8080'
# Set the target URL
target_url = 'https://www.example.com'
# Create a proxy dictionary
proxies = {
'http': proxy_url,
'https': proxy_url
}
# Make a request using the proxy
response = requests.get(target_url, proxies=proxies)
# Print the response
print(response.text)
How to Use Static Proxies
Static proxies are proxy servers that provide a fixed, non-changing IP address. Unlike rotating proxies, which assign a new IP address for each request, static proxies maintain the same IP for all your connections.
This is useful for tasks requiring IP consistency, such as managing multiple social media accounts, accessing services that require IP whitelisting, or performing web scraping activities where maintaining the same session across requests is necessary.
Key Benefits of Static Proxies
There are certain benefits of using static proxies.
- Consistency: The same IP address is maintained throughout your session, making them ideal for activities that require stable IPs.
- IP Whitelisting: They work well for services that rely on a trusted, unchanging IP.
- Bypassing Geo-restrictions: Static proxies can be used to access geo-blocked content without changing locations.
- Session Persistence: Great for tasks like managing multiple accounts or scraping data that needs consistent session information.
Common Use Cases for Static Proxies
Static proxies ensure a stable IP for tasks requiring long-term access.
- Social Media Management: Maintaining multiple accounts without triggering suspicious activity.
- Ad Verification: Ensuring that ads appear correctly in specific regions.
- Web Scraping: Gathering data while maintaining a consistent IP to avoid detection.
- SEO Monitoring: Tracking rankings from a consistent location without changing IP addresses.
- Accessing Geo-restricted Content: Bypassing regional restrictions without rotating IPs.
- Account Creation: Ensuring reliability when creating multiple accounts for services.
Example of Using Static Proxies with Python
Here’s how we can set up and use static proxies in Python:
import requests
url = 'https://ip.smartproxy.com/'
username = 'USERNAME'
password = 'PASSWORD'
proxy = f'http://{username}:{password}@gate.smartproxy.com:7000'
response = requests.get(url, proxies={'http': proxy, 'https': proxy})
print(response.text)
Here's how the script works:
- URL: The script queries https://ip.smartproxy.com/ to check the IP being used.
- Proxy Setup: The proxy credentials (username, password, server) are formatted into the proxy URL.
- HTTP Request: A GET request is made to the target URL via the Smartproxy proxy server, supporting both http and https protocols.
- Response: The resulting response, typically the IP address being used, is printed.
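A single request cannot show whether the IP really stays the same, so a useful follow-up is to sample the exit IP several times and count the distinct values. The sketch below does that with a hypothetical summarize_exit_ips helper; fetch_ip is assumed to be any callable that returns the current exit IP as a string, for instance a wrapper around the request shown above.

```python
from collections import Counter


def summarize_exit_ips(fetch_ip, samples=5):
    """Call fetch_ip() several times and report how many distinct IPs appeared."""
    ips = [fetch_ip().strip() for _ in range(samples)]
    counts = Counter(ips)
    return {
        "unique_ips": len(counts),
        "is_static": len(counts) == 1,
        "counts": dict(counts),
    }


# Usage (assumes `requests`, `url`, and `proxy` from the script above):
# summary = summarize_exit_ips(
#     lambda: requests.get(url, proxies={"http": proxy, "https": proxy}).text)
# print(summary)
```

If is_static comes back False, the endpoint is rotating between requests and is not behaving like a static proxy.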
Error Codes
When using Smartproxy residential proxies, you may encounter errors that prevent your requests from being successfully processed. These errors can occur due to various reasons such as incorrect request formatting, authentication issues, or server overload.
Understanding the error codes and their corresponding solutions is crucial to troubleshooting and resolving these issues efficiently.
Below is a list of common error codes that you may encounter when using Smartproxy residential proxies, along with their descriptions and suggested solutions to help you resolve them quickly and get back to scraping the web with ease.
Error code | Description | Solution |
---|---|---|
400 - Bad Request | The proxy server cannot process the request due to a missing host or parsing error. | Check the request format and include the URL. Try again. |
401 - Unauthorized | Authentication failed on the target website. | Provide correct authentication information and try again. |
403 - Forbidden | The proxy server is blocked from accessing the target website. | Try a different proxy server location or port. |
404 - Not Found | The requested resource cannot be found. | Check the link for errors and try again. |
407 - Proxy Authentication Required | The request lacks proxy authentication information or has invalid credentials. | Include the proxy authentication header and ensure the correct username and password. |
408 - Request Timeout | The server closed the connection due to a timeout. | Try sending the request again. If the issue persists, change the endpoint or session type. |
500 - Internal Server Error | The proxy server encountered an internal error. | Retry the request at a later time. |
502 - Bad Gateway | The proxy server received an invalid response from the upstream server. | Retry the request. |
503 - Service Unavailable | The server is down or overloaded. | Retry the request and check if the targeted resource is under maintenance. |
504 - Gateway Timeout | The proxy server did not receive a response from the upstream server in time. | Retry the request. |
522 - CONNECT Timeout | The proxy connection timed out during the CONNECT phase. | Retry the request. |
525 - No Exit Found | The proxy was unable to find an exit node that satisfies the request. | Change the request filter parameters or try again at a later time. |
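In a scraper, the table above can be turned into a small lookup that tells the code what to do next. This is a sketch only: the action names and the suggested_action function are hypothetical, and the mapping is a simplified reading of the Solution column.

```python
RETRY = "retry"
FIX_REQUEST = "fix_request"
FIX_CREDENTIALS = "fix_credentials"
CHANGE_TARGETING = "change_targeting"

# Simplified reading of the Solution column in the table above.
ACTIONS = {
    400: FIX_REQUEST,
    401: FIX_CREDENTIALS,
    403: CHANGE_TARGETING,   # try a different location or port
    404: FIX_REQUEST,
    407: FIX_CREDENTIALS,
    408: RETRY,
    500: RETRY,
    502: RETRY,
    503: RETRY,
    504: RETRY,
    522: RETRY,
    525: CHANGE_TARGETING,   # change the request filter parameters
}


def suggested_action(status_code):
    """Return the suggested next step for a status code."""
    if status_code in ACTIONS:
        return ACTIONS[status_code]
    # Codes not listed in the table: succeed below 400, otherwise inspect manually.
    return "ok" if status_code < 400 else "inspect"


for code in (200, 407, 503, 525):
    print(code, suggested_action(code))
```

Keeping the mapping in one dictionary makes it easy to adjust when a target site behaves differently from the defaults.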
KYC (Know Your Customer) Verification
Smartproxy does not require KYC (Know Your Customer) verification during registration. However, this process becomes necessary if the system detects unusual behaviour from users.
For instance, targeting banks or government websites may trigger suspicious activity alerts. Additionally, if you register from countries where Smartproxy is unavailable, such as Russia, you will be asked to complete KYC.
The KYC data is used solely for authentication purposes. The steps are straightforward and quick, taking only a few minutes. You need to provide a selfie and a photo of your ID card.
Validation of the provided information takes a few minutes. If the information is accurate, you can start using the dashboard immediately. If the information is incorrect or the pictures are of poor quality, the validation process will be reviewed manually, which can take between 1 and 2 business days.
For companies and businesses, the KYC process requires additional information, such as the company's registration documents. If the KYC process takes longer than expected, contact support at compliance@smartproxy.com.
Implementing Smartproxy Residential Proxies in Web Scraping
Using Smartproxy for web scraping involves routing your web requests through their proxy network to avoid IP bans and geographic restrictions.
Here's a brief overview of how to use Smartproxy with various web scraping libraries in Python and Node.js.
Python Requests
import requests
proxies = {
'http': 'http://username:password@gate.smartproxy.com:10001',
'https': 'http://username:password@gate.smartproxy.com:10001'
}
url = 'http://httpbin.org/ip'
response = requests.get(url, proxies=proxies)
print(response.json())
This Python script uses the requests library to send an HTTP request through a proxy server and print the IP address that the request appears to come from. Here’s a step-by-step explanation:
- The proxies dictionary contains the proxy configuration for both HTTP and HTTPS requests. The username:password part is for authentication with the proxy server, and gate.smartproxy.com:10001 is the proxy server's address and port.
- The URL http://httpbin.org/ip is a test endpoint provided by httpbin.org that returns the IP address of the requester.
- The requests.get method is used to send a GET request to the specified URL, using the proxy settings defined earlier.
- The response.json() method parses the JSON response from the server, which includes the IP address seen by the server (the proxy's IP address). This IP address is then printed.
Python Selenium
SeleniumWire has always been a tried and true method for using authenticated proxies with Selenium. As you may or may not know, vanilla Selenium does not support authenticated proxies. Even worse, SeleniumWire has been deprecated! This being said, it is still technically possible to integrate Smartproxy Residential Proxies via SeleniumWire, but we highly advise against it.
When you decide to use SeleniumWire, you are vulnerable to the following risks:
- Security: Browsers are updated with security patches regularly. An unmaintained tool does not keep pace with those patches, so your setup may retain security holes that have already been fixed elsewhere, for example in current ChromeDriver or GeckoDriver builds.
- Dependency Issues: SeleniumWire is no longer maintained. In time, it may not be able to keep up with its dependencies as they get updated. Broken dependencies can be a source of unending headache for anyone in software development.
- Compatibility: As the web itself gets updated, SeleniumWire doesn't. Regular browsers are updated all the time. Since SeleniumWire no longer receives updates, you may experience broken functionality and unexpected behavior.
As time goes on, the probability of all these problems increases. If you understand the risks but still wish to use SeleniumWire, you can view a guide on that here.
Depending on your time of reading, the code example below may or may not work. As mentioned above, we strongly recommend against using SeleniumWire because of its deprecation, but if you decide to do so anyway here you go. We are not responsible for any damage that this may cause to your machine or your privacy.
from seleniumwire import webdriver
## Define Your Proxy Endpoints
proxy_options = {
    "proxy": {
        "http": "http://username:password@gate.smartproxy.com:10001",
        "https": "http://username:password@gate.smartproxy.com:10001",
        # Comma-separated list of hosts that bypass the proxy
        "no_proxy": "localhost,127.0.0.1"
    }
}
## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)
## Send Request Using Proxy
driver.get('https://httpbin.org/ip')
- We set up our URL the same way we did with Python Requests: http://username:password@gate.smartproxy.com:10001.
- We assign this URL to both the http and https protocols of our proxy settings.
- driver = webdriver.Chrome(seleniumwire_options=proxy_options) tells webdriver to open Chrome with our custom seleniumwire_options.
Python Scrapy
import scrapy
class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]
    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own initialisation first
        super().__init__(*args, **kwargs)
        # The proxy URL needs an explicit scheme
        self.proxy_url = "http://username:password@gate.smartproxy.com:10001"
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, meta={"proxy": self.proxy_url})
    def parse(self, response):
        print(response.text)
Here's an explanation of the Scrapy script:
- First, we import the Scrapy library.
- Then, we define a Scrapy spider class named ExampleSpider:
  - name specifies the name of the spider.
  - start_urls is a list of URLs where the spider will begin its crawling.
- Next, we initialize the spider and set up the proxy URL in the __init__ method.
  - self.proxy_url stores the proxy server address and authentication details.
- After that, we override the start_requests method to send requests through the proxy. This method generates requests for each URL in start_urls and attaches the proxy settings via the meta parameter.
- Finally, we define the parse method to handle the response.
  - parse processes the response received from the server and prints the HTML content of the page.
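Setting the proxy inside every spider gets repetitive once a project has several of them. An alternative is a small downloader middleware that attaches the proxy to each outgoing request. The sketch below is one possible arrangement, not the only one: the class name SmartproxyMiddleware, the SMARTPROXY_URL setting, and the myproject.middlewares module path are all hypothetical names chosen for illustration.

```python
class SmartproxyMiddleware:
    """Downloader middleware that sets a default proxy on every request."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # SMARTPROXY_URL is a hypothetical setting name chosen for this sketch.
        return cls(crawler.settings.get("SMARTPROXY_URL"))

    def process_request(self, request, spider=None):
        # Respect a proxy that a spider already set explicitly.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # continue processing the request normally


# settings.py (sketch; module path and setting name are assumptions):
# SMARTPROXY_URL = "http://username:password@gate.smartproxy.com:10001"
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.SmartproxyMiddleware": 350,
# }
```

The priority of 350 places this middleware before Scrapy's built-in HttpProxyMiddleware (750 by default), which then reads the proxy value from request.meta and handles the credentials.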
NodeJS Puppeteer
const puppeteer = require('puppeteer');
(async () => {
  // Smartproxy credentials
  const proxyUsername = 'your-smartproxy-username';
  const proxyPassword = 'your-smartproxy-password';
  const proxyServer = 'your-smartproxy-address'; // e.g., 'us.smartproxy.com:10000'
  const browser = await puppeteer.launch({
    args: [
      // Chromium ignores credentials embedded in --proxy-server, so pass only the address
      `--proxy-server=http://${proxyServer}`
    ]
  });
  const page = await browser.newPage();
  // Supply the proxy credentials for this page
  await page.authenticate({ username: proxyUsername, password: proxyPassword });
  await page.goto('https://www.example.com');
  // Take a screenshot for verification
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
Here's an explanation of the NodeJS Puppeteer script:
- First, we import the necessary library.
  - puppeteer is used for controlling the Chrome browser.
- Then, we define the proxy credentials and address.
  - proxyUsername and proxyPassword are the authentication details for the proxy.
  - proxyServer is the address of the proxy server.
- Next, we launch the Puppeteer browser with the proxy server configuration.
  - args is used to pass the proxy server address to the browser at launch.
- Then, we create a new page, authenticate with the proxy, navigate to a URL, and take a screenshot.
  - newPage() creates a new page within the browser.
  - page.authenticate() supplies the proxy username and password.
  - page.goto() navigates to the specified URL.
  - page.screenshot() takes a screenshot of the page and saves it as example.png.
- Finally, we close the browser.
NodeJS Playwright
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    proxy: {
      server: 'http://gate.smartproxy.com:10000',
      username: 'username',
      password: 'password'
    }
  });
  const page = await context.newPage();
  await page.goto('http://httpbin.org/ip');
  const content = await page.content();
  console.log(content);
  await browser.close();
})();
Here's an explanation of the NodeJS Playwright script:
- First, we import the chromium object from the playwright library.
  - chromium is used to control the Chromium-based browser.
- Then, we launch the browser and configure it to use a proxy server:
  - chromium.launch() starts a new instance of the Chromium browser.
  - browser.newContext() creates a new browser context with proxy settings.
  - server specifies the proxy server address.
  - username and password are used for proxy authentication.
- Next, we create a new page within the context, navigate to a URL, and retrieve the page content.
  - context.newPage() creates a new page within the specified context.
  - page.goto() navigates to the URL 'http://httpbin.org/ip'.
  - page.content() retrieves the HTML content of the page.
- Finally, we print the page content and close the browser.
Case Study: Scrape Amazon Prices with Smartproxy
In this case study, we'll show why geotargeting matters when scraping eCommerce sites: the same store can show different products, or the same products in a different order, depending on where the request comes from. For example, if you're in Portugal, you might see different products compared to someone in Germany.
This is particularly useful for finding products that are not yet available in your region.
Therefore, let’s scrape Amazon using Portugal and Germany geo-locations and compare the results. To achieve this, we'll utilize Smartproxy's residential proxies, Playwright, and Beautiful Soup with Python.
First, you need to install Playwright and Beautiful Soup with the pip command:
pip install playwright beautifulsoup4
Then you need to install Playwright’s browsers using this command:
playwright install
We’re going to scrape the category Camera & Photo from the Amazon website.
In your Smartproxy dashboard, you need to choose the country you want to target.
Now let’s use the endpoint and credentials shown in the dashboard in our script, and use Playwright and Beautiful Soup to scrape this Amazon category.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as pw:
    browser = pw.chromium.launch(
        proxy={
            'server': "pt.smartproxy.com:20001",
            'username': 'spwtkblz1o',
            'password': '<my_password>'
        },
        headless=False,
        timeout=10000000
    )
    # creates a new browser page (tab) within the browser instance
    page = browser.new_page()
    # go to amazon
    page.goto(
        ("https://www.amazon.com/s?i=specialty-"
         "aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A502394"),
        timeout=1000000000)
    # get HTML content
    html = page.inner_html('body')
    # make parser object
    soup = BeautifulSoup(html, 'html.parser')
    # select cards
    cards = soup.select(
        ".a-section.a-spacing-small.puis-"
        "padding-left-small.puis-padding-right-small")
    # get a list of products
    products = [card.select_one('h2').text for card in cards]
    prices = [card.select_one('span.a-price').text.split("$")[1]
              for card in cards]
    num_reviews = [card.select_one(
        'span.a-size-base.s-underline-text').text.split(' ')[0]
        for card in cards]
    data_combined = zip(products, prices, num_reviews)
    result = [
        {
            "product": product,
            "price": price,
            "n_reviews": n_review}
        for product, price, n_review in data_combined]
    print(result)
The output for Portugal is the following:
[{'product': 'Mafiti Kids Waterproof Digital Camera Underwater Camera for Snorkeling and Diving with 2.4" IPS Display USB Rechargeable Digital Camera for Videos and Photos with 32GB Card Blue ', 'price': '33.99', 'n_reviews': '15'}, {'product': 'Fujifilm Instax Mini 12 Instant Camera - Blossom Pink ', 'price': '78.50', 'n_reviews': '2,837'}, {'product': 'Fujifilm Instax Mini 11 Instant Camera Sky Blue + Custom Case Fuji Film Value Pack (50 Sheets) Flamingo Designer Photo Album for instax Photos ', 'price': '129.95', 'n_reviews': '1,364'}, {'product': 'Datacolor Spyder Print - Advanced Data Analysis and Calibration Tool for Optimal Print Results, Perfect for Photographers, Graphic Designers, and Printing Professionals ', 'price': '344.00', 'n_reviews': '33'}, {'product': 'Digital Cameras for Photography 4K ,Vlogging Camera 48MP Video Camera 16X Digital Zoom Mini Camera Super Wide Angle Point and Shoot Digital Cameras with 32GB SD Card and Bag ', 'price': '149.99', 'n_reviews': '14'}, {'product': 'Weeylite LED On Camera Video Light, 360° Full Color RGB LED Camera Light with App Control, Pocket Photo Light 2800-6800K Portable Panel Lights Photography Lighting for Photoshoot Zoom Lighting ', 'price': '25.99', 'n_reviews': '156'}, {'product': '2 Packs LED Photography Lighting Dimmable 3200K-5500K LED Video Light Photo Studio Light Kit with Tripod Stand Bag for Camera Video Studio YouTube Product Portrait Live Streaming ', 'price': '94.90', 'n_reviews': '1'}, {'product': '3 in 1 Monocular/Camera/Camcorder, Max 4000ft 104X Digital Telescope with 3.0" Screen/Hood/Battery/SD Card/Case, 4K Video /48MP Photo/Timelapse Recording Telescope Camera for Hunting ', 'price': '59.99', 'n_reviews': '4'}, {'product': 'Instant Print Camera for Kids,Christmas Birthday Gifts for Age 3-12 Girls Boys,1080P HD Digital Video Cameras for Toddler,Kids Portable Toy with 3 Rolls Photo Paper,32GB Card-Blue ', 'price': '38.89', 'n_reviews': '129'}, {'product': 'Panasonic LUMIX G100 4k Mirrorless 
Camera for Photo and Video, Built-in Microphone with Tracking, Micro Four Thirds Interchangeable Lens System, 12-32mm Lens, 5-Axis Hybrid I.S., DC-G100DKK (Black) ', 'price': '597.99', 'n_reviews': '233'}]
For Germany, you just need to change the country prefix in the proxy server address:
browser = pw.chromium.launch(
    proxy={
        'server': "de.smartproxy.com:20001",
        'username': 'spwtkblz1o',
        'password': '<my_password>'
    },
    headless=False,
    timeout=1000000000
)
The output is:
[{'product': '3 in 1 Monocular/Camera/Camcorder, Max 4000ft 104X Digital Telescope with 3.0" Screen/Hood/Battery/SD Card/Case, 4K Video /48MP Photo/Timelapse Recording Telescope Camera for Hunting ', 'price': '59.99', 'n_reviews': '4'}, {'product': 'Mafiti Kids Waterproof Digital Camera Underwater Camera for Snorkeling and Diving with 2.4" IPS Display USB Rechargeable Digital Camera for Videos and Photos with 32GB Card Blue ', 'price': '33.99', 'n_reviews': '15'}, {'product': 'Fujifilm Instax Mini 12 Instant Camera - Blossom Pink ', 'price': '78.50', 'n_reviews': '2,837'}, {'product': 'Datacolor Spyder Print - Advanced Data Analysis and Calibration Tool for Optimal Print Results, Perfect for Photographers, Graphic Designers, and Printing Professionals ', 'price': '344.00', 'n_reviews': '33'}, {'product': 'Instant Print Camera for Kids,Christmas Birthday Gifts for Age 3-12 Girls Boys,1080P HD Digital Video Cameras for Toddler,Kids Portable Toy with 3 Rolls Photo Paper,32GB Card-Blue ', 'price': '38.89', 'n_reviews': '129'}, {'product': 'Fujifilm Instax Mini 11 Instant Camera Sky Blue + Custom Case Fuji Film Value Pack (50 Sheets) Flamingo Designer Photo Album for instax Photos ', 'price': '129.95', 'n_reviews': '1,364'}, {'product': 'Digital Cameras for Photography 4K ,Vlogging Camera 48MP Video Camera 16X Digital Zoom Mini Camera Super Wide Angle Point and Shoot Digital Cameras with 32GB SD Card and Bag ', 'price': '149.99', 'n_reviews': '14'}, {'product': 'Weeylite LED On Camera Video Light, 360° Full Color RGB LED Camera Light with App Control, Pocket Photo Light 2800-6800K Portable Panel Lights Photography Lighting for Photoshoot Zoom Lighting ', 'price': '25.99', 'n_reviews': '156'}, {'product': 'Panasonic LUMIX G100 4k Mirrorless Camera for Photo and Video, Built-in Microphone with Tracking, Micro Four Thirds Interchangeable Lens System, 12-32mm Lens, 5-Axis Hybrid I.S., DC-G100DKK (Black) ', 'price': '597.99', 'n_reviews': '233'}, {'product': 'Platinum 
Passport Photo Printer System - Pre-Configured for U. S. Passports - includes Upgraded Camera and Photo Cutter ', 'price': '695.00', 'n_reviews': '39'}]
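The two runs above differ only in the country prefix of the proxy server, so the proxy settings can be generated from a country code. This is a minimal sketch: the helper name smartproxy_settings is hypothetical, and it assumes that other countries follow the same <code>.smartproxy.com:20001 pattern as the pt and de endpoints used here, which should be checked against the endpoints listed in your dashboard.

```python
def smartproxy_settings(country_code, username, password, port=20001):
    """Return a proxy dict in the shape Playwright's launch() expects.

    Assumes the endpoint pattern '<code>.smartproxy.com:<port>' seen in
    the Portugal and Germany examples; other countries may differ.
    """
    code = country_code.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return {
        "server": f"{code}.smartproxy.com:{port}",
        "username": username,
        "password": password,
    }


# Usage inside the scraper:
# browser = pw.chromium.launch(
#     proxy=smartproxy_settings("de", "<my_username>", "<my_password>"),
#     headless=False,
# )
```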
We can now compare the main differences:
- The product 3 in 1 Monocular/Camera/Camcorder comes first for Germany, while the product Mafiti Kids Waterproof Digital Camera Underwater Camera is the first one shown for Portugal.
- The product 2 Packs LED Photography Lighting Dimmable 3200K-5500K LED Video Light Photo Studio Light Kit with Tripod Stand Bag for Camera Video Studio YouTube Product Portrait Live Streaming appears only for Portugal in the first results.
- The product Platinum Passport Photo Printer System - Pre-Configured for U. S. Passports - includes Upgraded Camera and Photo Cutter appears only for Germany in the first results.
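Comparing long result lists by eye is error-prone, so the same comparison can be done in code. The sketch below assumes each result is a dict with a "product" key, as produced by the scraper above. The function name compare_results is hypothetical, and the sample lists are shortened, abbreviated stand-ins for the real output, so the positions refer to this sample only.

```python
def compare_results(results_a, results_b):
    """Compare two scraped result lists by product name and 1-based position."""
    names_a = [row["product"].strip() for row in results_a]
    names_b = [row["product"].strip() for row in results_b]
    pos_a = {name: i for i, name in enumerate(names_a, start=1)}
    pos_b = {name: i for i, name in enumerate(names_b, start=1)}
    # Products that appear in only one of the two lists
    only_a = [name for name in names_a if name not in pos_b]
    only_b = [name for name in names_b if name not in pos_a]
    # Products present in both lists but at different positions
    moved = {
        name: (pos_a[name], pos_b[name])
        for name in names_a
        if name in pos_b and pos_a[name] != pos_b[name]
    }
    return only_a, only_b, moved


# Shortened sample data with abbreviated product names (illustrative only)
portugal = [
    {"product": "Mafiti Kids Waterproof Digital Camera "},
    {"product": "Fujifilm Instax Mini 12 "},
    {"product": "2 Packs LED Photography Lighting "},
    {"product": "3 in 1 Monocular/Camera/Camcorder "},
]
germany = [
    {"product": "3 in 1 Monocular/Camera/Camcorder "},
    {"product": "Mafiti Kids Waterproof Digital Camera "},
    {"product": "Fujifilm Instax Mini 12 "},
    {"product": "Platinum Passport Photo Printer System "},
]

only_pt, only_de, moved = compare_results(portugal, germany)
print("Only in Portugal:", only_pt)
print("Only in Germany:", only_de)
print("Different position (Portugal, Germany):", moved)
```

To compare full runs, pass the result lists from the Portugal and Germany scripts in place of the sample data.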
Alternative: ScrapeOps Residential Proxy Aggregator
If you want access to several proxy providers without spending time learning each one's features, pricing, and setup, you can use the ScrapeOps Residential Proxy Aggregator.
ScrapeOps Residential Proxy Aggregator provides access to the top 20 residential proxy providers through a single port, ensuring a high success rate for web scraping tasks. It automatically switches proxies to avoid blocks, optimizing performance and cost with flexible pricing plans.
- Access to the top 20 residential proxy providers, including Smartproxy, Bright Data, and Oxylabs.
- 98% success rate due to automatic proxy switching.
- Bypasses anti-bot measures and avoids blocks.
- Optimizes performance and cost by monitoring proxy performance and pricing.
- Flexible pricing plans starting at $15 per month, with up to $999 for higher usage.
- 500 MB of free bandwidth credits to start.
Let’s now see how to use it with the Python requests library:
import requests

api_key = 'YOUR_API_KEY'
target_url = 'http://httpbin.org/ip'
proxy_url = f'http://scrapeops:{api_key}@residential-proxy.scrapeops.io:8181'
proxies = {
    'http': proxy_url,
    'https': proxy_url,
}
response = requests.get(
    url=target_url,
    proxies=proxies,
    timeout=120,
)
print('Body:', response.content)
Try it now for free and get 500 MB of bandwidth.
Ethical Considerations and Legal Guidelines
Smartproxy is a proud member of EWDCI (Ethical Web Data Collection Initiative), demonstrating its commitment to collecting residential proxies in an ethical and sustainable manner. The company prioritises user privacy, transparency, and fairness in all its operations.
To maintain the integrity of its residential IP addresses, Smartproxy partners with trusted providers who conduct rigorous verification processes to ensure that users voluntarily participate in their networks and fully comprehend the data collection involved. This collaboration guarantees that IP addresses originate from legitimate sources, and users are not misled or exploited.
Moreover, this approach empowers users to monetize their internet resources by sharing their WiFi connections in a peer-to-peer network. Users earn revenue based on the amount of data traffic they contribute to the network, measured in gigabytes (GBs). This model fosters a collaborative community and promotes a sense of mutual benefit.
In addition to its ethical data collection practices, Smartproxy prioritizes robust data protection measures to safeguard user information. These measures include:
- Advanced data encryption to prevent unauthorized access
- Secure storage solutions to protect user data
- Regular security audits to identify and address potential vulnerabilities
These measures ensure that only essential data required for Smartproxy's residential pool functionality is collected and stored, minimizing the risk of data exploitation.
Conclusion
In summary, Smartproxy stands out as a premier provider of residential proxies, offering a comprehensive platform that simplifies web scraping and data collection while maintaining high standards of anonymity and security.
Its diverse array of services, including rotating and static residential proxies, allows users to bypass geo-restrictions, protect their online identity, and gather data on a large scale without interruptions.
If you'd like to know more about any of the tools or frameworks used in this article, you can find their documentation here:
- Selenium Documentation
- Requests Documentation
- Scrapy Documentation
- NodeJS Playwright Documentation
- NodeJS Puppeteer Documentation
- ScrapeOps Residential Proxy Aggregator
More Web Scraping Guides
Now that you've gotten your feet wet with each of these tools, go build something!
If you're in the mood to binge read, check out our extensive Python Web Scraping Playbook or take a look at these articles: