Smartproxy Datacenter Proxies: Web Scraping Guide

Much like ScrapeOps, Smartproxy tries to be a one-stop shop for all things web scraping. Their offerings include Residential Proxies, Datacenter Proxies, Mobile Proxies, and Site Unblocker. All of these products are pretty good, but today's article is going to focus primarily on their Datacenter Proxies. When we use datacenter proxies, we get speed, reliability, and stability. We'll go through the process from start to finish... from signing up to performing a small scrape. By the end of this guide, you'll be able to use Smartproxy's Datacenter Proxies effectively.

Need help scraping the web?

Then check out ScrapeOps, the complete toolkit for web scraping.


TLDR: How to Integrate Smartproxy Datacenter Proxies?

To use datacenter proxies with Smartproxy, you need a username and password. Once you have these, you're all good to go. The code below just gives a basic proxy connection with everything you need to get started. If you'd like to learn the finer details about this product, feel free to read on. Otherwise, just create an account. Then you can copy/paste the code below into a Python file of your own. Remember to replace username and password with your actual username and password!
import requests
username = "your-username"
password = "your-password"
port = 10000
proxy_url = f"http://{username}:{password}@dc.smartproxy.com:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

result = requests.get("https://ip.smartproxy.com/json", proxies=proxies)
print(result.text)
  • First, we set up our configuration variables: username, password, and port.
  • Next, we create a proxy_url using the configuration from above.
  • We create a dict object to hold our http and https proxies, assigning both of them to our proxy_url.
  • Finally, we make our request and print result.text to the terminal, revealing the location information of our proxy.

Understanding Datacenter Proxies

Some websites tend to block datacenter proxies. When a request comes in from a datacenter IP, the traffic immediately sticks out because it comes from a non-residential IP address. However, for the vast majority of websites, datacenter proxies will get the job done without any issues. Premium proxies are made up of actual mobile and residential IP addresses. They tend to cost quite a bit more, and they are often significantly slower than datacenter proxies. In the next few sections, we'll go over the main differences between residential/premium and datacenter proxies. By the time you're finished reading, you'll be able to make an informed decision based on your needs.

Datacenter vs Residential

When choosing between datacenter and residential proxies, understanding their strengths and weaknesses is crucial.

Datacenter

Pros
  • Price: No matter which provider you choose, datacenter proxies tend to be very affordable. When using residential proxies, it's not uncommon to pay up to $8/GB.
  • Speed: When you use a proxy like this, it gets hosted inside an actual datacenter. Datacenters usually use the best of the best internet connections and hardware. As long as you're not getting blocked, datacenter proxies offer unparalleled performance.
  • Availability: Datacenters are huge. Each machine in a datacenter typically has its own IP address. This gives us an enormous proxy pool to work with.
Cons
  • Blocking: As we mentioned earlier, some websites block all datacenter IPs by default. This can cause unending headaches when scraping more difficult sites.
  • Less Geotargeting Support: Most datacenter proxies do give us the option to use geotargeting. However, you still show up with a datacenter IP no matter which country you choose. When we use geotargeting with datacenter proxies, we often don't get the pinpoint accuracy that we can achieve with residential proxies.
  • Less Anonymity: When you use a datacenter proxy, your location is always going to show inside a datacenter. This doesn't give you the ability to blend in with normal residential and mobile users.

Residential

Pros
  • Anonymity: When we use a residential proxy, we get an actual IP address assigned to an actual residential device. When dealing with more difficult sites, this is incredibly important.
  • Better Access: When using a residential IP address, it doesn't matter if your target site blocks datacenter IP addresses. Your traffic is coming from somebody's real device in a real home.
Cons
  • Price: When using a residential proxy service, it's not uncommon to pay between $5 and $8 per GB! Datacenter proxies, on the other hand, can be 10x cheaper!
  • Speed: As we mentioned earlier, datacenters tend to use the best of the best internet connections and hardware. When you use a residential proxy, you can often be stuck with a low grade internet connection on a low grade device. This setup will still get you the access you need, but at the cost of both price and performance.
Residential proxies are ideal for SERP results, ad verification, social media monitoring/scraping and much more.

Why Use Smartproxy Datacenter Proxies?

When using Smartproxy datacenter proxies, we get some really great upsides. To start, we can get 100 IP addresses with 50 GB of bandwidth for about $10. This comes out to just $0.20 per GB. When you compare this to the $5-$8 you would pay for a residential proxy, you are saving at least 90% on your cost. Getting started will only cost us about $11 total! Along with the cost benefits, we get access to geotargeting and our choice of rotating or static sessions. In other words, we get to choose our location, and we get to choose between automatically rotating proxies and static (sticky) sessions, which allow us to keep our browsing session intact.
  • Smartproxy's Datacenter Proxies tend to be very cost effective.
  • We can use geotargeting to select which country we want to appear in.

Smartproxy Datacenter Proxy Pricing

Smartproxy allows us to purchase our proxies two different ways. We can choose to pay per IP address, or we can choose to purchase bandwidth directly and let Smartproxy manage our IP addresses 100%. If you decide to pay per IP, you will be given the option to choose a country for your IP addresses. The table below outlines their pricing plans for pay per IP. When you pay per IP, you also need to select a bandwidth limit. All of the prices below are for their lowest bandwidth tier (50GB). If you need to purchase more than 50GB, you can use their pricing tool here.
| IP Addresses | Cost | Cost Per IP |
| --- | --- | --- |
| 100 | $10 | $0.10 |
| 200 | $19 | $0.095 |
| 500 | $45 | $0.09 |
| 1,000 | $85 | $0.085 |
| 2,000 | $160 | $0.08 |
| 4,000 | $300 | $0.075 |
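As a quick sanity check, the Cost Per IP column is just the monthly cost divided by the number of IPs. A tiny sketch that recomputes it from the figures above:

```python
# Pay-per-IP plans from the table above: (IP count, monthly cost in USD)
plans = [(100, 10), (200, 19), (500, 45), (1000, 85), (2000, 160), (4000, 300)]

for ips, cost in plans:
    # Cost per IP = plan price / number of IPs
    print(f"{ips:>5} IPs for ${cost}: ${cost / ips:.3f} per IP")
```

As the numbers show, the per-IP price drops steadily as the plan size grows.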
If you just need bandwidth and you're looking to pay a bit more, you can choose one of their bandwidth based plans. These give you full access to all of their proxy pools, but they cost a bit more. Instead of a $10 barrier to entry, the lowest tier plan comes in at $30. However, you get unlimited IP addresses, and Smartproxy will manage these IPs for you.
| Plan | Monthly Cost | Price Per GB |
| --- | --- | --- |
| 50GB | $30 | $0.60 |
| 100GB | $55 | $0.55 |
| 300GB | $160 | $0.53 |
| 1,000GB | $500 | $0.50 |
To compare Smartproxy to other providers, we built a tool that allows you to compare virtually every provider on the market; you can use it here. No matter what your needs are, our comparison tool can help you find what you're looking for. We're not always the best price, and when we're not, we tell you.

Setting Up Smartproxy Datacenter Proxies

In this section, we're going to go through their signup process from start to finish. You'll understand, in detail, what you need to do to create an account, purchase a plan, and get set up with your first request. Follow the steps below to get up and running in minutes.

First, you need to create an account. Smartproxy gives you the option to create an account using your email or your Google account. Both ways work just fine, though Google tends to be more convenient.

Smartproxy Homepage

Now, we need to set our plan up. There are many things we'll need to choose, but we're going to start by deciding whether we want to pay per IP address or to pay per GB. Since this is just a demonstration, we'll elect to pay per IP. As we mentioned earlier, it has the lowest barrier to entry.

Smartproxy Pricing Pay per IP

Next, we need to decide on our datacap (bandwidth). Once again, since this is only a demonstration, we're going to go with 50GB. This is far more than we'll probably ever use.

Smartproxy Pricing IPs

Once you've chosen your pricing and bandwidth model, you'll need to decide which countries you want. In the image below, we select 50 IP addresses from the UK and 50 from the US. You can choose any combination you'd like, but that's what we're going with for our demo.

Smartproxy Location Setup

Finally, you need to check out and actually purchase your plan. There are several payment options, including PayPal, Credit Card, and Google Pay. You will be charged sales tax based on your locality. For instance, with a 6% sales tax, our final price in the image below is $10.60.

Smartproxy Checkout

Now that we've purchased our plan, we need to finish setting it up. You should see a button on your screen titled Begin Proxy Setup. Click it.

Smartproxy Purchase

Once the dashboard loads, you'll see your list of proxies.

Smartproxy Proxy List

If you scroll down a bit further, you'll see their request builder tool. This tool is incredibly handy when you're getting used to working with their proxy connections.

Smartproxy Build Request

Authentication

When we make our requests through Smartproxy, we get the option to make authenticated requests with our username and password, or to whitelist IP addresses. Throughout this guide, we make use of the username and password. If this doesn't work for you, you always have the option to whitelist your IP address. This will allow you to make requests directly through the proxy without a username and password. To whitelist your IP, simply click the authentication tab on your dashboard.

Smartproxy IP Whitelist

If you decide to authenticate via username and password, your requests will look something like this. Let's take a deeper look at our basic request from our TLDR section. We construct our URL with our username, password, and port. Our final proxy_url looks like this:
http://{username}:{password}@dc.smartproxy.com:{port}
import requests
username = "your-username"
password = "your-password"
port = 10000
proxy_url = f"http://{username}:{password}@dc.smartproxy.com:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

result = requests.get("https://ip.smartproxy.com/json", proxies=proxies)
print(result.text)
Key takeaways from the authentication process here:
  • We save our configuration variables: username, password, port. This way, they're easy to tweak.
  • All of our HTTP requests get routed through a custom url: http://{username}:{password}@dc.smartproxy.com:{port}.
  • We use a dict to hold both our http and https proxies. We set each one to our proxy_url.
  • Finally, we query Smartproxy's API to get our location information.

Basic Request Using Smartproxy Datacenter Proxies

If you've been following along, you've already seen how to make a basic request through one of Smartproxy's Datacenter proxies. However, if you skipped over those sections and you just want to know how to do it, here you go! The code example below gives you everything you need to get started using Smartproxy's Datacenter proxy service.
import requests
username = "your-username"
password = "your-password"
port = 10000
proxy_url = f"http://{username}:{password}@dc.smartproxy.com:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

result = requests.get("https://ip.smartproxy.com/json", proxies=proxies)
print(result.text)
When we make requests through Smartproxy, they're authenticated using our username and password. It's also a good idea to save your port as a variable. When we make our requests through port 10000, Smartproxy will automatically select a proxy for us. If you decide to use another port number (>10000), it will select a specific proxy from your list.
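To make the port behavior concrete, here's a small sketch. The port convention follows the description above (10000 rotates, higher ports pin a specific proxy); the actual request is left commented out so you can drop in your own credentials:

```python
username = "your-username"
password = "your-password"

def make_proxies(port):
    # Build a requests-style proxies dict for a given Smartproxy port
    proxy_url = f"http://{username}:{password}@dc.smartproxy.com:{port}"
    return {"http": proxy_url, "https": proxy_url}

# Port 10000: Smartproxy rotates proxies for us automatically
rotating = make_proxies(10000)

# Ports above 10000 select a specific proxy from your list, so reusing
# the same port keeps the same IP between requests (a sticky session)
sticky = make_proxies(10001)

# import requests
# print(requests.get("https://ip.smartproxy.com/json", proxies=sticky).text)
```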

Country Geotargeting

Geotargeting is pretty intuitive when using Smartproxy. We simply append the country flag to our username. For example, if you want a US based proxy, your URL would look like this:
http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}
In the code below, we make a request using a US based proxy. You can view their full geotargeting documentation here.
import requests
username = "your-username"
password = "your-password"
country = "us"
port = 10000

proxy_url = f"http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

result = requests.get("https://ip.smartproxy.com/json", proxies=proxies)
print(result.text)
There are just a couple subtle differences between this code and our basic request.
  • Inside of our code, we add the country flag followed by the country of our choosing.
  • We also add the user flag to our url. This allows their service to differentiate between our country and our username.
  • Our full url now looks like this: http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}.
We ran this code and received the following output.
{
    "browser": {
        "name": "",
        "version": ""
    },
    "platform": {
        "os": "undefined undefined"
    },
    "engine": {},
    "isp": {
        "isp": "M247 Europe",
        "asn": 9009,
        "domain": "",
        "organization": "M247 Europe"
    },
    "city": {
        "name": "",
        "code": "",
        "state": "",
        "time_zone": "America/Chicago",
        "zip_code": "",
        "latitude": 37.751,
        "longitude": -97.822
    },
    "proxy": {
        "ip": "198.56.0.20"
    },
    "country": {
        "code": "US",
        "name": "United States",
        "continent": "North America"
    }
}
There are a few fields you should pay attention to in the JSON above: isp, city, and country. Each of these fields holds valuable information about our proxy connection.
  • isp: This holds data about our internet service provider. As odd as it sounds, our service provider is in Europe.
  • country: Our country shows up in the United States of America with a code of US. As odd as it might seem, even though our ISP is located in Europe, our actual location is within the United States.
  • city: There isn't much information here other than our latitude, longitude, and timezone. Our timezone shows up as Chicago. If you look up these coordinates, they actually place us inside the Cheney Reservoir in Kansas... definitely within the US. You can verify that here with Google Maps.
NOTE Oddly enough, Smartproxy doesn't list their country codes, but sites that don't list them typically follow the ISO 3166 standard. If we follow this standard, their list of country codes should look like you see in the table below.
| Country | Country Code |
| --- | --- |
| Australia | au |
| Canada | ca |
| France | fr |
| Germany | de |
| Italy | it |
| Israel | il |
| Netherlands | nl |
| United Kingdom | gb |
| United States | us |
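If you're juggling several of these codes, a small helper keeps the URLs consistent. This is just a sketch using the assumed ISO 3166 codes from the table above:

```python
# Assumed ISO 3166 country codes from the table above
COUNTRY_CODES = {
    "Australia": "au", "Canada": "ca", "France": "fr",
    "Germany": "de", "Italy": "it", "Israel": "il",
    "Netherlands": "nl", "United Kingdom": "gb", "United States": "us",
}

def geo_proxy_url(username, password, country_name, port=10000):
    # Look up the country code and build the geotargeted proxy URL
    code = COUNTRY_CODES[country_name]
    return f"http://user-{username}-country-{code}:{password}@dc.smartproxy.com:{port}"

print(geo_proxy_url("your-username", "your-password", "United Kingdom"))
# http://user-your-username-country-gb:your-password@dc.smartproxy.com:10000
```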

City Geotargeting

Like most other datacenter proxy services, Smartproxy does not give us access to city-level geotargeting. If you need to control things at a fine, granular level like this, your best choice is going to be a residential proxy. Smartproxy also offers a residential service with city-level geotargeting. While their list of cities is small, it is quite stable and consistent. You can view their docs for that here. Their residential plan is available for purchase here. Smartproxy Residential Proxies City-level geotargeting gives us access to hyper-localized content. When you're dealing with local content, you can extract the following types of data at a local level:
  • Local Ads
  • Local Businesses
  • Local Social Media
  • Local Events
If you need city level geotargeting from Smartproxy, you're going to need to use their residential services instead.

Error Codes

Error codes are irreplaceable when it comes to debugging anything in web development. Most of us already know that a status code of 200 indicates a successful request. However, when it comes to other codes (error codes in particular), they can be a bit more tricky. In the table below, we outline those status codes and how to handle them. If you need to view their full documentation on status codes, it's available here.
| Status Code | Type | Description |
| --- | --- | --- |
| 200 | Success | Everything worked as expected! |
| 400 | Bad Request | The server couldn't read your request; double-check it. |
| 401 | Unauthorized | Double-check your username/password. |
| 403 | Forbidden | The domain you requested is forbidden. |
| 404 | Not Found | The requested domain could not be found. |
| 407 | Authentication Required | Your request was missing a username/password. |
| 408 | Request Timeout | The server took too long to respond. |
| 500 | Internal Server Error | The server encountered an error; please try again later. |
| 502 | Bad Gateway | The server received an invalid response from your domain. |
| 503 | Service Unavailable | The server is down or overloaded with requests. |
| 504 | Gateway Timeout | The server didn't receive a response from your domain. |
| 522 | Connect Timeout | The proxy connection timed out. |
| 525 | No Exit Found | The proxy service was unable to find an exit node. |
Status codes are essential. When you encounter an error, look up the status code and troubleshoot accordingly.
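In practice, some of these codes are worth retrying (timeouts, overloaded servers) while others mean you should fix your request first (bad credentials, forbidden domains). Here's a hedged sketch of how you might wrap your requests accordingly; the retryable set and backoff values are our own choices, not Smartproxy's:

```python
import time

import requests

# Transient statuses from the table above that are worth retrying
RETRYABLE = {408, 500, 502, 503, 504, 522, 525}

def fetch_with_retries(url, proxies, retries=3, backoff=2):
    # Retry transient errors with exponential backoff; fail fast on
    # client-side problems like 401 (bad credentials) or 403 (forbidden)
    last_status = None
    for attempt in range(retries):
        response = requests.get(url, proxies=proxies)
        if response.status_code == 200:
            return response
        if response.status_code not in RETRYABLE:
            response.raise_for_status()
        last_status = response.status_code
        time.sleep(backoff ** attempt)
    raise RuntimeError(f"Failed after {retries} attempts, last status: {last_status}")
```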

KYC Verification

Smartproxy does not require you to undergo a KYC check to use their datacenter proxies. When you do something nefarious on a datacenter proxy, it gets traced to the datacenter and then immediately to you through your account. KYC is far more common for residential services. Most proxy providers implement these policies in order to protect their residential providers (real people sharing their actual bandwidth). It's not fair for these residents to deal with your legal problems.

Implementing Smartproxy Datacenter Proxies in Web Scraping

Now that we know how to use Smartproxy's datacenter proxies, we're going to look at implementing them with different frameworks. Pick your poison. After this section, you'll be able to handle proxy implementation pretty much anywhere, regardless of framework. We'll go through a few popular Python frameworks and a couple of popular JavaScript ones as well.

Python Requests

We've been using Python Requests throughout this article. Since you're already familiar with it, this is where we'll start.
import requests
username = "your-username"
password = "your-password"
country = "us"
port = 10000

proxy_url = f"http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

result = requests.get("https://ip.smartproxy.com/json", proxies=proxies)
print(result.text)
  • Once we've got our credentials, we use them to build our proxy_url: http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}.
  • We then create a dict object that holds both our http and https proxies.
  • When making our requests, we make sure to pass proxies=proxies. This tells Python Requests to use the dict object we created for our proxy settings.

Python Selenium

SeleniumWire has been a staple with many Selenium users for years. You might already know this: vanilla Selenium does not support authenticated proxies. Sadly, SeleniumWire has been deprecated! However, it is still technically possible to integrate Smartproxy Datacenter Proxies via SeleniumWire, but we strongly advise against it. When you decide to use SeleniumWire, you are vulnerable to the following risks:
  • Security: Browsers are updated with security patches regularly. Without these patches, your browser is left with security holes that have already been fixed in up-to-date browsers and the drivers that control them, such as Chromedriver or Geckodriver.
  • Dependency Issues: SeleniumWire is no longer maintained. In time, it may not be able to keep up with its dependencies as they get updated. Broken dependencies can be a source of unending headache for anyone in software development.
  • Compatibility: As the web itself gets updated, SeleniumWire doesn't. Regular browsers are updated all the time. Since SeleniumWire no longer receives updates, you may experience broken functionality and unexpected behavior.
As time goes on, the probability of all these problems increases. If you understand the risks but still wish to use SeleniumWire, you can view a guide on that here. Depending on when you're reading this, the code example below may or may not work. As mentioned above, we strongly recommend against using SeleniumWire because of its deprecation, but if you decide to do so anyway, here you go. We are not responsible for any damage that this may cause to your machine or your privacy.
from seleniumwire import webdriver
username = "your-username"
password = "your-password"
country = "us"
port = 10000

proxy_url = f"http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}"

## Define Your Proxy Endpoints
proxy_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url,
        "no_proxy": "localhost:127.0.0.1"
    }
}

## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)

## Send Request Using Proxy
driver.get('https://httpbin.org/ip')
  • We build our url exactly how we did above with Python Requests: http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}.
  • We assign this url to both the http and https protocols of our proxy settings.
  • driver = webdriver.Chrome(seleniumwire_options=proxy_options) tells webdriver to open Chrome with our custom seleniumwire_options.

Python Scrapy

There are several different ways to integrate your new proxy connection with Scrapy. In this example, we're going to integrate our proxy directly into our spider. To start, we need to make a new Scrapy project.
scrapy startproject datacenter
Then, from within your new Scrapy project, create a new Python file inside the spiders folder with the following code.
import scrapy
username = "your-username"
password = "your-password"
country = "us"
port = 10000

proxy_url = f"http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}"

class ExampleSpider(scrapy.Spider):
    name = "datacenter_proxy"

    def start_requests(self):
        request = scrapy.Request(url="https://httpbin.org/ip", callback=self.parse)
        request.meta['proxy'] = proxy_url
        yield request

    def parse(self, response):
        print(response.body)
You can run this spider with the following command.
scrapy crawl datacenter_proxy
  • Once again, we create the same proxy_url: http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}.
  • From inside start_requests, we assign our proxy_url to request.meta['proxy']. This tells Scrapy that all of this spider's requests are to be made through the proxy_url we created earlier.
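As we noted, the spider-level meta approach is only one option. Another common pattern is a project-wide downloader middleware, which routes every spider's requests through the proxy without touching spider code. The sketch below uses our own file and class names; you would enable it in settings.py with DOWNLOADER_MIDDLEWARES = {"datacenter.middlewares.SmartproxyMiddleware": 350}.

```python
# middlewares.py -- a minimal sketch of a project-wide proxy middleware
username = "your-username"
password = "your-password"
country = "us"
port = 10000

PROXY_URL = f"http://user-{username}-country-{country}:{password}@dc.smartproxy.com:{port}"

class SmartproxyMiddleware:
    # Scrapy calls process_request for every outgoing request, so each
    # one is routed through the proxy before it leaves the project
    def process_request(self, request, spider):
        request.meta["proxy"] = PROXY_URL
```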

NodeJS Puppeteer

Getting started with Puppeteer is just as easy. Like Scrapy, we need to create a new project first. Follow the steps below to get up and running in minutes. Create a new folder.
mkdir puppeteer-datacenter
cd into the new folder and create a new JavaScript project.
cd puppeteer-datacenter
npm init --y
Next, we need to install Puppeteer.
npm install puppeteer
Next, from within your new JavaScript project, copy/paste the code below into a new .js file.
const puppeteer = require('puppeteer');
const username = "your-username";
const password = "your-password";
const country = "us";
const port = 10000;
const hostname = "dc.smartproxy.com";

(async () => {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${hostname}:${port}`]
  });

  const page = await browser.newPage();

  await page.authenticate({
    username: `user-${username}-country-${country}`,
    password: password
  });

  await page.goto('http://lumtest.com/myip.json');
  await page.screenshot({path: 'puppeteer.png'});

  await browser.close();
})();
  • First, we declare all of our configuration variables as constants: username, password, country, port, and hostname.
  • We set our url when we launch our browser with this arg added --proxy-server=${hostname}:${port}.
  • We add our username and country to the username field: user-${username}-country-${country}.
  • We also add our password to the password field for authentication: password: password.
With Puppeteer, we get first class proxy support right out of the box. Puppeteer's builtin authenticate() method gives us a special place to put both our username and password. Make sure to add your country to the username field. The screenshot from this code is available for you to view below. Smartproxy Puppeteer Integration

NodeJS Playwright

If you paid attention during the Puppeteer integration above, Playwright is going to seem very similar. Puppeteer and Playwright actually share a common origin in Chrome's DevTools Protocol. The steps below should look at least somewhat familiar; however, things get slightly different near the end. Create a new project folder.
mkdir playwright-datacenter
cd into the new folder and initialize a JavaScript project.
cd playwright-datacenter
npm init --y
Install Playwright.
npm install playwright
npx playwright install
Next, you can copy/paste the code below into a JavaScript file.
const playwright = require('playwright');
const username = "your-username";
const password = "your-password";
const country = "us";
const port = 10000;
const hostname = "dc.smartproxy.com";

const options = {
    proxy: {
        server: `http://${hostname}:${port}`,
        username: `user-${username}-country-${country}`,
        password: password
    }
};

(async () => {
    const browser = await playwright.chromium.launch(options);
    const page = await browser.newPage();

    await page.goto('http://lumtest.com/myip.json');

    await page.screenshot({ path: "playwright.png" });

    await browser.close();
})();
  • Like our Puppeteer example, we first setup our configuration variables: port, username, password, hostname.
  • We create a proxy object with the following fields:
  • server: `http://${hostname}:${port}`
  • username: user-${username}-country-${country}
  • password: password
Just like Puppeteer, Playwright gives us first class support for authenticated proxies. You can view the screenshot from this code below. Smartproxy Playwright Integration

Case Study: Scrape The Guardian

Most strict anti-bots block datacenter IP addresses. When scraping, we tend to use datacenter proxies for more general sites. Datacenter proxies are just so much cheaper and more efficient. Residential proxies tend to work better as a fallback for cases when datacenter proxies just aren't getting the job done. In this next section, we're going to scrape The Guardian. This is more about concepts than data harvesting. In the code below:
  1. We first set up a proxy based in the US. We make a GET request to verify our location information, and then we make our GET request to The Guardian.
  2. Next, we print our location info and the navbar from The Guardian.
  3. After the first run, we reset our proxy connection to use a proxy from the UK.
  4. If our proxies are working, we'll receive different output from each proxy.
Take a look at the code below.
import requests
from bs4 import BeautifulSoup

username = "your-username"
password = "your-password"
country = "us"
hostname = "dc.smartproxy.com"
port = 10000

proxy_url = f"http://user-{username}-country-{country}:{password}@{hostname}:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}


print("----------------------US---------------------")

location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)

response = requests.get('https://www.theguardian.com/', proxies=proxies)

soup = BeautifulSoup(response.text, "html.parser")

subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)


print("----------------------UK---------------------")

country = "gb"

proxy_url = f"http://user-{username}-country-{country}:{password}@{hostname}:{port}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)

response = requests.get("https://www.theguardian.com/", proxies=proxies)

soup = BeautifulSoup(response.text, "html.parser")

subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)
There are some subtle differences you should notice above from our earlier geotargeting example:
  • location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies) is used after setting up each proxy connection. When we make this request, we do it simply so we can verify our location.
  • We're using two different locations. First, we set our country to "us", and later on, we set it to "gb".
  • Once we've reset our country, we rebuild our proxy_url and our proxies dict to match.
If we run our code, we get output similar to what you see below.
----------------------US---------------------
{"country":"US","asn":{"asnum":9009,"org_name":"M247 Europe SRL"},"geo":{"city":"","region":"","region_name":"","postal_code":"","latitude":37.751,"longitude":-97.822,"tz":"America/Chicago"}}
USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
----------------------UK---------------------
{"country":"GB","asn":{"asnum":203346,"org_name":"Proper Support LLP"},"geo":{"city":"","region":"","region_name":"","postal_code":"","latitude":51.4964,"longitude":-0.1224,"tz":"Europe/London"}}
UKWorldClimate crisisUkraineFootballNewslettersBusinessEnvironmentUK politicsEducationSocietyScienceTechGlobal developmentObituaries
First, we'll look at our locations here. We cleaned up the important information from the JSON and made it a little easier to read. Our US proxy is located in the US and our UK proxy is located in the UK.
| Proxy Country | Resulting Country |
| --- | --- |
| us | US |
| gb | GB |
Now let's take a closer look at our navbar text from each run.
  • us: USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
  • gb: UKWorldClimate crisisUkraineFootballNewslettersBusinessEnvironmentUK politicsEducationSocietyScienceTechGlobal developmentObituaries
Let's make these a little easier to read.
  • us: US | US elections 2024 | World | Environment | Ukraine | Soccer | Business | Tech | Science | Newsletters | Wellness
  • gb: UK | World | Climate crisis | Ukraine | Football | Newsletters | Business | Environment | UK politics | Education | Society | Science | Tech | Global development | Obituaries
As you can see, there are some differences in our navbar layout.
  • On the US version of the site, the top left hand corner of the navbar holds US followed by US Elections.
  • In the UK, the viewer's attention is prioritized a bit differently: the navbar holds UK followed by World.
The Guardian knows that the average UK user is probably not as concerned about the US and US elections, so they instead see UK followed by World. Many websites will prioritize your attention differently based on your location.

Alternative: ScrapeOps Proxy Aggregator

Datacenter proxies through Smartproxy are a pretty good deal, but we've got some great deals too. We offer a different product with many more features for a really competitive price: the ScrapeOps Proxy Aggregator! When you use our Proxy Aggregator, you don't need to pay for bandwidth; instead, you pay per request. Even better, you only pay for successful requests! Proxy Aggregator is a managed service that selects the best available proxy from our providers on every request. We source these pools from tons of different providers, including Smartproxy. Unless you tell it otherwise, Proxy Aggregator will first try your request with a datacenter proxy. If your request fails, we then retry it using a premium (residential or mobile) proxy with no additional charge! When you use our Proxy Aggregator, you get stability and reliability you can count on. The table below outlines our pricing.
| Monthly Price | API Credits | Basic Request Cost |
| ------------- | ----------- | ------------------ |
| $9            | 9,000       | $0.00036           |
| $15           | 50,000      | $0.0003            |
| $19           | 100,000     | $0.00019           |
| $29           | 250,000     | $0.000116          |
| $54           | 500,000     | $0.000108          |
| $99           | 1,000,000   | $0.000099          |
| $199          | 2,000,000   | $0.0000995         |
| $254          | 3,000,000   | $0.000084667       |
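The basic request cost is simply the monthly price divided by the API credits the plan includes. A quick sanity check on a couple of the plans above:

```python
# Basic request cost = monthly price / API credits included in the plan.
def basic_request_cost(monthly_price, api_credits):
    return monthly_price / api_credits

print(basic_request_cost(15, 50_000))   # $0.0003 per request
print(basic_request_cost(19, 100_000))  # $0.00019 per request
```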
All of these plans offer the following awesome features:
  • JavaScript Rendering
  • Screenshot Capability
  • Country Geotargeting
  • Residential and Mobile Proxies
  • Anti-bot Bypass
  • Custom Headers
  • Sticky Sessions
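Features like these are enabled per request through query parameters on the proxy URL. As a sketch (parameter names such as render_js and residential follow the ScrapeOps docs, but check your dashboard for the current list):

```python
from urllib.parse import urlencode

API_KEY = "your-super-secret-api-key"

def get_scrapeops_url(url, **features):
    # Feature flags (e.g. render_js=True, residential=True, country="us")
    # are passed straight through as query parameters.
    payload = {"api_key": API_KEY, "url": url, **features}
    return "https://proxy.scrapeops.io/v1/?" + urlencode(payload)

print(get_scrapeops_url("https://example.com", render_js=True, country="us"))
```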
As we mentioned earlier, Smartproxy is one of our providers! When you sign up for ScrapeOps, you get access to proxies from Smartproxy and a ton of other providers! Go ahead and start your free trial here. Once you've got your free trial, you can copy and paste the code below to check your proxy connection.
import requests
from urllib.parse import urlencode

API_KEY = "your-super-secret-api-key"
LOCATION = "us"

def get_scrapeops_url(url, location=LOCATION):
    payload = {
        "api_key": API_KEY,
        "url": url,
        "country": location
    }
    proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
    return proxy_url

response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json"))
print(response.text)
In the code above, we do the following.
  • Create our configuration variables: API_KEY and LOCATION.
  • Write a get_scrapeops_url() function. This function takes a target url along with our parameters and wraps them into a ScrapeOps proxied URL. This is an incredibly easy way to scrape, and it makes our proxy code much more modular.
  • Check our IP info with response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json")).
  • Finally, we print it to the terminal. You should get an output similar to this.
{"country":"US","asn":{"asnum":16509,"org_name":"AMAZON-02"},"geo":{"city":"Columbus","region":"OH","region_name":"Ohio","postal_code":"43215","latitude":39.9625,"longitude":-83.0061,"tz":"America/New_York","lum_city":"columbus","lum_region":"oh"}}
Take a look at the org_name, AMAZON-02. This is an Amazon datacenter. Like we said earlier, our Proxy Aggregator gives us access to datacenter proxies by default.
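If you'd rather inspect those fields programmatically than eyeball raw JSON, response.json() gives you a plain dict. Here's a sketch using a trimmed copy of the sample payload above; with a live request you'd use data = response.json() instead:

```python
import json

# Trimmed copy of the sample payload shown above; with a live request
# you'd call data = response.json() instead of json.loads().
raw = ('{"country":"US","asn":{"asnum":16509,"org_name":"AMAZON-02"},'
       '"geo":{"city":"Columbus","region":"OH","region_name":"Ohio"}}')
data = json.loads(raw)

print(data["country"])             # US
print(data["asn"]["org_name"])     # AMAZON-02
print(data["geo"]["region_name"])  # Ohio
```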
Smartproxy is a big proponent of ethical proxies. They pride themselves on sourcing their proxies only from ethical sources. The screenshot below comes straight from their residential proxies page. It reads "Ethical Residential Proxy Sourcing and Usage". As you can see, they take ethical sourcing very seriously.

Smartproxy Ethics

Residential proxies come from real people using real devices on their real internet connections. Ethical sourcing of residential proxies means that everyone providing bandwidth knows they're providing bandwidth. Datacenter proxies, by contrast, come from a datacenter; there is no way that our proxy could come from a user unknowingly running software on their smartphone.

Don't use your proxy provider to break laws. Obviously it's illegal, and something you might not have considered: it harms everyone involved. It harms the proxy provider, and it eventually harms you too. If you do something illegal when using a proxy, your action will first be traced to the proxy provider. Then, the action will be traced to your account through either your API key or your username and password. This creates problems for both you and your proxy service.
  • Don't use residential proxies to access illegal content: These actions can come with intense legal penalties, up to and including prison or jail time depending on severity.
  • Don't scrape and disseminate other people's private data: Depending on what jurisdiction you're dealing with, this is also a highly illegal and dangerous practice. Doxxing private data can also lead to heavy fines and possibly jail/prison time.

Ethical Considerations

When we scrape, we don't only need to consider legality; we need to make some ethical considerations too. Just because something is legal doesn't mean it's morally right or acceptable. No one wants to be the next headline about unethical scraping practices.
  • Social Media Monitoring: Social media stalking can be a very destructive and disrespectful behavior. How would you feel if someone used data collection methods on your account?
  • Respect Site Policies: Failure to respect a site's policies can get your account suspended/banned. It can even lead to legal troubles for those of you who sign and violate a terms of service agreement.
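One practical way to respect site policies is to check robots.txt before scraping a path. A minimal sketch using Python's built-in urllib.robotparser (the rules below are hypothetical and hard-coded for illustration; in practice you'd fetch the site's real robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, hard-coded for illustration;
# in practice you'd point the parser at https://example.com/robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# Check whether our crawler may fetch a given URL.
print(rp.can_fetch("my-scraper", "https://example.com/articles"))   # True
print(rp.can_fetch("my-scraper", "https://example.com/private/x"))  # False
```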

Conclusion

Datacenter proxies are a great tool for general web scraping. By this point, you should understand that residential proxies are only needed in certain cases. You should also have a decent understanding of how to implement datacenter proxies using Python Requests, Scrapy, NodeJS Puppeteer and NodeJS Playwright. You can view the full documentation for Smartproxy's Datacenter proxies here. You should also know how to use a basic proxy connection with our very own Proxy Aggregator. We have numerous features and some very affordable plans. Now, take your new skills and go build something with Smartproxy's Datacenter Proxies or the ScrapeOps Proxy Aggregator.

More Cool Articles

Are you in the mood to binge-read? We've got a ton of content that can satisfy that. Whether you're a seasoned dev or you're brand new to web scraping, we've got something useful for you. We love scraping so much that we wrote the Python Web Scraping Playbook. If you want to learn more, take a look at the guides below.