Bright Data Datacenter Proxies: Web Scraping Guide
Bright Data offers a wide range of proxy products. They boast access to a Scraping Browser, Residential Proxies, ISP Proxies, Datacenter Proxies, Mobile Proxies, Web Unlocker, and a SERP API.
Today, we're going to test out their Datacenter Proxies. Datacenter proxies offer speed, reliability, and stability. We'll walk through the entire process, from signing up all the way to using the proxy in production.
- TLDR: How To Integrate Bright Data Datacenter Proxy
- Understanding Datacenter Proxies
- Why Use Datacenter Proxies?
- Datacenter Proxy Pricing
- Setting Up Bright Data Datacenter Proxies
- Authentication
- Basic Request Using Bright Data Datacenter Proxies
- Country Geotargeting
- City Geotargeting
- Error Codes
- KYC Verification
- Implementing Bright Data Datacenter Proxies
- Case Study: Scraping The Guardian
- Alternative: ScrapeOps Proxy Aggregator
- Ethical Considerations and Legal Guidelines
- Conclusion
- More Web Scraping Guides
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
TLDR: How To Integrate Bright Data Datacenter Proxy
Getting set up with Datacenter Proxies from Bright Data is really easy. Once you've got your USERNAME, ZONE, and PASSWORD, simply put them into the file below. This code sets up a proxy, then checks both your actual IP address and your proxied IP so you can compare them and ensure that everything's working.
import requests
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
url = "https://httpbin.org/ip"
actual_ip = requests.get(url)
proxied_ip = requests.get(url, proxies=proxies)
print(actual_ip.text)
print(proxied_ip.text)
- First, we set up our configuration variables: USERNAME, ZONE, PASSWORD, HOSTNAME, PORT.
- We create a dict object holding our http and https proxies. We set them both to f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}".
- Next, we get our actual_ip and our proxied_ip from httpbin.
- Once we've retrieved both IP addresses, we print them to the console to compare.
Understanding Datacenter Proxies
What Are Datacenter Proxies?
Datacenter proxies are IP addresses hosted on servers inside commercial datacenters, rather than assigned to homes or mobile devices. Some sites block datacenter IP addresses, but not many do. For the vast majority of the web, datacenter proxies are more than sufficient.
There are two main types of proxies in the industry: Datacenter and Premium. Premium proxies are made up of actual mobile and residential IP addresses. They tend to cost quite a bit more, and they are significantly slower than datacenter proxies.
In the next few sections, we're going to go over the differences between residential and datacenter proxies and the reasons to choose each type.
Why Are Datacenter Proxies Important?
Datacenter vs Residential
Datacenter
Pros
- Price: Datacenter proxies are cheap... really cheap. When using residential proxies, it's not uncommon to pay up to $8/GB.
- Speed: These proxies are hosted inside actual datacenters. Datacenters usually use top-of-the-line internet connections and hardware. This can really increase the speed of our scrape.
- Availability: Datacenter proxies usually operate with a much larger pool of IP addresses.
Cons
- Blocking: Some sites block datacenter IPs by default. This makes some sites more difficult to scrape when using a datacenter proxy.
- Less Geotargeting Support: While we often do get the option to choose our location with datacenter proxies, they still don't appear quite as normal as a residential proxy. We choose our location, but it still shows up as a datacenter.
- Less Anonymity: Since your request isn't tied to an individual residential IP address, it can easily be traced to a datacenter. This doesn't reveal your identity, but it does reveal that the request is not coming from a standard residential location. Your request isn't tied to some random household; it's tied to a company.
Residential
Pros
- Anonymity: Residential proxies do offer a higher degree of anonymity. Since you're getting an IP address tied to an actual house, your traffic blends in much more.
- Better Access: There are quite a few sites that block datacenter IP addresses. If you're trying to access a site that blocks them, you need to use a residential IP address.
Cons
- Price: Residential proxies are far more expensive than their datacenter counterparts. Bright Data charges $8.40/GB on their Pay As You Go plan!
- Speed: Residential proxies are often slower than their datacenter counterparts. You're not always tied to a state-of-the-art machine with the best connection. You are tied to a real residential device using a residential internet connection.
Residential proxies are ideal for SERP results, ad verification, social media monitoring/scraping and much more.
Why Use Bright Data Datacenter Proxies?
When we decide to use Datacenter Proxies from Bright Data, we get some pretty decent perks. For starters, their datacenter proxies are dirt cheap. You can get started for $0.60/GB. That's less than 1/10 the price of the residential proxies. On top of that, we get access to free geotargeting!
- Datacenter Proxies with Bright Data are very affordable.
- We can use geotargeting to select which country we want to appear in.
Bright Data Datacenter Proxy Pricing
The pricing plans for Bright Data Datacenter Proxies are pretty straightforward. For people just looking to test it out, they offer a Pay As You Go Plan. If you're looking to use these proxies at more of an industrial level, they offer monthly plans as well.
The table below outlines their pricing plans.
Plan | Monthly Cost | Cost Per GB |
---|---|---|
Pay As You Go | N/A | $0.60 |
1 TB | $499+tax | $0.51 |
2 TB | $999+tax | $0.45 |
5 TB | $1999+tax | $0.42 |
While the higher tier plans definitely require a pretty large commitment, these prices are pretty good. If you want to compare them to other providers, take a look here. We actually built a tool to help you shop for the best proxy provider to meet your needs.
Setting Up Bright Data Datacenter Proxies
Signing up for Bright Data is pretty simple. You can create an account directly, or sign up using Google, GitHub, or your email address.
Once you've signed up, you'll need to click on My Zones. If you're new to Bright Data, your dashboard will look similar to the one below, but you won't have any zones yet.
Click on Add and then select Datacenter Proxies.
You'll be given the option to customize your proxies if you want. Here, I decided to use just standard Datacenter Proxies and pay per bandwidth. As you can see in the screenshot, Datacenter proxies are dirt cheap. Our cost is only $0.60/GB!
Authentication
Authentication is relatively simple. We can authenticate our requests using either a username and password, or by whitelisting an IP address. To whitelist an IP address, first click on your new datacenter_proxy zone and then click Whitelisted IPs.
When you whitelist an IP address, it no longer requires authentication.
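Since whitelisted IPs skip authentication, the proxy URL gets much simpler. Here's a minimal sketch, assuming the machine you're running on has already been whitelisted for your zone in the dashboard:
import requests
HOSTNAME = "brd.superproxy.io"
PORT = 22225
# No username or password in the URL -- the IP whitelist handles authentication
proxies = {
    "http": f"http://{HOSTNAME}:{PORT}",
    "https": f"http://{HOSTNAME}:{PORT}"
}
print(requests.get("https://httpbin.org/ip", proxies=proxies).text)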
In our standard requests, we authenticate with a username and a password. Take a look at the request below.
import requests
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
url = "https://httpbin.org/ip"
actual_ip = requests.get(url)
proxied_ip = requests.get(url, proxies=proxies)
print(actual_ip.text)
print(proxied_ip.text)
In the example above, we go through and check our actual IP address against our proxied one. We use httpbin's /ip endpoint to do this.
- First, we set up all the basic pieces of our url: USERNAME, ZONE, PASSWORD, HOSTNAME, PORT.
- Next, we create a proxies dictionary and set both our http and https proxies to f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}".
- We get our actual IP address with requests.get(url).
- We then get our proxied IP with requests.get(url, proxies=proxies).
- Once we've finished retrieving our information, we print it to the terminal so we can compare.
In the screenshot below, you can verify that this proxy connection is working. Our proxied request yields different results than our standard one.
Basic Request Using Bright Data Datacenter Proxies
In the Authentication section above, you already learned how to make basic requests. For consistency, we'll show an example of that here as well. This is a simplified version of our request. We already know the proxy connection works, so there's no need to test it against our real IP address again.
import requests
import json
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
url = "https://httpbin.org/ip"
proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json()["origin"])
This example is a little bit cleaner than our last one. Here's what's changed.
- We import json so we can work with JSON data (strictly speaking, requests can parse the response on its own via proxied_ip.json()).
- We omit the check on our actual IP address.
- After we get our response, we index the JSON data and print our IP address: print("Proxy Location:", proxied_ip.json()["origin"]).
All in all, we're still in pretty simple territory here. In the coming sections we'll take a look at some of Bright Data's more advanced functionality when it comes to using Datacenter IPs.
Country Geotargeting
Using Bright Data's Datacenter Proxies, we get access to geolocation targeting. This is a super useful feature. When we set a specific geolocation, Bright Data will automatically route our request through an IP address in that location. We can set a custom location using the country flag. If we want a US based IP address, we would pass country-us in our url.
Take a look at the example below.
import requests
import json
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
COUNTRY = "us"
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
url = "https://httpbin.org/ip"
proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json()["origin"])
We ran this code and got the following output.
Now, we need to verify that this IP address is in our selected country (us). If you look at the screenshot below, our location shows up as Wilmington, Delaware. Our geotargeting is working.
The country flag is the key difference in this example. To set a country, you pass a country code along with your country flag. In this case, we pass us. There are a ton of options available; we'll break them down in the table below.
Country Codes
Country | Country Code |
---|---|
Albania | al |
Argentina | ar |
Armenia | am |
Australia | au |
Austria | at |
Azerbaijan | az |
Bangladesh | bd |
Belarus | by |
Belgium | be |
Bolivia | bo |
Brazil | br |
Bulgaria | bg |
Cambodia | kh |
Canada | ca |
Chile | cl |
China | cn |
Colombia | co |
Costa Rica | cr |
Croatia | hr |
Cyprus | cy |
Czech Republic | cz |
Denmark | dk |
Dominican Republic | do |
Ecuador | ec |
Egypt | eg |
Estonia | ee |
Finland | fi |
France | fr |
Georgia | ge |
Germany | de |
Great Britain | gb |
Greece | gr |
Guatemala | gt |
Hong Kong | hk |
Hungary | hu |
Iceland | is |
India | in |
Indonesia | id |
Ireland | ie |
Isle of Man | im |
Israel | il |
Italy | it |
Jamaica | jm |
Japan | jp |
Jordan | jo |
Kazakhstan | kz |
Kyrgyzstan | kg |
Laos | la |
Latvia | lv |
Lithuania | lt |
Luxembourg | lu |
Malaysia | my |
Mexico | mx |
Moldova | md |
Netherlands | nl |
New Zealand | nz |
Norway | no |
Peru | pe |
Philippines | ph |
Russia | ru |
Saudi Arabia | sa |
Singapore | sg |
South Korea | kr |
Spain | es |
Sri Lanka | lk |
Sweden | se |
Switzerland | ch |
Taiwan | tw |
Tajikistan | tj |
Thailand | th |
Turkey | tr |
Turkmenistan | tm |
Ukraine | ua |
United Arab Emirates | ae |
United States | us |
Uzbekistan | uz |
Vietnam | vn |
City Geotargeting
City-level geotargeting can be a very useful feature, but with datacenter proxies, you typically don't get support for it. If you want to geotarget a specific city, it's best to choose a residential service. Bright Data also has a residential proxy service; you can sign up for a free trial here.
When we use datacenter proxies, more often than not, we need to make do with country-level geotargeting. If you need city-level geotargeting, it's best to sign up for a Residential Proxy service. These products give us much better support for city geotargeting and allow us to scrape all sorts of local data such as:
- Local Ads
- Local Businesses
- Local Social Media
- Local Events
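If you do end up on a residential zone, Bright Data's targeting flags extend to cities using the same username string we've been building. The sketch below follows their documented residential targeting syntax as we understand it (the country flag first, then a lowercase city name with spaces removed); treat the exact flag format as an assumption to verify against your zone's docs, and note that it won't work on a datacenter zone.
import requests
USERNAME = "your-username"
ZONE = "your-residential-zone"  # assumes a residential zone, not a datacenter zone
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
# Hypothetical example: target New York City (the country flag is required alongside city)
auth = f"brd-customer-{USERNAME}-zone-{ZONE}-country-us-city-newyork"
proxies = {
    "http": f"http://{auth}:{PASSWORD}@{HOSTNAME}:{PORT}",
    "https": f"http://{auth}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
print(requests.get("http://lumtest.com/myip.json", proxies=proxies).text)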
Error Codes
In their FAQ section, Bright Data lists only two error codes that you might receive: status 403 and status 502. Of course, status 200 still means that our request was successful. The table below is small, but it will hopefully give you some insight into any problems you may face when using Bright Data Datacenter Proxies.
Status Code | Type | Description |
---|---|---|
200 | Success | Everything worked as expected! |
403 | Forbidden | You are forbidden from accessing this URL. |
502 | Bad Gateway | Bright Data failed to get a response from the server. |
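Since 403 and 502 are the two documented failure codes, it's worth wrapping your requests in a simple retry loop. The helper below is our own pattern, not an official Bright Data recipe; it reuses a proxies dict built like the earlier examples.
import requests
def get_with_retries(url, proxies, retries=3):
    # Retry on non-200 responses (e.g. 403/502) and on connection errors
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, proxies=proxies, timeout=30)
            if response.status_code == 200:
                return response
            print(f"Attempt {attempt} failed with status {response.status_code}")
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt} failed: {e}")
    raise Exception(f"All {retries} attempts failed for {url}")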
KYC Verification
For Datacenter Proxies, Bright Data does not require a KYC process; KYC is only required in order to use their residential services. The residential KYC process is quite stringent, and it even includes a live video call! You can read more about the residential KYC process here.
Implementing Bright Data Datacenter Proxies in Web Scraping
Now, let's take a look at all the different ways we can integrate with Datacenter Proxies from Bright Data. In the coming sections, we'll show you several Python integrations and a couple using NodeJS as well. This should leave you well equipped to get started with these proxies on your own.
Python Requests
If you've been following along, you've already seen integration using Python Requests. For consistency, we're going to post an example of it here anyway.
import requests
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
url = "https://httpbin.org/ip"
proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json()["origin"])
- Once we've got our credentials, we write our proxy url like this: f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}".
- We then create a dict object that holds both our http and https proxies.
- When making our requests, we make sure to pass proxies=proxies. This tells Python Requests to use the dict object we created for our proxy settings.
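If you plan to send many requests through the same zone, you can attach the proxies to a requests.Session once instead of passing them into every call. This is plain Python Requests behavior (nothing Bright Data specific), and the sketch assumes the same configuration variables as the example above.
import requests
session = requests.Session()
proxy_url = f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
session.proxies.update({"http": proxy_url, "https": proxy_url})
# Every request made through this session now goes through the proxy
print(session.get("https://httpbin.org/ip").json()["origin"])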
Python Selenium
SeleniumWire has long been a tried and true method for using authenticated proxies with Selenium. As you may or may not know, vanilla Selenium does not support authenticated proxies. Even worse, SeleniumWire has been deprecated! That being said, it is still technically possible to integrate Bright Data Datacenter Proxies via SeleniumWire, but we highly advise against it.
When you decide to use SeleniumWire, you are vulnerable to the following risks:
- Security: Browsers are updated with security patches regularly. Without these patches, your browser will have security holes that have already been fixed in up-to-date browsers and drivers such as ChromeDriver or GeckoDriver.
- Dependency Issues: SeleniumWire is no longer maintained. In time, it may not be able to keep up with its dependencies as they get updated. Broken dependencies can be a source of unending headaches for anyone in software development.
- Compatibility: As the web itself gets updated, SeleniumWire doesn't. Regular browsers are updated all the time. Since SeleniumWire no longer receives updates, you may experience broken functionality and unexpected behavior.
As time goes on, the probability of all these problems increases. If you understand the risks but still wish to use SeleniumWire, you can view a guide on that here.
Depending on when you're reading this, the code example below may or may not work. As mentioned above, we strongly recommend against using SeleniumWire because of its deprecation, but if you decide to do so anyway, here you go. We are not responsible for any damage this may cause to your machine or your privacy.
from seleniumwire import webdriver
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
## Define Your Proxy Endpoints
proxy_options = {
"proxy": {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}",
"no_proxy": "localhost:127.0.0.1"
}
}
## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)
## Send Request Using Proxy
driver.get('https://httpbin.org/ip')
- We set up our url the same way we did with Python Requests: f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}".
- We assign this url to both the http and https protocols of our proxy settings.
- driver = webdriver.Chrome(seleniumwire_options=proxy_options) tells webdriver to open Chrome with our custom seleniumwire_options.
Python Scrapy
Using these Datacenter Proxies with Scrapy is really straightforward, and there are many ways to do it. In the example below, we'll set up our proxy from within our spider.
To start, we need to make a new Scrapy project.
scrapy startproject datacenter
Then, from within your new Scrapy project, create a new Python file inside the spiders folder with the following code.
import scrapy
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
class BrightdataScrapyExampleSpider(scrapy.Spider):
name = "datacenter_proxy"
def start_requests(self):
request = scrapy.Request(url="https://httpbin.org/ip", callback=self.parse)
request.meta['proxy'] = f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}"
yield request
def parse(self, response):
print(response.body)
- We construct our url the same way we did in the previous two examples: f"http://brd-customer-{USERNAME}-zone-{ZONE}:{PASSWORD}@{HOSTNAME}:{PORT}".
- Inside of our start_requests method, we assign this url to request.meta['proxy'].
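To run the spider, use the name we gave it in the spider class (datacenter_proxy) from inside the project folder.
scrapy crawl datacenter_proxy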
NodeJS Puppeteer
Now, we're going to run the same setup for NodeJS Puppeteer. Similar to Scrapy, we need to create a new project first. Follow the steps below to get up and running in minutes.
Create a new folder.
mkdir puppeteer-datacenter
cd into the new folder and create a new JavaScript project.
cd puppeteer-datacenter
npm init -y
Next, we need to install Puppeteer.
npm install puppeteer
Next, from within your new JavaScript project, copy/paste the code below into a new .js file.
const puppeteer = require('puppeteer');
const PORT = 22225;
const USERNAME = "your-username";
const PASSWORD = "your-password";
const ZONE = "your-zone-name";
(async () => {
const browser = await puppeteer.launch({
args: [`--proxy-server=brd.superproxy.io:${PORT}`]
});
const page = await browser.newPage();
await page.authenticate({
username: `brd-customer-${USERNAME}-zone-${ZONE}`,
password: PASSWORD
});
await page.goto('http://lumtest.com/myip.json');
await page.screenshot({path: 'puppeteer.png'});
await browser.close();
})();
- First, we declare all of our configuration variables as constants: PORT, USERNAME, PASSWORD, ZONE.
- We set our proxy server when we launch our browser with this arg added: args: [`--proxy-server=brd.superproxy.io:${PORT}`].
- We add our USERNAME to the authentication: brd-customer-${USERNAME}-zone-${ZONE}.
- We also add our PASSWORD to the authentication: password: PASSWORD.
Puppeteer provides first-class support for proxy integration right out of the box. With Puppeteer's built-in authenticate() method, we even have a special spot for both our USERNAME and our PASSWORD. The screenshot from this code is available for you to view below.
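To try it yourself, run the file with Node (index.js here is just a placeholder for whatever you named your file).
node index.js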
NodeJS Playwright
Integration with Playwright is almost identical to Puppeteer; the two actually share a common origin in the Chrome DevTools Protocol. The steps below should look at least somewhat familiar, though the configuration differs slightly at the end.
Create a new project folder.
mkdir playwright-datacenter
cd into the new folder and initialize a JavaScript project.
cd playwright-datacenter
npm init -y
Install Playwright.
npm install playwright
npx playwright install
Next, you can copy/paste the code below into a JavaScript file.
const playwright = require('playwright');
const PORT = 22225;
const USERNAME = "your-username";
const PASSWORD = "your-password";
const ZONE = "your-zone-name";
const options = {
proxy: {
server: `http://brd.superproxy.io:${PORT}`,
username: `brd-customer-${USERNAME}-zone-${ZONE}`,
password: PASSWORD
}
};
(async () => {
const browser = await playwright.chromium.launch(options);
const page = await browser.newPage();
await page.goto('http://lumtest.com/myip.json');
await page.screenshot({ path: "playwright.png" })
await browser.close();
})();
- Like our Puppeteer example, we first set up our configuration variables: PORT, USERNAME, PASSWORD, ZONE.
- We create a proxy object with the following fields: server: `http://brd.superproxy.io:${PORT}`, username: `brd-customer-${USERNAME}-zone-${ZONE}`, password: PASSWORD.
When we set up our proxy using Playwright, we get solid support and easy configuration. You can view the resulting screenshot from this code below.
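One nice design choice here: because Playwright takes the proxy settings through its launch options rather than browser-specific flags, the same options object should work with its other browser engines too. A quick sketch, reusing the constants and options from above:
(async () => {
  // Same proxy options object, different engine
  const browser = await playwright.firefox.launch(options);
  const page = await browser.newPage();
  await page.goto('http://lumtest.com/myip.json');
  await browser.close();
})();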
Case Study: Scrape The Guardian
Sites with super strict anti-bot systems will automatically block our datacenter proxy, and that's normal. Datacenter proxies are meant for more general sites; they are far cheaper and more efficient. Residential proxies are designed as more of a fallback in the event that datacenter proxies don't work.
Now, we're going to scrape The Guardian. This scrape is more about showing you concepts than gathering vast amounts of data. In the code below, we first set up a proxy based in the US. We perform a GET request to verify our location information, and then we make a GET request to The Guardian. We print our location information and find the navbar on The Guardian's front page. After this initial run, we reset our connection to use an IP inside the UK. If our proxies are working, we'll receive different output from each proxy.
Take a look at the code below.
import requests
from bs4 import BeautifulSoup
#basic config for your proxy
USERNAME = "your-username"
ZONE = "your-zone-name"
PASSWORD = "your-password"
HOSTNAME = "brd.superproxy.io"
PORT = 22225
COUNTRY = "us"
#set the initial connection
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
print("----------------------US---------------------")
location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)
response = requests.get('https://www.theguardian.com/', proxies=proxies)
soup = BeautifulSoup(response.text, "html.parser")
subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)
print("----------------------UK---------------------")
#reset the country variable
COUNTRY = "gb"
#reset the proxies
proxies = {
"http": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}",
"https": f"http://brd-customer-{USERNAME}-zone-{ZONE}-country-{COUNTRY}:{PASSWORD}@{HOSTNAME}:{PORT}"
}
location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)
response = requests.get("https://www.theguardian.com/", proxies=proxies)
soup = BeautifulSoup(response.text, "html.parser")
subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)
There are some subtle differences you should notice above compared to our earlier geotargeting example:
- location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies) is used after setting up each proxy connection, simply to verify our location before getting our target site.
- COUNTRY: This variable changes later on in the code so we can reset the proxy.
- proxies: After resetting our country, we need to rebuild our proxies (see the refactor sketch below).
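Since we build the same dictionary twice, it's worth pulling that logic into a small helper. Here's a minimal refactor sketch using the same configuration variables as above (make_proxies is our own hypothetical helper name):
def make_proxies(country):
    # Build the http/https proxy dict for a given country code
    auth = f"brd-customer-{USERNAME}-zone-{ZONE}-country-{country}:{PASSWORD}"
    proxy_url = f"http://{auth}@{HOSTNAME}:{PORT}"
    return {"http": proxy_url, "https": proxy_url}

proxies = make_proxies("us")
# ...scrape the US edition, then...
proxies = make_proxies("gb")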
When we run the code, we get the following output.
----------------------US---------------------
{"ip":"134.199.83.172","country":"US","asn":{"asnum":20473,"org_name":"AS-VULTR"},"geo":{"city":"Los Angeles","region":"CA","region_name":"California","postal_code":"90017","latitude":34.0514,"longitude":-118.2707,"tz":"America/Los_Angeles","lum_city":"losangeles","lum_region":"ca"}}
USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
----------------------UK---------------------
{"ip":"92.43.85.194","country":"GB","asn":{"asnum":207990,"org_name":"HostRoyale Technologies Pvt Ltd"},"geo":{"city":"London","region":"ENG","region_name":"England","postal_code":"EC4R","latitude":51.5088,"longitude":-0.093,"tz":"Europe/London","lum_city":"london","lum_region":"eng"}}
WorldUKClimate crisisUkraineEnvironmentScienceGlobal developmentFootballTechBusinessObituaries
First, we'll look at the location comparison here. We cleaned up the important information from the JSON and made it a little easier to read. Our US proxy is located in California and our UK proxy is located in London.
Proxy Country | Region Name | City |
---|---|---|
US | California | Los Angeles |
UK | England | London |
Now let's take a closer look at our navbar text from each run.
- us: USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
- gb: WorldUKClimate crisisUkraineEnvironmentScienceGlobal developmentFootballTechBusinessObituaries
Let's make these a little easier to read.
- us: US | US elections 2024 | World | Environment | Ukraine | Soccer | Business | Tech | Science | Newsletters | Wellness
- gb: World | UK | Climate crisis | Ukraine | Environment | Science | Global development | Football | Tech | Business | Obituaries
As you can see, there are some differences in how the navbar is laid out. On the US site, the top left-hand corner of the navbar holds US followed by US elections 2024. In the UK, the viewer's attention is prioritized a bit differently. The Guardian knows that the average UK reader is probably not as concerned with the US and its elections, so they instead see World followed by UK.
Many websites will prioritize your attention differently based on your location.
Alternative: ScrapeOps Proxy Aggregator
While Bright Data's Datacenter Proxies are quite a deal, we offer a different product with even better features for a pretty good price! Take a look at the ScrapeOps Proxy Aggregator. With our Proxy Aggregator, instead of paying for bandwidth, you pay per request. On top of that, you only pay for successful requests.
Our Proxy Aggregator automatically selects the best proxy for you based on our datacenter pools. We source these pools from tons of different providers. If a request fails using a datacenter proxy, we actually retry it using a premium (residential or mobile) proxy for you with no additional charge!
The table below outlines our pricing.
Monthly Price | API Credits | Basic Request Cost |
---|---|---|
$9 | 25,000 | $0.00036 |
$15 | 50,000 | $0.0003 |
$19 | 100,000 | $0.00019 |
$29 | 250,000 | $0.000116 |
$54 | 500,000 | $0.000108 |
$99 | 1,000,000 | $0.000099 |
$199 | 2,000,000 | $0.0000995 |
$254 | 3,000,000 | $0.000084667 |
All of these plans offer the following awesome features:
- JavaScript Rendering
- Screenshot Capability
- Country Geotargeting
- Residential and Mobile Proxies
- Anti-bot Bypass
- Custom Headers
- Sticky Sessions
Along with all of these features, Bright Data is one of our providers! When you sign up for ScrapeOps, you get access to proxies from Bright Data and numerous other providers!
Go ahead and sign up for a free trial account here.
Once you've got your free trial, you can copy and paste the code below to check your proxy connection.
import requests
from urllib.parse import urlencode
API_KEY = "your-super-secret-api-key"
LOCATION = "us"
def get_scrapeops_url(url, location=LOCATION):
payload = {
"api_key": API_KEY,
"url": url,
"country": location
}
proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
return proxy_url
response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json"))
print(response.text)
In the code above, we do the following:
- Create our configuration variables: API_KEY and LOCATION.
- Write a get_scrapeops_url() function. This function takes all of our parameters along with a target url and wraps it into a ScrapeOps proxied url. This is an incredibly easy way to scrape, and it makes our proxy code much more modular.
- Check our IP info with response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json")).
- Finally, we print it to the terminal. You should get an output similar to this.
{"country":"US","asn":{"asnum":26832,"org_name":"RICAWEBSERVICES"},"geo":{"city":"Dallas","region":"TX","region_name":"Texas","postal_code":"75247","latitude":32.8137,"longitude":-96.8704,"tz":"America/Chicago","lum_city":"dallas","lum_region":"tx"}}
Ethical Considerations and Legal Guidelines
Bright Data is one of the most ethical proxy companies around. Their proxies come entirely from ethical sources and they do not condone using their product for illegal or immoral behavior.
Legal
Don't use proxy providers to break laws. Doing so harms everyone involved: it harms the proxy provider, and it eventually harms you too. If you do something illegal using a proxy, your action will first be traced to the proxy provider, and then to your account.
This sort of thing creates problems for both you and the proxy service.
- Don't use residential proxies to access illegal content: These actions can come with intense legal penalties, including prison or jail time depending on the severity of the offense.
- Don't scrape and disseminate other people's private data: Depending on the jurisdiction you're dealing with, this is also a highly illegal and dangerous practice. Doxxing private data can also lead to heavy fines and possibly jail/prison time.
Ethical
When we scrape, we don't just need to think about legality, we also need to make some ethical considerations. Just because something is legal doesn't mean that it's morally right. Nobody wants to be in the next headline about unethical practices.
- Social Media Monitoring: Social media stalking can be a very destructive and disrespectful behavior. How would you feel if someone used data collection methods on your account?
- Respect Site Policies: Failure to respect a site's policies can get your account suspended or banned. It can even lead to legal trouble for those of you who sign and then violate a terms of service agreement.
Conclusion
You've made it to the end! You should have a solid understanding of what Bright Data's Datacenter proxies are capable of. You should know that they're noticeably faster than residential proxies. You should also have a decent understanding of how to implement them in Python Requests, Scrapy, NodeJS Puppeteer and NodeJS Playwright.
As an added bonus, you also learned how to set up a basic proxy connection using the ScrapeOps Proxy Aggregator. You learned about all the features we offer and how reasonably priced we are. Take this new knowledge and go build something with Bright Data's Datacenter Proxies or the ScrapeOps Proxy Aggregator.
More Web Scraping Guides
If you want to read more, we've got a ton of content. Whether you're a seasoned dev or you're brand new to coding, we've got something useful for you. We love scraping so much that we wrote the Python Web Scraping Playbook. If you want to learn more, take a look at the guides below.