Geonode Datacenter Proxies: Web Scraping Guide

Geonode is a major provider of all sorts of proxy products, including Residential Proxies, Datacenter Proxies, Mobile Proxies, and Site Unblocker. All of these products have some pretty good perks, but today's article focuses primarily on their Datacenter Proxies. When we use datacenter proxies, we get speed, reliability, and stability. We'll go from their signup process all the way to using these proxies in production.

Need help scraping the web?

Then check out ScrapeOps, the complete toolkit for web scraping.


TLDR: How to Integrate Geonode Datacenter Proxy?

Getting set up with Geonode Datacenter Proxies is quite simple.
  1. Once you've got your USERNAME, HOSTNAME, PORT, and PASSWORD, simply put them into the file below.
  2. This code sets up a proxy connection.
  3. Then, it checks your location information to ensure that everything's working correctly.
  4. If the output is not your actual location, the proxy connection is working correctly.
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())
  • First, we set up our configuration variables: USERNAME, PASSWORD, HOSTNAME, PORT.
  • Next, we create a proxy_url composed of the configuration variables mentioned above.
  • We create a dict object holding our http and https proxies. We set them both to our proxy_url.
  • We print our proxied_ip to the terminal. If the location printed here is different from your actual location, your proxy is working correctly.

Understanding Datacenter Proxies

Some sites block datacenter IP addresses. However, not many of them do this. For the vast majority of the web, datacenter proxies are more than sufficient. There are two main types of proxies in our industry: Datacenter and Premium. Premium proxies are made up of actual mobile and residential IP addresses. They tend to cost quite a bit more and they are often significantly slower than datacenter proxies. Over our next few sections, we'll go over the differences between residential and datacenter proxies. This way, you can make an informed choice when shopping for these proxy products.

Datacenter vs Residential

Datacenter

Pros
  • Price: Datacenter proxies are cheap... really cheap. When using residential proxies, it's not uncommon to pay up to $8/GB.
  • Speed: These proxies are hosted inside actual datacenters. Datacenters usually use top of the line internet connections and hardware. This can really increase our speed and efficiency.
  • Availability: Datacenter proxies usually operate with a much larger pool of IP addresses.
Cons
  • Blocking: Some sites block datacenter IPs by default. This makes some sites more difficult to scrape when using a datacenter proxy.
  • Less Geotargeting Support: While we often do get the option to choose our location with datacenter proxies, they still don't appear quite as normal as a residential proxy. We choose our location, but it still shows up as a datacenter.
  • Less Anonymity: Your requests can easily be traced to a datacenter. This doesn't reveal your identity, but it does reveal that the request is not coming from a standard residential location. Your request isn't tied to some random household; it's tied to a company.

Residential

Pros
  • Anonymity: Residential proxies do give us a higher degree of anonymity. Since you're using an IP address tied to an actual house, your traffic blends in much more.
  • Better Access: There are quite a few sites that block datacenter IP addresses. If you're trying to access a site that blocks them, you need to use a residential IP address.
Cons
  • Price: Residential Proxies are far more expensive than their Datacenter counterparts. Geonode charges $4/GB on their Pay As You Go plan!
  • Speed: Residential proxies are often slower than their datacenter counterparts. You're not always tied to a state of the art machine with the best connection. You are tied to a real residential device using a residential internet connection.
Residential proxies are ideal for SERP results, ad verification, social media monitoring/scraping and much more.

Why Use Geonode Datacenter Proxies?

When we decide to use Geonode Datacenter Proxies, we get some really great perks. For starters, their datacenter proxies are really, really cheap. You can get started for just $1! That's 1/4 the price of their residential proxies. We even get access to free geotargeting!
  • Datacenter Proxies with Geonode are very affordable.
  • We can use geotargeting to select which country we want to appear in.

Geonode Datacenter Proxy Pricing

The pricing plans for Geonode Datacenter Proxies are pretty straightforward. On their main page, they market all datacenter bandwidth at $1 per GB. This is a pretty good value and it gives us a very predictable pricing model to work with. The table below outlines their pricing plans.
Plan       Cost      Cost Per GB
1 GB       $1        $1
10,000 GB  $10,000   $1
For those of you who wish to use a lot of bandwidth, Geonode also offers membership plans that reduce your overall price even further. This next table explains their membership plans.
Plan      Monthly Cost  Discount  Bandwidth Limit
Startup   $49           20%       200GB
Emerging  $199          30%       1000GB
Scale     $499          40%       2000GB
If you want to compare them to other providers, take a look here. We actually built a tool to help you shop for the best proxy provider to meet your needs.
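To see what those membership discounts translate to in practice, here's a quick back-of-the-envelope sketch. The figures are taken straight from the tables above; double-check them against Geonode's pricing page before relying on them.

```python
BASE_PRICE_PER_GB = 1.00  # $1/GB pay-as-you-go rate from the table above

# Membership discounts from the membership table above
MEMBERSHIPS = {
    "Startup": 0.20,
    "Emerging": 0.30,
    "Scale": 0.40,
}

def effective_price_per_gb(discount):
    """Apply a membership discount to the base $1/GB rate."""
    return round(BASE_PRICE_PER_GB * (1 - discount), 2)

for plan, discount in MEMBERSHIPS.items():
    print(f"{plan}: ${effective_price_per_gb(discount):.2f}/GB")
```

On the Scale plan, for example, this works out to $0.60/GB, well under the pay-as-you-go rate.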

Setting Up Geonode Datacenter Proxies

Getting set up with Geonode is pretty straightforward. They give you the option to simply create an account, or to sign up using Google. You can see what the signup process looks like in the screenshot below. After signing up, they'll have you take a brief survey about your plans for their product. Once you complete the survey, you're all set to purchase a plan.

After signing up, you'll be presented with a bunch of different products on the main page. Select Datacenter, as you see in the picture below. Next, select the amount of bandwidth you'd like to purchase. The minimum amount is 1GB and you'll pay $1 per GB. This is an incredible value.

Once you've selected a bandwidth amount, you'll be given the option to pay with either a credit/debit card or cryptocurrency. In the example below, we pay with a card. After payment confirmation, you'll be able to view your plan under the My Products tab. Click the To Product button and then scroll down. You'll see your credentials and host name. You'll use these to connect to your new datacenter proxy.

Authentication

We authenticate our requests using our username and password that we saw when viewing our credentials earlier. Our full URL gets constructed like this:
f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"
Without this username and password, Geonode can't authenticate your HTTP requests. It is imperative that you have access to these credentials when making requests through their proxy.
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-super-secret-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())

Basic Request Using Geonode Datacenter Proxies

In the Authentication section above, you already learned how to create a basic request. For the sake of consistency, we'll use that same example here as well. The code example below makes a simple API request and prints the location information of our proxy.
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())
Anytime we make a basic request using Geonode's Datacenter proxies, we need to forward our requests through the proxy we set up earlier, http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}. When we unmask our credentials, your url should look a lot more like this:
http://geonode_your_username:your_password@shared-datacenter.geonode.com:9000
You might have different credentials or a different port number, but for the most part, this is the setup you'll be looking at.
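In real scripts, you'll likely want some basic error handling around requests like this. The sketch below wraps the same request in a small retry helper; the helper names (build_proxies, get_through_proxy) and the retry/timeout values are our own choices, not part of Geonode's API.

```python
import requests

# Placeholder credentials: swap in your own from the Geonode dashboard
USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

def build_proxies(username, password, hostname, port):
    """Build the proxies dict that requests expects."""
    proxy_url = f"http://{username}:{password}@{hostname}:{port}"
    return {"http": proxy_url, "https": proxy_url}

def get_through_proxy(url, proxies, retries=3, timeout=10):
    """GET a url through the proxy, retrying failed attempts."""
    for _ in range(retries):
        try:
            response = requests.get(url, proxies=proxies, timeout=timeout)
            if response.status_code == 200:
                return response
        except requests.exceptions.RequestException:
            pass  # connection/proxy error: try again
    raise RuntimeError(f"Failed to fetch {url} after {retries} attempts")

if __name__ == "__main__":
    proxies = build_proxies(USERNAME, PASSWORD, HOSTNAME, PORT)
    print(get_through_proxy("http://lumtest.com/myip.json", proxies).json())
```

A timeout matters here: without one, a dead proxy port can leave your script hanging indefinitely.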

Country Geotargeting

Geotargeting with Geonode is not quite as intuitive as it may be with other services, but it still works just fine. To use geotargeting, we need to set up a proxy for each country manually. If you look at the image below, we set up a proxy for both the US and the UK. Each country gets its own port number. If you look at the image above, our port for the UK is set to 9001. To use our British proxy, we simply need to use the correct port. In the example below, we change one thing: our port number.
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9001

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())
We ran this code and received the following output. As you can see in the screenshot above, our IP address is showing up in London, England. While you need to manually set up each proxy country, geotargeting with Geonode is really simple. Because each country is tied to its own port, managing our countries is also very intuitive. Instead of maintaining a list of country codes, simply select the country of your choice and set up a proxy connection there.
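Since each configured country lives on its own port, a small lookup table keeps things tidy. This is just one possible pattern; the ports below match the ones used in this article (9000 default, 9001 UK), but yours may differ.

```python
HOSTNAME = "shared-datacenter.geonode.com"

# Ports we assigned when configuring each country in the dashboard.
# These match the ports used in this article; yours may differ.
COUNTRY_PORTS = {
    "default": 9000,
    "uk": 9001,
}

def proxy_url_for(country, username, password):
    """Build a proxy url for a country we've configured a port for."""
    port = COUNTRY_PORTS.get(country.lower(), COUNTRY_PORTS["default"])
    return f"http://{username}:{password}@{HOSTNAME}:{port}"

print(proxy_url_for("uk", "your-username", "your-password"))
```

Adding a new country is then just a matter of configuring it in the dashboard and adding one entry to the dict.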

City Geotargeting

While most Datacenter proxies don't give us access to city level geotargeting, Geonode does. To geotarget a specific city, you can use the same setup process that you used earlier in our country geotargeting section. However, instead of stopping at a country, click on the city dropdown as well. Just like with country geotargeting, we simply set our port number to the proxy connection we just set up. In this case, we used port 9002. This is the only thing that changes in our code.
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9002

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())
City level geotargeting gives us access to hyper-localized content. When you're dealing with local content, you can extract the following types of data at a local level. This allows you to collect and manage your precious data at a much more granular level.
  • Local Ads
  • Local Businesses
  • Local Social Media
  • Local Events
NOTE: City level geotargeting can be inconsistent. When we ran the code example above, we got 3 different results. If you remember from the image, we set our proxy up in New York City. Here are the results from run 1:
Proxy Location: {'country': 'US', 'asn': {'asnum': 16509, 'org_name': 'AMAZON-02'}, 'geo': {'city': 'Columbus', 'region': 'OH', 'region_name': 'Ohio', 'postal_code': '43215', 'latitude': 39.9625, 'longitude': -83.0061, 'tz': 'America/New_York', 'lum_city': 'columbus', 'lum_region': 'oh'}}
Here are the results from run 2:
Proxy Location: {'country': 'US', 'asn': {'asnum': 31898, 'org_name': 'ORACLE-BMC-31898'}, 'geo': {'city': 'San Jose', 'region': 'CA', 'region_name': 'California', 'postal_code': '95119', 'latitude': 37.2379, 'longitude': -121.7946, 'tz': 'America/Los_Angeles', 'lum_city': 'sanjose', 'lum_region': 'ca'}}
Finally, here are our results from run number 3:
Proxy Location: {'country': 'US', 'asn': {'asnum': 31898, 'org_name': 'ORACLE-BMC-31898'}, 'geo': {'city': 'Ashburn', 'region': 'VA', 'region_name': 'Virginia', 'postal_code': '20147', 'latitude': 39.0395, 'longitude': -77.4917, 'tz': 'America/New_York', 'lum_city': 'ashburn', 'lum_region': 'va'}}
In one of our runs, our location showed up in San Jose, California. This is roughly 2,500 miles (4,000 km) away from our actual target location. To put this in perspective, time in California is 3 hours behind that of New York. However, in the other two runs, we showed up in Ohio and Virginia, respectively. While these are still quite a distance from New York City, they at least show up inside the correct timezone. City level geotargeting is possible, but can be unreliable. When possible, try to limit your geotargeting to a specific country.
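Because city-level results can drift like this, it's worth verifying the proxy's reported location before trusting it. Here's a minimal sketch of that idea; the helper name and the expected values are our own assumptions for a New York proxy.

```python
def location_matches(geo, expected_tz=None, expected_region=None):
    """Check a lumtest-style geo dict against what we expect."""
    if expected_tz and geo.get("tz") != expected_tz:
        return False
    if expected_region and geo.get("region") != expected_region:
        return False
    return True

# Two of the geo dicts returned during our test runs above
run_1 = {"city": "Columbus", "region": "OH", "tz": "America/New_York"}
run_2 = {"city": "San Jose", "region": "CA", "tz": "America/Los_Angeles"}

# A New York proxy should at least report an Eastern timezone
print(location_matches(run_1, expected_tz="America/New_York"))  # True
print(location_matches(run_2, expected_tz="America/New_York"))  # False
```

If the check fails, you can drop the response and retry until the proxy lands somewhere acceptable.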

Error Codes

Error codes are incredibly important in all areas of web development. You might already know that 200 indicates a successful request, but there are a bunch of other status codes we might run into as well. In the table below, we outline those status codes and how to handle them. If you need to view their full documentation on status codes, it's available here.
Status Code  Type                     Description
200          Success                  Everything worked as expected!
401          Unauthorized             Contact your administrator.
403          Invalid Request          Your request format was invalid. Double check your params.
407          Authentication Error     Double check your username and password.
411          Access Denied            Your account has been blocked. Contact support.
429          Too Many Requests        Try again in 5 minutes and slow down your requests.
461          Sticky Session Limit     Start a new sticky session.
462          Sticky Port Unsupported  Use rotating ports.
463          Unsupported Location     Double check your proxy location.
464          Forbidden                Host not allowed to connect to your target url.
465          Location Not Found       Double check your geolocation.
466          Limit Reached            Please upgrade or purchase more bandwidth.
468          No Available Proxy       Try again or contact your administrator.
470          Account Blocked          Contact your administrator.
471          Inactive Port            Switch to an active port.
561          Proxy Unreachable        Please try again.
Status codes matter. When you encounter an error, look up the status code and troubleshoot accordingly.
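One way to act on this table in code is to bucket the codes by how you should respond. The groupings below are our own reading of the table, not an official Geonode mapping.

```python
# Codes from the table above that are worth an automatic retry
RETRYABLE = {429, 468, 561}
# Codes that point at credential or account problems
FATAL = {401, 407, 411, 466, 470}

def classify_status(status_code):
    """Map a status code from the table above to a rough action."""
    if status_code == 200:
        return "ok"
    if status_code in RETRYABLE:
        return "retry"
    if status_code in FATAL:
        return "fix-account"
    return "inspect"

print(classify_status(429))  # retry
```

Your scraper can then retry automatically on "retry" codes and bail out with a clear message on "fix-account" ones.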

KYC Verification

Geonode does not require a KYC process to use their datacenter proxies. KYC procedures are far more common with residential proxies; some companies will even require you to meet them on a video call! With datacenter proxies, the liabilities are structured a little bit differently, so such stringent KYC policies aren't as much of a necessity.

Implementing Geonode Datacenter Proxies in Web Scraping

We've been exploring Geonode's datacenter proxies in quite a bit of detail. In these next few sections, we're going to look at ways to implement them with different libraries. We'll go through a few popular Python frameworks and a few popular JavaScript ones as well.

Python Requests

We've been using Python Requests throughout this article. Perhaps the best way to start this section is by using Requests... you're already familiar with it!
import requests
import json

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

url = "http://lumtest.com/myip.json"

proxied_ip = requests.get(url, proxies=proxies)
print("Proxy Location:", proxied_ip.json())
  • Once we've got our credentials, our proxy_url looks like this: f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}".
  • We then create a dict object that holds both our http and https proxies.
  • When making our requests, we make sure to pass proxies=proxies. This tells Python Requests to use the dict object we created for our proxy settings.
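If you're making many requests, you can also attach the proxies to a requests.Session once instead of passing proxies= to every call. A small sketch of that pattern:

```python
import requests

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

# Attach the proxies once; every request on this session uses them
session = requests.Session()
session.proxies.update({"http": proxy_url, "https": proxy_url})

if __name__ == "__main__":
    print(session.get("http://lumtest.com/myip.json").json())
```

Sessions also reuse the underlying connection, which can shave a little latency off repeated requests to the same host.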

Python Selenium

SeleniumWire has always been the standard go-to for proxy authentication in Selenium. You might already know this: vanilla Selenium does not support authenticated proxies. On top of that, SeleniumWire has been deprecated! This being said, it is still technically possible to integrate Geonode Datacenter Proxies via SeleniumWire, but we strongly advise against it. When you decide to use SeleniumWire, you are vulnerable to the following risks:
  • Security: Browsers are updated with security patches regularly. Without these patches, your browser will have security holes that have already been fixed in up-to-date drivers such as Chromedriver or Geckodriver.
  • Dependency Issues: SeleniumWire is no longer maintained. In time, it may not be able to keep up with its dependencies as they get updated. Broken dependencies can be a source of unending headache for anyone in software development.
  • Compatibility: As the web itself gets updated, SeleniumWire doesn't. Regular browsers are updated all the time. Since SeleniumWire no longer receives updates, you may experience broken functionality and unexpected behavior.
As time goes on, the probability of all these problems increases. If you understand the risks but still wish to use SeleniumWire, you can view a guide on that here. Depending on when you're reading this, the code example below may or may not work. As mentioned above, we strongly recommend against using SeleniumWire because of its deprecation, but if you decide to do so anyway, here you go. We are not responsible for any damage that this may cause to your machine or your privacy.
from seleniumwire import webdriver

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9000

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

## Define Your Proxy Endpoints
proxy_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url,
        "no_proxy": "localhost:127.0.0.1"
    }
}

## Set Up Selenium Chrome driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)

## Send Request Using Proxy
driver.get('https://httpbin.org/ip')
  • We build our url exactly the same way we did with Python Requests: f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}".
  • We assign this url to both the http and https protocols of our proxy settings.
  • driver = webdriver.Chrome(seleniumwire_options=proxy_options) tells webdriver to open Chrome with our custom seleniumwire_options.

Python Scrapy

Using these Datacenter Proxies with Scrapy is really straightforward, and there are many ways to do it. In the example below, we'll set up our proxy from within our spider. To start, we need to make a new Scrapy project.
scrapy startproject datacenter
Then, from within your new Scrapy project, create a new Python file inside the spiders folder with the following code.
import scrapy

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
PORT = 9001

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}"

class GeonodeScrapyExampleSpider(scrapy.Spider):
    name = "datacenter_proxy"

    def start_requests(self):
        request = scrapy.Request(url="https://httpbin.org/ip", callback=self.parse)
        request.meta['proxy'] = proxy_url
        yield request

    def parse(self, response):
        print(response.body)
  • Once again, our proxy_url gets formatted the same way: f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}".
  • Inside of our start_requests method, we assign our proxy_url to request.meta['proxy'].

NodeJS Puppeteer

Now, we're going to set up Geonode's datacenter proxies using NodeJS Puppeteer. Like Scrapy, we need to create a new project first. Follow the steps below to get up and running in minutes. Create a new folder.
mkdir puppeteer-datacenter
cd into the new folder and create a new JavaScript project.
cd puppeteer-datacenter
npm init --y
Next, we need to install Puppeteer.
npm install puppeteer
Next, from within your new JavaScript project, copy/paste the code below into a new .js file.
const puppeteer = require('puppeteer');

const USERNAME = "your-username";
const PASSWORD = "your-password";
const HOSTNAME = "shared-datacenter.geonode.com";
const PORT = 9000;

(async () => {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${HOSTNAME}:${PORT}`]
  });

  const page = await browser.newPage();

  await page.authenticate({
    username: USERNAME,
    password: PASSWORD
  });

  await page.goto('http://lumtest.com/myip.json');
  await page.screenshot({path: 'puppeteer.png'});

  await browser.close();
})();
  • First, we declare all of our configuration variables as constants: PORT, USERNAME, PASSWORD, HOSTNAME.
  • We set our url when we launch our browser with this arg added --proxy-server=${HOSTNAME}:${PORT}.
  • We add our USERNAME to the authentication: username: USERNAME.
  • We also add our PASSWORD to the authentication: password: PASSWORD.
Puppeteer gives us full proxy support right out of the box. When we use Puppeteer's builtin authenticate() method, we even have a specific spot for both our USERNAME and our PASSWORD. The screenshot from this code is available for you to view below.

NodeJS Playwright

Integration with Playwright is virtually identical to what we did with Puppeteer. Puppeteer and Playwright actually share a common origin in Chrome's DevTools Protocol. The steps below should look at least somewhat familiar, though they differ slightly near the end. Create a new project folder.
mkdir playwright-datacenter
cd into the new folder and initialize a JavaScript project.
cd playwright-datacenter
npm init --y
Install Playwright.
npm install playwright
npx playwright install
Next, you can copy/paste the code below into a JavaScript file.
const playwright = require('playwright');

const USERNAME = "your-username";
const PASSWORD = "your-password";
const HOSTNAME = "shared-datacenter.geonode.com";
const PORT = 9000;

const options = {
    proxy: {
        server: `http://${HOSTNAME}:${PORT}`,
        username: USERNAME,
        password: PASSWORD
    }
};

(async () => {
    const browser = await playwright.chromium.launch(options);
    const page = await browser.newPage();

    await page.goto('http://lumtest.com/myip.json');

    await page.screenshot({ path: "playwright.png" });

    await browser.close();
})();
  • Like our Puppeteer example, we first set up our configuration variables: PORT, USERNAME, PASSWORD, HOSTNAME.
  • We create a proxy object with the following fields:
  • server: `http://${HOSTNAME}:${PORT}`
  • username: USERNAME
  • password: PASSWORD
When setting up our proxy with Playwright, we get solid support and easy configuration. You can view the resulting screenshot from this code below.

Case Study: Scrape The Guardian

It's a pretty standard practice for strict anti-bots to block datacenter IP addresses. When you're scraping, datacenter proxies are best suited to more general sites: they're far cheaper and more efficient. Residential proxies tend to work better as a fallback in the event that datacenter proxies don't work. In this next section, we're going to scrape The Guardian. This scraping job is more about concepts than gathering vast amounts of data.
  1. In our code below, we first set up a proxy based in the US.
  2. We make a GET request to verify our location information, and then we make a GET request to The Guardian.
  3. We print our location information and then we find the navbar from the Guardian's front page.
  4. After our initial run, we reset our proxy connection so it uses an IP inside of the UK.
  5. If our proxies are working, we'll receive different output from each proxy.
Take a look at the code below.
import requests
from bs4 import BeautifulSoup

USERNAME = "your-username"
PASSWORD = "your-password"
HOSTNAME = "shared-datacenter.geonode.com"
UK_PORT = 9001
US_PORT = 9002

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{US_PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

print("----------------------US---------------------")

location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)

response = requests.get('https://www.theguardian.com/', proxies=proxies)

soup = BeautifulSoup(response.text, "html.parser")

subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)

print("----------------------UK---------------------")

proxy_url = f"http://{USERNAME}:{PASSWORD}@{HOSTNAME}:{UK_PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url
}

location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies)
print(location_info.text)

response = requests.get("https://www.theguardian.com/", proxies=proxies)

soup = BeautifulSoup(response.text, "html.parser")

subnav = soup.select_one("div[data-testid='sub-nav']")
print(subnav.text)
There are some subtle differences you should notice above from our earlier geotargeting example:
  • location_info = requests.get("http://lumtest.com/myip.json", proxies=proxies) is used after setting up each proxy connection. We make this request just for verification purposes.
  • We have two different ports set up. If you remember from our geotargeting examples, we set up one proxy in the UK and another for New York City. The NYC proxy doesn't always show up in New York, but it does consistently show up inside the US. This will be more than sufficient for our needs here.
  • proxies: After resetting our country, we need to reset our proxies.
If we run our code, we get output similar to what you see below.
----------------------US---------------------
{"country":"US","asn":{"asnum":16509,"org_name":"AMAZON-02"},"geo":{"city":"Columbus","region":"OH","region_name":"Ohio","postal_code":"43215","latitude":39.9625,"longitude":-83.0061,"tz":"America/New_York","lum_city":"columbus","lum_region":"oh"}}
USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
----------------------UK---------------------
{"country":"GB","asn":{"asnum":50304,"org_name":"Blix Solutions AS"},"geo":{"city":"","region":"","region_name":"","postal_code":"","latitude":51.4964,"longitude":-0.1224,"tz":"Europe/London"}}
UKWorldClimate crisisUkraineFootballNewslettersBusinessEnvironmentUK politicsEducationSocietyScienceTechGlobal developmentObituaries
First, we'll look at our locations here. We cleaned up the important information from the JSON and made it a little easier to read. Our US proxy is located in the US and our UK proxy is located in the UK.
Proxy Country  Country  City
US             US       Columbus
UK             GB       N/A
Now let's take a closer look at our navbar text from each run.
  • us: USUS elections 2024WorldEnvironmentUkraineSoccerBusinessTechScienceNewslettersWellness
  • gb: UKWorldClimate crisisUkraineFootballNewslettersBusinessEnvironmentUK politicsEducationSocietyScienceTechGlobal developmentObituaries
Let's make these a little easier to read.
  • us: US | US elections 2024 | World | Environment | Ukraine | Soccer | Business | Tech | Science | Newsletters | Wellness
  • gb: UK | World | Climate crisis | Ukraine | Football | Newsletters | Business | Environment | UK politics | Education | Society | Science | Tech | Global development | Obituaries
As you can see, there are some differences in the way that the navbar gets laid out. On the US site, the top left-hand corner of the navbar holds US followed by US elections 2024. In the UK, the viewer's attention is prioritized a bit differently. The Guardian knows that the average UK reader is probably less concerned with the US and its elections, so they instead see UK followed by World. Many websites will prioritize your attention differently based on your location.
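If you want to quantify differences like this rather than eyeball them, a set difference does the job. The sketch below uses the navbar items we extracted above.

```python
# Navbar items we extracted from each regional front page above
us_nav = ["US", "US elections 2024", "World", "Environment", "Ukraine",
          "Soccer", "Business", "Tech", "Science", "Newsletters", "Wellness"]
uk_nav = ["UK", "World", "Climate crisis", "Ukraine", "Football",
          "Newsletters", "Business", "Environment", "UK politics", "Education",
          "Society", "Science", "Tech", "Global development", "Obituaries"]

# Items one regional front page shows that the other doesn't
us_only = sorted(set(us_nav) - set(uk_nav))
uk_only = sorted(set(uk_nav) - set(us_nav))

print("US only:", us_only)
print("UK only:", uk_only)
```

This kind of check is a quick way to confirm your geotargeted proxies really are being served different regional content.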

Alternative: ScrapeOps Proxy Aggregator

Geonode's datacenter proxies are a pretty good value, but we offer a different product with even better features at a great price: the ScrapeOps Proxy Aggregator. With the Proxy Aggregator, you don't pay for bandwidth; instead, you pay per request, and only for successful requests! The ScrapeOps Proxy Aggregator automatically selects the best proxy for you based on our proxy pools. We source these pools from tons of different providers. Each request is usually tried first with a datacenter proxy. If your request fails, we then retry it using a premium (residential or mobile) proxy for you at no additional charge! The table below outlines our pricing.
Monthly Price  API Credits  Basic Request Cost
$9             9,000        $0.00036
$15            50,000       $0.0003
$19            100,000      $0.00019
$29            250,000      $0.000116
$54            500,000      $0.000108
$99            1,000,000    $0.000099
$199           2,000,000    $0.0000995
$254           3,000,000    $0.000084667
All of these plans offer the following awesome features:
  • JavaScript Rendering
  • Screenshot Capability
  • Country Geotargeting
  • Residential and Mobile Proxies
  • Anti-bot Bypass
  • Custom Headers
  • Sticky Sessions
Along with all of these features, Geonode is one of our providers! When you sign up for ScrapeOps, you get access to proxies from Geonode and numerous other providers! Go ahead and sign up for a free trial account here. Once you've got your free trial, you can copy and paste the code below to check your proxy connection.
import requests
from urllib.parse import urlencode

API_KEY = "your-super-secret-api-key"
LOCATION = "us"

def get_scrapeops_url(url, location=LOCATION):
    payload = {
        "api_key": API_KEY,
        "url": url,
        "country": location
    }
    proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
    return proxy_url

response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json"))
print(response.text)
In the code above, we do the following.
  • Create our configuration variables: API_KEY and LOCATION.
  • Write a get_scrapeops_url() function. This function takes all of our parameters along with a target url and wraps it into a ScrapeOps Proxied url. This is an incredibly easy way to scrape and it makes our proxy code much more modular.
  • Check our IP info with response = requests.get(get_scrapeops_url("http://lumtest.com/myip.json")).
  • Finally, we print it to the terminal. You should get an output similar to this.
{"country":"US","asn":{"asnum":36352,"org_name":"AS-COLOCROSSING"},"geo":{"city":"Buffalo","region":"NY","region_name":"New York","postal_code":"14205","latitude":42.8856,"longitude":-78.8736,"tz":"America/New_York","lum_city":"buffalo","lum_region":"ny"}}

Legal

Geonode is one of the most ethical proxy companies around. Their proxies are always ethically sourced. The screenshot below comes straight from their homepage. It reads "Committed to ethical proxy sourcing". If it's on their front page, this is something that Geonode takes very seriously. You shouldn't use proxy providers to break laws. Obviously, it's illegal, and something you might not have considered: it harms everyone involved. If you do something illegal using a proxy, your action will first be traced to the proxy provider. Then it will be traced to your account through either your API key or your username and password. This creates problems for both you and your proxy service.
  • Don't use residential proxies to access illegal content: These actions can come with intense legal penalties and even prison or jail time depending on the severity of the offense.
  • Don't scrape and disseminate other people's private data: Depending on the jurisdiction you're dealing with, this is also a highly illegal and dangerous practice. Doxxing private data can also lead to heavy fines and possibly jail/prison time.

Ethical

When we scrape, we don't only need to consider legality, we need to make some ethical considerations too. Just because something is legal doesn't mean that it's morally right or acceptable. No one wants to be the next headline concerning unethical practices.
  • Social Media Monitoring: Social media stalking can be a very destructive and disrespectful behavior. How would you feel if someone used data collection methods on your account?
  • Respect Site Policies: Failure to respect a site's policies can get your account suspended/banned. It can even lead to legal troubles for those of you who sign and violate a terms of service agreement.

Conclusion

Congratulations, you've finished our Geonode Datacenter Tutorial! At this point, you should have a pretty good grasp of Geonode's datacenter proxies. You understand that datacenter proxies are noticeably faster than residential proxies. You should also have a decent understanding of how to implement them using Python Requests, Scrapy, NodeJS Puppeteer, and NodeJS Playwright. You also learned how to set up a basic proxy connection with our very own Proxy Aggregator, and you learned about all of our features and our reasonably priced plans. Now, take your new skills and go build something with Geonode's Datacenter Proxies or the ScrapeOps Proxy Aggregator.

More Cool Articles

If you're in the mood to binge-read, we've got a ton of content. Whether you're a seasoned dev or you're brand new to web scraping, we've got something useful for you. We love scraping so much that we wrote the Python Web Scraping Playbook. If you want to learn more, take a look at the guides below.