ZenRows: Web Scraping Integration Guide
ZenRows is a powerful web scraping solution that simplifies extracting data from websites by handling common challenges like CAPTCHAs, IP blocks, and dynamic content. ZenRows offers a Scraper API and Residential Proxies, and they are releasing a Scraping Browser soon.
In this guide, we'll explore how to integrate ZenRows into your web scraping projects, enabling you to scrape data effortlessly while maintaining compliance and performance.
- TLDR: Scraping With ZenRows
- What is ZenRows?
- Setting Up the ZenRows API
- Advanced Functionality
- JavaScript Rendering
- Country Geotargeting
- Residential Proxies
- Custom Headers
- Static Proxies
- Screenshot Functionality
- Auto Parsing
- Case Study: IMDB Top 250 Movies
- Alternative: ScrapeOps Proxy Aggregator
- Troubleshooting
- Conclusion
- More Web Scraping Guides
TLDR: Web Scraping With ZenRows
Scraping with ZenRows is almost the same as it is with ScrapeOps. If you're just looking for a quick way to get started with it, go ahead and use the function below.
from urllib.parse import urlencode

API_KEY = "YOUR-ZENROWS-API-KEY"

def get_zenrows_url(url):
    payload = {
        "apikey": API_KEY,
        "url": url,
    }
    proxy_url = "https://api.zenrows.com/v1/?" + urlencode(payload)
    return proxy_url
To customize your proxy, take a look at the API docs here.
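As a quick usage sketch (this assumes the get_zenrows_url() helper and API_KEY defined above, and uses quotes.toscrape.com purely as a stand-in target), you can pass the proxied URL straight to Python Requests:

import requests

# Fetch a page through the ZenRows proxy URL built by get_zenrows_url() above.
response = requests.get(get_zenrows_url("https://quotes.toscrape.com/"))
print(response.status_code)
print(response.text[:500])  # preview the first 500 characters of the returned HTML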
What Is ZenRows?
Much like ScrapeOps, ZenRows is something of an all-in-one proxy solution. With their API, we can bypass anti-bots, rotate proxies, run a headless browser, and much more.
As you can see highlighted below, ZenRows is actually one of our providers for the ScrapeOps Proxy Aggregator. Their product is very similar to ours here at ScrapeOps.
Much like ScrapeOps, they allow us to set custom countries, wait for content to render, pass custom headers, set premium proxies and much more.
When we use the ZenRows API (much like the ScrapeOps API), here is how the base process goes:
- We send our `url` and our `apikey` to ZenRows.
- ZenRows attempts to get our `url` through one of their servers.
- ZenRows gets their response.
- ZenRows forwards the response back to us.
Throughout this process, ZenRows can rotate IP addresses and make all of our requests look like they're coming from somewhere else. Just like with ScrapeOps, there are many other bells and whistles we can use with the API, but the overall process remains pretty much the same.
- You tell the API which site you want to access.
- Their servers access the site for you.
- You scrape your desired site(s).
How Does the ZenRows API Work?
ZenRows is a proxy provider. This means that we send them a `url` and our `apikey`, and they send back the response from the website. They accomplish this by using different IP addresses to access the site.
There are numerous options we can use to customize our request, but overall the process remains much the same.
The table below contains a list of common parameters used with the ZenRows API. This list is non-exhaustive; you may view their full API documentation here.
Parameter | Description |
---|---|
apikey (required) | Your ZenRows API key (string) |
url (required) | The url you'd like to scrape (string) |
js_render | Render JavaScript components on the page (boolean) |
premium_proxy | Use a premium proxy (boolean) |
proxy_country | Use with a premium proxy to set your geolocation (string) |
session_id | Reuses an IP address for sticky sessions (int) |
device | Either "mobile" or "desktop" (string) |
original_status | Show the original status returned by the website (bool) |
wait_for | Wait for a specific CSS selector to show up on the page (string) |
wait | Wait a certain amount of time before returning the response (int) |
screenshot | Take a screenshot of the page (boolean) |
screenshot_fullpage | Take a screenshot of the full page (boolean) |
screenshot_selector | Take a screenshot of a certain CSS selector (string) |
Here is an example of a request you might make with the ZenRows API.
# pip install requests
import requests
url = "https://quotes.toscrape.com/"
api_key = "YOUR-ZENROWS-API-KEY"
params = {
"url": url,
"apikey": api_key,
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
Response Format
With ZenRows, we get the option to take our response as either JSON or HTML. This gives us the ability to better fine tune our scrape. Our responses come as HTML by default, but we can use an additional parameter to set our response to JSON.
Remember the code snippet from above? We'll make a small change to it.
# pip install requests
import requests
url = "https://quotes.toscrape.com/"
api_key = "YOUR-ZENROWS-API-KEY"
params = {
"url": url,
"apikey": api_key,
"json_response": "true"
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
To get a JSON response, we only need to add one parameter: `"json_response": "true"`. For a regular HTML response, we don't need to add anything.
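As a rough sketch of what you might do with that JSON response (the exact keys in the payload are defined by ZenRows, so inspect them or check the docs rather than relying on any particular field name):

# pip install requests
import requests

url = "https://quotes.toscrape.com/"
api_key = "YOUR-ZENROWS-API-KEY"

params = {
    "url": url,
    "apikey": api_key,
    "json_response": "true",
}

response = requests.get("https://api.zenrows.com/v1/", params=params)

# Parse the JSON body and see which fields ZenRows returned
# (typically the page HTML plus extra metadata; verify against the docs).
data = response.json()
print(list(data.keys()))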
ZenRows API Pricing
You can view ZenRows' pricing options below, from their lowest-cost plan up to their higher-cost tiers.
Plan | Residential Bandwidth | URL Limit | Price per Month |
---|---|---|---|
Developer | 12.73GB ($5.50/GB) | 250,000 ($0.28/1,000) | $69 |
Startup | 24.76GB ($5.25/GB) | 1,000,000 ($0.13/1,000) | $129 |
Business | 60GB ($5.00/GB) | 3,000,000 ($0.10/1,000) | $299 |
Business 500 | 111.11GB ($4.50/GB) | 6,000,000 ($0.08/1,000) | $499 |
Business 1K | 285.71GB ($3.50/GB) | 12,000,000 ($0.08/1,000) | $999 |
Business 2K | 643.92GB ($3.15/GB) | 25,000,000 ($0.08/1,000) | $1,999 |
Business 3K | 1,071.43GB ($2.80/GB) | 38,000,000 ($0.08/1,000) | $2,999 |
Custom | N/A | N/A | N/A |
With each of these plans, you only pay for successful requests. If the API fails to get your page, you pay nothing. Each plan also includes the following:
- Proxy Rotator
- User-Agent Rotator
- WAF Bypass
- Basic Analytics
- CAPTCHA Bypass
- Auto-parsing
- JavaScript Rendering
Response Status Codes
When using their API, there is a series of status codes we might get back. 200 is the one we want.
Status Code | Type | Possible Causes |
---|---|---|
200 | Success | It worked! |
400 | Bad Request | Forbidden Domain, Invalid Parameters |
401 | Unauthorized | Missing API Key, Invalid API Key |
402 | Payment Required | Usage Exceeded, Didn't Pay the Bill |
403 | Forbidden | User Not Verified, IP Address Blocked |
404 | Not Found | Site Not Found, Page Not Found |
405 | Not Allowed | Method Not Allowed |
407 | Proxy Authentication | Invalid Authorization Header |
413 | Content Too Large | Response Size Greater Than Limit |
422 | Unprocessable Entity | Failed to Retrieve Content |
424 | Failed Dependency | Failed to Solve CAPTCHA |
429 | Too Many Requests | Concurrency or Rate Limit Exceeded |
500 | Internal Server Error | Context Cancelled, Unknown Error |
502 | Bad Gateway | Could not parse Content |
504 | Gateway Timeout | Operation Exceeded Time Limit |
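Most of the failure codes above are either transient (429, 5xx) or configuration problems (400, 401, 402), so a simple wrapper that only retries the transient ones can save a lot of wasted requests. Here is a minimal sketch, not an official ZenRows pattern:

import time
import requests

RETRYABLE = {429, 500, 502, 504}  # transient errors that are worth retrying

def fetch_with_retries(params, retries=3, backoff=2):
    # Call the ZenRows API, retrying only on transient status codes.
    for attempt in range(retries):
        response = requests.get("https://api.zenrows.com/v1/", params=params)
        if response.status_code == 200:
            return response
        if response.status_code not in RETRYABLE:
            # Codes like 400, 401 and 402 won't fix themselves, so fail fast.
            raise Exception(f"Non-retryable status: {response.status_code}")
        time.sleep(backoff ** attempt)
    raise Exception(f"Still failing after {retries} attempts")

# Usage sketch:
# response = fetch_with_retries({"apikey": "YOUR-ZENROWS-API-KEY", "url": "https://quotes.toscrape.com/"})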
Setting Up the ZenRows API
We'll get started by setting up an account. Once you've signed up, you are given an API key. You can sign up with any of the following methods:
- GitHub
- Create an account with an email address and password
After signing up, you can navigate to their dashboard and see your API key located in the upper right. In the lower right portion of the screen, they also have a nifty little request builder. This is perfect for testing.
As you probably noticed, in the screenshot above, I exposed my API key for all you readers to see. No worries! We can change our API key very easily from the account settings tab.
Once you've got an API key, you're all set to start using the ZenRows API.
API Endpoint Integration
Now, let's talk about the API endpoints. We're only going to use one endpoint, very similar to how we use only one with ScrapeOps. Take a look at the line below from some of our earlier examples.
response = requests.get('https://api.zenrows.com/v1/', params=params)
Our base domain is `https://api.zenrows.com`. Pretty simple, right?
Our endpoint is `/v1`. To customize our requests, we send different parameters to this endpoint. Think back to the following snippet from earlier.
# pip install requests
import requests
url = "https://quotes.toscrape.com/"
api_key = "YOUR-ZENROWS-API-KEY"
params = {
"url": url,
"apikey": api_key,
"json_response":
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
The params we send in this case are `"url"`, `"apikey"`, and `"json_response"`. We'll send all of our custom parameters to this `v1` endpoint.
Proxy Port Integration
Proxy Port Integration tells our HTTP client (if it supports this) to route all of its requests through a certain location. This allows us to forward all of our requests through said proxy.
http://<YOUR_ZENROWS_API_KEY>:premium_proxy=true@proxy.zenrows.com:8001
Below is an example of how to do this using Python Requests.
# pip install requests
import requests
url = "https://https://quotes.toscrape.com"
proxy = "http://YOUR-SUPER-SECRET-API-KEY:@proxy.zenrows.com:8001"
proxies = {"http": proxy, "https": proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)
This form of proxy integration is best used when you're dealing with tons of different functionality and you don't necessarily want fine control over the proxy. You just want to use the proxy and get on with your day.
SDK Integration
SDK (Software Development Kit) integration is an excellent option for developers, particularly beginners, who want to streamline their web scraping process without diving deep into the complexities of HTTP requests, handling proxies, and managing response parsing.
SDK integration is ideal in various scenarios, particularly when you want a quick, efficient, and user-friendly way to interact with a web scraping service. Here's when you should consider using it:
-
Beginner-Friendly Projects: If you're new to web scraping or API integration, using an SDK can significantly lower the learning curve. It allows you to focus on the core aspects of your project without getting bogged down by technical complexities.
-
Rapid Prototyping: When you're looking to build a prototype or proof of concept quickly, SDKs can help you deliver faster since you won't need to manually code every interaction with the scraping service.
-
Standard Use Cases: If your scraping needs fall within standard scenarios—like scraping eCommerce data, monitoring competitors, or collecting blog posts—SDK integration provides a ready-made solution that works out of the box.
-
Consistent Maintenance: If you need ongoing support and updates to handle changes in website structure, rate limits, or captcha systems, using an SDK ensures that your integration remains functional and up-to-date with minimal effort.
ZenRows also has an SDK (Software Development Kit). This method is much easier for beginners who might not be familiar with HTTP clients yet. These SDKs abstract away a large portion of the lower level HTTP work.
Take a look at the example below.
# pip install zenrows
from zenrows import ZenRowsClient
client = ZenRowsClient("YOUR-SUPER-SECRET-API-KEY")
url = "https://quotes.toscrape.com"
response = client.get(url)
print(response.text)
As you can see above, this approach has a much lower barrier to entry.
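The SDK accepts the same parameters as the raw API. The snippet below assumes the client forwards a params dict the way the SDK documentation describes, so double-check it against the current zenrows package before relying on it:

# pip install zenrows
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR-SUPER-SECRET-API-KEY")
url = "https://quotes.toscrape.com"

# Assumption: the SDK passes these params through to the API, just like a raw request.
params = {"js_render": "true", "premium_proxy": "true"}

response = client.get(url, params=params)
print(response.text)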
Managing Concurrency
Managing concurrency is pretty straightforward if you know what you're doing. One of the easiest ways to do this with Python Requests is to make use of `ThreadPoolExecutor`.
`ThreadPoolExecutor` gives us the ability to open a new pool with `x` number of threads. On each available thread, we call a function of our choosing.
import requests
from bs4 import BeautifulSoup
import concurrent.futures
from urllib.parse import urlencode
API_KEY = 'YOUR_API_KEY'
NUM_THREADS = 5
def get_proxy_url(url):
payload = {"api_key": API_KEY, "url": url}
proxy_url = 'https://api.zenrows.com/v1/' + urlencode(payload)
return proxy_url
## Example list of urls to scrape
list_of_urls = [
"https://quotes.toscrape.com/page/1/",
"https://quotes.toscrape.com/page/2/",
"http://quotes.toscrape.com/page/3/",
]
output_data_list = []
def scrape_page(url):
try:
response = requests.get(get_proxy_url(url))
if response.status_code == 200:
soup = BeautifulSoup(response.text, "html.parser")
title = soup.find("h1").text
## add scraped data to "output_data_list" list
output_data_list.append({
'title': title,
})
except Exception as e:
print('Error', e)
with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
executor.map(scrape_page, list_of_urls)
print(output_data_list)
Pay close attention to `executor.map()` in this situation.
- Our first argument is `scrape_page`: the function we want to call on each thread.
- Our second is `list_of_urls`: the list of arguments we want to pass into `scrape_page`.
Any other arguments to the function also get passed in as additional lists.
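To illustrate that last point, here is a small sketch (using a placeholder worker rather than the scraper above) showing how extra argument lists line up positionally in executor.map():

import concurrent.futures

def scrape_page(url, location):
    # Placeholder worker: a real scraper would fetch and parse `url` here.
    return f"{url} scraped from {location}"

urls = [
    "https://quotes.toscrape.com/page/1/",
    "https://quotes.toscrape.com/page/2/",
]
locations = ["us", "uk"]  # second argument list, paired with `urls` by position

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    for result in executor.map(scrape_page, urls, locations):
        print(result)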
Advanced Functionality
Advanced functionality was touched on briefly earlier. With advanced functionality, we can customize our scrape to do things like set our geolocation, render JavaScript, and much more.
There's a bit of a hangup when using these advanced functionalities, though. They cost extra... a lot extra. Check out the table below for a breakdown of these functionalities and their cost.
Parameter | API Cost X Normal | Description |
---|---|---|
js_render | 5x | render JavaScript on the webpage |
custom_headers | 1x | set custom headers to the server |
premium_proxy | 10x | use premium IP addresses |
proxy_country | 10x - 25x | set a custom geolocation, requires premium_proxy |
session_id | 1x | use to keep browsing sessions intact between requests |
device | 1x | "mobile" or "desktop" , "desktop" by default |
original_status | 1x | return the original status code from the site |
allowed_status_codes | 1x | return the content even when the status code is in this list |
wait_for | 5x | waits for a CSS selector, requires js_render |
wait | 5x | wait for a period of time, requires js_render |
block_resources | 5x | block certain resources, requires js_render |
json_response | 1x | return response as JSON instead of HTML |
css_extractor | Not Specified | extract elements with a certain CSS selector |
auto_parse | Not Specified | attempt to automatically parse the page |
markdown_response | Not Specified | return the parsed content as a markdown file |
screenshot | 5x | requires js_render , takes a screenshot of the page |
screenshot_fullpage | 5x | requires js_render , take a full page screenshot |
screenshot_selector | 5x | requires js_render , screenshot a certain element |
You can view their full API documentation here.
JavaScript Rendering
Many modern websites, especially those using JavaScript frameworks like React, Angular, or Vue, load data dynamically after the initial HTML is served. This means the content you're trying to scrape might not be immediately visible in the static HTML, requiring JavaScript to run before the desired data is accessible.
JavaScript rendering is essential when dealing with dynamic websites that rely on JavaScript to load content. Here are the key reasons to use it:
-
Access Dynamic Content: Many modern websites use JavaScript to load important data, such as product listings, reviews, or stock availability. Without JavaScript rendering, you’ll miss this dynamically loaded content because it doesn't appear in the initial HTML.
-
Scrape JavaScript-Heavy Websites: Sites built with frameworks like React, Angular, or Vue often deliver content dynamically through JavaScript. Rendering ensures you can scrape the full page, including elements that only appear after JavaScript execution.
-
Avoid Incomplete Data: If a page loads data asynchronously (e.g., product prices or user comments), traditional scraping may return empty or incomplete results. JavaScript rendering ensures all page elements are fully loaded before scraping.
-
Handle Single-Page Applications (SPAs): SPAs dynamically update the page without reloading it, making traditional scraping methods ineffective. JavaScript rendering allows you to scrape these applications by ensuring that all components are fully visible.
-
Improve Scraping Accuracy: By rendering JavaScript, you reduce the risk of missing critical information or encountering incomplete data, leading to more accurate and reliable scraping results.
When we tell ZenRows to render JavaScript, the browser will render JavaScript content on the page. We do this by setting the `js_render` param to `true`.
Here's an example in Python.
# pip install requests
import requests
url = "https://quotes.toscrape.com"
apikey = "YOUR_ZENROWS_API_KEY"
params = {
"url": url,
"apikey": apikey,
"js_render": 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
You can view the documentation for this here.
Controlling The Browser
ZenRows comes with a built-in headless browser. We can send instructions to this browser using their API. The instruction set is relatively simple.
# pip install requests
import requests
url = "https://httpbin.io/anything"
apikey = "YOUR_ZENROWS_API_KEY"
params = {
"url": url,
"apikey": apikey,
"js_render": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
Parameter | Description |
---|---|
wait | wait for a period of time |
wait_for | wait for a CSS selector |
json_response | return the response as JSON instead of HTML |
block_resources | block resources from loading |
js_instructions | instructions to run on the page, such as click |
screenshot | take a screenshot of the page |
screenshot_fullpage | take a full page screenshot |
Here is a snippet that contains `js_instructions` to click a button and wait a half second.
# pip install requests
import requests
url = 'https://www.example.com'
apikey = 'YOUR_ZENROWS_API_KEY'
params = {
'url': url,
'apikey': apikey,
'js_render': 'true',
'js_instructions': """[{"click":".button-selector"},{"wait":500}]""",
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
The browser control docs are available here.
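The `wait` and `wait_for` options from the table above follow the same pattern as the other parameters. Here is a sketch using `wait_for` (the `.quote` selector fits quotes.toscrape.com; swap in whatever selector your target page actually renders):

# pip install requests
import requests

url = "https://quotes.toscrape.com"
apikey = "YOUR_ZENROWS_API_KEY"

params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",   # wait_for requires JavaScript rendering
    "wait_for": ".quote",  # only return once this CSS selector appears
}

response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)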
Country Geotargeting
Websites often serve different data, pricing, or availability based on the region from which a visitor is accessing the site, and country geotargeting ensures that your scraper can access the exact content relevant to the target location.
Here are the main reasons to use geotargeting:
-
Access Location-Specific Content: Many websites deliver different content based on the visitor's location. Geotargeting allows you to scrape the exact content that users in specific regions would see.
-
Bypass Regional Restrictions: Some websites restrict access to certain data, features, or services based on the user's geographic location. Country geotargeting lets you bypass these restrictions by routing your scraping requests through proxies located in the target region.
-
Monitor International Competitors: If you're tracking competitors in multiple countries, geotargeting allows you to collect data on how they operate in different markets. This includes variations in pricing strategies, localized offerings, and marketing campaigns tailored to specific regions.
-
Perform Regional Market Research: Country geotargeting helps businesses gather insights for different markets. It allows you to scrape data specific to a target region, such as local customer reviews, product availability, or localized marketing strategies.
-
Localized SEO and Ad Tracking: If you're conducting SEO research, country geotargeting lets you see how websites rank in different countries, track regional keywords, or observe location-specific ads. It’s also useful for tracking how brands adjust their advertising and SEO strategies in various locations.
-
Test Website Localization: Developers can use geotargeting to ensure that websites are properly localized for different regions. This includes testing localized language versions, currency displays, and regional features to ensure they work correctly based on the user's location.
We can also use the proxy to choose our geolocation. We can do this by using the `premium_proxy` and `proxy_country` parameters.
# pip install requests
import requests
url = "https://www.example.com"
apikey = "YOUR_ZENROWS_API_KEY"
params = {
"url": url,
"apikey": apikey,
"premium_proxy": "true",
"proxy_country": "us"
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
Here is their list of country codes.
Country | Country Code |
---|---|
United States | "us" |
Canada | "ca" |
United Kingdom | "gb" |
Germany | "de" |
France | "fr" |
Spain | "es" |
Brazil | "br" |
Mexico | "mx" |
India | "in" |
Japan | "jp" |
China | "cn" |
You can view the full documentation for this here.
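If you want to compare what a page serves to different regions, one rough approach is to loop over a few of the country codes above. Keep in mind that every request here uses `premium_proxy`, so each one is billed at the premium rate:

# pip install requests
import requests

apikey = "YOUR_ZENROWS_API_KEY"
url = "https://www.example.com"

# Compare the response served to a few different countries.
for country in ["us", "gb", "de"]:
    params = {
        "url": url,
        "apikey": apikey,
        "premium_proxy": "true",
        "proxy_country": country,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(country, response.status_code, len(response.text))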
Residential Proxies
A residential proxy is a type of proxy server that uses IP addresses assigned to real residential homes by internet service providers (ISPs). These IPs appear as if they come from everyday users rather than data centers, making them more difficult for websites to detect and block.
They provide more reliability and fewer interruptions, making them ideal for scraping websites with strict anti-bot protections.
Here are some solid reasons why residential proxies are used:
-
Avoid IP Blocking and Bans: Websites often block data center IPs or proxies because they are easily identifiable as non-human traffic. Residential proxies appear as real users, reducing the risk of being blocked or flagged as suspicious activity.
-
Access Geo-Restricted Content: Residential proxies can be used to simulate traffic from specific geographic regions, helping you access region-locked content, such as location-specific versions of websites, prices, or products.
-
Bypass CAPTCHAs and Anti-Scraping Measures: Many websites deploy sophisticated anti-scraping techniques like CAPTCHAs or rate limits to stop automated traffic. Residential proxies can bypass these measures by making the traffic appear to come from legitimate users, which reduces the likelihood of encountering CAPTCHAs or other obstacles.
-
Improve Scraping Success Rates: For large-scale web scraping projects, residential proxies increase the chances of successfully gathering data without interruptions or blocks.
-
High Anonymity: Residential proxies provide a high level of anonymity, as they obscure the identity and origin of the scraper. This allows for stealthy data collection while maintaining the appearance of regular user activity.
-
Consistent Web Sessions: Some residential proxies offer static IPs that allow you to maintain consistent sessions, which is important for tasks like logging into accounts, managing cookies, or scraping data that requires a persistent connection.
We can tell ZenRows to use a Premium Proxy (residential proxy) by setting `premium_proxy` to `true`. This tells ZenRows to use a residential IP address instead of a datacenter IP. Sites with stringent anti-bot measures tend to block datacenter IP addresses.
Here's a code example of how to use them.
# pip install requests
import requests
url = "https://www.example.com"
apikey = "YOUR_ZENROWS_API_KEY"
params = {
"url": url,
"apikey": apikey,
"premium_proxy": "true",
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
You can view their full Premium Proxy documentation here.
Custom Headers
When we're making requests, we sometimes need to add special headers. By default, proxy APIs manage these headers to optimize performance and ensure requests appear as normal user traffic.
However, most proxy APIs offer the flexibility to send custom headers when specific conditions or requirements arise.
The custom header functionality in proxy APIs allows you to manually specify HTTP request headers, which are key-value pairs sent from a client (like a web scraper) to a server.
Why Use Custom Headers?
Custom headers are essential in web scraping and API requests when you need more control over how your requests are handled by the target server. While proxy APIs optimize headers automatically for performance, there are specific situations where custom headers become necessary:
- Requesting Specific Data: Certain websites require specific headers to return the desired data. For example, sending the correct `User-Agent` or `Accept-Language` header can make the server respond with the appropriate content.
- POST Requests: When making POST requests, especially for form submissions or API interactions, custom headers like `Content-Type`, `Authorization`, and `X-Requested-With` are often required to ensure the request is processed correctly.
-
Bypassing Anti-Bot Systems: Some websites use sophisticated anti-bot systems that check headers to detect automated traffic. Custom headers can help you mimic real user behavior, making it harder for these systems to block your requests.
-
Handling Authorization: For websites or APIs that require authentication, custom headers like Authorization (e.g., for tokens or API keys) are essential for gaining access to protected resources.
Word of Caution
Misuse of custom headers can reduce the effectiveness of your scraping, making it crucial to implement custom headers carefully and only when absolutely necessary.
-
Decreased Performance: Using custom headers incorrectly can reduce the performance of your proxy requests. For example, sending static or poorly chosen headers can make your requests more likely to be flagged as automated, leading to blocks or captchas.
-
Static Headers Can Trigger Detection: If your custom headers remain the same across multiple requests, this consistency can trigger anti-scraping measures. Websites may detect that the requests are automated, leading to IP bans or additional verification steps.
-
Need for Dynamic Header Generation: For large-scale scraping operations, it’s important to continuously generate fresh, dynamic headers to avoid detection. Automated systems should be in place to ensure headers vary and appear authentic.
-
Use Only When Necessary: Custom headers should be used only when essential. In most cases, proxy APIs manage headers better, optimizing for performance and evading detection. Overriding these can sometimes do more harm than good if not handled carefully.
When we send requests through a proxy server, headers meant for the target site and headers meant for the proxy can sometimes get mixed up. To keep our custom headers, we set the `custom_headers` parameter to `"true"` and pass our headers along with the request as usual.
You can view an example of this below.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://httpbin.io/anything",
"custom_headers": "true"
}
# Set the headers
headers = {
"Referer": "https://www.google.com"
}
# Make the request
response = requests.get(url, headers=headers, params=params)
# Print the response
print(response.text)
Take a look at their docs here.
Static Proxies
Static proxies, often referred to as sticky sessions, are proxy servers that maintain a consistent IP address for an extended period or for the duration of a user session. For instance, if you want to log in on a site and remain logged in through the ZenRows Proxy, you'll need a Static Proxy.
Static proxies offer several advantages that make them valuable for specific use cases in web scraping and online activities. Here are the key reasons to use static proxies:
-
Session Consistency: Static proxies are ideal for tasks that require maintaining session state, such as logging into a website, managing cookies, or interacting with user accounts.
-
Avoiding CAPTCHA and Verification: Using the same IP address consistently can help reduce the likelihood of triggering CAPTCHA challenges or account verification processes.
-
Long-Term Data Collection: For projects that involve long-term scraping or data collection, static proxies allow you to accumulate data over time without losing context or identity.
-
Improved Performance: Since static proxies do not change IPs frequently, they can reduce latency and improve the speed of requests, as you won’t have to negotiate new sessions or face interruptions caused by IP switching.
-
Easier Account Management: When managing multiple accounts on platforms that have strict anti-bot measures, using static proxies can help you operate multiple accounts without raising flags due to IP changes, making it easier to manage activities associated with each account.
-
Reduced Risk of IP Blacklisting: With static proxies, the risk of getting your IP blacklisted is lower compared to using a pool of rotating proxies, where frequent changes may draw attention and result in blocks.
In order to make use of a Static Proxy, you need a session ID. To do this, we pass `session_id` as `12345`. This tells the ZenRows server that you'd like to keep your session intact.
But how does ZenRows remember which session is mine?
ZenRows tracks your session using your API key.
Here is an example.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://quotes.toscrape.com",
"session_id": 12345
}
# Make the request
response = requests.get(url, params=params)
# Print the response
print(response.text)
Take a look at the docs here.
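To see the sticky session in action, one rough check is to request an IP-echo endpoint twice with the same `session_id` and confirm the address doesn't change (httpbin.io/ip is just a convenient test target):

import requests

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",
    "url": "https://httpbin.io/ip",
    "session_id": 12345,
}

# Two requests with the same session_id should report the same IP address.
for i in range(2):
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(f"Request {i + 1}: {response.text.strip()}")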
Screenshot Functionality
The screenshot functionality in proxy APIs allows you to capture images of web pages as they are rendered in a browser. This feature typically takes a snapshot of the page at a specific moment, including all visible content, styles, and layouts, providing a visual representation of the website.
The screenshot functionality offers several key benefits that enhance the effectiveness of web scraping and data collection projects. Here are the primary reasons to utilize this feature:
-
Visual Documentation: Screenshots serve as a visual record of web pages at specific points in time, making them useful for audits, compliance checks, and tracking changes in design or content.
-
Error and Bug Reporting: When issues arise during scraping or site interaction, screenshots can help document errors, layout problems, or unexpected behavior.
-
Competitive Analysis: Capturing screenshots of competitors' websites enables businesses to analyze their design, layout, and content strategies.
-
Content Verification: Screenshots can provide proof of the scraped content, ensuring that it matches expectations or contractual obligations.
-
User Experience Testing: In usability testing, screenshots can be used to evaluate the design and layout of web applications or websites.
-
Monitoring Changes: Regularly capturing screenshots of a webpage allows you to track changes over time. This is particularly useful for monitoring dynamic content, such as pricing updates or promotional changes.
With ZenRows, screenshots are really easy. We get a `screenshot` argument, a `screenshot_fullpage` argument, and on top of all that, we have `screenshot_selector` to take a shot of a specific element on the page. Go ahead and take a look at the code examples below.
Here, we take a regular screenshot.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://httpbin.io/anything",
"js_render": "true",
"screenshot": "true"
}
# Make the request
response = requests.get(url, params=params)
# Print the response
print(response.text)
Here is a full page screenshot.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://httpbin.io/anything",
"js_render": "true",
"screenshot_fullpage": "true"
}
# Make the request
response = requests.get(url, params=params)
# Print the response
print(response.text)
Our final example here is a screenshot of a specific page element.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://quotes.toscrape.com",
"js_render": "true",
"screenshot_selector": "div.container"
}
# Make the request
response = requests.get(url, params=params)
# Print the response
print(response.text)
The full documentation for screenshots is available here.
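Printing response.text isn't very useful for an image, so in practice you'll usually want to write the bytes to disk instead. The sketch below assumes the API returns the screenshot as the raw response body; if your response comes back wrapped in JSON, pull the image data out of that payload per the docs above:

import requests

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",
    "url": "https://quotes.toscrape.com",
    "js_render": "true",
    "screenshot": "true"
}

response = requests.get("https://api.zenrows.com/v1/", params=params)

# Save the raw bytes of the response as a PNG file.
with open("screenshot.png", "wb") as file:
    file.write(response.content)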
Auto Parsing
Auto parsing (also known as auto extract) is a feature that automatically identifies and extracts key data from web pages without the need for manual coding to locate specific HTML elements.
Auto Parsing is an excellent feature. With Auto Parsing, we can actually tell ZenRows to scrape the site for us! With this functionality, we only need to focus on our jobs as developers. We don't need to pick through all the nasty HTML.
Auto parsing is really handy when you need a quick, user-friendly, and efficient way to extract data, especially for dynamic or complex websites. It reduces time spent on setup and maintenance, making it ideal for large-scale projects or for those without extensive technical skills.
This snippet tells ZenRows to parse the site for us.
import requests
# Set the URL and API key
url = "https://api.zenrows.com/v1/"
params = {
"apikey": "YOUR_ZENROWS_API_KEY",
"url": "https://www.amazon.com/dp/B01LD5GO7I/",
"autoparse": "true"
}
# Make the request
response = requests.get(url, params=params)
# Print the response
print(response.text)
You can use this feature to parse Amazon, YouTube, Zillow and many many more sites. You can view the full list here. The docs for this feature are available here.
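Since auto parsing returns structured data rather than raw HTML, a reasonable next step is to load it as JSON and inspect the fields. The exact keys depend on the site being parsed, so this is only a sketch; don't hard-code field names without checking the output first:

import requests
import json

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",
    "url": "https://www.amazon.com/dp/B01LD5GO7I/",
    "autoparse": "true"
}

response = requests.get("https://api.zenrows.com/v1/", params=params)

# The parsed result is JSON; the available fields vary by site.
data = response.json()
print(json.dumps(data, indent=2)[:1000])  # preview the structure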
Case Study: Using Scraper APIs on IMDb Top 250 Movies
Now, it's time for a case study. We're gonna pit ScrapeOps and ZenRows head to head and see how they match up.
We're going to scrape the top 250 movies from IMDB. Once we've scraped our data, we'll save it to a JSON file.
The two code examples below are virtually identical. The major difference is the proxy function. Aside from the base domain that we're pinging, we use the `api_key` param with ScrapeOps, and with ZenRows we use `apikey`.
Here is the proxy function for ScrapeOps:
def get_scrapeops_url(url):
payload = {
"api_key": API_KEY,
"url": url,
}
proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
return proxy_url
Here is the same function for ZenRows.
def get_zenrows_url(url):
payload = {
"apikey": API_KEY,
"url": url,
}
proxy_url = "https://api.zenrows.com/v1/?" + urlencode(payload)
return proxy_url
The full ScrapeOps code is available for you below.
import os
import requests
from bs4 import BeautifulSoup
import json
import csv
import logging
from urllib.parse import urlencode
import concurrent.futures
## Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
API_KEY = ""
with open("config.json", "r") as config_file:
config = json.load(config_file)
API_KEY = config["scrapeops_api_key"]
def get_scrapeops_url(url):
payload = {
"api_key": API_KEY,
"url": url,
}
proxy_url = "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
return proxy_url
def scrape_movies(url, location="us", retries=3):
success = False
tries = 0
while not success and tries <= retries:
response = requests.get(get_scrapeops_url(url))
try:
if response.status_code != 200:
raise Exception(f"Failed response from server, status code: {e}")
soup = BeautifulSoup(response.text, "html.parser")
json_tag = soup.select_one("script[type='application/ld+json']")
json_data = json.loads(json_tag.text)["itemListElement"]
movie_list_length = 0
movie_list = []
for item in json_data:
movie_list.append(item["item"])
movie_list_length+=len(json_data)
print(f"Movie list length: {len(json_data)}")
with open("scrapeops-top-250.json", "w") as file:
json.dump(movie_list, file, indent=4)
success = True
except Exception as e:
logger.error(f"Failed to process page: {e}, retries left: {retries-tries}")
tries+=1
if not success:
raise Exception(f"Failed to scrape page, MAX RETRIES {retries} EXCEEDED!!!")
if __name__ == "__main__":
MAX_RETRIES = 3
logger.info("Starting IMDB scrape")
url = "https://www.imdb.com/chart/top/"
scrape_movies(url, retries=MAX_RETRIES)
logger.info("Scrape complete")
This code took 5.401 seconds to run.
Here is our ZenRows example.
import os
import requests
from bs4 import BeautifulSoup
import json
import csv
import logging
from urllib.parse import urlencode
import concurrent.futures
## Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
API_KEY = ""
with open("config.json", "r") as config_file:
config = json.load(config_file)
API_KEY = config["zenrows_api_key"]
def get_zenrows_url(url):
payload = {
"apikey": API_KEY,
"url": url,
}
proxy_url = "https://api.zenrows.com/v1/?" + urlencode(payload)
return proxy_url
def scrape_movies(url, location="us", retries=3):
success = False
tries = 0
while not success and tries <= retries:
response = requests.get(get_zenrows_url(url))
try:
if response.status_code != 200:
raise Exception(f"Failed response from server, status code: {response.status_code}")
soup = BeautifulSoup(response.text, "html.parser")
json_tag = soup.select_one("script[type='application/ld+json']")
json_data = json.loads(json_tag.text)["itemListElement"]
movie_list_length = 0
movie_list = []
for item in json_data:
movie_list.append(item["item"])
movie_list_length+=len(json_data)
print(f"Movie list length: {len(json_data)}")
with open("zenrows-top-250.json", "w") as file:
json.dump(movie_list, file, indent=4)
success = True
except Exception as e:
logger.error(f"Failed to process page: {e}, retries left: {retries-tries}")
tries+=1
if not success:
raise Exception(f"Failed to scrape page, MAX RETRIES {retries} EXCEEDED!!!")
if __name__ == "__main__":
MAX_RETRIES = 3
logger.info("Starting IMDB scrape")
url = "https://www.imdb.com/chart/top/"
scrape_movies(url, retries=MAX_RETRIES)
logger.info("Scrape complete")
Below is the output from the ZenRows example.
ZenRows was barely faster: 5.401 - 4.878 = 0.523 seconds. With roughly a half-second difference, this is negligible. Depending on time and location, either proxy could come out faster.
Since the ScrapeOps API uses ZenRows as one of its providers under the hood, you actually will probably get more reliability out of ScrapeOps. If ZenRows fails, ScrapeOps will try a different provider.
Alternative: ScrapeOps Proxy API Aggregator
The ScrapeOps Proxy API provides an excellent alternative to ZenRows. With ScrapeOps, you actually get access to ZenRows under the hood along with a boatload of other proxy providers.
We also get better pricing from ScrapeOps. As you saw earlier in this article, ZenRows' lowest-tier subscription costs $69 per month at $0.28 per 1,000 URLs. With ScrapeOps, we can get basically the same plan for $29 per month.
With the ScrapeOps Proxy API, you can get virtually the same plan for less than half the price. Since ScrapeOps uses ZenRows as a provider, you still get access to ZenRows as well.
Troubleshooting
Issue #1: Request Timeouts
When scraping, timeouts can be an unending source of headaches. To handle timeouts with Python Requests, we can use the `timeout` argument. The snippet below shows how to properly set a timeout.
import requests
# 5 second timeout
response = requests.get("https://httpbin.org/get", timeout=5)
Issue #2: Handling CAPTCHAs
If your proxy service is making you submit CAPTCHA requests, something is wrong. Both ScrapeOps and ZenRows are built to bypass CAPTCHAs for you by default. However, sometimes proxy providers can fail. If you run into a CAPTCHA, first, try to submit the request again. If you are consistently being prompted to complete a CAPTCHA, ZenRows allows you to pass any of the following arguments:
[
{"solve_captcha": {"type": "hcaptcha"}},
{"solve_captcha": {"type": "recaptcha"}},
{"solve_captcha": {"type": "cloudflare_turnstile"}},
{"solve_captcha": {"type": "hcaptcha", "options": {"solve_inactive": true}}},
{"solve_captcha": {"type": "recaptcha", "options": {"solve_inactive": true}}}
]
You can also use a service like 2captcha. We have an excellent article on bypassing CAPTCHAs here.
Issue #3: Invalid Response Data
To deal with invalid responses, you need to check the status code. Check out ZenRows error codes here. The ScrapeOps error codes are available here.
In most cases, you need to double-check your parameters or make sure your bill is paid. Every once in a while, you may receive a different error code that you can find in the links above.
The Legal & Ethical Implications of Web Scraping
Scraping the web is generally considered legal as long as you're scraping public data. If you don't have to log in to view the data, it is considered public information and therefore public data, much like a sign posted in the middle of your town: reading it (and even taking a picture) is perfectly fine because it's public information.
Scraping private data (data gated behind a login) from the web is completely different legal territory. When data is private, you're subject to the same laws and intellectual property policies as the sites you're scraping.
However, even when we scrape public data, we're subject to both a website's terms and conditions and their `robots.txt` file. You can view those for IMDB below.
Violating either of these could result in either suspension or even permanent banning.
Potential Consequences of Misuse
-
Account Suspension or Blocking: If scraping is done excessively or against the site's ToS, your IP address or account could be banned. This can permanently prevent access to the target site.
-
Legal Penalties: Improper scraping can result in lawsuits, hefty fines, and legal penalties. For example, companies like LinkedIn have taken legal actions against unauthorized scrapers for violating their ToS, claiming damages for lost revenue and resources.
-
Reputation Damage: Misusing web scraping tools can damage the reputation of individuals or businesses involved, especially in cases where scraping leads to publicized legal disputes or privacy violations.
-
Risk to Users: If scraped data contains personal or sensitive information, misuse can harm the individuals involved. This may expose scrapers to lawsuits or fines under data protection laws, making it critical to anonymize or aggregate sensitive data to avoid direct harm to users.
Web scraping can provide immense value for business insights, competitive research, and data analysis, but it must be done responsibly.
Always respect website terms of service, comply with privacy policies, and ensure that your scraping activities remain within legal and ethical boundaries. By doing so, you protect yourself from legal repercussions and help foster a fair, transparent digital ecosystem.
Conclusion
When using ZenRows, there are numerous ways we can access websites. Whether you're using their SDK, calling the API directly, or configuring the proxy straight into your HTTP client, you have an easy and reliable way to get your data.
While the price barrier to ZenRows might seem pretty high, we can also gain access to ZenRows under the hood by using ScrapeOps for about half the price.
Both of these solutions will help you get the data you need.
More Web Scraping Guides
At ScrapeOps, we've got tons of learning resources. Whether you're brand new to scraping or a hardened developer, we have something for you. We wrote the playbook on scraping with Python.
Bookmark one of the articles below and level up your skillset!