Cloudflare Error 1020: How To Bypass Denied Access
Cloudflare Error 1020 is an Access Denied error caused by violating a firewall rule on a Cloudflare WAF-protected site. While it occurs infrequently, it's crucial to identify the root causes and take appropriate action.
In this guide, you'll learn how to solve the Cloudflare error 1020, both for normal users and web scrapers. The guide will focus primarily on strategies for web scrapers.
- What Is Cloudflare’s Error 1020?
- Solving Cloudflare Error 1020 For Normal Users
- Solving Cloudflare Error 1020 For Web Scrapers
- Strategies For Bypassing Cloudflare
- Legal and Ethical Implications
- More Web Scraping Guides
What is Cloudflare’s Error 1020?
Cloudflare Error 1020, also known as the "Access Denied" error, is generated by a Cloudflare-protected website when someone violates specific firewall rules.
Cloudflare uses these rules to protect websites from excessive or malicious requests. The firewall rules are a set of criteria that determine whether to allow or block specific traffic.
For example, if a web scraper makes too many requests too frequently or in patterns that seem automated, the Cloudflare firewall rules can detect and stop the web scraper to protect against potential threats.
The Cloudflare anti-bot system detects web scrapers using a variety of technologies, including:
- TLS Fingerprinting
- JavaScript fingerprinting
- IP address analysis
- HTTP connection analysis (connection patterns, headers, etc.)
What are the Possible Causes for Error 1020 Access Denied?
You’re getting Error 1020 when scraping a website because Cloudflare detected suspicious activity from your client or browser. This occurs for several reasons, including:
- Making excessively frequent requests from the same IP address.
- Engaging in activity that is not typical of a human user.
- Using an IP address associated with prohibited activity or previous violations.
- Connecting from an IP address located in a restricted geographic region.
- In rare cases, temporary issues with Cloudflare itself can cause Error 1020, even for legitimate users.
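Before changing anything in a scraper, it helps to confirm that a failed request really is an Error 1020 block. The following is a minimal sketch, assuming the block arrives as an HTTP 403 response whose body mentions error 1020; the function names are hypothetical and the text match is a heuristic, not an official Cloudflare contract.

```javascript
// Heuristic check for a Cloudflare Error 1020 block page.
// Assumption: the block is served as HTTP 403 with "Error 1020" or
// "error code: 1020" somewhere in the response body.
function isCloudflare1020(status, body) {
  if (status !== 403) return false;
  return /error(?:\s+code)?:?\s*1020/i.test(body);
}

// Hypothetical helper: fetch a URL (Node 18+ global fetch) and report
// whether the response looks like an Error 1020 block.
async function checkUrl(url) {
  const response = await fetch(url);
  const body = await response.text();
  return {
    status: response.status,
    blocked: isCloudflare1020(response.status, body),
  };
}
```

Calling `checkUrl('https://example.com/')` resolves to an object with the status code and a `blocked` flag, which a scraper could use to decide when to switch proxies or headers.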
How to Bypass Cloudflare 1020?
To bypass Cloudflare Error 1020, you can use different techniques, such as:
- Customize your client's User-Agent header.
- Use a rotating proxy to hide your IP address.
- Hide your automated requests by using Undetected ChromeDriver.
Or, consider using ScrapeOps Proxy Aggregator. It’s an All-In-One Proxy API that allows you to use 20+ proxy providers from a single API.
Solving Cloudflare Error 1020 For Normal Users
If you’re a legitimate user and unintentionally getting a Cloudflare 1020 error, then you can follow these steps:
- Contact the site owner:
If you receive an Access Denied error while visiting a site, contact the site owner and ask them to check their Firewall Rules. Only the owner of the site can tell you why you’re unable to access it.
- Restart your router and internet connection:
Restarting your router and internet connection is a simple fix for many connectivity issues. If you’re on a corporate network or public Wi-Fi, network-level restrictions may be the cause, so try accessing the site from a different network.
- Check for browser-related issues:
Clear your browser's cache and cookies, as corrupted data in your cache can sometimes cause problems with website access.
Additionally, close and reopen your browser, as this can resolve temporary glitches. If you're still experiencing issues, try accessing the website using a different browser to identify if the issue is specific to your current browser.
- Check for site-wide issues:
Check whether the problem occurs only when accessing a specific page or the whole site. If other pages work, but you encounter an error on one particular page, close the website and go to the same page again.
Additionally, temporarily disable any ad-blockers or extensions. You can also try accessing the website at a different time.
- Wait and Retry:
Sometimes, the issue is temporary. Wait a few minutes and then try reaccessing the website.
Solving Cloudflare Error 1020 For Web Scrapers
Cloudflare's error 1020, "Access denied", can be a frustrating obstacle for web scrapers. However, you can overcome this hurdle with the following tried-and-tested solutions:
Option 1: Use a Rotating Proxy to Hide Your IP
To avoid getting banned while web scraping, send your requests through a pool of proxies and rotate them regularly. Each rotation changes the IP address the website sees, making it difficult for the website to detect and block you.
Rotating proxies is essential because websites can track the number of requests coming from a particular IP address and block scrapers that make excessive requests. Using multiple IP addresses helps you avoid detection and blocking.
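The rotation itself can be sketched in a few lines. This is a minimal round-robin example, not a complete proxy manager; `createProxyRotator` is a hypothetical helper and the proxy URLs are placeholders.

```javascript
// Round-robin proxy rotator.
// Assumption: the proxy URLs below are placeholders, not real endpoints.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let index = 0;
  return {
    // Return the next proxy and advance, wrapping around at the end
    next() {
      const proxy = proxies[index];
      index = (index + 1) % proxies.length;
      return proxy;
    },
  };
}

const rotator = createProxyRotator([
  'http://proxy1.example.com:8000',
  'http://proxy2.example.com:8000',
  'http://proxy3.example.com:8000',
]);
```

Each call to `rotator.next()` returns the next proxy in the pool. You could pass the result to your HTTP client's proxy setting, or to Puppeteer as a `--proxy-server=...` launch argument.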
Take a look at our guide on how to use & rotate proxies to learn what solutions are the best.
Also, consider using the ScrapeOps Proxy Aggregator, which offers over 20 proxy providers with the best performance and price, so you never have to worry about rotating a proxy, CAPTCHAs, or setting up headless browsers again.
Option 2: Customize and Rotate User-Agent Headers
In web scraping, the User Agent (UA) string is a crucial header as it informs the website about the sender's web browser, operating system, and other relevant details. The firewall uses the UA to detect and block bots.
Cloudflare-protected websites can detect multiple requests made with the same User Agent. Therefore, rotating user agents will make your requests appear as if they came from legitimate users, thus avoiding Cloudflare Error 1020.
Use a pool of recent and popular user agents. Also, ensure that your User Agent strings are correctly formatted and match the other headers for the best results. You can find the latest web browser and operating system user agents here.
If you want to learn more about rotating browser headers, read Headers & User-Agents Optimization Checklist. Also, read its implementation using Puppeteer here.
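As a rough illustration of rotating user agents while keeping the other headers consistent, the sketch below picks a whole browser profile at random instead of only the User-Agent string. The profiles are illustrative examples that will go stale, and `pickHeaders` is a hypothetical helper.

```javascript
// A small pool of browser profiles. Each profile keeps the User-Agent and
// related headers consistent with each other.
// Assumption: these strings are examples only; refresh them from a current list.
const BROWSER_PROFILES = [
  {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Sec-CH-UA-Platform': '"Windows"',
  },
  {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Sec-CH-UA-Platform': '"macOS"',
  },
  {
    // Firefox does not send the Sec-CH-UA client hint headers
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Accept-Language': 'en-US,en;q=0.5',
  },
];

// Pick one profile at random. The random source is injectable for testing.
function pickHeaders(random = Math.random) {
  const index = Math.floor(random() * BROWSER_PROFILES.length);
  return { ...BROWSER_PROFILES[index] };
}
```

The returned object can be passed as the `headers` option of `fetch`, or applied in Puppeteer with `page.setExtraHTTPHeaders()`.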
Option 3: Mask Headless Browser with Undetected ChromeDriver
The standard Selenium ChromeDriver leaks a lot of information that anti-bot systems can use to determine if it’s an automated browser/scraper or a real user visiting the website.
The Undetected ChromeDriver is an optimized version designed to bypass the detection mechanisms of sophisticated anti-bot systems like DataDome, Imperva, PerimeterX, Botprotect, and Cloudflare.
If you want to learn in depth how to make your Selenium scrapers harder to detect, read our guide on Selenium Undetected ChromeDriver: Bypass Anti-Bots With Ease.
Option 4: Use Web Scraping APIs
Implementing these solutions, such as a proxy and HTTP header rotator, can require a significant amount of code, expertise, and budget to operate effectively at scale. Additionally, these solutions may not be effective for all websites. However, you can use a web scraping API to avoid all that.
ScrapeOps is an anti-bot toolkit that allows developers to easily bypass Cloudflare and all other challenges. The ScrapeOps Proxy Aggregator will automatically use the best proxy provider from a pool of over 20 proxy providers, so you never need to worry about getting blocked. Sign up to get 1,000 free API credits.
Strategies For Bypassing Cloudflare
Sometimes, if the website has Cloudflare's anti-bot system enabled, implementing the above changes alone won't be enough to avoid Cloudflare's Error 1020.
You’ll need to use a system specifically designed to bypass Cloudflare's server-side and client-side anti-bot checks.
Many websites are now using Cloudflare's anti-bot system, and bypassing it can be challenging. Users often need to use premium proxies and optimized headless browsers to overcome these barriers.
If you want to dive deep into how to bypass Cloudflare and explore the various options for doing it, read our How To Bypass Cloudflare in 2023 guide.
Here's the list of options. Choose the one that works best for you.
- Option #1: Send Requests To Origin Server:
One of the easiest ways to bypass Cloudflare is to send the request directly to the website's original server IP address rather than Cloudflare's content delivery network (CDN).
- Option #2: Scrape Google Cache Version:
Scrape data from Google Cache instead of the actual website. Scraping from the Google cache can be easier than scraping from a Cloudflare-protected website, but it's a viable choice only when the content on the website you want to scrape doesn't undergo frequent changes.
- Option #3: Cloudflare Solvers:
One way to bypass Cloudflare is to use one of several Cloudflare solvers that solve the Cloudflare challenges. However, these solvers may become outdated and cease to function due to Cloudflare updates. Currently, the most effective Cloudflare solver is FlareSolverr.
- Option #4: Scrape With Fortified Headless Browsers:
The other option is to perform the entire scraping task using a headless browser that has been fortified to look like a real user's browser. Developers have released fortified versions of several headless browser tools, including Puppeteer, Playwright, and Selenium, that address the most significant vulnerabilities.
- Option #5: Smart Proxy With Cloudflare Built-In Bypass:
A potential drawback of using open-source Cloudflare Solvers and Pre-Fortified Headless Browsers is that anti-bot companies like Cloudflare can see how they bypass their anti-bot protection systems and easily patch the issues that they exploit.
An alternative to open-source Cloudflare bypass methods is to utilize smart proxies that maintain their own private Cloudflare bypass mechanisms. However, one of the most effective options is to use the ScrapeOps Proxy Aggregator, which integrates over 20 proxy providers into a single proxy API and identifies the most suitable and cost-effective proxy provider for your target domains.
- Option #6: Reverse Engineer Cloudflare Anti-Bot Protection:
The most complex way to bypass the Cloudflare anti-bot protection is to reverse engineer Cloudflare's anti-bot protection system and develop a bypass that passes all Cloudflare anti-bot checks without the need to use a full fortified headless browser instance.
The advantage of this approach is that if you’re scraping at large scales you don't want to run hundreds (if not thousands) of costly full headless browser instances. You can instead develop the most resource-efficient Cloudflare bypass possible, one that is solely designed to pass the Cloudflare JS, TLS, and IP fingerprint tests.
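To make Option #3 more concrete, here is a minimal sketch of sending a page request through FlareSolverr. It assumes a FlareSolverr instance is already running locally on its default port and that its `/v1` endpoint accepts a `request.get` command; the helper names are hypothetical, and the exact request and response fields should be checked against the FlareSolverr documentation for your version.

```javascript
// Assumption: FlareSolverr is running locally on its default port.
const FLARESOLVERR_URL = 'http://localhost:8191/v1';

// Build the fetch options for a FlareSolverr "request.get" command
function buildFlareSolverrRequest(targetUrl, maxTimeoutMs = 60000) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      cmd: 'request.get',
      url: targetUrl,
      maxTimeout: maxTimeoutMs,
    }),
  };
}

// Hypothetical helper: returns the solved page HTML (Node 18+ global fetch)
async function fetchViaFlareSolverr(targetUrl) {
  const response = await fetch(FLARESOLVERR_URL, buildFlareSolverrRequest(targetUrl));
  const data = await response.json();
  if (data.status !== 'ok') {
    throw new Error(`FlareSolverr failed: ${data.message}`);
  }
  return data.solution.response;
}
```

Because the solver runs a real browser behind the scenes, each request is slower and heavier than a plain HTTP call, so this approach suits low-volume scraping best.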
Let’s have a look at the two most popular methods.
Scrape With Fortified Headless Browsers
It’s important to fortify your Puppeteer scraper when using it for web scraping because it reveals many fingerprints that can be used to identify it.
One well-known leak found in headless browsers such as Puppeteer, Playwright, and Selenium is the value of navigator.webdriver. This value is false in regular browsers, but it’s true in unfortified headless browsers.
There are over 200 known headless browser leaks, but anti-bot companies often keep some of these leaks secret.
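The navigator.webdriver leak is simple enough to check yourself. The sketch below shows the kind of test a detection script might run; `hasWebdriverLeak` is a hypothetical helper that takes a navigator-like object so it can also run outside a browser.

```javascript
// Returns true when a navigator-like object exposes the automation flag.
// Real anti-bot scripts combine many signals; this checks only one of them.
function hasWebdriverLeak(nav) {
  return Boolean(nav && nav.webdriver === true);
}

// Simulated navigator objects for illustration
const regularBrowser = { webdriver: false };
const unfortifiedHeadless = { webdriver: true };

console.log(hasWebdriverLeak(regularBrowser));      // false
console.log(hasWebdriverLeak(unfortifiedHeadless)); // true
```

In a Puppeteer script, you could read the real value with `await page.evaluate(() => navigator.webdriver)` to see what a website would observe.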
The easiest way to fortify your headless browser is to use the stealth plugin. Developers have released several fortified headless browsers that fix the most important leaks, and the stealth plugin for Puppeteer is one of them.
Headless browser stealth plugins patch a large majority of these browser leaks and can often bypass many anti-bot services, such as Cloudflare, PerimeterX, Incapsula, and DataDome, depending on the security level implemented on the website.
Another way to make your headless browsers more undetectable is to combine them with high-quality residential or mobile proxies. These proxies usually have higher IP address reputation scores than data center proxies, and anti-bot services are less likely to block them, making the setup more reliable. The downside of pairing headless browsers with residential or mobile proxies is that costs can rack up fast.
Residential and mobile proxies are typically charged per GB of bandwidth used, and a page rendered with a headless browser consumes about 2 MB on average (compared to 250 KB without a headless browser). This means it can get very expensive as you scale.
The following is an example of using residential proxies from BrightData with a headless browser assuming 2MB per page.
Pages | Bandwidth | Cost Per GB | Total Cost |
---|---|---|---|
25,000 | 50 GB | $13 | $625 |
100,000 | 200 GB | $10 | $2,000 |
1,000,000 | 2,000 GB | $8 | $16,000 |
Note: If you want to compare proxy providers to find cheap residential and mobile proxies, you can use this free proxy comparison tool, which can compare residential proxy plans and mobile proxy plans.
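The arithmetic behind these estimates is easy to reproduce for your own workload. The sketch below is a hypothetical calculator that assumes decimal units (1 GB = 1,000 MB) and a flat per-GB price; real proxy plans may use tiers or minimum commitments.

```javascript
// Estimate bandwidth and cost for a scraping job.
// Assumptions: 1 GB = 1,000 MB and a flat price per GB.
function estimateProxyCost(pages, mbPerPage, pricePerGb) {
  const bandwidthGb = (pages * mbPerPage) / 1000;
  return { bandwidthGb, totalCost: bandwidthGb * pricePerGb };
}

// 100,000 pages at 2 MB per page (headless browser) and $10 per GB
console.log(estimateProxyCost(100000, 2, 10));    // { bandwidthGb: 200, totalCost: 2000 }

// The same job at 0.25 MB per page (no headless browser)
console.log(estimateProxyCost(100000, 0.25, 10)); // { bandwidthGb: 25, totalCost: 250 }
```

At the same per-GB price, dropping the headless browser cuts the bandwidth bill by a factor of eight in this example, which is why per-page size matters so much at scale.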
Example: The following example shows how to use Puppeteer Extra with the Stealth plugin.
// Import Puppeteer Extra and the Stealth plugin
const puppeteer = require('puppeteer-extra');
const stealthPlugin = require('puppeteer-extra-plugin-stealth');
// Enable the Stealth plugin with all evasions
puppeteer.use(stealthPlugin());
// Main function to take a screenshot of a webpage
(async () => {
  // Launch the browser in headless mode
  const browser = await puppeteer.launch({
    args: ['--no-sandbox'],
    headless: true
  });
  // Create a new page instance
  const page = await browser.newPage();
  // Navigate to the specified webpage
  const targetUrl = 'https://quotes.toscrape.com/';
  await page.goto(targetUrl);
  // Save a screenshot of the current page
  const screenshotPath = 'screenshot.png';
  await page.screenshot({
    path: screenshotPath
  });
  // Log a message indicating that the screenshot has been saved
  console.log('Screenshot saved at:', screenshotPath);
  // Close the browser
  await browser.close();
})();
The Stealth plugin (puppeteer-extra-plugin-stealth) applies various techniques to make Puppeteer harder to detect. Without it, websites can easily detect Puppeteer and flag your requests as coming from a bot.
Smart Proxy With Cloudflare Built-In Bypass
The downside of using open-source Cloudflare solvers and pre-fortified headless browsers is that anti-bot companies, such as Cloudflare, can see exactly how these tools bypass their protection systems. This makes it easy for them to patch the issues that the tools exploit.
Therefore, most open-source Cloudflare bypasses have a limited lifespan of only a few months before they become ineffective.
The alternative to using open-source Cloudflare bypasses is to use smart proxies that develop and maintain their private Cloudflare bypasses. These proxies are typically more reliable, as they are more difficult for Cloudflare to detect and patch.
Additionally, they are developed by proxy companies with a financial incentive to stay ahead of Cloudflare and promptly address any bypass disruptions.
Many smart proxy providers, such as ScraperAPI, ScrapingBee, Oxylabs, and Smartproxy, offer some form of Cloudflare bypass, with varying effectiveness and cost. One of the best options is the ScrapeOps Proxy Aggregator, which integrates over 20 proxy providers into a single proxy API and finds the best and cheapest proxy provider for your target domains.
You can activate ScrapeOps' Cloudflare Bypass by simply adding bypass=cloudflare_level_1 to your API request, and the ScrapeOps proxy will use the best & cheapest Cloudflare bypass available for your target domain.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // URL-encode the target URL so it is passed safely as a query parameter
  const targetUrl = encodeURIComponent('http://example.com/');
  const proxyUrl = `https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=${targetUrl}&bypass=cloudflare_level_1`;
  // Use Puppeteer to navigate to the URL
  await page.goto(proxyUrl);
  // Wait for some content to be present (you may need to adjust this depending on the website)
  await page.waitForSelector('body');
  // Extract content
  const content = await page.content();
  console.log('Body:', content);
  // Close the browser
  await browser.close();
})();
Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.
To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.
Security Level | Bypass | API Credits | Description |
---|---|---|---|
Low | cloudflare_level_1 | 10 | Use to bypass Cloudflare protected sites with low security settings enabled. |
Medium | cloudflare_level_2 | 35 | Use to bypass Cloudflare protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests. |
High | cloudflare_level_3 | 50 | Use to bypass Cloudflare protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests. |
You can get a ScrapeOps API key with 1,000 free API credits by signing up here.
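To get a feel for how far a credit balance goes at each security level, the credit costs from the table can be turned into a quick estimate. This is a hypothetical helper, and it ignores the adjusted credit multiples that the table mentions for large plans.

```javascript
// Credit cost per request, taken from the table above
const BYPASS_CREDITS = {
  cloudflare_level_1: 10,
  cloudflare_level_2: 35,
  cloudflare_level_3: 50,
};

// How many full requests a credit balance covers at a given bypass level
function requestsForCredits(credits, bypass) {
  const costPerRequest = BYPASS_CREDITS[bypass];
  if (costPerRequest === undefined) {
    throw new Error(`Unknown bypass: ${bypass}`);
  }
  return Math.floor(credits / costPerRequest);
}

console.log(requestsForCredits(1000, 'cloudflare_level_1')); // 100
console.log(requestsForCredits(1000, 'cloudflare_level_2')); // 28
console.log(requestsForCredits(1000, 'cloudflare_level_3')); // 20
```

So the 1,000 free credits cover roughly 100 requests at the lowest level and 20 at the highest, which is enough to test which level a target site needs.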
The advantage of taking this approach is that you can use your normal HTTP client and don't have to worry about:
- Finding origin servers
- Fortifying headless browsers
- Managing numerous headless browser instances & dealing with memory issues
- Reverse engineering the Cloudflare anti-bot protection
This is all managed within the ScrapeOps Proxy Aggregator.
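As an illustration of the "normal HTTP client" approach, here is a minimal sketch using the built-in `fetch` in Node 18+. The endpoint and the `api_key`, `url`, and `bypass` parameters are the ones shown in this guide; the helper names are hypothetical, and the response handling is an assumption that the API returns the target page's HTML as the response body.

```javascript
const SCRAPEOPS_ENDPOINT = 'https://proxy.scrapeops.io/v1/';

// Build the proxy API URL. URLSearchParams takes care of URL-encoding the target.
function buildScrapeOpsUrl(apiKey, targetUrl, bypass = 'cloudflare_level_1') {
  const params = new URLSearchParams({
    api_key: apiKey,
    url: targetUrl,
    bypass,
  });
  return `${SCRAPEOPS_ENDPOINT}?${params.toString()}`;
}

// Hypothetical helper: fetch a page through the proxy and return the body text.
// Assumption: a successful response body contains the target page's HTML.
async function fetchThroughScrapeOps(apiKey, targetUrl, bypass) {
  const response = await fetch(buildScrapeOpsUrl(apiKey, targetUrl, bypass));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `fetchThroughScrapeOps('YOUR_API_KEY', 'http://example.com/')` resolves to the page body without launching a browser at all.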
Legal and Ethical Implications
While Cloudflare's access denied error can be an obstacle to data collection for web scrapers, understanding the underlying reasons, including firewall rules and data protection strategies, can help one avoid the 1020 error.
Unauthorized scraping or violating the terms of service can have legal consequences. Excessive scraping can overload website servers, ultimately hindering site performance for other users.
It’s crucial to emphasize that scraping should never involve collecting or using personal information without the user's consent.
Here are some tips on how to scrape responsibly and efficiently without harming the target website.
- Read the Robots.txt File:
Before scraping data from a website, always check the robots.txt file to understand which parts of the site can be scraped and which should not be accessed. This file helps prevent excessive server load and potential violations of the website's terms of service.
- Minimize Request Frequency:
Avoid sending excessive requests to the website in a short period. Wait a bit between each request to give the server a break. Act like a human user when browsing the site.
- Respect Data Privacy:
Only collect data that is freely available online without needing to log in or bypass security. Don't access private or personal information without permission.
- Deal with anti-scraping measures:
To avoid being blocked by anti-scraping measures, use headless browsers, proxies, and CAPTCHA solvers. Be careful when using proxies and rotate your IP addresses often.
- Use Batching or Pagination:
When handling large datasets, use batching or pagination to break down the data into manageable chunks. Batching keeps only the active batch of data in memory, optimizing memory usage, and boosting performance. Pagination divides the dataset into smaller pages for easy navigation and faster loading times.
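The request-frequency and batching tips above can be sketched together as follows. This is a minimal example, assuming you supply your own `fetchPage` and `saveBatch` functions (both hypothetical); the default delay values are arbitrary starting points, not recommendations from any website.

```javascript
// Split a list into batches of at most `size` items
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Base delay plus random jitter, so requests are not perfectly regular
function jitteredDelayMs(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process URLs one at a time with a pause between requests, flushing each
// batch of results so only the active batch is held in memory.
async function scrapeInBatches(urls, fetchPage, saveBatch, options = {}) {
  const { batchSize = 10, baseMs = 2000, jitterMs = 1000 } = options;
  for (const batch of chunk(urls, batchSize)) {
    const results = [];
    for (const url of batch) {
      results.push(await fetchPage(url));
      await sleep(jitteredDelayMs(baseMs, jitterMs));
    }
    await saveBatch(results);
  }
}
```

Requests run sequentially on purpose: it keeps the load on the target server low and predictable, at the cost of a slower overall crawl.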
More Web Scraping Guides
This guide covered how to solve Cloudflare Error 1020: Access Denied for both normal users and web scrapers. It detailed strategies for bypassing Cloudflare, followed by a discussion of the legal and ethical implications.
Check out one of our more in-depth guides: