Cloudflare Error 1015: How To Bypass Rate Limiting
When a driver goes faster than the speed limit, they get pulled over by the police. Similarly, if a child eats too much candy too quickly, their parents might cut off the supply for a while. These scenarios are comparable to Cloudflare Error 1015.
Cloudflare Error 1015 happens when a website sets a limit on how many requests you can make, and you go over that limit by sending too many requests in a short time. This error is often shown when you’re trying to scrape a website.
This guide looks at resolving Cloudflare Error 1015 from a normal user's perspective, but primarily from a web scraper's perspective.
- What Is Cloudflare’s Error 1015?
- Solving Cloudflare Error 1015 For Normal Users
- Solving Cloudflare Error 1015 For Web Scrapers
- Strategies For Bypassing Cloudflare
- Importance of Respecting Website Rate Limits
- More Web Scraping Guides
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
What Is Cloudflare’s Error 1015?
Error 1015 is a rate-limiting error that notifies users when they have exceeded the allowed number of requests within a specific timeframe. In general, this error occurs when a user tries to scrape a website protected by Cloudflare by sending a large volume of requests rapidly.
The error is a challenge for web scrapers, as they get temporarily blocked from accessing the site, disrupting web scraping activities, which leads to limited data retrieval and delays.
Cloudflare rate limiting is implemented by website owners and runs within an application. It tracks the IP addresses from which requests are coming and the time elapsed between each request.
If there are too many requests from a single IP within the given timeframe, the rate-limiting application effectively says "Hey, please slow down", and the user's requests will not be fulfilled for a certain amount of time.
Rate limiting is implemented to block users, bots, or applications that are overusing or abusing a web property. It can stop certain kinds of bot attacks, such as brute force attacks, DoS, and DDoS attacks.
How Long Does Error Code 1015 Last?
It depends on the website owner’s configuration. In some cases, the error is temporary, and the user can access the website after a few minutes. However, in other cases, the website owner may configure Cloudflare to permanently ban an IP address that repeatedly exceeds the rate limit.
How Long Is a Rate Limit Ban by Cloudflare?
Cloudflare users can set the ban duration to between 10 seconds and 24 hours, depending on their plan. On the Free and Pro plans, you can only block for one hour, while higher plans offer more flexibility.
The Cloudflare API has a global rate limit of 1200 requests per five minutes per user. If this is exceeded, all API calls for the next five minutes will be blocked.
Solving Cloudflare Error 1015 For Normal Users
To solve this error, you can follow these steps.
- Wait and Retry: The simplest solution is to wait for the time specified in the error message before trying to access the website again. Once this cooldown period has elapsed, you can access the website again. This solution suits normal users who are not actively scraping or making excessive requests to the website.
- Check Your Network: If you're on a shared network, other users could be contributing to the traffic causing this error. In a shared network environment, multiple users access the internet using the same IP address. These users can contribute to the overall traffic if they make a large number of requests to Cloudflare-protected websites. Disconnecting other devices or switching networks might help.
- Disable Browser Extensions: Some browser extensions, especially those that automatically refresh pages or change request headers, might trigger Cloudflare's rate-limiting feature. The extensions might generate large numbers of requests in a short time or alter headers in a manner that triggers security measures. To resolve this issue, try disabling these extensions and accessing the site again.
- Clear Browser Cache and Cookies: Browsers store copies of certain elements from websites to load pages faster on subsequent visits. If this data becomes outdated or corrupted, it may lead to errors when trying to access a website. Clearing your browser's cache and cookies can sometimes resolve this issue.
- Change Your IP Address: Using a Virtual Private Network (VPN) offers a more reliable and controlled way to change your IP address. VPNs mask your real IP address and route your traffic through servers in various locations.
- Check for Malware: Malware on your device can sometimes send automated traffic to websites, triggering rate limiting. Run a malware scan to identify and remove any malicious software causing this issue.
Solving Cloudflare Error 1015 For Web Scrapers
When web scraping at scale, particularly with headless browsers like Puppeteer, it is common to run into rate limiting from services like Cloudflare.
To bypass rate limiting and avoid Cloudflare Error 1015, users can implement various strategies, such as throttling requests, using reliable premium proxies, utilizing web scraping APIs, and respecting website rate limits.
Responsible Request Management
When a user interacts with a website, each action they take demands a portion of the website's resources, and the site will slow down if its traffic is not managed. Consider the following points for responsible request management (a minimal throttling sketch follows the list):
- Ensure that only a limited number of requests are sent in a given time, and wait for each request to complete before making the next one.
- Avoid making numerous requests in parallel, as this can lead to higher request rates and increase the likelihood of triggering rate-limiting rules.
- If a website returns the same data to multiple users, use a caching mechanism to serve the response from the cache instead of generating it again.
- Batch multiple requests into a single request if possible. By doing this, you reduce the overall number of requests and stay within the limits allowed.
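As a minimal sketch of the first two points, the following Node.js snippet sends requests one at a time with a fixed delay between them (assuming Node 18+ for the global fetch API; the URLs are placeholders):

```js
// Minimal throttling sketch: send requests sequentially with a fixed delay.
// Assumes Node 18+ (global fetch); the URLs below are placeholders.
const urls = [
  'https://quotes.toscrape.com/page/1/',
  'https://quotes.toscrape.com/page/2/',
  'https://quotes.toscrape.com/page/3/',
];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  for (const url of urls) {
    const response = await fetch(url); // wait for each request to complete...
    console.log(url, '->', response.status);
    await sleep(2000); // ...then pause 2 seconds before sending the next one
  }
})();
```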
Rotating Proxies
To avoid getting banned while web scraping, you need to use different proxies and rotate them regularly. Services like Cloudflare impose rate limits on the number of requests a single IP address can make within a given timeframe. Therefore, rotating proxies is essential because websites can track the number of requests originating from a particular IP address and block scrapers that make excessive requests.
Using proxies helps you distribute your requests across multiple IP addresses, making it more difficult for websites to detect and block your web scrapers.
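As a minimal sketch (the proxy addresses below are placeholders, not working endpoints), you can pick a random proxy from a pool for each Puppeteer session using Chrome's `--proxy-server` flag:

```js
// Minimal proxy rotation sketch: pick a random proxy from a pool per session.
// The proxy addresses are placeholders; swap in real endpoints.
const puppeteer = require('puppeteer');

const proxies = [
  'http://111.111.111.111:8080',
  'http://222.222.222.222:8080',
  'http://333.333.333.333:8080',
];

(async () => {
  const proxy = proxies[Math.floor(Math.random() * proxies.length)];
  const browser = await puppeteer.launch({
    headless: 'new',
    args: [`--proxy-server=${proxy}`], // route this session through the proxy
  });
  const page = await browser.newPage();
  await page.goto('http://httpbin.org/ip'); // echoes back the IP the site sees
  console.log(await page.evaluate(() => document.body.innerText));
  await browser.close();
})();
```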
If you want to learn more about implementing proxy rotation, read Python Requests: How to Use & Rotate Proxies.
Using Premium Proxies
A proxy server serves as an intermediary between you and your target website. By routing your requests through multiple proxies, you effectively distribute traffic load across different IP addresses, ultimately avoiding Cloudflare Error 1015.
Websites often detect and block free proxies, particularly those hosted on shared data centers. To avoid this issue, consider using premium proxies, preferably residential ones. Residential proxies are commonly used for production crawlers and scrapers.
The ScrapeOps Proxy API Aggregator enables you to use higher-quality (more expensive) proxy providers by adding `premium=true` to your requests. However, if you set `premium=true`, all your requests will go through these proxy providers from the start.
curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&premium=true"
If you would like to use residential proxies, add `residential=true` to your requests. Or, if you would like to use mobile proxies, add `mobile=true` to your requests.
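For example, mirroring the curl request above (with the same placeholder API key and test URL):

```bash
curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&residential=true"

curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything&mobile=true"
```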
Rotating Browser Headers
Each HTTP request contains headers that provide details about the sender to the web server. In web scraping, the User Agent (UA) string is the most important header as it informs the website of the sender's web browser, operating system, and other relevant details.
Cloudflare-protected websites can detect multiple requests made with the same User Agent. So rotating user-agents will make your requests appear as if they came from different users, thus avoiding Cloudflare Error 1015.
Use a pool of recent and popular user agents. Also, ensure that your User Agent strings are correctly formatted and match the other headers for the best results. You can find the latest web browser and operating system user agents here.
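As a minimal sketch, you can rotate the User Agent per Puppeteer session with `page.setUserAgent` (the UA strings below are illustrative examples; keep your own pool current):

```js
// Minimal user-agent rotation sketch: pick a random UA for each session.
// The UA strings are illustrative; keep your pool recent and popular.
const puppeteer = require('puppeteer');

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  const userAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
  await page.setUserAgent(userAgent); // apply the rotated UA to this page
  await page.goto('http://httpbin.org/headers'); // echoes request headers back
  console.log(await page.evaluate(() => document.body.innerText));
  await browser.close();
})();
```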
If you want to learn more about rotating browser headers, read Headers & User-Agents Optimization Checklist. Also, read its implementation using Puppeteer here.
Use Web Scraping APIs
Implementing these solutions, such as a proxy and HTTP header rotator, can require a significant amount of code, expertise, and budget to operate effectively at scale. Additionally, these solutions may not be effective for all websites. However, you can use a web scraping API to avoid all that.
ScrapeOps is an anti-bot toolkit that allows developers to easily bypass Cloudflare and all other challenges. ScrapeOps Proxy API Aggregator will automatically use the best proxy provider from the pool of over 20+ proxy providers, so you never need to worry about getting blocked. Sign up to get 1,000 free API credits.
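As a minimal sketch (assuming Node 18+ for the global fetch API; the endpoint and parameters are taken from the curl example earlier, and YOUR_API_KEY is a placeholder), a request routed through the ScrapeOps Proxy API Aggregator looks like this:

```js
// Minimal sketch: route a scraping request through the ScrapeOps Proxy API.
// Assumes Node 18+ (global fetch); YOUR_API_KEY is a placeholder.
const apiKey = 'YOUR_API_KEY';
const targetUrl = 'http://httpbin.org/anything';
const proxyUrl = `https://proxy.scrapeops.io/v1/?api_key=${apiKey}&url=${encodeURIComponent(targetUrl)}`;

(async () => {
  const response = await fetch(proxyUrl);
  console.log('Status:', response.status);
  console.log(await response.text());
})();
```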
Strategies For Bypassing Cloudflare
Sometimes, if the website has Cloudflare's anti-bot system enabled, implementing the above changes alone won't be enough to avoid Cloudflare's Error 1015. You will need to use a system specifically designed to bypass Cloudflare's server-side and client-side anti-bot checks.
Many websites are now using Cloudflare's anti-bot system, and bypassing it can be challenging. Users often need to use premium proxies and optimized headless browsers to overcome these barriers.
If you want to dive deep into how to bypass Cloudflare and explore the various options for doing so, read our guide How To Bypass Cloudflare in 2023.
Here's a list of the various options; choose the one that works best for you.
- Option #1: Send Requests To Origin Server: One of the easiest ways to bypass Cloudflare is to send the request directly to the website's origin server IP address rather than to Cloudflare's content delivery network (CDN).
- Option #2: Scrape Google Cache Version: Scrape data from Google Cache instead of the actual website. Scraping from the Google cache can be easier than scraping from a Cloudflare-protected website, but it's a viable choice only when the content on the website you want to scrape doesn't undergo frequent changes.
- Option #3: Cloudflare Solvers: One way to bypass Cloudflare is to use one of several Cloudflare solvers that solve the Cloudflare challenges. However, these solvers may become outdated and cease to function due to Cloudflare updates. Currently, the most effective Cloudflare solver is FlareSolverr.
- Option #4: Scrape With Fortified Headless Browsers: Another option is to perform the entire scraping task with a headless browser that has been fortified to look like a real user's browser. Developers have released fortified versions of headless browsers such as Puppeteer, Playwright, and Selenium that patch the most significant leaks.
- Option #5: Smart Proxy With Cloudflare Built-In Bypass: A drawback of using open-source Cloudflare solvers and pre-fortified headless browsers is that anti-bot companies like Cloudflare can see how they bypass their anti-bot protection systems and easily patch the issues that they exploit. An alternative is to use smart proxies that maintain their own private Cloudflare bypass mechanisms. One of the most effective options is the ScrapeOps Proxy Aggregator, which integrates over 20 proxy providers into a single proxy API and identifies the most suitable and cost-effective proxy provider for your target domains.
- Option #6: Reverse Engineer Cloudflare Anti-Bot Protection: The most complex way to bypass Cloudflare's anti-bot protection is to reverse engineer the protection system and develop a bypass that passes all of Cloudflare's anti-bot checks without a fully fortified headless browser instance. The advantage of this approach is efficiency: when scraping at large scales, you don't want to run hundreds (if not thousands) of costly full headless browser instances. Instead, you can develop the most resource-efficient Cloudflare bypass possible: one designed solely to pass the Cloudflare JS, TLS, and IP fingerprint tests.
Let’s have a look at the two most popular methods.
Scrape With Fortified Headless Browsers
It is important to fortify your Puppeteer scraper because, out of the box, it leaks many fingerprints that can be used to identify it.
One well-known leak found in headless browsers such as Puppeteer, Playwright, and Selenium is the value of `navigator.webdriver`. This value is `false` in regular browsers, but it is `true` in unfortified headless browsers.
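You can observe this leak for yourself with plain, unfortified Puppeteer. A minimal sketch:

```js
// Minimal sketch: inspect the navigator.webdriver leak in plain Puppeteer.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  // In a regular browser this is false; in unfortified headless mode it is true
  const isWebdriver = await page.evaluate(() => navigator.webdriver);
  console.log('navigator.webdriver:', isWebdriver);
  await browser.close();
})();
```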
There are over 200 known headless browser leaks, but anti-bot companies often keep some of these leaks secret.
The easiest way to fortify your headless browser is to use a stealth plugin. Developers have released several fortification packages that fix the most important leaks, and the stealth plugin for puppeteer-extra is one of them.
Headless browser stealth plugins patch a large majority of these browser leaks and can often bypass many anti-bot services, such as Cloudflare, PerimeterX, Incapsula, and DataDome, depending on the security level implemented on the website.
Another way to make your headless browsers more undetectable is to combine them with high-quality residential or mobile proxies. These proxies usually have higher IP address reputation scores than data center proxies, and anti-bot services are less likely to block them, making the setup more reliable. The downside of pairing headless browsers with residential or mobile proxies is that costs can rack up fast.
Residential and mobile proxies are typically charged per GB of bandwidth, and a page rendered with a headless browser consumes about 2MB on average (compared to roughly 250KB without one), so costs climb quickly as you scale.
The following is an example of using residential proxies from BrightData with a headless browser assuming 2MB per page.
| Pages | Bandwidth | Cost Per GB | Total Cost |
|---|---|---|---|
| 25,000 | 50 GB | $13 | $650 |
| 100,000 | 200 GB | $10 | $2,000 |
| 1 Million | 2 TB | $8 | $16,000 |
Note: If you want to compare proxy providers to find cheap residential and mobile proxies, you can use this free proxy comparison tool, which can compare residential proxy plans and mobile proxy plans.
Example: The following example shows how to use Puppeteer Extra with the Stealth plugin.
// Import Puppeteer and the Puppeteer Extra Stealth plugin
const puppeteer = require('puppeteer-extra');
const stealthPlugin = require('puppeteer-extra-plugin-stealth');
// Enable the Stealth plugin with all evasions
puppeteer.use(stealthPlugin());
// Main function to take a screenshot of a webpage
(async () => {
// Launch the browser in new headless mode
const browser = await puppeteer.launch({
args: ['--no-sandbox'],
headless: "new"
});
// Create a new page instance
const page = await browser.newPage();
// Navigate to the specified webpage
const targetUrl = 'https://quotes.toscrape.com/';
await page.goto(targetUrl);
// Save a screenshot of the current page
const screenshotPath = 'screenshot.png';
await page.screenshot({
path: screenshotPath
});
// Log a message indicating that the screenshot has been saved
console.log('Screenshot saved at:', screenshotPath);
// Close the browser
await browser.close();
})();
The `puppeteer-extra-plugin-stealth` plugin applies various techniques to make detecting Puppeteer harder. Websites can easily detect unfortified Puppeteer, and the goal of this plugin is to avoid that detection; if Puppeteer is detected, your requests will be flagged as coming from a bot.
Smart Proxy With Cloudflare Built-In Bypass
The downside of using open-source Cloudflare solvers and pre-fortified headless browsers is that anti-bot companies like Cloudflare can see exactly how they bypass their anti-bot protection systems, which makes it easy to patch the issues they exploit. As a result, most open-source Cloudflare bypasses only remain effective for a few months before they stop working.
The alternative to using open-source Cloudflare bypasses is to use smart proxies that develop and maintain their private Cloudflare bypasses. These proxies are typically more reliable, as they are more difficult for Cloudflare to detect and patch. Additionally, they are developed by proxy companies with a financial incentive to stay ahead of Cloudflare and promptly address any bypass disruptions.
Most smart proxy providers, such as ScraperAPI, ScrapingBee, Oxylabs, and Smartproxy, offer some form of Cloudflare bypass; these work to varying degrees and vary in cost. One of the best options is the ScrapeOps Proxy Aggregator, which integrates over 20 proxy providers into a single proxy API and finds the best and cheapest proxy provider for your target domains.
You can activate ScrapeOps' Cloudflare Bypass by simply adding `bypass=cloudflare_level_1` to your API request, and the ScrapeOps proxy will use the best and cheapest Cloudflare bypass available for your target domain.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: "new" });
const page = await browser.newPage();
// Use Puppeteer to navigate to the URL
await page.goto('https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://example.com/&bypass=cloudflare_level_1');
// Wait for some content to be present (you may need to adjust this depending on the website)
await page.waitForSelector('body');
// Extract content
const content = await page.content();
console.log('Body:', content);
// Close the browser
await browser.close();
})();
Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.
To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.
| Security Level | Bypass | API Credits | Description |
|---|---|---|---|
| Low | `cloudflare_level_1` | 10 | Use to bypass Cloudflare-protected sites with low security settings enabled. |
| Medium | `cloudflare_level_2` | 35 | Use to bypass Cloudflare-protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests. |
| High | `cloudflare_level_3` | 50 | Use to bypass Cloudflare-protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests. |
You can get a ScrapeOps API key with 1,000 free API credits by signing up here.
The advantage of taking this approach is that you can use your normal HTTP client and don't have to worry about:
- Finding origin servers
- Fortifying headless browsers
- Managing numerous headless browser instances & dealing with memory issues
- Reverse engineering the Cloudflare anti-bot protection
This is all managed within the ScrapeOps Proxy Aggregator.
Importance of Respecting Website Rate Limits
Respecting website rate limits is essential to prevent rate-limiting errors and ensure website owners can maintain their sites' performance and stability. Ethical web scraping involves adhering to the website's terms of service, respecting user privacy, and minimizing the impact on website servers.
Unauthorized scraping or violating the terms of service can have legal consequences. Excessive scraping can overload website servers, ultimately hindering site performance for other users. It is crucial to emphasize that scraping should never involve collecting or using personal information without the user's consent.
Here are some tips on how to scrape responsibly and efficiently without harming the target website.
- Read the Robots.txt File: Before scraping data from a website, always check the robots.txt file to understand which parts of the site can be scraped and which should not be accessed (see the sketch after this list). This helps prevent excessive server load and potential violations of the website's terms of service.
- Minimize Request Frequency: Avoid sending excessive requests to the website in a short period. Wait a bit between each request to give the server a break. Act like a human user when browsing the site.
- Respect Data Privacy: Only collect data that is freely available online without needing to log in or bypass security. Don't access private or personal information without permission.
- Deal With Anti-Scraping Measures: To avoid being blocked by anti-scraping measures, use headless browsers, proxies, and CAPTCHA solvers. Be careful when using proxies and rotate your IP addresses often.
- Use Batching or Pagination: When handling large datasets, use batching or pagination to break down the data into manageable chunks. Batching keeps only the active batch of data in memory, optimizing memory usage, and boosting performance. Pagination divides the dataset into smaller pages for easy navigation and faster loading times.
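As a minimal sketch of the first tip (assuming Node 18+ for the global fetch API; a production crawler should use a dedicated robots.txt parsing library), the following naive check downloads robots.txt and tests a path against the Disallow rules for the `*` user agent:

```js
// Naive robots.txt check: fetch the file and test a path against the
// Disallow rules for the '*' user agent. Assumes Node 18+ (global fetch);
// a real crawler should use a proper robots.txt parser instead.
async function isPathAllowed(siteOrigin, path) {
  const response = await fetch(new URL('/robots.txt', siteOrigin));
  if (!response.ok) return true; // no robots.txt: assume allowed

  let appliesToAll = false;
  const disallowed = [];
  for (const rawLine of (await response.text()).split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    if (/^user-agent:/i.test(line)) {
      appliesToAll = line.slice(line.indexOf(':') + 1).trim() === '*';
    } else if (appliesToAll && /^disallow:/i.test(line)) {
      const rule = line.slice(line.indexOf(':') + 1).trim();
      if (rule) disallowed.push(rule);
    }
  }
  return !disallowed.some((rule) => path.startsWith(rule));
}

// Usage: check whether a path may be scraped before requesting it
isPathAllowed('https://quotes.toscrape.com', '/login')
  .then((allowed) => console.log('Allowed:', allowed));
```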
More Web Scraping Guides
This guide took a deep dive into solving Cloudflare Error 1015, covering fixes for both normal users and web scrapers. Finally, it discussed strategies for bypassing Cloudflare and the importance of respecting website rate limits.
If you would like to learn more about Web Scraping with Puppeteer, then be sure to check out The Puppeteer Web Scraping Playbook.
For more resources see the links below: