How To Solve Scrapy 403 Unhandled or Forbidden Errors (2026)
TL;DR: A 403 in Scrapy almost always means the target site has identified your spider as a bot and refused the request, not that Scrapy is broken. The fix is to make your spider look like a real browser: realistic User-Agent, full header set, sensible DOWNLOAD_DELAY and AUTOTHROTTLE, rotating residential proxies, and, for Cloudflare or DataDome-protected sites, TLS impersonation via scrapy-impersonate or a real browser via scrapy-playwright.
It will typically look something like this in your logs:
2026-05-13 00:13:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.example.com/> (referer: None)
2026-05-13 00:13:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.example.com/>: HTTP status code is not handled or not allowed
Scrapy's HttpErrorMiddleware drops non-2xx responses by default, which is why the 403 itself never reaches your callback unless you opt in. The underlying 403 has only two realistic causes:
- The URL really is permission-protected and you need to authenticate.
- The website detected an automated client and returned a 403 as a ban page.
In scraping, the second cause is overwhelmingly the more common one. Scrapy 403 responses are particularly common against sites protected by Cloudflare, DataDome and PerimeterX.
In this guide we'll debug Scrapy 403 Forbidden Errors end-to-end and walk through the fixes, easiest first.
- Quick reference: Scrapy 403 causes & fixes
- Easy Way To Solve Scrapy 403 Errors
- Inspect the 403 response body
- Randomising Your Request Delays
- Use Fake User Agents
- Optimize Request Headers
- TLS Fingerprinting in Scrapy
- Use Rotating Proxies
- Frequently Asked Questions
Let's begin...
Quick reference: Scrapy 403 causes & fixes
| # | Symptom | Most likely cause | Fix in Scrapy |
|---|---|---|---|
| 1 | 403 on every request, only Scrapy/x.y User-Agent in logs | Default USER_AGENT setting still in place | Set USER_AGENT in settings.py or install scrapy-fake-useragent |
| 2 | 403 with a real User-Agent set | DEFAULT_REQUEST_HEADERS missing browser headers, or headers inconsistent with UA | Override DEFAULT_REQUEST_HEADERS to a full Chrome/Firefox header set |
| 3 | First few requests succeed, then 403 | IP-based rate limiting | Enable AUTOTHROTTLE_ENABLED = True and a non-zero DOWNLOAD_DELAY |
| 4 | 403 from every IP, even with perfect headers | Datacenter IPs flagged | Move to residential or mobile proxies via scrapy-rotating-proxies or a smart proxy API |
| 5 | 403 from Cloudflare/DataDome with a "Just a moment..." page | Scrapy's Twisted TLS stack has a distinctive JA3 fingerprint | Use scrapy-impersonate, scrapy-playwright, or the ScrapeOps Proxy Aggregator with bypass=cloudflare_level_X |
| 6 | 403 only on a few protected URLs | Login-required URLs or geo-blocked content | Authenticate, send the right cookies, or use a proxy in the right geography |
Easy Way To Solve Scrapy 403 Errors
If the URL you are trying to scrape is normally accessible, but you are getting Scrapy 403 Forbidden Errors then it is likely that the website is flagging your spider as a scraper and blocking your requests.
To avoid getting detected we need to optimise our spiders to bypass anti-bot countermeasures by:
- Randomising Your Requests
- Using Fake User Agents
- Optimizing Request Headers
- Using Proxies
We will discuss these below. However, the easiest way to fix this problem is to use a smart proxy solution like the ScrapeOps Proxy Aggregator.

With the ScrapeOps Proxy Aggregator you simply need to send your requests to the ScrapeOps proxy endpoint and our Proxy Aggregator will optimise your request with the best user-agent, header and proxy configuration to ensure you don't get 403 errors from your target website.
Simply get your free API key by signing up for a free account here and edit your Scrapy spider as follows:
import scrapy
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    # Wrap the target URL in a request to the ScrapeOps proxy endpoint
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=get_scrapeops_url(url), callback=self.parse)

    def parse(self, response):
        # Minimal callback so the example runs; replace with your own parsing logic
        self.logger.info("Got %s for %s", response.status, response.url)
If you are getting blocked by Cloudflare, then you can simply activate ScrapeOps' Cloudflare Bypass by adding bypass=cloudflare_level_1 to the request:
import scrapy
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    # Same as before, plus the Cloudflare bypass parameter
    payload = {'api_key': API_KEY, 'url': url, 'bypass': 'cloudflare_level_1'}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=get_scrapeops_url(url), callback=self.parse)

    def parse(self, response):
        # Minimal callback so the example runs; replace with your own parsing logic
        self.logger.info("Got %s for %s", response.status, response.url)
Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.
To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.
| Security Level | Bypass | API Credits | Description |
|---|---|---|---|
| Low | cloudflare_level_1 | 10 | Use to bypass Cloudflare protected sites with low security settings enabled. |
| Medium | cloudflare_level_2 | 35 | Use to bypass Cloudflare protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests. |
| High | cloudflare_level_3 | 50 | Use to bypass Cloudflare protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests. |
You can check out the full documentation here.
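The two helpers above differ only in the bypass parameter, so they can be folded into a single function. The following is a minimal sketch of that idea; the optional `bypass` argument is an assumption made for illustration, and the endpoint and parameter names are the ones already used in the examples above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_KEY = "YOUR_API_KEY"  # placeholder, as in the examples above


def get_scrapeops_url(url, bypass=None):
    """Build the proxy URL; `bypass` is optional (e.g. 'cloudflare_level_1')."""
    payload = {"api_key": API_KEY, "url": url}
    if bypass:
        payload["bypass"] = bypass
    # urlencode percent-encodes the target URL so it survives as a query value
    return "https://proxy.scrapeops.io/v1/?" + urlencode(payload)


proxied = get_scrapeops_url(
    "http://quotes.toscrape.com/page/1/", bypass="cloudflare_level_1"
)
# Decode the query string again to confirm the target URL round-trips intact
query = parse_qs(urlsplit(proxied).query)
print(query["url"][0], query["bypass"][0])
```

Keeping the bypass level as an argument makes it easier to start at the cheapest level and only step up for the sites that need it.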
Or if you would prefer to try to optimize your user-agent, headers and proxy configuration yourself then read on and we will explain how to do it.
Inspect the 403 response body
Before changing settings, find out what the 403 page actually says — a Cloudflare 403 looks very different from a DataDome 403 or a plain rate-limit response, and each one points to a different fix.
By default Scrapy drops non-2xx responses before they reach your callback. To opt in for 403s specifically, set handle_httpstatus_list on your spider:
import scrapy

class DebugSpider(scrapy.Spider):
    name = "debug"
    # Hand the 403 to our callback instead of letting HttpErrorMiddleware drop it.
    handle_httpstatus_list = [403]
    start_urls = ["https://www.example.com/"]

    def parse(self, response):
        self.logger.info("status=%s server=%s body_len=%d",
                         response.status,
                         response.headers.get("Server", b"").decode(errors="ignore"),
                         len(response.text))
        # Look for telltale anti-bot signatures
        for signature, vendor in [
            ("Just a moment...", "Cloudflare interstitial"),
            ("Attention Required! | Cloudflare", "Cloudflare block"),
            ("dd-captcha", "DataDome"),
            ("PXBT", "PerimeterX"),
            ("Sucuri", "Sucuri WAF"),
        ]:
            if signature in response.text:
                self.logger.warning("Detected: %s", vendor)
Two quick signals from the response are enough to choose the right fix:
- Server header: Cloudflare almost always sets Server: cloudflare. AWS WAF, Akamai and DataDome leave their own signatures.
- Body contents: anti-bot vendors leave distinctive strings in their challenge pages. The snippet above checks for the most common ones.
You can also override handle_httpstatus_list per request via meta={'handle_httpstatus_list': [403]} when you only need it for a subset of URLs.
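If you want to reuse the detection logic outside a spider (for example on saved response bodies), the two signals can be combined in a plain function. This is only a sketch: `classify_403` is a hypothetical helper name, the signature list is copied from the spider above, and the `Server` header fallback is an assumption about how you might want to rank the two signals.

```python
SIGNATURES = [
    ("Just a moment...", "Cloudflare interstitial"),
    ("Attention Required! | Cloudflare", "Cloudflare block"),
    ("dd-captcha", "DataDome"),
    ("PXBT", "PerimeterX"),
    ("Sucuri", "Sucuri WAF"),
]


def classify_403(server_header, body):
    """Return the list of anti-bot vendors suggested by a 403 response."""
    # Body signatures are the stronger signal, so check them first
    matches = [vendor for signature, vendor in SIGNATURES if signature in body]
    # Fall back to the Server header when the body gives nothing away
    if not matches and server_header.strip().lower() == "cloudflare":
        matches.append("Cloudflare (Server header only)")
    return matches


print(classify_403("cloudflare", "<title>Just a moment...</title>"))
print(classify_403("nginx", "<h1>403 Forbidden</h1>"))
```

An empty result does not prove there is no anti-bot system in place; it only means none of the known strings matched.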
Randomising Your Request Delays
If you send a request to a website from the same IP every second, then the website can easily detect you and flag you as a scraper.
Instead, you should space out your requests over a longer period of time and randomise when they are sent.
Doing this in Scrapy is very simple using the DOWNLOAD_DELAY functionality.
By default, your Scrapy project's DOWNLOAD_DELAY setting is set to 0, which means that it sends each request consecutively to the same website without any delay between requests.
However, you can randomize your requests by giving DOWNLOAD_DELAY a non-zero seconds value in your settings.py file:
## settings.py
DOWNLOAD_DELAY = 2 # 2 seconds of delay
When DOWNLOAD_DELAY is non-zero, Scrapy will wait a random interval of between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY between each request.
This is because, by default RANDOMIZE_DOWNLOAD_DELAY is set to True.
If your scraping job isn't big and you don't have massive time pressure then it is recommended to set a high DOWNLOAD_DELAY as this will minimize the load on the website and reduce your chances of getting blocked.
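To make the randomised interval concrete, here is a small sketch of the rule described above. It mirrors the documented behaviour (a uniform pick between 0.5x and 1.5x the configured delay) and is not Scrapy's own code; `randomized_delay` is a hypothetical helper name.

```python
import random


def randomized_delay(download_delay, rng=random):
    """Pick a wait time between 0.5x and 1.5x the configured DOWNLOAD_DELAY."""
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


# With DOWNLOAD_DELAY = 2, each wait falls somewhere between 1 and 3 seconds
delays = [randomized_delay(2) for _ in range(5)]
print([round(d, 2) for d in delays])
```

So a DOWNLOAD_DELAY of 2 still averages roughly two seconds per request, but without the fixed rhythm that makes a scraper easy to spot.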
Better: turn on AutoThrottle
For most modern Scrapy projects in 2026 you want AutoThrottle on as well. It dynamically adjusts the delay based on the load you're putting on the target server and the latency the server is returning — so you scale back automatically the moment the site starts struggling, which makes 403s and 429s far less likely:
## settings.py
DOWNLOAD_DELAY = 2
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
AUTOTHROTTLE_DEBUG = False
# Retry rate-limited (429) and transient error responses (RetryMiddleware does not read Retry-After itself)
RETRY_HTTP_CODES = [408, 429, 500, 502, 503, 504, 522, 524]
RETRY_TIMES = 5
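AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 means "aim for an average of one request in flight per remote server." It is a target that AutoThrottle steers towards rather than a hard cap; CONCURRENT_REQUESTS_PER_DOMAIN still sets the upper limit. Bump it up gradually if you need more throughput, but if you start seeing 403s come back, lower it.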
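The adjustment AutoThrottle makes after each response can be sketched in a few lines. This follows the rules as listed in the AutoThrottle documentation (target delay = latency / target concurrency, then average with the previous delay, never speed up on a non-200 response, clamp to the configured bounds); Scrapy's actual implementation may differ in details, and `next_autothrottle_delay` is a hypothetical name.

```python
def next_autothrottle_delay(prev_delay, latency, status,
                            target_concurrency=1.0,
                            min_delay=2.0,    # plays the role of DOWNLOAD_DELAY
                            max_delay=60.0):  # plays the role of AUTOTHROTTLE_MAX_DELAY
    """Return the delay to use for the next request to the same server."""
    # Slower responses or a lower concurrency target both push the delay up
    target_delay = latency / target_concurrency
    # Move halfway towards the target so one odd response doesn't dominate
    new_delay = (prev_delay + target_delay) / 2.0
    # A fast error page (e.g. a 403) must not be allowed to shorten the delay
    if status != 200 and new_delay < prev_delay:
        new_delay = prev_delay
    # Keep the result inside the configured bounds
    return min(max(min_delay, new_delay), max_delay)


delay = 4.0
for latency, status in [(8.0, 200), (1.0, 403), (1.0, 200)]:
    delay = next_autothrottle_delay(delay, latency, status)
    print(status, delay)
```

The third rule matters for 403s in particular: ban pages are usually served quickly, and without it a blocked spider would speed up instead of backing off.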
Use Fake User Agents
The most common reason for a website to block a Scrapy spider and return a 403 error is that your spider is telling the website it is an automated scraper.
This is largely because by default Scrapy tells the website that it is a scraper in the user-agent it sends with your request.
Unless you override the default Scrapy settings, your spider will send the following user-agent with every request:
user-agent: Scrapy/VERSION (+https://scrapy.org)
This tells the website that your requests are coming from a Scrapy spider, so it is very easy for them to block your requests and return a 403 status code.
Solution
The solution to this problem is to configure your spider to send a fake user-agent with every request. This way it is harder for the website to tell if your requests are coming from a scraper or a real user.
We wrote a full guide on how to set fake user-agents for your scrapers here, however, this is a quick summary of the solution:
Method 1: Set Fake User-Agent In Settings.py File
The easiest way to change the default Scrapy user-agent is to set a default user-agent in your settings.py file.
Simply uncomment the USER_AGENT value in the settings.py file and add a new user agent:
## settings.py
USER_AGENT = 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'
You can find a huge list of user-agents here.
This will only work on relatively small scrapes. If you use the same user-agent on every single request, a website with a more sophisticated anti-bot solution can still easily detect your scraper.
To get around this we need to rotate through a large pool of fake user-agents so that every request looks unique.
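Rotating user-agents yourself only takes a small downloader middleware. The sketch below is one possible shape, not the implementation any particular package uses: the class name, the three example user-agent strings and the `myproject.middlewares` path are all assumptions made for illustration. It relies only on the standard `process_request(request, spider)` hook.

```python
import random

# Illustrative pool; in practice you would load a much larger, up-to-date list
USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
)


class SimpleRotateUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, user_agents=USER_AGENTS, rng=random):
        self.user_agents = list(user_agents)
        self.rng = rng

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent was set earlier in the chain
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # returning None lets Scrapy continue processing the request


# settings.py (hypothetical module path):
# DOWNLOADER_MIDDLEWARES = {
#     "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
#     "myproject.middlewares.SimpleRotateUserAgentMiddleware": 400,
# }
```

Note that rotating the user-agent alone leaves the other headers unchanged, so a Firefox user-agent can end up paired with Chrome-style headers.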
Method 2: Use Scrapy-Fake-Useragent
You could gather a large list of fake user-agents and configure your spider to rotate through them yourself like this example, or you could use a Scrapy middleware like scrapy-fake-useragent.
scrapy-fake-useragent generates fake user-agents for your requests based on usage statistics from a real world database, and attaches them to every request.
Getting scrapy-fake-useragent set up is simple. First, install the Python package:
pip install scrapy-fake-useragent
Then in your settings.py file, you need to turn off the built in UserAgentMiddleware and RetryMiddleware, and enable scrapy-fake-useragent's RandomUserAgentMiddleware and RetryUserAgentMiddleware.
## settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
And then enable the Fake User-Agent Providers by adding them to your settings.py file.
## settings.py
FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FakeUserAgentProvider',  # This is the first provider we'll try
    'scrapy_fake_useragent.providers.FakerProvider',  # If FakeUserAgentProvider fails, we'll use faker to generate a user-agent string for us
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',  # Fall back to USER_AGENT value
]
## Set Fallback User-Agent
USER_AGENT = '<your user agent string which you will fall back to if all other providers fail>'
When activated, scrapy-fake-useragent will download a list of the most common user-agents from useragentstring.com and use a random one with every request, so you don't need to create your own list.
You can also add your own user-agent string providers, or configure it to generate new user-agent strings as a backup using Faker.
To see all the configuration options, check out the docs here.
Optimize Request Headers
In a lot of cases, just adding fake user-agents to your requests will solve the Scrapy 403 Forbidden Error. However, if the website has a more sophisticated anti-bot detection system in place, you will also need to optimize the request headers.
By default, Scrapy will only send basic request headers along with your requests such as Accept, Accept-Language, and User-Agent.
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
Accept-Language: 'en'
User-Agent: 'Scrapy/VERSION (+https://scrapy.org)'
In contrast, here are the request headers a Chrome browser running on a macOS machine would send:
Connection: 'keep-alive'
Cache-Control: 'max-age=0'
sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
sec-ch-ua-mobile: '?0'
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
Sec-Fetch-Site: 'none'
Sec-Fetch-Mode: 'navigate'
Sec-Fetch-User: '?1'
Sec-Fetch-Dest: 'document'
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-GB,en-US;q=0.9,en;q=0.8'
If the website is really trying to prevent web scrapers from accessing their content, then they will be analysing the request headers to make sure that the other headers match the user-agent you set, and that the request includes other common headers a real browser would send.
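A rough way to think about these consistency checks is as a set of rules comparing the User-Agent with the headers around it. The sketch below applies three such rules to a header dict. The rules are illustrative heuristics chosen for this example (Firefox does not send the `sec-ch-ua` client hints that Chrome does), not any vendor's real detection logic, and `header_mismatches` is a hypothetical name.

```python
def header_mismatches(headers):
    """Return a list of obvious inconsistencies between User-Agent and other headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    ua = lowered.get("user-agent", "")
    problems = []

    is_firefox = "Firefox/" in ua
    is_chrome = "Chrome/" in ua and not is_firefox
    has_client_hints = any(key.startswith("sec-ch-ua") for key in lowered)

    # Client hints are a Chromium feature, so they clash with a Firefox UA
    if is_firefox and has_client_hints:
        problems.append("Firefox User-Agent sent with sec-ch-ua headers")
    # A Chrome UA with no client hints at all looks hand-built
    if is_chrome and not has_client_hints:
        problems.append("Chrome User-Agent sent without sec-ch-ua headers")

    # The declared platform should agree with the OS named in the UA string
    platform = lowered.get("sec-ch-ua-platform", "").strip('"')
    expected_token = {"macOS": "Macintosh", "Windows": "Windows NT"}.get(platform)
    if expected_token and expected_token not in ua:
        problems.append("sec-ch-ua-platform does not match the User-Agent OS")

    return problems


mixed = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
}
print(header_mismatches(mixed))
```

Running a check like this over your own header sets before a crawl is a cheap way to catch the mismatches that random user-agent rotation tends to introduce.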
Solution
To solve this, we need to make sure we optimize the request headers, including making sure the fake user-agent is consistent with the other headers.
This is a big topic, so if you would like to learn more about header optimization then check out our guide to header optimization.
However, here is a quick example of adding optimized headers to our requests:
# bookspider.py
import scrapy
from demo.items import BookItem

class BookSpider(scrapy.Spider):
    name = 'bookspider'
    url_list = ["http://books.toscrape.com"]

    HEADERS = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Cache-Control": "max-age=0",
    }

    def start_requests(self):
        for url in self.url_list:
            # yield (not return) so every URL is requested; HEADERS is a class attribute, so use self.HEADERS
            yield scrapy.Request(url=url, callback=self.parse, headers=self.HEADERS)

    def parse(self, response):
        for article in response.css('article.product_pod'):
            book_item = BookItem(
                url=article.css("h3 > a::attr(href)").get(),
                title=article.css("h3 > a::attr(title)").extract_first(),
                price=article.css(".price_color::text").extract_first(),
            )
            yield book_item
Here we are adding the same optimized headers, including a fake user-agent, to every request.
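If you would rather not pass headers on every request, the same set can be applied project-wide with the USER_AGENT and DEFAULT_REQUEST_HEADERS settings mentioned in the quick reference table. This is a sketch that simply reuses the Firefox header values from the spider above; adjust them so they stay consistent with whichever user-agent you choose.

```python
## settings.py
# Keep the user-agent in USER_AGENT and the remaining browser headers below
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"

DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
}
```

Headers passed explicitly on a Request still take precedence, so individual spiders can override these defaults where needed.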
TLS Fingerprinting in Scrapy
If your User-Agent is right, your headers are right, your delays are sensible and you're still getting 403s from sites behind Cloudflare or DataDome — the problem is almost always your TLS fingerprint.
Scrapy uses Twisted for its HTTP transport, which produces a JA3/JA4 TLS fingerprint that doesn't match any major browser. Cloudflare, DataDome, Akamai Bot Manager and PerimeterX all compare incoming JA3/JA4 hashes against known-browser values and return 403 when they don't match — regardless of how perfect the rest of your request looks. (See the TLS fingerprinting section in our general 403 guide for the underlying mechanics.)
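To see why changing headers cannot fix this, it helps to look at how a JA3 hash is built: it is an MD5 over fields taken from the TLS ClientHello, none of which are HTTP headers. The sketch below follows the published JA3 format (five comma-separated fields, values joined by dashes). The numeric values in the example are illustrative rather than a real browser's handshake, and JA4 uses a different scheme not shown here.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: version,ciphers,extensions,curves,point formats."""
    fields = [ciphers, extensions, curves, point_formats]
    joined = ["-".join(str(value) for value in field) for field in fields]
    return ",".join([str(tls_version)] + joined)


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """MD5 the JA3 string to get the 32-character fingerprint."""
    raw = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Same cipher suites offered in a different order give a different fingerprint
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
client_b = ja3_hash(771, [4867, 4866, 4865], [0, 10, 11], [29, 23, 24], [0])
print(client_a != client_b)
```

Because the cipher and extension lists come from the TLS library rather than from your spider code, the only way to change the fingerprint is to change the stack that performs the handshake.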
You have three solid options inside Scrapy:
Option A: scrapy-impersonate (TLS impersonation, no browser)
scrapy-impersonate is a Scrapy download handler backed by curl_cffi that swaps Scrapy's Twisted TLS stack for a browser-impersonating one — so your spider makes requests with the exact same JA3 and HTTP/2 fingerprint as Chrome.
pip install scrapy-impersonate
## settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Then per request, pick the browser version to impersonate:
yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={"impersonate": "chrome124"},
)
This is the lightest-weight option — no headless browser, no JavaScript execution, just a wire-level browser fingerprint.
Option B: scrapy-playwright (a real browser)
If the target also fires JavaScript challenges (Cloudflare Turnstile, DataDome's slider, Akamai's Sensor Data) you need an actual browser. scrapy-playwright runs Playwright-controlled Chromium/Firefox/WebKit from inside Scrapy, with the full request/response flow still going through your spider's middlewares and pipelines.
pip install scrapy-playwright
playwright install chromium
## settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_BROWSER_TYPE = "chromium"

## In your spider
yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={"playwright": True, "playwright_include_page": False},
)
Playwright is heavier than scrapy-impersonate (CPU, memory, slower) but defeats both fingerprint and behavioural challenges.
Option C: outsource it to a Smart Proxy API
If you don't want to maintain TLS impersonation as Chrome versions move, the ScrapeOps Proxy Aggregator handles JA3/JA4, HTTP/2, proxy rotation and Cloudflare challenges for you — see the Easy Way section above.
Use Rotating Proxies
If the above solutions don't work then it is highly likely that the server has flagged your IP address as being used by a scraper and is either throttling your requests or completely blocking them.
This is especially likely if you are scraping at larger volumes, as it is easy for websites to detect scrapers if they are getting an unnaturally large amount of requests from the same IP address.
Solution
You will need to send your requests through a rotating proxy pool. We created a full guide on the various options you have when integrating & rotating proxies in your Scrapy spiders here.
However, here is one possible solution using the scrapy-rotating-proxies middleware.
To get started simply install the middleware:
pip install scrapy-rotating-proxies
Then we just need to update our settings.py to load in our proxies and enable the scrapy-rotating-proxies middleware:
## settings.py
## Insert Your List of Proxies Here
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    'proxy3.com:8032',
]
## Enable The Proxy Middleware In Your Downloader Middlewares
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}
And that's it. From now on, every request your spider makes will be sent through one of the proxies in the ROTATING_PROXY_LIST.
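For a sense of what proxy rotation involves under the hood, here is a bare-bones round-robin version written as a plain downloader middleware. It is a simplified sketch and not how scrapy-rotating-proxies is implemented: it has no ban detection or dead-proxy handling, and the class name and proxy URLs are placeholders. It relies on Scrapy's built-in HttpProxyMiddleware, which reads the proxy to use from `request.meta["proxy"]`.

```python
from itertools import cycle

# Placeholder proxies; note the scheme prefix expected in request.meta["proxy"]
PROXIES = (
    "http://proxy1.com:8000",
    "http://proxy2.com:8031",
    "http://proxy3.com:8032",
)


class RoundRobinProxyMiddleware:
    """Assign proxies to requests in a fixed rotation."""

    def __init__(self, proxies=PROXIES):
        self._proxies = cycle(proxies)

    def process_request(self, request, spider):
        # Respect a proxy that was set explicitly on the request
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None  # continue normal processing


# settings.py (hypothetical module path); the priority must be below 750 so it
# runs before the built-in HttpProxyMiddleware:
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RoundRobinProxyMiddleware": 610,
# }
```

In practice the ban detection and retry logic are what make a rotating pool useful, which is why a maintained middleware or a proxy API is usually the better choice for anything beyond small jobs.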
If you need help finding the best & cheapest proxies for your particular use case then check out our proxy comparison tool here.
Alternatively, you could just use the ScrapeOps Proxy Aggregator as we discussed previously.
Frequently Asked Questions
More Scrapy Tutorials
So that's how you can solve Scrapy 403 Unhandled & Forbidden Errors when you get them.
If you would like to know more about bypassing the most common anti-bots then check out our bypass guides here:
- How To Solve 403 Forbidden Errors When Web Scraping — the framework-agnostic companion guide.
- How To Bypass Cloudflare
- How To Bypass DataDome
- How To Bypass PerimeterX
If you would like to learn more about Scrapy, then be sure to check out The Scrapy Playbook.