Updated May 13, 2026

How To Solve Scrapy 403 Unhandled or Forbidden Errors (2026)

TL;DR — A 403 in Scrapy almost always means the target site has identified your spider as a bot and refused the request, not that Scrapy is broken. The fix is to make your spider look like a real browser: realistic User-Agent, full header set, sensible DOWNLOAD_DELAY and AUTOTHROTTLE, rotating residential proxies, and — for Cloudflare or DataDome-protected sites — TLS impersonation via scrapy-impersonate or scrapy-playwright.

It will typically look something like this in your logs:


2026-05-13 00:13:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.example.com/> (referer: None)
2026-05-13 00:13:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.example.com/>: HTTP status code is not handled or not allowed

Scrapy's HttpErrorMiddleware drops non-2xx responses by default, which is why the 403 itself never reaches your callback unless you opt in. The underlying 403 has only two realistic causes:

  • The URL really is permission-protected and you need to authenticate.
  • The website detected an automated client and returned a 403 as a ban page.

In scraping, the second cause is overwhelmingly the more common one. Scrapy 403 responses are particularly common against sites protected by Cloudflare, DataDome and PerimeterX.

In this guide we'll debug Scrapy 403 Forbidden errors end-to-end and walk through the fixes that work in 2026, easiest first.

Let's begin...



Quick reference: Scrapy 403 causes & fixes

| # | Symptom | Most likely cause | Fix in Scrapy |
|---|---------|-------------------|---------------|
| 1 | 403 on every request, only Scrapy/x.y User-Agent in logs | Default USER_AGENT setting still in place | Set USER_AGENT in settings.py or install scrapy-fake-useragent |
| 2 | 403 with a real User-Agent set | DEFAULT_REQUEST_HEADERS missing browser headers, or headers inconsistent with UA | Override DEFAULT_REQUEST_HEADERS to a full Chrome/Firefox header set |
| 3 | First few requests succeed, then 403 | IP-based rate limiting | Enable AUTOTHROTTLE_ENABLED = True and a non-zero DOWNLOAD_DELAY |
| 4 | 403 from every IP, even with perfect headers | Datacenter IPs flagged | Move to residential or mobile proxies via scrapy-rotating-proxies or a smart proxy API |
| 5 | 403 from Cloudflare/DataDome with a "Just a moment..." page | Scrapy's Twisted TLS stack has a distinctive JA3 fingerprint | Use scrapy-impersonate, scrapy-playwright, or the ScrapeOps Proxy Aggregator with bypass=cloudflare_level_X |
| 6 | 403 only on a few protected URLs | Login-required URLs or geo-blocked content | Authenticate, send the right cookies, or use a proxy in the right geography |

Easy Way To Solve Scrapy 403 Errors

If the URL you are trying to scrape is normally accessible, but you are getting Scrapy 403 Forbidden Errors then it is likely that the website is flagging your spider as a scraper and blocking your requests.

To avoid getting detected we need to optimise our spiders to bypass anti-bot countermeasures by:

  • Randomising Your Requests
  • Using Fake User Agents
  • Optimizing Request Headers
  • Using Proxies

We will discuss these below. However, the easiest way to fix this problem is to use a smart proxy solution like the ScrapeOps Proxy Aggregator.

ScrapeOps Proxy Aggregator

With the ScrapeOps Proxy Aggregator you simply need to send your requests to the ScrapeOps proxy endpoint and our Proxy Aggregator will optimise your request with the best user-agent, header and proxy configuration to ensure you don't get 403 errors from your target website.

Simply get your free API key by signing up for a free account here and edit your Scrapy spider as follows:


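import scrapy
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    # Wrap the target URL so the request is routed through the proxy endpoint.
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=get_scrapeops_url(url), callback=self.parse)

    def parse(self, response):
        # Placeholder callback: replace with your own extraction logic.
        self.logger.info("Fetched %s (status %s)", response.url, response.status)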


If you are getting blocked by Cloudflare, then you can simply activate ScrapeOps' Cloudflare Bypass by adding bypass=cloudflare_level_1 to the request:


import scrapy
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    # Same helper as before, with the Cloudflare bypass level added to the query string.
    payload = {'api_key': API_KEY, 'url': url, 'bypass': 'cloudflare_level_1'}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=get_scrapeops_url(url), callback=self.parse)

    def parse(self, response):
        # Placeholder callback: replace with your own extraction logic.
        self.logger.info("Fetched %s (status %s)", response.url, response.status)

Tip

Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.

To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.

| Security Level | Bypass | API Credits | Description |
|----------------|--------|-------------|-------------|
| Low | cloudflare_level_1 | 10 | Use to bypass Cloudflare protected sites with low security settings enabled. |
| Medium | cloudflare_level_2 | 35 | Use to bypass Cloudflare protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests. |
| High | cloudflare_level_3 | 50 | Use to bypass Cloudflare protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests. |

You can check out the full documentation here.
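If you expect to switch bypass levels per target, the helper from the snippets above can take the level as an argument. This is only a minimal sketch: the `bypass` keyword argument and the `CLOUDFLARE_LEVELS` set are our own naming for illustration, while the endpoint and the `api_key`, `url` and `bypass` query parameters are the ones already used above.

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder, as in the snippets above

# Bypass values taken from the table above.
CLOUDFLARE_LEVELS = {"cloudflare_level_1", "cloudflare_level_2", "cloudflare_level_3"}


def get_scrapeops_url(url, bypass=None, api_key=API_KEY):
    """Build the proxy URL, optionally requesting a Cloudflare bypass level."""
    payload = {"api_key": api_key, "url": url}
    if bypass is not None:
        # Fail early on typos instead of paying for a request that is ignored.
        if bypass not in CLOUDFLARE_LEVELS:
            raise ValueError(f"unknown bypass level: {bypass!r}")
        payload["bypass"] = bypass
    return "https://proxy.scrapeops.io/v1/?" + urlencode(payload)
```

Starting with no bypass and stepping up one level at a time when you still see 403s keeps the credit cost as low as the target allows.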

Or if you would prefer to try to optimize your user-agent, headers and proxy configuration yourself then read on and we will explain how to do it.


Inspect the 403 response body

Before changing settings, find out what the 403 page actually says — a Cloudflare 403 looks very different from a DataDome 403 or a plain rate-limit response, and each one points to a different fix.

By default Scrapy drops non-2xx responses before they reach your callback. To opt in for 403s specifically, set handle_httpstatus_list on your spider:


import scrapy

class DebugSpider(scrapy.Spider):
    name = "debug"
    # Hand the 403 to our callback instead of letting HttpErrorMiddleware drop it.
    handle_httpstatus_list = [403]
    start_urls = ["https://www.example.com/"]

    def parse(self, response):
        self.logger.info("status=%s server=%s body_len=%d",
                         response.status,
                         response.headers.get("Server", b"").decode(errors="ignore"),
                         len(response.text))
        # Look for telltale anti-bot signatures
        for signature, vendor in [
            ("Just a moment...", "Cloudflare interstitial"),
            ("Attention Required! | Cloudflare", "Cloudflare block"),
            ("dd-captcha", "DataDome"),
            ("PXBT", "PerimeterX"),
            ("Sucuri", "Sucuri WAF"),
        ]:
            if signature in response.text:
                self.logger.warning("Detected: %s", vendor)

Two quick signals from the response are enough to choose the right fix:

  • Server header — Cloudflare almost always sets Server: cloudflare. AWS WAF, Akamai and DataDome leave their own signatures.
  • Body contents — anti-bot vendors leave distinctive strings in their challenge pages. The snippet above checks for the most common ones.

You can also override handle_httpstatus_list per request via meta={'handle_httpstatus_list': [403]} when you only need it for a subset of URLs.
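To make the precedence between the spider attribute and the per-request meta key concrete, here is a simplified model of the decision HttpErrorMiddleware makes. It is a sketch of our understanding of the behaviour, not Scrapy's actual code: the function name `reaches_callback` and its parameters are hypothetical, with `allowed_codes` standing in for the HTTPERROR_ALLOWED_CODES setting and `allow_all` for HTTPERROR_ALLOW_ALL.

```python
def reaches_callback(status, spider_list=None, meta=None,
                     allowed_codes=(), allow_all=False):
    """Return True if a response with this status would be passed to the callback."""
    meta = meta or {}
    if 200 <= status < 300:
        return True  # successful responses always get through
    if meta.get("handle_httpstatus_all", False):
        return True  # per-request opt-in for every status
    if "handle_httpstatus_list" in meta:
        # The per-request list takes precedence over the spider attribute.
        allowed = meta["handle_httpstatus_list"]
    elif allow_all:
        return True
    else:
        allowed = spider_list if spider_list is not None else allowed_codes
    return status in allowed


print(reaches_callback(403))                      # default: dropped
print(reaches_callback(403, spider_list=[403]))   # spider attribute opts in
```

The practical consequence is that a request carrying its own handle_httpstatus_list in meta ignores the list set on the spider, so include 403 in both if you mix the two approaches.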


Randomising Your Request Delays

If you send a request to a website from the same IP every second, the website can easily detect you and flag you as a scraper.

Instead, you should space out your requests over a longer period of time and randomise when they are sent.

Doing this in Scrapy is very simple using the DOWNLOAD_DELAY functionality.

By default, your Scrapy project's DOWNLOAD_DELAY setting is set to 0, which means that it sends each request consecutively to the same website without any delay between requests.

However, you can randomize your requests by giving DOWNLOAD_DELAY a non-zero value (in seconds) in your settings.py file:

## settings.py

DOWNLOAD_DELAY = 2 # 2 seconds of delay

When DOWNLOAD_DELAY is non-zero, Scrapy will wait a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY between requests.

This is because RANDOMIZE_DOWNLOAD_DELAY is set to True by default.
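As a rough illustration of what that randomisation means in practice, the sketch below draws wait times from the same 0.5x to 1.5x window. The helper name `randomized_delay` is ours, and the code only mimics the documented behaviour rather than calling into Scrapy.

```python
import random


def randomized_delay(download_delay, rng=random):
    """Draw a wait time uniformly from [0.5, 1.5] x download_delay."""
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


# With DOWNLOAD_DELAY = 2, every wait falls between 1 and 3 seconds.
samples = [randomized_delay(2) for _ in range(5)]
print([round(s, 2) for s in samples])
```

So DOWNLOAD_DELAY = 2 gives an average of one request every two seconds per site, without the perfectly regular spacing that makes a fixed interval easy to spot.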

If your scraping job isn't big and you don't have massive time pressure then it is recommended to set a high DOWNLOAD_DELAY as this will minimize the load on the website and reduce your chances of getting blocked.

Better: turn on AutoThrottle

For most modern Scrapy projects in 2026 you want AutoThrottle on as well. It dynamically adjusts the delay based on the load you're putting on the target server and the latency the server is returning — so you scale back automatically the moment the site starts struggling, which makes 403s and 429s far less likely:

## settings.py

DOWNLOAD_DELAY = 2
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
AUTOTHROTTLE_DEBUG = False

# Retry 429s and transient server errors (the built-in retry logic does not read Retry-After)
RETRY_HTTP_CODES = [408, 429, 500, 502, 503, 504, 522, 524]
RETRY_TIMES = 5

AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 means "aim for an average of one request in flight per remote server." It is a target that AutoThrottle steers towards, not a hard cap. Bump it up gradually if you need more throughput, but if you start seeing 403s come back, lower it.
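To see how those settings interact, here is a small sketch of the delay update rule as described in the Scrapy AutoThrottle documentation. The function name `next_delay` and its parameter names are ours, and the defaults simply mirror the settings.py values above, so treat it as an illustration rather than the extension's source.

```python
def next_delay(current_delay, latency, status,
               target_concurrency=1.0, min_delay=2.0, max_delay=60.0):
    """Return the delay to use after a response with the given latency and status."""
    # The delay that would give `target_concurrency` requests in flight on average.
    target_delay = latency / target_concurrency
    # Move halfway towards the target, but never undershoot it.
    new_delay = max(target_delay, (current_delay + target_delay) / 2.0)
    # Clamp to DOWNLOAD_DELAY (lower bound) and AUTOTHROTTLE_MAX_DELAY (upper bound).
    new_delay = min(max(min_delay, new_delay), max_delay)
    # Non-200 responses (such as a fast 403 ban page) may not decrease the delay.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    return new_delay


print(next_delay(2.0, latency=4.0, status=200))  # slow server: delay grows
print(next_delay(4.0, latency=0.5, status=403))  # quick 403: delay is kept
```

The last rule matters for 403s in particular: ban pages tend to come back quickly, and without it a burst of fast error responses would speed the spider up at exactly the wrong moment.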


Use Fake User Agents

The most common reason for a website to block a Scrapy spider and return a 403 error is that your spider is telling the website it is an automated scraper.

This is largely because by default Scrapy tells the website that it is a scraper in the user-agent it sends with your request.

Unless you override the default Scrapy settings, your spider will send the following user-agent with every request:


user-agent: Scrapy/VERSION (+https://scrapy.org)

This tells the website that your requests are coming from a Scrapy spider, so it is very easy for them to block your requests and return a 403 status code.

Solution

The solution to this problem is to configure your spider to send a fake user-agent with every request. This way it is harder for the website to tell if your requests are coming from a scraper or a real user.

We wrote a full guide on how to set fake user-agents for your scrapers here, however, this is a quick summary of the solution:

Method 1: Set Fake User-Agent In Settings.py File

The easiest way to change the default Scrapy user-agent is to set a default user-agent in your settings.py file.

Simply uncomment the USER_AGENT value in the settings.py file and add a new user agent:

## settings.py

USER_AGENT = 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'

You can find a huge list of user-agents here.

This will only work on relatively small scrapes: if you use the same user-agent on every single request, a website with a more sophisticated anti-bot solution can still detect your scraper.

To get around this we need to rotate through a large pool of fake user-agents so that every request looks unique.
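A hand-rolled rotation can be as small as a downloader middleware that picks a user-agent at random for each request. The sketch below is one way to do it under a few assumptions: the class name `RotateUserAgentMiddleware` and the `USER_AGENTS` list are hypothetical, and the three strings are just the examples that appear elsewhere in this guide, not a maintained pool.

```python
import random

# Illustrative pool only; in practice load a larger, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
]


class RotateUserAgentMiddleware:
    """Downloader middleware sketch: set a random User-Agent on every request."""

    def __init__(self, user_agents=USER_AGENTS, rng=None):
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent is currently set on the request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # let Scrapy continue processing the request
```

To use it you would save the class in your project's middlewares module and register its import path in DOWNLOADER_MIDDLEWARES; the exact path depends on your project layout.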

Method 2: Use Scrapy-Fake-Useragent

You could gather a large list of fake user-agents and configure your spider to rotate through them yourself like this example, or you could use a Scrapy middleware like scrapy-fake-useragent.

scrapy-fake-useragent generates fake user-agents for your requests based on usage statistics from a real-world database, and attaches them to every request.

Getting scrapy-fake-useragent set up is simple. First, install the Python package:


pip install scrapy-fake-useragent

Then in your settings.py file, you need to turn off the built in UserAgentMiddleware and RetryMiddleware, and enable scrapy-fake-useragent's RandomUserAgentMiddleware and RetryUserAgentMiddleware.

## settings.py

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

And then enable the Fake User-Agent Providers by adding them to your settings.py file.

## settings.py

FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FakeUserAgentProvider',  # This is the first provider we'll try
    'scrapy_fake_useragent.providers.FakerProvider',  # If FakeUserAgentProvider fails, we'll use faker to generate a user-agent string for us
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',  # Fall back to USER_AGENT value
]

## Set Fallback User-Agent
USER_AGENT = '<your user agent string which you will fall back to if all other providers fail>'


When activated, scrapy-fake-useragent will download a list of the most common user-agents from useragentstring.com and use a random one with every request, so you don't need to create your own list.

You can also add your own user-agent string providers, or configure it to generate new user-agent strings as a backup using Faker.

To see all the configuration options, check out the docs here.


Optimize Request Headers

In a lot of cases, just adding fake user-agents to your requests will solve the Scrapy 403 Forbidden Error. However, if the website has a more sophisticated anti-bot detection system in place, you will also need to optimize the request headers.

By default, Scrapy will only send basic request headers along with your requests such as Accept, Accept-Language, and User-Agent.


Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
Accept-Language: 'en'
User-Agent: 'Scrapy/VERSION (+https://scrapy.org)'

In contrast, here are the request headers a Chrome browser running on a macOS machine would send:


Connection: 'keep-alive'
Cache-Control: 'max-age=0'
sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
sec-ch-ua-mobile: '?0'
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
Sec-Fetch-Site: 'none'
Sec-Fetch-Mode: 'navigate'
Sec-Fetch-User: '?1'
Sec-Fetch-Dest: 'document'
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-GB,en-US;q=0.9,en;q=0.8'

If the website is really trying to prevent web scrapers from accessing their content, then they will be analysing the request headers to make sure that the other headers match the user-agent you set, and that the request includes other common headers a real browser would send.
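The kind of consistency check an anti-bot system might run can be approximated in a few lines, which is also handy for linting your own header sets before a crawl. The rules below are illustrative heuristics we chose for this sketch (the function name `header_inconsistencies` is hypothetical); real vendors use far more signals and do not publish their rules.

```python
def header_inconsistencies(headers):
    """Return a list of mismatches between the User-Agent and the other headers."""
    h = {name.lower(): value for name, value in headers.items()}
    ua = h.get("user-agent", "")
    problems = []

    if "scrapy" in ua.lower():
        problems.append("User-Agent identifies the client as Scrapy")

    is_firefox = "Firefox/" in ua
    is_chrome = "Chrome/" in ua and not is_firefox
    # Chromium browsers send sec-ch-ua client hints; Firefox does not.
    if is_chrome and "sec-ch-ua" not in h:
        problems.append("Chrome User-Agent without sec-ch-ua client hints")
    if is_firefox and "sec-ch-ua" in h:
        problems.append("Firefox User-Agent with Chromium-only sec-ch-ua header")

    # The platform hint should agree with the OS named in the User-Agent.
    platform = h.get("sec-ch-ua-platform", "").strip('"')
    if platform == "macOS" and "Macintosh" not in ua:
        problems.append("sec-ch-ua-platform says macOS but User-Agent does not")
    if platform == "Windows" and "Windows" not in ua:
        problems.append("sec-ch-ua-platform says Windows but User-Agent does not")

    for required in ("accept", "accept-language", "accept-encoding"):
        if required not in h:
            problems.append(f"missing {required} header")
    return problems


scrapy_defaults = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Scrapy/VERSION (+https://scrapy.org)",
}
for problem in header_inconsistencies(scrapy_defaults):
    print(problem)
```

An empty list does not mean a header set is undetectable; it only means none of these particular mismatches are present.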

Solution

To solve this, we need to make sure we optimize the request headers, including making sure the fake user-agent is consistent with the other headers.

This is a big topic, so if you would like to learn more about header optimization then check out our guide to header optimization.

However, here is a quick example of adding optimized headers to our requests:

# bookspider.py

import scrapy
from demo.items import BookItem

class BookSpider(scrapy.Spider):
    name = 'bookspider'
    url_list = ["http://books.toscrape.com"]

    # A Firefox-on-Windows header set; the User-Agent matches the other headers.
    HEADERS = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Cache-Control": "max-age=0",
    }

    def start_requests(self):
        for url in self.url_list:
            # yield (not return) so every URL in url_list is requested
            yield scrapy.Request(url=url, callback=self.parse, headers=self.HEADERS)

    def parse(self, response):
        for article in response.css('article.product_pod'):
            book_item = BookItem(
                url=article.css("h3 > a::attr(href)").get(),
                title=article.css("h3 > a::attr(title)").get(),
                price=article.css(".price_color::text").get(),
            )
            yield book_item

Here we are adding the same optimized headers, including a fake user-agent, to every request.
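If you would rather not pass headers on every Request, the same set can be applied project-wide through the DEFAULT_REQUEST_HEADERS and USER_AGENT settings. The fragment below is a sketch that reuses the Firefox header values from the spider above; headers passed explicitly on an individual Request still take precedence over these defaults.

```python
## settings.py

# Keep the user-agent in its own setting so it stays in one place.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"

# Applied to every request that does not set these headers itself.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
}
```

If you later switch USER_AGENT to a Chrome string, update the rest of the set with it (for example by adding the sec-ch-ua headers) so the two stay consistent.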


TLS Fingerprinting in Scrapy

If your User-Agent is right, your headers are right, your delays are sensible and you're still getting 403s from sites behind Cloudflare or DataDome — the problem is almost always your TLS fingerprint.

Scrapy uses Twisted for its HTTP transport, which produces a JA3/JA4 TLS fingerprint that doesn't match any major browser. Cloudflare, DataDome, Akamai Bot Manager and PerimeterX all compare incoming JA3/JA4 hashes against known-browser values and return 403 when they don't match — regardless of how perfect the rest of your request looks. (See the TLS fingerprinting section in our general 403 guide for the underlying mechanics.)
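To give a feel for why a fingerprint mismatch is so easy to detect, here is a small sketch of how a JA3 hash is derived: five fields from the TLS ClientHello are joined into a string and MD5-hashed. The numeric values below are made-up examples rather than a capture from Scrapy or Chrome, and the function name `ja3` is ours.

```python
import hashlib


def ja3(tls_version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, ja3_hash) for the given ClientHello fields."""
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Two hypothetical clients offering the same ciphers in a different order.
client_a = ja3(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
client_b = ja3(771, [4867, 4866, 4865], [0, 23, 65281], [29, 23, 24], [0])
print(client_a[0])
print(client_a[1] == client_b[1])  # False: ordering alone changes the hash
```

Because the hash depends on the order and content of what the TLS library offers, changing HTTP headers cannot alter it; only swapping the TLS stack can.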

You have three solid options inside Scrapy:

Option A: scrapy-impersonate (TLS impersonation, no browser)

scrapy-impersonate is a Scrapy download handler backed by curl_cffi that swaps Scrapy's Twisted TLS stack for a browser-impersonating one — so your spider makes requests with the exact same JA3 and HTTP/2 fingerprint as Chrome.


pip install scrapy-impersonate

## settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Then per request, pick the browser version to impersonate:


yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={"impersonate": "chrome124"},
)

This is the lightest-weight option — no headless browser, no JavaScript execution, just a wire-level browser fingerprint.

Option B: scrapy-playwright (a real browser)

If the target also fires JavaScript challenges (Cloudflare Turnstile, DataDome's slider, Akamai's Sensor Data) you need an actual browser. scrapy-playwright runs Playwright-controlled Chromium/Firefox/WebKit from inside Scrapy, with the full request/response flow still going through your spider's middlewares and pipelines.


pip install scrapy-playwright
playwright install chromium

## settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_BROWSER_TYPE = "chromium"


yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={"playwright": True, "playwright_include_page": False},
)

Playwright is heavier than scrapy-impersonate (CPU, memory, slower) but defeats both fingerprint and behavioural challenges.

Option C: outsource it to a Smart Proxy API

If you don't want to maintain TLS impersonation as Chrome versions move, the ScrapeOps Proxy Aggregator handles JA3/JA4, HTTP/2, proxy rotation and Cloudflare challenges for you — see the Easy Way section above.


Use Rotating Proxies

If the above solutions don't work then it is highly likely that the server has flagged your IP address as being used by a scraper and is either throttling your requests or completely blocking them.

This is especially likely if you are scraping at larger volumes, as it is easy for websites to detect scrapers when they receive an unnaturally large number of requests from the same IP address.

Solution

You will need to send your requests through a rotating proxy pool. We created a full guide on the various options you have when integrating & rotating proxies in your Scrapy spiders here.

However, here is one possible solution using the scrapy-rotating-proxies middleware.

To get started simply install the middleware:


pip install scrapy-rotating-proxies

Then we just need to update our settings.py to load in our proxies and enable the scrapy-rotating-proxies middleware:

## settings.py

## Insert Your List of Proxies Here
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    'proxy3.com:8032',
]

## Enable The Proxy Middleware In Your Downloader Middlewares
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}

And that's it. From now on, every request your spider makes will be proxied through one of the proxies in ROTATING_PROXY_LIST.
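Conceptually, rotation with ban detection boils down to cycling through a pool and retiring proxies that start returning ban responses. The sketch below illustrates that idea only; it is not how scrapy-rotating-proxies is implemented internally, and the `ProxyPool` class, its method names and the choice of 403 and 429 as ban signals are all assumptions made for the example.

```python
class ProxyPool:
    """Round-robin proxy pool that retires proxies after a ban response."""

    def __init__(self, proxies, ban_statuses=(403, 429)):
        self.alive = list(proxies)
        self.dead = set()
        self.ban_statuses = set(ban_statuses)
        self._next_index = 0

    def next_proxy(self):
        """Return the next live proxy, cycling through the pool."""
        if not self.alive:
            raise RuntimeError("no live proxies left")
        proxy = self.alive[self._next_index % len(self.alive)]
        self._next_index += 1
        return proxy

    def report(self, proxy, status):
        """Record the status a proxy produced; retire it if it looks banned."""
        if status in self.ban_statuses and proxy in self.alive:
            self.alive.remove(proxy)
            self.dead.add(proxy)


pool = ProxyPool(["proxy1.com:8000", "proxy2.com:8031", "proxy3.com:8032"])
first = pool.next_proxy()
pool.report(first, 403)              # the target banned this proxy
print(first, "retired;", len(pool.alive), "proxies still in rotation")
```

A production pool would also give retired proxies another chance after a cool-down period, since many bans are temporary.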

If you need help finding the best & cheapest proxies for your particular use case then check out our proxy comparison tool here.

Alternatively, you could just use the ScrapeOps Proxy Aggregator as we discussed previously.


Frequently Asked Questions

What does 'HTTP status code is not handled or not allowed' mean in Scrapy?

Scrapy's HttpErrorMiddleware drops any response with a status code outside the 2xx range by default — that's the 'not handled or not allowed' log line. The underlying 403 means the target site has identified your spider as a bot. The fix is on the request side: realistic User-Agent, full headers, AUTOTHROTTLE, proxies, and (for sites behind Cloudflare or DataDome) TLS impersonation.

How do I fix a 403 Forbidden error in Scrapy?

Work through five layers: (1) set a real browser User-Agent in settings.py or via scrapy-fake-useragent; (2) send the full DEFAULT_REQUEST_HEADERS browser set; (3) slow down with DOWNLOAD_DELAY plus AUTOTHROTTLE_ENABLED; (4) route through residential or mobile rotating proxies; (5) if the target is behind Cloudflare or DataDome, use scrapy-impersonate, scrapy-playwright, or a smart proxy API.

Why does Scrapy get a 403 when the same URL works in curl or a browser?

Scrapy's default User-Agent, 'Scrapy/VERSION (+https://scrapy.org)', is the most obvious bot signal a website could ask for. Even when it is overridden, Scrapy still sends a shorter, differently ordered set of headers than a real browser and uses Twisted's TLS stack, which produces a distinctive JA3 fingerprint. A browser-impersonating curl build such as curl-impersonate, or even plain curl using your system's TLS library, often looks much closer to a real browser.

How do I see the body of a 403 response in Scrapy?

Set handle_httpstatus_list = [403] on the spider class, or pass meta={'handle_httpstatus_list': [403]} on a specific Request. Scrapy will then deliver the 403 response to your callback so you can inspect response.text and response.headers, which is handy for telling apart a Cloudflare 403, a DataDome 403 and a plain rate-limit response.

Do I need proxies to avoid 403 errors in Scrapy?

Not for small crawls: a single residential or mobile IP plus a realistic header set is often enough. Proxies become essential as you scale up. Use AUTOTHROTTLE first and add proxy rotation when throttling alone isn't enough.

Should I use scrapy-impersonate or scrapy-playwright?

scrapy-impersonate is the right choice when the only problem is TLS fingerprinting, because it is much lighter than a full browser and reuses Scrapy's normal request/response flow. scrapy-playwright is the right choice when the target also fires JavaScript challenges (Cloudflare Turnstile, DataDome slider, etc.) that need a real browser to clear.


More Scrapy Tutorials

So that's how you can solve Scrapy 403 Unhandled & Forbidden Errors when you get them.

If you would like to know more about bypassing the most common anti-bots then check out our bypass guides here:

If you would like to learn more about Scrapy, then be sure to check out The Scrapy Playbook.