How to Monitor Your Scrapy Spiders?
If you have been web scraping for a while, you know that the one certainty is that just because your scrapers work today doesn’t mean they will work tomorrow.
From day to day, your scrapers can break or their performance can degrade for a whole host of reasons:
- The HTML structure of the target site can change.
- The target site can change their anti-bot countermeasures.
- Your proxy network can degrade or go down.
- Or something can go wrong on your server.
Because of this, it is very important to have a reliable and effective way to monitor your scrapers in production, conduct health checks and get alerts when the performance of your spiders drops.
In this guide, we will go through 4 popular options for monitoring your scrapers.
#1: Scrapy Logs & Stats
Out of the box, Scrapy boasts by far the best logging and stats functionality of any web scraping library or framework out there.
2021-12-17 17:02:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1330,
'downloader/request_count': 5,
'downloader/request_method_count/GET': 5,
'downloader/response_bytes': 11551,
'downloader/response_count': 5,
'downloader/response_status_count/200': 5,
'elapsed_time_seconds': 2.600152,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 12, 17, 16, 2, 22, 118835),
'httpcompression/response_bytes': 55120,
'httpcompression/response_count': 5,
'item_scraped_count': 50,
'log_count/INFO': 10,
'response_received_count': 5,
'scheduler/dequeued': 5,
'scheduler/dequeued/memory': 5,
'scheduler/enqueued': 5,
'scheduler/enqueued/memory': 5,
'start_time': datetime.datetime(2021, 12, 17, 16, 2, 19, 518683)}
2021-12-17 17:02:25 [scrapy.core.engine] INFO: Spider closed (finished)
Whereas most other scraping libraries and frameworks focus solely on making requests and parsing the responses, Scrapy has a whole logging and stats layer under the hood that tracks your spiders in real time, making it really easy to test and debug your spiders when developing them.
You can easily customise the logging levels, and add your own stats to the default Scrapy stats from within your spiders, with a couple of lines of code.
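As a rough illustration of how little configuration this takes, the fragment below adjusts logging through standard Scrapy settings in settings.py. The values chosen here (log level, file name, interval) are assumptions for the example, not recommendations from this guide.

```python
## settings.py (illustrative values only)

# Only log messages at INFO level or above (Scrapy's default is DEBUG).
LOG_LEVEL = "INFO"

# Write the log to a file instead of standard error; the file name is an assumption.
LOG_FILE = "spider.log"

# Seconds between the periodic "Crawled X pages, scraped Y items" log lines.
LOGSTATS_INTERVAL = 30.0

# Dump the collected stats to the log when the spider finishes (on by default).
STATS_DUMP = True
```

Custom counters are recorded from inside a spider callback through the stats collector, for example `self.crawler.stats.inc_value("products/missing_price")`, where the key name is hypothetical. They then appear in the stats dump alongside the default entries.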
The major problem with relying solely on this approach to monitor your scrapers is that it quickly becomes impractical and cumbersome in production, especially when you have multiple spiders running every day across multiple servers.
To check the health of your scraping jobs, you will need to store these logs, and either periodically SSH into the server to view them or set up a custom log exporting system so you can view them in a central user interface. More on this later.
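To give a sense of what a home-grown approach involves, here is a minimal sketch that pulls the numeric values out of the "Dumping Scrapy stats" block of a stored log and applies two simple checks. The function names, the `min_items` threshold and the regex-based parsing are all assumptions made for this example; a real log exporting system would need to be more robust.

```python
import re

# Matches stats lines such as "'downloader/request_count': 5," and captures
# the key and its numeric value. Non-numeric values (strings, datetimes) are skipped.
STAT_LINE = re.compile(r"'(?P<key>[^']+)':\s*(?P<value>-?\d+(?:\.\d+)?)\s*[,}]")


def parse_scrapy_stats(log_text):
    """Return the numeric stats from the last stats dump found in a Scrapy log."""
    marker = "Dumping Scrapy stats:"
    start = log_text.rfind(marker)
    if start == -1:
        return {}  # the log contains no stats dump

    stats = {}
    for line in log_text[start + len(marker):].splitlines():
        match = STAT_LINE.search(line)
        if match:
            raw = match.group("value")
            stats[match.group("key")] = float(raw) if "." in raw else int(raw)
        if line.rstrip().endswith("}"):
            break  # closing brace marks the end of the stats dictionary
    return stats


def find_problems(stats, min_items=1):
    """Apply two hypothetical health rules and return a list of problems."""
    problems = []
    if stats.get("item_scraped_count", 0) < min_items:
        problems.append("too few items scraped")
    if stats.get("log_count/ERROR", 0) > 0:
        problems.append("errors were logged")
    return problems
```

Calling `find_problems(parse_scrapy_stats(open("spider.log").read()), min_items=40)` on each job's log would flag jobs that scraped fewer than 40 items or logged errors, but you would still need to schedule the check and route the result somewhere visible yourself.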
Summary
Using Scrapy's built-in logging and stats functionality is great during development, but when running scrapers in production you should look to use a better monitoring setup.
Pros
- Set up right out of the box, and very lightweight.
- Easy to customise so it logs more stats.
- Great for local testing and the development phase.
Cons
- No dashboard functionality, so you need to set up your own system to export your logs and display them.
- No historical comparison capabilities between jobs.
- No inbuilt health check functionality.
- Cumbersome to rely solely on when in production.
#2: ScrapeOps Extension
ScrapeOps is a monitoring and alerting tool dedicated to web scraping. With a simple 30-second install, ScrapeOps gives you all the monitoring, alerting, scheduling and data validation functionality you need for web scraping straight out of the box.
Live demo here: ScrapeOps Demo

The primary goal with ScrapeOps is to give every developer the same level of scraping monitoring capabilities as the most sophisticated web scrapers, without any of the hassle of setting up your own custom solution.
Unlike the other options on this list, ScrapeOps is a full end-to-end monitoring and management tool dedicated to web scraping that automatically sets up all the monitors, health checks and alerts for you. If you have an issue integrating ScrapeOps or need advice on setting up your scrapers, they have a support team on hand to assist you.
Features
Once you have completed the simple install (3 lines in your scraper), ScrapeOps will:
- 🕵️‍♂️ Monitor - Automatically monitor all your scrapers.
- 📈 Dashboards - Visualise your job data in dashboards, so you see real-time & historical stats.
- 💯 Data Quality - Validate the field coverage in each of your jobs, so broken parsers can be detected straight away.
- 📉 Auto Health Checks - Automatically check every job's performance data versus its 7-day moving average to see if it's healthy or not.
- ✔️ Custom Health Checks - Check each job with any custom health checks you have enabled for it.
- ⏰ Alerts - Alert you via email, Slack, etc. if any of your jobs are unhealthy.
- 📑 Reports - Generate daily (periodic) reports that check all jobs versus your criteria and let you know if everything is healthy or not.
Job stats tracked include:
- ✅ Pages Scraped & Missed
- ✅ Items Parsed & Missed
- ✅ Item Field Coverage
- ✅ Runtimes
- ✅ Response Status Codes
- ✅ Success Rates
- ✅ Latencies
- ✅ Errors & Warnings
- ✅ Bandwidth
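To make the idea of checking a job against its 7-day moving average concrete, here is a plain-Python sketch. It is not how ScrapeOps implements its health checks; the function name `is_healthy`, the `tolerance` parameter and the rule of only flagging drops are assumptions chosen for illustration.

```python
from statistics import mean


def is_healthy(history, current, window=7, tolerance=0.5):
    """Compare today's value of a stat against the moving average of recent runs.

    history   -- past values of the stat (oldest first), e.g. items scraped per day
    current   -- the value from the job being checked
    window    -- number of most recent runs used for the average
    tolerance -- allowed fractional drop below the average (0.5 means 50%)
    """
    recent = history[-window:]
    if not recent:
        return True  # no baseline yet, so nothing to compare against
    baseline = mean(recent)
    # Only a drop below the baseline counts as unhealthy in this sketch.
    return current >= baseline * (1 - tolerance)
```

With a week of daily item counts averaging 100, a job that scrapes 80 items passes the check at the default tolerance, while one that scrapes 40 items does not.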
Integration
Getting set up with the logger is simple. Just install the Python package:
pip install scrapeops-scrapy
And add 3 lines to your settings.py file:
## settings.py

## Add Your ScrapeOps API key
SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

## Add In The ScrapeOps Extension
EXTENSIONS = {
    'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
}

## Update The Download Middlewares
DOWNLOADER_MIDDLEWARES = {
    'scrapeops_scrapy.middleware.retry.RetryMiddleware': 550,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}
From there, your scraping stats will be automatically logged and shipped to your dashboard.

Summary
ScrapeOps is a powerful web scraping monitoring tool that gives you all the monitoring, alerting, scheduling and data validation functionality you need for web scraping straight out of the box.
Pros
- Free unlimited community plan.
- Simple 30-second install that gives you advanced job monitoring, health checks and alerts straight out of the box.
- Job scheduling and management functionality so you can manage and monitor your scrapers from one dashboard.
- Customer support team, available to help you get set up and add new features.
Cons
- Currently, less customisable than Spidermon or other log management tools. (Will be soon!)
#3: Spidermon Extension
Spidermon is an open-source monitoring extension for Scrapy. When integrated it allows you to set up custom monitors that can run at the start, end or periodically during your scrape, and alert you via your chosen communication method.
This is a very powerful tool as it allows you to create custom monitors for each of your Spiders that can validate each Item scraped with your own unit tests.
For example, you can make sure a required field has been scraped, that a URL field actually contains a valid URL, or have it double-check that a scraped price is actually a number and doesn’t include any currency signs or special characters.
from schematics.models import Model
from schematics.types import URLType, StringType, ListType, DecimalType

# Validation model: url, name and price are required; price must parse as a decimal.
class ProductItem(Model):
    url = URLType(required=True)
    name = StringType(required=True)
    price = DecimalType(required=True)
    features = ListType(StringType)
    image_url = URLType()
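To show what checks like these amount to, here is a dependency-free sketch of the same three rules in plain Python. It is only an illustration and not part of Spidermon or schematics; the function name `validate_product` and the error messages are made up for the example.

```python
from decimal import Decimal, InvalidOperation
from urllib.parse import urlparse


def validate_product(item):
    """Return a list of validation errors for a scraped product dict."""
    errors = []

    # Required fields must be present and non-empty.
    for field in ("url", "name", "price"):
        if item.get(field) in (None, ""):
            errors.append(f"{field}: required field is missing")

    # The URL must have an http(s) scheme and a host.
    url = item.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("url: not a valid http(s) URL")

    # The price must parse as a finite decimal, so "$19.99" is rejected.
    price = item.get("price")
    if price not in (None, ""):
        try:
            is_number = Decimal(str(price)).is_finite()
        except InvalidOperation:
            is_number = False
        if not is_number:
            errors.append("price: not a plain number")

    return errors
```

An item passes when the returned list is empty, which is the kind of per-item check a monitor can then count and alert on.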
However, the two major drawbacks of Spidermon are that: