
freeCodeCamp Scrapy Beginners Course Part 10: Deploying & Scheduling Spiders With Scrapyd
In Part 10 of the Scrapy Beginner Course, we go through how you can deploy and run your spiders in the cloud with Scrapyd.
There are several ways to run and deploy your scrapers to the cloud, which we will cover in this course. In Part 10, however, we will show you how to do it with Scrapyd, covering:
- What Is Scrapyd?
- How to Setup Scrapyd
- Controlling Spiders With Scrapyd
- Scrapyd Dashboards
- Integrating Scrapyd with ScrapeOps
- Integrating Scrapyd with ScrapydWeb
The code for this part of the course is available on Github here!
If you prefer video tutorials, then check out the video version of this course on the freeCodeCamp channel here.
This guide is part of the 12 Part freeCodeCamp Scrapy Beginner Course where we will build a Scrapy project end-to-end, from building the scrapers to deploying them on a server and running them every day.
If you would like to skip to another section then use one of the links below:
- Part 1: Course & Scrapy Overview
- Part 2: Setting Up Environment & Scrapy
- Part 3: Creating Scrapy Project
- Part 4: First Scrapy Spider
- Part 5: Crawling With Scrapy
- Part 6: Cleaning Data With Item Pipelines
- Part 7: Storing Data In CSVs & Databases
- Part 8: Faking Scrapy Headers & User-Agents
- Part 9: Using Proxies With Scrapy Spiders
- Part 10: Deploying & Scheduling Spiders With Scrapyd
- Part 11: Deploying & Scheduling Spiders With ScrapeOps
- Part 12: Deploying & Scheduling Spiders With Scrapy Cloud
The code for this project is available on Github here!
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
What Is Scrapyd?
Scrapyd is an open source package that allows us to deploy Scrapy spiders to a server and run them remotely using a JSON API. Scrapyd allows you to:
- Run Scrapy jobs.
- Pause & Cancel Scrapy jobs.
- Manage Scrapy project/spider versions.
- Access Scrapy logs remotely.
Scrapyd is a great option for developers who want an easy way to manage production Scrapy spiders that run on a remote server.
With Scrapyd you can manage multiple servers from one central point by using a ready-made Scrapyd management tool like ScrapeOps, an open-source alternative, or by building your own.
Here you can check out the full Scrapyd docs and Github repo.
How to Setup Scrapyd
Getting Scrapyd setup is simple, and you can run it locally or on a server.
The first step is to install Scrapyd:
pip install scrapyd
And then start the server by using the command:
scrapyd
This will start Scrapyd running on http://localhost:6800/. You can open this URL in your browser and you should see the following screen:

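If you want to check the server from code instead of the browser, Scrapyd also exposes a daemonstatus.json endpoint on the same port. Here is a minimal sketch using the requests library, assuming Scrapyd is running locally on the default port:
import requests

# Scrapyd's JSON API runs on the same port as the web UI.
# daemonstatus.json reports how many jobs are pending, running and finished.
response = requests.get("http://localhost:6800/daemonstatus.json")
print(response.json())
# e.g. {"node_name": "DESKTOP-67BR2", "status": "ok", "pending": 0, "running": 0, "finished": 0}
If the request comes back with "status": "ok", your Scrapyd server is up and ready to accept deployments.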
Deploying Spiders To Scrapyd
To run jobs using Scrapyd, we first need to eggify and deploy our Scrapy project to the Scrapyd server.
To do this, we can use a library called scrapyd-client, which makes the process very simple.
First, let's install scrapyd-client:
pip install git+https://github.com/scrapy/scrapyd-client.git
Once installed, navigate to the bookscraper project we want to deploy and open the scrapy.cfg file, which should be located in your project's root directory.
You should see something like this, with the "bookscraper" text replaced by your own Scrapy project's name:
## scrapy.cfg
[settings]
default = bookscraper.settings
[deploy]
#url = http://localhost:6800/
project = bookscraper
Here the scrapy.cfg configuration file defines the endpoint your Scrapy project should be deployed to. To deploy our project to a locally running Scrapyd server, we just need to uncomment the url value.
## scrapy.cfg
[settings]
default = bookscraper.settings
[deploy]
url = http://localhost:6800/
project = bookscraper
Then run the following command in your Scrapy project's root directory:
scrapyd-deploy default
This will then eggify your Scrapy project and deploy it to your locally running Scrapyd server. You should get a result like this in your terminal if it was successful:
$ scrapyd-deploy default
Packing version 1640086638
Deploying to project "bookscraper" in http://localhost:6800/addversion.json
Server response (200):
{"node_name": "DESKTOP-67BR2", "status": "ok", "project": "bookscraper", "version": "1640086638", "spiders": 1}
Now your Scrapy project has been deployed to your Scrapyd server and is ready to be run.
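You can double-check the deployment through the same JSON API. The sketch below assumes the project is named bookscraper and Scrapyd is running locally (the bookspider spider name comes from earlier parts of this course):
import requests

# List every project deployed to this Scrapyd server.
projects = requests.get("http://localhost:6800/listprojects.json").json()
print(projects)   # e.g. {"status": "ok", "projects": ["bookscraper"]}

# List the spiders Scrapyd found inside the bookscraper project.
spiders = requests.get(
    "http://localhost:6800/listspiders.json",
    params={"project": "bookscraper"},
).json()
print(spiders)    # e.g. {"status": "ok", "spiders": ["bookspider"]}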
Controlling Spiders With Scrapyd
Scrapyd comes with a minimal web interface, which can be accessed at http://localhost:6800/. However, this interface is just a rudimentary overview of what is running on a Scrapyd server and doesn't allow you to control the spiders deployed to it.
To control your spiders with Scrapyd you have 3 options:
- Scrapyd JSON API
- Python-Scrapyd-API Library
- Scrapyd Dashboards
For this tutorial we will focus on using free Scrapyd dashboards like ScrapeOps & ScrapydWeb. However, if you would like to learn more about the other options then check out our in-depth Scrapyd guide.
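For reference, this is roughly what the first option looks like. The sketch below uses the requests library against Scrapyd's documented JSON endpoints (schedule.json, listjobs.json and cancel.json), and assumes the bookscraper project and bookspider spider from this course:
import requests

# Schedule a run of the bookspider spider in the bookscraper project.
run = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "bookscraper", "spider": "bookspider"},
).json()
print(run)  # e.g. {"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511"}

# See which jobs are pending, running or finished for the project.
jobs = requests.get(
    "http://localhost:6800/listjobs.json",
    params={"project": "bookscraper"},
).json()
print(jobs)

# Cancel the job we just scheduled using its jobid.
cancel = requests.post(
    "http://localhost:6800/cancel.json",
    data={"project": "bookscraper", "job": run["jobid"]},
).json()
print(cancel)
This works for one-off commands, but scheduling jobs daily, monitoring results and handling multiple servers quickly turns into custom tooling, which is where the dashboards below come in.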
Scrapyd Dashboards
Using Scrapyd's JSON API to control your spiders is possible, but it isn't ideal, as you will need to create custom workflows on your end to monitor, manage and run your spiders, which can become a major project in itself if you need to manage spiders spread across multiple servers.
Luckily for us, other developers ran into this problem and decided to create free and open-source Scrapyd dashboards that can connect to your Scrapyd servers, so you can manage everything from a single dashboard.
There are many different Scrapyd dashboards and admin tools available.
If you'd like to choose the best one for your requirements then be sure to check out our Guide to the Best Scrapyd Dashboards here.
For this tutorial, we will stick with the free options: ScrapeOps & ScrapydWeb.
Integrating Scrapyd with ScrapeOps
ScrapeOps is a free monitoring tool for web scraping that also has a Scrapyd dashboard that allows you to schedule, run and manage all your scrapers from a single dashboard.
Live demo here: ScrapeOps Demo

With a simple 30 second install, ScrapeOps gives you all the monitoring, alerting, scheduling and data validation functionality you need for web scraping straight out of the box.
Unlike the other Scrapyd dashboards, ScrapeOps is a full end-to-end monitoring and management tool dedicated to web scraping that automatically sets up all the monitors, health checks and alerts for you.
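That install is just a pip package plus a few lines in your Scrapy project's settings.py. The snippet below is a sketch based on the ScrapeOps Scrapy SDK docs; treat the exact extension path and settings as something to verify against the current ScrapeOps documentation, and replace the placeholder API key with your own:
## settings.py

# pip install scrapeops-scrapy

# Your API key from the ScrapeOps dashboard (placeholder value).
SCRAPEOPS_API_KEY = "YOUR_API_KEY"

# Enable the ScrapeOps monitoring extension for this project.
EXTENSIONS = {
    "scrapeops_scrapy.extension.ScrapeOpsMonitor": 500,
}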
ScrapeOps Features
Once set up, ScrapeOps will:
- 🕵️‍♂️ Monitor - Automatically monitor all your scrapers.
- 📈 Dashboards - Visualise your job data in dashboards, so you see real-time & historical stats.
- 💯 Data Quality - Validate the field coverage in each of your jobs, so broken parsers can be detected straight away.
- 📉 Auto Health Checks - Automatically check every job's performance data versus its 7 day moving average to see if it's healthy or not.
- ✔️ Custom Health Checks - Check each job with any custom health checks you have enabled for it.