freeCodeCamp Scrapy Beginners Course Part 2: Setting Up Scrapy
In Part 2 of the Scrapy Beginners Course, we go through how to set up your Python environment and install Scrapy.
We will walk through:
- How To Install Python
- Setting Up Your Python Virtual Environment On Linux/MacOS
- Setting Up Your Python Virtual Environment On Windows
- How To Install Scrapy
The code for this part of the course is available on Github here!
If you prefer video tutorials, then check out the video version of this course on the freeCodeCamp channel here.
This guide is part of the 12 Part freeCodeCamp Scrapy Beginners Course, where we build a Scrapy project end-to-end: from building the scrapers to deploying them on a server and running them every day.
If you would like to skip to another section then use one of the links below:
- Part 1: Course & Scrapy Overview
- Part 2: Setting Up Environment & Scrapy
- Part 3: Creating Scrapy Project
- Part 4: First Scrapy Spider
- Part 5: Crawling With Scrapy
- Part 6: Cleaning Data With Item Pipelines
- Part 7: Storing Data In CSVs & Databases
- Part 8: Faking Scrapy Headers & User-Agents
- Part 9: Using Proxies With Scrapy Spiders
- Part 10: Deploying & Scheduling Spiders With Scrapyd
- Part 11: Deploying & Scheduling Spiders With ScrapeOps
- Part 12: Deploying & Scheduling Spiders With Scrapy Cloud
How To Install Python
For this course, we assume you already have Python installed and have a basic understanding of coding with Python.
However, if you don't, we recommend you follow the steps outlined in this video to install Python onto your machine.
How To Install pip
pip is the standard package manager for Python. It allows you to quickly and easily install 3rd party packages, so you can use code from other people and companies in your Python projects.
You may already have pip installed as part of your Python installation. You can check by running the following in your terminal (or PowerShell command prompt on Windows):
pip --version
If something like the following is output to the screen, it is installed correctly:
pip 22.3.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
If not, then you can install pip with the following command on MacOS/Linux operating systems (after Python is installed!):
python -m ensurepip --upgrade
For Windows machines:
py -m ensurepip --upgrade
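If you'd rather check for pip programmatically than in the shell, a minimal sketch like the following runs pip through the same interpreter you'll use for the project, which guarantees you're checking the right installation:

```python
import subprocess
import sys

# Ask the current interpreter for its bundled pip, so we check the
# pip that belongs to the Python we actually intend to use.
result = subprocess.run(
    [sys.executable, "-m", "pip", "--version"],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    print(result.stdout.strip())  # e.g. "pip 22.3.1 from ... (python 3.9)"
else:
    print("pip is not available for this interpreter")
```

Preferring `python -m pip` over a bare `pip` command avoids picking up a pip from a different Python installation on your PATH.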
Python Virtual Environments
To avoid version conflicts down the road, it is best practice to create a separate virtual environment for each of your Python projects. This means that any packages (3rd party code/modules) you install for a project are kept separate from other projects, so you don't inadvertently end up breaking them.
Depending on the operating system of your machine these commands will be slightly different.
venv comes built in with recent versions of Python 3 and makes it simple to set up and use virtual environments.
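To see what venv actually does, this quick sketch creates a throwaway environment in a temporary folder using the same stdlib module the commands below invoke (the folder name `demo-venv` is just an example):

```python
import pathlib
import tempfile
import venv

# Create a throwaway virtual environment in a temporary directory.
# with_pip=False skips bootstrapping pip, which keeps this demo fast.
target = pathlib.Path(tempfile.mkdtemp()) / "demo-venv"
venv.create(target, with_pip=False)

# Every venv contains a pyvenv.cfg marker file describing the base Python
print((target / "pyvenv.cfg").exists())  # True
```

In practice you'll create venvs from the command line as shown below; this is just to demystify the mechanism: a venv is an ordinary folder with its own Python launcher and site-packages.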
Setting Up Your Python Virtual Environment On Linux
Once you have Python installed, setting up a virtual environment on any Linux distro is pretty simple.
First, we want to make sure our package lists are up to date:
$ sudo apt update
Then install python3-venv
if you haven't done so already.
$ sudo apt install -y python3-venv
Next, we will create and activate our Python virtual environment so that any new pip install commands will install into the new venv folder by doing:
$ cd /free_code_camp_scrapy (or whatever your project folder is named)
$ python3 -m venv venv
$ source venv/bin/activate
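To confirm the activation worked, you can check which python your shell now resolves to (a quick sketch; the paths assume the venv folder created above):

```shell
# With the venv active, `python` should resolve inside venv/bin/
command -v python                           # e.g. .../free_code_camp_scrapy/venv/bin/python
python -c 'import sys; print(sys.prefix)'   # prints the venv folder path
deactivate                                  # leave the venv when you're done
```

If `command -v python` still points at a system path like /usr/bin/python, the environment is not active in your current shell.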
Setting Up Your Python Virtual Environment On MacOS
On macOS just run the following commands:
$ cd /free_code_camp_scrapy (or whatever your project folder is named)
$ python3 -m venv venv
We then activate the virtual environment so that any new pip install commands will install into the new venv folder by doing:
$ source venv/bin/activate
Setting Up Your Python Virtual Environment On Windows
Setting up a virtual environment on Windows is also pretty simple, but we will use virtualenv instead, as venv can be more complicated to use on Windows.
First, install virtualenv in your Windows command shell, PowerShell, or other terminal you are using:
pip install virtualenv
Navigate to the folder where you want to create the virtual environment, and run the virtualenv command.
cd /free_code_camp_scrapy
virtualenv venv
We then activate the virtual environment so that any new pip install commands will install into the new venv folder. In Command Prompt:
venv\Scripts\activate
Or in PowerShell:
.\venv\Scripts\activate
How To Install Scrapy
With our virtual environment created and activated, now it is time to install Scrapy into it.
To do so, we just need to install Scrapy via pip:
pip install scrapy
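You can also confirm from Python that the install landed in the active environment. A small sketch: it prints None if scrapy isn't importable from the current interpreter.

```python
import importlib.util

# find_spec returns None when the package is not importable from this env
spec = importlib.util.find_spec("scrapy")
print(spec.name if spec else None)  # "scrapy" if installed, None otherwise
```

This catches the common mistake of installing Scrapy while the virtual environment was not activated.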
To make sure everything is working, we can check that Scrapy was installed correctly by typing the command scrapy into your command line. You should get an output like this:
$ scrapy
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider
If you get an output similar to the above, then you know you have successfully installed Scrapy.
Next Steps
Now that we have our environment set up, we will move on to creating our first Scrapy project.
All parts of the 12 Part freeCodeCamp Scrapy Beginner Course are as follows:
- Part 1: Course & Scrapy Overview
- Part 2: Setting Up Environment & Scrapy
- Part 3: Creating Scrapy Project
- Part 4: First Scrapy Spider
- Part 5: Crawling With Scrapy
- Part 6: Cleaning Data With Item Pipelines
- Part 7: Storing Data In CSVs & Databases
- Part 8: Faking Scrapy Headers & User-Agents
- Part 9: Using Proxies With Scrapy Spiders
- Part 10: Deploying & Scheduling Spiders With Scrapyd
- Part 11: Deploying & Scheduling Spiders With ScrapeOps
- Part 12: Deploying & Scheduling Spiders With Scrapy Cloud