Web Scraping Guide Part 1: How To Build Our First Scraper
Whether you prefer lightweight HTTP requests or full‑browser automation, this series walks you through building a production‑ready scraper step‑by‑step. In Part 1, we use a single e‑commerce site as the common target and implement five separate stacks (Python Requests + BeautifulSoup, Selenium, Node.js Axios + Cheerio, Puppeteer, and Playwright) so you can see exactly how each tool retrieves, parses, paginates, and exports data.
The goal is pragmatic execution, not theory: every section covers environment setup, resilient request patterns, CSS selector strategies, basic data cleaning, and a CSV hand‑off you can drop straight into a workflow. Choose the stack that fits your project’s scale and move on with confidence.
- Python Requests + BeautifulSoup
- Python Selenium
- Node.js Axios + Cheerio
- Node.js Puppeteer
- Node.js Playwright

Python Requests/BS4 Beginners Series Part 1: How To Build Our First Scraper
When it comes to web scraping, Python is the go-to language because of its highly active community, great web scraping libraries, and popularity within the data science community.
There are lots of articles online showing you how to make your first basic Python scraper. However, very few walk you through the full process of building a production-ready Python scraper.
To address this, we are doing a 6-Part Python Requests/BeautifulSoup Beginner Series, where we're going to build a Python scraping project end-to-end, from building the scrapers to deploying them on a server and running them every day.
Python Requests/BeautifulSoup 6-Part Beginner Series
- Part 1: Basic Python Requests/BeautifulSoup Scraper - We'll go over the basics of scraping with Python, and build our first Python scraper.
- Part 2: Cleaning Dirty Data & Dealing With Edge Cases - Web data can be messy, unstructured, and have lots of edge cases. In this tutorial we'll make our scraper robust to these edge cases, using data classes and data cleaning pipelines.
- Part 3: Storing Data in AWS S3, MySQL & Postgres DBs - There are many different ways to store scraped data, from CSV and JSON files to databases and S3 buckets. We'll explore several of them, along with their pros and cons and the situations where you would use each.
- Part 4: Managing Retries & Concurrency - Make our scraper more robust and scalable by handling failed requests and using concurrency.
- Part 5: Faking User-Agents & Browser Headers - Make our scraper production-ready by using fake user agents & browser headers to make our scrapers look more like real users.
- Part 6: Using Proxies To Avoid Getting Blocked - Explore how to use proxies to bypass anti-bot systems by hiding your real IP address and location.
For this beginner series, we're going to be using one of the simplest scraping architectures: a single scraper is given a start URL, then crawls the site, parses and cleans the data from the HTML responses, and stores the data, all in the same process.
This architecture is suitable for the majority of hobby and small scraping projects. However, if you are scraping business-critical data at larger scales, you would use a different scraping architecture.
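That single-process flow can be sketched as follows. The helper functions here are illustrative stand-ins (the real fetching, parsing, and storage logic is built step-by-step through this series), stubbed so the sketch runs without a network connection:

```python
# Sketch of the single-process scraping architecture described above.
# fetch_page, parse_products, and next_page_url are hypothetical names,
# not the functions from the actual tutorial code.

def fetch_page(url):
    # In the real scraper this would be requests.get(url).text;
    # stubbed here so the sketch runs offline.
    return "<html>...</html>"

def parse_products(html):
    # Would use BeautifulSoup to extract product dicts from the HTML.
    return [{"name": "Example Bar", "price": "9.95", "url": "/products/example-bar"}]

def next_page_url(html):
    # Would look for a pagination "next" link; None means we're on the last page.
    return None

def run_scraper(start_url):
    scraped_data = []
    url = start_url
    while url:                                      # crawl page by page
        html = fetch_page(url)                      # 1. retrieve the HTML
        scraped_data.extend(parse_products(html))   # 2. parse & clean the data
        url = next_page_url(html)                   # follow pagination
    return scraped_data                             # 3. store (here: in memory)

print(run_scraper("https://www.chocolate.co.uk/collections/all"))
```

The key point is that retrieval, parsing, and storage all happen sequentially in one process, which is what keeps this architecture simple.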
The code for this project is available on GitHub here!
If you prefer to follow along with a video then check out the video tutorial version here:
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
Part 1: Basic Python Scraper
In this tutorial, Part 1: Basic Python Scraper, we're going to cover:
- Our Python Web Scraping Stack
- How to Setup Our Python Environment
- Creating Our Scraper Project
- Laying Out Our Python Scraper
- Retrieving The HTML From The Website
- Extracting Data From HTML
- Saving Data to CSV
- How to Navigate Through Pages
- Next Steps
For this series, we will be scraping the products from Chocolate.co.uk, as it is a good example of how to approach scraping an e-commerce store. Plus, who doesn't like chocolate!

Our Python Web Scraping Stack
When it comes to web scraping stacks there are two key components:
- HTTP Client: Sends requests to the website and retrieves the HTML/JSON responses.
- Parsing Library: Extracts the target data from the retrieved web page.
Due to the popularity of Python for web scraping, we have numerous options for both.
We can use Python Requests, Python HTTPX or Python aiohttp as HTTP clients.
And BeautifulSoup, Lxml, Parsel, etc. as parsing libraries.
Or we could use Python web scraping libraries/frameworks that combine both HTTP requests and parsing like Scrapy, Python Selenium and Requests-HTML.
Each stack has its own pros and cons. However, for the purposes of this beginner series we will use the Python Requests/BeautifulSoup stack, as it is by far the most common web scraping stack used by Python developers.
Using the Python Requests/BeautifulSoup stack you can easily build highly scalable scrapers that retrieve a page's HTML, parse and process the data, and store it in the file format and location of your choice.
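To give a first taste of this stack, here is a minimal parsing sketch. The HTML snippet and the CSS class names are made up for illustration (the real selectors for Chocolate.co.uk are worked out later in the tutorial), and the HTML is hardcoded so the example runs without a network request; in the real scraper it would come from `requests.get(url).text`:

```python
from bs4 import BeautifulSoup

# Hardcoded stand-in for the HTML a requests.get() call would return.
# The class names below are illustrative, not the site's actual markup.
html = """
<div class="product-item">
  <a class="product-item-meta__title" href="/products/100-dark">100% Dark</a>
  <span class="price">£9.95</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for product in soup.select("div.product-item"):
    # select_one() returns the first element matching the CSS selector
    name = product.select_one("a.product-item-meta__title").get_text(strip=True)
    price = product.select_one("span.price").get_text(strip=True)
    print({"name": name, "price": price})
```

This fetch-then-select pattern, with Requests supplying the HTML and BeautifulSoup's CSS selectors pulling out the fields, is the core loop we build on for the rest of the series.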
How to Setup Our Python Environment
With the intro out of the way, let's start developing our scraper. First things first, we need to set up our Python environment.
Step 1 - Setup your Python Environment
To avoid version conflicts down the road, it is best practice to create a separate virtual environment for each of your Python projects. This means that any packages you install for a project are kept separate from other projects, so you don't inadvertently end up breaking other projects.
Depending on the operating system of your machine these commands will be slightly different.
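As a quick sketch, on macOS/Linux a typical setup looks like this (the project folder name is just a placeholder; on Windows the activation command is `venv\Scripts\activate` instead):

```shell
# Create a project folder and a virtual environment inside it
mkdir chocolate-scraper
cd chocolate-scraper
python3 -m venv venv

# Activate the environment so installs go into it, not the system Python
source venv/bin/activate

# Install the packages used in this series
pip install requests beautifulsoup4
```

Once activated, the shell prompt usually shows the environment name, and `pip install` only affects this project.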