The Python Web Scraping Playbook

Your guide to becoming a Python Web Scraping Pro!

Python For Beginners Series

Create Your First Python Spider - Python Requests/BS4 Beginners Series [Part 1]

In Part 1 of the series, we go over the basics of how to build a scraper with Python using Python Requests & BeautifulSoup.

Cleaning Dirty Data & Dealing With Edge Cases - Python Requests/BS4 Beginners Series [Part 2]

In Part 2 of the series, we show you how to make your scraper more robust and reliable by cleaning dirty data and handling edge cases.

Storing Data in AWS S3, MySQL & Postgres DBs - Python Requests/BS4 Beginners Series [Part 3]

In Part 3 of the series, we explore several ways to store the data, discuss their pros and cons, and cover the situations in which you would use each.

Managing Retries & Concurrency - Python Requests/BS4 Beginners Series [Part 4]

In Part 4 of the series, we make our scraper more robust and scalable by handling failed requests and using concurrency.

Faking User-Agents & Browser Headers - Python Requests/BS4 Beginners Series [Part 5]

In Part 5 of the series, we make our scraper production-ready by using fake user-agents and browser headers so that its requests look more like those of real users.

How To Scrape with Python Requests Series

How to Scrape Amazon With Python Requests and BeautifulSoup

In this video, we'll take you through how to scrape Amazon using Python Requests and BeautifulSoup.

How to Scrape G2 With Python Requests and BeautifulSoup

In this video, we'll take you through how to scrape G2 using Python Requests and BeautifulSoup.

Python Requests

Python Requests: Web Scraping Guide

In this guide, we walk through how to set up your Python Requests scrapers to avoid getting blocked, retry failed requests, and scale up with concurrency.

Python Requests: How to Send POST Requests

In this guide, we walk through how to send POST requests with Python Requests, including how to POST form data and JSON data.

Python Requests: How to Use & Rotate Proxies

In this guide, we walk through how to use proxies with Python Requests, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our Python Requests scrapers.

Python Requests: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with Python Requests to prevent your scrapers from getting blocked.

Python Requests: Retry Failed Requests

In this guide, we walk through how to configure Python Requests to retry failed requests so you can build a more reliable system.

Python Requests: Make Concurrent Requests

In this guide, we walk through how to configure Python Requests to make concurrent requests so that you can increase the speed of your scrapers.

Python Pyppeteer

The Python Pyppeteer Guide - Use Puppeteer With Python

In this guide, we show you how to use Python Pyppeteer, the Puppeteer library for Python, to render and scrape JavaScript-heavy websites.

HTML Parser Libraries

The 5 Best Python HTML Parsing Libraries Compared

In this guide, we compare the 5 best Python HTML parsing libraries available in 2023 - BeautifulSoup, lxml, html5lib, requests-html, and pyquery.
