The Python Web Scraping Playbook
Your guide to becoming a Python Web Scraping Pro!
Web Scraping For Beginners Series
Part 1: How To Build Your First Python Scraper
In Part 1 of the series, we go over the basics of how to build a scraper with Python using Python Requests & BeautifulSoup.
Comparison of Python Libraries for Web Scraping
Python Selenium VS Python Requests Compared
Should you use Python Selenium or Python Requests? In this guide we compare both options and tell you when you should use each.
Python Scrapy VS Python Selenium Compared
Should you use Python Selenium or Python Scrapy? In this guide we compare both options and tell you when you should use each.
Python Scrapy vs Python Pyppeteer Compared
Should you use Python Scrapy or Python Pyppeteer? In this guide we compare both options and tell you when you should use each.
Python Selenium vs NodeJS Playwright Compared
Should you use Python Selenium or NodeJS Playwright? In this guide we compare both options and tell you when you should use each.
Proxies, User-Agents & Avoiding Bans
Python Fake User-Agents: How to Manage User Agents When Scraping
In this guide we show you how to create and manage fake user agents when scraping in Python so you don't get blocked.
Python CloudScraper: Scrape Cloudflare Protected Websites
In this guide, we use the Python CloudScraper library to scrape Cloudflare protected websites.
FlareSolverr Guide - Bypassing Cloudflare Made Simple
In this guide we show you how to set up and use a FlareSolverr server to bypass Cloudflare when scraping.
Python Headless Browsers & Javascript Rendering
The Python Pyppeteer Guide - Use Puppeteer With Python
In this guide we show you how to use Python Pyppeteer, the Puppeteer library for Python, to render and scrape Javascript heavy websites.
Scrapy Javascript Rendering: The 4 Best Scrapy Libraries to Scrape JS Heavy Websites
In this guide we will go through the best javascript rendering libraries for Scrapy so you can scrape modern websites with ease.
Scrapy Playwright Guide: Render & Scrape JS Heavy Websites
In this guide we show you how to use Scrapy Playwright to render and scrape Javascript heavy websites.
Scrapy Splash Guide: A JS Rendering Service For Web Scraping
In this guide we show you how to set up and use Scrapy Splash in your Spider to extract JS rendered data from webpages.
Scrapy Selenium Guide: Integrating Selenium Into Your Scrapy Spiders
In this guide we show you how to set up and use Scrapy Selenium in your Spider to extract JS rendered data from webpages.
HTML Parser Libraries
The 5 Best Python HTML Parsing Libraries Compared
In this guide, we compare the 5 best Python HTML parsing libraries available in 2023 - BeautifulSoup, lxml, html5lib, requests-html, and pyquery.
Newspaper3k Guide: Scrape Articles Using AI
In this guide, we walk through the Python Newspaper3k library and how to use it to scrape & curate articles.
FeedParser Guide: Parse RSS, Atom & RDF Feeds With Python
In this guide, we walk through the Python FeedParser library and how to parse RSS, Atom & RDF feeds.
Python Requests
Python Requests: Web Scraping Guide
In this guide, we walk through how you should set up your Python Requests scrapers to avoid getting blocked, retry failed requests, and scale up with concurrency.
Python Requests: How to Send POST Requests
In this guide, we walk through how to send POST requests with Python Requests, including how to POST form data and JSON data.
Python Requests: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with Python Requests, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our Python Requests scrapers.
Python Requests: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with Python Requests to prevent your scrapers from getting blocked.
Python Requests: Retry Failed Requests
In this guide, we walk through how to configure Python Requests to retry failed requests so you can build a more reliable system.
Python Requests: Make Concurrent Requests
In this guide, we walk through how to configure Python Requests to make concurrent requests so that you can increase the speed of your scrapers.
Python HTTPX
Python HTTPX: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with Python HTTPX, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our Python HTTPX scrapers.
Python HTTPX: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with Python HTTPX to prevent your scrapers from getting blocked.
Python HTTPX: How to Send POST Requests
In this guide, we walk through how to send POST requests with Python HTTPX, including how to POST form data and JSON data.
Python HTTPX: Retry Failed Requests
In this guide, we walk through how to configure Python HTTPX to retry failed requests so you can build a more reliable system.
Python aiohttp
Python aiohttp - How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with Python aiohttp, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our Python aiohttp scrapers.
Python aiohttp: How to Send POST Requests
In this guide, we walk through how to send POST requests with Python aiohttp, including how to POST form data and JSON data.
Python PycURL
PycURL - Guide to Using cURL With Python
In this guide, we walk through how to use cURL in Python using PycURL, including how to make GET and POST requests, set proxies and user-agents, and more.
Python BeautifulSoup
BeautifulSoup Guide: Scraping HTML Pages With Python
In this guide, we walk through how to use BeautifulSoup to scrape data from HTML websites and files.
How To Install BeautifulSoup
In this guide, we walk through how to install and use BeautifulSoup on Windows, MacOS, and Linux machines.
How To Use BeautifulSoup's find() Method
In this guide, we walk through how to use BeautifulSoup's find() method to find the first page element by class, id, text, regex, and more.
How To Use BeautifulSoup's find_all() Method
In this guide, we walk through how to use BeautifulSoup's find_all() method to find a list of page elements by class, id, text, regex, and more.
Fix BeautifulSoup Returns Empty List or Value
In this guide, we walk through how to fix your code when BeautifulSoup returns an empty list or value.
How To Eliminate Span & Other HTML Tags With BeautifulSoup
In this guide, we walk through how to use BeautifulSoup to remove HTML tags like span, script, etc. from HTML files.
Python Parsel
Parsel Guide: Scraping HTML Pages With Python
In this guide, we walk through how to use Parsel to scrape data from HTML websites and files.