
Python Requests: Web Scraping Guide
In this guide for The Python Web Scraping Playbook, we will look at how to set up your Python Requests scrapers to avoid getting blocked, retry failed requests, and scale up with concurrency.
Python Requests is the most popular HTTP client library used by Python developers, so in this article we will run through the best practices you need to know, including:
- Making GET Requests
- Making POST Requests
- Using Fake User Agents With Python Requests
- Using Proxies With Python Requests
- Retrying Failed Requests
- Scaling Your Scrapers Using Concurrent Threads
- Rendering JS On Client-Side Rendered Pages
For this guide, we're going to focus on how to set up the HTTP client element of your Python Requests-based scrapers, not how to parse the data from the HTML responses.
To keep things simple, we will be using BeautifulSoup to parse data from QuotesToScrape.
If you want to learn more about using BeautifulSoup, or about web scraping with Python in general, check out our BeautifulSoup Guide or our Python Beginners Web Scraping Guide.
Let's begin with the basics and work our way up to the more complex topics...
Making GET Requests
Making GET requests with Python Requests is very simple.
We just need to request the URL using requests.get(url):
import requests
response = requests.get('http://quotes.toscrape.com/')
print(response.text)
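In a scraper it usually helps to set a timeout and to check the status code before using the body. The following is a minimal sketch of that idea, not a definitive implementation: the `fetch_html` name, the 10-second default, and the optional `session` argument are assumptions made for illustration.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML, raising on 4xx/5xx responses."""
    # `session` is a hypothetical hook: anything with a requests-style .get() works,
    # for example a requests.Session. By default the module-level requests.get is used.
    client = session if session is not None else requests
    response = client.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
    return response.text
```

Calling fetch_html('http://quotes.toscrape.com/') should then return the page HTML, or raise an exception if the request fails.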
The following are the most commonly used attributes of the Response class:
- status_code: The HTTP status code of the response.
- text: The response content as a Unicode string.
- content: The response content in bytes.
- headers: A dictionary-like object containing the response headers.
- url: The URL of the response.
- encoding: The encoding of the response content.
- cookies: A RequestsCookieJar object containing the cookies sent by the server.
- history: A list of previous responses if there were redirects.
- ok: A boolean indicating whether the response was successful (status code below 400).
- reason: The reason phrase returned by the server (e.g., "OK", "Not Found").
- elapsed: The time elapsed between sending the request and receiving the response.
- request: The PreparedRequest object that was sent to the server.
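As a rough illustration of how these attributes might be read together, here is a small sketch that collects several of them into a dictionary for logging or debugging. The `summarize_response` helper and its key names are hypothetical, not part of Python Requests.

```python
import requests


def summarize_response(response):
    """Collect a few commonly inspected Response attributes into a dict."""
    return {
        'status_code': response.status_code,
        'ok': response.ok,
        'reason': response.reason,
        'final_url': response.url,
        'redirects': len(response.history),  # number of redirect hops followed
        'content_type': response.headers.get('Content-Type'),
        'encoding': response.encoding,
        'elapsed_seconds': response.elapsed.total_seconds(),
        'text_length': len(response.text),
    }
```

For example, summarize_response(requests.get('http://quotes.toscrape.com/')) would give a quick overview of the response before you start parsing it.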
Making POST Requests
Making POST requests with Python Requests is also very simple.
To send JSON data in a POST request, call requests.post() with the URL and pass the data using the json parameter:
import requests
url = 'http://quotes.toscrape.com/'
data = {'key': 'value'}
# Send POST request with JSON data using the json parameter
response = requests.post(url, json=data)
# Print the status code and body (use response.json() only when the server returns JSON)
print(response.status_code)
print(response.text)
To send form data in a POST request, call requests.post() with the URL and pass the data using the data parameter:
import requests
url = 'http://quotes.toscrape.com/'
data = {'key': 'value'}
# Send POST request with form data using the data parameter
response = requests.post(url, data=data)
# Print the status code and body (use response.json() only when the server returns JSON)
print(response.status_code)
print(response.text)
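To see what the json and data parameters in the two snippets above actually change, one option is to build the requests without sending them and inspect the result. This sketch uses requests.Request(...).prepare(), so nothing goes over the network. The variable names are illustrative.

```python
import json

import requests

url = 'http://quotes.toscrape.com/'
data = {'key': 'value'}

# Build both requests without sending them, so the difference can be inspected offline.
json_request = requests.Request('POST', url, json=data).prepare()
form_request = requests.Request('POST', url, data=data).prepare()

# json= serializes the dict to a JSON body and sets a JSON Content-Type header.
print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))

# data= URL-encodes the dict as form fields instead.
print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # key=value
```

Which parameter to use depends on what the target endpoint expects: APIs typically accept JSON, while HTML forms usually submit URL-encoded data.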
For more details on how to send POST requests with Python Requests, check out our Python Requests Guide: How to Send POST Requests.

Using Fake User Agents With Python Requests
User Agents are strings that let the website you are scraping identify the application, operating system (OSX/Windows/Linux), browser (Chrome/Firefox/Internet Explorer), etc. of the user sending a request to their website. They are sent to the server as part of the request headers.
Here is an example user agent sent when you visit a website with a Chrome browser:
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'
When scraping a website, you need to set a user-agent on every request. Otherwise, the website may block your requests because it can tell you aren't a real user.
In the case of most Python HTTP clients like Python Requests, when you send a request the default settings clearly identify that the request is being made with Python Requests in the user-agent string.
'User-Agent': 'python-requests/2.26.0',
A website can spot this default user-agent with a trivial check and block your scraper.
That is why we need to manage the user-agents we send with Python Requests.
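As a minimal sketch of overriding the default, the example below prepares two GET requests (without sending them) and compares the User-Agent header each would carry. The `build_request` helper and the `CHROME_UA` constant are names made up for illustration; the user-agent string is the Chrome example shown earlier.

```python
import requests

URL = 'http://quotes.toscrape.com/'
CHROME_UA = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'
)


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request, optionally overriding the User-Agent."""
    session = requests.Session()
    headers = {'User-Agent': user_agent} if user_agent else {}
    # prepare_request merges the per-request headers over the session defaults.
    return session.prepare_request(requests.Request('GET', url, headers=headers))


default_request = build_request(URL)
custom_request = build_request(URL, CHROME_UA)

print(default_request.headers['User-Agent'])  # the library default, python-requests/<version>
print(custom_request.headers['User-Agent'])   # the Chrome string defined above
```

To actually send a request with the custom value, pass the same dictionary through the headers parameter, for example requests.get(URL, headers={'User-Agent': CHROME_UA}).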