Python aiohttp: Setting Fake User-Agents
To use fake user-agents with Python aiohttp, first create a session object with aiohttp.ClientSession() and then call the get() method on that session. To set a fake user-agent, define it in a headers dictionary and pass that dictionary into the headers parameter of your request.
```python
import aiohttp
import asyncio

headers = {"User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"}

async def make_request():
    async with aiohttp.ClientSession() as session:
        response = await session.get('http://httpbin.org/headers', headers=headers)
        print(await response.json())

asyncio.run(make_request())
```
One of the most common reasons for getting blocked whilst web scraping is using bad user-agents.
However, integrating fake user-agents into your Python web scrapers is very easy.
So in this guide, we will go through:
- What Are Fake User-Agents?
- How To Set A User Agent In Python aiohttp
- How To Rotate User-Agents
- How To Manage Thousands of Fake User-Agents
- Why Use Fake Browser Headers
- ScrapeOps Fake Browser Headers API
First, let's quickly go over some of the very basics.
Need help scraping the web?
Then check out ScrapeOps, the complete toolkit for web scraping.
What Are Fake User-Agents?
User Agents are strings that let the website you are scraping identify the application, operating system (OSX/Windows/Linux), browser (Chrome/Firefox/Internet Explorer), etc. of the user sending a request to their website. They are sent to the server as part of the request headers.
Here is an example User agent sent when you visit a website with a Chrome browser:
```python
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'
```
When scraping a website, you also need to set a user-agent on every request, otherwise the website may block your requests because it can tell you aren't a real user.
In the case of most Python HTTP clients, including Python aiohttp, the default settings send a user-agent string that clearly identifies the request as being made with Python aiohttp.
```python
'User-Agent': 'Python/3.9 aiohttp/3.7.4',
```
This user-agent clearly identifies that your requests are being made by the Python aiohttp library, so the website can easily block you from scraping the site.
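You can check the exact default value your installed aiohttp version sends by inspecting the constant it builds the header from. A quick sketch, assuming the `SERVER_SOFTWARE` constant in the `aiohttp.http` module (which aiohttp uses as its default `User-Agent` when you don't supply one):

```python
from aiohttp.http import SERVER_SOFTWARE

# aiohttp fills in its default User-Agent header from this constant,
# which has the form "Python/<major>.<minor> aiohttp/<version>"
print(SERVER_SOFTWARE)
```

Whatever the exact version numbers, the string always advertises both Python and aiohttp, which is exactly what anti-bot systems look for.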
That is why we need to manage the user-agents we use with Python aiohttp when we send requests.
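A simple way to manage this is to keep a small pool of real-browser user-agents and pick one at random for each request. A minimal sketch (the user-agent strings and the `random_ua_headers` helper name are illustrative, not from any library):

```python
import random

# Illustrative pool of real-browser user-agent strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0",
]

def random_ua_headers():
    # Build a fresh headers dict with a randomly chosen user-agent
    return {"User-Agent": random.choice(USER_AGENTS)}

print(random_ua_headers())
```

You would then pass `random_ua_headers()` into the `headers` parameter of each `session.get()` call, exactly as in the example at the top of this guide, so every request goes out with a different browser identity.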