How To Scrape Amazon.com Products & Reviews [2023]
In this guide for our "How To Scrape X" series, we're going to look at how to scrape Amazon.com.
Amazon is the most popular website for web scrapers, with billions of product pages being scraped every month.
So in this guide we will go through:
- How To Build A List Of Amazon Product URLs
- How To Scrape Product Data From Amazon Product Search Pages
- How To Scrape Product Data From Amazon Product Pages
- How To Scrape Product Reviews From Amazon Review Pages
- Bypassing Amazon's Anti-Bot Protection
How To Build A List Of Amazon Product URLs
The first part of scraping Amazon is designing a web crawler that will generate a list of product URLs for our scrapers to scrape.
For example, here is a product URL for an iPad:
'https://www.amazon.com/2021-Apple-10-2-inch-iPad-Wi-Fi/dp/B09G9FPHY6/ref=sr_1_1'
The alternative approach is to crawl Amazon for ASIN (Amazon Standard Identification Number) codes. Every product listed on Amazon has its own unique ASIN code, which you can use to construct URLs to scrape that product page, reviews, or other sellers.
For example, you can retrieve the product page of any product using its ASIN:
## URL Structure
'https://www.amazon.com/dp/ASIN'
## Example
'https://www.amazon.com/dp/B09G9FPHY6'
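As a minimal sketch, the ASIN-based URL formats used in this guide can be wrapped in small helper functions. The function names and the AMAZON_BASE_URL constant are hypothetical; only the URL patterns themselves (the product page format above and the product reviews format used in the reviews section of this guide) come from the guide.

```python
# Hypothetical helpers: only the URL patterns come from this guide.
AMAZON_BASE_URL = 'https://www.amazon.com'


def build_product_url(asin):
    """Return the product page URL for a given ASIN."""
    return f'{AMAZON_BASE_URL}/dp/{asin}'


def build_reviews_url(asin):
    """Return the product reviews URL for a given ASIN."""
    return f'{AMAZON_BASE_URL}/product-reviews/{asin}/'


print(build_product_url('B09G9FPHY6'))
print(build_reviews_url('B09G9FPHY6'))
```

Storing only ASINs and building URLs on demand keeps the crawl list compact and avoids carrying tracking suffixes such as ref=sr_1_1 around.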
With Amazon.com, the easiest way to build this list of product URLs or ASINs is to use the Amazon Search page, which returns up to 20 products per page.
For example, here is how we would get search results for iPads.
'https://www.amazon.com/s?k=iPads&page=1'
This URL contains a number of parameters that we will explain:
- k stands for the search keyword. In our case, k=iPads. Note: If you want to search for a keyword that contains spaces or special characters, remember that you need to URL-encode this value.
- page stands for the page number. In our case, we've requested page=1.
Using these parameters we can query the Amazon search endpoint to start building a list of URLs to scrape.
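To illustrate the note about encoding, here is a minimal sketch that builds the search URL with urlencode from Python's standard library, so keywords containing spaces or special characters are encoded automatically. The build_search_url helper is a hypothetical name and not part of the crawlers in this guide.

```python
from urllib.parse import urlencode


def build_search_url(keyword, page=1):
    """Build an Amazon search URL, URL-encoding the keyword."""
    query = urlencode({'k': keyword, 'page': page})
    return f'https://www.amazon.com/s?{query}'


print(build_search_url('ipad'))
print(build_search_url('ipad case 10.2 inch', page=2))
```

With urlencode, spaces in the keyword become plus signs, so the second call produces a URL whose query string is k=ipad+case+10.2+inch&page=2.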
To extract product URLs (and ASIN codes) from a search results page, we need to loop through every product on the page, extract the relative URL to the product, and then either create an absolute product URL or extract the ASIN.
Extracting Product Page URLs
Here is an example crawler that will extract product URLs from an Amazon Search Page with Python Requests & Parsel:
import requests
from parsel import Selector
from urllib.parse import urljoin

product_urls = []
keyword_list = ['ipad']

for keyword in keyword_list:
    url = f'https://www.amazon.com/s?k={keyword}&page=1'
    try:
        response = requests.get(url)
        if response.status_code == 200:
            sel = Selector(text=response.text)

            ## Extract Product Page URLs
            search_products = sel.css("div.s-result-item[data-component-type=s-search-result]")
            for product in search_products:
                relative_url = product.css("h2>a::attr(href)").get()
                product_url = urljoin('https://www.amazon.com/', relative_url).split("?")[0]
                product_urls.append(product_url)
    except Exception as e:
        print("Error", e)
Extracting Product ASINs
Here is an example where we extract product ASINs from the relative URL for the same Amazon Search Page with Python Requests & Parsel:
import requests
from parsel import Selector

product_asins = []
keyword_list = ['ipad']

for keyword in keyword_list:
    url = f'https://www.amazon.com/s?k={keyword}&page=1'
    try:
        response = requests.get(url)
        if response.status_code == 200:
            sel = Selector(text=response.text)

            ## Extract Product ASINS
            search_products = sel.css("div.s-result-item[data-component-type=s-search-result]")
            for product in search_products:
                relative_url = product.css("h2>a::attr(href)").get()
                asin = relative_url.split('/')[3] if len(relative_url.split('/')) >= 4 else None
                product_asins.append(asin)
    except Exception as e:
        print("Error", e)
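Splitting the relative URL on slashes relies on the ASIN always sitting in the same position. As an alternative sketch, the ASIN can be matched with a regular expression instead. This assumes an ASIN is a 10-character code of uppercase letters and digits that follows /dp/ in the URL, which holds for the example URLs in this guide but is an assumption rather than a documented rule. The extract_asin helper is a hypothetical name.

```python
import re

# Assumption: an ASIN is 10 uppercase alphanumeric characters following "/dp/".
ASIN_PATTERN = re.compile(r'/dp/([A-Z0-9]{10})(?:[/?]|$)')


def extract_asin(relative_url):
    """Return the ASIN found in a product URL, or None if there is none."""
    if not relative_url:
        return None
    match = ASIN_PATTERN.search(relative_url)
    return match.group(1) if match else None


print(extract_asin('/2021-Apple-10-2-inch-iPad-Wi-Fi/dp/B09G9FPHY6/ref=sr_1_1'))
print(extract_asin('/dp/B09G9FPHY6'))
print(extract_asin(None))
```

Because the helper returns None for missing or unexpected URLs instead of raising, one odd search result does not abort the rest of the loop.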
Paginating Amazon Search Pages
The above code just scrapes the product URL and ASINs from the first page of the Amazon Search results. However, most of the time we will want to scrape the data from the other available pages.
To do so we will need to configure our crawler to paginate through every available page for our search keywords and extract the data from those as well.
In the example below, we extract the available page numbers and configure our scraper to request those pages as well by adding them to our url_list.
import requests
from parsel import Selector
from urllib.parse import urljoin

product_urls = []
keyword_list = ['ipad']

for keyword in keyword_list:
    url_list = [f'https://www.amazon.com/s?k={keyword}&page=1']
    for url in url_list:
        try:
            response = requests.get(url)
            if response.status_code == 200:
                sel = Selector(text=response.text)

                ## Extract Product Page URLs
                search_products = sel.css("div.s-result-item[data-component-type=s-search-result]")
                for product in search_products:
                    relative_url = product.css("h2>a::attr(href)").get()
                    product_url = urljoin('https://www.amazon.com/', relative_url).split("?")[0]
                    product_urls.append(product_url)

                ## Get All Pages (only from page 1; endswith avoids matching page=10, 11, ...)
                if url.endswith("&page=1"):
                    available_pages = sel.xpath(
                        '//a[has-class("s-pagination-item")][not(has-class("s-pagination-separator"))]/text()'
                    ).getall()
                    for page in available_pages:
                        search_url_paginated = f'https://www.amazon.com/s?k={keyword}&page={page}'
                        url_list.append(search_url_paginated)
        except Exception as e:
            print("Error", e)
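The pagination labels extracted from the page are plain strings, so it can be worth validating them before turning them into URLs. The sketch below is one possible way to do that and is not part of the crawler above: build_paginated_urls is a hypothetical helper that keeps only numeric labels and skips duplicates and page 1, which is already in url_list.

```python
def build_paginated_urls(keyword, page_labels):
    """Turn scraped pagination labels into search URLs for pages 2 and up."""
    urls = []
    seen_pages = {1}  # page 1 is the starting URL, so never add it again
    for label in page_labels:
        label = label.strip()
        if not label.isdigit():
            continue  # ignore anything that is not a page number
        page = int(label)
        if page in seen_pages:
            continue  # ignore duplicates
        seen_pages.add(page)
        urls.append(f'https://www.amazon.com/s?k={keyword}&page={page}')
    return urls


print(build_paginated_urls('ipad', ['2', '3', 'Next', '3', '1']))
```

Here only pages 2 and 3 are returned, because the non-numeric label, the duplicate and page 1 are all dropped.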
How To Scrape Product Data From Amazon Product Search Pages
In the above examples, we saw how to extract product URLs and ASINs from Amazon Search pages so that we can scrape the product pages.
However, if you only need basic product data (name, price, image URL, rating, number of reviews, etc) then you can scrape this data from the actual search pages.
The advantage of this approach is that, with up to 20 products per search page, you can cut the number of requests you need to make to Amazon by a factor of up to 20, making your scrapers faster and cheaper to run.
Here is an example using Python Requests & Parsel that scrapes the product data from all available Amazon Search Pages.
import requests
from parsel import Selector
from urllib.parse import urljoin

keyword_list = ['ipad']
product_overview_data = []

for keyword in keyword_list:
    url_list = [f'https://www.amazon.com/s?k={keyword}&page=1']
    for url in url_list:
        try:
            response = requests.get(url)
            if response.status_code == 200:
                sel = Selector(text=response.text)

                ## Extract Product Data From Search Page
                search_products = sel.css("div.s-result-item[data-component-type=s-search-result]")
                for product in search_products:
                    relative_url = product.css("h2>a::attr(href)").get()
                    asin = relative_url.split('/')[3] if len(relative_url.split('/')) >= 4 else None
                    product_url = urljoin('https://www.amazon.com/', relative_url).split("?")[0]
                    product_overview_data.append(
                        {
                            "keyword": keyword,
                            "asin": asin,
                            "url": product_url,
                            "ad": True if "/slredirect/" in product_url else False,
                            "title": product.css("h2>a>span::text").get(),
                            "price": product.css(".a-price[data-a-size=xl] .a-offscreen::text").get(),
                            "real_price": product.css(".a-price[data-a-size=b] .a-offscreen::text").get(),
                            "rating": (product.css("span[aria-label~=stars]::attr(aria-label)").re(r"(\d+\.*\d*) out") or [None])[0],
                            "rating_count": product.css("span[aria-label~=stars] + span::attr(aria-label)").get(),
                            ## Relative XPath (.//) so each product gets its own thumbnail
                            "thumbnail_url": product.xpath(".//img[has-class('s-image')]/@src").get(),
                        }
                    )

                ## Get All Pages (only from page 1; endswith avoids matching page=10, 11, ...)
                if url.endswith("&page=1"):
                    available_pages = sel.xpath(
                        '//a[has-class("s-pagination-item")][not(has-class("s-pagination-separator"))]/text()'
                    ).getall()
                    for page in available_pages:
                        search_url_paginated = f'https://www.amazon.com/s?k={keyword}&page={page}'
                        url_list.append(search_url_paginated)
        except Exception as e:
            print("Error", e)
If you can get away with only the product data available on the Amazon Search page, then you should only scrape these pages. This approach is also more ethical, as you will put less demand on the website's servers.
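The fields scraped from search pages come back as display strings. As a rough sketch of a clean-up step, the helpers below convert a price and a rating count into numbers. The names parse_price and parse_count are hypothetical, and the input formats (a dollar price such as "$137.00" and a comma-separated count such as "8,532") are assumptions based on the example output in this guide, not guaranteed Amazon formats.

```python
import re


def parse_price(price_text):
    """Convert a price string like '$1,299.00' to a float, or None."""
    if not price_text:
        return None
    match = re.search(r'\d[\d,]*(?:\.\d+)?', price_text)
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))


def parse_count(count_text):
    """Convert a count string like '8,532' to an int, or None."""
    if not count_text:
        return None
    match = re.search(r'\d[\d,]*', count_text)
    if match is None:
        return None
    return int(match.group(0).replace(',', ''))


print(parse_price('$137.00'))
print(parse_count('8,532'))
```

Both helpers return None for missing values, which matters here because the CSS selectors return None whenever a product has no price or no ratings.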
How To Scrape Product Data From Amazon Product Pages
Once we have a list of Amazon product URLs then we can scrape all the product data from each individual Amazon product page.
Scraping product data from Amazon Product Pages is pretty straightforward.
First we need to request the product page using either a full product URL like this:
'https://www.amazon.com/2021-Apple-10-2-inch-iPad-Wi-Fi/dp/B09G9FPHY6/ref=sr_1_1'
Or one based solely on the product's ASIN code:
'https://www.amazon.com/dp/B09G9FPHY6'
Then create parsers for every field we want to extract data for.
import json
import re
import requests
from parsel import Selector
from urllib.parse import urljoin

product_urls = [
    'https://www.amazon.com/2021-Apple-10-2-inch-iPad-Wi-Fi/dp/B09G9FPHY6/ref=sr_1_1',
]
product_data_list = []

for product_url in product_urls:
    try:
        response = requests.get(product_url)
        if response.status_code == 200:
            sel = Selector(text=response.text)

            ## Image and variant data are embedded in the page's JavaScript
            image_data = json.loads(re.findall(r"colorImages':.*'initial':\s*(\[.+?\])},\n", response.text)[0])
            variant_data = re.findall(r'dimensionValuesDisplayData"\s*:\s* ({.+?}),\n', response.text)
            feature_bullets = [bullet.strip() for bullet in sel.css("#feature-bullets li ::text").getall()]

            ## Fall back to the off-screen price if the visible one is missing
            price = sel.css('.a-price span[aria-hidden="true"] ::text').get("")
            if not price:
                price = sel.css('.a-price .a-offscreen ::text').get("")

            product_data_list.append({
                "name": sel.css("#productTitle::text").get("").strip(),
                "price": price,
                "stars": sel.css("i[data-hook=average-star-rating] ::text").get("").strip(),
                "rating_count": sel.css("div[data-hook=total-review-count] ::text").get("").strip(),
                "feature_bullets": feature_bullets,
                "images": image_data,
                "variant_data": variant_data,
            })
    except Exception as e:
        print("Error", e)
In the above code, we scrape all the main product data from the page including product variant data.
Here is an example output:
{"name": "Apple iPad 9.7inch with WiFi 32GB- Space Gray (2017 Model) (Renewed)",
"price": "$137.00",
"stars": "4.6 out of 5 stars",
"rating_count": "8,532 global ratings",
"feature_bullets": [
"Make sure this fits by entering your model number.",
"9.7-Inch Retina Display, wide Color and True Tone",
"A9 third-generation chip with 64-bit architecture",
"M9 motion coprocessor, 1.2MP FaceTime HD Camera",
"8MP insight Camera, touch ID, Apple Pay"],
"images": [{"hiRes": "https://m.media-amazon.com/images/I/51dBcW+NXPL._AC_SL1000_.jpg",
"thumb": "https://m.media-amazon.com/images/I/51pGtRLfaZL._AC_US40_.jpg",
"large": "https://m.media-amazon.com/images/I/51pGtRLfaZL._AC_.jpg",
"main": {...},
"variant": "MAIN",
"lowRes": None,
"shoppableScene": None},
{"hiRes": "https://m.media-amazon.com/images/I/51c43obovcL._AC_SL1000_.jpg",
"thumb": "https://m.media-amazon.com/images/I/415--n36L8L._AC_US40_.jpg",
"large": "https://m.media-amazon.com/images/I/415--n36L8L._AC_.jpg",
"main": {...},
"variant": "PT01",
"lowRes": None,
"shoppableScene": None}],
"variant_data": ["{`B074PXZ5GC`:[`9.7 inches`,`Wi-Fi`,`Silver`],`B00TJGN4NG`:[`16GB`,`Wi-Fi`,`White`],`B07F93611L`:[`5 Pack`,`Wi-Fi`,`Space grey`],`B074PWW6NS`:[`Refurbished`,`Wi-Fi`,`Black`],`B0725LCLYQ`:[`9.7`,`Wi-Fi`,`Space Gray`],`B07D3DDJ4L`:[`32GB`,`Wi-Fi`,`Space Gray`],`B07G9N7J3S`:[`32GB`,`Wi-Fi`,`Gold`]}"]}
You can easily expand this to scrape other data like delivery times, product specs, etc.
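In the scraper above, variant_data is a list of raw strings captured by a regular expression. As a minimal sketch, and assuming each captured string is valid JSON that maps an ASIN to a list of variant values (which may not hold for every product page), the strings can be parsed into a single dictionary. The parse_variant_data helper and the sample string are illustrative only.

```python
import json


def parse_variant_data(variant_data):
    """Merge regex-captured JSON strings into one {asin: [values]} dict."""
    variants = {}
    for raw in variant_data:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip strings that are not valid JSON
        if isinstance(parsed, dict):
            variants.update(parsed)
    return variants


# Illustrative sample, shaped like the variant data in the example output.
sample = ['{"B074PXZ5GC": ["9.7 inches", "Wi-Fi", "Silver"], "B07D3DDJ4L": ["32GB", "Wi-Fi", "Space Gray"]}']
variants = parse_variant_data(sample)
print(list(variants.keys()))
```

The ASIN keys of the resulting dictionary can then be fed back into the list of product pages to scrape, one per variant.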
How To Scrape Product Reviews From Amazon Review Pages
Another popular type of data to scrape from Amazon is product reviews.
To request a product's reviews page, you just need the product's ASIN code and the following URL format:
'https://www.amazon.com/product-reviews/B09G9FPHY6/'
The following code scrapes the product reviews for the target product.
import requests
from parsel import Selector
from urllib.parse import urljoin

reviews = []
product_review_url_list = ['https://www.amazon.com/product-reviews/B09G9FPHY6/']

for product_review_url in product_review_url_list:
    try:
        response = requests.get(product_review_url)
        if response.status_code == 200:
            sel = Selector(text=response.text)

            ## Parse Product Reviews
            review_elements = sel.css("#cm_cr-review_list div.review")
            for review_element in review_elements:
                reviews.append({
                    "text": "".join(review_element.css("span[data-hook=review-body] ::text").getall()).strip(),
                    "title": review_element.css("*[data-hook=review-title]>span::text").get(),
                    "location_and_date": review_element.css("span[data-hook=review-date] ::text").get(),
                    "verified": bool(review_element.css("span[data-hook=avp-badge] ::text").get()),
                    "rating": review_element.css("*[data-hook*=review-star-rating] ::text").re(r"(\d+\.*\d*) out")[0],
                })
    except Exception as e:
        print("Error", e)
The output of this code will look like this:
[{"text": "Ok..little old lady here, whose working life consisted of nothing but years and years of Windows, android phones, etc. Just in last several years jumped hesitantly into Apple (phone, Ipad mini, etc.)LOVE LOVE LOVE my iPad mini but, thought..might be time to think about replacement..so, I saw the great price on this 10 inch tablet and thought Id take a chance. I am much more partial to the mini sized tablets, but thought Id go for it...soooo, even after reading all the bu.......t comments here, thought Id try, if i didnt like it., Id return it. 1. Delivered on time, yayyy! 2. Package well protected, sealed, unblemished...perfect condition (and yeah..no fingerprints on screen) 3. Ipad fired right up...70% charged 4. Ipad immediately began transferring info from iPhone that was sitting nearby. Yayyyyy!! No need for reams of books, booklets, warnings, etc., etc.!! 5. EVERYTHING transferred from iPhone and IPad Mini...and I still had some 15 gig storage left on new 64 gig iPad (just remember ...this is for my entertainment...not for work with diagrams, idiotic work related emails about cleaning up my workspace, or 20 specs for items no one will ever use) 6. Did a test run...everything worked exactly as I required, expected. 7. Ultimate test...watched old Morse/Poirot shows I have in Prime..excellent quality! love love love 8. After 8 full hours...I had to recharge for a bit before I went to bed. (charged fairly fast!)sooooooo...Im keeping this jewel!!!!!Risk is there...evidently, if you believe the nutso crowd and their comments here. Its a GREAT item, its a fabulous deal, Christmas is coming...or if you need to have a worthy backup..,...DO IT!!!!",
"title": "EXCELLENT buy!",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 10, 2022",
"verified": true,
"rating": "5.0"},
{"text": "If you’re anything like me you want something to watch shows on in you living room or bed, but you don’t like the TV, and your phone is too small. Well this is the perfect thing for you, the screen is just the right size and very crisp and clear(maybe better then my iPhone X), the responsiveness is excellent, and all of the streaming sites work with this perfectly. On top of that, my AirPods automatically switch between this and my phone, so I don’t have to worry about messing with the settings every time. However, the camera is only OK. And it feels very delicate, so I would pick up a case and get AppleCare+. The battery isn’t the best either, but should be enough to get through the day. Overall I definitely recommend this, especially for the price.",
"title": "Perfect",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 13, 2022",
"verified": true,
"rating": "5.0"},
{"text": "My old IPad was acting up, wouldn’t hold a charge etc. This iPad arrived the very next day after I ordered it. What a great surprise. The one corner of the outer box it arrived in was damaged, but the inner box containing the iPad was in perfect condition. It was so simple to transfer everything from my old iPad to this one, just laid the new one on the old (iPad 2019) and it did pretty much everything on its own. I am very pleased with my purchase, I hope it lasts longer than my 2019 model.",
"title": "Great purchase",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 15, 2022",
"verified": true,
"rating": "5.0"},
{"text": "Im not much of an apple product person but I do buy them for people I dont want to provide tech support to. (Parents, In-laws, Wife, and Kids)I used to use the fire tablets because they were cheap and I thought that would keep the kids entertained, especially on road trips. This worked for movies and some games but there were always problems with how slow they become with updates, loss of battery life, etc.This ipad was a game changer. I always knew they were the best tablets but I was also a bit in denial as well as just being somewhat anti-apple. With this on sale during prime day 2022 (July) I took a chance and bought one for the kids.This does everything as well or better (usually better) than previous tablets I had purchased because they were cheaper.I also didnt buy a case for it and my kids are brutal with these types of devices. To date, it is still in one piece, operational, and has no cracks in the screen.Sometimes it is worth paying a bit more for the name brand product and in this case Im a believer.",
"title": "Kids love it",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 2, 2022",
"verified": true,
"rating": "5.0"},
{"text": "For those who wonder, this is brand new in the box, 2021 9th generation. It is NOT refurbed or an exchange. It is never opened and shrink wrapped by Apple. (See my photos.) The reason it is so much cheaper than the other 2021 iPads is the 64gb storage. But with iCloud so ridiculously cheap for cloud storage, I just cannot see this 64gb as not getting the job done. I myself was curious about this low price buying me a refurb/exchange, but that is simply not the case here. I do, however, recommend you not go with 32gb. I believe even with an iCloud account, you will be sorry you didn’t go 64gb.And the ease of setting this up cannot be understated. I simply sat my iPhone 13 Pro Max next to it and all relevant files and Wi-Fi passwords were transferred over with no input from me. It looked to me that it will do that with Android and most laptops also, though I did not test that out. All photos also came over, and the ones I took after that transfer, I simply Air-Dropped them into this iPad. All in all, this is as simple as it gets for transferring files and photos. Apple has this stuff down to a science, believe me.This screen is incredible. If you are looking at a pre-Retina screen, you will be amazed at this 2021 version. This thing is very fast, the on screen keyboard is fast, accurate and very concise. Dealing with apps is easy, and Apple doesn’t load you down with bloat you’ll never use. It is claimed this has about 12 hours on a charge; what I’ve seen thus far leads me to believe that is accurate.All in all, I am extremely pleased with this purchase. You can’t always say you got what you paid for. But I can definitely say that with this. This is the entry level 2021 9th generation iPad, and it is exactly what I need. Go and get you one…",
"title": "Incredible deal on incredible machine",
"location_and_date": "Reviewed in the United States 🇺🇸 on September 21, 2022",
"verified": true,
"rating": "5.0"},
{"text": "I bought this for my husband. He loves it! It is the gift that really does keep on giving. It arrived quickly, well packaged and I didn’t have to leave my house to get it. It was great to use my iPad to purchase this one as a gift and have it arrive safe.y. Thank you, Amazon!",
"title": "Best gift 🎁",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 15, 2022",
"verified": true,
"rating": "5.0"},
{"text": "I have had an ipad air since they came out. I used hotel points to get it and its served me well as a book and simple internet use. Recently I noticed that it was no updating and some of my favorite apps were telling me they were using an old version because my IOS was outdated. Without being able to update it I decided to pass my old one on and get a new one. Then I thought Id get a mini 6 but after comparing the prices and the ability I could not justify a double price for it. I ordered this Ipad 9 and it came quicker than expected. Out of the box it performs much better than my old one, screen appears clearer and I like the new IOS it uses. My old one will live on as a small tv for my wife when shes in the kitchen and for that it does very well. I have no complaints about my new one. Its easy to talk yourself into the top of the curve, but sometimes being a bit behind it makes better fiscal sense",
"title": "My old Ipad was too old to update, so it was passed down,",
"location_and_date": "Reviewed in the United States 🇺🇸 on September 25, 2022",
"verified": true,
"rating": "5.0"},
{"text": "I have always been an android user. I finally dipped my toe into Apple. There is a learning curve, I do not speak Apple. Thankfully I have grandchildren and they have taught me a lot. Dont snooze on this one, I love it, fast, images clearer, pics, videos, pen, everything about this one is great. I now get the Apple craze.",
"title": "Perfect size and performance",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 13, 2022",
"verified": true,
"rating": "5.0"},
{"text": "El iPad es una tableta muy fácil de usar y muy práctica puedes hacer casi todo lo que necesitas en el día a día, oficina, escuela, entretenimiento, productividad, y con 256gb tengo para almacenar mucha información.",
"title": "El iPad es la mejor tableta que existe",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 14, 2022",
"verified": true,
"rating": "5.0"},
{"text": "Thought I was gonna get a knock off for the price but came brand new, no problems what so ever. Amazing battery life I charge it every two days and use it constantly at school and work for studying and job demands.",
"title": "Excellent product",
"location_and_date": "Reviewed in the United States 🇺🇸 on October 14, 2022",
"verified": true,
"rating": "5.0"}]
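The location_and_date field combines the review country and date in one string. Here is a rough sketch of how it could be split, assuming the "Reviewed in ... on Month D, YYYY" format seen in the output above. Reviews in other languages or marketplaces may use a different format, in which case this hypothetical parse_location_and_date helper simply returns (None, None).

```python
import re
from datetime import datetime

# Assumption: strings look like "Reviewed in the United States 🇺🇸 on October 10, 2022".
REVIEW_META_PATTERN = re.compile(r'Reviewed in (?:the )?(.+?) on (\w+ \d{1,2}, \d{4})')


def parse_location_and_date(text):
    """Split a review's location/date string into (country, date)."""
    if not text:
        return None, None
    match = REVIEW_META_PATTERN.search(text)
    if match is None:
        return None, None
    # Drop flag emoji (regional indicator symbols) from the country name.
    country = re.sub('[\U0001F1E6-\U0001F1FF]', '', match.group(1)).strip()
    review_date = datetime.strptime(match.group(2), '%B %d, %Y').date()
    return country, review_date


country, review_date = parse_location_and_date(
    'Reviewed in the United States 🇺🇸 on October 10, 2022'
)
print(country, review_date.isoformat())
```

Note that the month name is parsed with the %B directive, so this sketch expects English month names and an English (or default C) locale.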
Paginating Product Review Pages
The above code works, but it just extracts all the product reviews from a single Amazon reviews page.
However, we can expand the scraper to paginate through all the product review pages and scrape the product reviews from every page by checking if there is another page.
import requests
from parsel import Selector
from urllib.parse import urljoin

reviews = []
product_review_url_list = ['https://www.amazon.com/product-reviews/B09G9FPHY6/']

for product_review_url in product_review_url_list:
    try:
        response = requests.get(product_review_url)
        if response.status_code == 200:
            sel = Selector(text=response.text)

            ## Get Next Page Url
            next_page_relative_url = sel.css(".a-pagination .a-last>a::attr(href)").get()
            if next_page_relative_url is not None:
                next_page = urljoin('https://www.amazon.com/', next_page_relative_url)
                product_review_url_list.append(next_page)

            ## Parse Product Reviews
            review_elements = sel.css("#cm_cr-review_list div.review")
            for review_element in review_elements:
                reviews.append({
                    "text": "".join(review_element.css("span[data-hook=review-body] ::text").getall()).strip(),
                    "title": review_element.css("*[data-hook=review-title]>span::text").get(),
                    "location_and_date": review_element.css("span[data-hook=review-date] ::text").get(),
                    "verified": bool(review_element.css("span[data-hook=avp-badge] ::text").get()),
                    "rating": review_element.css("*[data-hook*=review-star-rating] ::text").re(r"(\d+\.*\d*) out")[0],
                })
    except Exception as e:
        print("Error", e)
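The scraper above paginates by appending to the list it is iterating over. A more explicit variant of the same idea, sketched here under stated assumptions, uses a queue with a set of visited URLs and a page cap so the crawl cannot loop forever. The crawl_pages function, its get_next_href callback and the max_pages limit are all hypothetical, and the example hrefs are made up so the sketch runs without contacting Amazon.

```python
from collections import deque
from urllib.parse import urljoin


def crawl_pages(start_url, get_next_href, max_pages=50):
    """Follow "next page" links from start_url, visiting each URL once."""
    queue = deque([start_url])
    seen = set()
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue  # never request the same page twice
        seen.add(url)
        visited.append(url)
        next_href = get_next_href(url)  # relative URL of the next page, or None
        if next_href:
            queue.append(urljoin('https://www.amazon.com/', next_href))
    return visited


# Made-up links standing in for the "next page" href scraped from each page.
fake_next_links = {
    'https://www.amazon.com/reviews-page-1': '/reviews-page-2',
    'https://www.amazon.com/reviews-page-2': '/reviews-page-3',
    'https://www.amazon.com/reviews-page-3': None,
}
pages = crawl_pages('https://www.amazon.com/reviews-page-1', fake_next_links.get)
print(pages)
```

In a real scraper, get_next_href would request the page, parse its reviews, and return the value of the ".a-pagination .a-last>a" href used above.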
Bypassing Amazon's Anti-Bot Protection
As you might have seen already, if you run this code Amazon might block you and return an error page.
This is because Amazon uses anti-bot protection to try to prevent developers from scraping its site (or at least make it harder).
Amazon outright blocks any requests that identify themselves as coming from an HTTP client, like Python Requests in this case.
For example, when you make a request with Python Requests it sends the following user-agent with the request (this is why the above code examples won't work).
'User-Agent': 'python-requests/2.26.0',
This user agent clearly identifies your requests as being made by the Python Requests library, so Amazon can easily block you from scraping the site.
Even if you do use a fake User-Agent, Amazon analyses the other headers to check if they match the typical headers a browser would send.
That is why you should be sending full browser headers when scraping Amazon. And if you are scraping at scale, you will need to spread your requests over hundreds if not thousands of proxies to hide your scraper's identity.
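As a minimal sketch of what "full browser headers" could look like, the helper below builds a header set with a randomly chosen user agent. The header values are illustrative examples of what a desktop Chrome browser typically sends, not a list known to get past Amazon's checks, and build_browser_headers is a hypothetical name.

```python
import random

# Illustrative values only: real browsers send version-specific strings.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]


def build_browser_headers():
    """Return browser-like request headers with a randomly chosen user agent."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate',
        'Upgrade-Insecure-Requests': '1',
    }


headers = build_browser_headers()
print(headers['User-Agent'])

# With Python Requests, the headers would be passed like this:
# response = requests.get(url, headers=headers)
```

The headers in a set should stay consistent with the chosen user agent, since a mismatch between them is exactly what header analysis looks for.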
We have written about how to solve these challenges here:
- Guide to Web Scraping Without Getting Blocked
- Web Scraping Guide: Headers & User-Agents Optimization Checklist
- Python Fake User-Agents: How to Manage User Agents When Scraping
- Python Requests: How to Use & Rotate Proxies
However, if you don't want to implement all this anti-bot bypassing logic yourself, the easier option is to use a smart proxy solution like ScrapeOps Proxy Aggregator.
The ScrapeOps Proxy Aggregator is a smart proxy that handles everything for you:
- Proxy rotation & selection
- Rotating user-agents & browser headers
- Ban detection & CAPTCHA bypassing
- Country IP geotargeting
- Javascript rendering with headless browsers
To use the ScrapeOps Proxy Aggregator, we just need to send the URL we want to scrape to the Proxy API instead of making the request directly ourselves. We can do this with a simple wrapper function:
import requests
from urllib.parse import urlencode

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def scrapeops_url(url):
    payload = {'api_key': SCRAPEOPS_API_KEY, 'url': url, 'country': 'us'}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

amazon_url = 'https://www.amazon.com/s?k=iPads&page=1'

## Send URL To ScrapeOps Instead of Amazon
response = requests.get(scrapeops_url(amazon_url))
You can get an API key with 1,000 free API credits by signing up here.
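To make it clearer what the wrapper function produces, here is a small sketch that builds the proxy URL and decodes it again without sending any request. It uses the same endpoint and the same api_key, url and country parameters as the wrapper in this guide. Passing api_key as a function argument is a change made only to keep the example self-contained.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def scrapeops_url(url, api_key='YOUR_API_KEY'):
    """Wrap a target URL in a ScrapeOps proxy URL."""
    payload = {'api_key': api_key, 'url': url, 'country': 'us'}
    return 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)


amazon_url = 'https://www.amazon.com/s?k=iPads&page=1'
proxy_url = scrapeops_url(amazon_url)
print(proxy_url)

# Decode the query string again to confirm the target URL survives intact.
params = parse_qs(urlsplit(proxy_url).query)
print(params['url'][0])
```

The key detail is that urlencode percent-encodes the whole Amazon URL, so its own k and page parameters are not mistaken for parameters of the proxy request.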
Here is our updated Amazon Search Crawler to use the ScrapeOps Proxy:
import requests
from parsel import Selector
from urllib.parse import urlencode, urljoin

API_KEY = 'YOUR_API_KEY'

def scrapeops_url(url):
    payload = {'api_key': API_KEY, 'url': url, 'country': 'us'}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

keyword_list = ['ipad']
product_overview_data = []

for keyword in keyword_list:
    url_list = [f'https://www.amazon.com/s?k={keyword}&page=1']
    for url in url_list:
        try:
            response = requests.get(scrapeops_url(url))
            if response.status_code == 200:
                sel = Selector(text=response.text)

                ## Extract Product Data From Search Page
                search_products = sel.css("div.s-result-item[data-component-type=s-search-result]")
                for product in search_products:
                    relative_url = product.css("h2>a::attr(href)").get()
                    asin = relative_url.split('/')[3] if len(relative_url.split('/')) >= 4 else None
                    product_url = urljoin('https://www.amazon.com/', relative_url).split("?")[0]
                    product_overview_data.append(
                        {
                            "keyword": keyword,
                            "asin": asin,
                            "url": product_url,
                            "ad": True if "/slredirect/" in product_url else False,
                            "title": product.css("h2>a>span::text").get(),
                            "price": product.css(".a-price[data-a-size=xl] .a-offscreen::text").get(),
                            "real_price": product.css(".a-price[data-a-size=b] .a-offscreen::text").get(),
                            "rating": (product.css("span[aria-label~=stars]::attr(aria-label)").re(r"(\d+\.*\d*) out") or [None])[0],
                            "rating_count": product.css("span[aria-label~=stars] + span::attr(aria-label)").get(),
                            ## Relative XPath (.//) so each product gets its own thumbnail
                            "thumbnail_url": product.xpath(".//img[has-class('s-image')]/@src").get(),
                        }
                    )

                ## Get All Pages (only from page 1; endswith avoids matching page=10, 11, ...)
                if url.endswith("&page=1"):
                    available_pages = sel.xpath(
                        '//a[has-class("s-pagination-item")][not(has-class("s-pagination-separator"))]/text()'
                    ).getall()
                    for page in available_pages:
                        search_url_paginated = f'https://www.amazon.com/s?k={keyword}&page={page}'
                        url_list.append(search_url_paginated)
        except Exception as e:
            print("Error", e)

print(product_overview_data)
Now when we make requests with our scraper, Amazon won't block them.
More Web Scraping Guides
In this edition of our "How To Scrape X" series, we went through how you can scrape Amazon.com including how to bypass its anti-bot protection.
If you would like to learn how to scrape other popular websites, then check out our other How To Scrape guides.
Or if you would like to learn more about web scraping in general, then be sure to check out The Web Scraping Playbook.