Python Requests Integration

The ScrapeOps Python Requests SDK is an extension for your scrapers that gives you all the scraping monitoring, statistics, alerting, and data validation you will need straight out of the box.

To start using it, you just need to initialize the ScrapeOpsRequests logger in your scraper and use the ScrapeOps RequestsWrapper instead of the normal Python Requests library.

The ScrapeOps RequestsWrapper is just a wrapper around the standard Python Requests library, so all functionality (HTTP requests, Sessions, HTTPAdapter, etc.) works as normal and returns the standard Requests response object.

Once integrated, the ScrapeOpsRequests logger will automatically monitor your scrapers and send your logs to your scraping dashboard.
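Because the wrapper is a thin pass-through, the drop-in pattern itself is easy to picture. Below is a minimal, hypothetical sketch of that pattern (not the actual ScrapeOps implementation): a wrapper that forwards attribute lookups to the wrapped library while recording each call, demonstrated here with the stdlib math module as a stand-in for Requests.

```python
import math  # stand-in for the wrapped library


class LoggingWrapper:
    """Illustrative drop-in wrapper: delegates everything to the
    wrapped module, so existing code keeps working unchanged."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.calls = []  # record of which functions were used

    def __getattr__(self, name):
        # Only reached for attributes not on the wrapper itself,
        # so all of the wrapped library's API passes through.
        attr = getattr(self._wrapped, name)
        if callable(attr):
            def logged(*args, **kwargs):
                self.calls.append(name)
                return attr(*args, **kwargs)
            return logged
        return attr


math_logged = LoggingWrapper(math)
print(math_logged.sqrt(16.0))  # behaves exactly like math.sqrt -> 4.0
print(math_logged.calls)       # the wrapper saw the call -> ['sqrt']
```

The same idea is why `requests = scrapeops_logger.RequestsWrapper()` later in this guide can replace the Requests import without touching the rest of your scraper.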

🚀 Getting Set Up

You can get the ScrapeOps monitoring suite up and running in 4 easy steps.

#1 - Install the ScrapeOps Python Requests SDK:

pip install scrapeops-python-requests

#2 - Import & Initialize the ScrapeOps logger:

Import then initialize the ScrapeOpsRequests logger at the top of your scraper and add your API key.

## myscraper.py

from scrapeops_python_requests.scrapeops_requests import ScrapeOpsRequests

scrapeops_logger = ScrapeOpsRequests(
    scrapeops_api_key='API_KEY_HERE',
    spider_name='SPIDER_NAME_HERE',
    job_name='JOB_NAME_HERE',
)

Here, you need to include your ScrapeOps API Key, which you can get for free here.

You also have the option of giving your scraper a:

  • Spider Name: This should be the name of your scraper and can be reused by multiple jobs scraping different pages on a website. When not defined, it defaults to the filename of your scraper.
  • Job Name: This should be used when the same spider is run for multiple different jobs, so you can compare the stats of similar jobs over time. An example would be a spider scraping an eCommerce store, with multiple jobs using the same scraper to scrape different product categories on the website (e.g. Books, Electronics, Fashion). When not defined, the job name defaults to the spider name.

#3 - Initialize the ScrapeOps Python Requests Wrapper

The last step is to override the standard Python Requests library with the ScrapeOps RequestsWrapper.

Our wrapper uses the standard Python Requests library under the hood, but gives us a way to monitor the requests as they happen.

Please only initialize the requests wrapper once, near the top of your code.


requests = scrapeops_logger.RequestsWrapper()


#4 - Log Scraped Items:

With the ScrapeOpsRequests logger you can also log the data you scrape as items using the item_scraped method.


## Log Scraped Item
scrapeops_logger.item_scraped(
    response=response,
    item={'demo': 'test'}
)

Using item_scraped, the logger records that an item has been scraped and calculates the data coverage, so you can see in your dashboard if your scraper is missing some fields.
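To illustrate what data coverage means here, the sketch below shows one way such a figure could be computed (a hypothetical illustration; the real calculation happens inside the SDK and dashboard): for each field, the fraction of scraped items where that field was actually populated.

```python
def field_coverage(items):
    """Return, per field, the fraction of items where it was populated."""
    fields = {f for item in items for f in item}
    total = len(items)
    return {
        f: sum(1 for item in items if item.get(f) not in (None, '')) / total
        for f in sorted(fields)
    }


items = [
    {'title': 'Book A', 'price': 9.99},
    {'title': 'Book B', 'price': None},  # field present but empty
    {'title': 'Book C'},                 # field missing entirely
]
print(field_coverage(items))  # {'price': 0.333..., 'title': 1.0}
```

A coverage figure well below 1.0 for a field you expect on every page is the signal that your scraper's selectors are missing data.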


Example Scraper:

Here is a simple example so you can see how you can add it to an existing project.


from scrapeops_python_requests.scrapeops_requests import ScrapeOpsRequests


## Initialize the ScrapeOps Logger
scrapeops_logger = ScrapeOpsRequests(
    scrapeops_api_key='API_KEY_HERE',
    spider_name='QuotesSpider',
    job_name='Job1',
)


## Initialize the ScrapeOps Python Requests Wrapper
requests = scrapeops_logger.RequestsWrapper()

urls = [
    'http://quotes.toscrape.com/page/1/',
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/page/3/',
    'http://quotes.toscrape.com/page/4/',
    'http://quotes.toscrape.com/page/5/',
]


for url in urls:
    response = requests.get(url)

    item = {'test': 'hello'}

    ## Log Scraped Item
    scrapeops_logger.item_scraped(
        response=response,
        item=item
    )


Done!

That's all. From here, the ScrapeOps SDK will automatically monitor your scraping jobs, collect their statistics, and display them in your ScrapeOps dashboard.