Introduction

The following is documentation on how to set up and use ScrapeOps with your Scrapy spiders.

💻 Demo

🔗 ScrapeOps Dashboard Demo

โญ Features

  • Scrapy Job Stats & Visualisation

    • ๐Ÿ“ˆ Individual Job Progress Stats
    • ๐Ÿ“Š Compare Jobs versus Historical Jobs
    • ๐Ÿ’ฏ Job Stats Tracked
      • โœ… Pages Scraped & Missed
      • โœ… Items Parsed & Missed
      • โœ… Item Field Coverage
      • โœ… Runtimes
      • โœ… Response Status Codes
      • โœ… Success Rates & Average Latencies
      • โœ… Errors & Warnings
      • โœ… Bandwidth
  • Health Checks & Alerts

    • ๐Ÿ” Custom Spider & Job Health Checks
    • ๐Ÿ“ฆ Out of the Box Alerts - Slack (More coming soon!)
    • ๐Ÿ“‘ Daily Scraping Reports
  • ScrapyD Cluster Management

    • ๐Ÿ”— Integrate With ScrapyD Servers
    • โฐ Schedule Periodic Jobs
    • ๐Ÿ’ฏ All Scrapyd JSON API Supported
    • ๐Ÿ” Secure Your ScrapyD with BasicAuth, HTTPS or Whitelisted IPs

🚀 Getting Started

To use ScrapeOps, you first need to create a free account and get your free API key.

There are two ways you can use ScrapeOps:

  1. Spider Logger Mode
  2. ScrapyD Manager Mode

1) Spider Logger Mode

In this mode, the ScrapeOps SDK will log all your scraping stats, generate statistics and graphs, and trigger alerts on the ScrapeOps dashboard. Getting set up is very easy: you just need to add 3 lines to your Scrapy project's settings.py file and the ScrapeOps SDK will take care of the rest.

Detailed Read: ScrapeOps SDK Installation Guide
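As a rough sketch, the settings.py additions typically look something like the following. This assumes the scrapeops-scrapy SDK package and its ScrapeOpsMonitor extension; the exact setting names and extension path may differ between SDK versions, so follow the installation guide above for the authoritative version.

```python
# settings.py — a minimal sketch of the ScrapeOps SDK integration.
# Assumes the scrapeops-scrapy package is installed (pip install scrapeops-scrapy);
# the extension path below may vary by SDK version.

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'  # the free API key from your ScrapeOps account

EXTENSIONS = {
    # Register the ScrapeOps monitor so job stats are sent to the dashboard.
    # 500 is an ordinary Scrapy extension priority value.
    'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
}
```

Once these settings are in place, running your spiders as usual (e.g. `scrapy crawl myspider`) is enough for stats to start appearing on the dashboard.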


2) ScrapyD Manager Mode

In this mode, you connect ScrapeOps to your ScrapyD server, which lets you schedule and manage your ScrapyD spiders via the ScrapeOps dashboard.

โ— Note: To use the stats, graphs and alerts functionality of ScrapeOps, you need to install the ScrapeOps SDK in your Scrapy spiders.

Read: ScrapeOps ScrapyD Integration Guide
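For context on what the dashboard is driving under the hood: ScrapyD itself exposes a JSON HTTP API, and a job can be scheduled directly against it with a single request. The command below is an illustrative sketch — `myproject` and `myspider` are placeholder names, and your server address and port will differ.

```shell
# Schedule a spider run via ScrapyD's schedule.json endpoint.
# 'myproject' and 'myspider' are placeholders — substitute your own
# deployed project and spider names.
curl http://localhost:6800/schedule.json \
  -d project=myproject \
  -d spider=myspider
```

ScrapyD responds with a JSON object containing the new job's id, which is the same identifier ScrapeOps uses when managing jobs on your behalf.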