
Introduction

Welcome to the ScrapeOps Documentation pages. Here you will find information on how to integrate and use our three products.


💻 Demo

🔗 ScrapeOps Dashboard Demo


📊 ScrapeOps Monitoring

ScrapeOps Monitoring is a monitoring tool purpose-built for web scraping. With a 30-second install of one of our SDKs, your scraper's performance & error stats are automatically aggregated and shipped to your ScrapeOps dashboard.

Features & Functionality

ScrapeOps Monitoring gives you the following features & functionality:

  • Scrapy Job Stats & Visualisation

    • 📈 Individual Job Progress Stats
    • 📊 Compare Jobs versus Historical Jobs
    • 💯 Job Stats Tracked
      • Pages Scraped & Missed
      • Items Parsed & Missed
      • Item Field Coverage
      • Runtimes
      • Response Status Codes
      • Success Rates & Average Latencies
      • Errors & Warnings
      • Bandwidth
  • Health Checks & Alerts

    • 🔍 Custom Spider & Job Health Checks
    • 📦 Out of the Box Alerts - Slack (More coming soon!)
    • 📑 Daily Scraping Reports
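To make a few of the tracked stats concrete, here is a minimal Python sketch of how a success rate, an average latency and item field coverage could be computed for one job's responses and scraped items. The definitions are assumptions for illustration (for example, counting only 2xx responses as successful); the SDK's exact formulas may differ, and the `Response` class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    """Hypothetical record of one downloaded page."""
    status: int
    latency_ms: float


def success_rate(responses):
    """Fraction of responses with a 2xx status (an assumed definition of success)."""
    if not responses:
        return 0.0
    ok = sum(1 for r in responses if 200 <= r.status < 300)
    return ok / len(responses)


def average_latency(responses):
    """Mean latency in milliseconds, or 0.0 for an empty batch."""
    return mean(r.latency_ms for r in responses) if responses else 0.0


def field_coverage(items, fields):
    """Share of items in which each field is present and not None or ''."""
    if not items:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for item in items if item.get(f) not in (None, "")) / len(items)
        for f in fields
    }


# Hypothetical sample data for one job.
responses = [Response(200, 120.0), Response(200, 80.0), Response(503, 400.0), Response(404, 60.0)]
items = [{"name": "A", "price": "1.99"}, {"name": "B", "price": ""}, {"name": "C"}]

print(success_rate(responses))     # 0.5
print(average_latency(responses))  # 165.0
print(field_coverage(items, ["name", "price"]))
```

Here every item has a `name`, but only one of the three has a non-empty `price`, so the `price` field coverage comes out at one third.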

🚀 Getting Started

To use ScrapeOps, you first need to create a free account and get your free API key.

ScrapeOps currently integrates with both Python Requests & Python Scrapy scrapers:

  1. Python Requests Integration
  2. Python Scrapy Integration

More ScrapeOps Monitoring integrations are on the way.


ScrapeOps Server Manager & Scheduler

ScrapeOps Server Manager & Job Scheduler is an easy-to-use server integration that enables you to deploy, manage and schedule your scrapers from the ScrapeOps dashboard.

There are two options to integrate ScrapeOps with your servers:

  1. Via SSH (Recommended)
  2. Via Scrapyd Server HTTP Endpoints (Only Applicable to Python Scrapy)

Features & Functionality

ScrapeOps Server Manager & Job Scheduler gives you the following features & functionality:

  • SSH Server Management
    • 🔗 Integrate With Any SSH-Capable Server
    • 🕷 Deploy Scrapers Directly From GitHub to Your Servers
    • Schedule Periodic Jobs
  • Scrapyd Cluster Management
    • 🔗 Integrate With Scrapyd Servers
    • Schedule Periodic Jobs
    • 💯 All Scrapyd JSON API Endpoints Supported
    • 🔐 Secure Your Scrapyd Servers With BasicAuth, HTTPS or Whitelisted IPs
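With the Scrapyd option, scheduling a job comes down to an HTTP call against Scrapyd's JSON API, whose `schedule.json` endpoint takes `project` and `spider` form parameters. The Python sketch below only builds such a request with the standard library, adding a Basic Auth header for a Scrapyd server secured that way. The host, credentials, project and spider names are hypothetical, and the sketch illustrates the Scrapyd API itself, not how ScrapeOps issues these calls internally.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, username=None, password=None):
    """Build (but do not send) a POST to Scrapyd's schedule.json endpoint."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    req = Request(f"{base_url.rstrip('/')}/schedule.json", data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Basic Auth: base64-encode "username:password" into the Authorization header.
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical server and credentials.
req = build_schedule_request("http://scrapyd.example.com:6800", "myproject", "myspider", "user", "pass")
print(req.full_url)  # http://scrapyd.example.com:6800/schedule.json
```

Passing the returned object to `urllib.request.urlopen` would send it; on success Scrapyd replies with a JSON body containing `"status": "ok"` and the new `jobid`.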

To learn how to integrate ScrapeOps with your servers, follow this guide.


💻 ScrapeOps Proxy Aggregator

ScrapeOps Proxy Aggregator is an easy-to-use proxy that gives you access to the best-performing proxies via a single endpoint. We take care of finding the best proxies, so you can focus on the data.

To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.

🚀 Getting Started

To make a request, send the URL you want to scrape to the ScrapeOps Proxy endpoint https://proxy.scrapeops.io/v1/, adding your API key and the target URL as the api_key and url query parameters:


curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything"
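The same request can be built in Python. Because the target URL is itself passed as a query parameter, it should be URL-encoded; otherwise any `&` in the target URL would be read as a separate proxy parameter. This minimal sketch uses only the standard library, and the `proxy_url` helper name is hypothetical.

```python
from urllib.parse import urlencode

PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def proxy_url(api_key, target_url):
    """Build a ScrapeOps Proxy request URL with api_key and url query-encoded."""
    return PROXY_ENDPOINT + "?" + urlencode({"api_key": api_key, "url": target_url})


print(proxy_url("YOUR_API_KEY", "http://httpbin.org/anything"))
# https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http%3A%2F%2Fhttpbin.org%2Fanything
```

To send the request, pass the result to `urllib.request.urlopen`, or with the Requests library call `requests.get(PROXY_ENDPOINT, params={"api_key": ..., "url": ...})`, which applies the same encoding to the parameters.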

After receiving a response from one of our proxy providers, the ScrapeOps Proxy API Aggregator responds with the raw HTML content of the target URL along with a response status code:


<html>
<head>
...
</head>
<body>
...
</body>
</html>

With the ScrapeOps Proxy API Aggregator you are only charged for successful requests (200 and 404 status codes).
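The billing rule above reduces to a simple status-code check. This Python sketch counts how many requests in a batch would be charged, assuming the 200/404 rule is the only criterion; the function name and sample status codes are hypothetical, and per-request credit costs are not modelled.

```python
# Status codes that are charged, per the rule above.
BILLABLE_STATUS_CODES = {200, 404}


def count_billable(status_codes):
    """Number of responses whose status code would be charged."""
    return sum(1 for code in status_codes if code in BILLABLE_STATUS_CODES)


statuses = [200, 200, 404, 500, 403, 200]  # hypothetical batch of proxy responses
print(count_billable(statuses))  # 4
```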

To learn how to use the ScrapeOps Proxy Aggregator and customise it to your requirements, check out the QuickStart Guide.