Introduction
Welcome to the ScrapeOps documentation. Here you will find information on how to integrate and use our three products.
💻 Demo
📊 ScrapeOps Monitoring
ScrapeOps Monitoring is a monitoring tool purpose-built for web scraping. With a simple 30-second install of one of our SDKs, your scraper's performance and error stats are automatically aggregated and shipped to your ScrapeOps dashboard.
⭐ Features & Functionality
ScrapeOps Monitoring gives you the following features & functionality:
- Scrapy Job Stats & Visualisation
- 📈 Individual Job Progress Stats
- 📊 Compare Jobs versus Historical Jobs
- 💯 Job Stats Tracked
- ✅ Pages Scraped & Missed
- ✅ Items Parsed & Missed
- ✅ Item Field Coverage
- ✅ Runtimes
- ✅ Response Status Codes
- ✅ Success Rates & Average Latencies
- ✅ Errors & Warnings
- ✅ Bandwidth
- Health Checks & Alerts
- 🔍 Custom Spider & Job Health Checks
- 📦 Out of the Box Alerts - Slack (More coming soon!)
- 📑 Daily Scraping Reports
🚀 Getting Started
To use ScrapeOps you first need to create a free account and get your free API_KEY.
Currently, ScrapeOps Monitoring integrates with both Python Requests and Python Scrapy scrapers.
More ScrapeOps Monitoring integrations are on the way.
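As a rough illustration of what the Scrapy integration involves, the fragment below sketches the kind of `settings.py` changes the Scrapy SDK install asks for. The setting name `SCRAPEOPS_API_KEY` and the extension and middleware paths are not given on this page; they are recalled from the SDK's install guide and should be treated as assumptions to verify against that guide before use.

```python
# settings.py (sketch) -- names below are assumptions; confirm them in the
# ScrapeOps Scrapy SDK install guide before relying on them.

# Your free API key from the ScrapeOps dashboard (placeholder value).
SCRAPEOPS_API_KEY = "YOUR_API_KEY"

# Enable the monitoring extension that collects and ships job stats.
EXTENSIONS = {
    "scrapeops_scrapy.extension.ScrapeOpsMonitor": 500,
}

# Swap Scrapy's default retry middleware for the SDK's version so that
# retried requests can be tracked in the job stats.
DOWNLOADER_MIDDLEWARES = {
    "scrapeops_scrapy.middleware.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
}
```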
⏰ ScrapeOps Server Manager & Scheduler
ScrapeOps Server Manager & Job Scheduler is an easy-to-use server integration that enables you to deploy, manage and schedule your scrapers from the ScrapeOps dashboard.
There are two options to integrate ScrapeOps with your servers:
- Via SSH (Recommended)
- Via Scrapyd Server HTTP Endpoints (Only Applicable to Python Scrapy)
⭐ Features & Functionality
ScrapeOps Server Manager & Job Scheduler gives you the following features & functionality:
- SSH Server Management
- 🔗 Integrate With Any SSH-Capable Server
- 🕷 Deploy scrapers directly from GitHub to your servers.
- ⏰ Schedule Periodic Jobs
- Scrapyd Cluster Management
- 🔗 Integrate With Scrapyd Servers
- ⏰ Schedule Periodic Jobs
- 💯 All Scrapyd JSON API Supported
- 🔐 Secure Your Scrapyd with BasicAuth, HTTPS or Whitelisted IPs
To learn how to integrate ScrapeOps with your servers, check out this guide.
💻 ScrapeOps Proxy Aggregator
ScrapeOps Proxy Aggregator is an easy to use proxy that gives you access to the best performing proxies via a single endpoint. We take care of finding the best proxies, so you can focus on the data.
To use the ScrapeOps proxy, you first need an API key which you can get by signing up for a free account here.
🚀 Getting Started
To make requests, you need to send the URL you want to scrape to the ScrapeOps Proxy endpoint https://proxy.scrapeops.io/v1/ by adding your API key and URL to the request using the api_key and url query parameters:
curl -k "https://proxy.scrapeops.io/v1/?api_key=YOUR_API_KEY&url=http://httpbin.org/anything"
After receiving a response from one of our proxy providers, the ScrapeOps Proxy API Aggregator responds with the raw HTML content of the target URL along with a response code:
<html>
<head>
...
</head>
<body>
...
</body>
</html>
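The same request can be made from Python. The sketch below uses only the standard library; the helper names `build_proxy_url` and `fetch_via_proxy` and the 120-second timeout are illustrative assumptions, not part of any ScrapeOps SDK. Unlike the curl example, it URL-encodes the target URL, which is the safer choice when the target has its own query string.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as documented above.
PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_proxy_url(api_key: str, target_url: str) -> str:
    """Return the proxy URL with api_key and url set as query parameters."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{PROXY_ENDPOINT}?{query}"


def fetch_via_proxy(api_key: str, target_url: str, timeout: float = 120.0):
    """Fetch target_url through the proxy and return (status_code, body_text).

    The timeout value is an arbitrary assumption for this sketch.
    """
    url = build_proxy_url(api_key, target_url)
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; surface the status and body instead.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `fetch_via_proxy("YOUR_API_KEY", "http://httpbin.org/anything")` with a real key would return the status code together with the raw HTML of the target page.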
With the ScrapeOps Proxy API Aggregator, you are only charged for successful requests (200 and 404 status codes).
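To estimate how many requests in a crawl count towards your usage, you could tally the responses whose status code is 200 or 404. The sketch below is only an illustration of that rule; the names `BILLABLE_STATUS_CODES` and `count_billable` are hypothetical, and your dashboard remains the authoritative record of usage.

```python
# Status codes treated as successful (and therefore charged), per the rule above.
BILLABLE_STATUS_CODES = frozenset({200, 404})


def count_billable(status_codes) -> int:
    """Count how many of the given response status codes would be charged."""
    return sum(1 for code in status_codes if code in BILLABLE_STATUS_CODES)
```

For example, `count_billable([200, 404, 500, 429, 200])` counts three charged requests, since the 500 and 429 responses are not successful.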
To learn how to use the ScrapeOps Proxy Aggregator and customise it to your requirements, check out the QuickStart Guide.