Scrapy SDK Customisation Options

Once you have integrated the ScrapeOps SDK with your Scrapy spiders using the Scrapy Integration Guide, the SDK will automatically log all your scraping stats and display them on your dashboard.
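
For reference, a minimal integration typically looks like the following in your project's settings.py. The exact extension and middleware paths are covered in the Scrapy Integration Guide; treat the values below as indicative:

## settings.py

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

EXTENSIONS = {
    'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
}

DOWNLOADER_MIDDLEWARES = {
    ## ScrapeOps retry middleware replaces Scrapy's built-in one
    ## so retries are included in your logged stats
    'scrapeops_scrapy.middleware.retry.RetryMiddleware': 550,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}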

However, the Scrapy SDK also includes extra functionality that lets you customise how it logs your Scrapy spiders.


Setting a Job Name

With ScrapeOps you can give your jobs unique Job Names so that their stats can be grouped together on the dashboard and have health checks applied to them.

There are two ways of doing this:

  1. ScrapeOps Server Management - If you are using ScrapeOps to schedule/run jobs on your VM/Scrapyd servers you can give each job a unique name in the dashboard when you schedule it.
  2. In Your Spider - You can define the job name in your spider using the spider argument sops_job_name.

The following is an example of how to give a job a unique Job Name within your spider.

import scrapy

from myproject.items import QuoteItem  ## hypothetical import path; adjust to your project's items module


class DemoSpider(scrapy.Spider):
    name = 'demo_spider'
    sops_job_name = "MY_JOB_NAME" ## add job name here

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            quote_item = QuoteItem()  ## create a fresh item per quote
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            yield quote_item
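
Because sops_job_name is an ordinary spider argument, you can also set it per run from the command line using Scrapy's standard -a option instead of hardcoding it in the class (this assumes, as noted above, that the SDK reads the attribute when the spider starts):

scrapy crawl demo_spider -a sops_job_name=MY_JOB_NAME

Scrapy's default Spider.__init__ sets any -a arguments as attributes on the spider instance, so no extra code is needed to pick the value up.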