Scrapy SDK Customisation Options
Once you have integrated the ScrapeOps SDK with your Scrapy spiders using the Scrapy Integration Guide, the SDK will automatically log all your scraping stats and display them on your dashboard.
However, the Scrapy SDK also includes options that let you customise how it logs your Scrapy spiders.
Setting a Job Name
With ScrapeOps you can give your jobs unique Job Names so that their stats can be grouped together on the dashboard and have health checks applied to them.
There are two ways of doing this:
- ScrapeOps Server Management - If you are using ScrapeOps to schedule/run jobs on your VM/Scrapyd servers, you can give each job a unique name in the dashboard when you schedule it.
- In Your Spider - You can define the job name in your spider using the spider argument `sops_job_name`.
The following is an example of how to give a job a unique Job Name within your spider.
import scrapy

from ..items import QuoteItem  ## assumes QuoteItem is defined in your project's items.py


class DemoSpider(scrapy.Spider):
    name = 'demo_spider'
    sops_job_name = "MY_JOB_NAME"  ## add job name here

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            quote_item = QuoteItem()  ## create a fresh item for each quote
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            yield quote_item
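
Because Scrapy passes `-a` command-line arguments through to the spider as instance attributes, you may also be able to set the job name at run time (e.g. `scrapy crawl demo_spider -a sops_job_name=MY_JOB_NAME`) instead of hardcoding it. Alternatively, you can assign the name dynamically in the spider's `__init__`. The following is a minimal sketch, assuming the SDK reads the `sops_job_name` attribute from the spider instance when the job starts; the date-stamped name is purely illustrative.

import datetime

import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo_spider'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        ## Illustrative dynamic job name; assumes the ScrapeOps SDK reads
        ## this attribute off the spider instance at start-up.
        self.sops_job_name = f"demo_job_{datetime.date.today():%Y_%m_%d}"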