AWS Server Integration

Using the ScrapeOps SSH integration, you can add ScrapeOps job scheduling and management functionality to an AWS server.

This functionality will allow you to:

  • Deploy your spiders (Python, Node.js, Scrapy, etc.) from GitHub to your AWS server using the ScrapeOps dashboard.
  • Schedule & run spiders from the ScrapeOps dashboard without having to touch your server.
Job Monitoring:

To use the stats, graphs and alerts functionality of ScrapeOps, you need to install the ScrapeOps SDK in your scrapers.

📋 Prerequisites

To integrate with AWS, you must first have an AWS server set up.

Python Scrapy Integration

SSH Configuration Guide

Single SSH Key for Multiple Repositories

Let's say you want to use the same SSH key for all your GitHub repositories. GitHub lets a single SSH key added to your account access every repository that account can access, so one "primary" key is enough. Follow these steps:

  1. Generate a New SSH Key (or choose one of your existing keys):

    ssh-keygen -t rsa -b 4096 -f ~/.ssh/default_github_key
  2. Add the Public Key to Your GitHub Account:

    • Copy the contents of the public key file:
      cat ~/.ssh/default_github_key.pub
    • Go to GitHub > Settings > SSH and GPG keys > New SSH key.
    • Paste the public key content and save.
  3. Update Your SSH Config:

    • Open (or create) your SSH configuration file:

      nano ~/.ssh/config
    • Add the following entry to ensure this key is used for GitHub:

      Host github.com
      HostName github.com
      User git
      IdentityFile ~/.ssh/default_github_key
    • Save and exit.

  4. Verify the Configuration:

    • Test the connection to GitHub:

      ssh -T git@github.com
    • If configured correctly, you should see a message like:
      Hi <username>! You've successfully authenticated, but GitHub does not provide shell access.
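Step 3 can also be scripted. The following is a minimal sketch, not part of ScrapeOps: the function name `add_github_host_entry` is hypothetical, and the default paths are the ones used in the steps above. It appends the `Host github.com` entry only if the config file does not already contain one, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append a github.com Host entry to an SSH config file
# unless one is already present.
#   $1 - SSH config file (assumed default: ~/.ssh/config)
#   $2 - private key     (assumed default: ~/.ssh/default_github_key)
add_github_host_entry() {
    config_file="${1:-$HOME/.ssh/config}"
    key_file="${2:-$HOME/.ssh/default_github_key}"

    # Make sure the file exists and is readable/writable by the owner only.
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    chmod 600 "$config_file"

    # Do nothing if a "Host github.com" line is already in the file.
    if grep -Eq '^[[:space:]]*Host[[:space:]]+github\.com[[:space:]]*$' "$config_file"; then
        echo "github.com entry already present in $config_file"
        return 0
    fi

    # Append the same entry shown in step 3.
    cat >> "$config_file" <<EOF

Host github.com
    HostName github.com
    User git
    IdentityFile $key_file
EOF
    echo "added github.com entry to $config_file"
}
```

Calling `add_github_host_entry` with no arguments targets `~/.ssh/config` and `~/.ssh/default_github_key`. Pass explicit paths to try it against a scratch file first.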

Any SSH connection to github.com will now use this key. Because the public key is attached to your GitHub account, it works for every repository that account can access.
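If you want to run the step 4 check from a script, note that `ssh -T git@github.com` typically exits with a non-zero status even when authentication succeeds, because GitHub does not open a shell. A minimal sketch, assuming the success message shown in step 4, is to inspect the output text instead; the function name `github_auth_ok` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given ssh output contains GitHub's
# success message, fail otherwise.
#   $1 - captured output of `ssh -T git@github.com`
github_auth_ok() {
    case "$1" in
        *"successfully authenticated"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `github_auth_ok "$(ssh -T git@github.com 2>&1)" && echo "key works"` captures both output streams and prints a confirmation only when the success message appears.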