
ScrapeOps n8n Examples

Learn how to build powerful web scraping workflows with real-world examples. Each example includes step-by-step instructions and can be imported directly into your n8n instance.

Quick Start Examples

Example 1: Simple Web Page Scraper

Goal: Scrape a basic web page and save to Google Sheets

Workflow:

[Manual Trigger] → [ScrapeOps Proxy] → [Google Sheets]

Configuration:

  1. ScrapeOps Node (Proxy API)

    URL: https://example.com/products
    Method: GET
    Return Type: Default
  2. Google Sheets Node

    Operation: Append
    Sheet: Web Scraping Results
    Data: {{ $json }}
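Behind the scenes, the Proxy API node routes your target URL through the ScrapeOps proxy endpoint. A minimal sketch of the equivalent raw request is shown below; YOUR_API_KEY is a placeholder, and the exact parameters should be confirmed against the Proxy API documentation:

// Equivalent raw request to the ScrapeOps Proxy API (sketch only).
// YOUR_API_KEY is a placeholder for your ScrapeOps API key.
const params = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  url: 'https://example.com/products'
});

const response = await fetch(`https://proxy.scrapeops.io/v1/?${params}`);
const html = await response.text(); // raw HTML, matching Return Type: Default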

Example 2: Amazon Price Tracker

Goal: Monitor Amazon product prices daily

Workflow:

[Schedule Trigger] → [ScrapeOps Data API] → [Compare Prices] → [Email Alert]

Configuration:

  1. Schedule Trigger

    Interval: Daily at 9 AM
  2. ScrapeOps Node (Data API)

    Domain: Amazon
    API Type: Product API
    Input Type: ASIN
    ASIN: B08N5WRWNW
  3. Function Node (Compare Prices)

    const currentPrice = $json.data.price;
    const targetPrice = 40.00;

    return {
      priceDropped: currentPrice < targetPrice,
      currentPrice: currentPrice,
      savings: targetPrice - currentPrice
    };
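To complete the workflow, the comparison result can feed an IF node that gates the email alert. One possible configuration (exact field names depend on your email integration):

  4. IF Node (Price Dropped?)

    Condition: {{ $json.priceDropped }} is true
  5. Send Email Node (true branch)

    Subject: Price drop: now ${{ $json.currentPrice }}
    Text: You save ${{ $json.savings }} versus your target price.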

Advanced Workflows

Competitive Price Monitoring

Goal: Track competitor prices across multiple products

graph LR
A[Schedule] --> B[Get Products]
B --> C[Loop]
C --> D[ScrapeOps Data API]
D --> E[Process Price]
E --> F[Update Database]
F --> C
C --> G[Generate Report]
G --> H[Send Email]

Implementation:

  1. MySQL Node - Get Products

    SELECT asin, product_name, target_price 
    FROM products_to_monitor
    WHERE active = 1
  2. Loop Node

    Items: {{ $node["MySQL"].json }}
  3. ScrapeOps Node (Data API)

    Domain: Amazon
    API Type: Product API
    Input Type: ASIN
    ASIN: {{ $json.asin }}
  4. Code Node - Process Price

    const item = $node["Loop"].json;
    const apiResponse = $json;

    return {
      asin: item.asin,
      product_name: item.product_name,
      current_price: apiResponse.data.price,
      target_price: item.target_price,
      below_target: apiResponse.data.price < item.target_price,
      timestamp: new Date().toISOString()
    };
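The Generate Report and Send Email steps from the diagram can be covered by one more Code node that runs once after the loop finishes. A sketch, assuming the node runs once for all items and your node names match:

  5. Code Node - Generate Report (after the loop)

    // Summarize all per-product results produced inside the loop
    const items = $input.all().map(i => i.json);
    const belowTarget = items.filter(i => i.below_target);

    return [{
      json: {
        checked: items.length,
        below_target_count: belowTarget.length,
        below_target: belowTarget.map(i => `${i.product_name} (${i.asin}): $${i.current_price}`),
        generated_at: new Date().toISOString()
      }
    }];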

Job Listing Aggregator

Goal: Collect job listings from Indeed and filter by criteria

Workflow Structure:

1. Search Keywords → Indeed Search → Parse Results
2. For Each Result → Get Job Details → Filter Criteria
3. Matching Jobs → Format Data → Send to Slack

Key Nodes Configuration:

  1. ScrapeOps Node (Proxy API) - Search

    URL: https://indeed.com/jobs?q=data+scientist&l=New+York
    Method: GET
    Render JavaScript: true
    Wait For: .jobsearch-ResultsList
  2. ScrapeOps Node (Parser API)

    Domain: Indeed
    Page Type: Search Page
    URL: {{ $node["Proxy_Search"].json.url }}
    HTML: {{ $node["Proxy_Search"].json.body }}
  3. Filter Node

    // Filter jobs by salary and experience
    return $json.data.jobs.filter(job => {
      const salary = parseInt(job.salary?.replace(/\D/g, '') || '0');
      const isRemote = job.location?.includes('Remote');
      const isEntry = !job.description?.includes('Senior');

      return salary > 80000 && (isRemote || isEntry);
    });
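For the final Format Data → Send to Slack step, a Code node can turn the filtered jobs into a single Slack message. A sketch; the title, company, salary, and url fields are assumptions about the Parser API output:

  4. Code Node - Format for Slack

    // Build one Slack message from the filtered job items
    const jobs = $input.all().map(i => i.json);

    const lines = jobs.map(job =>
      `• ${job.title} at ${job.company} (${job.salary || 'salary n/a'})\n  ${job.url}`
    );

    return [{
      json: {
        text: `Found ${jobs.length} matching jobs:\n\n${lines.join('\n')}`
      }
    }];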

E-commerce Inventory Monitor

Goal: Track product availability and alert on restocks

Complete Workflow:

// 1. Schedule Trigger - Every 30 minutes

// 2. Get Products to Monitor
const products = [
  {
    name: "PlayStation 5",
    url: "https://walmart.com/ip/playstation-5/123456",
    notify_email: "gamer@example.com"
  },
  {
    name: "Xbox Series X",
    url: "https://walmart.com/ip/xbox-series-x/789012",
    notify_email: "gamer@example.com"
  }
];

// 3. Loop through products
for (const product of products) {
  // 4. ScrapeOps Proxy API
  const html = await scrapeWithProxy(product.url);

  // 5. ScrapeOps Parser API
  const parsed = await parseWalmartProduct(html);

  // 6. Check availability
  if (parsed.data.in_stock && !product.was_in_stock) {
    // 7. Send notification
    await sendEmail({
      to: product.notify_email,
      subject: `${product.name} is back in stock!`,
      body: `Price: $${parsed.data.price}\nLink: ${product.url}`
    });

    // 8. Update status
    product.was_in_stock = true;
  }
}
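Note that was_in_stock resets on every run because the products array is redefined each time. One way to persist it between executions is n8n's workflow static data; a minimal sketch that slots into steps 6-8 above:

// Persist stock status across executions with workflow static data,
// keyed by product URL (replaces the in-memory was_in_stock flag).
const staticData = $getWorkflowStaticData('global');
staticData.stockStatus = staticData.stockStatus || {};

const wasInStock = staticData.stockStatus[product.url] === true;

if (parsed.data.in_stock && !wasInStock) {
  // send the notification here (step 7)
}

// record the latest status for the next run (step 8)
staticData.stockStatus[product.url] = parsed.data.in_stock;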

Integration Patterns

Pattern 1: Scrape → Transform → Store

Use Case: Regular data collection for analysis

[Trigger: Schedule] → [ScrapeOps: Get Data] → [Transform: Clean/Format] → [Database: PostgreSQL]

Example Implementation:

// Transform Node
const rawData = $json;

// Clean and structure data
return {
  product_id: rawData.data.asin,
  title: rawData.data.title?.trim(),
  price: parseFloat(rawData.data.price),
  rating: parseFloat(rawData.data.rating),
  review_count: parseInt(rawData.data.review_count),
  scraped_at: new Date().toISOString(),
  marketplace: 'amazon_us'
};
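Because parseFloat and parseInt return NaN when a field is missing, it is worth filtering out bad rows before the database write. A small guard sketch, assuming a Code node set to run once for all items:

// Keep only rows whose numeric fields parsed cleanly
return $input.all().filter(item =>
  item.json.product_id &&
  !Number.isNaN(item.json.price) &&
  !Number.isNaN(item.json.rating)
);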

Pattern 2: Monitor → Compare → Act

Use Case: Price drop alerts, stock notifications

[ScrapeOps: Current Data] → [Get Previous: Database] → [Compare: Changes?] → [Conditional: Email/Slack]

Comparison Logic:

const current = $node["ScrapeOps"].json;
const previous = $node["Database"].json;

const changes = {
  price_changed: current.price !== previous.price,
  price_direction: current.price > previous.price ? 'up' : 'down',
  price_difference: Math.abs(current.price - previous.price),
  percent_change: ((current.price - previous.price) / previous.price * 100).toFixed(2)
};

return changes.price_difference > 5 ? changes : null;

Pattern 3: Aggregate → Analyze → Report

Use Case: Market research, competitive analysis

[Multiple Scrapers: Parallel Runs] → [Merge Data: Combine CSV] → [Analytics: Statistics] → [Report: Dashboard]
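A sketch of the Analytics step, assuming the merged items each carry a numeric price field:

// Compute simple statistics over the merged scraper results
const prices = $input.all()
  .map(item => item.json.price)
  .filter(p => typeof p === 'number' && !Number.isNaN(p));

if (prices.length === 0) {
  return [{ json: { count: 0 } }];
}

const sum = prices.reduce((a, b) => a + b, 0);

return [{
  json: {
    count: prices.length,
    min_price: Math.min(...prices),
    max_price: Math.max(...prices),
    avg_price: +(sum / prices.length).toFixed(2)
  }
}];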

Error Handling Examples

Retry Logic with Exponential Backoff

// Error Handler Node
const maxRetries = 3;
const currentRetry = $node["ScrapeOps"].error?.retryCount || 0;

if (currentRetry < maxRetries) {
  // Calculate delay
  const delay = Math.pow(2, currentRetry) * 1000; // 1s, 2s, 4s

  // Wait before retry
  await new Promise(resolve => setTimeout(resolve, delay));

  // Retry with incremented count
  return {
    retry: true,
    retryCount: currentRetry + 1
  };
} else {
  // Log failure and continue
  return {
    failed: true,
    error: $node["ScrapeOps"].error?.message
  };
}

Fallback Data Sources

// Primary scraper failed, try alternative
if ($node["ScrapeOps_Primary"].error) {
  // Use alternative marketplace
  const alternativeUrl = $json.url.replace('amazon.com', 'amazon.co.uk');

  // Trigger backup scraper
  return {
    useBackup: true,
    backupUrl: alternativeUrl
  };
}

Performance Optimization

Batch Processing Example

// Split large lists into batches
const items = $node["Get_Items"].json;
const batchSize = 10;
const batches = [];

for (let i = 0; i < items.length; i += batchSize) {
  batches.push(items.slice(i, i + batchSize));
}

// Process each batch with delays
const allResults = [];

for (const batch of batches) {
  // Process items within a batch in parallel
  const results = await Promise.all(
    batch.map(item => processItem(item))
  );
  allResults.push(...results);

  // Add delay between batches
  await new Promise(resolve => setTimeout(resolve, 2000));
}

return allResults;

Caching Implementation

// Check cache before scraping
const cacheKey = `product_${asin}_${country}`;
const cached = await getFromCache(cacheKey);

if (cached && cached.timestamp > Date.now() - 3600000) {
  // Use cached data (less than 1 hour old)
  return cached.data;
} else {
  // Scrape fresh data
  const fresh = await scrapeProduct(asin, country);

  // Update cache
  await saveToCache(cacheKey, fresh);

  return fresh;
}
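getFromCache and saveToCache above are placeholders. One lightweight way to back them, sketched here, is n8n's workflow static data, which persists between executions of the same workflow:

// Placeholder cache helpers backed by workflow static data
const staticData = $getWorkflowStaticData('global');
staticData.cache = staticData.cache || {};

async function getFromCache(key) {
  return staticData.cache[key] || null;
}

async function saveToCache(key, data) {
  staticData.cache[key] = { data, timestamp: Date.now() };
}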

Complete Workflow Examples

1. Amazon to Shopify Product Importer

Workflow Overview:

  1. Read Amazon ASINs from CSV
  2. Get product details via Data API
  3. Transform to Shopify format
  4. Create/Update Shopify products
  5. Log results
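A sketch of step 3, mapping a Data API product response onto Shopify's REST product format. The input field names (title, description, brand, price, images) are assumptions about the Data API payload:

// Map an Amazon product payload to a Shopify product object
const p = $json.data;

return [{
  json: {
    product: {
      title: p.title,
      body_html: p.description || '',
      vendor: p.brand || 'Unknown',
      tags: `amazon-import,${p.asin}`,
      variants: [{
        price: String(p.price),
        sku: p.asin
      }],
      images: (p.images || []).map(src => ({ src }))
    }
  }
}];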

2. Real Estate Price Tracker

Workflow Overview:

  1. Search Redfin for properties
  2. Parse search results
  3. Get detailed property info
  4. Compare with historical data
  5. Generate market report

3. Review Sentiment Analyzer

Workflow Overview:

  1. Get product reviews from Amazon
  2. Parse review content
  3. Run sentiment analysis
  4. Calculate aggregate scores
  5. Create visual report

Tips for Building Workflows

1. Start Simple

  • Test with single items before loops
  • Add complexity incrementally
  • Use manual triggers during development

2. Handle Errors Gracefully

  • Add error branches to critical nodes
  • Log errors for debugging
  • Implement retry mechanisms

3. Optimize for Performance

  • Use caching where appropriate
  • Batch similar requests
  • Add delays to respect rate limits

4. Monitor and Maintain

  • Set up alerts for failures
  • Track success rates
  • Review logs regularly

Getting Help

Common Issues

  • Rate Limits: Add delays between requests
  • Dynamic Content: Enable JavaScript rendering
  • Authentication: Use session management
  • Large Data: Implement pagination

Ready to build your own workflows? Start with our templates or create from scratch!