The NodeJs Web Scraping Playbook

Everything you need to know to become a Node.js Web Scraping Pro!

Web Scraping Community:
Web Scraping Reddit Community
Web Scraping Discord Community

Web Scraping For Beginners Series

Part 1: How To Build Your First Scraper

In Part 1 of the series, we go over the basics of how to build a scraper using Node.js, Axios, and CheerioJS.

Part 2: Cleaning Dirty Data & Dealing With Edge Cases

In Part 2 of the series, we're going to show you how to make your scraper more robust and reliable.

Part 3: Storing Data in AWS S3, MySQL & Postgres DBs

In Part 3 of the series, we'll explore several ways to store the data and discuss their pros and cons and the situations in which you would use each.

Part 4: Retries and Concurrency

In Part 4 of the series, we'll enhance our scraper's reliability and scalability by handling failed requests and utilizing concurrency.

Part 5: Using Fake User-Agents and Browser Headers

In Part 5 of the series, we'll learn how to create a production-ready scraper by simulating real users through user-agent and browser header manipulation.

NodeJS Introduction

How To Minimize Web Scraping Costs With Node.js

In this guide, we will dive into various approaches for minimizing web scraping costs with Node.js.

Proxies, User-Agents & Avoiding Bans

NodeJs Fake User-Agents - How to Manage User Agents When Scraping

In this guide, we show you how to create and manage fake user agents when scraping in NodeJs so you don't get blocked.

Node Unblocker - Build Your Own Proxy Server

Node Unblocker is an open source library that allows you to easily build and deploy your own proxy server as a VPN or for web scraping.

How To Solve CAPTCHAs with NodeJS

In this guide, we will explore various strategies and techniques for efficiently bypassing CAPTCHA challenges with NodeJS.

Navigation & Logging In

How to Scroll a Page with NodeJS

In this article, we'll walk through scrolling techniques using Puppeteer and Playwright.

How To Submit a Form With NodeJS

In this guide, we will dive into various approaches for submitting forms using NodeJS and provide detailed steps and examples for each method.

NodeJS Headless Browsers

The Best Node.js Headless Browsers for Web Scraping

In this guide, we will cover the top Node.js headless browsers used for web scraping today, explaining their key features and providing code examples.

The NodeJS Puppeteer Guide

In this guide, we show you how to use Puppeteer, a Node.js library that offers a high-level, user-friendly API for automating tasks and interacting with dynamic web pages.

The NodeJS Playwright Guide

In this guide, we’ll introduce you to the fundamental functionality of Node.js Playwright and how to use it in your own projects.

HTML Parsing Libraries

The 5 Best NodeJs HTML Parsing Libraries Compared

We compare the 5 best NodeJs HTML parsing libraries available in 2023 - Cheerio, JSDOM, Parse5, htmlparser2, and xml2js.

CheerioJS Guide - Scraping HTML Pages With NodeJs

In this guide, we walk through how to extract valuable data from HTML pages effortlessly, leveraging the robust capabilities of Cheerio for efficient data parsing and manipulation in NodeJS.

Request-Promise

NodeJs Request-Promise: How to Use & Rotate Proxies

In this guide, we walk through how to use proxies with NodeJs Request-Promise, including how to rotate proxies in a list and how to use proxy gateways and proxy APIs with our NodeJs Request-Promise scraper.

NodeJs Request Promise: How to Send POST Requests

In this guide, we walk through how to send POST requests with NodeJs Request Promise, including how to POST form data and JSON data.

NodeJs Request Promise: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with NodeJs Request Promise to keep your scrapers from getting blocked.

NodeJs Request-Promise: Retry Failed Requests

In this guide, we walk through how to configure NodeJs Request-Promise to retry failed requests so you can build a more reliable system.

NodeJs Request-Promise: Make Concurrent Requests

In this guide, we walk through how to configure NodeJs Request-Promise to make concurrent requests so that you can increase the speed of your scrapers.

Axios

Axios: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with NodeJs Axios to keep your scrapers from getting blocked.

Axios: How to Send POST Requests

In this guide, we walk through how to send POST requests with NodeJs Axios, including how to POST form data and JSON data.

Axios: Retry Failed Requests

In this guide, we walk through how to configure NodeJs Axios to retry failed requests so you can build a more reliable system.

Axios: Make Concurrent Requests

In this guide, we walk through how to configure NodeJs Axios to make concurrent requests so that you can increase the speed of your scrapers.

Node-Fetch

Node-Fetch: How to Use & Rotate Proxies

In this guide, we walk through how to use proxies with Node-Fetch, including how to rotate proxies in a list and how to use proxy gateways and proxy APIs with our Node-Fetch scraper.

Node-Fetch: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with the Node-Fetch library to keep your scrapers from getting blocked.

Node-Fetch: How to Send POST Requests

In this guide, we walk through how to send POST requests with the Node-Fetch library, including how to POST form data and JSON data.

Node-Fetch: Make Concurrent Requests

In this guide, we walk through how to configure the Node-Fetch library to make concurrent requests so that you can increase the speed of your scrapers.

Node-Fetch: Retry Failed Requests

In this guide, we walk through how to configure the Node-Fetch library to retry failed requests so you can build a more reliable system.

NodeJs Got

NodeJs Got: How to Use & Rotate Proxies

In this guide, we walk through how to use proxies with NodeJS Got, including how to rotate proxies in a list and how to use proxy gateways and proxy APIs with our NodeJS Got scraper.

NodeJs Got: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with the NodeJs Got library to keep your scrapers from getting blocked.

NodeJs Got: How to Send POST Requests

In this guide, we walk through how to send POST requests with the NodeJS Got library, including how to POST form data and JSON data.

NodeJs Got: Make Concurrent Requests

In this guide, we walk through how to configure the NodeJs Got library to make concurrent requests so that you can increase the speed of your scrapers.

NodeJs Got: Retry Failed Requests

In this guide, we walk through how to configure the NodeJs Got library to retry failed requests so you can build a more reliable system.

NodeJs SuperAgent

SuperAgent: How to Use & Rotate Proxies

In this guide, we walk through how to use proxies with NodeJS SuperAgent, including how to rotate proxies in a list and how to use proxy gateways and proxy APIs with our SuperAgent scraper.

SuperAgent: Setting Fake User-Agents

In this guide, we walk through how to use fake user-agents with the NodeJs SuperAgent library to keep your scrapers from getting blocked.

SuperAgent: How to Send POST Requests

In this guide, we walk through how to send POST requests with the NodeJs SuperAgent library, including how to POST form data and JSON data.

SuperAgent: Make Concurrent Requests

In this guide, we walk through how to configure the NodeJs SuperAgent library to make concurrent requests so that you can increase the speed of your scrapers.

SuperAgent: Retry Failed Requests

In this guide, we walk through how to configure the NodeJs SuperAgent library to retry failed requests so you can build a more reliable system.
