Web Scraping For Beginners Series
Part 1: How To Build Your First Scraper
In Part 1 of the series, we go over the basics of how to build a scraper using Node.js, Axios & CheerioJS.
Part 2: Cleaning Dirty Data & Dealing With Edge Cases
In Part 2 of the series, we're going to show you how to make your scraper more robust and reliable.
Part 3: Storing Data in AWS S3, MySQL & Postgres DBs
In Part 3 of the series, we'll explore several different ways to store the data and discuss their pros and cons, as well as the situations in which you would use each one.
Part 4: Retries and Concurrency
In Part 4 of the series, we'll enhance our scraper's reliability and scalability by handling failed requests and utilizing concurrency.
Part 5: Using Fake User-Agents and Browser Headers
In Part 5 of the series, we'll learn how to create a production-ready scraper by simulating real users through user-agent and browser header manipulation.
NodeJS Introduction
How To Minimize Web Scraping Costs With Node.js
In this guide, we will dive into various approaches for minimizing web scraping costs with Node.js.
Proxies, User-Agents & Avoiding Bans
NodeJs Fake User-Agents - How to Manage User Agents When Scraping
In this guide, we show you how to create and manage fake user-agents when scraping with NodeJs so you don't get blocked.
Node Unblocker - Build Your Own Proxy Server
Node Unblocker is an open source library that allows you to easily build and deploy your own proxy server as a VPN or for web scraping.
How To Solve CAPTCHAs with NodeJS
In this guide, we will explore various strategies and techniques for efficiently bypassing CAPTCHA challenges with NodeJS.
Navigation & Logging In
How to Scroll a Page with NodeJS
In this article, we'll walk through scrolling techniques using Puppeteer and Playwright.
How To Submit a Form With NodeJS
In this guide, we will dive into various approaches for submitting forms using NodeJS and provide detailed steps and examples for each method.
NodeJS Headless Browsers
The Best Node.js Headless Browsers for Web Scraping
In this guide, we will cover the top Node.js headless browsers used for web scraping today, explaining their key features and providing code examples.
The NodeJS Puppeteer Guide
In this guide, we show you how to use Puppeteer, a Node.js library that offers a high-level, user-friendly API for automating tasks and interacting with dynamic web pages.
The NodeJS Playwright Guide
In this guide, we’ll introduce you to the fundamental functionality of Node.js Playwright and how to use it in your own projects.
HTML Parsing Libraries
The 5 Best NodeJs HTML Parsing Libraries Compared
We compare the 5 best NodeJs HTML parsing libraries available in 2023 - Cheerio, JSDOM, Parse5, htmlparser2, and xml2js.
CheerioJS Guide - Scraping HTML Pages With NodeJs
In this guide, we walk through how to extract data from HTML pages with Cheerio, using it to parse and manipulate HTML efficiently in NodeJS.
Request-Promise
NodeJs Request-Promise: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with NodeJs Request-Promise, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our NodeJs Request-Promise scraper.
NodeJs Request-Promise: How to Send POST Requests
In this guide, we walk through how to send POST requests with NodeJs Request-Promise, including how to POST form data and JSON data.
NodeJs Request-Promise: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with NodeJs Request-Promise to keep your scrapers from getting blocked.
NodeJs Request-Promise: Retry Failed Requests
In this guide, we walk through how to configure NodeJs Request-Promise to retry failed requests so you can build a more reliable system.
NodeJs Request-Promise: Make Concurrent Requests
In this guide, we walk through how to configure NodeJs Request-Promise to make concurrent requests so that you can increase the speed of your scrapers.
Axios
Axios: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with NodeJs Axios to keep your scrapers from getting blocked.
Axios: How to Send POST Requests
In this guide, we walk through how to send POST requests with NodeJs Axios, including how to POST form data and JSON data.
Axios: Retry Failed Requests
In this guide, we walk through how to configure NodeJs Axios to retry failed requests so you can build a more reliable system.
Axios: Make Concurrent Requests
In this guide, we walk through how to configure NodeJs Axios to make concurrent requests so that you can increase the speed of your scrapers.
Node-Fetch
Node-Fetch: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with Node-Fetch, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our Node-Fetch scraper.
Node-Fetch: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with the Node-Fetch library to keep your scrapers from getting blocked.
Node-Fetch: How to Send POST Requests
In this guide, we walk through how to send POST requests with the Node-Fetch library, including how to POST form data and JSON data.
Node-Fetch: Make Concurrent Requests
In this guide, we walk through how to configure the Node-Fetch library to make concurrent requests so that you can increase the speed of your scrapers.
Node-Fetch: Retry Failed Requests
In this guide, we walk through how to configure the Node-Fetch library to retry failed requests so you can build a more reliable system.
NodeJs Got
NodeJs Got: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with NodeJs Got, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our NodeJs Got scraper.
NodeJs Got: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with the NodeJs Got library to keep your scrapers from getting blocked.
NodeJs Got: How to Send POST Requests
In this guide, we walk through how to send POST requests with the NodeJs Got library, including how to POST form data and JSON data.
NodeJs Got: Make Concurrent Requests
In this guide, we walk through how to configure the NodeJs Got library to make concurrent requests so that you can increase the speed of your scrapers.
NodeJs Got: Retry Failed Requests
In this guide, we walk through how to configure the NodeJs Got library to retry failed requests so you can build a more reliable system.
NodeJs SuperAgent
SuperAgent: How to Use & Rotate Proxies
In this guide, we walk through how to use proxies with NodeJs SuperAgent, including how to rotate proxies from a list and how to use proxy gateways and proxy APIs with our SuperAgent scraper.
SuperAgent: Setting Fake User-Agents
In this guide, we walk through how to use fake user-agents with the NodeJs SuperAgent library to keep your scrapers from getting blocked.
SuperAgent: How to Send POST Requests
In this guide, we walk through how to send POST requests with the NodeJs SuperAgent library, including how to POST form data and JSON data.
SuperAgent: Make Concurrent Requests
In this guide, we walk through how to configure the NodeJs SuperAgent library to make concurrent requests so that you can increase the speed of your scrapers.
SuperAgent: Retry Failed Requests
In this guide, we walk through how to configure the NodeJs SuperAgent library to retry failed requests so you can build a more reliable system.