How to Solve CAPTCHAs with NodeJS


As developers, we often encounter situations where automated processes need to interact with websites that employ CAPTCHA protection. Fortunately, with the power of NodeJS and the vast ecosystem of libraries and tools available, tackling CAPTCHAs programmatically has become more feasible than ever.

In this article, we will explore various strategies and techniques for efficiently bypassing CAPTCHA challenges with NodeJS.

TLDR: Solving CAPTCHAs Efficiently with NodeJS
Understanding CAPTCHAs and Their Challenges
How to Solve CAPTCHAs?
Solving Text CAPTCHAs With NodeJS
Solving reCAPTCHAs With NodeJS
Solving hCAPTCHAs With NodeJS
Solving Audio CAPTCHAs With NodeJS
How To Avoid Triggering CAPTCHAs
How To Avoid CAPTCHAs Using ScrapeOps Proxy Aggregator
The Legal and Ethical Implications of Bypassing CAPTCHAs
Conclusion
More NodeJS Web Scraping Guides


TLDR: Solving CAPTCHAs Efficiently with NodeJS

The quickest way to solve CAPTCHAs in NodeJS is to use an existing service such as 2captcha. It is usually the most efficient option because the service offers a solid NodeJS API that covers a wide range of CAPTCHA types.

Assuming you've found a website that permits you to solve a CAPTCHA challenge automatically, let's see how to actually do it in code.

Before running the code, sign up on the 2captcha website and select a plan. This gives you the API key used in the code.

After subscribing, install the 2captcha package via npm:

npm install 2captcha

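Now replace API_KEY_HERE with the API key from your 2captcha dashboard and run the code:

const Captcha = require("2captcha");

const solver = new Captcha.Solver("API_KEY_HERE");

// Arguments: the page's reCAPTCHA site key (data-sitekey) and the URL of the page it appears on
solver
  .recaptcha("6Ld2sf4SAAAAAKSgzs0Q13IZhY02Pyo31S2jgOB5", "https://patrickhlauke.github.io/recaptcha/")
  .then((res) => {
    // res.data holds the solution token returned by 2captcha
    console.log("reCAPTCHA solved. Token:", res.data);
  })
  .catch((err) => {
    console.error(err.message);
  });

If the request succeeds, the solution token is printed to the console. Note that the token alone doesn't pass the check: it still has to be submitted to the target page, typically in the g-recaptcha-response field.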

Understanding CAPTCHAs and Their Challenges

What are CAPTCHAs?

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". It is a technique used to prevent automated programs such as bots from accessing certain resources and websites.

CAPTCHA technology has evolved over the years, offering various methods to verify human interaction and prevent automated bots from accessing websites.

Here are some of the most common types of CAPTCHAs:

Google reCAPTCHA (v2 / v3): Developed by Google, reCAPTCHA is one of the most widely used CAPTCHA systems. It offers both v2 and v3 versions, with v2 typically requiring users to click on checkboxes or solve image-based challenges, while v3 operates in the background to assess the user's behavior without explicit interaction.
hCaptcha: hCaptcha is a popular alternative to Google reCAPTCHA, providing similar functionality for verifying human users. It presents users with various challenges, such as selecting specific images or solving puzzles, to validate their identity.
FunCaptcha: FunCaptcha introduces a gamified approach to CAPTCHA verification, engaging users with interactive challenges like spinning wheels, sliding puzzles, or simple games. This type of CAPTCHA aims to enhance user experience while maintaining security.
Base64 Image CAPTCHAs: The challenge image is embedded directly in the page as base64-encoded data instead of being loaded from a separate image file. Users still need to identify the objects, characters, or patterns shown in the image to prove they're human.
Audio CAPTCHAs: Audio CAPTCHA provides an alternative method for users with visual impairments or difficulties in solving image-based challenges. Instead of presenting visual cues, Audio CAPTCHA delivers audio clips containing spoken characters or numbers that users must listen to and transcribe accurately to pass the verification process. This approach ensures accessibility for a wider range of users while maintaining security against automated bots.

These CAPTCHA variants serve different purposes and offer varying levels of complexity and website protection. Website administrators often choose CAPTCHA types based on their security needs and user preferences.
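For the base64 variant, the image data sits in the page itself, typically as a `data:` URI in an `<img>` element's `src` attribute, so a scraper can read it without downloading a separate file. The sketch below (the function name is hypothetical) splits such a URI into its MIME type, its base64 payload, and the raw bytes, using only Node's built-in `Buffer`.

```javascript
// Hypothetical helper: split a data URI such as "data:image/png;base64,iVBORw0..."
// into its MIME type and payload. Returns null when the string is not a base64 data URI
// (for example, a regular image URL).
function parseBase64DataUri(dataUri) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUri.trim());
  if (!match) {
    return null;
  }
  const [, mimeType, base64] = match;
  return {
    mimeType,
    base64, // can be passed as-is to APIs that accept base64-encoded images
    bytes: Buffer.from(base64, "base64"), // raw image bytes, e.g. for saving to disk or OCR
  };
}
```

With Playwright, the `src` value could be read with `page.locator(selector).getAttribute("src")`, where `selector` is whatever matches the CAPTCHA image on the target site.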

Can CAPTCHAs Be Bypassed?

While CAPTCHAs pose a challenge for automated systems, they are not entirely foolproof. They can be bypassed in different ways, although they are intentionally crafted to be complex for computers to crack.

CAPTCHAs are a common obstacle when writing scrapers because they block automated data collection. An effective solution is therefore a CAPTCHA-solving mechanism that gets past the challenge and gives the scraper access to the requested resource.

Approaches to Dealing With CAPTCHAs

When it comes to handling CAPTCHAs, there are four general approaches: avoidance, DIY solving, OCR, and existing solving services.

1. Avoidance

The most effective strategy to manage CAPTCHAs is to avoid encountering them in the first place. In case a CAPTCHA does surface, retrying the request may still resolve the issue.

Pros:
- Reduces the risk of encountering CAPTCHAs, streamlining data extraction
- Enhances efficiency by bypassing CAPTCHA-solving processes, saving time and resources
Cons:
- Requires sophisticated techniques to convincingly mimic human behavior
- Not foolproof, as CAPTCHA mechanisms may still detect bot-like behavior despite emulation efforts
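Retrying can be automated. Below is a minimal sketch that retries a request with exponential backoff whenever the response looks like a CAPTCHA page. It uses Node 18+'s built-in fetch; the function names and the detection heuristic (any mention of "captcha" in the HTML) are assumptions for illustration, and a check this loose will produce false positives on some sites.

```javascript
// Hypothetical helpers illustrating "retry when a CAPTCHA appears".
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rough heuristic (an assumption): treat any page mentioning "captcha" as a challenge page.
function looksLikeCaptcha(html) {
  return /captcha/i.test(html);
}

// Fetch a URL and retry with exponential backoff while the response looks like a CAPTCHA page.
// fetchImpl defaults to Node 18+'s built-in fetch and can be replaced, e.g. in tests.
async function fetchAvoidingCaptcha(url, { retries = 3, baseDelayMs = 2000, fetchImpl = fetch } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const response = await fetchImpl(url);
    const html = await response.text();
    if (!looksLikeCaptcha(html)) {
      return html;
    }
    if (attempt < retries) {
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw new Error(`Still getting a CAPTCHA after ${retries + 1} attempts: ${url}`);
}
```

Retries tend to work better when each attempt also varies something about the request, such as the proxy or user agent.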

2. DIY Custom CAPTCHA-Solving Solutions

Creating custom CAPTCHA-solving solutions, though challenging, offers flexibility for tackling various challenges.

For example, you could train a machine learning model capable of solving one specific type of CAPTCHA.

Pros:
- Ideal for complex or customized CAPTCHAs, fostering innovation and skill development
- Presents a stimulating programming challenge for advanced developers and hackers
Cons:
- Demands significant time and expertise for implementation due to complexity
- Limited accessibility for beginners due to high expertise requirements
- Difficulty generalizing, as each type of CAPTCHA needs to be solved on its own

3. OCR

Optical Character Recognition (OCR) is a middle-ground approach to solving CAPTCHAs: it takes some setup, but it can be a low-cost option depending on the use case.

Pros:
- Free if using open-source libraries like Tesseract.js
- Versatile for handling image-based CAPTCHAs
Cons:
- Requires proficiency in API integration and image manipulation techniques
- Limited to image-based CAPTCHAs
- Has a learning curve for beginners

4. Existing Services

Utilizing third-party services like 2captcha provides a user-friendly and accessible approach for all levels of programming skill.

Services such as 2captcha work by delegating CAPTCHAs to paid workers around the globe. When you send an API request, the CAPTCHA is typically solved by another person rather than by a machine, although some task types (such as audio) are handled by machine learning models.

Pros:
- Easy-to-use API, accessible to everyone
- Reliable and scalable solutions for various CAPTCHA types
Cons:
- Costs associated with each API request
- Dependency on external services may introduce reliability issues and privacy concerns
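Because someone (or some model) has to work on each CAPTCHA, these APIs are usually asynchronous: you submit a task, receive an ID, and poll until the answer is ready. Below is a minimal, service-agnostic sketch of that polling loop. `checkResult` is a hypothetical callback you would implement against your provider's result endpoint, and the interval and timeout defaults are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call checkResult() repeatedly until it returns something other than null,
// or give up once timeoutMs has elapsed.
// checkResult is hypothetical: it should ask the solving service for the task status
// and return null while the task is still being worked on.
async function pollForResult(checkResult, { intervalMs = 5000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await checkResult();
    if (result !== null) {
      return result;
    }
    await sleep(intervalMs);
  }
  throw new Error(`No result within ${timeoutMs} ms`);
}
```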

How to Solve CAPTCHAs?

There are many ways to solve CAPTCHAs, such as using free and open-source libraries or paid third-party services.

Free & Open Source CAPTCHA Solving Libraries

A popular NodeJS option for CAPTCHA solving is Tesseract.js, the JavaScript version of the Tesseract OCR engine. In this section, we'll see how to use it to solve image-based CAPTCHAs.

Tesseract OCR

Tesseract is an open-source OCR library that can be utilized for solving text-based CAPTCHAs.

Features:

Tesseract OCR offers high accuracy in text recognition, which makes it suitable for text-based CAPTCHAs. It supports multiple languages and can process various font styles and sizes.
It handles simple and moderately complex text-based CAPTCHAs well, and it provides options for image preprocessing and post-processing to improve recognition accuracy.

Installation Instructions:

Install the Tesseract OCR library using npm:

npm install tesseract.js

After installation, the below code should run out-of-the-box. Make sure to replace the IMAGE_URL with the URL of the image you'd like to extract text from.

Usage Example:

const { createWorker } = require("tesseract.js");

(async () => {
  const worker = await createWorker("eng");
  const ret = await worker.recognize("IMAGE_URL");

  console.log(ret.data.text);

  await worker.terminate();
})();

The above code extracts text from a given image.
To solve a text-based CAPTCHA on a live website, we could combine Tesseract with Playwright to automate the whole process.
The program would connect to the website using Playwright, capture the CAPTCHA image, solve it with Tesseract, and then type the extracted text into the input field.
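A minimal sketch of that flow is below. It assumes you already have a Playwright `page` and a Tesseract.js `worker` (created with `createWorker("eng")`); the function name and both CSS selectors are hypothetical placeholders. Passing the page and worker in as arguments keeps the function testable without a real browser.

```javascript
// Hypothetical helper: solve a text CAPTCHA on an already-open page.
// `page` is a Playwright Page, `worker` is a Tesseract.js worker.
// imageSelector and inputSelector are placeholders; inspect the target page for the real ones.
async function solveTextCaptchaOnPage(page, worker, { imageSelector, inputSelector }) {
  // Screenshot only the CAPTCHA element; Playwright returns the PNG as a Buffer
  const image = await page.locator(imageSelector).screenshot();

  // Run OCR on the image buffer and drop surrounding whitespace/newlines
  const { data } = await worker.recognize(image);
  const text = data.text.trim();

  // Type the recognized text into the CAPTCHA input field
  await page.fill(inputSelector, text);
  return text;
}
```

After calling it, the script would still need to submit the form, for example by clicking the page's submit button.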

Besides Tesseract, here is a non-exhaustive list of tools and libraries for solving CAPTCHAs. Note that many of them are written in Python or other languages rather than JavaScript, and some may no longer be maintained:

arunpatala/captcha: Utilizes the Torch machine learning library to create a dataset of 10,000 samples, each with 5 characters, aiming to break Simple Captcha, a Java-based CAPTCHA software.
zakizhou/CAPTCHA: Implements a small convolutional neural network using TensorFlow to recognize CAPTCHAs, with images containing four digits and noise for simplicity.
nladuo/captcha-break: Implements CAPTCHA breaking based on OpenCV, Tesseract-OCR, and a machine learning algorithm.
ypwhs/captcha_break: Utilizes Keras to build a deep convolutional neural network for identifying CAPTCHA verification codes.
ptigas/simple-captcha-solver: Offers a simple solver for specific and easy-to-solve CAPTCHAs, employing a scoring system to determine the correct text.
rickyhan/SimGAN-Captcha: Assists in solving CAPTCHAs without manual labeling by generating synthesized training pairs using GANs.
arunpatala/captcha.irctc: Achieves 98% accuracy in reading IRCTC CAPTCHAs using deep learning, particularly designed for the popular Indian travel website.
JackonYang/captcha-tensorflow: Solves image CAPTCHAs using TensorFlow and CNN with over 90% accuracy.
skyduy/CNN_keras: Implements a convolutional neural network with Keras, achieving 95% accuracy in recognizing single letters from a dataset of about 5000 samples.
PatrickLib/captcha_recognize: Provides high-accuracy image CAPTCHA recognition using Python Keras and OpenCV.
zhengwh/captcha-svm: Identifies and solves simple CAPTCHAs using a support vector machine (SVM).
chxj1992/captcha_cracker: Offers a simple verification code recognition function using a convolutional neural network model in Keras.
chxj1992/slide_captcha_cracker: Locates sliding verification code puzzles in background images using OpenCV and image edge detection algorithms.
JasonLiTW/simple-railway-captcha-solver: Provides a simple railway CAPTCHA solver using a convolutional neural network, achieving high accuracy.
lllcho/CAPTCHA-breaking: Breaks simple CAPTCHAs using Python Keras and OpenCV.
ecthros/uncaptcha: Defeats Google's audio reCAPTCHA with 85% accuracy by programmatically recognizing the spoken numbers.
dessant/buster: Assists in solving difficult CAPTCHAs by completing reCAPTCHA audio challenges using speech recognition.
kerlomz/captcha_trainer: Offers a deep learning-based image verification code solution capable of handling complex scenarios.

3rd Party CAPTCHA Solving Services

Third-party services employ either real people or advanced algorithms to solve CAPTCHAs. By integrating them into your web-scraping code, you can submit CAPTCHA images, receive solutions, and input them into the page automatically, just as described for the Tesseract workflow above. The ingredients change, but the recipe is the same.

2Captcha: solves a large number of CAPTCHA types, such as reCAPTCHA v2/v3, hCAPTCHAs, and audio CAPTCHAs. Its price starts at $1.00 per 1,000 solved CAPTCHAs, and its automated solver's response time is under 12 seconds.
Anti-Captcha: relies on human workers worldwide and advertises a 100% solution rate. Prices begin at $0.50 per 1,000 images.
DeathByCaptcha: an all-encompassing CAPTCHA-solving service that can handle any type of CAPTCHA challenge. Prices vary with the complexity of the CAPTCHA, ranging from $0.99-$2 per 1,000 CAPTCHAs for standard challenges to $3.99 per 1,000 for hCAPTCHAs.
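Since these services charge per 1,000 solves, estimating the bill is simple arithmetic. The sketch below uses the starting prices listed above (actual rates depend on the CAPTCHA type and change over time) and a hypothetical volume of 50,000 CAPTCHAs.

```javascript
// Starting prices per 1,000 CAPTCHAs in USD, taken from the list above.
const PRICE_PER_1000 = {
  "2Captcha": 1.0,
  "Anti-Captcha": 0.5,
  "DeathByCaptcha (hCaptcha)": 3.99,
};

// Cost = (number of CAPTCHAs / 1,000) * price per 1,000
function estimateCost(captchaCount, pricePer1000) {
  return (captchaCount / 1000) * pricePer1000;
}

// Hypothetical monthly volume
const volume = 50000;
for (const [service, price] of Object.entries(PRICE_PER_1000)) {
  console.log(`${service}: $${estimateCost(volume, price).toFixed(2)} for ${volume} CAPTCHAs`);
}
```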

Solving Text CAPTCHAs With NodeJS

Alphanumeric CAPTCHAs are a type of challenge-response test used to determine whether the user is human or a bot. They typically involve presenting users with a combination of letters and numbers arranged in a distorted or obscured manner.

Users are required to correctly identify and input the alphanumeric characters into a text box to pass the test. This process helps websites protect against automated spam and malicious activities by ensuring that only human users can successfully complete the CAPTCHA.

Here's what a text CAPTCHA looks like:

An example of a text CAPTCHA

Solving Text CAPTCHAs With Open Source Solution

Out of the box, Tesseract isn't as robust as 2captcha: in this example, it decoded the CAPTCHA above as WOHSIK. This is somewhat expected, since 2captcha relies on human workers, which gives much higher accuracy. Getting better results from Tesseract takes extra effort, such as preprocessing the image and post-processing the recognized text.

Make sure to have tesseract.js and playwright libraries installed:

npm install tesseract.js
npm install playwright

Then run the code below:

const { createWorker } = require("tesseract.js");
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  try {
    await page.goto("https://2captcha.com/demo/normal");

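    // The CAPTCHA image from this page was saved beforehand to ./photos/demo.png
    const worker = await createWorker("eng");
    const ret = await worker.recognize("./photos/demo.png");

    // Trim the trailing newline and whitespace that OCR output usually contains
    const captchaText = ret.data.text.trim();
    await worker.terminate();

    console.log("CAPTCHA result:", captchaText);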
  } catch (error) {
    console.error("Error fetching and solving CAPTCHA:", error);
  } finally {
    await browser.close();
  }
})();
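One simple form of post-processing is cleaning up the OCR output. The minimal sketch below (the function name and the letters-and-digits assumption are ours) strips everything except ASCII letters and digits and, if the expected length is known, returns null for reads of the wrong length so the caller can retry with a fresh CAPTCHA. Tesseract.js can also be told up front which characters to consider, via `worker.setParameters({ tessedit_char_whitelist: "..." })`.

```javascript
// Hypothetical helper: clean raw OCR output for a text CAPTCHA.
// Assumption: the CAPTCHA only contains ASCII letters and digits (case is preserved).
function cleanCaptchaText(rawText, expectedLength) {
  // Drop spaces, newlines, and stray punctuation picked up by OCR
  const cleaned = rawText.replace(/[^A-Za-z0-9]/g, "");

  // If the CAPTCHA length is known, reject obviously wrong reads
  if (expectedLength !== undefined && cleaned.length !== expectedLength) {
    return null;
  }
  return cleaned;
}
```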

Solving Text CAPTCHAs With Paid Service

Below is a simple example that uses 2captcha and Playwright to solve a CAPTCHA text challenge. In this example, the CAPTCHA has been pre-saved into a local image. Before proceeding, make sure to install the 2captcha library as well as playwright:

npm install 2captcha
npm install playwright

If you aren't familiar with Playwright, it is a framework for web testing and automation. It drives real browsers (Chromium, Firefox, and WebKit), so your code can load pages and interact with them the way a user would. We'll be using it throughout this guide.

After installing the necessary libraries, sign up for an account on the 2captcha website. After purchasing a plan, you'll be given an API key. To make the code below runnable, replace API_KEY with your own key.

The image used has been pre-downloaded for simplicity. The solver expects the image as a base64 string, so an image fetched at runtime has to be converted to base64 first.

const { chromium } = require("playwright");
const Captcha = require("2captcha");
const fs = require("fs");

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  try {
    await page.goto("https://2captcha.com/demo/normal");

    const solver = new Captcha.Solver("API_KEY");
    // imageCaptcha() takes the image as a base64 string
    const result = await solver.imageCaptcha(fs.readFileSync("./demo.png", "base64"));

    // Type the solved text into the CAPTCHA field and submit the demo form
    await page.fill("#simple-captcha-field", result.data);
    await page.click('[data-action="demo_action"]');

    console.log("CAPTCHA solved:", result.data);
  } catch (error) {
    console.error("Error fetching and solving CAPTCHA:", error);
  } finally {
    await browser.close();
  }
})();

Depending on the exact image, running the above code will yield the extracted text.
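If you'd rather not pre-download the image, it can be fetched at runtime and converted to base64 before being passed to `imageCaptcha()`. Below is a minimal sketch using Node 18+'s built-in fetch; the function name is hypothetical. One caveat: many sites tie the CAPTCHA image to the browser session, so downloading the URL separately may return a different challenge than the one shown on the page. In that case, capturing the element with Playwright's `locator.screenshot()` (which returns a Buffer) and calling `.toString("base64")` on the result is a safer route.

```javascript
// Hypothetical helper: download an image and return it as a base64 string.
// fetchImpl defaults to Node 18+'s built-in fetch; it is a parameter only so it can be swapped in tests.
async function imageUrlToBase64(imageUrl, fetchImpl = fetch) {
  const response = await fetchImpl(imageUrl);
  if (!response.ok) {
    throw new Error(`Could not download CAPTCHA image: HTTP ${response.status}`);
  }
  // Convert the response body (an ArrayBuffer) to a Node Buffer, then to base64
  const bytes = Buffer.from(await response.arrayBuffer());
  return bytes.toString("base64");
}
```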

Solving reCAPTCHAs With NodeJS

reCAPTCHA, developed by Google, is a security measure designed to distinguish between human users and automated bots on the internet. It presents users with challenges, such as checkboxes or invisible verification, that are easy for humans to solve but difficult for bots.

Through advanced algorithms analyzing user interactions, reCAPTCHA effectively protects websites from spam and unauthorized access while ensuring a seamless experience for genuine users.

Here's what a reCAPTCHA looks like:

reCAPTCHA example

Solving reCAPTCHAs With Open Source Solution

The following example uses Playwright for browser automation and the recaptcha-solver library for solving the CAPTCHA. No API keys are required: install the playwright-core and recaptcha-solver packages with npm, then run the code below.

const { chromium } = require("playwright-core");
const { solve } = require("recaptcha-solver");

const EXAMPLE_PAGE = "https://www.google.com/recaptcha/api2/demo";

main();

async function main() {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(EXAMPLE_PAGE);

  await solve(page);
  console.log("reCAPTCHA solved!");

  await page.click("#recaptcha-demo-submit");

  page.on("close", async () => {
    await browser.close();
    process.exit(0);
  });
}

Solving reCAPTCHAs With Paid Service

Like in the previous paid service example, we'll be using 2captcha for solving the reCAPTCHA challenge.

Since reCAPTCHA is embedded into a website as an element, its container div carries a data-sitekey attribute that identifies the site. This value has to be extracted manually and then passed to the recaptcha method. To see an example, take a look at https://patrickhlauke.github.io/recaptcha.

By using Developer Tools, inspect the page and find the top-level div. Inside, you should find the data-sitekey property.
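Copying the key by hand works for a one-off test. To automate it, the sketch below pulls the first data-sitekey value out of the page's HTML with a regular expression; the function name is hypothetical, and since hCaptcha widgets expose their key the same way, it applies to both. If the widget is injected by JavaScript, read the HTML from a browser (for example with Playwright's `page.content()`) rather than from a plain HTTP response. Alternatively, `page.locator("[data-sitekey]").first().getAttribute("data-sitekey")` returns the value directly.

```javascript
// Hypothetical helper: return the first data-sitekey value in an HTML string, or null.
// A regex is a shortcut for this sketch; an HTML parser is more robust for messy markup.
function extractSiteKey(html) {
  const match = html.match(/data-sitekey\s*=\s*["']([^"']+)["']/i);
  return match ? match[1] : null;
}
```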

To solve the above reCAPTCHA, we'll pass the data-sitekey value and the URL of the website to the recaptcha method, in that order. Also replace API_KEY_HERE with your own API key.

const Captcha = require("2captcha");

const solver = new Captcha.Solver("API_KEY_HERE");

// Arguments: the page's reCAPTCHA site key (data-sitekey) and the URL of the page it appears on
solver
  .recaptcha("6Ld2sf4SAAAAAKSgzs0Q13IZhY02Pyo31S2jgOB5", "https://patrickhlauke.github.io/recaptcha/")
  .then((res) => {
    // res.data holds the solution token returned by 2captcha
    console.log("reCAPTCHA solved. Token:", res.data);
  })
  .catch((err) => {
    console.error(err.message);
  });

After running the code, you should see the solution token in the console. Like any reCAPTCHA response, it still has to be submitted with the page's form for the check to pass.

Solving hCAPTCHAs With NodeJS

hCAPTCHAs are a type of challenge-response test used to distinguish between human users and automated bots on the internet. Unlike traditional CAPTCHAs, which rely on distorted text, hCAPTCHAs ask users to complete tasks that are easy for humans but difficult for bots, such as identifying objects in images or answering simple questions.

This approach helps enhance security by ensuring that only genuine human users can pass the CAPTCHA test, thereby protecting websites from spam, fraud, and other malicious activities.

Here's what an hCAPTCHA looks like:

hCaptcha example

Solving hCAPTCHAs With Paid Service

Now we'll solve an hCAPTCHA with 2captcha, so make sure the package is installed. Paste your API key in the designated place below and run the code.

The process of solving hCAPTCHA challenges is similar to that of reCAPTCHA. The widget's div has a data-sitekey attribute, which needs to be extracted manually and passed to the hcaptcha API method.

For an example, take a look at 2captcha's demo page, where you can test and experiment with actual CAPTCHA challenges in a simulated environment.

Once we have the data-sitekey value and the URL of the website, we'll pass them to the API method, in that order. Also replace API_KEY_HERE with your API key.

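const Captcha = require("2captcha");

const solver = new Captcha.Solver("API_KEY_HERE");

// Arguments: the hCaptcha site key (data-sitekey) and the URL of the page it appears on
solver
  .hcaptcha("f7de0da3-3303-44e8-ab48-fa32ff8ccc7b", "https://2captcha.com/demo/hcaptcha?difficulty=moderate")
  .then((res) => {
    // res.data holds the solution token returned by 2captcha
    console.log("hCAPTCHA solved. Token:", res.data);
  })
  .catch((err) => {
    console.error(err.message);
  });

The expected output is the solution token, which still needs to be submitted with the page's form.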

Solving Audio CAPTCHAs With NodeJS

Audio CAPTCHAs are a type of challenge-response test designed to verify the user's humanity by presenting an audio challenge instead of a visual one. They work by playing a sequence of distorted or scrambled audio recordings containing spoken letters, numbers, or words. Users are required to listen to the audio and accurately transcribe the content into a text box to pass the test.

This method provides accessibility for users with visual impairments while still effectively preventing automated bots from bypassing the CAPTCHA.

An image of an audio captcha as it appears on a website:

Audio CAPTCHA example

Solving Audio CAPTCHAs With Paid Service

Once again, we'll use 2captcha, this time for solving audio CAPTCHAs. However, unlike in the previous examples, this task relies entirely on a pre-trained machine learning model for audio recognition. Currently, the supported speech languages are English, French, German, Greek, Russian, and Portuguese.

At the time of writing this article, the NodeJS 2captcha library does not offer a method for directly solving audio CAPTCHAs, so for this example we'll call the 2captcha API ourselves using the node-fetch library.

If you haven't already, install node-fetch. Version 2 is needed here because node-fetch v3 is ESM-only and can't be loaded with require() (on Node 18+, the built-in fetch works as well):

npm install node-fetch@2

First, we submit the audio CAPTCHA and store the task ID we receive. Then, as per the documentation, we wait several seconds before requesting the result, which gives the underlying machine learning model time to transcribe the audio.

The example audio CAPTCHA was acquired from https://captcha.com/audio-captcha-examples.html and stored locally. Make sure to update the audio path as well as the placeholder API key.

const fetch = require("node-fetch");
const fs = require("fs");

const apiKey = "API_KEY_HERE";

// POST a JSON body to the 2captcha API and return the parsed JSON response
const postJson = async (url, body) => {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`HTTP error! Status: ${response.status}`);
  }
  return response.json();
};

const submitAudioCaptchaTask = async () => {
  // The API expects the audio file as a base64 string
  const audioBase64 = fs.readFileSync("./audio/robot.mp3", "base64");

  const data = await postJson("https://api.2captcha.com/createTask", {
    clientKey: apiKey,
    task: {
      type: "AudioTask",
      body: audioBase64,
      lang: "en", // Specify the language of the audio
    },
  });

  // errorId is 0 on success; otherwise errorDescription explains the problem
  if (data.errorId !== 0) {
    throw new Error("Error submitting audio CAPTCHA task: " + data.errorDescription);
  }
  return data.taskId;
};

const fetchAudioCaptchaResult = async (taskId) => {
  // Poll every 5 seconds, for up to about a minute
  for (let attempt = 0; attempt < 12; attempt++) {
    await new Promise((resolve) => setTimeout(resolve, 5000));

    // Results for tasks created with createTask are read from getTaskResult
    const data = await postJson("https://api.2captcha.com/getTaskResult", {
      clientKey: apiKey,
      taskId,
    });

    if (data.errorId !== 0) {
      throw new Error("Error getting CAPTCHA result: " + data.errorDescription);
    }
    if (data.status === "ready") {
      return data.solution;
    }
    // Otherwise the status is "processing", so wait and try again
  }
  throw new Error("Timed out waiting for the audio CAPTCHA result");
};

const main = async () => {
  try {
    const taskId = await submitAudioCaptchaTask();
    const solution = await fetchAudioCaptchaResult(taskId);
    console.log("CAPTCHA result:", solution);
  } catch (error) {
    console.error(error.message);
  }
};

main();

After running the code, the resulting text should be visible in the console.

How To Avoid Triggering CAPTCHAs

An alternative to dealing with CAPTCHAs is not to trigger them in the first place. This approach works for CAPTCHAs that aren't permanently embedded in the page and are instead shown only when the website suspects the request isn't coming from a human. By adding a certain amount of randomness to emulate human behavior, it is possible to avoid the CAPTCHA system altogether.

Ways to avoid triggering CAPTCHAs

Emulating Human Behavior: Incorporate randomized delays between requests, simulate natural mouse movements, and pace interactions with websites to mimic genuine browsing behavior. This helps evade detection by CAPTCHA systems that flag unusual activity.
Proxy Selection and Rotation: Rotate user agents and IP addresses using proxies to vary your bot's digital fingerprint. By appearing as multiple distinct users, you reduce the likelihood of being flagged as automated.
Usage of Specific Tools: Employ browser automation tools such as Playwright, Puppeteer, or Selenium to interact with websites the way a real browser does, and use CAPTCHA-solving services such as 2Captcha or Anti-Captcha as a last resort to handle challenges automatically. Together, these tools improve efficiency while minimizing the risk of triggering CAPTCHAs.
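The first two techniques can be sketched in a few lines. The helpers below produce random pauses and cycle through a list of user agents; their names, the 1-4 second delay range, and the user-agent strings are illustrative assumptions rather than recommended values.

```javascript
// Random integer between min and max (inclusive)
function randomBetween(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pause for a random interval between actions (1-4 seconds by default, an arbitrary choice)
function humanPause(minMs = 1000, maxMs = 4000) {
  return sleep(randomBetween(minMs, maxMs));
}

// Return a function that cycles through the given items in order, wrapping around at the end
function createRotator(items) {
  let index = 0;
  return () => items[index++ % items.length];
}

// Example user-agent pool; replace with current, real browser strings
const nextUserAgent = createRotator([
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]);
```

In Playwright, each new context could then use `browser.newContext({ userAgent: nextUserAgent() })`, with `await humanPause()` between page actions. Rotating IP addresses still requires a proxy.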

How To Avoid CAPTCHAs Using ScrapeOps Proxy Aggregator

You can avoid anti-bot CAPTCHAs using a service like the ScrapeOps Proxy Aggregator. ScrapeOps takes care of proxy selection and rotation, so you only need to send the URL you want to scrape.

ScrapeOps Proxy Aggregator is an all-in-one proxy API that allows you to use over 20 proxy providers from a single API.

Here is an example of how to use ScrapeOps as a proxy aggregator alongside the Playwright framework.

We'll be connecting to Quotes to Scrape, a demo site that offers a collection of quotes.

First, make sure you have Playwright and Cheerio installed:

npm i playwright
npm i cheerio

After they've been installed, run the following code. Make sure to replace YOUR_API_KEY with the API key ScrapeOps assigns you when you sign up.

const playwright = require("playwright");
const cheerio = require("cheerio");

// ScrapeOps proxy configuration (declared with const to avoid implicit globals)
const PROXY_USERNAME = "scrapeops.headless_browser_mode=true";
const PROXY_PASSWORD = "YOUR_API_KEY";
const PROXY_SERVER = "proxy.scrapeops.io";
const PROXY_SERVER_PORT = "5353";

(async () => {
  const browser = await playwright.chromium.launch({
    headless: true,
    proxy: {
      server: `http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`,
      username: PROXY_USERNAME,
      password: PROXY_PASSWORD,
    },
  });

  const context = await browser.newContext({ ignoreHTTPSErrors: true });
  const page = await context.newPage();

  try {
    await page.goto("https://quotes.toscrape.com/page/3/", { timeout: 180000 });

    let bodyHTML = await page.evaluate(() => document.body.innerHTML);
    let $ = cheerio.load(bodyHTML);

    let h1Text = $("h1").text();
    console.log("h1Text:", h1Text);
  } catch (err) {
    console.log(err);
  }

  await browser.close();
})();
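Rather than hard-coding the API key, you can read it from an environment variable and build the same proxy settings in one place. This is a minimal sketch: the `SCRAPEOPS_API_KEY` variable name and the function name are our own choices, while the server, port, and username values are the ones used above.

```javascript
// Hypothetical helper: build the Playwright proxy settings for ScrapeOps.
// The API key comes from the SCRAPEOPS_API_KEY environment variable unless passed explicitly.
function scrapeOpsProxyConfig(apiKey = process.env.SCRAPEOPS_API_KEY) {
  if (!apiKey) {
    throw new Error("Set SCRAPEOPS_API_KEY before running the scraper");
  }
  return {
    server: "http://proxy.scrapeops.io:5353",
    username: "scrapeops.headless_browser_mode=true",
    password: apiKey,
  };
}
```

It would be used as `playwright.chromium.launch({ headless: true, proxy: scrapeOpsProxyConfig() })`.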

Example of Using Proxy Aggregator on an Amazon CAPTCHA Page

The following example shows how a CAPTCHA page can be avoided by using a proxy aggregator such as ScrapeOps. First, let's see what happens when the request is made directly, without any aggregator in place.

As in previous examples, we'll be using Playwright here. No other libraries are needed for now.

const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch();

  // page.goto() has no headers option, so the custom user agent is set on the browser context
  // (keep-alive connections are handled by the browser itself)
  const context = await browser.newContext({
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
  });
  const page = await context.newPage();

  try {
    const url = "https://www.amazon.com/Star-Wars-Jokes-Worst-Galaxy/dp/1797227459/";

    await page.goto(url);
    await page.waitForLoadState("networkidle");

    console.log("Page loaded:", await page.title());
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await browser.close();
  }
})();

Running the above code fails with a Timeout 30000ms exceeded error: the script can't load the page. Now, let's try using ScrapeOps. In this example, we'll additionally need the cheerio library.

Here is the same request routed through the ScrapeOps proxy:

const playwright = require("playwright");
const cheerio = require("cheerio");

// ScrapeOps proxy configuration (declared with const to avoid implicit globals)
const PROXY_USERNAME = "scrapeops.headless_browser_mode=true";
const PROXY_PASSWORD = "YOUR_API_KEY";
const PROXY_SERVER = "proxy.scrapeops.io";
const PROXY_SERVER_PORT = "5353";

(async () => {
  const browser = await playwright.chromium.launch({
    headless: true,
    proxy: {
      server: `http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`,
      username: PROXY_USERNAME,
      password: PROXY_PASSWORD,
    },
  });

  const context = await browser.newContext({ ignoreHTTPSErrors: true });
  const page = await context.newPage();

  try {
    await page.goto("https://www.amazon.com/Star-Wars-Jokes-Worst-Galaxy/dp/1797227459/", { timeout: 180000 });

    // Load the full HTML (including <head>) so the page title is available to cheerio
    const $ = cheerio.load(await page.content());

    console.log("Page loaded:", $("title").text().trim());
  } catch (err) {
    console.log(err);
  }

  await browser.close();
})();

This time, the page loads successfully.

The Legal and Ethical Implications of Bypassing CAPTCHAs

This article serves as an informational guide on solving CAPTCHAs using NodeJS. It is crucial to note that the content presented here is exclusively for educational purposes. We neither endorse nor condone circumventing CAPTCHA security measures to disrupt software systems or to engage in any form of malicious activity. It is critical to maintain ethical standards and comply with legal regulations.

The rules in question include, but aren't limited to, the terms of service and privacy policies of websites and services. By registering on a website, users explicitly agree to adhere to its terms of service, which may prohibit the use of automated tools and bots.

Any attempt to bypass CAPTCHAs without explicit permission may violate these agreements and could have legal consequences. Unauthorized access to CAPTCHA-protected content can seriously compromise and disrupt the security and integrity of online platforms.

Conclusion

This guide's purpose was to equip developers with essential strategies for tackling CAPTCHA challenges in NodeJS. By utilizing a mix of open-source tools and paid services, developers can choose their preferred way to approach solving CAPTCHAs.

As previously mentioned, solving CAPTCHAs must be done in accordance with legal and ethical standards. Legal documents such as terms of service must be respected, and any automated process in that context must remain compliant and ethical.

With these tools and principles, solving CAPTCHAs in NodeJS becomes far more manageable, provided you act with integrity and responsibility.

More NodeJS Web Scraping Guides

If you want more insights into the world of web scraping, we've got you covered! Check out our extensive NodeJS Web Scraping Playbook, or dive deeper into the different techniques of web scraping in our other NodeJS guides.
