How to Download Images with Node.js

In an increasingly data-driven world, images play a vital role across diverse fields, from digital marketing and social media to machine learning and computer vision. Downloading images programmatically offers numerous advantages, including automation, speed, and scalability.

By using code to handle image downloads, we can manage tasks that would be time-consuming or repetitive if done manually, such as saving thousands of images from a dataset, updating an image library, or pulling photos from an online API.

In this guide, we'll learn how to download images with Node.js.

TLDR: How to Download Images with NodeJS
Choosing the Right Tool
Implementing Image Download with Axios
Implementing Image Download with Node-Fetch
Implementing Image Download with Request
Using Native HTTP/HTTPS Modules
Handling Errors and Retries in Downloading Images
Advanced Techniques
Case Study - Downloading Images from Unsplash
Conclusion
More Web Scraping Guides

TLDR: How to Download Images with NodeJS

If you want to download images but don't have time for the full tutorial, you can use the script below.

This script uses axios to fetch Unsplash search results, cheerio to parse HTML, and downloads images concurrently using Promise.all for efficiency.

It filters out duplicate URLs and saves images as .jpg files in a local directory, automating the process of downloading high-quality images based on a search term.

To get started, make sure you have Node.js installed and set up on your machine. Install the required dependencies (axios and cheerio) by running:

npm install axios cheerio

in your project directory.

Once set up, simply run the script, and it will automatically scrape and download images based on your specified search term.

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');
const path = require('path');

const searchTerm = 'nature';
const numberOfImages = 20;
const saveDirectory = path.resolve(__dirname, 'images');
const downloadedUrls = new Set(); // To keep track of downloaded images and avoid duplicates

// Helper function to add a delay
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Function to download the image
async function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  try {
    // Fetch image data as an arraybuffer with axios
    const response = await axios.get(url, { responseType: 'arraybuffer' });
    fs.writeFileSync(savePath, response.data); // Save the image buffer directly
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    console.error('Error downloading the image:', error.message);
  }
}

// Main scraping function (without pagination)
async function scrapeAndDownloadImages() {
  try {
    // Fetch the search page HTML
    const response = await axios.get(`https://unsplash.com/s/photos/${searchTerm}`);
    const html = response.data;
    const $ = cheerio.load(html);

    const imageUrls = [];

    // Only select <img> elements with itemprop="thumbnailUrl"
    $('img[itemprop="thumbnailUrl"]').each((i, element) => {
      if (imageUrls.length >= numberOfImages) return false;

      // Try to get full-size images from the srcset or data-src attributes
      const srcSet = $(element).attr('srcset');
      const dataSrc = $(element).attr('data-src');

      // Check if the srcset or data-src contains valid image paths
      if (srcSet) {
        const urls = srcSet.split(',').map(item => item.trim().split(' ')[0]);
        const largestImageUrl = urls[urls.length - 1]; // Get the highest quality URL
        if (largestImageUrl && !downloadedUrls.has(largestImageUrl)) {
          imageUrls.push(largestImageUrl);
          downloadedUrls.add(largestImageUrl);
        }
      } else if (dataSrc && !downloadedUrls.has(dataSrc)) {
        imageUrls.push(dataSrc);
        downloadedUrls.add(dataSrc);
      }
    });

    console.log(`Image URLs:`, imageUrls);

    // Download each unique image
    await Promise.all(
      imageUrls.map((url, index) => downloadImage(url, `image${index + 1}.jpg`))
    );
    console.log(`Images downloaded successfully!`);
  } catch (error) {
    console.error('Error during scraping:', error.message);
  }
}

// Main function to start scraping and downloading
async function main() {
  if (!fs.existsSync(saveDirectory)) {
    fs.mkdirSync(saveDirectory);
  }

  await scrapeAndDownloadImages();
}

main();

If you'd like to use this script to download images from a different website, you'll need to adjust the way the script scrapes image URLs.

Start by inspecting the HTML structure of the new site to identify the correct attributes or tags containing the image URLs (e.g., srcset, data-src, or src). Update the scrapeAndDownloadImages function to target those specific attributes or tags.

Additionally, make sure the URL format is compatible with the rest of the script’s logic, and you're good to go! You can also update the search term or URL structure as needed based on the new site’s layout.
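As a rough sketch of that adjustment, the helper below picks one URL from an image element's attributes. The name pickImageUrl and the plain-object attrs shape are assumptions made for illustration and are not part of the script above; with cheerio, an element's attribs object should fit. Unlike the TLDR script, which takes the last srcset entry, this version compares the descriptors, since srcset entries are not guaranteed to be sorted by size.

```javascript
// Hypothetical helper: choose the best image URL from an <img> element's attributes.
// `attrs` is a plain object such as { srcset: '...', 'data-src': '...', src: '...' }.
// Limitation: like the script above, it splits srcset on commas, so URLs that
// themselves contain commas would need a more careful parser.
function pickImageUrl(attrs) {
  if (attrs.srcset) {
    // Each srcset candidate looks like "<url> <descriptor>", e.g. "a.jpg 400w".
    const candidates = attrs.srcset
      .split(',')
      .map((item) => item.trim().split(/\s+/))
      .filter(([url]) => Boolean(url))
      .map(([url, descriptor = '']) => ({
        url,
        // parseFloat reads both width ("400w") and density ("2x") descriptors.
        size: parseFloat(descriptor) || 0,
      }));

    if (candidates.length > 0) {
      // Pick the candidate with the largest descriptor instead of assuming order.
      return candidates.reduce((best, c) => (c.size > best.size ? c : best)).url;
    }
  }
  // Fall back to a lazy-loading attribute, then the plain src.
  return attrs['data-src'] || attrs.src || null;
}

console.log(pickImageUrl({ srcset: 'a.jpg 400w, b.jpg 1200w, c.jpg 800w' })); // b.jpg
console.log(pickImageUrl({ 'data-src': 'lazy.jpg', src: 'placeholder.gif' })); // lazy.jpg
```

Returning null when nothing matches lets the caller skip elements that carry no usable URL.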

Choosing the Right Tool

When it comes to downloading images using Node.js, there are several popular libraries available: Axios, Node-fetch, Request, and the native HTTP/HTTPS modules.

Below, we’ll compare these tools based on their ease of use, performance, community support, and suitability for different use cases.

1. Axios

Axios is a promise-based HTTP client with a clean, simple API. It’s ideal for developers looking for a solution that minimizes setup time and abstracts away the complexities of handling HTTP requests. Axios handles JSON parsing automatically, provides built-in support for request and response interceptors, and makes working with asynchronous code easier through promises.

It performs well in most cases but introduces some overhead compared to native HTTP/HTTPS modules due to the additional features it provides. Axios has a large, active community, ensuring regular updates and extensive documentation.

Axios is best suited for developers who want a reliable, feature-rich library with minimal effort and prefer handling HTTP requests without diving deep into the intricacies of the Node.js HTTP/HTTPS modules.

Pros:
- Promise-based, making it easy to work with asynchronous code.
- Built-in JSON parsing.
- Request/response interceptors for global error handling.
- Active community with plenty of documentation.
Cons:
- Adds some overhead in terms of bundle size.
- Buffers the whole response in memory by default; streaming large files requires explicitly setting responseType: 'stream'.
- Doesn’t provide the fine-grained control that the native HTTP/HTTPS modules offer.

2. Node-fetch

Node-fetch provides a minimalistic and lightweight solution that mimics the fetch API from browsers. It’s easy to integrate and requires minimal setup for making HTTP requests.

Its performance is excellent for most use cases, especially when handling large image downloads through streaming. Node-fetch has a smaller community compared to Axios, but it’s still widely used and well-maintained. This library is best suited for projects where simplicity, small bundle size, and support for streams are important, such as when downloading large files like images.

Pros:
- Lightweight with a minimal footprint.
- Supports streaming for large files.
- Uses the same fetch API syntax found in modern browsers.
Cons:
- Doesn’t handle JSON parsing automatically, requiring additional steps for APIs.
- Lacks advanced features like interceptors.
- Smaller community compared to Axios.

3. Request

Request was known for its simplicity and ease of use, making it very beginner-friendly for HTTP requests. However, it has been deprecated and is no longer actively maintained, making it unsuitable for new projects. Performance-wise, it doesn’t offer the same optimizations as more modern libraries like Axios or Node-fetch, especially when handling concurrent requests or large image downloads.

While Request is simple to use, the lack of active development and security updates makes it a poor choice for any new projects. If you're working with legacy code or maintaining an old project that already uses Request, it may still serve the purpose, but it’s best to avoid it for future development.

Pros:
- Extremely simple and user-friendly.
- Rich feature set for common HTTP operations (e.g., redirects, cookies).
Cons:
- Deprecated and no longer maintained.
- Larger bundle size compared to Node-fetch and Axios.
- Not suited for modern asynchronous workflows or high-performance applications.

4. Native HTTP/HTTPS modules

The native HTTP/HTTPS modules are built into Node.js, offering the best performance since they don’t require any external dependencies. However, the API is more complex compared to third-party libraries like Axios or Node-fetch, requiring more setup for tasks like handling errors, parsing JSON, or managing redirects.

These modules are best for developers who need total control over HTTP requests and performance. They excel in scenarios where performance is critical, such as high-performance applications or when handling large file downloads with minimal memory usage. While the native modules are well-documented and reliable, they lack the simplicity and features offered by other libraries.

Pros:
- Built-in, no external dependencies required.
- Maximum performance and flexibility.
- Supports streaming and fine-grained control over HTTP requests.
Cons:
- Requires manual handling of many aspects (e.g., headers, JSON parsing, errors).
- More complex API, especially for developers used to simpler libraries.
- Callback- and event-based API with no native promise support; a request has to be wrapped in a Promise (manually or through a third-party wrapper) before it can be used with async/await.
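To illustrate the last point, here is a minimal sketch of wrapping the native modules in a Promise. The helper name getBuffer is an assumption for this example. It buffers the whole body in memory and does not follow redirects, so treat it as a starting point, not a complete client.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: wrap the callback-based get() in a Promise so it can be awaited.
// Resolves with the whole body as a Buffer; rejects on network errors or non-2xx statuses.
function getBuffer(url) {
  return new Promise((resolve, reject) => {
    // Pick the module that matches the URL's protocol.
    const client = url.startsWith('https:') ? https : http;

    client
      .get(url, (res) => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          res.resume(); // Discard the body so the socket is released.
          reject(new Error(`Request failed with status ${res.statusCode}`));
          return;
        }
        const chunks = [];
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => resolve(Buffer.concat(chunks)));
        res.on('error', reject);
      })
      .on('error', reject);
  });
}

// Assumed usage inside an async function:
//   const data = await getBuffer('https://example.com/image.jpg');
//   fs.writeFileSync('image.jpg', data);
```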

Guidance on Selecting the Best Tool

If you're looking for simplicity and robust features, such as automatic JSON handling, request/response interceptors, and easier error management, Axios is the best choice. It is ideal for projects where you need quick, reliable HTTP requests with minimal setup.
If you're working on a lightweight project and want something that mimics the browser's fetch API with a minimal footprint, Node-fetch is a great option. It’s also suitable for streaming large files, which makes it perfect for downloading images efficiently.
If you're working on an existing project that already uses Request, or if you’re dealing with legacy code, you can continue using Request, but be aware that it is deprecated and not recommended for new projects.
If you want to avoid additional dependencies and are comfortable handling HTTP requests manually, or need fine control over performance and streaming, the native HTTP/HTTPS modules offer the best performance and flexibility.

Implementing Image Download with Axios

Setting Up Axios

Axios is a popular, promise-based HTTP client that simplifies sending HTTP requests and handling responses in Node.js. It’s widely used due to its simplicity, built-in features like automatic JSON parsing, and support for modern JavaScript features like async/await. Axios makes working with HTTP requests more intuitive and efficient, especially when dealing with APIs or downloading files such as images.

To get started with Axios, the first step is to install it in your project. Open your terminal and run the following command:

npm install axios

Once installed, you can easily import it into your script:

const axios = require('axios'); 

In our script, we’re using Axios to fetch an image and save it locally. Axios is configured to send a GET request to the provided image URL. By setting the responseType to 'arraybuffer', Axios ensures that the image data is returned as a binary buffer, which is suitable for saving images.

Here's a breakdown of how Axios works in the script:

Fetching the Image: axios.get(url, { responseType: 'arraybuffer' }) sends the HTTP request to the image URL and ensures the response is an array buffer (the raw image data).
Saving the Image: Once the image data is received, it is saved using Node's fs.writeFileSync() method.

This simple setup showcases how easy it is to fetch and handle binary data with Axios. The promise-based architecture allows you to handle asynchronous code using async/await, making the code more readable and easier to manage.

By using Axios, you can effortlessly make HTTP requests to download images, interact with APIs, or handle complex HTTP features like timeouts, interceptors, and custom headers, all while keeping your code simple and clean.

Download an Image with Axios

In this section, we will guide you through creating a simple script that uses Axios to download an image. This will give you a clear understanding of how Axios handles HTTP requests, along with how to manage errors that might occur during the download process.

Step 1: Set Up Your Script

Now, let's set up the basic structure of our script. We'll need to import the necessary libraries, define the folder to save images, and set up the image download function.

const axios = require('axios');
const fs = require('fs');
const path = require('path');

const saveDirectory = path.resolve(__dirname, 'images');

In the code above:

Axios is used for sending HTTP requests.
fs is used to interact with the file system (save the image).
path is used to handle file paths and ensure we’re saving images in the correct directory.

Step 2: Create the downloadImage Function

Next, we’ll write a function that handles the image download. This function will:

Fetch the image from the URL.
Save the image to the specified directory.

async function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  try {
    // Fetch image data as an arraybuffer with axios
    const response = await axios.get(url, { responseType: 'arraybuffer' });
    fs.writeFileSync(savePath, response.data); // Save the image buffer to a file
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    console.error('Error downloading the image:', error.message);
  }
}

axios.get(url, { responseType: 'arraybuffer' }): Sends a GET request to fetch the image. The arraybuffer response type ensures that binary data (the image) is returned correctly.
fs.writeFileSync(savePath, response.data): Saves the fetched image data to a local file. The response.data contains the image data as a buffer, which is written to the file system.

Step 3: Ensure the Save Directory Exists

Before downloading the image, we need to check if the directory where we want to save the image exists. If it doesn't exist, we create it.

// Check if the directory exists, if not, create it
if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
}

fs.existsSync(saveDirectory): Checks if the directory already exists.
fs.mkdirSync(saveDirectory, { recursive: true }): Creates the directory, including any necessary parent directories, if it doesn’t already exist.
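As a small aside, when the recursive option is set, fs.mkdirSync() does not throw if the directory already exists, so the existsSync() check is optional. The sketch below shows this in a temporary folder; the helper name ensureDirectory and the demo paths are assumptions for illustration only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: create a directory (and any missing parents) if needed.
// With { recursive: true }, mkdirSync does not throw when the directory already exists.
function ensureDirectory(dir) {
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Demo in a throwaway temp folder so nothing in the project is touched.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'images-demo-'));
const nested = ensureDirectory(path.join(demoRoot, 'images', 'thumbnails'));
ensureDirectory(nested); // Calling it a second time is a no-op.
console.log(fs.existsSync(nested)); // true
```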

Step 4: Download the Image

Finally, we'll call the downloadImage function inside the main() function. In this example, we'll use a sample image URL.

async function main() {
  const imageUrl = 'https://example.com/image.jpg'; // Replace with a valid image URL
  await downloadImage(imageUrl, 'downloaded_image.jpg'); // Wait for the download to finish
}

main();

downloadImage(imageUrl, 'downloaded_image.jpg'): This line downloads the image from the specified URL and saves it as downloaded_image.jpg in the images directory.

Step 5: Handling Errors and Exceptions

While downloading the image, several issues may arise, such as:

Network issues.
Invalid URLs.
File system issues (e.g., if the save path is invalid).

In our script, we handle errors using a try-catch block:

try {
  const response = await axios.get(url, { responseType: 'arraybuffer' });
  fs.writeFileSync(savePath, response.data); 
} catch (error) {
  console.error('Error downloading the image:', error.message);
}

If any error occurs during the download (whether from Axios or the file system), the error message will be caught and printed to the console.
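If you want more specific messages than error.message alone, one possible approach is to inspect the shape of the caught error. The sketch below assumes the documented Axios convention that error.response is set when the server replied with a non-2xx status and error.request is set when no response arrived; the helper name describeDownloadError is hypothetical, and plain objects stand in for real errors.

```javascript
// Hypothetical helper: turn an error caught around axios.get() / fs.writeFileSync()
// into a short, more specific message.
function describeDownloadError(error) {
  if (error.response) {
    // Axios: the server replied, but with a status outside the 2xx range.
    return `Server responded with status ${error.response.status}`;
  }
  if (error.request) {
    // Axios: the request was sent but no response arrived (DNS failure, timeout, ...).
    return `No response received (${error.code || 'unknown network error'})`;
  }
  if (error.syscall) {
    // Node.js file system errors carry the failing syscall and a code such as EACCES.
    return `File system error ${error.code} during ${error.syscall}`;
  }
  return `Unexpected error: ${error.message}`;
}

// Plain objects stand in for real errors here, purely for illustration.
console.log(describeDownloadError({ response: { status: 404 } }));
console.log(describeDownloadError({ request: {}, code: 'ENOTFOUND' }));
console.log(describeDownloadError({ syscall: 'open', code: 'EACCES' }));
```

Inside the catch block you could then log describeDownloadError(error) instead of error.message.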

Step 6: Running the Script

To run the script, save it to a file (e.g., downloadImage.js) and execute it using Node.js:

node downloadImage.js

This will download the image from the specified URL and save it to the images directory.

This simple script demonstrates how to download an image using Axios in Node.js. The key steps include:

Setting up Axios and the file system.
Downloading the image from a URL.
Handling errors if anything goes wrong during the download.

By following these steps, you can easily adapt the script to download images from any URL and handle potential issues efficiently.

Saving the Image to the File System

Once you've successfully downloaded an image using Axios, the next step is saving it to the local file system. In this section, we'll show you how to save the image to a specified directory and discuss some common issues you may encounter along the way, such as naming conflicts, handling different image formats, and ensuring the image is properly saved.

Step 1: Saving the Image

In our script, we used the fs (file system) module to save the image. Here's the key part of the code that handles saving the image to the local disk:

fs.writeFileSync(savePath, response.data); // Save the image buffer to a file

savePath: The full path where the image will be saved, including the directory and filename.
response.data: The binary image data fetched from the URL. Axios returns the image data as a buffer when the responseType is set to 'arraybuffer'.

This command writes the image buffer to the file system synchronously. If the directory and file path are valid, the image will be saved correctly.

Step 2: Handling File Naming

File naming can often be an issue, especially if the image's name already exists in the save directory or if you're downloading multiple images with similar names. Here are a few strategies to handle file naming:

Automatic Filename Assignment: You can programmatically generate a filename for each image. Adding a timestamp or an incremental number makes collisions unlikely, although two downloads that start within the same millisecond would still receive the same timestamp:

const timestamp = Date.now(); // Milliseconds since the Unix epoch
const filename = `image_${timestamp}.jpg`; // Example: image_1628182728273.jpg

Handling File Overwrites: If you're downloading multiple images to the same folder, make sure your script handles file overwrites. A simple solution is to check if the file already exists and rename it if necessary:

let filename = 'image.jpg';
let savePath = path.join(saveDirectory, filename);

// Check if file exists, then modify the filename
let counter = 1;
while (fs.existsSync(savePath)) {
  filename = `image_${counter}.jpg`;
  savePath = path.join(saveDirectory, filename);
  counter++;
}

This checks if the file exists and, if so, adds a number to the filename to prevent overwriting.

Step 3: Handling Different Image Formats

When downloading images, you may encounter various image formats like PNG, JPEG, GIF, or WebP. To handle different formats, you should ensure that the file extension is correctly assigned based on the image format.

If the image's format is part of the URL (e.g., image.jpg, image.png), you can extract the file extension from the URL:

// Read the extension from the URL's path so query strings (e.g., ?w=800) are ignored
const fileExtension = path.extname(new URL(url).pathname).slice(1) || 'jpg'; // e.g., "jpg"
const filename = `downloaded_image.${fileExtension}`;

Alternatively, if the format is not obvious from the URL, you can use the Content-Type header from the response to determine the format:

const contentType = response.headers['content-type'] || ''; // Header may be missing
let fileExtension;

if (contentType.includes('jpeg')) {
  fileExtension = 'jpg';
} else if (contentType.includes('png')) {
  fileExtension = 'png';
} else if (contentType.includes('gif')) {
  fileExtension = 'gif';
} else if (contentType.includes('webp')) {
  fileExtension = 'webp';
} else {
  fileExtension = 'jpg'; // Default to JPEG
}

const filename = `downloaded_image.${fileExtension}`;

This checks the Content-Type header to detect the format of the image and assigns the correct extension.

Step 4: Common Issues and Solutions

Here are some common issues you might face when saving images:

Directory Doesn’t Exist: If the directory where you want to save the image doesn’t exist, you’ll need to create it first. We’ve already covered how to ensure the directory exists using fs.existsSync() and fs.mkdirSync().
Permissions Issues: If you don’t have permission to write to the directory, the fs.writeFileSync() method will fail. Ensure that your script is running with the necessary permissions or choose a directory where your user has write access.
Incorrect File Formats: Sometimes, the content retrieved might not be an image, even if the URL suggests it is. Always check the file’s content type (as shown earlier with the Content-Type header) to make sure you're saving the right data.
File Size: Large images may take longer to download or may cause memory issues. For very large files, consider using streams instead of downloading the entire file into memory. Streams allow you to handle large files more efficiently without consuming too much memory.
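The streaming approach mentioned in the last item can be sketched as follows. The helper name saveStreamToFile is hypothetical, and the commented usage assumes Axios's responseType: 'stream' option, which makes response.data a readable stream in Node.js. The helper itself relies only on built-in modules.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: write any readable stream to disk chunk by chunk,
// so the whole image never has to sit in memory at once.
async function saveStreamToFile(readable, savePath) {
  // pipeline() forwards errors from either stream and cleans both up on failure.
  await pipeline(readable, fs.createWriteStream(savePath));
}

// Assumed usage with Axios inside an async function (requires `npm install axios`):
//   const response = await axios.get(url, { responseType: 'stream' });
//   await saveStreamToFile(response.data, savePath);
```

Using pipeline() in place of a bare .pipe() call means a failure in the download or in the file write rejects the same promise, so the existing try-catch pattern still applies.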

Step 5: Verifying the Image

Once the image is saved, you can verify that the file has been saved correctly by checking the directory or opening the file manually.

To confirm the save path, you can log the path to the console:

console.log(`Image saved at ${savePath}`);

If everything is set up correctly, you should see the image saved at the specified location.
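For a programmatic check, one option is to look at the first bytes of the saved file, since common image formats start with a fixed signature. The sketch below is a simplified illustration that covers only JPEG, PNG, GIF, and WebP; the helper name detectImageFormat is an assumption, and the demo uses hand-made buffers where a real script would read the file.

```javascript
// Hypothetical helper: guess the image format from the first bytes of a Buffer.
// Returns 'jpg', 'png', 'gif', 'webp', or null when no known signature matches.
function detectImageFormat(buffer) {
  // JPEG files start with the bytes FF D8 FF.
  if (buffer.length >= 3 && buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return 'jpg';
  }
  // PNG files start with a fixed 8-byte signature.
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buffer.subarray(0, 8).equals(pngSignature)) return 'png';

  // GIF and WebP signatures are plain ASCII, so compare them as text.
  const header = buffer.toString('latin1', 0, 12);
  if (header.startsWith('GIF87a') || header.startsWith('GIF89a')) return 'gif';
  if (header.startsWith('RIFF') && header.slice(8, 12) === 'WEBP') return 'webp';
  return null;
}

console.log(detectImageFormat(Buffer.from([0xff, 0xd8, 0xff, 0xe0]))); // jpg
console.log(detectImageFormat(Buffer.from('<html>Not found</html>'))); // null
```

After a download, you could call detectImageFormat(fs.readFileSync(savePath)) and treat a null result as a sign that the server returned something other than an image, such as an HTML error page.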

Saving images to the file system is a crucial part of the image downloading process. By using fs.writeFileSync(), you can store images locally, ensuring proper file naming, handling different formats, and preventing overwrites.

As always, it's important to consider potential issues like directory permissions, file types, and size when saving images. With these best practices, you can efficiently manage downloaded images in your Node.js application.

Step 6: Final Run

Here is our final script that will download an image and save it to disk:

const axios = require('axios');
const fs = require('fs');
const path = require('path');

const saveDirectory = path.resolve(__dirname, 'images');

// Function to download image
async function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  try {
    // Fetch image data as an arraybuffer with axios
    const response = await axios.get(url, { responseType: 'arraybuffer' });
    fs.writeFileSync(savePath, response.data); // Save the image buffer directly
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    console.error('Error downloading the image:', error.message);
  }
}

// Main function to download the image
async function main() {
  // Check if the directory exists, if not, create it
  if (!fs.existsSync(saveDirectory)) {
    fs.mkdirSync(saveDirectory, { recursive: true });
  }

  // Download the image with a valid URL
  const imageUrl = "https://www.example.com/image.jpg"; // Replace with a valid image URL
  await downloadImage(imageUrl, "picture.jpg"); // Wait for the download to finish
}

main();

The result is a successfully downloaded image.

Implementing Image Download with Node-Fetch

Setting up Node-Fetch

Node-Fetch is a lightweight library for making HTTP requests in Node.js, modeled after the browser's fetch() function. It provides a simple and efficient way to interact with APIs, fetch data, and download files. Node-Fetch is ideal for server-side applications where making HTTP requests is necessary, such as web scraping or consuming APIs.

Benefits of Node-Fetch

Promise-based API: Node-Fetch uses promises, which makes it compatible with modern async/await syntax.
Lightweight: It has a small footprint, making it efficient for smaller applications or those requiring minimal dependencies.
Supports modern features: Features like arrayBuffer(), streaming, and more are available for handling binary data or large files.

To start using Node-Fetch in your project, follow these steps:

Install Node-Fetch

Run the following command to install Node-Fetch:

npm install node-fetch

Import Node-Fetch

Since we are using ES Modules (ESM) in the example, you need to import Node-Fetch like this:

import fetch from 'node-fetch';

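If you are using CommonJS, note that node-fetch v3 is published as an ES module only, so require() works only with the v2 release line (installed with npm install node-fetch@2):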

const fetch = require('node-fetch');

Setting Up ES Modules for Node-Fetch

To use node-fetch v3.x properly, you need to ensure your project is set up for ES modules. This is done by adding "type": "module" to your package.json and using the import syntax (e.g., import fetch from 'node-fetch';).

Additionally, make sure your Node.js version is 12.20.0 or higher to support ES modules. With these steps, your code will work seamlessly with node-fetch.

Download an Image

Step 1. Downloading an Image with Node-Fetch

To download an image with node-fetch, we use an asynchronous function, downloadImage(url, filename), that fetches the image from a URL. The fetch() call resolves to a Response object, and response.arrayBuffer() reads its body as an ArrayBuffer. This is then converted into a Node.js Buffer and saved to the local file system using fs.writeFileSync().

const response = await fetch(url);
const arrayBuffer = await response.arrayBuffer();  // Fetching the image data
const buffer = Buffer.from(arrayBuffer);  // Converting ArrayBuffer to Buffer
fs.writeFileSync(savePath, buffer);  // Saving the image buffer to disk 

Step 2. Handling HTTP Status Codes

When fetching an image or any resource, it's essential to handle different HTTP status codes that the server might return. We use response.ok to check if the fetch was successful. If the status code indicates an error (i.e., the response is not OK), an error is thrown with a message that includes the statusText of the response.

if (!response.ok) throw new Error(`Failed to fetch image: ${response.statusText}`);

response.ok: This property is true for status codes in the range 200–299, indicating success.
Error Handling: If the fetch fails (e.g., 404 Not Found, 500 Internal Server Error), an error is thrown with the message detailing what went wrong.
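Because statusText can be empty for some servers, it may help to include the numeric status code in the error as well. The sketch below is one way to do that; the helper name ensureOk and the retryable flag are assumptions for illustration, and plain objects stand in for real fetch responses.

```javascript
// Hypothetical helper: throw a descriptive error for any non-2xx response.
// `response` only needs the `ok`, `status`, and `statusText` fields of a fetch Response.
function ensureOk(response, url) {
  if (response.ok) return response;

  // statusText can be empty, so always include the numeric status code too.
  const reason = response.statusText ? ` ${response.statusText}` : '';
  const error = new Error(`Failed to fetch ${url}: ${response.status}${reason}`);
  error.status = response.status;
  // 429 and 5xx responses are often temporary; other 4xx responses usually are not.
  error.retryable = response.status === 429 || response.status >= 500;
  throw error;
}

try {
  ensureOk({ ok: false, status: 404, statusText: 'Not Found' }, 'https://example.com/missing.jpg');
} catch (error) {
  console.error(error.message); // Failed to fetch https://example.com/missing.jpg: 404 Not Found
}
```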

Step 3. Handling Errors and Exceptions

Proper error handling ensures that if something goes wrong during the download process, it is caught and logged appropriately. We can use a try...catch block to handle any errors that might arise while fetching or saving the image. The error message is logged to the console, providing valuable debugging information.

try {
  const response = await fetch(url);
  // Further code...
} catch (error) {
  console.error('Error downloading the image:', error.message);
} 

try...catch block: Used to capture errors during the fetch operation or while working with the file system.
error.message: The error message provides details about the specific problem, whether it's a network issue, invalid URL, or file system issue.

Step 4. Ensuring the Directory Exists

Before saving the image, we need to ensure that the directory where the image will be saved exists. If it doesn't exist, we create it using fs.mkdirSync() with the { recursive: true } option to allow nested directory creation. This is done before attempting to download and save the image.

if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
} 

fs.existsSync(): Checks if the directory already exists.
fs.mkdirSync(): Creates the directory if it doesn't exist.

Saving the Image to the File System

Once we’ve successfully fetched the image from a URL using node-fetch, the next step is saving it to the local file system. In our code, the saving process involves writing the image data to a file using fs.writeFileSync().

Let's explore this step further and address some potential challenges.

Step 1. Saving the Image with fs.writeFileSync()

The image is saved using fs.writeFileSync(), which writes the image buffer to the file system. The savePath variable is used to define the location and filename where the image will be stored.

fs.writeFileSync(savePath, buffer);  // Save the image buffer directly

fs.writeFileSync(): This function synchronously writes data (in this case, the image buffer) to the file at the specified path. It is synchronous, meaning the script will pause at this line until the image is fully saved before continuing.

Step 2. Handling File Paths

One potential challenge when saving files is ensuring that the file path is valid, especially when working with dynamic paths (like saving the image to a specific directory).

In our code, the saveDirectory variable defines the folder where the images will be stored, and filename names the image file. The path.join() function combines the two into the full path where the image will be saved.

const savePath = path.join(saveDirectory, filename);

path.join(): This method ensures the correct handling of file paths across different operating systems. It takes care of platform-specific differences in file separators (e.g., / for UNIX-based systems and \ for Windows).
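__dirname: This is the directory of the current module, so paths built from it are relative to the script’s location. It is defined automatically only in CommonJS; in an ES module it has to be derived from import.meta.url, as the final script does.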

Step 3. File System Permissions

A potential challenge when saving files is ensuring that the script has the appropriate file system permissions. If the user doesn’t have write access to the target directory, the script will throw an error when attempting to write the image.

File Permissions: On some systems, you may encounter "permission denied" errors if the script doesn't have permission to write to the specified directory. This is common when writing to restricted locations like system directories or directories with limited permissions.
Solution: Ensure that the target directory is writable, or choose a directory where the script has permission to write. Creating the target directory when it doesn't exist, as our code does, prevents "no such file or directory" errors, but it does not grant write access: fs.mkdirSync() itself fails if the parent directory is not writable.

if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
} 

fs.mkdirSync(): This method ensures that the directory exists before attempting to write to it. The { recursive: true } option allows the creation of nested directories if they don't exist, making it easier to manage complex directory structures.

Step 4. Challenges with File Naming

Another challenge is managing file naming, especially if you’re saving multiple images or if the URL doesn’t directly provide a meaningful filename (e.g., a random string or a generic name). The filename is provided explicitly when calling the downloadImage() function:

await downloadImage(imageUrl, "picture.jpg");

Naming Conflicts: If a file with the same name already exists, fs.writeFileSync() will overwrite it without warning. If you want to avoid overwriting existing files, you can add logic to check if the file already exists and generate a unique filename.
Dynamic Filenames: If you need more descriptive or dynamic filenames (e.g., based on the URL or the timestamp), you can extract the filename from the URL or use a timestamp to ensure uniqueness.
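As a minimal sketch of the dynamic-filename idea, the helper below derives a name from the URL's path and falls back to a timestamp when the URL carries no usable filename. The name filenameFromUrl, its fallbackExtension parameter, and the example URLs are assumptions for illustration.

```javascript
const path = require('path');

// Hypothetical helper: derive a filename from the URL's path, ignoring the query string.
// Falls back to a timestamp-based name when the last path segment has no extension.
function filenameFromUrl(url, fallbackExtension = 'jpg') {
  const { pathname } = new URL(url);
  // URL paths always use forward slashes, so use the POSIX flavour of basename.
  const base = path.posix.basename(pathname);
  if (base && path.posix.extname(base)) {
    return base;
  }
  return `${base || 'image'}_${Date.now()}.${fallbackExtension}`;
}

console.log(filenameFromUrl('https://example.com/photos/cat.png?w=800')); // cat.png
console.log(filenameFromUrl('https://example.com/photo-12345?w=800'));
```

The result can be passed straight to downloadImage(url, filenameFromUrl(url)); combine it with an existence check if overwriting must be avoided.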

Step 5. Final Run

Now let's run our Node-Fetch script:

import fetch from 'node-fetch';  // Use ESM import
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

// Get __dirname in ESM
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const saveDirectory = path.resolve(__dirname, 'images');

// Function to download image
async function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  try {
    // Fetch image data as an arrayBuffer with node-fetch
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Failed to fetch image: ${response.statusText}`);

    const arrayBuffer = await response.arrayBuffer();  // Use arrayBuffer() instead of buffer()
    const buffer = Buffer.from(arrayBuffer);  // Convert ArrayBuffer to Buffer
    fs.writeFileSync(savePath, buffer);  // Save the image buffer directly
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    console.error('Error downloading the image:', error.message);
  }
}

// Main function to start downloading the image
async function main() {
  // Check if the directory exists, if not, create it
  if (!fs.existsSync(saveDirectory)) {
    fs.mkdirSync(saveDirectory, { recursive: true });
  }

  // Download the image with a valid URL
  const imageUrl = "https://www.example.com/image.png"; // Replace with a valid image URL
  await downloadImage(imageUrl, "picture.jpg");
}

main();

The expected result is a downloaded image!

Summary

In this section, we wrote a simple script to download an image using node-fetch. The script handles different HTTP statuses by checking if the response is successful and throwing an error if it's not.

Additionally, it includes error handling for network issues or file system errors and ensures that the directory where the image will be saved exists before attempting to download it.

Saving the image to the file system involves using fs.writeFileSync() to write the image buffer to a file. Key considerations include handling file paths correctly with path.join() and ensuring the target directory exists using fs.mkdirSync().

Additionally, you must account for file system permissions and potential naming conflicts to ensure a smooth saving process.

Implementing Image Download with Request

Setting up Request

request is a widely-used HTTP client library in the Node.js ecosystem, designed for making HTTP requests to remote servers with a simple API. It supports a variety of HTTP methods, such as GET, POST, PUT, and DELETE, making it versatile for a range of web scraping, API interaction, and web automation tasks.

The library provides an intuitive and easy-to-use interface, which is why it has been one of the most popular HTTP request libraries in the Node.js community.

However, request was officially deprecated in February 2020. It still works for many applications, but for new code it's generally recommended to use an actively maintained alternative such as node-fetch, axios, or got.

Why Use Request?

Despite its deprecation, request is still favored in legacy applications and by developers who prioritize simplicity. It allows you to:

Send HTTP requests with minimal configuration.
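Configure timeouts, headers, and redirects through a single options object (retries are not built in and need your own logic or a helper package).
Work with a callback-based asynchronous API (there is no synchronous mode).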
Stream data (e.g., download files) without worrying about manual handling.

To use the request library in your Node.js project, follow these steps:

Step 1. Install the Request library

First, you need to install the request library using npm. To do this, run the following command in your terminal:

npm install request

This will download and install the library, making it available for use in your Node.js project.

Step 2. Import the Request Library

In your Node.js code, import the request module. If your project uses ES modules (for example, by setting "type": "module" in package.json or using the .mjs extension), you can import it like this:

import request from 'request';

Alternatively, if you are using CommonJS modules (the format Node.js has traditionally used for .js files), you can import it like this:

const request = require('request');

Note: request is published as a CommonJS package. The import form works through Node's CommonJS interoperability, so both styles load the same code. Either way, remember that it's no longer maintained.

Step 3. Check for Possible Alternatives

Although request works well for many applications, it is no longer maintained, so it will not receive bug or security fixes. You might want to explore alternatives that provide similar functionality, such as node-fetch, axios, or got. These libraries are actively maintained and offer promise-based APIs that work naturally with async/await.

If you are starting a new project, you may want to consider using one of these alternatives instead of request.

Example Setup

Here’s how your basic setup with request looks:

import request from 'request';
import fs from 'fs';  // Needed for fs.writeFileSync below
const imageUrl = 'https://example.com/image.jpg';
const savePath = './image.jpg';

request.get({ url: imageUrl, encoding: null }, (error, response, body) => {
  if (error) {
    console.error('Error downloading the image:', error.message);
    return;
  }
  if (response.statusCode !== 200) {
    console.error(`Failed to fetch image: ${response.statusCode}`);
    return;
  }
  fs.writeFileSync(savePath, body);
  console.log('Image saved!');
});

This example demonstrates how to make a simple GET request using request to fetch an image and save it to a local file.

Download an Image

In this section, we'll guide you through writing a Node.js script using the request library to download an image from a URL and save it to your local file system.

Step 1. Setting up the Project Structure

At the beginning of the script, we set up a directory called images to store the downloaded image. This is done by using the path and fs modules. If the directory doesn't already exist, it is created using fs.mkdirSync(). Here's how it's set up:

const saveDirectory = path.resolve(__dirname, 'images');

// Check if the directory exists, if not, create it
if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
}

The saveDirectory variable holds the path to the images directory, and the code ensures it's created before any image is downloaded.

Step 2. Writing the Function to Download the Image

The core function that downloads the image is downloadImage(url, filename). Here's a breakdown of the process:

Make a request to download the image:

Inside the downloadImage function, we use request.get() to send a GET request to the image URL. We specify { encoding: null } to receive the image data as a raw buffer. This ensures that binary data (like images) can be handled correctly:

request.get({ url, encoding: null }, (error, response, body) => {
  if (error) {
    console.error('Error downloading the image:', error.message);
    return;
  }

  if (response.statusCode !== 200) {
    console.error(`Failed to fetch image: ${response.statusCode}`);
    return;
  }

  // Save the image data to a file
  fs.writeFileSync(savePath, body);  // Save the image buffer directly
  console.log(`Image saved as ${filename}`);
});

Handling errors and HTTP status codes:
- If there is an error during the request (such as network issues), the error is logged using console.error().
- If the server responds with a status code other than 200 OK, the script logs an error and stops, so only successful responses are written to disk (checking the Content-Type header as well would confirm the body really is an image):

if (error) {
  console.error('Error downloading the image:', error.message);
  return;
}

if (response.statusCode !== 200) {
  console.error(`Failed to fetch image: ${response.statusCode}`);
  return;
}

Saving the image:

After a successful download, the image is saved to the specified path using fs.writeFileSync(). The body of the response (which is the image data) is written directly to a file. The filename is passed as an argument to the downloadImage function:

fs.writeFileSync(savePath, body);  // Save the image buffer directly
console.log(`Image saved as ${filename}`);

Step 3. Optimizing the Download Process

While the above code works well for smaller images, handling larger image files efficiently is important. Here are a couple of ways to optimize the process:

Stream the image data: For large files, it's better to stream the data instead of loading it all into memory. You can do this with request by using the .pipe() method to directly stream the image data to a file:

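request(url)
  .on('error', (error) => console.error('Request failed:', error.message))
  .pipe(fs.createWriteStream(savePath))
  .on('error', (error) => console.error('Write failed:', error.message))
  .on('close', () => console.log('Image saved successfully.'));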

This reduces memory usage, especially for large files, as it avoids loading the entire image into memory before saving.

Set a timeout: It’s a good practice to set a timeout to avoid hanging requests, especially if the server is slow or the network is unstable. You can add a timeout option to the request to ensure the download doesn't hang indefinitely:

request.get({ url, encoding: null, timeout: 10000 }, (error, response, body) => {
  if (error) {
    console.error('Error downloading the image:', error.message);
    return;
  }

  if (response.statusCode !== 200) {
    console.error(`Failed to fetch image: ${response.statusCode}`);
    return;
  }

  fs.writeFileSync(savePath, body);  // Save the image buffer directly
  console.log(`Image saved as ${filename}`);
});

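The timeout option aborts the request if the server has not started responding within 10 seconds (or your specified time), or if the connection sits idle for that long. It is not a cap on the total download time, so a slow but steady transfer can still take longer.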

Step 4. Conclusion

By following these steps, you now have a working Node.js script that can download an image from a given URL and save it to your local system. The script handles errors and checks the HTTP status codes, ensuring that only valid images are saved.

For larger images, you can optimize the process by streaming the data and setting timeouts to prevent long delays.

Saving the Image to the File System

In this section, we’ll walk through how to save the downloaded image to the file system. We’ll also cover how to handle potential issues, such as file overwriting and managing directories where images will be saved.

Step 1. Setting up the File Path

In both of our image download scripts (using node-fetch and request), we specify the directory where the image will be saved. In this case, we're saving the image to an images directory next to the script file (__dirname), regardless of the directory the script is launched from. Here's how we define the path:

const saveDirectory = path.resolve(__dirname, 'images');

This ensures that the images folder will be created in the same directory as the script. If it doesn’t already exist, we use fs.mkdirSync() to create the directory before downloading the image.

Step 2. Handling Directory Creation

Before saving an image, we check whether the images directory exists. If not, we create it using fs.mkdirSync(). This is done with the following code:

// Check if the directory exists, if not, create it
if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
}

The { recursive: true } option ensures that if the parent directories don’t exist, they will also be created. This is useful when dealing with nested directories.

Step 3. Managing File Overwriting

When saving the image to the file system, we must consider whether the file already exists in the target directory. In our current setup, if an image with the same filename exists, it will be overwritten by default.

If you want to avoid overwriting files, you could implement a check to see if the file already exists. Here's a simple way to do this:

let savePath = path.join(saveDirectory, filename);
if (fs.existsSync(savePath)) {
  console.log('File already exists, prepending timestamp to filename...');
  const timestamp = Date.now();
  filename = `${timestamp}-${filename}`;
  savePath = path.join(saveDirectory, filename);  // Recompute the path with the new name
}

This checks whether the file already exists. If it does, it prepends the current timestamp to the filename and recomputes savePath, so the download is written under a new name instead of replacing the existing file.
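Checking with fs.existsSync() and then writing leaves a small window in which another process (or a parallel download) could create the same file. One way to close that gap, sketched below, is to open the file with the 'wx' flag, which makes the write itself fail with EEXIST when the path is taken. The helper name writeWithoutOverwrite and the retry cap of 100 are assumptions made for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write `data` without ever overwriting an existing file.
function writeWithoutOverwrite(directory, filename, data) {
  let candidate = filename;
  for (let attempt = 0; attempt < 100; attempt++) {
    const target = path.join(directory, candidate);
    try {
      // 'wx' opens the file for writing but fails with EEXIST if it already exists
      fs.writeFileSync(target, data, { flag: 'wx' });
      return target;
    } catch (error) {
      if (error.code !== 'EEXIST') throw error; // permissions, disk space, etc.
      candidate = `${Date.now()}-${attempt}-${filename}`;
    }
  }
  throw new Error(`Could not find a free filename for ${filename}`);
}
```

The function returns the path it actually wrote to, so the caller can log the final filename.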

Step 4. Saving the Image

After determining the correct path and ensuring the directory exists, the image is saved using fs.writeFileSync() in both the request and node-fetch scripts. Here's the relevant code for saving the image with request:

fs.writeFileSync(savePath, body);  // Save the image buffer directly
console.log(`Image saved as ${filename}`);

This writes the image data to the specified file path. If the file already exists, it will be overwritten unless you’ve added additional logic to handle file naming conflicts.

Step 5. Tips for Managing File Storage

File extensions: When saving images, make sure the filename includes the correct file extension (e.g., .jpg, .png). You can extract the file extension from the URL or allow the user to specify it.

Example (from the URL):

const { pathname } = new URL(url);             // Parse the URL so any query string is ignored
const fileExtension = path.extname(pathname);  // Extract file extension from the URL path
const savePath = path.join(saveDirectory, `image${fileExtension}`);

Limit file size: If you are working with a large number of images, consider implementing a file size limit to avoid filling up the disk space quickly. When the server sends a Content-Length header, you can read it before downloading the body and skip images above a certain threshold.
Organize by date or category: For better organization, consider saving images in subdirectories based on the date or category. For instance, you could create a folder named by the current date (e.g., images/2024-11-10/) to save images by day.

const dateFolder = path.join(saveDirectory, new Date().toISOString().split('T')[0]);
if (!fs.existsSync(dateFolder)) {
  fs.mkdirSync(dateFolder, { recursive: true });
}
const savePath = path.join(dateFolder, filename);

This will create a folder for each day, making it easier to manage downloaded images over time.
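For the file size tip, a minimal sketch is shown below. It assumes you have access to the response headers as a plain object with lowercase keys (as Node's http and https modules provide). The helper name isWithinSizeLimit and the 5 MB cap are illustrative choices.

```javascript
// Hypothetical helper: decide from the response headers whether an image is
// small enough to download.
function isWithinSizeLimit(headers, maxBytes) {
  const declared = Number.parseInt(headers['content-length'], 10);
  if (Number.isNaN(declared)) {
    return true; // Size unknown: the server sent no usable Content-Length
  }
  return declared <= maxBytes;
}

const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // Assumed 5 MB cap, adjust as needed
```

Servers are not required to send Content-Length (for example, with chunked transfer encoding), so this sketch lets unknown sizes through. If you need a hard limit, you would also have to count bytes while the body arrives.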

Step 6. Final Run

Now we can run the Request script and see how it works in action:

import request from 'request';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

// Get __dirname in ESM
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const saveDirectory = path.resolve(__dirname, 'images');

// Function to download image
async function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  try {
    // request is callback-based, so wrap it in a Promise that await can actually wait on
    const body = await new Promise((resolve, reject) => {
      request.get({ url, encoding: null }, (error, response, data) => {
        if (error) return reject(error);
        if (response.statusCode !== 200) {
          return reject(new Error(`Failed to fetch image: ${response.statusCode}`));
        }
        resolve(data);
      });
    });
    // Save the image data to a file
    fs.writeFileSync(savePath, body);  // Save the image buffer directly
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    console.error('Error downloading the image:', error.message);
  }
}

// Main function to start downloading the image
async function main() {
  // Check if the directory exists, if not, create it
  if (!fs.existsSync(saveDirectory)) {
    fs.mkdirSync(saveDirectory, { recursive: true });
  }

  // Download the image with a valid URL
  const imageUrl = "https://www.example.com/image.png"; // Replace with a valid image URL
  await downloadImage(imageUrl, "picture.jpg");
}

main();

Step 7. Conclusion

Saving images to the file system is a crucial part of image downloading scripts. By setting up directories, managing file overwriting, and organizing files effectively, you can ensure that your images are saved properly and stored in an efficient manner.

Using Native HTTP/HTTPS Modules

Setting Up Native Modules

Node.js's built-in http and https modules allow us to handle HTTP and HTTPS requests natively without relying on third-party libraries.

This can be particularly advantageous for lightweight applications or environments where minimizing dependencies is important. In this section, we'll explain the benefits of using these native modules and guide you through the process of using them to download an image.

Benefits of Using NodeJS's Built-In HTTP/HTTPS Modules

No External Dependencies: By using Node.js's native http and https modules, you avoid adding additional dependencies to your project. This keeps your project lightweight and reduces the complexity of managing external libraries.
Low Overhead: Libraries such as axios, node-fetch, and request are themselves built on top of these modules in Node.js, so calling them directly removes a layer of abstraction. For a single image download the difference is rarely noticeable, but there is less code between your script and the network.
Simplicity: For simple use cases, such as downloading a single image, the native modules provide everything you need. There's no need to install or learn an external library like axios or request if the functionality you need is already built into Node.js.
Stability: Since http and https are part of Node.js itself, they are stable and well-maintained, with frequent updates alongside the Node.js runtime. You don't have to worry about external libraries becoming deprecated or unsupported.

Step 1. Setting Up the HTTP/HTTPS Modules Without External Libraries

Setting up the http and https modules is straightforward as they come bundled with Node.js. Here’s how you can begin using them:

Import the Modules

The http and https modules are core modules in Node.js, meaning they don’t require any installation. Simply import them at the top of your script:

import https from 'https';
import http from 'http'; 

Choose the Appropriate Client

You will typically want to decide between http and https based on the URL you're working with. In our example, we check the URL to determine whether to use the https or http module:

const client = url.startsWith('https') ? https : http; 
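The startsWith check works for well-formed lowercase URLs. A slightly stricter variant, sketched here as an assumption rather than the approach used in the rest of this guide, parses the URL first so that uppercase schemes are handled and unsupported protocols are rejected early. The name pickClient is hypothetical.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: pick the module that matches the URL's protocol.
function pickClient(imageUrl) {
  const { protocol } = new URL(imageUrl); // Throws a TypeError on a malformed URL
  if (protocol === 'https:') return https;
  if (protocol === 'http:') return http;
  throw new Error(`Unsupported protocol: ${protocol}`);
}
```

With this in place, const client = pickClient(url); can stand in for the one-line ternary.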

Making a Request to Download the Image

Once the correct client is selected, you can use the .get() method to send a request to the server and receive the image data. The native get method accepts a URL and a callback function that will be called with the server's response.

For instance:

client.get(url, (response) => {
  // Handle the response
}); 

Handling the Data

The response will be streamed, meaning it comes in chunks. As the image data is received, we collect the chunks and combine them into a single buffer once the download is complete. This is done using the data and end events:

const arrayBuffers = [];
response.on('data', chunk => {
  arrayBuffers.push(chunk); // Collect chunks of the image
});
response.on('end', () => {
  const buffer = Buffer.concat(arrayBuffers); // Combine all chunks into one buffer
}); 

Saving the Image

Finally, once the end event has fired and the chunks are combined, we write the result to the file system using fs.writeFileSync(). This call belongs inside the end handler, where buffer is defined:

fs.writeFileSync(savePath, buffer);

Error Handling

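Both the request object returned by client.get() and the response object emit error events. Connection failures such as DNS errors or refused connections surface on the request, so chain an .on('error') handler onto client.get() as well. It's also important to check the statusCode to ensure the request was successful. The snippet below shows the response-level handler: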

response.on('error', (error) => {
  console.error('Error downloading the image:', error.message);
}); 

By using the native http and https modules, we are able to download an image directly without relying on third-party libraries. This keeps the code simple and efficient, ensuring we’re working with a minimal setup.

Download an Image

In this section, we’ll walk through how to write a script that downloads an image using Node.js’s built-in http and https modules. These modules allow you to make HTTP/HTTPS requests directly without relying on external libraries. We'll also provide tips for optimizing the download process and handling larger image files efficiently.

Step 1: Import Required Modules

To get started, we first need to import the necessary modules. Since we are using ES modules, we import https, http, fs, and path like this:

import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import https from 'https';
import http from 'http'; 

https and http: These modules are used to make HTTP and HTTPS requests, respectively.
fs and path: These are used to handle file system operations and manage file paths.
fileURLToPath: A helper to get the __dirname equivalent in ES modules.

Step 2: Define the Save Directory

Next, we define the directory where we want to store the downloaded image. We also check if this directory exists and create it if necessary.

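// __dirname is not defined in ES modules, so rebuild it from import.meta.url
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const saveDirectory = path.resolve(__dirname, 'images');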

// Ensure the save directory exists
if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
} 

This ensures that we have a dedicated folder to store our downloaded images, and the mkdirSync function ensures the folder is created if it doesn't already exist.

Step 3: Writing the Download Function

Now, we write the function that will actually handle the download. The downloadImage function checks the protocol (either http or https) of the image URL, then uses the appropriate module to make the request.

function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  const client = url.startsWith('https') ? https : http;  // Choose https or http based on URL

  // Fetch image data with HTTP/HTTPS
  client.get(url, (response) => {
    if (response.statusCode !== 200) {
      console.error(`Failed to fetch image: ${response.statusCode}`);
      response.resume(); // Discard the unread body so the connection is released
      return;
    }

    const arrayBuffers = [];
    response.on('data', chunk => {
      arrayBuffers.push(chunk); // Collect the chunks of the image
    });

    response.on('end', () => {
      const buffer = Buffer.concat(arrayBuffers); // Combine the chunks into a single buffer
      fs.writeFileSync(savePath, buffer);  // Save the image buffer directly
      console.log(`Image saved as ${filename}`);
    });

    response.on('error', (error) => {
      console.error('Error downloading the image:', error.message);
    });
  }).on('error', (error) => {
    console.error('Error with the request:', error.message);
  });
} 

Protocol Check: We determine whether the image URL uses HTTP or HTTPS and select the appropriate module (http or https).
Error Handling: If the request fails (e.g., non-200 status code or network errors), appropriate error messages are logged.
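Chunked Reads: The image arrives in chunks through the .on('data') event. Note that this version collects every chunk in an array and concatenates them at the end, so peak memory use is still roughly the size of the image. That is fine for typical images; for very large files, piping the response into fs.createWriteStream() keeps memory use low.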

Step 4: Optimizing the Download Process

While the built-in HTTP/HTTPS modules work well for most use cases, they can be further optimized for handling large files:

Streaming: By default, both the http and https modules return the response as a stream, so the image data is received in chunks. This only saves memory if the chunks are written out as they arrive, for example by piping the response into fs.createWriteStream(). Collecting them in an array, as the function above does, still holds the whole image in memory.
Error Handling: To prevent crashes or unexpected behavior, handle both request and response errors with their on('error') events. These catch network issues; file system problems need their own try...catch around the write.
Buffer Concatenation: The chunks of the image are stored in an array and concatenated into a single buffer once the download is complete. This ensures the complete image is written in one step, at the cost of buffering it in memory first.
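To illustrate the streaming option, here is a minimal sketch that pipes the response straight to disk with stream.pipeline() and returns a Promise. The function name streamImageToFile is an assumption, and removing the partial file on failure is one possible design choice rather than a requirement.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');
const { pipeline } = require('stream');

// Sketch: stream the response to disk instead of buffering it in memory.
function streamImageToFile(url, savePath) {
  const client = url.startsWith('https') ? https : http;
  return new Promise((resolve, reject) => {
    client.get(url, (response) => {
      if (response.statusCode !== 200) {
        response.resume(); // Discard the body so the socket is released
        reject(new Error(`Failed to fetch image: ${response.statusCode}`));
        return;
      }
      // pipeline() forwards errors from either stream and cleans both up
      pipeline(response, fs.createWriteStream(savePath), (error) => {
        if (error) {
          // Remove the partial file, then report the original error
          fs.rm(savePath, { force: true }, () => reject(error));
        } else {
          resolve(savePath);
        }
      });
    }).on('error', reject);
  });
}
```

Because the function returns a Promise, it can be awaited, for example await streamImageToFile(imageUrl, path.join(saveDirectory, 'picture.jpg')).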

Step 5: Wrapping It Up

Here’s the complete downloadImage function:

function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  const client = url.startsWith('https') ? https : http;  // Choose https or http based on URL

  client.get(url, (response) => {
    if (response.statusCode !== 200) {
      console.error(`Failed to fetch image: ${response.statusCode}`);
      response.resume(); // Discard the unread body so the connection is released
      return;
    }

    const arrayBuffers = [];
    response.on('data', chunk => {
      arrayBuffers.push(chunk);
    });

    response.on('end', () => {
      const buffer = Buffer.concat(arrayBuffers);
      fs.writeFileSync(savePath, buffer);
      console.log(`Image saved as ${filename}`);
    });

    response.on('error', (error) => {
      console.error('Error downloading the image:', error.message);
    });
  }).on('error', (error) => {
    console.error('Error with the request:', error.message);
  });
}

The function uses Node's native HTTP/HTTPS modules to download and save an image efficiently. You can call this function with the image URL and desired filename, and it will save the image to the local file system.

Step 6: Conclusion and Tips for Handling Large Files

Memory Management: The response arrives as a stream, but this function buffers all chunks before writing, so memory use grows with the image size. For large images, write the chunks to disk as they arrive instead of collecting them.
Error Handling: Always handle potential errors during the HTTP request and the file writing process to prevent crashes.
Directory Management: Ensure the destination directory exists before attempting to save the file. This can prevent errors related to missing directories.
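One more practical point: unlike axios or node-fetch, the native modules do not follow redirects on their own, so a 301 or 302 response ends up in the "Failed to fetch image" branch. The sketch below shows one way to follow the Location header manually. The function name getFollowingRedirects and the default limit of 5 redirects are assumptions chosen for illustration.

```javascript
const http = require('http');
const https = require('https');

// Sketch: resolve with the final response after following 3xx redirects.
function getFollowingRedirects(url, maxRedirects = 5) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https') ? https : http;
    client.get(url, (response) => {
      const { statusCode, headers } = response;
      const isRedirect = statusCode >= 300 && statusCode < 400 && headers.location;
      if (!isRedirect) {
        resolve(response); // The caller checks statusCode and consumes the body
        return;
      }
      response.resume(); // Discard the redirect body
      if (maxRedirects === 0) {
        reject(new Error('Too many redirects'));
        return;
      }
      // Location may be relative, so resolve it against the current URL
      const nextUrl = new URL(headers.location, url).href;
      getFollowingRedirects(nextUrl, maxRedirects - 1).then(resolve, reject);
    }).on('error', reject);
  });
}
```

The resolved response is the same kind of stream that client.get() passes to its callback, so the data and end handlers from downloadImage can be attached to it unchanged.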

Saving the Image to the File System

Once the image is downloaded using the native HTTP/HTTPS modules, the next step is to save it to the file system. This section will guide you through saving the image efficiently and managing file storage. We'll also cover handling common issues like file overwriting and ensuring the correct directory structure is in place.

Step 1: Defining the Save Directory

Before saving the image, we need to ensure that the target directory exists. This prevents issues where the image cannot be saved because the directory doesn’t exist.

In the provided code, we use the path module to define the directory, and the fs.existsSync method to check if it exists. If it doesn't, we create the directory using fs.mkdirSync.

const saveDirectory = path.resolve(__dirname, 'images');

// Ensure the save directory exists
if (!fs.existsSync(saveDirectory)) {
  fs.mkdirSync(saveDirectory, { recursive: true });
} 

path.resolve(__dirname, 'images'): This resolves the path to the images directory, ensuring it's relative to the current file's location.
fs.existsSync(saveDirectory): This checks if the images folder already exists.
fs.mkdirSync(saveDirectory, { recursive: true }): If the directory doesn’t exist, this creates it (and any intermediate directories, if necessary).

This ensures that your code won’t encounter errors when trying to save the image, even if the directory was not pre-created.

Step 2: Handling File Overwriting

When saving files, there is always the possibility of overwriting an existing file if a file with the same name already exists in the target directory. In our code, the image is saved directly using fs.writeFileSync. By default, this will overwrite any existing file with the same name.

fs.writeFileSync(savePath, buffer);  // Save the image buffer directly 

However, overwriting files can be problematic in certain scenarios (e.g., when you want to keep all downloaded images). Here are a few strategies to prevent overwriting and manage file versions:

Check if the file exists: Before writing the file, check if it already exists and decide how to handle it.

For example, you could append a timestamp or an incremental number to the filename to ensure uniqueness:

let savePath = path.join(saveDirectory, filename);

// Check if the file exists and modify the filename to avoid overwriting
let counter = 1;
while (fs.existsSync(savePath)) {
  const extname = path.extname(filename);
  const basename = path.basename(filename, extname);
  savePath = path.join(saveDirectory, `${basename}-${counter++}${extname}`);
}
fs.writeFileSync(savePath, buffer);

fs.existsSync(savePath): Checks if the file already exists.
Incremental Filenames: If the file exists, the filename is modified by appending a counter (e.g., image-1.jpg, image-2.jpg), ensuring no overwriting occurs.

This method allows you to manage file versions without losing any downloaded images.

Step 3: Saving the Image

After making sure the directory exists and managing file overwriting, you can safely save the image data. The image data is written to the file system using the fs.writeFileSync method, as shown in the code:

fs.writeFileSync(savePath, buffer);  // Save the image buffer directly 

Here, buffer is the image data that has been fetched and concatenated from the streamed chunks, and savePath is the final path to save the image.

Step 4: Handling Errors in File Saving

It's important to handle potential errors that may arise when saving a file, such as file system permission issues or disk space problems. If there is an error writing the file, you can catch it using a try...catch block:

try {
  fs.writeFileSync(savePath, buffer);
  console.log(`Image saved as ${filename}`);
} catch (error) {
  console.error('Error saving the image:', error.message);
} 

The catch block will log any errors that occur during the file-saving process, allowing you to debug the issue and handle it gracefully.

Step 5: Conclusion

Here’s the final summary of how the image is saved:

Directory Management: We ensure that the target directory exists and create it if necessary.
File Overwriting: By default, files are overwritten. However, we can modify the filename to prevent overwriting and manage multiple versions of the same image.
Error Handling: We catch errors during the file-saving process to ensure the script doesn’t crash unexpectedly.

Step 6: Final Run

Finally, let's run the complete script. This version uses CommonJS require(), where __dirname is available without the fileURLToPath workaround:

const fs = require('fs');
const path = require('path');
const https = require('https');
const http = require('http');

const saveDirectory = path.resolve(__dirname, 'images');

// Function to download image using native HTTP/HTTPS modules
function downloadImage(url, filename) {
  const savePath = path.join(saveDirectory, filename);
  const client = url.startsWith('https') ? https : http;  // Choose https or http based on URL

  // Fetch image data with HTTP/HTTPS
  client.get(url, (response) => {
    if (response.statusCode !== 200) {
      console.error(`Failed to fetch image: ${response.statusCode}`);
      response.resume(); // Discard the unread body so the connection is released
      return;
    }

    const arrayBuffers = [];
    response.on('data', chunk => {
      arrayBuffers.push(chunk); // Collect the chunks of the image
    });

    response.on('end', () => {
      const buffer = Buffer.concat(arrayBuffers); // Combine the chunks into a single buffer
      fs.writeFileSync(savePath, buffer);  // Save the image buffer directly
      console.log(`Image saved as ${filename}`);
    });

    response.on('error', (error) => {
      console.error('Error downloading the image:', error.message);
    });
  }).on('error', (error) => {
    console.error('Error with the request:', error.message);
  });
}

// Main function to start downloading the image
async function main() {
  // Check if the directory exists, if not, create it
  if (!fs.existsSync(saveDirectory)) {
    fs.mkdirSync(saveDirectory, { recursive: true });
  }

  // Download the image with a valid URL
  const imageUrl = "https://www.example.com/image.png"; // Replace with a valid image URL
  downloadImage(imageUrl, "picture.jpg");
}

main();

If the request succeeds, the image is saved to the images directory using only the native HTTP/HTTPS modules.

Handling Errors and Retries in Downloading Images

Common Issues in Image Downloading

Downloading images can run into various issues due to network inconsistencies, server limitations, or file handling errors. Some common issues include:

Network Timeouts: Slow or unstable connections may cause requests to time out.
Server Errors: Servers might return a 500 error for various reasons, or rate-limit requests with a 429 status.
File Conflicts: When saving images, conflicts can arise if files with the same name already exist.
Data Corruption: Incomplete downloads or interruptions can result in corrupted image files.

By anticipating these issues, we can make our downloading process more reliable.
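For the data corruption case, one common safeguard is to write to a temporary file and rename it only once the write has finished, so an interrupted run never leaves a truncated image under the final name. The sketch below assumes the image is already in a Buffer. The helper name saveAtomically and the ".part" suffix are illustrative.

```javascript
const fs = require('fs');

// Hypothetical helper: write to a ".part" file, then rename it into place.
function saveAtomically(savePath, buffer) {
  const tempPath = `${savePath}.part`;
  try {
    fs.writeFileSync(tempPath, buffer);
    fs.renameSync(tempPath, savePath); // Swap the finished file into place
  } catch (error) {
    fs.rmSync(tempPath, { force: true }); // Remove the partial file if one was left behind
    throw error;
  }
}
```

This protects against interrupted writes only. To detect a download that was cut short, you can additionally compare the number of bytes received with the Content-Length header when the server provides one.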

Implementing Retry Logic

Retries help recover from intermittent issues by reattempting the request after a failure. A common approach includes:

Defining Retry Limits: Limit retries to avoid endless requests (e.g., 3 attempts).
Adding Delays: Implement a delay between retries to avoid overwhelming the server, using exponential backoff (doubling the delay after each attempt) for improved efficiency.

Example (assumes downloadImage() rejects on failure instead of only logging the error):

async function downloadImageWithRetry(url, filename, retries = 3, attempt = 1) {
    try {
        await downloadImage(url, filename);
    } catch (error) {
        if (attempt > retries) {
            console.error(`Failed to download after ${attempt} attempts: ${error.message}`);
            return;
        }
        const delay = 1000 * 2 ** (attempt - 1); // exponential backoff: 1s, 2s, 4s, ...
        console.log(`Retrying in ${delay / 1000} seconds...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        return downloadImageWithRetry(url, filename, retries, attempt + 1);
    }
}

Handling Timeouts and Server Errors

Timeouts occur when the server takes too long to respond. Many libraries (Axios, for instance) support timeout settings. Handling server errors involves checking HTTP status codes:

Client-side Timeout Handling: Set a maximum wait time to avoid hanging requests.
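Response Status Handling: Check response codes to determine the cause of the error. Server-side failures (500, 503) and rate limiting (429) are usually worth retrying, while client errors such as 404 generally are not.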

Adding timeout and error handling within retry logic builds resilience into the download process.
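A minimal sketch of that decision is shown below. It assumes an Axios-style error object, where error.response.status holds the HTTP status and error.code holds a network-level code such as ECONNABORTED (the code Axios uses for timeouts by default). The helper name isRetryable and the exact lists of statuses and codes are our own choices for illustration.

```javascript
// Hypothetical helper: decide whether a failed request is worth retrying.
// Assumes an Axios-style error: `error.response.status` for HTTP errors,
// `error.code` for network-level failures such as timeouts.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);
const RETRYABLE_CODES = new Set(['ECONNABORTED', 'ETIMEDOUT', 'ECONNRESET']);

function isRetryable(error) {
  if (error.response) {
    // The server answered: retry only rate limits and server-side failures
    return RETRYABLE_STATUS.has(error.response.status);
  }
  // No response at all: retry timeouts and dropped connections
  return RETRYABLE_CODES.has(error.code);
}
```

Inside a retry function, the catch block could check isRetryable(error) and give up immediately on errors such as a 404.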

Advanced Techniques

Downloading Multiple Images

To download several images at once, we can use Promise.all, which waits for multiple asynchronous tasks that run concurrently. This speeds up the process compared to downloading each image sequentially.

Each call to downloadImage starts its request immediately and returns a promise. Promise.all then resolves once every download has finished, or rejects as soon as any one of them fails. This is especially useful when downloading images from a high-capacity server.

Example workflow for parallel downloads:

Define an array of image URLs.
Use Promise.all to execute downloadImage (or similar) for each URL concurrently.

For example:

async function downloadMultipleImages(imageUrls) {
    const downloadPromises = imageUrls.map((url, index) => {
        const filename = `image_${index + 1}.jpg`;
        return downloadImage(url, filename); // Each download returns a promise
    });
    await Promise.all(downloadPromises);
    console.log('All images downloaded successfully.');
} 

Here, each URL in imageUrls triggers a call to downloadImage, and Promise.all waits for all downloads to finish before logging completion.
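Because Promise.all rejects as soon as a single download fails, one bad URL can hide the outcome of all the others. As a sketch of an alternative, the version below uses Promise.allSettled to wait for every download and then report successes and failures separately. The function name downloadAllSettled and the downloadFn parameter are placeholders; downloadFn stands for any promise-returning download function.

```javascript
// Sketch: download everything, then report successes and failures separately.
// `downloadFn` is a placeholder for any function that returns a promise,
// such as a downloadImage function that rejects on failure.
async function downloadAllSettled(imageUrls, downloadFn) {
  const results = await Promise.allSettled(
    imageUrls.map((url, index) => downloadFn(url, `image_${index + 1}.jpg`))
  );
  const failed = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      const reason = result.reason instanceof Error
        ? result.reason.message
        : String(result.reason);
      failed.push({ url: imageUrls[index], reason });
    }
  });
  return { succeeded: results.length - failed.length, failed };
}
```

The returned failed array can be fed into a retry step instead of aborting the whole run.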

Throttling Downloads for Rate Limits

When dealing with rate limits, performing too many requests in a short period can lead to server blocks. Throttling allows us to control the rate of downloads, avoiding issues with restricted servers.

One method is to use a small number of simultaneous downloads (e.g., 5 at a time), finishing each batch before moving to the next.

Example for throttling with a custom limit:

async function throttledDownload(images, limit = 5) {
    for (let i = 0; i < images.length; i += limit) {
        const batch = images.slice(i, i + limit);
        const downloadBatch = batch.map((url, index) => {
            const filename = `image_${i + index + 1}.jpg`;
            return downloadImage(url, filename);
        });
        await Promise.all(downloadBatch); // Wait for the current batch to complete
        console.log(`Batch ${Math.floor(i / limit) + 1} completed`);
    }
} 

This approach slices the images array into batches, downloading only a specified number (e.g., 5) at a time, improving control over download speed and server impact.
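Batching has one drawback: each batch waits for its slowest download before the next batch starts. A possible alternative, sketched below, is a small concurrency pool that starts a new download as soon as any running one finishes. The name runWithLimit and the worker parameter are hypothetical; worker stands for an async function such as downloadImage.

```javascript
// Sketch of a simple concurrency pool: at most `limit` tasks run at once,
// and a new one starts as soon as any running task finishes.
// `worker` is a placeholder for an async function such as downloadImage.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // Claim the next item before awaiting anything
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners that pull items from the shared counter
  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}
```

For example, runWithLimit(images, 5, (url, i) => downloadImage(url, `image_${i + 1}.jpg`)) keeps at most five downloads in flight. If worker rejects, the returned promise rejects as well, so pair it with a retrying or error-catching download function.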

Managing Large Image Files

When handling large image files in Node.js, managing memory efficiently and preventing timeouts are essential for a smooth download process.

Libraries like Axios offer convenient methods for downloading large files while keeping memory usage low. Node-fetch is also effective but may require extra configuration to manage large files as efficiently as Axios.

Best Practices for Downloading Large Images

Stream the Data: Instead of loading an entire image into memory, stream data directly to the file system. This approach keeps memory usage manageable by processing data in smaller chunks. With Axios, setting responseType to stream allows piping image data directly to disk.
Set Timeouts and Retries: Large downloads are prone to timeouts. Configure Axios with a reasonable timeout, like 30 seconds, to handle server delays. For reliability, consider retrying the download on failure, especially when dealing with large files or network interruptions.
Use Write Streams for Storage: Writing files using fs.createWriteStream lets you store large files without loading everything into memory, preserving system resources. Piping the streamed data directly to disk minimizes memory impact.

Here’s a sample using Axios to download and save a large image:

import fs from 'fs';
import axios from 'axios';

async function downloadLargeImage(url, filename) {
    const response = await axios({
        url,
        method: 'GET',
        responseType: 'stream',
        timeout: 30000, // Set a 30-second timeout
    });

    // Open the file only after the request succeeds, so a failed request leaves no empty file
    const writer = fs.createWriteStream(filename);
    response.data.pipe(writer);

    return new Promise((resolve, reject) => {
        writer.on('finish', resolve);
        writer.on('error', reject);
        response.data.on('error', reject); // Network errors in the middle of the download
    });
}

downloadLargeImage('https://example.com/large-image.jpg', 'large-image.jpg')
    .then(() => console.log('Download complete'))
    .catch(error => console.error('Error downloading file:', error.message)); 

This example configures Axios to stream image data directly to a write stream, making it ideal for handling large files. Node-fetch can also work well by handling chunks, though more setup might be required for large files.
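For the Node-fetch route, or any other source that exposes a Node.js readable stream, a small helper built on pipeline from stream/promises can handle the piping. This is a sketch: the name saveStreamToFile is hypothetical, and passing response.body from Node-fetch as the readable is an assumption worth checking against the Node-fetch version in use.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Sketch: write any Node.js readable stream to disk chunk by chunk.
// With node-fetch, `readable` would be `response.body` (an assumption to
// verify against the node-fetch version in use).
async function saveStreamToFile(readable, filename) {
  // pipeline() rejects if either stream fails and cleans up both streams
  await pipeline(readable, fs.createWriteStream(filename));
}
```

Compared with calling pipe directly, pipeline also propagates errors from the source stream, so an interrupted download rejects the promise instead of hanging.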

Optimizing the Download Process

Efficiently managing resources and optimizing performance can make the image download process faster and more reliable. Here’s how to improve download efficiency in Node.js, covering strategies like streaming, caching, and file management.

Key Techniques for Optimizing Downloads

Performance Considerations: Minimize the strain on memory and processing power by adjusting download methods to avoid loading entire files at once. Streamlined processes are especially important when handling multiple or large images.
Streaming vs. Buffering: For large files, streaming is typically more memory-efficient than buffering, as it allows data to flow directly to disk without storing it all in memory. This approach is achieved easily with libraries like Axios by setting responseType to stream, which saves resources when downloading large images.
Efficient File Management and Disk Storage: Use write streams to save images directly to disk, preventing memory overload from large buffers. Proper directory management also helps to avoid issues with duplicate files and to organize downloads neatly. Setting up automated processes to clean or archive files once used can further optimize disk space.
Compression and Caching: When working with image-heavy applications, use compressed images to reduce bandwidth usage and speed up download times. Implement caching strategies to prevent downloading the same image multiple times, especially when working with API-based or frequently accessed images.
Retry Mechanisms: Network issues or server downtime can interrupt downloads, so use retries to improve reliability. Implement exponential backoff for retry attempts to avoid server overload and optimize response time.

Here’s a streamlined example of downloading an image using Axios with streaming, caching, and error handling:

import fs from 'fs';
import axios from 'axios';

const imageCache = new Set();  // In-memory cache to avoid duplicate downloads

async function downloadImage(url, filename) {
    if (imageCache.has(url)) {
        console.log(`Image from ${url} is already downloaded.`);
        return;
    }

    try {
        const response = await axios({
            url,
            method: 'GET',
            responseType: 'stream',
            timeout: 30000, // Set a 30-second timeout
        });

        // Open the file only once the request has succeeded
        const writer = fs.createWriteStream(filename);
        response.data.pipe(writer);

        await new Promise((resolve, reject) => {
            writer.on('finish', resolve);
            writer.on('error', reject);
            response.data.on('error', reject);
        });
        imageCache.add(url); // Add URL to cache only after the file is fully written
        console.log(`Image saved as ${filename}`);
    } catch (error) {
        console.error(`Error downloading ${url}:`, error.message);
    }
}

downloadImage('https://example.com/image.jpg', 'image.jpg'); 

This setup streams data to a file, skips URLs that were already downloaded during the current run, and sets a timeout for server delays. Such practices help keep image downloads fast and resource-efficient without straining the system or network.

Security Considerations

When downloading images, security is crucial to protect both your application and users from potential threats. Below are best practices for handling untrusted URLs, validating image data, and ensuring secure downloads.

Handling Untrusted URLs: Always validate URLs before attempting downloads, especially when URLs come from user input or third-party sources. Use regular expressions or URL parsers to confirm the URL format and prevent malicious inputs. Limit downloads to only trusted domains whenever possible to reduce exposure to potentially harmful content.

Validating Image Data: Even if a URL points to an image, the data returned could still contain harmful content. Use libraries to verify that the downloaded data is a legitimate image format (e.g., JPEG, PNG) by checking headers or file signatures. Additionally, validate the size and dimensions of the image to avoid loading overly large or unexpected files.

Using HTTPS for Secure Downloads: Whenever possible, prioritize HTTPS URLs to ensure secure, encrypted data transfer. HTTPS protects against man-in-the-middle attacks by encrypting the download, making it harder for third parties to intercept or modify data during transfer. Avoid downloading images over unsecured HTTP connections unless absolutely necessary.
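Limiting downloads to trusted domains can be sketched with the built-in URL parser and a small allowlist. The hosts listed in TRUSTED_HOSTS and the helper name isTrustedImageUrl are placeholders for illustration; a real application would supply its own list.

```javascript
// Hypothetical allowlist; replace with the hosts your application trusts.
const TRUSTED_HOSTS = new Set(['images.unsplash.com', 'example.com']);

function isTrustedImageUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl); // Throws on malformed input
  } catch {
    return false;
  }
  // Compare the exact hostname so "example.com.evil.net" does not slip through
  return parsed.protocol === 'https:' && TRUSTED_HOSTS.has(parsed.hostname);
}
```

Comparing the parsed hostname, rather than searching the raw string for a domain name, avoids being fooled by lookalike URLs.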

Here’s an example setup in Node.js that includes URL validation, image type validation, and a security-first approach with HTTPS:

import axios from 'axios';
import fs from 'fs';
import path from 'path';

async function downloadImage(url, filename) {
    // Validate URL and ensure it uses HTTPS
    try {
        const parsedUrl = new URL(url);
        if (parsedUrl.protocol !== 'https:') {
            throw new Error('Only HTTPS URLs are allowed for secure downloads.');
        }
    } catch (error) {
        console.error('Invalid URL:', error.message);
        return;
    }

    try {
        const response = await axios({
            url,
            method: 'GET',
            responseType: 'stream',
            timeout: 30000,
            validateStatus: (status) => status === 200,
        });

        // Confirm image content type
        const contentType = response.headers['content-type'];
        if (!contentType || !contentType.startsWith('image/')) {
            response.data.destroy(); // Stop the transfer; this response will not be saved
            throw new Error('URL did not return a valid image.');
        }

        // Create the write stream only after validation, so rejected responses leave no empty file
        const writer = fs.createWriteStream(filename);
        response.data.pipe(writer);
        await new Promise((resolve, reject) => {
            writer.on('finish', resolve);
            writer.on('error', reject);
        });
        console.log(`Image securely saved as ${filename}`);
    } catch (error) {
        console.error('Error during secure image download:', error.message);
    }
}

downloadImage('https://example.com/image.jpg', 'image.jpg');

This example accepts only HTTPS URLs, checks that the response declares an image content type, and then saves the file. Keep in mind that Axios follows redirects by default, so a stricter setup could also set maxRedirects: 0 or re-check the final URL. Such measures help protect the application and its users from untrusted sources.
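The Content-Type header is supplied by the server, so it can be wrong or misleading. A complementary check, sketched below, inspects the first bytes of the downloaded data for well-known file signatures. The helper name detectImageType is hypothetical, and the list covers only JPEG, PNG, and GIF; other formats would need their own entries.

```javascript
// Sketch: identify an image by its leading "magic" bytes.
const SIGNATURES = [
  { type: 'jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // ASCII "GIF8"
];

// Returns 'jpeg', 'png', 'gif', or null when no signature matches.
function detectImageType(buffer) {
  for (const { type, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return type;
    }
  }
  return null;
}
```

Only the first few bytes are needed, so the check can run on the first chunk of a stream or on a small read from the saved file.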


Case Study - Downloading Images from Unsplash

In our case study, we'll walk through the process of building a scraper to download images from Unsplash.

Starting with a basic scraping and downloading setup, we'll explore ways to improve efficiency and performance, handling potential issues with network requests, file storage, and duplicate downloads.

Step 1: Setting Up the Basic Scraper

Our goal is to download images based on a specific search term from Unsplash, avoiding duplicates and storing the images in a structured directory.

We'll use Axios for HTTP requests, Cheerio for HTML parsing, and Node's fs and path modules for file management.

Here’s the initial setup:

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');
const path = require('path');

// Define search term and target number of images
const searchTerm = 'nature';
const numberOfImages = 20;
const saveDirectory = path.resolve(__dirname, 'images');
const downloadedUrls = new Set(); // Track downloaded images to avoid duplicates

This configuration includes:

A search term ('nature') to find relevant images.
A limit on the number of images to download (20 in this case).
A Set (downloadedUrls) to keep track of previously downloaded images, ensuring each image is unique.

Step 2: Writing the Initial Scraper

Our initial scraper fetches the search results page and parses its HTML to find relevant <img> tags. Using Cheerio, we collect image URLs and aim for the highest-quality version by taking the last entry in each srcset, which is usually the largest.

async function scrapeAndDownloadImages() {
  try {
    const response = await axios.get(`https://unsplash.com/s/photos/${searchTerm}`);
    const html = response.data;
    const $ = cheerio.load(html);

    const imageUrls = [];

    $('img[itemprop="thumbnailUrl"]').each((i, element) => {
      if (imageUrls.length >= numberOfImages) return false;

      const srcSet = $(element).attr('srcset');
      const dataSrc = $(element).attr('data-src');

      if (srcSet) {
        const urls = srcSet.split(',').map(item => item.trim().split(' ')[0]);
        const largestImageUrl = urls[urls.length - 1];
        if (largestImageUrl && !downloadedUrls.has(largestImageUrl)) {
          imageUrls.push(largestImageUrl);
          downloadedUrls.add(largestImageUrl);
        }
      } else if (dataSrc && !downloadedUrls.has(dataSrc)) {
        imageUrls.push(dataSrc);
        downloadedUrls.add(dataSrc);
      }
    });

    console.log(`Image URLs:`, imageUrls);

    await Promise.all(
      imageUrls.map((url, index) => downloadImage(url, `image${index + 1}.jpg`))
    );
    console.log(`Images downloaded successfully!`);
  } catch (error) {
    console.error('Error during scraping:', error.message);
  }
}

This function:

Fetches the HTML of the search page.
Parses <img> elements for image URLs, checking both srcset and data-src attributes to find the highest resolution.
Avoids duplicates using downloadedUrls.
Downloads each unique image asynchronously using Promise.all, which improves speed by handling all downloads concurrently.
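Taking the last srcset entry works only when the entries are listed in ascending size. A more defensive sketch compares the width descriptors instead. It assumes entries of the form "<url> <number>w" and URLs without commas; the helper name pickLargestFromSrcset is hypothetical.

```javascript
// Sketch: choose the srcset candidate with the largest width descriptor.
// Assumes entries look like "<url> <number>w" and that URLs contain no commas.
function pickLargestFromSrcset(srcset) {
  let best = null;
  for (const entry of srcset.split(',')) {
    const [url, descriptor] = entry.trim().split(/\s+/);
    if (!url) continue; // Skip empty entries
    // Entries without a width descriptor are treated as width 0
    const width = descriptor && descriptor.endsWith('w') ? parseInt(descriptor, 10) : 0;
    if (best === null || width > best.width) {
      best = { url, width };
    }
  }
  return best ? best.url : null;
}
```

In the scraper, this could replace the urls[urls.length - 1] lookup when the order of srcset entries is not guaranteed.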

Step 3: Improving and Optimizing Performance

While the initial scraper is functional, it can be improved in several ways:

Adding Delays between requests to avoid being flagged by the server.
Implementing Retry Logic for failed downloads.
Throttling Concurrent Downloads to avoid network congestion.

Optimization 1: Retry Logic with Delay

To handle network issues, we add retry logic with a linearly increasing delay (1, 2, then 3 seconds), allowing the script to reattempt downloads after failures while backing off a little more each time.

const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function downloadImageWithRetry(url, filename, retries = 3) {
  const savePath = path.join(saveDirectory, filename);
  try {
    const response = await axios.get(url, { responseType: 'arraybuffer' });
    fs.writeFileSync(savePath, response.data);
    console.log(`Image saved as ${filename}`);
  } catch (error) {
    if (retries > 0) {
      const retryDelay = (4 - retries) * 1000; // Increase delay on each retry
      console.log(`Retrying ${filename} in ${retryDelay / 1000} seconds...`);
      await delay(retryDelay);
      return downloadImageWithRetry(url, filename, retries - 1);
    } else {
      console.error(`Failed to download ${filename} after multiple attempts:`, error.message);
    }
  }
}

This function:

Attempts to download the image.
Retries up to 3 times with incremental delays if there’s a failure.
Logs an error if it ultimately fails to download after all attempts.

Optimization 2: Throttling Downloads

When downloading multiple images, downloading too many at once can strain network resources or get blocked by the server. Throttling downloads to a manageable level (e.g., 5 at a time) can prevent these issues.

async function throttledDownload(images, limit = 5) {
  for (let i = 0; i < images.length; i += limit) {
    const batch = images.slice(i, i + limit);
    const downloadBatch = batch.map((url, index) => 
      downloadImageWithRetry(url, `image_${i + index + 1}.jpg`)
    );
    await Promise.all(downloadBatch); // Wait for each batch to complete
    console.log(`Batch ${Math.floor(i / limit) + 1} completed`);
  }
}

Here:

We split the list of images into batches of a given limit.
Process each batch concurrently up to the limit, waiting for all to complete before starting the next batch.
This approach minimizes server load and avoids network congestion.

Step 4: Testing Performance Improvements

To compare the effectiveness of these optimizations:

Run the basic version of the scraper and track download times and error rates.
Run the optimized version and observe reduced errors, smoother download progression, and improved reliability.

These optimizations help manage network usage, improve download reliability, and prevent server overloads by balancing speed and resource efficiency.
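To put numbers on the comparison, a small timing wrapper can be enough. The sketch below uses performance.now from Node's perf_hooks module; the name timeRun and its label and fn parameters are placeholders.

```javascript
const { performance } = require('perf_hooks');

// Sketch: time an async function and report how long it took.
// `label` and `fn` are placeholders, e.g. the basic and optimized scrapers.
async function timeRun(label, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Runs whether fn succeeded or threw, so failed runs are timed too
    const seconds = (performance.now() - start) / 1000;
    console.log(`${label} finished in ${seconds.toFixed(2)}s`);
  }
}
```

For example, calling await timeRun('basic', scrapeAndDownloadImages) and then await timeRun('optimized', scrapeAndDownloadImagesOptimized) prints one duration per run. Network conditions vary, so several runs give a fairer picture than a single measurement.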

Step 5: Final Optimized Scraper

Combining all improvements, here’s the final version of the scraper, which includes retry logic, delays, and throttling.

async function scrapeAndDownloadImagesOptimized() {
  try {
    const response = await axios.get(`https://unsplash.com/s/photos/${searchTerm}`);
    const html = response.data;
    const $ = cheerio.load(html);

    const imageUrls = [];
    $('img[itemprop="thumbnailUrl"]').each((i, element) => {
      if (imageUrls.length >= numberOfImages) return false;
      const srcSet = $(element).attr('srcset');
      const dataSrc = $(element).attr('data-src');
      if (srcSet) {
        const urls = srcSet.split(',').map(item => item.trim().split(' ')[0]);
        const largestImageUrl = urls[urls.length - 1];
        if (largestImageUrl && !downloadedUrls.has(largestImageUrl)) {
          imageUrls.push(largestImageUrl);
          downloadedUrls.add(largestImageUrl);
        }
      } else if (dataSrc && !downloadedUrls.has(dataSrc)) {
        imageUrls.push(dataSrc);
        downloadedUrls.add(dataSrc);
      }
    });

    console.log(`Found ${imageUrls.length} unique image URLs.`);
    await throttledDownload(imageUrls);
    console.log('All images downloaded successfully with optimizations!');
  } catch (error) {
    console.error('Error during optimized scraping:', error.message);
  }
}

In this case study, we developed a scraper for downloading images from Unsplash, progressively optimizing it with retry logic, throttling, and batch processing.

These improvements reduce error rates, prevent server overload, and enhance the overall efficiency and reliability of the image download process.

Conclusion

In this article, we explored various techniques for downloading images programmatically using Node.js. From setting up basic image downloading scripts to optimizing performance, we have seen how important it is to select the right tools and methods to meet project requirements.

Note: Not every downloaded image is free to use for any purpose. Always check the applicable licences, Terms and Conditions, and other legal documents that specify how you can and cannot use each image.

Key Methods Covered:

Basic Image Downloading with Axios & Node-Fetch:
We started with simple methods using popular libraries like axios and node-fetch, which are excellent for smaller-scale projects due to their ease of use and flexibility. However, without streaming and throttling, they can struggle with large numbers of requests or large files.
Native HTTP/HTTPS Modules:
We then examined the native http and https modules, which offer a low-level, built-in solution for downloading images. While efficient for simple tasks, they lack some of the convenience provided by higher-level libraries like axios, such as promise-based requests, built-in timeout settings, and automatic rejection on HTTP error statuses.
Image Downloading with Retry Logic:
Adding retry logic was a significant improvement, as it makes the script more robust to network errors or server unavailability. This approach ensures greater reliability, especially when downloading images in large numbers.
Optimizing Performance with Concurrency and Throttling:
To handle multiple downloads efficiently, we introduced concurrency management and throttling. By running downloads concurrently but capping how many run at once, we kept the script fast while ensuring that the server isn't overwhelmed by too many simultaneous requests. This is crucial when scraping large image libraries like Unsplash.
Advanced Techniques:
In the final steps, we explored best practices for managing large image files, optimizing the download process, and handling security concerns like validating URLs and ensuring secure downloads over HTTPS.

More Web Scraping Guides

For more Node.js resources, feel free to check out the NodeJS Web Scraping Playbook.