

Puppeteer Guide - How to Block Images and Resources using Puppeteer

One useful feature of Puppeteer is the ability to control and manipulate the loading of resources, such as images, stylesheets, and scripts, during browser automation.

There are two prominent approaches to blocking images and resources in Puppeteer. The first is to use the puppeteer-extra plugin suite. The second is to write Puppeteer request interceptors yourself.

In this guide, we'll cover the essential steps to selectively block the loading of images and other resources.



Why Block Images and Resources in Web Scraping

Blocking resources when web scraping can lead to a number of benefits.

Some of these benefits include performance improvements, detection avoidance, privacy and security, and content filtering.

Faster Page Loading

Additional resources cause the page to load more slowly, which slows down your scraping.

Fewer requests mean less load and faster, more focused scraping.

Bandwidth Usage

Some resources, like images, consume a lot of bandwidth. Blocking them can keep costs down and improve performance on limited connections such as proxies or cellular networks.

Avoid Detection

Some websites use tracking scripts and similar techniques to detect scrapers. Blocking these resources can help avoid detection and rate limits.

For more information, see our guide on Web Scraping Without Getting Blocked.

Testing and Development

Developers may want to simulate different network conditions or test how a website behaves when certain resources are blocked. This is useful for testing the robustness and performance of web applications under various scenarios.

Privacy and Security

Blocking external resources can enhance privacy by preventing the loading of third-party assets that may be used for tracking or analytics. It can also reduce the risk of security vulnerabilities associated with external resource loading.

Content Filtering

Blocking specific types of resources, such as images or scripts from certain domains, can act as a content filter.

This is relevant in situations where there's a need to restrict access to certain content or prevent the loading of potentially harmful resources.

Limited Device Resources

On devices with limited resources, blocking unnecessary resources can contribute to a smoother browsing experience and reduce the strain on the device's CPU and memory.

These benefits do not mean that resource blocking is right for every project.

For example, if you are scraping images specifically, or need all stylesheets and images to load in order to properly scrape a web page, then resource blocking may not be helpful.


How to Block Images and Resources using Puppeteer

Using puppeteer-extra

puppeteer-extra is a modular plugin framework that provides a variety of plugins for common Puppeteer tasks. It offers the easiest way to block images and resources using Puppeteer.

The plugin of interest for us is puppeteer-extra-plugin-block-resources.

This plugin will simplify the process of blocking specific resources.

Step 1: Install Packages

To get started with puppeteer-extra, make sure you have it installed by running the following command:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-block-resources

Step 2: Launching Puppeteer with Puppeteer-Extra

With puppeteer-extra installed, it takes the place of the traditional puppeteer library. You can launch a headless browser (like you would with Puppeteer) using the following snippet:

const puppeteer = require("puppeteer-extra");

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

The above code will simply launch a puppeteer browser, visit the Vanity Fair website, and take a screenshot.

Step 3: Utilizing the block-resources plugin

Now we need to use the puppeteer-extra-plugin-block-resources plugin to begin blocking different types of resources. Our code now looks like the following snippet:

const puppeteer = require("puppeteer-extra");
// Create the plugin with the set of resource types to block
const blockResourcesPlugin = require("puppeteer-extra-plugin-block-resources")({
  blockedTypes: new Set(["image"]),
});
puppeteer.use(blockResourcesPlugin);

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

In the above code, we initialize a new resource blocker plugin with a blockedTypes set that contains image. After running this, you will see that the screenshot no longer shows any images on the web page.

vanityfair.com loaded without images

As you can see in the produced image, the content of the web page still loads but images are missing. For this website the space for the images is still occupied as blank space.

Puppeteer-extra blockable types

With this plugin you are not limited to just blocking images. The following table is a list of blockable resources. Make sure to check the plugin docs for any changes.

| Request Type | Description |
| --- | --- |
| document | Request for an HTML document |
| stylesheet | Request for a CSS stylesheet |
| image | Request for an image |
| media | Request for media content (audio/video) |
| font | Request for a font file |
| script | Request for a JavaScript file |
| texttrack | Request for a text track file (for subtitles) |
| xhr (XMLHttpRequest) | Request initiated by JavaScript using XMLHttpRequest |
| fetch | Modern API for making network requests in JavaScript |
| eventsource | Server-sent events for real-time updates |
| websocket | Full-duplex communication channel between client and server |
| manifest | Request for a web app manifest file |
| other | Other types of requests that don't fit into specific categories |
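
A misspelled type name never matches any request, so a typo in the block list fails silently. The following is a minimal sketch of a guard against that, assuming the type names listed in the table above. `KNOWN_RESOURCE_TYPES` and `assertKnownTypes` are hypothetical names used for illustration; they are not part of the plugin or of Puppeteer.

```javascript
// Resource type names copied from the table above. This copy is an
// assumption; check the plugin and Puppeteer docs for the current list.
const KNOWN_RESOURCE_TYPES = new Set([
  "document", "stylesheet", "image", "media", "font", "script",
  "texttrack", "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
]);

// Hypothetical helper: returns a Set of the requested types, or throws if
// any name is not in the known list (for example "video" instead of "media").
function assertKnownTypes(types) {
  const unknown = [...types].filter((t) => !KNOWN_RESOURCE_TYPES.has(t));
  if (unknown.length > 0) {
    throw new Error(`Unknown resource type(s): ${unknown.join(", ")}`);
  }
  return new Set(types);
}

// Usage sketch:
// const blockedTypes = assertKnownTypes(["image", "font"]);
```

The returned Set can be passed as the blockedTypes option, so the check runs once at startup rather than on every request.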

Dynamically change blocked resources

You are not limited to a static set of blocked types for your entire execution. The types can be changed during runtime and will be respected for any future requests. As an example, see the code below.

const puppeteer = require("puppeteer-extra");
const blockResourcesPlugin = require("puppeteer-extra-plugin-block-resources")({
  blockedTypes: new Set(["image"]),
});
puppeteer.use(blockResourcesPlugin);

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // First load: images are blocked
  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Allow images again and block fonts instead
  blockResourcesPlugin.blockedTypes.delete("image");
  blockResourcesPlugin.blockedTypes.add("font");
  await page.setCacheEnabled(false); // Disable the cache so fonts are requested again
  await page.reload();
  await page.screenshot({ path: "vanityfair_nofont.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

After running the above code, you can see that images have returned but now the browser is using default fonts.

vanityfair.com loaded with images but missing fonts

As you can see in the produced screenshots, all images (even the advertisement) have returned but you can notice the page text and headers are using default sans-serif fonts rather than the serif font we saw in the previous image.

From this example you now know how to launch Puppeteer with the puppeteer-extra library, block specific resources and dynamically change which resources are being blocked.


How to Build a Custom Resource Blocker with Puppeteer

We can accomplish the same goal without any extra libraries by using Puppeteer's request interception.

Request interception in Puppeteer allows you to programmatically block or allow network requests in the browser.

Once request interception is enabled, all network requests will pause until the interceptor makes a decision on the request.

The below code shows you how to set up a basic request interceptor and visit a website.

const puppeteer = require("puppeteer");

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Set request interceptor
  await page.setRequestInterception(true);
  page.on("request", (req) => {
    // Allow every request
    req.continue();
  });

  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

The above code will launch a headless browser, enable interception and create an interceptor.

The interceptor created is a simple permissive one that allows all requests. Then we visit the Vanity Fair website and see that it loads properly.

Utilize Request Interception to Block Resources

Now that you've created a request interceptor, we can use it to block images and other resources. To do this we can use the resourceType method to check what is being requested.

It is worth noting that this is not the only criterion you can check against.

You could also block requests based on URL, request method, headers, and more. Check the Puppeteer HTTPRequest docs to see the available fields and methods.
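
As an illustration of checking more than the resource type, here is a small sketch of a decision function that also looks at the request's hostname. It relies only on the `url()` and `resourceType()` methods of Puppeteer's HTTPRequest. The `shouldBlock` name and the hostnames in `BLOCKED_HOSTS` are placeholders invented for this example.

```javascript
// Hypothetical block lists; the hostnames are placeholders, not real trackers.
const BLOCKED_HOSTS = new Set(["tracker.example.com", "ads.example.net"]);
const BLOCKED_TYPES = new Set(["image"]);

// Decide whether a request should be aborted. `req` only needs the
// url() and resourceType() methods that Puppeteer's HTTPRequest provides.
function shouldBlock(req) {
  if (BLOCKED_TYPES.has(req.resourceType())) return true;
  try {
    const { hostname } = new URL(req.url());
    return BLOCKED_HOSTS.has(hostname);
  } catch {
    // Not a parseable URL, so let the request through.
    return false;
  }
}

// Usage sketch inside an interceptor:
// page.on("request", (req) => (shouldBlock(req) ? req.abort() : req.continue()));
```

Keeping the decision in a plain function separate from the event handler makes it easy to test without launching a browser.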

For now, we will just focus on the resourceType. For a list of possible resource types, check the MDN documentation. With the following code, we will begin blocking images.

const puppeteer = require("puppeteer");

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Set request interceptor
  await page.setRequestInterception(true);
  const blockedTypes = new Set(["image"]);
  page.on("request", (req) => {
    // Abort blocked resource types and allow everything else
    if (blockedTypes.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

When this code visits the page, the resulting screenshot has no images.

The setup is similar to the puppeteer-extra plugin, but we have more control.

On top of defining a set of blocked resources we can also evaluate other criteria as previously mentioned.

vanityfair loaded without images via interceptors

The code produces the image seen above. Just like with the puppeteer-extra plugin, the images are missing from the page once again.

Dynamically Blocking Multiple Resource Types

Just like with the plugin, you may want to block multiple types of resources. For example, you may want neither images nor videos to load.

We can use our blockedTypes set in the same way as we did with the puppeteer-extra plugin to define blocked types and change them dynamically.

Remember, a change only applies to network requests made after it. See the example below for how we can do this.

const puppeteer = require("puppeteer");

puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Set request interceptor
  await page.setRequestInterception(true);
  // "media" covers audio and video; "video" is not a Puppeteer resource type
  const blockedTypes = new Set(["image", "media"]);
  page.on("request", (req) => {
    if (blockedTypes.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto("https://www.vanityfair.com");
  await page.screenshot({ path: "vanityfair.png" });

  // Allow images again and block fonts instead
  blockedTypes.delete("image");
  blockedTypes.add("font");

  await page.setCacheEnabled(false); // force reload without cache
  await page.reload();
  await page.screenshot({ path: "vanityfair_nofont.png" });

  // Close the browser so the Node.js process can exit
  await browser.close();
});

In the above example, we begin by blocking images and video. After loading the page the first time, we disable the cache, allow image requests again, and block font requests.

After loading the page the second time, images return and default fonts are used.

vanityfair loaded with images but without fonts via interceptors

Once again, in the produced screenshot, we can see that images return but the browser is using default fonts again. This validates that we have dynamically changed the blocked types to allow images but deny fonts.

How To Block Images

In order to block images, you can add the following line to the above script or you can include "image" in the definition of the blockedTypes variable.

const blockedTypes = new Set(["image"]);
// or
blockedTypes.add("image");

How To Block CSS

In order to block CSS, you can add the following line to the above script or you can include "stylesheet" in the definition of the blockedTypes variable.

const blockedTypes = new Set(["stylesheet"]);
// or
blockedTypes.add("stylesheet");

How To Block Media Loading

In order to block media (audio and video), you can add the following line to the above script or you can include "media" in the definition of the blockedTypes variable.

const blockedTypes = new Set(["media"]);
// or
blockedTypes.add("media");

How To Block Script Loading

In order to block scripts, you can add the following line to the above script or you can include "script" in the definition of the blockedTypes variable.

const blockedTypes = new Set(["script"]);
// or
blockedTypes.add("script");

How To Block XHR & Fetch Requests

In order to block XHR & Fetch requests, you can add the following lines to the above script or you can include "xhr" and "fetch" in the definition of the blockedTypes variable.

const blockedTypes = new Set(["xhr", "fetch"]);
// or
blockedTypes.add("xhr");
blockedTypes.add("fetch");
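
The individual snippets above can be combined. Below is a minimal sketch that builds one blockedTypes set from named groups, using Puppeteer's resource type names. The preset names and the `buildBlockedTypes` helper are hypothetical and exist only for this illustration.

```javascript
// Hypothetical preset names mapped to Puppeteer resource type names.
const PRESETS = new Map([
  ["images", ["image"]],
  ["css", ["stylesheet"]],
  ["media", ["media"]],
  ["scripts", ["script"]],
  ["ajax", ["xhr", "fetch"]],
]);

// Build one Set of blocked types from any number of preset names.
function buildBlockedTypes(...presetNames) {
  const blocked = new Set();
  for (const name of presetNames) {
    const types = PRESETS.get(name);
    if (!types) throw new Error(`Unknown preset: ${name}`);
    for (const type of types) blocked.add(type);
  }
  return blocked;
}

// Usage sketch:
// const blockedTypes = buildBlockedTypes("images", "ajax");
```

The resulting set can be used in place of the blockedTypes variable in the interceptor script.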

Handling Errors

Handling errors is an essential aspect when working with Puppeteer, especially when intercepting requests and blocking resources.

Block the right resources:

  • Anytime you change the behavior of a web page you may encounter unexpected errors or changes.
  • Be sure you are not blocking critical page resources like essential scripts and stylesheets. Accidentally blocking these can interrupt your scraping flow.

Be selective:

  • Similarly make sure you are blocking resources based on the correct criteria. Make your criteria selective.
  • Instead of just blocking all scripts, try blocking all scripts from a certain domain or after a certain point.

Multiple request interceptors:

  • Be careful about introducing multiple interceptors. The program can only allow or abort a request once.
  • If you have multiple interceptors with overlapping logic it may cause errors and unexpected behavior.
  • This is not just for your own defined interceptors but even interceptors defined by third-party packages and libraries, like puppeteer-extra. To use multiple interceptors at once, read about cooperative intercept mode.
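
As a rough sketch of what a cooperative handler can look like: it first checks whether the request has already been resolved, then passes a priority to `abort` and `continue`. The methods used here (`isInterceptResolutionHandled`, `continueRequestOverrides`, and the priority argument) come from Puppeteer's HTTPRequest API. The `createCooperativeBlocker` name and the choice of priority 0 are assumptions made for this example.

```javascript
// Priority passed to abort/continue. Supplying a numeric priority opts the
// handler into cooperative intercept mode; 0 is an assumption for this sketch.
const PRIORITY = 0;

// Build a request handler that cooperates with other interceptors.
function createCooperativeBlocker(blockedTypes) {
  return (req) => {
    // Another handler may already have resolved this request.
    if (req.isInterceptResolutionHandled()) return;
    if (blockedTypes.has(req.resourceType())) {
      req.abort("failed", PRIORITY);
    } else {
      req.continue(req.continueRequestOverrides(), PRIORITY);
    }
  };
}

// Usage sketch (assumes `page` is a Puppeteer Page):
// await page.setRequestInterception(true);
// page.on("request", createCooperativeBlocker(new Set(["image"])));
```

Every interceptor on the page needs to follow the same pattern for this to work, so check how any third-party plugin you use resolves requests.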

Best Practices

By following these best practices, you can enhance the reliability and efficiency of your Puppeteer scripts that involve blocking images and resources.

Customize these recommendations based on the specific requirements and characteristics of the websites you are automating.

Use Page Events Wisely:

  • Use page events to time your blocking. For example, to avoid third-party trackers and scripts, you can apply the request blockers after the page has initially loaded.
  • This allows the page you want to load without issue and then blocks any third-party scripts that may be introduced afterwards.
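
One way to sketch blocking that starts only after the initial load is an interceptor with an on/off switch. `createDeferredBlocker` is a hypothetical helper written for this example, and the usage comments assume `page` is a Puppeteer Page.

```javascript
// Hypothetical helper: an interceptor that lets everything through until
// enable() is called, then aborts the given resource types.
function createDeferredBlocker(blockedTypes) {
  let enabled = false;
  return {
    enable() {
      enabled = true;
    },
    disable() {
      enabled = false;
    },
    handler(req) {
      if (enabled && blockedTypes.has(req.resourceType())) {
        req.abort();
      } else {
        req.continue();
      }
    },
  };
}

// Usage sketch (assumes `page` is a Puppeteer Page):
// const blocker = createDeferredBlocker(new Set(["script"]));
// await page.setRequestInterception(true);
// page.on("request", blocker.handler);
// await page.goto("https://www.vanityfair.com"); // initial load, nothing blocked
// blocker.enable(); // script requests made from here on are aborted
```

The same switch can be flipped from other events, such as after a click or a navigation, to match the point where unwanted requests begin.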

Block Resources Dynamically:

  • Dynamically adjust your block list to fit your needs. You may be able to block certain resources only after you've clicked a button or experienced a certain redirect.
  • Changing your block list to match these scenarios helps you ensure you are selectively blocking the correct resources.

Understand Resources in a Web Page:

  • You should understand the different resources and how they are used so that you can block them appropriately.
  • View the MDN Resource Type documentation or table in this article to learn these.
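
To see which resource types a particular page actually requests before deciding what to block, a small tally can help. This is a sketch only; `tallyResourceTypes` is a hypothetical helper, and the usage comments assume `page` is a Puppeteer Page.

```javascript
// Count how many requests of each resource type were seen.
// `requests` is any iterable of objects with a resourceType() method.
function tallyResourceTypes(requests) {
  const counts = new Map();
  for (const req of requests) {
    const type = req.resourceType();
    counts.set(type, (counts.get(type) || 0) + 1);
  }
  return counts;
}

// Usage sketch (assumes `page` is a Puppeteer Page):
// const seen = [];
// page.on("request", (req) => seen.push(req));
// await page.goto("https://www.vanityfair.com");
// console.log(Object.fromEntries(tallyResourceTypes(seen)));
```

Types with high counts that your scraper does not need are natural candidates for the block list.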

Conclusion

From this article you have learned how to implement resource blocking in your web scraping project and how it can be useful.

The puppeteer-extra plugin is an easy drop-in solution to achieve this. Custom Puppeteer interceptors provide a highly customizable and controllable resource blocker.

Whichever solution you choose can benefit your project.

Check the official Puppeteer and puppeteer-extra documentation for more information.

More Puppeteer Guides

If you would like to learn more about Web Scraping with Puppeteer, then be sure to check out The Puppeteer Web Scraping Playbook.
