

How To Use BeautifulSoup's find_all() Method

BeautifulSoup's .find_all() method is a powerful tool for extracting data from HTML or XML pages: it returns every element in the parsed document that matches your query criteria.

In this guide, we will look at the various ways you can use the .find_all() method to extract the data you need.

If you only need the first element that matches your query criteria, use the .find() method instead (a short comparison example is shown below).

Need help scraping the web?

Then check out ScrapeOps, the complete toolkit for web scraping.


BeautifulSoup .find_all() Method

You should use the .find_all() method when there are multiple instances of the element on the page that match your query.

The .find_all() method returns a list of matching elements that you can then parse individually.

To use the .find_all() method, simply pass the name of the tag you want to find, for example .find_all('a'). In this case, we want to find all the <a> tags on an HTML page.


from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
<h1>Hello, BeautifulSoup!</h1>
<ul>
<li><a href="http://example.com">Link 1</a></li>
<li><a href="http://scrapy.org">Link 2</a></li>
</ul>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

## Find All <a> Tags
print(soup.find_all('a'))

"""
Output:

[
<a href="http://example.com">Link 1</a>,
<a href="http://scrapy.org">Link 2</a>
]

"""

As .find_all() returns a list of elements, you will need to loop through each element in the list to extract the data you want:


element_list = soup.find_all('a')
for element in element_list:
    print(element.get_text())
## --> 'Link 1', 'Link 2'
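
For comparison, if you only need the first matching element, the .find() method returns a single element (or None) instead of a list. A minimal sketch using the soup defined above:

## Find only the first <a> tag
first_link = soup.find('a')
print(first_link.get_text())
## --> 'Link 1'

## .find() returns None when there is no match
print(soup.find('table'))
## --> None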

To limit the number of results the .find_all() method returns, use the limit parameter:


soup.find_all('a', limit=2)

This works just like the LIMIT keyword in SQL. It tells BeautifulSoup to stop gathering results after it’s found a certain number.
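
For example, with the soup defined earlier, limit=1 stops after the first <a> tag:

## Stop after the first match
print(soup.find_all('a', limit=1))
## --> [<a href="http://example.com">Link 1</a>]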

For more details, check out the full find_all() documentation.


FindAll By Class And Ids

The .find_all() method allows you to find elements on the page by class name, id, or any other element attribute, either via keyword arguments or the attrs parameter.

For example, here is how to find all <p> tags that have a given class, id or attribute:


## <p> Tag + Class Name
soup.find_all('p', class_='class_name')

## <p> Tag + Id
soup.find_all('p', id='id_name')

## <p> Tag + Any Attribute
soup.find_all('p', attrs={"aria-hidden": "true"})
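
As a runnable illustration, here is a short sketch against a made-up HTML snippet (the demo_html variable and its class, id and attribute names are invented for this example):

from bs4 import BeautifulSoup

demo_html = """
<p class="intro">First paragraph</p>
<p class="intro" id="summary">Second paragraph</p>
<p aria-hidden="true">Hidden paragraph</p>
"""

demo_soup = BeautifulSoup(demo_html, 'html.parser')

## <p> Tag + Class Name
print(demo_soup.find_all('p', class_='intro'))
## --> both <p> tags with class="intro"

## <p> Tag + Id
print(demo_soup.find_all('p', id='summary'))
## --> [<p class="intro" id="summary">Second paragraph</p>]

## <p> Tag + Any Attribute
print(demo_soup.find_all('p', attrs={"aria-hidden": "true"}))
## --> [<p aria-hidden="true">Hidden paragraph</p>]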


FindAll By Text

The .find_all() method also allows you to search by text using the string parameter. It returns a list of strings that exactly match your query string.


## Strings that exactly match 'Link 1'
soup.find_all(string="Link 1")
## --> ['Link 1']

If you want to find strings that merely contain your substring, you need to use a regular expression:


import re

## Strings that contain 'Link'
soup.find_all(string=re.compile("Link"))
## --> ['Link 1', 'Link 2']
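
You can also combine the string parameter with a tag name, in which case .find_all() returns the matching tags themselves rather than the bare strings. A short sketch using the soup defined earlier:

## <a> tags whose text is exactly 'Link 1'
soup.find_all('a', string="Link 1")
## --> [<a href="http://example.com">Link 1</a>]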


FindAll With Multiple Criteria

If you need to find page elements by matching on multiple attributes at once, you can do so with the attrs parameter:


## <p> Tag + Class Name & Id
soup.find_all('p', attrs={"class": "class_name", "id": "id_name"})


FindAll Using Regex

The .find_all() method also supports the use of regular expressions.

Simply pass a compiled regular expression to the .find_all() method.

For example, here we are using the .find_all() method with a regular expression to find all tags whose names start with the letter b:


import re

for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# --> body

Or here we are using the .find_all() method with a regular expression to find all tags whose names contain the letter t:


for tag in soup.find_all(re.compile("t")):
print(tag.name)
# --> html
# --> title


FindAll Using Custom Functions

If you need to make more complex queries, you can also pass a function into the .find_all() method:


def custom_selector(tag):
    # Return <span> tags with a class name of "target_span"
    return tag.name == "span" and tag.has_attr("class") and "target_span" in tag.get("class")

soup.find_all(custom_selector)
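
As a quick check, here is a hypothetical snippet you could run the custom_selector function against (the demo_html contents and class names are invented for this illustration):

demo_html = """
<span class="target_span">Match me</span>
<span class="other_span">Skip me</span>
<div class="target_span">Wrong tag</div>
"""

demo_soup = BeautifulSoup(demo_html, 'html.parser')
print(demo_soup.find_all(custom_selector))
# --> [<span class="target_span">Match me</span>]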


More Web Scraping Tutorials

So that's how to use BeautifulSoup's .find_all() method.

If you would like to learn more about how to use BeautifulSoup, check out our other BeautifulSoup guides.

Or if you would like to learn more about web scraping, be sure to check out The Python Web Scraping Playbook.

Or check out one of our more in-depth guides.