How To Use BeautifulSoup's find() Method
BeautifulSoup's .find() method is a powerful tool for finding the first page element in an HTML or XML page that matches your query criteria.
In this guide, we will look at the various ways you can use the find method to extract the data you need:
- BeautifulSoup .find() Method
- Find By Class And Ids
- Find By Text
- Find With Multiple Criteria
- Find Using Regex
- Find Using Custom Functions
If you would like to find all the elements that match your query criteria then use the find_all() method.
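As a quick illustration of the difference, here is a minimal sketch that runs both methods against the same markup. The HTML snippet and the variable names (first_link, all_links) are only illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative markup with two links
html_doc = """
<ul>
  <li><a href="http://example.com">Link 1</a></li>
  <li><a href="http://scrapy.org">Link 2</a></li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.find("a")      # a single element: the first match
all_links = soup.find_all("a")   # a list-like collection of every match

print(first_link.get_text())
## --> Link 1
print([a.get_text() for a in all_links])
## --> ['Link 1', 'Link 2']
```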
BeautifulSoup .find() Method
You should use the .find() method when there is only one element that matches your query criteria, or you just want the first element.
The .find() method returns the first element that matches your query criteria.
To use the .find() method, simply pass the name of the page element you want to find, for example .find('h1'). In this case, we want to find the first <h1> tag on an HTML page.
from bs4 import BeautifulSoup
html_doc = """
<html>
<body>
<h1>Hello, BeautifulSoup!</h1>
<ul>
<li><a href="http://example.com">Link 1</a></li>
<li><a href="http://scrapy.org">Link 2</a></li>
</ul>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
## Find The First <h1> Tag
print(soup.find('h1'))
## --> <h1>Hello, BeautifulSoup!</h1>
print(soup.find('h1').get_text())
## --> Hello, BeautifulSoup!
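One thing to watch for: when nothing matches, .find() returns None, so chaining .get_text() directly onto the result raises an AttributeError. The sketch below shows one way to guard against that. The fallback message and variable names are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><body><h1>Hello, BeautifulSoup!</h1></body></html>", "html.parser"
)

heading = soup.find("h2")  # there is no <h2> in this document
print(heading)
## --> None

# Check for None before calling methods on the result
heading_text = heading.get_text() if heading is not None else "no <h2> found"
print(heading_text)
## --> no <h2> found
```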
For more details, check out the find() section of the official BeautifulSoup documentation.
Find By Class And Ids
The .find() method allows you to find the first element on the page by class name, id, or any other element attribute (using the attrs parameter) that matches your query criteria.
For example, here is how to find the first <p> tag that has a given class, id or attribute:
## <p> Tag + Class Name
soup.find('p', class_='class_name')
## <p> Tag + Id
soup.find('p', id='id_name')
## <p> Tag + Any Attribute
soup.find('p', attrs={"aria-hidden": "true"})
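To try these queries end to end, here is a small runnable sketch. The sample markup, class names and ids are made up for illustration. Note that class_ matches against any one of a tag's classes, so class_='intro' still finds a tag whose class attribute is "intro lead".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration
html_doc = """
<div>
  <p class="intro lead">First paragraph</p>
  <p id="summary">Second paragraph</p>
  <p aria-hidden="true">Third paragraph</p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

by_class = soup.find("p", class_="intro")  # matches one of several classes
by_id = soup.find("p", id="summary")
by_attr = soup.find("p", attrs={"aria-hidden": "true"})

print(by_class.get_text())
## --> First paragraph
print(by_id.get_text())
## --> Second paragraph
print(by_attr.get_text())
## --> Third paragraph
```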
Find By Text
The .find() method also lets you search by text using the string parameter. It returns the first string that exactly matches your query.
## Strings that exactly match 'Link 1'
soup.find(string="Link 1")
## --> 'Link 1'
If you want to find the first string that contains your substring then you need to use regular expressions:
import re
## Strings that contain 'Link'
soup.find(string=re.compile("Link"))
## --> 'Link 1'
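Searching with only the string parameter returns the text node, not the tag that contains it. If you need the surrounding tag, one approach (sketched here against illustrative markup) is to step up with .parent, or to pass a tag name together with string so that the tag itself is returned.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup with two links
html_doc = (
    '<ul><li><a href="http://example.com">Link 1</a></li>'
    '<li><a href="http://scrapy.org">Link 2</a></li></ul>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# string alone returns the matching text node
text_node = soup.find(string=re.compile("Link"))
print(text_node)
## --> Link 1

# .parent steps up to the tag that contains the text
link_tag = text_node.parent
print(link_tag["href"])
## --> http://example.com

# Tag name + string returns the tag whose text matches
second_link = soup.find("a", string="Link 2")
print(second_link["href"])
## --> http://scrapy.org
```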
Find With Multiple Criteria
If you need to match the first page element on several attributes at once, you can pass them all in the attrs parameter:
## <p> Tag + Class Name & Id
soup.find('p', attrs={"class": "class_name", "id": "id_name"})
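Here is a runnable sketch of a multi-attribute query, using made-up markup in which only the third <p> tag satisfies both conditions. It also shows that keyword arguments can be combined to the same effect.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the last <p> has both the class and the id
html_doc = """
<p class="note">Class only</p>
<p id="main">Id only</p>
<p class="note" id="main-note">Class and id</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Every key in attrs must match
match = soup.find("p", attrs={"class": "note", "id": "main-note"})
print(match.get_text())
## --> Class and id

# Keyword arguments combine in the same way
same_match = soup.find("p", class_="note", id="main-note")
print(same_match.get_text())
## --> Class and id
```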
Find Using Regex
The .find() method also supports the use of regular expressions.
Simply pass a compiled regex into the .find() method.
For example, here we are using the .find() method with a regular expression to find the first tag whose name starts with the letter b:
import re
## Find First Element That Starts With The Letter 'b'
soup.find(re.compile("^b"))
## --> <body>...</body>
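Regular expressions can also be used as attribute values, not only as tag names. As a minimal sketch with illustrative markup, the following finds the first <a> tag whose href contains "scrapy". The pattern is applied as a search, so it does not need to match the whole attribute value.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup with two links
html_doc = """
<ul>
  <li><a href="http://example.com">Link 1</a></li>
  <li><a href="http://scrapy.org">Link 2</a></li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# First <a> tag whose href contains "scrapy"
scrapy_link = soup.find("a", href=re.compile("scrapy"))
print(scrapy_link.get_text())
## --> Link 2
```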
Find Using Custom Functions
If you need to make very complex queries, you can also pass a function into the .find() method. The function is called with each tag and should return True for the tag you want:
def custom_selector(tag):
    # Return True for "span" tags with a class name of "target_span"
    return tag.name == "span" and tag.has_attr("class") and "target_span" in tag.get("class")

soup.find(custom_selector)
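To see the custom function in action, here is a self-contained sketch with made-up markup containing three <span> tags, only one of which carries the target_span class. This version uses tag.get("class", []) so that tags without a class attribute are handled without a separate has_attr check.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the last <span> has the target class
html_doc = """
<div>
  <span>No class</span>
  <span class="other">Other class</span>
  <span class="target_span highlighted">Target</span>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

def custom_selector(tag):
    # True only for <span> tags whose class list includes "target_span"
    return tag.name == "span" and "target_span" in tag.get("class", [])

result = soup.find(custom_selector)
print(result.get_text())
## --> Target
```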
More Web Scraping Tutorials
So that's how to use BeautifulSoup's .find() method.
If you would like to learn more about how to use BeautifulSoup then check out our other BeautifulSoup guides:
- BeautifulSoup Guide: Scraping HTML Pages With Python
- How To Install BeautifulSoup
- Fix BeautifulSoup Returns Empty List or Value
- How To Use BeautifulSoup's find_all() Method
Or if you would like to learn more about Web Scraping, then be sure to check out The Python Web Scraping Playbook.
Or check out one of our more in-depth guides: