Selenium Guide: Run Selenium Using a Jupyter Notebook
Jupyter Notebook, with its interactive and user-friendly interface, has emerged as a preferred platform for data scientists, researchers, and educators alike. Selenium's integration with Jupyter Notebook opens up new avenues for automating web-related workflows, making it a valuable asset for tasks ranging from web scraping to comprehensive web application testing.
In this article, we will explore the capabilities and applications of Selenium inside a Jupyter Notebook.
- TLDR
- Introduction to Jupyter Notebook and Selenium
- Using Selenium in a Jupyter Notebook
- Using Selenium Webdriver Manager in a Jupyter Notebook
- Interactive Web Automation
- Visual Feedback and Debugging
- Magic Commands in Selenium
- Troubleshooting and Best Practices
- More Web Scraping Guides
TLDR Example
This example assumes that you already have enough prior knowledge to start Jupyter Notebook and create a Selenium script. If you do not already know these things, please move on and read the rest of the article.
Cell 1
from selenium import webdriver
from selenium.webdriver.common.by import By
Cell 2
driver = webdriver.Chrome()
driver.get("https://quotes.toscrape.com")
Cell 3
driver.quit()
Introduction to Jupyter Notebook and Selenium
Jupyter Notebook is an interactive way to write, run and edit Python code. Notebook files end with the extension .ipynb instead of .py. With a traditional Python file, a script is executed from top to bottom, and if we need to make small changes to our code, we have to completely restart execution.
Inside a Jupyter Notebook file, code is broken into what we call cells. The real power of Jupyter lies in these cells. From inside Jupyter, you can edit and re-run a single cell while the rest of your session (imports, variables, an open browser) stays alive, so you don't have to constantly restart your execution.
Especially when debugging or using trial and error as we often have to do when scraping the web, this can be a huge time saver.
When running Selenium inside of a Jupyter Notebook, we get several advantages:
- Easier to edit: When changing variables or other parts of your code, you don't have to restart the execution of your script!
- Code is run cell by cell: Because our code is run cell by cell, Selenium won't exit until we tell it to.
- Interactive Display: By leveraging other third party libraries, we can display the output from our cells... even screenshots!
Using Selenium in a Jupyter Notebook
In this guide, we'll walk through the steps to set up Selenium within a Jupyter Notebook and explore practical examples of its applications.
Setting Up the Environment
This tutorial assumes that you already have pip and Python installed. First, we'll install Jupyter Notebook. You can install it with the command below.
pip install jupyter
Next, we'll get started with installing Selenium. You can install Selenium very similarly.
pip install selenium
We also need to make sure that we have a webdriver installed. For those of you unfamiliar with webdrivers, they literally run or drive a browser from inside of your code. Make sure that your version of webdriver matches the version of your browser.
You can check your Chrome version with the command below:
google-chrome --version
You should get an output that looks similar to this:
Google Chrome 120.0.6099.129
Once you know your Chrome version, you can find the version of Chromedriver matching it on the Chromedriver downloads page.
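The version comparison can also be scripted. The following is a minimal sketch, assuming both tools report a four-part version number like the one shown above. The helper names major_version and versions_match are hypothetical and not part of Selenium, and treating equal major versions as compatible is a rule of thumb rather than a guarantee.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major version from a string such as 'Google Chrome 120.0.6099.129'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output: str, driver_output: str) -> bool:
    """Assumption: the browser and driver are compatible when their major versions agree."""
    return major_version(browser_output) == major_version(driver_output)


# Example strings in the same shape as the output shown above
print(versions_match("Google Chrome 120.0.6099.129", "ChromeDriver 120.0.6099.109"))
```

If the function returns False, download a Chromedriver whose major version matches your browser before continuing.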
Now, you're ready to start the Jupyter Notebook server and begin writing code. You can do this with this next command. This will start a server running locally on your machine which you can view in your browser at http://localhost:8888/tree. VSCode also has a great extension for running Jupyter Notebooks if you prefer to run your notebook files from inside your text editor.
jupyter notebook
Creating Your First Selenium Script in Jupyter Notebook
As mentioned previously, in Jupyter, instead of writing full files, we break our code into cells. From the tree page that you opened inside your browser, click on the "New" button and then click the "Notebook" option.
Once we've created this new file, we'll see an empty cell for our code. Let's put our imports in our first cell! Add the following lines to your first cell.
from selenium import webdriver
from selenium.webdriver.common.by import By
We've got two more cells to create. Here's the next one:
driver = webdriver.Chrome()
driver.get("https://quotes.toscrape.com")
And finally:
driver.quit()
To run your new file, select the first cell at the top and hit the play button.
After running, you should notice something really convenient. Selenium opens up Chrome, and does not close until we run the cell containing driver.quit()!
Using Selenium Webdriver Manager In A Jupyter Notebook
One of the biggest issues when dealing with Selenium is dependency management. While newer versions of webdriver have gotten better with compatibility, users still occasionally run into the issue where their browser updates and then becomes incompatible with their webdriver. Webdriver Manager can completely solve this issue for us.
According to the website, Webdriver Manager is compatible with Selenium v4.x and below. To check your Selenium version, you can use the following command:
pip freeze | grep "selenium"
Your output will look similar to this:
selenium==4.13.0
As long as your version of Selenium is compatible, I strongly recommend installing webdriver manager. You can do so with the following command:
pip install webdriver-manager
Create a new Notebook file and add the lines below into your first cell.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
Next we can create the cell that starts Chrome.
#create an instance of the Service object
service = Service(executable_path=ChromeDriverManager().install())
#start Chrome using the service keyword
driver = webdriver.Chrome(service=service)
#go to the site
driver.get("https://quotes.toscrape.com")
In older versions of Selenium, it was possible to pass the path to Chromedriver directly to the webdriver object: webdriver.Chrome("path/to/chromedriver"). In modern Selenium, this no longer works.
To specify a custom path to Chromedriver, we must import the Service class and pass the path through it. Without the Service class, you cannot specify a path to Chromedriver.
The cell above does the following:
- service = Service(executable_path=ChromeDriverManager().install()) creates a Service object and specifies the path to the Chromedriver that we wish to use
- ChromeDriverManager().install() ensures that we're running (and installing if needed) the latest version of Chromedriver
And our final cell:
driver.quit()
Once again, you can run your code cell by cell. Chrome will not close until the cell with driver.quit() has been executed.
If you have encountered "Chromedriver executable needs to be in PATH" error while installing ChromeDriver, check our extensive guide to resolve this error.
Interactive Web Automation
When using Jupyter Notebooks to run Selenium code, we get the huge benefit of editing and running our code interactively. When using code in cells, we receive the following advantages:
- Easier to debug: When you experience a crash, you can see exactly which cell failed so you don't have to dig through traceback messages.
- Easier to edit: When changing variables or other parts of your code, you don't have to restart the execution of your script!
- Code is run cell by cell: Because our code is run cell by cell, Selenium won't exit until we tell it to.
- Interactive Display: By leveraging other third party libraries, we can display the output from our cells... even screenshots!
Step By Step Development
Now that we have a basic understanding of how Jupyter Notebooks work, we can build a scraper and easily develop it step by step. Let's create a new notebook file using the same three cells we created earlier.
Cell 1
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
Cell 2
#create a Service object
service = Service(executable_path=ChromeDriverManager().install())
#start Chrome and specify our service
driver = webdriver.Chrome(service=service)
#navigate to the site
driver.get("https://quotes.toscrape.com")
Cell 3
driver.quit()
Now that we have our basic skeleton, we can go through and add new cells before the one that contains driver.quit(). Go to Cell 2 and insert a cell below it.
In our new cell, add this line:
driver.get("https://scrapeops.io")
You can now run this one, and the driver will change sites. We still haven't finished running the notebook, yet we can continue to make changes to it.
Before moving on with this tutorial, go ahead and try adding some other code and changing some of the cells.
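One cell worth trying is a quick status check on the browser. This is a minimal sketch: describe_page is a hypothetical helper name, and it relies only on the title and current_url attributes of the driver.

```python
def describe_page(driver) -> dict:
    """Summarize where the browser currently is."""
    return {"title": driver.title, "url": driver.current_url}


# In the notebook, `driver` is the object created in Cell 2:
# describe_page(driver)
```

Running this cell before and after a driver.get() cell is a simple way to confirm that the open browser session really is shared between cells.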
Element Interaction and Data Extraction
Because we can add and edit cells so easily, we can rapidly prototype and change our element interaction. We'll start with the same basic skeleton again. Go ahead and create a new file with the same three basic cells we used in the previous example.
Now we'll add a new cell before driver.quit(). In this cell add the following line.
login_link = driver.find_element(By.CSS_SELECTOR, "a[href='/login']")
We can then add another cell to click the link.
login_link.click()
After running the first few cells, we can now click the login link by simply running the cell.
Next, let's add another cell after the one that clicks on the login link. We'll use this one to find the username and password boxes.
You should be able to add this cell and run it immediately without issue. If it fails, you probably need to restart your script.
username = driver.find_element(By.CSS_SELECTOR, "input[id='username']")
password = driver.find_element(By.CSS_SELECTOR, "input[id='password']")
Let's add one more cell to add input into the boxes we found above.
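The contents of this cell are not shown here, so the following is only a sketch of what it could look like. It assumes the two input elements found in the previous cell. The helper name fill_login_form and the placeholder credentials are hypothetical; clear() and send_keys() are the standard Selenium element methods for emptying a field and typing into it.

```python
def fill_login_form(username_box, password_box, username: str, password: str) -> None:
    """Type the given credentials into the two input elements."""
    for box, text in ((username_box, username), (password_box, password)):
        box.clear()          # remove anything typed on a previous run of the cell
        box.send_keys(text)  # simulate typing into the field


# In the notebook, pass the two elements found in the previous cell
# (placeholder values, not real credentials):
# fill_login_form(username, password, "my_username", "my_password")
```

Clearing each box first means the cell can be re-run repeatedly without the text piling up in the fields.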
After running this cell, go ahead and take a look at the instance of Chrome that Selenium is controlling. The username and password boxes should now be filled in.
Visual Feedback and Debugging
Jupyter Notebook gives us visual feedback for our code as we run it. First, we need to install the Python Imaging Library, Pillow.
Install pillow:
pip install pillow
Now we'll create a new file.
Cell 1
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from PIL import Image
from IPython.display import display
Cell 2
#create a Service object
service = Service(executable_path=ChromeDriverManager().install())
#start Chrome and specify our service
driver = webdriver.Chrome(service=service)
#navigate to the site
driver.get("https://quotes.toscrape.com")
Cell 3
#take a screenshot
driver.save_screenshot("quotes.png")
#open the image
image = Image.open("quotes.png")
#display the image
display(image)
Cell 4
driver.quit()
If you run the entire notebook file, you'll notice something really cool. Below Cell 3, our screenshot is displayed right inside the notebook!
Jupyter also has first-class support for debugging. If a cell fails to execute, it will be highlighted in red and you can immediately find and address the issue.
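When a cell fails, it can also help to capture what the browser was showing at that moment. The following is a minimal sketch. The helper name run_with_screenshot and the default file name are hypothetical; the only Selenium call it uses is save_screenshot, the same one used in Cell 3.

```python
def run_with_screenshot(driver, action, path="failure.png"):
    """Run action(driver); if it raises, save a screenshot and re-raise the error."""
    try:
        return action(driver)
    except Exception:
        driver.save_screenshot(path)  # capture the page as it looked when the error occurred
        raise


# In the notebook (the selector here is only an illustration):
# run_with_screenshot(driver, lambda d: d.find_element(By.CSS_SELECTOR, "a[href='/login']").click())
```

The saved file can then be opened and displayed with Image.open and display, exactly as in Cell 3.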
Jupyter Notebook Magic Commands for Selenium
Jupyter has special built-in commands called Magic Commands. These commands are invoked with the % prefix. A command written with a single % applies to a single line of code. To time a specific line, we can use %timeit. To run a single line of code under the interactive Python debugger, we can use the %debug command.
Let's create another notebook file with the following cells.
Cell 1
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
Cell 2
#create a Service object
service = Service(executable_path=ChromeDriverManager().install())
#start Chrome and specify our service
driver = webdriver.Chrome(service=service)
Cell 3
#time this line
%timeit driver.get("https://quotes.toscrape.com")
Cell 4
#debug this line
%debug driver.get("https://quotes.toscrape.com/login")
Cell 5
driver.quit()
Pay attention to Cells 3 and 4:
- In Cell 3, we time the line of code. After running this cell, we get a report on timing benchmarks for it. Keep in mind that %timeit executes the line several times to compute its statistics, so the page is loaded repeatedly.
- In Cell 4, we run a single line of code under the debugger. While there are no bugs in this line, the debugger prompt lets us step through the call and inspect the objects involved. This can be incredibly helpful when trying to debug large files and track down errors.
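Because %timeit repeats the statement it measures, a single timed run is sometimes preferable for something as slow as a page load. The following is a minimal plain-Python sketch of that idea, not a replacement for the magic command. The helper name time_once is hypothetical.

```python
import time


def time_once(func, *args, **kwargs):
    """Call func exactly once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# In the notebook:
# _, seconds = time_once(driver.get, "https://quotes.toscrape.com")
# print(f"page loaded in {seconds:.2f} s")
```

A single measurement is noisier than the averaged figures %timeit reports, so treat it as a rough indication only.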
Troubleshooting and Best Practices
In this section, we'll explore troubleshooting tips and best practices to ensure a smooth experience when working with Selenium in your Jupyter environment.
Handling Credentials Safely
Jupyter Notebooks are built to make code not only easier to write, but easier to share. Often, Selenium scripts are used to access password-protected or otherwise secured information. You should never hard code your credentials into a notebook file or any Python file for that matter.
Always keep passwords, API keys and other types of credentials stored in a .env, .json or some other type of configuration file that you do not share when sharing your code.
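As an illustration of the .json option, here is a minimal sketch using only the standard library. The file name credentials.json, the keys username and password, and the helper name load_credentials are all assumptions made for this example.

```python
import json
from pathlib import Path


def load_credentials(path="credentials.json"):
    """Read a username and password from a JSON file that is never shared."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["username"], data["password"]


# In the notebook:
# user, secret = load_credentials()
```

The notebook then contains only the call to load_credentials, so it can be shared without exposing the values. Remember to exclude the credentials file itself from anything you publish.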
Common Issues and Solutions
Always make sure that your version of Webdriver matches your browser. As was shown earlier in this article, you can use Webdriver Manager in order to manage your versions of webdriver.
Another common issue is the NoSuchElementException, raised when an element cannot be found. To handle it properly, you should be familiar with basic error handling in Python, with how dynamic content loads, and with how to use either an Explicit Wait or an Implicit Wait.
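To show what an explicit wait does under the hood, here is a simplified polling loop in plain Python. It is a sketch of the idea, not Selenium's implementation: the helper name wait_until and its parameters are hypothetical. In real code, Selenium's own WebDriverWait class (from selenium.webdriver.support.ui) provides this behavior.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(Exception,)):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # for example, the element is not in the page yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# In the notebook (the selector is only an illustration):
# quote = wait_until(lambda: driver.find_element(By.CSS_SELECTOR, "div.quote"))
```

The loop retries instead of failing on the first lookup, which is why a wait often resolves an element-not-found error on pages that load content dynamically.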
Conclusion
Good job! You've finished this tutorial. You're ready to go and apply your new knowledge of both Selenium and Jupyter Notebook.
Our Challenge for You: Go build something!
More Web Scraping Guides
Delve deeper into the fascinating world of web scraping with our curated collection of articles.