Glassdoor
Scraping Teardown
Find out everything you need to know to reliably scrape Glassdoor,
including scraping guides, GitHub repos, proxy performance and more.
Glassdoor Web Scraping Overview
Glassdoor implements multiple layers of protection to prevent automated data extraction. This section provides an overview of its anti-bot systems and common challenges faced when scraping, along with insights into how these protections work and potential strategies to navigate them.
Scraping Summary
Glassdoor is a website where current and former employees anonymously review companies. It also lists job advertisements and company profiles. From a web scraping perspective, it is a moderately popular target because of the job, company, and review data it provides. However, Glassdoor uses moderate to strong anti-scraping systems, so avoiding bot detection is a challenge. Scraping Glassdoor typically involves techniques such as proxies and user-agent rotation to avoid detection. Parsing is moderately challenging because Glassdoor uses dynamic CSS and loads content dynamically as pages are scrolled. Data extraction is further complicated because some content sits behind a login or varies by geographic location.
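The rotation idea mentioned above can be sketched in a few lines. This is a minimal illustration only, not a tested recipe for Glassdoor: the user-agent strings are examples, the proxy addresses are hypothetical placeholders, and `request_profiles` is a name invented for this sketch. It makes no network calls; it only produces the header and proxy dictionaries an HTTP client would be given.

```python
from itertools import cycle

# Illustrative user-agent strings (assumption: any realistic browser strings work here).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

# Hypothetical proxy endpoints; replace with addresses from your own provider.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
]


def request_profiles(user_agents, proxies):
    """Yield (headers, proxy_map) pairs forever, rotating both lists in step."""
    ua_cycle = cycle(user_agents)
    proxy_cycle = cycle(proxies)
    while True:
        proxy = next(proxy_cycle)
        headers = {"User-Agent": next(ua_cycle)}
        # Same proxy for both schemes, in the mapping shape most HTTP clients accept.
        yield headers, {"http": proxy, "https": proxy}


if __name__ == "__main__":
    profiles = request_profiles(USER_AGENTS, PROXIES)
    for _ in range(3):
        headers, proxy_map = next(profiles)
        print(headers["User-Agent"][:40], "via", proxy_map["https"])
```

Each pair can be passed to whichever HTTP client is in use. Whether rotation alone is enough depends on the anti-bot systems in place, which this sketch does not address.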
Best Glassdoor Proxies
Proxy statistics and optimal proxy providers for scraping Glassdoor. Learn which proxy types work best, their success rates, and how to minimize bans with the right provider.
Glassdoor Anti-Bots
Anti-scraping systems used by Glassdoor to prevent web scraping. These systems can make it harder and more expensive to scrape the website but can be bypassed with the right tools and strategies.
Glassdoor Data
Explore the key data types available for scraping and alternative methods such as public APIs, to streamline your web data extraction process.
Data Types
No data types found
Public APIs
API Description
Glassdoor does not offer a public API for data extraction. This means that access to its public data such as company reviews, salary information, and job listings is limited. While the website provides a comprehensive database for job seekers and researchers, the lack of a public API restricts the capability of users to access this data on a large scale for purposes like market analysis, company review assessment, and job market trend evaluation.
Access Requirements
There are no access requirements as a public API is not available.
API Data Available
There is no API data available.
Why Do People Use Web Scraping?
The lack of a public API leaves web scraping as the primary method of extracting structured data from Glassdoor. This approach allows for the mining of company reviews, salary information, and job postings that are open to public view. However, scraping Glassdoor is generally against the platform's terms of service, which raises legal and ethical concerns. Any scraping activity should respect the site's rules and individual privacy rights. Ultimately, while scraping offers a workaround, it is not an officially supported or reliable means of accessing the data.
Glassdoor Web Scraping Legality
Understand the legal considerations before scraping Glassdoor. Review the website's robots.txt file, terms & conditions, and any past lawsuits to assess the risks. Ensure compliance with applicable laws and minimize the chances of legal action.
Legality Review
Glassdoor's robots.txt file and its Terms of Service (TOS) express a clear preference against web scraping and substantially constrain the areas open to crawling. While these documents set the website's expectations, they aren't absolute legal barriers to scraping clearly public sections. The prevailing legal view tends to tolerate scraping of publicly available web content, provided logins or other technical access controls aren't circumvented.
Actual legal risk generally surfaces from scraping content in authenticated areas where the user has explicitly agreed to the TOS, accessing personal data, or disregarding technical access controls. On Glassdoor, accessing user-generated content like reviews or salary information – considered private under the TOS – amplifies the risk, particularly when logged in. For publicly available sections, it's prudent for developers to adopt respectful crawling behavior, steer clear of legally protected areas, and manage any personal or copyrighted data responsibly.
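One concrete form of the "respectful crawling behavior" mentioned above is a minimum delay between requests. The following is a minimal sketch under stated assumptions: `Throttle` is a hypothetical helper written for this page, and the 2-second interval in the usage example is an arbitrary value, not a limit published by Glassdoor.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real time
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # assumed value, tune to the target site
    for page in range(3):
        throttle.wait()
        print("would fetch page", page)
```

Calling `wait()` before every request keeps the crawl rate bounded no matter how quickly the rest of the loop runs.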
Glassdoor Robots.txt
Does Glassdoor's robots.txt permit web scraping?
Summary
The robots.txt file for Glassdoor sets strict rules that substantially limit the areas of the website that can be accessed by general automated crawlers. Instructions such as Disallow: /Salaries, Disallow: /Jobs, Disallow: /Interview, and numerous others create extensive barriers that impede web scrapers from reaching a large number of critical sections on the site. Although these rules are designed to apply to all standard user agents, some exceptions are made for specific bots such as Googlebot and Bingbot, which have their own set of Allow directives.
Very few paths in the robots.txt file are explicitly allowed; examples include Allow: /sitemap.htm and Allow: /Content/ContentMap.htm. The file also contains references to sitemap locations. In practice, this setup places heavy limitations on general-purpose web scrapers while still permitting search engines to access the site for indexing. In summary, Glassdoor's robots.txt takes a restrictive stance on automated scraping and selectively allows access to certain whitelisted crawlers.
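The effect of these directives can be checked programmatically with Python's standard `urllib.robotparser`. The sketch below parses an illustrative excerpt built only from the directives quoted above, so it is not the full live file. The `Googlebot` group with `Allow: /` is a hypothetical stand-in for the per-bot exceptions the summary describes, and `my-crawler` is an invented agent name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt, NOT the complete live robots.txt.
# The Googlebot group is a hypothetical stand-in for the real per-bot rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /sitemap.htm
Allow: /Content/ContentMap.htm
Disallow: /Salaries
Disallow: /Jobs
Disallow: /Interview

User-agent: Googlebot
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, agent, url):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    base = "https://www.glassdoor.com"
    for agent, path in [
        ("my-crawler", "/Salaries/index.htm"),
        ("my-crawler", "/sitemap.htm"),
        ("Googlebot", "/Jobs/index.htm"),
    ]:
        print(agent, path, is_allowed(rules, agent, base + path))
```

To evaluate the real file, `RobotFileParser` also offers `set_url()` and `read()`, which download and parse the live robots.txt instead of a hard-coded excerpt.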
Glassdoor Terms & Conditions
Do Glassdoor's Terms & Conditions permit web scraping?
Summary
The terms of service for Glassdoor explicitly prohibit automated access and data extraction. The terms state:
"You may not use any robot, spider, scraper, data mining tools, data gathering and extraction tools, or other automated means to access the Services for any purpose without our express written permission."
This covers all scraping, crawling, or bot-driven collection across both public and logged-in parts of the site. While terms can apply differently depending on whether a user has explicitly agreed (for example, by creating an account), Glassdoor frames this restriction as universal.
Glassdoor does not offer a public API, and any attempts to bypass login gates, rate limits, or CAPTCHAs are likely to be treated as violations. The terms mention potential actions such as IP blocking, account suspension, and legal remedies, making unauthorized scraping generally high-risk without prior written permission.
Glassdoor Lawsuits
Legal Actions Against Scrapers: A history of lawsuits filed by the website owner against scrapers and related entities, highlighting legal disputes, claims, and outcomes.
Lawsuits Summary
Glassdoor has not been involved in any known legal disputes related to web scraping.
Glassdoor GitHub Repos
Find the best open-source scrapers for Glassdoor on GitHub. Clone them and start scraping straight away.
Sorry, there is no GitHub repo available.
Glassdoor Web Scraping Articles
Find the best web scraping articles for Glassdoor. Learn how to get started scraping Glassdoor.
Sorry, there is no article available.
Glassdoor Web Scraping Videos
Find the best web scraping videos for Glassdoor. Learn how to get started scraping Glassdoor.