Crawlers list: 17 common web crawlers in 2023 - GeekySameer (2024)

What will I learn?

What are Web crawlers or Web Spiders?
Types of Web Crawlers
List of web crawlers and their User-agents
1. GoogleBot
2. Bingbot
3. Slurpbot
4. DuckDuckBot
5. Baiduspider
6. Yandex Bot
7. MJ12bot
8. Sogou Spider
9. Exabot
10. Alexa crawler
11. Soso Spider
12. Pinterestbot
13. SemrushBot
14. Dotbot
15. AhrefsBot
16. Facebook external hit
17. archive.org_bot
Conclusion

What are Web crawlers or Web Spiders?

In simple terms, web spiders are bots (programs) that search engines use to collect details about your website and index it. They can read every kind of content, such as text, images, links on pages, and sitemaps. They browse websites automatically and gather the information needed to index them.

Here, we are sharing a list of all web crawlers used by the different search engines. This list will help you to make a better robots.txt file for your website by allowing or blocking the required user agents.

Types of Web Crawlers

SearchBots: These are the bots that search engines use to crawl websites, view images and links, and index them.

Here are some common SearchBots:- GoogleBot – used by Google, BingBot – used by Bing, SlurpBot – used by Yahoo, etc

CommercialBots: These are the bots used by SEO services to produce SEO reports for a particular website so that you can fix any SEO issues on the site. E.g. AhrefsBot, used by ahrefs.com; SemrushBot, used by semrush.com; etc.

Feed Fetchers Bots: These are the bots used to collect thumbnails and titles of the contents to display on their website. For e.g. Facebook external hit – Used by the Facebook website. Twitter bot – used by Twitter.

Monitoring Bots: These bots check the performance of websites, such as uptime, pingbacks, etc. E.g. WordPress (pingback), used by WordPress (not covered in this post).
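The four categories above can be approximated in code by looking for known tokens inside a request's User-Agent header. The sketch below is illustrative only: `classify_bot` and `BOT_CATEGORIES` are hypothetical names, the token lists are a small sample based on this article (the exact spellings `twitterbot` and `pingback` are assumptions), and User-Agent headers can be spoofed, so the result is a hint rather than proof of identity.

```python
# A small, illustrative sample of user-agent tokens per category.
# "twitterbot" and "pingback" are assumed spellings, not taken from vendor docs.
BOT_CATEGORIES = {
    "searchbot": ["googlebot", "bingbot", "slurp"],
    "commercialbot": ["ahrefsbot", "semrushbot"],
    "feed fetcher": ["facebookexternalhit", "twitterbot"],
    "monitoring bot": ["pingback"],
}


def classify_bot(user_agent: str) -> str:
    """Return the first category whose token appears in the User-Agent string."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return "unknown"


if __name__ == "__main__":
    print(classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(classify_bot("facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"))
    print(classify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Matching is case-insensitive because crawlers do not capitalise their tokens consistently (for example, `Googlebot` versus `bingbot`).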

List of web crawlers and their User-agents

1. GoogleBot

What is Googlebot?

Googlebot is the most active good bot. Google uses it to view the contents of your website and index them. It actively visits your website and goes through all your content.

User-Agent

Googlebot

User-Agent string

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Googlebot example in robots.txt

Below is an example showing how to prevent Googlebot from crawling your webpage https://example.com/exnoindex/donotindexthis.html

User-agent: Googlebot
Disallow: /exnoindex/donotindexthis.html

If you want to stop Googlebot from crawling your complete website, you can use the lines below in your robots.txt

User-agent: Googlebot
Disallow: /
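To sanity-check rules like these before deploying them, Python's standard-library `urllib.robotparser` can evaluate them locally. This is a rough sketch, not a reproduction of how Google itself interprets the file: `is_allowed` is a hypothetical helper, the parser is given the short user-agent token rather than the full string, and real crawlers may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The single-page rule from the example above, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /exnoindex/donotindexthis.html
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    blocked_page = "https://example.com/exnoindex/donotindexthis.html"
    # The listed page is blocked for Googlebot.
    print(is_allowed(ROBOTS_TXT, "Googlebot", blocked_page))
    # Other pages stay crawlable.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/"))
    # The group names Googlebot only, so other agents are unaffected.
    print(is_allowed(ROBOTS_TXT, "Bingbot", blocked_page))
```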

Apart from Googlebot, Google uses more than nine other user agents for different crawling purposes.

Below is the list of all web crawlers used by Google.

| User agent | Crawler details | Full user-agent string |
| --- | --- | --- |
| Mediapartners-Google | Used for Google AdSense | Mediapartners-Google |
| AdsBot-Google-Mobile | Used to show ads on mobile apps (Android) | Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML,like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) |
| AdsBot-Google-Mobile | Used to show ads on mobile apps (iPhone) | Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML,like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) |
| AdsBot-Google | Used to show ads on the web | AdsBot-Google (+http://www.google.com/adsbot.html) |
| Googlebot-Image, Googlebot | Used to crawl images from websites | Googlebot-Image/1.0 |
| Googlebot-News, Googlebot | Used to crawl news | In 2011, Google declared that Googlebot will be used to crawl News. However, Googlebot-News will still respect the robots.txt of the website. |
| Googlebot-Video, Googlebot | Used to index your videos from websites and YouTube | Googlebot-Video/1.0 |
| Google Favicon | Shows your favicon in Google search results | Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon |

You can find details of the remaining bots in Google's list of Googlebots.

2. Bingbot

What is Bingbot?

Bingbot is the web crawler Bing uses to crawl website content and images and index them in its search engine. It replaced MSNbot in 2010.

User-Agent

Bingbot

User-Agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Below is the list of all web crawlers used by Bing:

| User agent | Crawler details | Full user-agent string |
| --- | --- | --- |
| Bingbot | Used to crawl website contents | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| AdIdxBot | Used by Bing Ads. It crawls the ads and follows the links in them | Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| BingPreview | Used to generate previews of the website for Bing | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| MicrosoftPreview | Generates snapshots for Microsoft products | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) |

Bingbot example in robots.txt

Use the rule below in your robots.txt to prevent Bingbot from crawling a particular page

User-agent: Bingbot
Disallow: /exnoindex/donotindexthis.html

If you want to stop Bingbot from crawling your complete website, you can use the lines below in your robots.txt

User-agent: Bingbot
Disallow: /

You can use the Robots.txt tester to validate your robots.txt file. Find more detail about creating robots.txt for Bing.
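Before reaching for an online tester, a few common mistakes (a misspelled directive such as `Useragent`, or a path that does not start with `/`) can be caught with a small local check. The sketch below is a hypothetical helper, `lint_robots_txt`, not the Bing tester: the set of recognised directives is an assumption and deliberately small, and it only flags suspicious lines rather than validating the whole file.

```python
# Directives this sketch recognises; real crawlers may support others.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for suspicious robots.txt lines."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unknown directive '{field}'")
        elif name in {"allow", "disallow"} and value and not value.startswith(("/", "*")):
            warnings.append(f"line {number}: path '{value}' should start with '/'")
    return warnings


if __name__ == "__main__":
    sample = "Useragent: Bingbot\nDisallow: /exnoindex/donotindexthis.html\n"
    for warning in lint_robots_txt(sample):
        print(warning)
```

An empty list means the sketch found nothing suspicious; it does not guarantee that a given crawler will interpret the file as intended.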

3. Slurpbot

Slurp is a web crawler used by Yahoo. Yahoo gets its search results from Slurp and Bing web crawlers. While the majority of Yahoo results are powered by Bing, it is advised to allow Slurpbot to get your website to appear in Yahoo mobile search results.

Apart from search, Slurp also helps to collect content from sites and include them in sites like Yahoo News, Yahoo Finance, and Yahoo Sports.

User-agent

Slurp

User-Agent string

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Example of a robots.txt rule that allows Slurp to crawl the whole site:

User-agent: Slurp
Allow: /

Read more documentation on Slurp

4. DuckDuckBot

Similar to other search engines, DuckDuckGo uses a web crawler known as DuckDuckBot. DuckDuckGo has become quite a popular search engine because it doesn't track users and respects their privacy. DuckDuckGo respects robots.txt rules as well.

User-agent

DuckDuckBot

User-Agent string

DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)

Read more about DuckDuckBot

5. Baiduspider

As Google doesn’t operate in China, Baidu is the most used search engine there and Baiduspider is the official name of the crawler used by Baidu.

Like any other search engine crawler, Baiduspider visits your websites, reads your content, and indexes them based on relevancy.

User-agent

Baiduspider

User-Agent string

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

Just like Google and Bing, Baidu uses multiple bots for different content. List of all the crawlers of Baidu:

| User agent | Crawler details |
| --- | --- |
| Baiduspider-image | Baidu Image Search |
| Baiduspider | Baidu Web/Mobile Search |
| Baiduspider-video | Baidu Video Search |
| Baiduspider-cpro | Baidu Union Search |
| Baiduspider-news | Baidu News Search |
| Baiduspider-favo | Baidu Bookmark Search |
| Baiduspider-ads | Baidu Business Search |

Read more about Baidu Spider

6. Yandex Bot

Yandex Bot is the Yandex search engine crawler that visits your website and helps it get indexed in Yandex search results.

Yandex is the largest search engine in Russia. So if your target audience is in Russia or Russian-speaking countries, you probably don't want to block Yandex.

User-agent

YandexBot

User-Agent string

Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

| User agent | Crawler details | Full user-agent string | Follows robots.txt? |
| --- | --- | --- | --- |
| YandexAccessibilityBot | Downloads pages to check their accessibility for users. | Mozilla/5.0 (compatible; YandexAccessibilityBot/3.0; +http://yandex.com/bots) | No |
| YandexAdNet | The Yandex advertising network robot. | Mozilla/5.0 (compatible; YandexAdNet/1.0; +http://yandex.com/bots) | Yes |
| YandexBlogs | The blog search robot that indexes post comments. | Mozilla/5.0 (compatible; YandexBlogs/0.99; robot; +http://yandex.com/bots) | Yes |
| YandexBot | Detects site mirrors. | Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots) | Yes |
| YandexFavicons | Downloads the site's favicon file to display in search results. | Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots) | No |

Apart from this, there are many bots that Yandex uses.

7. MJ12bot

MJ12bot is the web crawler for Majestic, a UK-based search engine that operates in 13 languages in 60+ countries and powers hundreds of thousands of businesses to get their websites online.

It respects robots.txt.

User-agent

MJ12bot

8. Sogou Spider

Sogou is a Chinese search engine that was launched in 2004 and had an Alexa rank of 121 as of 2010. It powers 10 billion web pages. Sogou Spider is the web crawler used by Sogou.com to read website contents and index them.

User-agent

Sogou web spider

User-Agent string

Sogou Web Spider mobile user agent

MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 4.4.2; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 (compatible;Sogou web spider/4.0; +http://www.sogou.com/docs/help/webmasters.htm#07)

Sogou Web Spider desktop user agent

Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

9. Exabot

Exabot is the web crawler used by Exalead. It collects data from all around the world and supplies it to search engines. The data Exabot collects is added to Exalead's main index and is thereby included in Exalead's search results.

User-agent

Exabot

Example of robots.txt to prevent crawling of pages in a particular directory (for example, /football/):

User-agent: Exabot
Disallow: /football/
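A directory rule is matched against the start of the URL path, which is why the value should begin with `/`. The sketch below illustrates that prefix behaviour with Python's standard `urllib.robotparser`; it assumes the directory is served at `/football/` on the placeholder host example.com, and it approximates, rather than reproduces, Exabot's own matching logic. `can_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Directory rule written with a leading and trailing slash.
RULES = "User-agent: Exabot\nDisallow: /football/\n"


def can_crawl(rules: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text for one user-agent token and one URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        "https://example.com/football/scores.html",  # inside the directory
        "https://example.com/football/",             # the directory itself
        "https://example.com/footballers.html",      # shares letters, not the prefix
        "https://example.com/tennis/",               # unrelated directory
    ]
    for url in urls:
        print(url, can_crawl(RULES, "Exabot", url))
```

The trailing slash keeps the rule from also matching unrelated paths that merely start with the same letters, such as `/footballers.html`.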

10. Alexa crawler

Alexa was retired on May 1, 2022. Alexa was an American web traffic analysis company owned by Amazon. Its key metric, popularly known as the Alexa rank, was based on the estimated number of daily visitors to a website.

11. Soso Spider

Soso Spider is an automated web crawler for the Soso search engine owned by Tencent Holdings Limited, famous for QQ. Soso is the 13th most visited website in China and the 36th in the world, with over 20m page views daily.

User-agent

Sosospider
Sosospider+

User-Agent string

Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)

12. Pinterestbot

Pinterestbot is a crawler used by Pinterest to download images of products from your website’s catalog. It also downloads metadata of the products including price, availability, and description.

It also checks the authenticity of the website under pin pictures.

User-agent

Pinterestbot

User-Agent string

Pinterest/0.2 (+https://www.pinterest.com/bot.html)

Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)

You can restrict Pinterest from crawling your site by using the rule below in robots.txt

User-agent: Pinterestbot
Disallow: /

PinterestBot respects robots.txt rules.

13. SemrushBot

SemrushBot is the search bot that Semrush uses to collect SEO data from your sites and use it for analytics, including on-page SEO, backlinks, content analysis, and more.

It constantly crawls your websites to keep its data up to date. If you do not use any Semrush tools and do not intend to use them in the future, it is wise to block this bot.

Semrush uses different bots for different tools:

| User agent | Crawler details |
| --- | --- |
| SiteAuditBot | To find different SEO and technical issues. |
| SemrushBot-BA | For the backlink audit tool. |
| SemrushBot-SI | On-Page SEO Checker tool and similar tools. |
| SemrushBot-SWA | Checking URLs on your site for the SWA tool. |
| SemrushBot-CT | Content Analyzer and Post Tracking tools |
| SplitSignalBot | SplitSignal tool |
| SemrushBot-COUB | Content Outline Builder tool |

Semrush follows robots.txt rules, so you can block these crawlers by adding rules to your robots.txt file:

User-agent: SemrushBot
Disallow: /
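Because Semrush uses several user agents, writing one group per bot by hand is tedious. The sketch below generates the rules from a list of tokens. It is only an illustration: `build_block_rules` is a hypothetical helper, the token list is copied from this section, and whether a single `SemrushBot` group already covers the tool-specific bots is something to confirm in Semrush's own documentation.

```python
# User-agent tokens named in this section (main bot plus tool-specific bots).
SEMRUSH_AGENTS = [
    "SemrushBot",
    "SiteAuditBot",
    "SemrushBot-BA",
    "SemrushBot-SI",
    "SemrushBot-SWA",
    "SemrushBot-CT",
    "SplitSignalBot",
    "SemrushBot-COUB",
]


def build_block_rules(agents):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(SEMRUSH_AGENTS))
```

The same helper works for any other list of tokens from this article, for example `["dotbot", "AhrefsBot"]`.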

14. Dotbot

Similar to Semrush, Moz uses Dotbot to find SEO and technical issues on a website. Moz is an SEO tool used for keyword research, backlink finding, and more.

Data collected by Dotbot can be accessed only through a Moz Pro account, so if you ever plan to use a Moz Pro membership, you can allow Dotbot to crawl your site. Otherwise, simply block it to save your bandwidth.

User-agent

dotbot

Block Moz from crawling your site:

User-agent: dotbot
Disallow: /

15. AhrefsBot

Ahrefs is likewise a marketing tool used for link building and website SEO audits. AhrefsBot is used to scrape your website data and provide you with audit reports, including technical issues on your website. These reports are then used to improve your website's SEO and more.

Again, if you are not planning to use the Ahrefs marketing tool, you can block its bot:

User-agent

AhrefsBot

Block Ahrefs bot from crawling your site:

User-agent: AhrefsBot
Disallow: /

Find more detail on Ahrefsbot

16. Facebook external hit

Facebook external hit is the web crawler used by Facebook to gather metadata such as thumbnails, titles, and descriptions of a post. Whenever you paste a link from a website into FB, the FB crawler hits the website and collects metadata to show to FB users.

You should not block this bot if you plan to share your posts on FB.

User-agent

facebookexternalhit

User-Agent string

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

Example in robots.txt:

User-agent: facebookexternalhit
Disallow: /

Read more on the Facebook crawler.

17. archive.org_bot

The Wayback Machine, run by the Internet Archive, saves a copy of your website in its database of around 150 billion web pages. It uses archive.org_bot to take snapshots of web pages, books, and other online content, which are then stored and can be accessed by anyone through its website.

I personally block this bot.

User-agent

archive.org_bot

Example in robots.txt

User-agent: archive.org_bot
Disallow: /

Conclusion

With this, we have come to the end of our web crawler list. I hope this list will help you to allow the user agents you need and block the ones that use up your bandwidth and provide no value to you.

You should now be able to distinguish between good and bad bots. This list will help you to design a better robots.txt file for your website.
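To decide which of these bots are worth blocking, it helps to see which ones actually visit your site. The sketch below tallies requests per crawler from web server access-log lines. It rests on several assumptions: the log uses the common "combined" format, where the User-Agent is the last double-quoted field; `count_crawler_hits` is a hypothetical helper; the sample lines are made up for illustration; and User-Agent values can be spoofed, so the counts are indicative only.

```python
import re
from collections import Counter

# User-agent tokens for the crawlers listed in this article.
KNOWN_CRAWLERS = [
    "Googlebot", "bingbot", "Slurp", "DuckDuckBot", "Baiduspider", "YandexBot",
    "MJ12bot", "Sogou web spider", "Exabot", "Sosospider", "Pinterestbot",
    "SemrushBot", "dotbot", "AhrefsBot", "facebookexternalhit", "archive.org_bot",
]

# Assumption: combined log format, so the User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines, reusing user-agent strings quoted in this article.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 1200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /post HTTP/1.1" 200 900 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '192.0.2.9 - - [10/Oct/2023:13:57:12 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def count_crawler_hits(log_lines):
    """Count requests per known crawler, matching tokens case-insensitively."""
    hits = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # line is not in the assumed format
        user_agent = match.group(1).lower()
        for name in KNOWN_CRAWLERS:
            if name.lower() in user_agent:
                hits[name] += 1
                break  # count each request once
    return hits


if __name__ == "__main__":
    for name, count in count_crawler_hits(SAMPLE_LOG).most_common():
        print(name, count)
```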

Article information

Author: Kieth Sipes
