What is crawl budget and should SEOs be concerned about it?
Crawl budget is how fast and how many pages a search engine wants to crawl on your site. It's determined by the amount of resources a crawler wants to spend on your site and the amount of crawling your server can support.
More crawling doesn't mean you'll rank better. However, if your pages aren't crawled and indexed, they won't rank at all.
Most websites don't have to worry about the crawl budget, but there are a few times when you might want to take a look. Let's look at some of these cases.
When should you worry about the crawl budget?
Typically, you don't have to worry about crawl budget for popular pages. It's usually pages that are newer, poorly linked, or don't change much that aren't crawled often.
Crawl budget can be an issue with newer sites, especially those with many pages. Your server may be able to support more crawling. However, because your website is new and probably not very popular yet, a search engine may not want to crawl your website very often. This is mostly a mismatch in expectations. You want your pages to be crawled and indexed, but Google doesn't know if your pages are worth indexing and may not want to crawl as many pages as you want.
Crawl budget can also be an issue for larger websites with millions of pages or websites that are updated frequently. In general, if lots of pages aren't being crawled or updated as often as you'd like, you may want to look into speeding up crawling. We'll talk about how to do that later in this article.
How to check crawl activity
If you want to get an overview of Google's crawling activity and the issues it has encountered, the crawl stats report in Google Search Console is the best place to look.
Here you will find various reports that can help you identify changes in crawling behavior, problems with crawling and more information about how Google is crawling your website.
You'll definitely want to investigate any flagged crawl statuses like the ones shown here:
There are also timestamps of when pages were last crawled.
If you want to see hits from all bots and users, you need access to your log files. Depending on your hosting and setup, you may have access to tools like Awstats and Webalizer, as shown here on a shared host with cPanel. These tools show some aggregated data from your log files.
For more complex setups, you'll need to access and store data from the raw log files, possibly from multiple sources. You may also need specialized tools for larger projects, such as an ELK stack (Elasticsearch, Logstash, Kibana), which allows for the storage, processing, and visualization of log files. There are also log analysis tools like Splunk.
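Before reaching for a full log stack, you can get a rough view of crawler activity with a short script. This is a minimal sketch, assuming your server writes the common combined log format; the sample lines and parsing are illustrative, not a substitute for a proper log tool.

```python
import re
from collections import Counter

# Matches the combined access log format:
# IP ident user [date:time zone] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def googlebot_hits_per_day(lines):
    """Count lines whose user-agent claims to be Googlebot, grouped by day."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Googlebot" in m.group(2):
            hits[m.group(1)] += 1
    return hits
```

Note that anyone can send a fake Googlebot user-agent, so for anything serious you'd also want to verify the requesting IPs.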
What counts against the crawl budget?
These URLs are found by crawling and parsing pages, or from a variety of other sources including sitemaps, RSS feeds, URLs submitted for indexing in Google Search Console, and the indexing API.
There are also several Googlebots that share the crawl budget. For a list of the various Googlebots crawling your website, see the Crawl Stats report in GSC.
Google adjusts how they crawl
Each website has a different crawl budget made up of a few different inputs.
Crawl demand is simply how much Google wants to crawl your website. More popular pages and pages that change significantly are crawled more often.
Popular pages or those with more links generally take precedence over other pages. Remember, Google needs to prioritize your pages in some way for crawling. Links are an easy way to find out which pages on your website are more popular. However, it's not just your website, but all of the pages on all of the websites on the internet that Google needs to figure out how to prioritize.
You can use the Best by links report in Site Explorer to get an idea of which pages are likely to be crawled more often. It also shows you when Ahrefs last crawled your pages.
There is also a concept of staleness. If Google sees that a page isn't changing, it will crawl the page less often. For example, if Google crawls a page and sees no changes after a day, it might wait three days before crawling again, ten days the next time, then 30 days, 100 days, and so on. There's no set interval between crawls, but crawling becomes less frequent over time. However, if Google detects large changes across the site or a site move, the crawl rate will usually increase, at least temporarily.
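The staleness idea above can be pictured as a simple back-off schedule. This is a toy illustration, not Google's actual algorithm; the factor and cap are made-up numbers.

```python
# Toy model: the interval between crawls grows while a page shows no
# changes, and resets when a change is detected. Factor and cap are
# arbitrary assumptions for illustration.
def next_interval(current_days, page_changed, factor=3, max_days=100):
    if page_changed:
        return 1  # recently changed pages get crawled often again
    return min(current_days * factor, max_days)

interval = 1
history = []
for changed in [False, False, False, False]:
    interval = next_interval(interval, changed)
    history.append(interval)
# intervals between crawls keep growing: 3, 9, 27, 81 days
```

A single detected change would drop the interval back to one day in this model, mirroring how significant changes temporarily increase the crawl rate.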
Crawl rate limit
The crawl rate limit is how much crawling your website can support. Websites can handle a certain amount of crawling before experiencing server stability issues such as slowdowns or errors. Most crawlers will back off when they see these issues so they don't harm the site.
Google adjusts based on the crawl health of the website. If the site can handle more crawling, the limit will increase. If the site is having problems, Google will slow down how fast it crawls.
I want Google to crawl faster
There are a few things you can do to make sure your website can support additional crawling and to increase your website's crawl demand. Let's look at some of these options.
Speed up your server / increase resources
Essentially, the way Google crawls pages is to download resources and then process them on its end. This isn't quite the same as page speed as a user perceives it. What affects crawl budget is how quickly Google can connect and download resources, which has more to do with your server and resource sizes.
More links, external & internal
Remember, crawl demand is generally based on popularity or links. You can increase your budget by increasing the number of external and/or internal links. Internal links are easier since you control your site. For suggested internal links, see the Link opportunities report in Site Audit, which also includes a tutorial explaining how it works.
Fix broken and redirected links
Leaving links to broken or redirected pages on your site will only have a small impact on the crawl budget. The pages linked this way usually have fairly low priority, since they probably haven't changed in a while. Still, cleaning up any issues is good for website maintenance in general and will help your crawl budget a bit.
You can easily find broken (4xx) and redirected (3xx) links on your website in the Internal pages report in Site Audit.
Check for broken or redirected links in your sitemap in the All issues report under "3XX redirect in sitemap" and "4XX page in sitemap".
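If you'd rather check your sitemap yourself, the core of such an audit is small: pull the `<loc>` URLs out of the sitemap XML and classify the status code each one returns. This sketch leaves out the actual fetching (plug in urllib or any HTTP client); the function names are my own.

```python
import xml.etree.ElementTree as ET

# Sitemaps use this XML namespace, so element lookups must include it.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

def classify(status_code):
    """Label a fetched URL the way a sitemap audit would flag it."""
    if 300 <= status_code < 400:
        return "3XX redirect in sitemap"
    if 400 <= status_code < 500:
        return "4XX page in sitemap"
    return "ok"
```

Sitemaps should only contain URLs that return 200, so anything flagged here is worth fixing or removing.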
Use GET instead of POST where you can
This one is a bit more technical, as it involves HTTP request methods. Don't use POST requests where GET requests work. It's basically GET (pull) vs. POST (push). POST requests aren't cached, so they affect the crawl budget, while GET requests can be cached.
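To see why the method matters, here's a toy cache: it stores GET responses and reuses them, while POST responses have to be fetched fresh every time. This is a simplified model of HTTP caching, not a real client.

```python
# Toy model: only GET responses are cacheable, so repeated GETs hit the
# cache while repeated POSTs always trigger a new fetch.
class TinyCache:
    def __init__(self, fetch):
        self.fetch = fetch       # function (method, url) -> response body
        self.store = {}
        self.fetch_count = 0     # how many real fetches happened

    def request(self, method, url):
        if method == "GET" and url in self.store:
            return self.store[url]       # cache hit: no new fetch
        self.fetch_count += 1
        body = self.fetch(method, url)
        if method == "GET":
            self.store[url] = body       # only GET responses are stored
        return body
```

Every POST costs a full round trip to the origin server, which is exactly the kind of work that eats into crawl budget.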
Use the Indexing API
If you need pages to be crawled faster, check whether you're eligible for Google's Indexing API. Currently, it's only available for a few use cases, such as job postings or live videos.
Bing also has an Indexing API that's available to everyone.
What doesn't work
There are a few things people sometimes try that don't actually help their crawl budget.
- Small changes to pages. Making small changes to pages, such as updating dates, spaces, or punctuation, in the hope that pages will be crawled more often. Google is pretty good at determining whether changes are meaningful, so these small changes are unlikely to have any impact on crawling.
- Crawl-delay directive in robots.txt. This directive will slow down many bots, but Googlebot doesn't use it, so it has no effect on Google. We do respect it at Ahrefs, so if you ever need to slow down our crawling, you can add a crawl-delay to your robots.txt file.
- Removing third-party scripts. Third-party scripts don't count against your crawl budget, so removing them won't help.
- Nofollow. Okay, this one is iffy. In the past, nofollow links wouldn't have used crawl budget. However, nofollow is now treated as a hint, so Google may crawl these links anyway.
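As a side note on the crawl-delay point above, a robots.txt fragment like the following would slow down AhrefsBot but do nothing for Googlebot. The delay value is just an example; for Google, use the rate limiter in Search Console instead.

```
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: Googlebot
# Googlebot ignores Crawl-delay; use the Search Console rate limiter instead.
Disallow:
```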
I want Google to crawl slower
There are only a couple of good ways to make Google crawl slower. There are a few other adjustments you could technically make, such as slowing down your website, but I wouldn't recommend those methods.
Slow adjustment, but guaranteed
The main control Google gives us for crawling slower is a rate limiter within Google Search Console. You can use the tool to slow down the crawl rate, but it can take up to two days to take effect.
Fast adjustment, but with risks
If you need a faster solution, you can take advantage of Google's crawl rate adjustments related to the health of your website. If you serve Googlebot a 503 Service Unavailable or 429 Too Many Requests status code on pages, it will start crawling more slowly or may temporarily stop crawling. You don't want to do this for more than a few days, though, or pages may start dropping out of the index.
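At the server level, the decision logic boils down to something like the following sketch. The function name, the overload check, and the user-agent test are my own assumptions; real bot detection should also verify IPs.

```python
# Hypothetical sketch: serve 503 to Googlebot while the server recovers.
def response_status(user_agent, server_overloaded):
    is_googlebot = "Googlebot" in user_agent
    if server_overloaded and is_googlebot:
        # 503 (or 429) signals Googlebot to slow down or pause crawling.
        # Don't keep this up for more than a couple of days, or pages
        # may start dropping out of the index.
        return 503
    return 200  # normal traffic is served as usual
```

Regular visitors still get a 200 here; only the crawler is throttled, and only while the server is actually struggling.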
I want to reiterate that most people don't have to worry about the crawl budget. If you have any concerns, I hope this guide has been helpful.
I usually only look into it when there are problems with pages not being crawled and indexed, when I need to explain why someone shouldn't be concerned about it, or when I happen to see something that concerns me in the crawl stats report in Google Search Console.
Have any questions? Let me know on Twitter.