Understanding Common Crawl Errors

Search Engine Visibility V1 crawls your site using a Web crawler that makes HTTP requests to the server that hosts your site.

Sometimes Search Engine Visibility V1 cannot complete the crawl. The following sections describe the most common reasons a crawl results in an error.

Broken Home Page

Your home page returns an HTTP status other than 200. A 200 status means the request succeeded; any other final status means something went wrong. This error indicates that most search engines can't index the home page's content or follow its links. Redirects are OK as long as they end in a 200 status.

Home Page Infinite Redirect

Your home page loops indefinitely in a redirect. Most search engines consider pages like this broken because they can't get to valid content. Some common causes of this error are:

  • Your site requires cookies.
  • Your site has incorrectly configured redirects (e.g. http://coolexample.com/ redirects to http://coolexample.com/a/, but then http://coolexample.com/a/ redirects back to http://coolexample.com/). Be sure that all redirects end in a 200 status.
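
To check a redirect chain yourself, one option is to follow each hop and stop as soon as a URL repeats. The following is a minimal sketch, not the crawler's actual logic: the `fetch` callable, the `follow_redirects` name, and the 10-hop cap are all assumptions made for illustration.

```python
from urllib.parse import urljoin

MAX_HOPS = 10  # assumed cap; the crawler's real limit is not documented here
REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def follow_redirects(start_url, fetch, max_hops=MAX_HOPS):
    """Follow redirects from start_url and report how the chain ends.

    `fetch` is a hypothetical callable that takes a URL and returns
    (status, location), where location is the Location header or None.
    Returns (outcome, chain); outcome is "ok", "loop", "broken",
    or "too_many_hops".
    """
    seen = set()
    chain = []
    url = start_url
    for _ in range(max_hops + 1):
        if url in seen:
            # We have been here before, so the chain can never reach a 200.
            chain.append(url)
            return "loop", chain
        seen.add(url)
        chain.append(url)
        status, location = fetch(url)
        if status == 200:
            return "ok", chain
        if status in REDIRECT_STATUSES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return "broken", chain
    return "too_many_hops", chain
```

Passing `fetch` in as a parameter keeps the loop check separate from the networking, so you can try it against a dictionary of canned responses before pointing it at a real site.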

Home Page's Content isn't HTML

Your home page is not identifying itself as an HTML page: the value of its Content-Type HTTP response header does not start with text/html or application/xhtml. Most search engines only index and follow links on HTML pages.
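
The header check described above can be sketched in a few lines. The function name is hypothetical, and the case-insensitive comparison is an assumption (media types are case-insensitive in HTTP, but this article does not say how the crawler compares them).

```python
HTML_PREFIXES = ("text/html", "application/xhtml")


def is_html_content_type(content_type):
    """Return True if a Content-Type header value looks like an HTML page."""
    if not content_type:
        # A missing or empty header does not identify the page as HTML.
        return False
    # Normalize whitespace and case, then apply the prefix rule.
    return content_type.strip().lower().startswith(HTML_PREFIXES)
```

For example, `is_html_content_type("text/html; charset=utf-8")` is true, while `is_html_content_type("application/json")` is false.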

Invalid Root URL

Your root URL's format is invalid. Most search engines will not crawl it. Some common causes are:

  • The URL contains invalid characters.
  • The URL is longer than 500 characters.
  • The URL contains a username or password (e.g. http://username@coolexample.com/).
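
The three causes above can be checked before you submit a URL. This is a rough sketch under stated assumptions: the function name and problem labels are made up for illustration, and "invalid characters" is interpreted here as anything outside the RFC 3986 character set, which may not match the crawler's own rules exactly.

```python
import string
from urllib.parse import urlsplit

MAX_URL_LENGTH = 500  # limit taken from the list above
# Assumption: "valid" means the RFC 3986 unreserved and reserved sets plus "%".
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def root_url_problems(url):
    """Return a list of labels describing why a root URL may be rejected."""
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("too_long")
    if any(ch not in ALLOWED_CHARS for ch in url):
        problems.append("invalid_characters")
    try:
        parts = urlsplit(url)
        has_credentials = parts.username is not None or parts.password is not None
    except ValueError:
        # urlsplit rejects some malformed inputs, such as unbalanced brackets.
        problems.append("unparseable")
        return problems
    if has_credentials:
        problems.append("has_credentials")
    return problems
```

An empty list means none of the three listed causes applies; it does not guarantee the crawl will succeed.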

No Response from Server

We didn't receive a response from your server when we requested your home page. As a result, search engines can't crawl your site. Some common causes are:

  • You set up a new domain for hosting within the last 48 hours and DNS is still propagating. Once your DNS propagates, we recommend recrawling your website with Search Engine Visibility V1.
  • The URL and/or domain does not exist.

Your robots.txt File is Blocking Search Engine Visibility V1

Your robots.txt file explicitly blocks Search Engine Visibility V1 from crawling your home page, your home page is redirecting to a page that is blocked, or your home page's content location is blocked.
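
You can test your robots.txt rules locally with Python's standard `urllib.robotparser` module. In this sketch the user-agent string `ExampleCrawler` is a placeholder, since this article does not state the name the crawler sends; substitute the real one. The helper name `first_blocked_url` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser


def first_blocked_url(robots_txt, user_agent, urls):
    """Return the first URL in `urls` that robots.txt blocks, or None.

    Pass the home page followed by any redirect targets, because a block
    on any of them prevents the crawl.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            return url
    return None


# "ExampleCrawler" is a placeholder user agent, not the crawler's real name.
rules = "User-agent: ExampleCrawler\nDisallow: /landing/\n"
blocked = first_blocked_url(
    rules,
    "ExampleCrawler",
    ["http://coolexample.com/", "http://coolexample.com/landing/"],
)
```

Here the home page itself is allowed, but the redirect target under /landing/ is blocked, which is the second situation described above.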

HTTP Protocol Violation

Your server violated the HTTP protocol, so we could not properly crawl your home page. As a result, search engines can't crawl your site.

Crawl Timed Out

We were unable to complete your site crawl within a four-hour period. Search engines can't crawl your site properly, which means many of your pages aren't being crawled or indexed. Some common causes are:

  • Your site can't respond to concurrent HTTP requests.
  • HTTP requests to your pages repeatedly time out.
  • HTTP requests to your pages repeatedly return 502, 503, or 504 HTTP statuses.
