Page indexing report - why Google does not index some of your pages

Imagine pouring hours of effort into crafting compelling content and optimizing your website, only to discover that Google has overlooked some of your pages during its crawling process.

It's a common frustration among website owners and e-commerce store managers alike. When Google crawls your website, it evaluates each page to determine its relevance and quality. However, not all pages make the cut for indexing, leaving them invisible to potential visitors searching on Google.

You can find pages left unindexed by Google in the Page indexing report, a valuable tool within Google Search Console. This report acts as a window into Google's indexing decisions, showing which pages are left out and why. It's like peering into Google's mindset and understanding its criteria for indexing or omitting pages from search results.

In this article, we will look at the common pitfalls that hinder Google from showcasing your pages on its search results page. From technical glitches to content issues, we'll clarify each error and offer actionable insights to rectify them.

Not indexed reasons

Below is a list of the most common reasons for pages not being indexed, as shown in the Page indexing report in Google Search Console.

Redirect error
"URL blocked by robots.txt" and "Indexed, though blocked by robots.txt"
URL marked ‘noindex’
Not found (404)
Soft 404
Blocked due to unauthorized request (401)
Blocked due to access forbidden (403)
Crawled - currently not indexed
Discovered - currently not indexed
Alternate page with proper canonical tag
Duplicate without user-selected canonical
Duplicate, Google chose different canonical than user
Blocked by page removal tool
Server error (5xx)

Redirect error:

Reason: This might be due to misconfigurations or problems with your redirection setup. Google might have encountered one of the following redirect errors:

A redirect chain that was too long
A redirect loop
A redirect URL that exceeded the max URL length
A bad or empty URL in the redirect chain

How to resolve:

Check redirects: Review your redirect configurations to ensure they are correctly set up. In Shopify, you can find your redirects under Online Store -> Navigation -> View URL Redirects (including the ones created by Smart SEO).

Note: Some third-party apps also provide redirect functionality. For example, a translation app may add language redirects, so check those third-party apps as well.
Fix misconfigurations: Correct any errors in your redirect setup so that each redirect sends users and search engines to the intended destination and no redirect loop occurs.
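
To make the redirect problems above more concrete, here is a minimal Python sketch that walks a list of redirects and flags loops, overly long chains, and empty targets. The follow_redirects function, the redirects dictionary, and the max_hops limit of 10 are all assumptions made for illustration; this is not how Shopify or Google stores or checks redirects.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow `start` through a {source: target} redirect map.

    Returns (status, chain), where status is one of
    "ok", "loop", "too_long", or "bad_url".
    max_hops is an illustrative limit, not an official one.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        target = redirects[current]
        if not target:
            # Empty or missing target in the chain.
            return "bad_url", chain
        chain.append(target)
        if target in seen:
            # We have already visited this URL: redirect loop.
            return "loop", chain
        seen.add(target)
        current = target
        if len(chain) - 1 > max_hops:
            return "too_long", chain
    return "ok", chain


# Hypothetical redirects, similar to what you might export from your store.
example = {"/a": "/b", "/b": "/a", "/old": "/new", "/x": ""}
print(follow_redirects("/a", example))    # loop: /a -> /b -> /a
print(follow_redirects("/old", example))  # a healthy single redirect
```

Running a check like this over an export of your redirects can help you spot the entry that needs fixing before Google reports it.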

"URL blocked by robots.txt" and "Indexed, though blocked by robots.txt":

Reason: When a URL is blocked by the robots.txt file, it means you've instructed search engines not to crawl that specific page or resource. In such cases, the URL is reported as "URL blocked by robots.txt" in the Page indexing report.

Blocking a page in the robots.txt file does not guarantee that Google will not index it, though, because the file controls crawling rather than indexing. If Google has indexed a page that is blocked in the robots.txt file (for example, because other pages link to it), you might get "Indexed, though blocked by robots.txt" instead.

Note: Every Shopify store has a default robots.txt that you can access at domain.com/robots.txt (where domain.com is the domain name of your store).

How to resolve:

Review robots.txt file: Check your robots.txt file to identify which URLs or directories are being blocked. You can check with the Shopify support team whether your robots.txt file has been overridden and how to modify it.
Adjust rules: Modify the robots.txt rules to allow crawling of the desired URLs if they are unintentionally blocked.
Validate your robots.txt file: Use the robots.txt report in Google Search Console to confirm that Google can fetch and parse the file, and the URL Inspection tool to check whether a specific URL is blocked.

Note that some of the pages blocked by the robots.txt file are not meant to be indexed, which is why Shopify blocks search engines from crawling them out of the box. Examples are pages containing sensitive or user-specific information, such as the cart page, checkout page, my account page, etc.
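
If you want to check a handful of URLs against a set of robots.txt rules yourself, a short Python sketch using the standard library's urllib.robotparser can help. The SAMPLE_ROBOTS_TXT rules and the is_blocked helper below are assumptions for illustration only; they are not a copy of your store's robots.txt, so paste in the content of your own domain.com/robots.txt when trying this.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not your store's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
"""


def is_blocked(url, robots_txt=SAMPLE_ROBOTS_TXT, user_agent="Googlebot"):
    """Return True if the given rules disallow crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked("https://domain.com/cart"))
print(is_blocked("https://domain.com/collections/woman"))
```

Keep in mind that Python's parser follows the basic robots.txt rules and may not match Google's handling of every wildcard pattern, so treat the result as a quick check rather than a final answer.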

URL marked ‘noindex’:

Reason: A 'noindex' directive on a URL signals to search engines that the content on that page should not be included in search results.

Just like with the robots.txt file, Shopify automatically creates a sitemap for each store (accessible at domain.com/sitemap.xml, where domain.com is your domain name).

If 'noindex' is intentional, confirm that it's necessary. If not, remove it to allow indexing.

How to resolve:

Include the page in the Smart SEO sitemap feature: If you are using the Sitemap feature in Smart SEO, make sure that the page in question is not excluded from the sitemap. Here is our documentation article on how to manage your Sitemap with Smart SEO.
Check for noindex/nofollow meta tags in your theme code: In our experience, a lot of Shopify merchants (or their developers) add index/noindex and follow/nofollow meta tags to their theme code manually. If the first step does not resolve the issue, check your theme code for such meta tags or contact our support for further advice.
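
To see whether a page actually carries a 'noindex' meta tag, you can save its HTML source and scan it. The following is a minimal Python sketch using only the standard library; the RobotsMetaFinder class and has_noindex function are hypothetical names introduced for this example, and the sketch only looks at meta tags in the HTML, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> / "googlebot" tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(has_noindex(page))
```

If this returns True for a page you want indexed, search your theme code for the tag that produces it.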

Not found (404):

Reason: A "404 Not Found" error occurs when the requested page is not available. This may happen due to broken links or incorrect URL paths.

How to resolve:

Fix the broken links: Fix any broken links pointing to the non-existent page. This can be done through the Broken Links feature in Smart SEO. Here is our documentation on how to use the Broken Links feature in Smart SEO to fix broken links.

Soft 404:

Reason: A soft 404 occurs when a page returns a '200 OK' status code, indicating that the page exists, but the content suggests it's a "Not Found" page or a page with no content.

A common scenario in the Shopify context involves collection pages. Let's say you have a "Woman" collection in your store at the URL:

https://domain.com/collections/woman

The collection is paginated, and let's say your theme is set to display 16 products per page, which is the default for most themes. If you have 17 products in that collection, the last product will be displayed on a separate page with its own URL:

https://domain.com/collections/woman?page=2

Shopify automatically adds the ?page=2 part to the URL for you. In this example, there are two ways you might end up with a "Soft 404" error:

You change the number of products per page (let's say to 20), so that the 2nd page no longer has any products
You remove or temporarily disable some products in the collection, so that the 2nd page no longer has any products

Since Google may already know about this 2nd page, it will check the URL again. If either of the two conditions above applies, it will find only the generic "No products found" message on the ?page=2 URL, resulting in a "Soft 404" error.

The exact wording of the message may differ from theme to theme, but Google is smart enough to read it as a sign that the page has no content. The page itself exists and does not return a 404 status, which is why it is reported as a "Soft 404" rather than "Not found (404)".

How to resolve:

Review content: Check that the page has real content and does not resemble a "Not Found" page or an empty page like the one in the scenario above.
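
The pagination arithmetic behind the scenario above can be sketched in a few lines of Python. The last_page and is_empty_page functions are hypothetical helpers written for this example; they only illustrate when a ?page=N URL ends up with no products and are not part of Shopify.

```python
import math


def last_page(product_count, per_page):
    """Number of the last page that still shows at least one product."""
    return max(1, math.ceil(product_count / per_page))


def is_empty_page(page, product_count, per_page):
    """True if ?page=<page> would show no products (a soft 404 candidate)."""
    return page > last_page(product_count, per_page)


# 17 products at 16 per page: page 2 holds the 17th product.
print(is_empty_page(2, product_count=17, per_page=16))
# Same collection at 20 per page: page 2 is now empty.
print(is_empty_page(2, product_count=17, per_page=20))
```

The same check explains the second case: removing one product (16 products at 16 per page) also leaves page 2 empty.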

Blocked due to unauthorized request (401):

Reason: When a page returns a '401 Unauthorized' status code, it means access to the page requires authentication, and Googlebot isn't authenticated to view the content.

How to resolve:

Provide access: Googlebot does not log in with credentials, so if the page should be indexed, let verified Googlebot requests reach it without authentication.
Publicly accessible content: If the content is intended for public access, remove authentication requirements or provide an alternative version of the content accessible to search engines.

Blocked due to access forbidden (403):

Reason: A '403 Forbidden' error occurs when the server understands the request, but it refuses to authorize access to the requested page.

How to resolve:

Check if your store is password protected: If it is, remove the password protection so that users and search engines can access the page.
Update security settings: If you use a third-party app to block specific countries from accessing the website, it may also be blocking Google's crawler. Adjust its settings so that Googlebot is allowed through.

Crawled - currently not indexed:

Reason: Googlebot has crawled the page but has chosen not to index it. This could be due to various reasons, such as low-quality content or duplicate content issues.

How to resolve:

Improve content: Enhance the quality and relevance of the page's content to make it more index-worthy.
Check for duplicate content: Ensure there are no issues with duplicate content that might lead Google to skip indexing.
Use proper tags: Employ appropriate meta tags, such as meta descriptions, to provide clear information about the page's content.

Discovered - currently not indexed:

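Reason: Google has discovered the page but has not crawled it yet. Typically, Google wanted to crawl the URL but postponed the crawl, for example to avoid overloading the site. As with "Crawled - currently not indexed," content issues or other factors may also play a part.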

How to resolve:

Content quality: Improve the overall quality and relevance of the page's content.
Review meta tags: Ensure that meta tags accurately represent the content and encourage indexing.
Check for technical issues: Address any technical issues that might hinder Google from indexing the page.

Alternate page with proper canonical tag:

Reason: The most common case where you might see this for a Shopify store is when Google has crawled the same page more than once, with parameters at the end of the URL, like:

https://domain.com/products/t-shirt?variant=123456

And this page is exactly the same as:

https://domain.com/products/t-shirt

Since Google treats the two URLs as different pages and does not want to index duplicated content, you need to make sure that your theme outputs the canonical tag correctly. In this example, if you inspect the HTML code of:

https://domain.com/products/t-shirt?variant=123456

You should have a canonical URL set to https://domain.com/products/t-shirt, so that Google knows that the original page is this one.

How to resolve:

Implement canonical tags: Ensure that canonical tags are correctly implemented on pages with duplicate content. All Shopify themes come with canonical tags set by default.
Verify tag accuracy: Double-check that the canonical tag points to the preferred version of the content.
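
To double-check the canonical tag on a page like the variant URL above, you can save its HTML source and compare the canonical link with the URL stripped of its parameters. This is a minimal Python sketch using only the standard library. The CanonicalFinder, strip_query, and canonical_matches names are hypothetical, and the sketch assumes that the preferred version is the URL without its query string, which holds for the ?variant= example but not for every parameter.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attr.get("href")


def strip_query(url):
    """Drop the query string and fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_matches(page_url, html):
    """True if the page's canonical link equals page_url without parameters."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical == strip_query(page_url)


html = '<head><link rel="canonical" href="https://domain.com/products/t-shirt"></head>'
print(canonical_matches("https://domain.com/products/t-shirt?variant=123456", html))
```

If the check fails for a product page, the canonical tag is either missing or points somewhere unexpected, and your theme's support team can help confirm why.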

Duplicate without user-selected canonical:

Reason: Google has identified duplicate content, but there is no user-selected canonical tag to indicate the preferred version.

How to resolve:

Implement canonical tags: Add rel="canonical" tags to specify the preferred version of duplicate content. All Shopify themes have a canonical link by default, so check with your theme's support team so they can confirm it's displayed properly.
Consolidate content: If applicable, consolidate similar content into a single preferred version (avoid adding exactly the same text on multiple pages).
Ensure unique content: Make sure each page provides unique and valuable content to users.

Duplicate, Google chose different canonical than user:

Reason: Google has identified duplicate content, and despite a user-selected canonical tag, Google has chosen a different canonical version.

How to resolve:

Review canonical tags: Double-check the implemented rel="canonical" tags to ensure they accurately reflect the preferred version. All Shopify themes have a canonical link by default, so check with your theme's support team so they can confirm it's displayed properly.
Content consistency: Ensure that the content on the canonical page aligns with your intended preferences.
Address content discrepancies: Resolve any discrepancies between the user-selected canonical tag and Google's choice.

Blocked by page removal tool:

Reason: If you've used Google's page removal tool to temporarily hide a page from search results, it may be reported as blocked by this tool.

How to resolve:

Review removal requests: Check your Google Search Console to see if you've submitted any removal requests.
Reconsider removal: If the removal was intentional, decide whether the page is ready to be indexed again.
Cancel removal request: If the page should be indexed, cancel the removal request in Google Search Console.

Server error (5xx):

Reason: When your server encounters an internal issue and can't fulfill a request, it triggers a 5xx error. This means Googlebot couldn't access your page because something went wrong on the server side.

This is very rare when your store runs on Shopify.

How to resolve:

Contact Shopify support: If a 5xx error appears for the public part of your store, the Shopify team is probably already aware of it and working on a fix, but you can still reach out to them to report the issue, just in case.
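
Several of the reasons in this article correspond directly to the HTTP status code a page returns. As a quick reference, here is a small Python sketch that maps a status code to the matching reason from this article. The report_reason function is a hypothetical helper, and it covers only the status-based reasons discussed here, not every case Google reports.

```python
def report_reason(status_code):
    """Map an HTTP status code to the matching reason in this article.

    Returns None for codes that are not tied to one of the
    status-based reasons covered above (for example, 200).
    """
    if status_code == 401:
        return "Blocked due to unauthorized request (401)"
    if status_code == 403:
        return "Blocked due to access forbidden (403)"
    if status_code == 404:
        return "Not found (404)"
    if 500 <= status_code <= 599:
        return "Server error (5xx)"
    return None


for code in (200, 401, 403, 404, 503):
    print(code, report_reason(code))
```

Note that a 200 status returns None here even though the page can still be reported as a "Soft 404", since that reason depends on the page content rather than the status code.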

Summary

In conclusion, navigating the complexities of page indexing errors is crucial for enhancing your website's visibility and search engine rankings. By leveraging insights from the Page indexing report and addressing common errors such as crawl issues, duplicate content, and thin content, you can optimize your site for better indexability and improved search performance.

Remember, Google's indexing decisions are influenced by many factors, including technical aspects and content quality. By implementing actionable solutions and best practices, you can empower your website to claim its rightful place in Google's search results, driving increased organic traffic and achieving greater success in your search engine rankings.

So, whether you're a seasoned webmaster or a novice website owner, mastering the art of troubleshooting page indexing errors is essential for maximizing your site's potential and reaching your target audience effectively in the competitive digital space.
