This post was originally published in May 2021. Last updated: February 6, 2022.
Today, we’ll talk about the basics of technical SEO. We’ll explain the difference between crawling, indexing, and ranking. Also, we’ll show you how to leverage internal links, your robots.txt file, and your XML sitemap to help Google crawl and index your Shopify store faster and more efficiently.
- How Google works: What is the difference between crawling, indexing & ranking
- How to create a robust internal linking structure
- Shopify & robots.txt: Everything you need to know
- Shopify & sitemap.xml: Everything you need to know
- Submitting your sitemap to Google Search Console: Why is it important & How to do it [Bonus]
How Google works: What is the difference between crawling, indexing & ranking
Google follows three steps to generate the SERPs (Search Engine Results Pages):
Crawling is the automated process during which Googlebot discovers new data on the web, i.e., brand new pages or updated old pages.
How does this happen? Googlebot uses two resources to crawl the web:
- A list of URLs from past crawls, i.e., pages Googlebot has already crawled
- Sitemaps submitted by website owners
Then, Google crawls all URLs from the list and all URLs included in sitemaps. Note: During the crawling process, Google pays extra attention to new websites, updates to old web pages, and dead links.
Googlebot can discover a new page by:
- Following a link on a page that has already been crawled. For example, if you create a new product page and add a link to it on your homepage, the next time Googlebot crawls your homepage (a page it already knows about), it will crawl your new product page as well.
- Reading a sitemap that has been updated and contains a link to the newly created web page.
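The discovery process described above can be modeled as a simple graph traversal: start from the URLs Google already knows (past crawls plus the sitemap) and follow links outward. Here is a toy sketch in Python - the URLs and link graph are hypothetical, and real crawling obviously involves fetching and parsing live pages:

```python
from collections import deque

def discover_pages(known_urls, sitemap_urls, links):
    """Toy model of crawl discovery: start from URLs known
    from past crawls and from the sitemap, then follow links."""
    frontier = deque(set(known_urls) | set(sitemap_urls))
    discovered = set()
    while frontier:
        url = frontier.popleft()
        if url in discovered:
            continue
        discovered.add(url)
        # Follow the links found on this page (hypothetical link graph)
        for target in links.get(url, []):
            frontier.append(target)
    return discovered

links = {
    "/": ["/collections/shirts"],
    "/collections/shirts": ["/products/new-shirt"],  # new page, linked from a known one
}
# "/" is known from past crawls; "/pages/about" is listed in the sitemap
print(sorted(discover_pages(["/"], ["/pages/about"], links)))
```

Note how the new product page is discovered purely because an already-known page links to it - exactly the first mechanism listed above.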
To crawl your website, Google must be able to access your web pages. This means that your Shopify store must not be password-protected (because Googlebot accesses the web as an anonymous user).
You can do several things to help Google crawl (i.e., discover) your new web pages faster. For example:
- Create a strong internal linking structure
- Create a robots.txt file
- Create a sitemap.xml file and submit it to Google Search Console
We'll discuss each of these steps in more detail below.
Google uses algorithms to determine which websites to crawl, how often to crawl them, and how many pages to crawl from each website. If you've added a new web page or made changes to an existing one, you can request a recrawl - you can either submit individual URLs or an updated version of your sitemap to Google Search Console. Learn more → Ask Google to recrawl your URLs
Keep in mind that the recrawl can take up to several weeks - you can use the Index Coverage Report to monitor the progress. Note that there is no point in requesting a recrawl multiple times - it won't speed up the process.
To sum up, crawling is the process of discovering new data on the web. Indexing is the process of categorizing, organizing, and storing this data in the Google Index.
What does this mean? Once Googlebot discovers a new page, it tries to assess its content and understand what it's about. Then, it organizes and stores this information in a huge database - the Google Index. The Google Index contains hundreds of billions of pages. It is over 100 million gigabytes. Google describes it as the index in the back of a book - "with an entry for every word seen on every webpage we index. When we index a web page, we add it to the entries for all of the words it contains." (Source: Google, How Search algorithms work)
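Google's "index in the back of a book" analogy maps directly onto the classic inverted index data structure. A minimal sketch, with made-up page content, looks like this:

```python
def build_index(pages):
    """Minimal inverted index: one entry per word,
    listing every page that contains that word."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "/products/mug": "ceramic coffee mug",
    "/products/grinder": "manual coffee grinder",
}
index = build_index(pages)
print(sorted(index["coffee"]))  # both product pages mention "coffee"
```

Google's real index is vastly more sophisticated, but the core idea is the same: look up a word, get back every page that contains it.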
In short, when a page is indexed, it can appear on SERPs. Pro tip: Use the Index Coverage Report to check which pages of your Shopify store have been indexed and detect any indexing issues.
If you think a page on your website that was previously indexed is no longer showing on the SERPs, use the URL Inspection tool to check its status. If it is no longer indexed, check for indexing issues (e.g., 4xx errors or 5xx errors). If there are any indexing issues, fix them and request a recrawl.
“So, aren’t indexing and ranking the same thing?”, you may ask.
The answer is “No.” Here’s why: when Google indexes a page, the page only becomes eligible to appear on the SERPs - it could show up on page 1, page 101, or page 1001. Ranking is what determines where it actually appears. In terms of ranking, your ultimate goal is the #1 spot on the first page of the SERPs.
So, what exactly is ranking?
Google's primary goal is to return the most relevant and high-quality results for each search query. To achieve this, Google has to go through all the information in the Google Index and determine which results would be the best fit for the search query. This happens every time someone uses Google Search - the process is called ranking.
To find the most relevant results, Google's ranking algorithms take into account many factors. Some are related to the user and their query:
- User location
- Browser history
- Browser settings
- Search intent
Others are related to your website:
- Content relevance
- Content quality
- Content freshness
- Number of backlinks
- Domain authority (DA)
- Web page authority (PA)
- And more
In short, ranking is the process of ordering the results on the SERPs from most relevant (displayed on the #1 spot) to least relevant. To become better at retrieving the best results for each query (i.e., at ranking), Google makes small algorithm adjustments every day. They also have broad core algorithm updates, which greatly impact the SERPs and affect many industries.
Ultimately, a page ranks higher for a search query when:
- It is more relevant to the search query
- Its quality is higher (especially compared to the other results on the SERPs)
To recap:
- Crawling is the process of scanning the web for new data (new web pages and updated pages)
- Indexing is the process of organizing and storing this data in the Google Index
- Ranking is the process of determining the position of each web page on the SERPs for each search query
So far, we haven’t mentioned technical SEO. So, naturally, you might wonder: “What does technical SEO have to do with crawling, indexing, and ranking?”
The answer is “Everything!” For Google to crawl, index, and rank your Shopify store, it must be technically optimized.
Here’s what you need to know about crawling in terms of technical SEO:
- Googlebot should be able to reach and crawl your website. Remember that Googlebot accesses the web as an anonymous user. Thus, your Shopify store shouldn’t be password protected. Learn how to remove your Shopify store password
- You should have an XML sitemap - a file that helps Google index your new web pages faster and more efficiently. It also helps Google assess the importance of your web pages and understand the relation between the different pages and resources.
- You should have a robots.txt file - a simple text file that tells Google which pages of your Shopify store it can crawl and which it cannot.
- You should have a bulletproof internal linking strategy - internal links help Google navigate your website and discover new pages much faster.
- Your Shopify store should have a low-depth page hierarchy, i.e., all important pages on your website should be no more than three clicks away from your homepage. This will optimize your Crawl Budget (the number of pages Google crawls on your website in a single crawl), i.e., your Crawl Budget will be allocated to your most important pages.
- Your website should have a logical URL structure that Google can easily understand and follow.
- Your Shopify store should have intuitive navigation.
Here’s what you need to know about indexing in terms of technical SEO:
- You should add structured data markup to your pages. Remember that when indexing a page, Google tries to understand it - structured data markup makes this process easier. In Shopify, you need to add structured data to your homepage, collection pages, product pages, blog pages, and article pages. Learn more about structured data markup → Structured Data for Shopify: The Definitive Guide 
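As a rough illustration, structured data is typically added as a JSON-LD snippet in the page's markup. Here is what a minimal Product snippet might look like - every value below is a hypothetical placeholder, and real Shopify themes and apps generate this for you:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example T-Shirt",
  "image": "https://www.yourshopifystore.com/images/example-t-shirt.jpg",
  "description": "A soft, 100% cotton t-shirt.",
  "offers": {
    "@type": "Offer",
    "price": "24.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

Markup like this tells Google unambiguously that the page is a product page, what the product costs, and whether it is in stock.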
Keep in mind that in terms of indexing, things are a bit more on-page SEO oriented. For example, other things you can do to help Google index your Shopify store include optimizing your page titles and headings, creating descriptive meta tags, optimizing your visual content, using text to convey your message, and more.
And when it comes to ranking, things get even more complex. Google takes into account the technical health of your website, as well as its overall SEO health (e.g., whether it is mobile-friendly, whether it contains fresh and relevant content, whether you follow Google’s Webmaster Guidelines, etc.). The technical SEO factors that matter most for ranking include page speed, duplicate content, and broken links.
In the months to come, we’ll discuss each of these technical SEO topics. Today, we’ll focus solely on the first steps you need to take to help Google crawl your Shopify store:
- Create a robust internal linking strategy
- Have an impeccable robots.txt file
- Have an impeccable XML sitemap
An internal link is a link on a web page that leads to another page or a resource on the same domain.
Internal links are a key component of your Shopify store’s architecture and help Google better understand the structure of your website. As a result, a strong internal linking structure helps Google crawl and index your web pages faster and more efficiently.
To build a robust internal linking structure, you must first understand the difference between the two types of internal links:
- Navigational internal links - links that make up your Shopify store’s navigation (e.g., links in your main menu, sidebar menus, header and footer menus, etc.). They establish the hierarchy of your web pages and help both your customers and Google navigate your store. Also, they pass link equity which helps Google understand which are the most important pages in your store. As a result, Google can crawl these pages more often.
- Contextual internal links - links in the main content of a web page (e.g., links to product pages on a category page, links in articles and product descriptions, links on policy pages, etc.). The purpose of such links is to pass link equity and help Google discover new pages faster.
Creating a bulletproof internal linking structure in 10 steps
Step 1: Ensure your internal links can be crawled.
In other words:
- The URLs should be properly formatted. Shopify takes care of this by default. Still, you can use the URL Inspection Tool to check for any issues.
As a general rule of thumb, remember that the shorter a URL is, the better - so, avoid any unnecessary characters, symbols, numbers, and filler words (e.g., “and,” “a,” “the,” etc.).
By default, Shopify excludes symbols (e.g., “&,” “?,” “!”) from URLs. However, it doesn’t exclude filler words (e.g., “and,” “the,” “a,” etc.). So, this is something you should pay attention to.
- Don’t create internal links to pages blocked by your robots.txt file (unless necessary).
- Don’t create internal links to pages that have the “noindex” meta tag (unless necessary).
Note: We’ll discuss the robots.txt file and the “noindex” meta tag in more detail below.
Step 2: Ensure there are no broken internal links on your website.
You can use a site audit tool like SEMRush's Site Audit to view an internal linking report and spot any broken internal links. There are two ways to fix broken internal links - you can either remove them or replace them with another relevant (and working!) internal link.
Pro tip to avoid broken internal links
If you decide to change the URL of a page, make sure that the “Create a URL redirect for old-link → new-link” checkbox is marked. In Shopify, it is marked by default. Still, it is a good practice to double-check.
Step 3: Remove all orphan pages from your Shopify store.
Orphan pages are pages that aren’t linked to from any other page in your Shopify store. Since Googlebot uses links to crawl the web, it is more difficult for it to discover orphan pages (borderline impossible if the orphan pages aren’t included in your sitemap). Also, your customers can’t actually reach orphan pages. In other words, they don’t really have SEO weight and don’t benefit you in any way.
Therefore, it is important to discover whether your website contains such pages. You can use a tool like Ahrefs’ Site Audit to check for orphan pages.
If your website contains orphan pages, you’ll need to assess their importance.
- If they’re important, add links to them on other pages of your website. Pro tip: If they’re thin content pages, try to find common topics and merge the similar pages together - it is better to have one high-quality page than several poor-quality ones.
- If they’re not important, just remove them.
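Conceptually, finding orphan pages is a set difference: every page on the site minus every page that some other page links to. A minimal sketch, assuming you already have a list of pages (e.g., from your sitemap) and a hypothetical map of internal links:

```python
def find_orphans(pages, links, home="/"):
    """Pages that no other page links to.

    pages: every URL on the site (e.g., from the sitemap)
    links: dict mapping each page to the pages it links to
    """
    linked_to = {target for targets in links.values() for target in targets}
    # The homepage is the crawl entry point, so it is never counted as an orphan
    return set(pages) - linked_to - {home}

links = {
    "/": ["/collections/all"],
    "/collections/all": ["/products/mug"],
}
pages = ["/", "/collections/all", "/products/mug", "/pages/old-promo"]
print(find_orphans(pages, links))  # the old promo page is linked from nowhere
```

Audit tools like Ahrefs do essentially this comparison for you, at scale, against your live site.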
Step 4: Ensure your website has a low-depth page hierarchy.
First, identify the most important pages on your website.
In general, the most important page on a website is its homepage - this is the page with the highest page authority (PA).
In e-commerce, the pages that directly impact the bottom line are also of high importance. These are your category pages and your product pages.
All these pages should be properly interlinked. This happens when your store has a technically optimized architecture.
The most important thing to keep in mind here is that important pages on your website should be no more than three clicks away from your homepage (e.g., homepage > category pages > product pages). In this way, your homepage will pass more link equity to your category pages, your category pages will pass link equity to your product pages, and so on. As a result, your category and product pages can rank higher.
Also, there must be a logical relationship between the interlinked pages. For example, the product pages in one category should share similar traits, i.e., there shouldn’t be pants in your “Shirts” category.
Note: We’ll discuss this in more detail in our next article, so stay tuned!
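The "three clicks from the homepage" rule can be checked with a breadth-first search over your internal links. Here is a small sketch with a hypothetical link graph:

```python
from collections import deque

def click_depth(links, home="/"):
    """Breadth-first search from the homepage: the minimum
    number of clicks needed to reach each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

links = {
    "/": ["/collections/shirts"],
    "/collections/shirts": ["/products/blue-shirt"],
    "/products/blue-shirt": ["/pages/care-guide"],
}
depths = click_depth(links)
# Flag pages more than three clicks away from the homepage
deep_pages = sorted(url for url, d in depths.items() if d > 3)
print(depths, deep_pages)
```

In this toy store, every page sits within three clicks of the homepage, so nothing gets flagged.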
Step 5: Use internal links to help Googlebot discover new pages faster and improve their rankings.
Say you’ve just added a new product page to your Shopify store. To help Google crawl and index it faster, you can add a link to it on your homepage or in a blog post that performs exceptionally well. As an added benefit, this will pass link equity to your new product page, which means it will have better chances of ranking higher on the SERPs - of course, we all know what this means: more visibility and more sales opportunities.
Step 6: Leverage recommended products.
A “Recommended products” section will increase your average order value and help you deliver a more engaging shopping experience. Also, it is a great interlinking opportunity - the links in your “Recommended products” section are links on your product pages that lead customers to other product pages in your Shopify store.
In Shopify, the “Recommended products” section displays an automatically generated list of product recommendations.
Products are recommended based on an algorithm that predicts the most relevant products based on the product a customer is interacting with. (Source: Shopify, Showing product recommendations on product pages)
The algorithm uses sales data (to determine which products are frequently bought together) and product descriptions (to determine which products are similar or complement each other). The algorithm associates up to ten similar products per product and displays them in order of relevance.
There are certain limitations you need to be aware of. For example, depending on your Shopify plan, you can display different types of product recommendations on your product pages. Also, you can’t customize the algorithm to exclude specific products (unless you write custom code). Learn more → Shopify, Product recommendations
Some Shopify themes support product recommendations by default:
Also, you can build a customizable “Related products” section or use a Shopify app that can help you display related products on your product pages. One such app is Related Products - Also Bought by Globo:
The app sports a ⭐⭐⭐⭐⭐ rating. It has a free plan and a paid version that costs $9.90/month (a 7-day free trial is available).
Step 7: Be mindful of anchor text.
Anchor text is important because it helps Google understand what the interlinked page is about and whether it is relevant to the page that contains the link.
The anchor text of your internal links should be relevant, descriptive, and specific. Best case scenario, it should contain keywords.
As a general rule of thumb, avoid vague anchor text like “Read more” or “Click here.” Instead, do something like this: “Read more → Internal-link-with-a-relevant-and-descriptive-anchor-text”
Step 8: Avoid redirect chains.
A redirect chain occurs when there are several redirects between the initial URL (i.e., the requested URL) and the final destination URL. For example, imagine X is the initial URL and Z is the final URL. A redirect chain would be URL X > redirects to URL Y > redirects to URL Z. As a result, URL Z takes more time to load.
In general, you should avoid redirect chains because they lead to a poor user experience. Also, they make it more difficult for Google to crawl your website. Therefore, Google recommends limiting them as much as possible.
How do redirect chains occur?
Say you change the URL of a page and create a redirect to the new URL. Now imagine the page has already been interlinked, and the link leads to the old URL. When someone clicks on the link, they’ll be redirected first to the old URL and then to the new updated one.
A redirect chain can also occur when you install an SSL certificate on your website - in this case, all the old HTTP links will be automatically redirected to the new and secure HTTPS links. So, when a user clicks on an interlink you’ve already created, they’ll be first redirected to the HTTP version of the page and then to the HTTPS version.
To minimize redirect chains:
- Ensure all internal links lead straight to a live page.
- Update the redirects you implement during the switch from HTTP to HTTPS.
- Avoid linking to a URL that is redirecting to another URL.
- Regularly audit your existing redirects and remove all unnecessary redirects. You can use SEMRush’s Site Audit Tool to detect redirect chains and get advice on how to fix them.
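Fixing a chain means pointing every old URL straight at its final destination. The cleanup logic is simple enough to sketch (the URLs below are hypothetical):

```python
def flatten_redirects(redirects):
    """Collapse chains like A -> B -> C into direct A -> C redirects."""
    flat = {}
    for source in redirects:
        target, seen = redirects[source], {source}
        # Follow the chain until we reach a URL that no longer redirects
        while target in redirects:
            if target in seen:  # guard against redirect loops
                raise ValueError(f"Redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat

redirects = {
    "/old-shirt": "/new-shirt",        # the URL was changed once...
    "/new-shirt": "/products/shirt",   # ...and then changed again
}
print(flatten_redirects(redirects))
```

After flattening, both old URLs redirect straight to the final product page - one hop instead of two.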
Step 9: Don’t overdo it.
Sure, interlinks are important. But there is such a thing as “too many interlinks”, especially if they don’t have a purpose, i.e., if they don’t improve the quality of the page in any way.
A page should contain a reasonable number of internal links, and they should all make sense, i.e., there should be a logical reason for them to be on the page.
In layman’s terms, don’t create internal links just for the sake of it.
Bonus step: Build a content strategy.
Having a content strategy in place is important for several reasons:
- It is good for SEO and helps you rank for a ton of relevant keywords. This increases visibility and can help you raise brand awareness.
- It helps you establish yourself as an authority in your niche.
- It builds customer trust.
- It helps you deliver a more informed shopping experience.
- It helps you present your products in a more engaging and thorough way.
- It offers a ton of interlinking opportunities.
So, how do you build a strong content strategy?
First and foremost, create relevant and high-quality content. Write about topics that your target audience cares about - address their problems, answer their questions, etc.
Second, create topic clusters, i.e., come up with relevant topics and create five or ten separate blog posts for each topic. Since these posts will cover different angles of the same topic, the possibilities for interlinking will be numerous. Also, when you’ve published all blog posts from one series, you can create a pillar page and interlink all of them - this is a great way to elevate your internal linking strategy.
Third, use your blog posts to interlink your category and product pages (for example, in gift guides, posts about product collections or product launches, and more). This will help Google find them faster and could also improve their rankings.
Shopify & robots.txt: Everything you need to know
Simply put, robots.txt is a simple text file that tells Google which pages of your website to crawl and which not to crawl.
In general, having a robots.txt file is not strictly necessary for SEO. However, having one offers several benefits you shouldn’t overlook:
- It keeps Googlebot (and other search engine crawlers) from crawling pages that contain sensitive information (e.g., login/sign-up pages, account pages, etc.). Note that robots.txt only controls crawling - to reliably keep a page out of the index, use a “noindex” meta tag.
- It keeps Googlebot from spending time on pages and resources that have no SEO weight (e.g., “Thank you” pages, preview pages, PDF files such as product manuals, etc.).
- It keeps Googlebot from crawling thin content pages or pages that contain duplicate content.
- It helps search engines find your sitemap more easily (your robots.txt file contains a link to your sitemap).
- It helps Google crawl and index your website faster and more efficiently.
- It optimizes your Crawl Budget (by ensuring Google doesn’t crawl pages that shouldn’t be crawled and indexed).
What you need to know about Shopify & robots.txt:
This is the robots.txt file of Final Straw - one of our favorite Shopify stores.
Before we go further, let’s explain what each directive (i.e., each line in the robots.txt file) means:
- The “User-agent” directive specifies which crawler the instructions are meant for. In other words, if a user agent is specified in the “User-agent” directive (e.g., “User-agent: Googlebot”), this user agent (e.g., “Googlebot”) should follow the instructions, but a different agent (e.g., “Bingbot” - Bing’s crawler) should move on and look for a more specific directive. If a user agent isn’t specified (as is the case above), the instructions should be followed by all search engine bots (or crawlers).
- The “Allow” directive tells a crawler it can access a specific web page or subfolder even though its parent page or subfolder may be disallowed. (It originated as a Google extension, but most major crawlers support it today.)
- The “Disallow” directive tells search engine bots which pages or directories not to crawl.
- The “Sitemap” directive points search engine bots to the location of your XML sitemap.
- The “Host” directive contains the URL of your homepage (i.e., your primary domain). Note that “Host” is a non-standard directive - Google ignores it, and it is used mainly by Yandex.
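To make the directives above concrete, here's an illustrative robots.txt in the general shape Shopify generates - the paths are simplified examples (with an Allow line included purely to show how it overrides a Disallow), not a copy of Shopify's exact default file:

```text
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account
Allow: /account/login
Sitemap: https://www.yourshopifystore.com/sitemap.xml
```

Read top to bottom: all crawlers are addressed, a handful of non-SEO paths are blocked from crawling, one sub-path is explicitly re-allowed, and the sitemap's location is advertised.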
Now that you have a better understanding of how robots.txt works, here’s what you need to know about robots.txt files and Shopify:
- Shopify automatically generates your robots.txt file.
- Your robots.txt file is located at the root directory of your website’s primary domain name. To access it, just add “/robots.txt” in the URL of your homepage, e.g., “https://www.yourshopifystore.com/robots.txt”
- Your robots.txt file is maintained by Shopify, and the default works well for most stores. (Since mid-2021, Shopify also lets you customize it by editing the robots.txt.liquid theme template, but only do this if you know exactly what you’re doing.) If you don’t want Google to index specific pages that aren’t disallowed in your robots.txt file, you can hide them using the “noindex” meta tag - a line of code that tells search engines not to index a specific page. Implementing the “noindex” meta tag requires technical knowledge, as you’ll need to customize your theme.liquid layout file. Learn how to noindex particular pages → Shopify, Hiding a page from search engines
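For reference, Shopify's documented approach boils down to a short Liquid condition in the <head> section of your theme.liquid file - the handle below is a hypothetical placeholder you'd swap for the handle of the page you actually want to hide:

```liquid
{% if handle contains 'page-you-want-to-hide' %}
  <meta name="robots" content="noindex">
{% endif %}
```

When the condition matches, the “noindex” meta tag is rendered into that page's <head>, telling search engines to leave it out of the index.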
If you’re not tech-savvy and aren’t familiar with Shopify Liquid, it’s best to contact a Shopify Expert and request help.
Also, you can use a Shopify SEO app like Smart SEO - Smart SEO can add a “noindex” tag to a page with just one click of a button (but more on this later).
- You can monitor which pages have been blocked by your robots.txt file by signing up for a free Google Search Console account.
You shouldn’t worry if you notice that your checkout page has been blocked by your robots.txt file - it doesn’t need to appear in search results. Also, by not crawling your checkout page, search engine bots have more time to crawl the more important pages on your website (e.g., your homepage, category pages, product pages, blog and article pages, etc.).
However, if a page that directly impacts your bottom line (i.e., a category page, a product page, etc.) is blocked by your robots.txt file, you should be concerned. In general, there is almost a 0% chance that this happens (as Shopify automatically creates and manages your robots.txt file). Still, if it does, make sure to contact the Shopify support team immediately.
Shopify & sitemap.xml: Everything you need to know
Your XML sitemap gives Googlebot (and other search engine bots) information about the web pages and resources (e.g., media files, pdfs, etc.) on your Shopify store. Basically, it is a comprehensive list of the most important pages and resources on your website. Your sitemap also contains important information about your web pages (for example, when were they last modified, how many images they contain, what is their relation to other pages or resources, etc.).
The purpose of the sitemap is to help Google crawl your website faster and more efficiently.
Similar to robots.txt files, having a sitemap is not absolutely necessary - Google will be able to crawl your website without its help (especially if you have a strong internal linking strategy in place). However, having a sitemap can certainly benefit you. Especially in the following cases:
- If you have a large catalog store. Imagine having to interlink 1000+ (or even just 100) product pages… impossible, right?
- If your Shopify store is new and it still has very few backlinks and interlinks.
- If your Shopify store contains a lot of media files such as videos and images.
- If you upload a lot of pdf files (e.g., product manuals or instructions).
- If you post a lot of articles.
- In addition, having a sitemap can help Googlebot (and other search engine crawlers) crawl your website more efficiently and pick up changes sooner.
What you need to know about Shopify & XML sitemaps:
This is Rebel Nell’s sitemap - another one of our favorite Shopify stores. This is how a typical sitemap of a Shopify store looks - there is one parent sitemap that links to additional sitemaps (or child sitemaps) for products, collections, blogs, and pages. This categorization helps Google navigate and crawl your Shopify store more easily.
Each of the additional sitemaps contains an extensive list of pages. For example, if we have a closer look at Rebel Nell’s product sitemap, we'll notice that it contains links to all of Rebel Nell’s product pages, as well as information on images, when the page was last modified, how frequently the page is modified, and more.
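For illustration, a Shopify-style parent sitemap is a sitemap index that looks roughly like this (the child file names are representative, not copied from a real store):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.yourshopifystore.com/sitemap_products_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourshopifystore.com/sitemap_collections_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourshopifystore.com/sitemap_blogs_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourshopifystore.com/sitemap_pages_1.xml</loc>
  </sitemap>
</sitemapindex>
```

Each <loc> entry points to a child sitemap, and each child sitemap in turn lists the actual page URLs (with last-modified dates, image data, and so on).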
Here’s what you need to know about sitemap.xml files and Shopify:
- Shopify automatically generates a sitemap.xml file for your store. It contains links to all your products, product images, pages, collections, and blog posts.
If you're on the Basic Shopify plan, then only your store's primary domain has a generated sitemap file and is discoverable by search engines. If you're on the Shopify, Advanced Shopify, or Shopify Plus plan, then you can use the international domains feature to create region-specific or country-specific domains. When you use international domains, sitemap files are generated for all of your domains. All of your domains are discoverable by search engines, unless they redirect to your primary domain. (Source: Shopify, Finding and submitting your sitemap)
- Your sitemap is located at the root directory of your Shopify store’s domain, i.e., you can find it by adding “/sitemap.xml” to the URL of your homepage (e.g., “https://www.yourshopifystore.com/sitemap.xml”). Note: You can also find the location of your sitemap in your robots.txt file - remember that it is specified in the “Sitemap” directive.
- Shopify automatically updates your sitemap every time you update your store (for example, every time you add a new product or publish a new blog post).
- You cannot edit your sitemap manually. If you wish to exclude a certain page from your store’s sitemap, you can only do it through the Shopify API (through code). Luckily, there are some apps that can help you if you’re not tech-savvy. One such app is Smart SEO. With just one click of the button, you can exclude the pages you don’t want to appear in your sitemap. This will also add a noindex tag to them and exclude them from your site search page.
Notice the checkboxes next to each product - if a checkbox is marked, the status of the product is set to “Active.” This means that the product is included in the sitemap. If you want to exclude a product from your sitemap, all you need to do is remove the check mark from the checkbox.
Smart SEO sports a stellar ⭐⭐⭐⭐⭐ rating. It has a free plan and three paid plans. Pricing starts from $9.99/month (a 7-day free trial is available).
Now that you know what a sitemap.xml file is and how it can benefit you, let’s move on to the last section of this guide - submitting your sitemap to Google Search Console.
Submitting your sitemap to Google Search Console: Why is it important & How to do it?
You can submit your sitemap.xml file to Google Search Console any time. This is not absolutely necessary - Google will be able to find your sitemap even if you don’t submit it. But it is important that you do it!
Why? Because submitting your sitemap to Google Search Console helps Googlebot crawl and index your web pages and resources much faster. Submitting a sitemap won’t boost your rankings by itself, but getting pages crawled and indexed sooner means they can start ranking (and earning traffic) sooner. Simply put, submitting your sitemap gives you more exposure and more sales opportunities.
How to submit your sitemap to Google Search Console:
- Ensure your website isn’t password-protected - Google won’t be able to access your sitemap if it is. Learn how to remove your Shopify store password
- Create a free Google Search Console account.
- Verify your domain with Google Search Console and confirm you’re the owner of your Shopify store. Note: You won’t be able to verify your domain if your website is password-protected.
- Submit your sitemap.xml file to Google Search Console. Follow the steps listed here → Shopify, Finding and submitting your sitemap
With Smart SEO, you can easily submit your sitemap to Google without going through the manual process of using Search Console. Note that you’ll still need a Google Search Console account with a verified site property.
Today, we talked about how search works. We explained the difference between crawling, indexing, and ranking, as well as what technical SEO has to do with each of these processes.
Also, we showed you how to help Google crawl and index your Shopify store faster and more efficiently by:
- Creating a robust internal linking strategy
- Having an impeccable robots.txt file
- Having an impeccable sitemap.xml file and submitting it to Google Search Console
If you have further questions, just drop us a line below!
In our next article, we’ll focus on website architecture. More specifically, we’ll show you how to create a low-depth page hierarchy, a logical URL structure, and intuitive website navigation - these are key steps to building a technically optimized website (and, as you already know, it helps with crawling and indexing as well). So, stay tuned!