Over time, the sitemap file has become an essential part of SEO. It is closely tied to how search engines crawl and index the URLs on your site.

It is particularly useful for an e-commerce site with many pages, because it helps search engines discover the URLs of your online store. I therefore advise you not to neglect this file!

What is the Sitemap File?

To better understand where this highly standardized protocol comes from, let’s start with a bit of history and some key dates:

  • 2005: Google created the first version called Sitemaps 0.84.
  • November 2006: Microsoft and Yahoo announced support for Sitemaps 0.90. From that point on, a shared protocol defined the standard format for all future sitemaps.
Official website: https://www.sitemaps.org/

To return to its primary function, a website’s sitemap is a file listing all the URLs you want search engines to index. It should include only URLs that are worth a search engine’s visit.

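Each URL entry in the sitemap can carry the following fields:

  • loc: URL of the page. This is the only required field.
  • lastmod: date of last modification, in W3C Datetime format (for example YYYY-MM-DD)
  • changefreq: expected update frequency (for example daily, weekly, or monthly)
  • priority: priority of the URL relative to the other URLs of the site, from 0.0 to 1.0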

Here is an example of an XML Sitemap file:

[Image: example of an XML sitemap]
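As a rough illustration of what such a file contains, here is a minimal Python sketch that builds one with the standard library. The `build_sitemap` helper, the example.com URLs, and the field values are assumptions made up for this example; only the element names and the namespace come from the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for a list of dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, as in the protocol.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages on a placeholder domain.
sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://www.example.com/blog/"},
])
print(sitemap_xml)
```

The second entry shows that a URL with only `loc` is still a valid entry.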

Note that the XML format is not mandatory. You can, for example, create a sitemap in plain text format (.txt). In that case, however, you cannot specify the optional elements mentioned above (lastmod, changefreq, and priority): the file contains only the list of URLs, one per line.

The purpose of this file is to make your site easier to crawl. A crawl is the exploration of a website by a search engine robot. As it passes through, the robot fetches and analyzes each page’s source code to extract its different elements (HTML, images, internal and external links, etc.).

Once this file is declared to search engines, they will read it regularly to explore the URLs listed there. Then, if everything goes as planned, they will index those URLs so that they can appear in the results when users run searches.

What is a Sitemap Index?

A site can have several sitemaps. In this case, a sitemap index file can be created. It lists the different sitemaps in a single file, so that all of them can be submitted at once.

Like a regular sitemap, the sitemap index must be declared in your webmaster tools.
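To make the structure concrete, here is a small Python sketch that writes a sitemap index. The `build_sitemap_index` helper and the example.com file names are hypothetical; the `sitemapindex`, `sitemap`, and `loc` elements are the ones the protocol defines.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that references each sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sitemap files on a placeholder domain.
index_xml = build_sitemap_index([
    "https://www.example.com/page-sitemap.xml",
    "https://www.example.com/post-sitemap.xml",
])
print(index_xml)
```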

Information to Know About the Sitemap

Now that you know a sitemap file’s main characteristics and how it works, here are some other points to know:

Ability to Add it to the Location of your Choice

A sitemap file can have any name and any location. Unlike the robots.txt file, which must sit at the site’s root under that exact name, the sitemap file can be placed anywhere and given any URL.

The only requirement is that it must be on the domain name in question. If you want, you can choose a non-standard file name. This makes it harder for others to find the list of URLs you want crawled and indexed.

Conversely, to find a competitor’s XML sitemap file, you have several options:

  • Append “/sitemap.xml”, or sometimes “/sitemap_index.xml”, to the root of the site.
  • Consult the site’s /robots.txt file, where the sitemap file may be declared.

Since the sitemap’s name is completely free, you may come across a 404 error, but it is worth trying because most sites use these default locations.
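The robots.txt check can be sketched in a few lines of Python. The `find_sitemaps_in_robots` function and the sample file content are assumptions for illustration; the function works on text you have already downloaded, so it makes no network request.

```python
def find_sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Sitemap: <url>" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Made-up robots.txt content for illustration.
sample = """User-agent: *
Disallow: /cart/
Sitemap: https://www.example.com/sitemap_index.xml
"""
print(find_sitemaps_in_robots(sample))
```

If the returned list is empty, the site simply does not declare its sitemap there, and trying the default locations remains the fallback.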

Possibility of having Several Sitemaps on the Same Site

As I indicated above, a website can have several sitemap files, and they do not have to be listed in an index. You are completely free to create several in different folders of the site and declare them, one by one, in the webmaster tools.

Some sites have, for example, one sitemap for pages, one for categories, and one for blog posts. To illustrate, you can very well have several sitemaps at the root level of the site:

  • https://mnseoultrapro.com/page-sitemap.xml
  • https://mnseoultrapro.com/post-sitemap.xml
  • https://mnseoultrapro.com/category-sitemap.xml

URL Limits in the Sitemap

As I just mentioned, everyone is free to use an almost unlimited number of sitemap files. Nevertheless, there are some limitations. But don’t worry, you’ll probably never reach them.

The limitations are as follows:

  • A sitemap file can contain up to 50,000 URLs.
  • A sitemap index can reference up to 50,000 sitemaps.
  • Each site can have up to 500 sitemap index files submitted in Google Search Console.
  • Once uncompressed, a sitemap file must not exceed 50 MB.

I’ll spare you the math, but by taking advantage of sitemap indexes, a single index can cover up to 2.5 billion URLs (50,000 sitemaps × 50,000 URLs), and 500 index files raise the ceiling to 1.25 trillion. Are you less worried?
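For readers who do want the math, here is a short Python sketch of the arithmetic, together with a hypothetical `split_into_sitemaps` helper showing how a long URL list could be cut into files that respect the per-file limit. The constant names are my own; the values are the limits listed above.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_INDEX_FILES_PER_SITE = 500

# One index can reference 50,000 sitemaps of 50,000 URLs each.
urls_per_index = MAX_URLS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX
# Up to 500 index files per site.
urls_per_site = urls_per_index * MAX_INDEX_FILES_PER_SITE


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that respect the per-file limit."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


print(f"{urls_per_index:,}")  # 2,500,000,000
print(f"{urls_per_site:,}")   # 1,250,000,000,000
```

Note that the 50 MB size limit can be reached before the 50,000-URL limit when URLs are long or carry many optional fields, so a real generator would need to check both.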

Indicate Full URLs

The protocol states that the URLs in your sitemap file must be absolute, not relative. In other words, they must always start with “http://” or “https://”.
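A quick way to catch relative URLs before they end up in the file is a check like the following Python sketch. The `is_absolute_http_url` name and the sample URLs are assumptions for illustration; the parsing relies on the standard `urllib.parse.urlsplit` function.

```python
from urllib.parse import urlsplit


def is_absolute_http_url(url):
    """Return True if the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical candidates: only the first one is suitable for a sitemap.
candidates = [
    "https://www.example.com/blog/post",
    "/blog/post",
    "www.example.com/blog/post",
]
for candidate in candidates:
    print(candidate, is_absolute_http_url(candidate))
```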

Indicate only URLs with Interest for SEO

This is one of the most important points of this article, so I want to stress it. Since the objective of the sitemap is to facilitate the exploration of the URLs you want crawled and indexed, it is in your best interest to list only relevant URLs.

Listing URLs with no SEO value in your sitemap is therefore pointless. Include only useful, indexable pages. This helps preserve your crawl budget.

Files Supported in the Sitemap

Some sitemap files may contain specific content such as images, videos, or news. In this case, the XML format is mandatory. Dedicated image or video sitemaps are rarely used on websites, however: most of the time, the images or videos are embedded in pages that the sitemap file already lists.

The Obligation to Create a Sitemap

The sitemap is not mandatory when creating a website. However, it is highly recommended! The sitemap is not a priority if you have a showcase site with 5-10 pages. But it quickly becomes essential if your site is large.

For e-commerce, for example, it is a key file, given that the site is very dynamic. New categories and new products are added frequently. The sitemap file therefore plays a central role in helping search engines discover new URLs.

Consideration by Search Engines

Today, the main search engines, Google and Bing, take the sitemap file into account. You can easily submit it through their dedicated webmaster tools. Yandex and Baidu also support this protocol.

Difference between Sitemap.xml and Sitemap.html

Finally, to wrap up these useful points, the word “sitemap” can be confusing because it refers to two different things: sitemap.xml and sitemap.html. It is therefore necessary to distinguish between the two.

The sitemap.xml is intended exclusively for search engine crawlers. It is not part of the site’s navigation, so visitors do not normally come across it.

On the other hand, the sitemap.html is designed for users. It is a page, often present in the footer of the sites, which lists many links to the different pages.

Sitemap and SEO

The sitemap file does not directly impact SEO. It does not work like optimizing your Title tag or editorial content.

On the other hand, it contributes to SEO indirectly:

  • It facilitates the exploration of the site’s URLs.
  • It improves the indexing of new URLs.
  • It helps with advanced SEO analysis.
  • It helps to detect orphan pages.
  • It helps Googlebot understand which pages to index.
  • It helps during a site redesign that involves URL changes.
  • It speeds up the deindexing of certain URLs.

Facilitate the Exploration of the Different URLs of the Site

The primary purpose of the sitemap file is to list all the site’s important pages. It therefore saves crawlers time by letting them discover your relevant URLs quickly. Once submitted, the sitemap also lets you check your indexing coverage in Google Search Console.

[Image: Search Console report]

This report lets you easily see if the search engine encountered any problems while crawling certain pages. The most frequently encountered errors are:

  • Some URLs in the sitemap file cannot be crawled, typically because they are blocked by the robots.txt file.
  • Some URLs redirect or return an error (301 or 404 status codes).
  • Some URLs are ones that Google declines to index (very thin content, duplicate content, etc.).
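The first of these errors can be caught before submission. The sketch below uses Python’s standard `urllib.robotparser` module to list sitemap URLs that a robots.txt file disallows. The `blocked_by_robots` helper and the sample data are hypothetical, and Python’s parser follows the basic robots.txt rules only, so its verdict may differ from Googlebot’s on files that use wildcard patterns.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Made-up robots.txt rules and sitemap URLs for illustration.
robots_txt = """User-agent: *
Disallow: /cart/
"""
urls = [
    "https://www.example.com/",
    "https://www.example.com/cart/checkout",
]
print(blocked_by_robots(robots_txt, urls))
```

Any URL this returns should either be removed from the sitemap or unblocked in robots.txt, depending on whether you want it indexed.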

Accelerate the Discovery of New URLs

When new pages are added to a website, it may take some time for them to be crawled and indexed by search engines. You can speed up this process by adding your latest URLs to your sitemap.

Deindex Many URLs Faster

Even though the primary purpose of the sitemap is to indicate the URLs to be crawled and indexed, it can also help deindex many URLs. A dedicated sitemap listing all the pages of your site that carry a noindex tag becomes a mass deindexing tool and saves time when certain pages are being removed.

You then only need to submit this sitemap in Search Console, which encourages Google to crawl the pages concerned and see for itself that they should no longer be indexed.

The dedicated sitemap can then be deleted once deindexing is complete.

Discover Orphan Pages

When a website has a complex structure and is very large, it is not uncommon for some of its pages to be orphaned. In other words, they can be present in the sitemap but completely absent from the site structure. Without an internal link, a page receives very little PageRank, if any.

In this case, by cross-referencing the URLs of your sitemap file with a crawl of your site, you can identify pages that receive no internal links. If some pages in your sitemap are completely missing from the crawl, they are orphan pages. If you detect any, this anomaly should be corrected.
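This cross-check boils down to a set difference. Here is a minimal Python sketch, assuming you already have the sitemap’s XML as a string and the list of URLs found by your crawler; the function names and the sample data are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap files, mapped to a prefix for lookups.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract the set of <loc> values from sitemap XML."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def find_orphans(sitemap_xml, crawled_urls):
    """Return sitemap URLs that the crawl never reached through links."""
    return sorted(sitemap_urls(sitemap_xml) - set(crawled_urls))


# Made-up sitemap and crawl results for illustration.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-landing-page/</loc></url>
</urlset>"""
crawled = ["https://www.example.com/"]
print(find_orphans(sample_sitemap, crawled))
```

The comparison is exact, so URLs should be normalized the same way on both sides (trailing slashes, http versus https) before drawing conclusions.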

How to Create a Sitemap?

Once a sitemap or sitemap index file has been created, it should be declared in webmaster tools such as Google Search Console and Bing Webmaster Tools.

How to Add Sitemap for SEO in Google Search Console?

Simply go to the Google Search Console interface to declare these files. It is designed to let you submit them and view all the associated statistics.

To submit a sitemap, the steps are as follows:

  • Select the right property for your site.
  • Click on the Sitemaps tab.
  • Indicate the URL of your sitemap in the appropriate field.
  • Click on the “Submit” button.

As soon as your sitemap file is crawled, it will be analyzed by Google.

How to Add Sitemap in Bing Webmaster Tools?

Here, the process is very similar to that of Google Search Console. Once your property is validated, you just need to apply the following steps:

  • Select the right property for your site.
  • Click on the Sitemaps tab.
  • Click on “Submit Sitemap”.
  • Indicate the full URL of your sitemap in the appropriate field.
  • Click on the “Submit” button.

Conclusion

As you have seen, a sitemap alone does not determine a website’s success or its optimization. However, it is a valuable tool for improving the crawling and indexing of your pages by the various search engines.

Thanks to the sitemap, crawlers have easier access to certain pages, which is valuable for any site.

Given the growing number of sites on the Internet and the ease of generating sitemaps today, you might as well take advantage of them!

Related Posts

Sharing is Caring

The sitemap file has become, over time, an essential part of SEO. Often cited in discussions related to SEO, it is closely linked to the exploration and indexing of the different URLs present on your site.

Particularly interesting when you design an e-commerce site with many pages, it facilitates the discovery of the different URLs of your online store. I, therefore, advise you not to neglect this file!

What is the Sitemap File?

To better understand where this highly standardized protocol comes from, let’s start with a bit of history and some key dates:

  • 2005: Google created the first version called Sitemaps 0.84.
  • December 2006: Microsoft and Yahoo announce support for Sitemaps 0.90. It was at this moment that a protocol was established, elaborating a writing standard for all future sitemaps
Official website: https://www.sitemaps.org/

To return to its primary function, the sitemap of a website is a file listing all the URLs you want to index. It must include only relevant URLs to visit for search engines.

The sitemap can be composed of different information:

  • loc: URL of the page. This is the only required field.
  • lastmod: date of last modification
  • changefreq: update frequency
  • priority: URL priority

Here is an example of an XML Sitemap file:

example of xml sitemap

It is also interesting to specify that the XML format is not obligatory. You can, for example, create a sitemap in text format (.txt). On the other hand, it will be impossible for you to indicate the optional elements mentioned above (lasted, changefreq, and priority). This will only contain the list of URLs.

The purpose of this file is to make it easier to crawl your site. The crawl represents the exploration of a website by a search engine robot. During its passage, the robot will discover and analyze the page’s source code to bring out the different elements (HTML, images, internal and external links, etc.).

When this file is declared to search engines, they will analyze it regularly to explore all the URLs listed there. Then, if everything goes as planned, they will index them so that they can appear in the results when Internet users formulate requests.

What is a Sitemap Index?

A site can have several sitemaps. In this case, a sitemap index file can be created. It will allow you to list the different sitemaps within a sitemap page. The advantage is that all sitemaps are sent at once.

As for the classic sitemap, the sitemap index must be declared in your webmaster tools.

Information to Know About the Sitemap

Now you know about a sitemap file’s main characteristics and operation, here is some other information to know:

Ability to Add it to the Location of your Choice

A sitemap file can have any name and its location. Unlike the robots.txt file, which must be located at the site’s root and have a specific name, the sitemap file can be placed anywhere and have a free URL.

The only requirement is that it must be on the domain name in question. If you want, you can choose a different file name. This change allows you to hide the different URLs you want to be crawled and indexed.

On the other hand, to find the XML sitemap file of your competitors, several solutions are available to you:

  • Add after the root of the site “/sitemap.xml” or sometimes “/sitemap_index.xml”
  • Consult the site’s/robots.txt file. Maybe the sitemap file is declared there…

As the name of the sitemap is completely free, you may come across a 404 error, but nothing prevents you from trying because most sites use this location.

Possibility of having Several Sitemaps on the Same Site

As I indicated above, a website can have several sitemap files but not necessarily listed in an index. You are completely free to create several in different folders of the site and declare them, one by one, in the webmaster tools.

Some sites have, for example, a sitemap for all pages, and categories and a sitemap for the blog, etc. To illustrate, you can very well have a sitemap at the root level of the site:

  • https://mnseoultrapro.com/page-sitemap.xml
  • https://mnseoultrapro.com/post-sitemap.xml
  • https://mnseoultrapro.com/category-sitemap.xml

URL Limits in the Sitemap

As I just mentioned, everyone is free to use an almost unlimited number of sitemap files. Nevertheless, there are some limitations. But don’t worry, you’ll probably never reach them.

The limitations are as follows:

  • A sitemap file can contain up to 50,000 URLs.
  • A sitemap index can contain 50,000 different sitemaps.
  • Each site can have up to 500 sitemap index files.
  • Once unpacked, the size of the sitemap file should not exceed 50MB.

I’ll spare you the math, but taking advantage of sitemap indexes, your total number of URLs to send is limited to 2.5 billion. Are you less worried?

Indicate Full URLs

The protocol states that URLs must be complete and not relative in your sitemap file.
In other words, these must always start with “HTTP” or “HTTPS.

Indicate only URLs with Interest for SEO

This is one of the most important points of this article, so I particularly insist on it. Since the objective of the sitemap is to facilitate the exploration of the different URLs of your site that you want to have crawled and indexed, it is in your best interest to fill in only relevant URLs.

Therefore, listing URLs of no interest for your SEO in your sitemap is useless. Only promote useful and indexable pages. It helps to save the crawl budget.

Files Supported in the Sitemap

Some sitemap files may contain specific content such as images, videos, or news. In this case, the XML format is mandatory. The XML sitemap file listing the media or images is rarely used on websites. Most of the time, the images or videos are inside the site pages already listed in the URLs of the sitemap file.

The Obligation to Create a Sitemap

The sitemap is not mandatory when creating a website. However, it is highly recommended! The sitemap is not a priority if you have a showcase site with 5-10 pages. But it quickly becomes essential if your site is large.

For e-commerce, for example, it is an essential file, given that the site is very dynamic. Many categories are added frequently, as are new articles. The sitemap file is, therefore, essential to facilitate the discovery of new URLs.

Consideration by Search Engines

Today, the main search engines Google and Bing take into account the sitemap file. You can very easily send it to the dedicated webmaster tools. Yandex and Baidu also support this protocol.

Difference between Sitemap.xml and Sitemap.html

Finally, to finish on the interesting information related to the sitemap, the name of this file can be confusing with the sitemap. It is, therefore, necessary to distinguish between these two elements.

The sitemap.xml is intended exclusively for search engine crawlers. It is not present in the standard tree structure.

On the other hand, the sitemap.html is designed for users. It is a page, often present in the footer of the sites, which lists many links to the different pages.

Sitemap and SEO

The sitemap file does not directly impact SEO. It does not work like optimizing your Title tag or editorial content.

On the other hand, it indirectly contributes to SEO :

  • It facilitates the exploration of the URLs of the site.
  • Improves the indexing of new URLs.
  • Helps in advanced SEO analysis.
  • It helps to detect orphan pages.
  • From it, Googlebot enables to understand which pages to index.
  • It helps to redesign a site when it includes URL changes.
  • It speeds up the deindexing of certain URLs.

Facilitate the Exploration of the Different URLs of the Site

The primary purpose of the sitemap file is to list all the site’s important pages. It is, therefore, a time saver for the robot crawlers to make them quickly discover your relevant URLs. The sitemap allows you to check your indexing coverage in Google’s Search Console.

search console report

This report lets you easily see if the search engine encountered any problems while crawling certain pages. The most frequently encountered errors are:

  • Some URLs in the sitemap file cannot be crawled, especially if they are previously blocked for crawling by the robots.txt file.
  • Some URLs may be redirected or in errors (301 or 404).
  • Specific URLs that Google refuses to index (very low content, duplicate content, etc.)

Accelerate the Discovery of New URLs

When new pages are added to a website, it may take some time for them to be crawled and indexed by search engines. You speed up this process by adding your latest URLs to your sitemap.

Unindex many URLs Faster

Even if the primary purpose of the sitemap is to indicate the URLs to be crawled and indexed, it also allows many URLs to be deindexed. Creating a specific sitemap, listing all the pages of your site with a no-index tag becomes a mass deindexing tool and saves time in the event of the deletion of certain pages.

It is then enough to submit this listing to in the Search Console tool, which will encourage Google to crawl the pages concerned so that the search engine realizes for itself that these no longer have to be indexed.

The specific sitemap can then be deleted, once the de-indexation has been successfully completed.

Discover Orphan Pages

When a website has a complex structure and is very large, it is not uncommon for some of its pages to be orphaned. In other words, these can be present in the sitemap but completely absent from the site structure. Without an internal link, the page fetches very little, if any, PageRank.

In this case, by crossing the URLs of your sitemap file and a crawl of your site, you can identify if pages contain no internal links. If you find that some pages in your sitemap are completely missing from the crawl, it means that they are orphan pages. If you detect any, this anomaly should be corrected.

How to Create a Sitemap?

When a sitemap or sitemap index file is created, it must be declared to webmaster tools such as Google Search Console and Bing webmaster tools.

How to Add Sitemap for SEO in Google Search Console?

Simply go to the Google Search Console interface to declare these different elements. This is specially designed to allow you to make this statement and view all the associated statistics.

To submit a sitemap, the steps are as follows:

  • Select the right priority for your site.
  • Click on the Sitemaps tab.
  • Indicate the URL of your sitemap in the appropriate field.
  • Click on the “Submit” button.
add-sitemap-google-search-console

As soon as your sitemap file is crawled, it will be analyzed by Google.

How to Add Sitemap in Bing Webmaster Tools?

Here, the process is very similar to that of Google Search Console. Once your property is validated, you just need to apply the following steps:

  • Select the right priority for your site.
  • Click on the Sitemaps tab.
  • Click on “Submit Sitemap”.
  • Indicate the full URL of your sitemap in the appropriate field.
  • Click on “Submit Button”.
add Sitemaps-in-Bing-Webmaster-Tools

Conclusion

As you understood, a sitemap’s presence does not condition a website’s success or its optimization. However, it is a valuable tool for improving the crawling and indexing of your pages by the various search engines.

Thanks to the sitemap, crawlers have easier access to certain pages, which is very valuable in the context of any site.

Given the growing number of sites on the Internet and the ease of generating sitemaps today, you might as well take advantage of it!

Related Posts

Sharing is Caring

The sitemap file has become, over time, an essential part of SEO. Often cited in discussions related to SEO, it is closely linked to the exploration and indexing of the different URLs present on your site.

Particularly interesting when you design an e-commerce site with many pages, it facilitates the discovery of the different URLs of your online store. I, therefore, advise you not to neglect this file!

What is the Sitemap File?

To better understand where this highly standardized protocol comes from, let’s start with a bit of history and some key dates:

  • 2005: Google created the first version called Sitemaps 0.84.
  • December 2006: Microsoft and Yahoo announce support for Sitemaps 0.90. It was at this moment that a protocol was established, elaborating a writing standard for all future sitemaps
Official website: https://www.sitemaps.org/

To return to its primary function, the sitemap of a website is a file listing all the URLs you want to index. It must include only relevant URLs to visit for search engines.

The sitemap can be composed of different information:

  • loc: URL of the page. This is the only required field.
  • lastmod: date of last modification
  • changefreq: update frequency
  • priority: URL priority

Here is an example of an XML Sitemap file:

example of xml sitemap

It is also interesting to specify that the XML format is not obligatory. You can, for example, create a sitemap in text format (.txt). On the other hand, it will be impossible for you to indicate the optional elements mentioned above (lasted, changefreq, and priority). This will only contain the list of URLs.

The purpose of this file is to make it easier to crawl your site. The crawl represents the exploration of a website by a search engine robot. During its passage, the robot will discover and analyze the page’s source code to bring out the different elements (HTML, images, internal and external links, etc.).

When this file is declared to search engines, they will analyze it regularly to explore all the URLs listed there. Then, if everything goes as planned, they will index them so that they can appear in the results when Internet users formulate requests.

What is a Sitemap Index?

A site can have several sitemaps. In this case, a sitemap index file can be created. It will allow you to list the different sitemaps within a sitemap page. The advantage is that all sitemaps are sent at once.

As for the classic sitemap, the sitemap index must be declared in your webmaster tools.

Information to Know About the Sitemap

Now you know about a sitemap file’s main characteristics and operation, here is some other information to know:

Ability to Add it to the Location of your Choice

A sitemap file can have any name and its location. Unlike the robots.txt file, which must be located at the site’s root and have a specific name, the sitemap file can be placed anywhere and have a free URL.

The only requirement is that it must be on the domain name in question. If you want, you can choose a different file name. This change allows you to hide the different URLs you want to be crawled and indexed.

On the other hand, to find the XML sitemap file of your competitors, several solutions are available to you:

  • Add after the root of the site “/sitemap.xml” or sometimes “/sitemap_index.xml”
  • Consult the site’s/robots.txt file. Maybe the sitemap file is declared there…

As the name of the sitemap is completely free, you may come across a 404 error, but nothing prevents you from trying because most sites use this location.

Possibility of having Several Sitemaps on the Same Site

As I indicated above, a website can have several sitemap files but not necessarily listed in an index. You are completely free to create several in different folders of the site and declare them, one by one, in the webmaster tools.

Some sites have, for example, a sitemap for all pages, and categories and a sitemap for the blog, etc. To illustrate, you can very well have a sitemap at the root level of the site:

  • https://mnseoultrapro.com/page-sitemap.xml
  • https://mnseoultrapro.com/post-sitemap.xml
  • https://mnseoultrapro.com/category-sitemap.xml

URL Limits in the Sitemap

As I just mentioned, everyone is free to use an almost unlimited number of sitemap files. Nevertheless, there are some limitations. But don’t worry, you’ll probably never reach them.

The limitations are as follows:

  • A sitemap file can contain up to 50,000 URLs.
  • A sitemap index can contain 50,000 different sitemaps.
  • Each site can have up to 500 sitemap index files.
  • Once unpacked, the size of the sitemap file should not exceed 50MB.

I’ll spare you the math, but taking advantage of sitemap indexes, your total number of URLs to send is limited to 2.5 billion. Are you less worried?

Indicate Full URLs

The protocol states that URLs must be complete and not relative in your sitemap file.
In other words, these must always start with “HTTP” or “HTTPS.

Indicate only URLs with Interest for SEO

This is one of the most important points of this article, so I particularly insist on it. Since the objective of the sitemap is to facilitate the exploration of the different URLs of your site that you want to have crawled and indexed, it is in your best interest to fill in only relevant URLs.

Therefore, listing URLs of no interest for your SEO in your sitemap is useless. Only promote useful and indexable pages. It helps to save the crawl budget.

Files Supported in the Sitemap

Some sitemap files may contain specific content such as images, videos, or news. In this case, the XML format is mandatory. The XML sitemap file listing the media or images is rarely used on websites. Most of the time, the images or videos are inside the site pages already listed in the URLs of the sitemap file.

The Obligation to Create a Sitemap

The sitemap is not mandatory when creating a website. However, it is highly recommended! The sitemap is not a priority if you have a showcase site with 5-10 pages. But it quickly becomes essential if your site is large.

For e-commerce, for example, it is an essential file, given that the site is very dynamic. Many categories are added frequently, as are new articles. The sitemap file is, therefore, essential to facilitate the discovery of new URLs.

Consideration by Search Engines

Today, the main search engines Google and Bing take into account the sitemap file. You can very easily send it to the dedicated webmaster tools. Yandex and Baidu also support this protocol.

Difference between Sitemap.xml and Sitemap.html

Finally, to finish on the interesting information related to the sitemap, the name of this file can be confusing with the sitemap. It is, therefore, necessary to distinguish between these two elements.

The sitemap.xml is intended exclusively for search engine crawlers. It is not present in the standard tree structure.

On the other hand, the sitemap.html is designed for users. It is a page, often present in the footer of the sites, which lists many links to the different pages.

Sitemap and SEO

The sitemap file does not directly impact SEO. It does not work like optimizing your Title tag or editorial content.

On the other hand, it indirectly contributes to SEO :

  • It facilitates the exploration of the URLs of the site.
  • Improves the indexing of new URLs.
  • Helps in advanced SEO analysis.
  • It helps to detect orphan pages.
  • From it, Googlebot enables to understand which pages to index.
  • It helps to redesign a site when it includes URL changes.
  • It speeds up the deindexing of certain URLs.

Facilitate the Exploration of the Different URLs of the Site

The primary purpose of the sitemap file is to list all the site’s important pages. It is, therefore, a time saver for the robot crawlers to make them quickly discover your relevant URLs. The sitemap allows you to check your indexing coverage in Google’s Search Console.

search console report

This report lets you easily see if the search engine encountered any problems while crawling certain pages. The most frequently encountered errors are:

  • Some URLs in the sitemap file cannot be crawled, especially if they are previously blocked for crawling by the robots.txt file.
  • Some URLs may be redirected or in errors (301 or 404).
  • Specific URLs that Google refuses to index (very low content, duplicate content, etc.)

Accelerate the Discovery of New URLs

When new pages are added to a website, it may take some time for them to be crawled and indexed by search engines. You speed up this process by adding your latest URLs to your sitemap.

Deindex Many URLs Faster

Even though the primary purpose of the sitemap is to indicate the URLs to be crawled and indexed, it can also help deindex many URLs at once. A dedicated sitemap listing all the pages of your site that carry a noindex tag becomes a mass deindexing tool and saves time, for example when certain pages are being removed.

It is then enough to submit this sitemap in Search Console, which encourages Google to recrawl the pages concerned so that the search engine sees for itself that they should no longer be indexed.

Once the deindexing is complete, this dedicated sitemap can be deleted.
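
To build such a list, you first need to know which pages carry the tag. Here is a minimal sketch, using Python’s standard html.parser module, that keeps only the pages whose HTML contains a robots meta tag with noindex. The class and function names, as well as the sample pages, are hypothetical; the sketch also ignores noindex directives sent through the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page contains <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def urls_to_deindex(pages):
    """pages maps each URL to its HTML source; return the URLs tagged noindex."""
    return [url for url, html in pages.items() if has_noindex(html)]


# Hypothetical pages, for illustration only.
PAGES = {
    "https://www.example.com/old-offer": '<head><meta name="robots" content="noindex, follow"></head>',
    "https://www.example.com/": "<head><title>Home</title></head>",
}

print(urls_to_deindex(PAGES))
```

The resulting URLs are the ones to list in the temporary sitemap.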

Discover Orphan Pages

When a website is very large and has a complex structure, it is not uncommon for some of its pages to be orphaned. In other words, they are present in the sitemap but completely absent from the site’s internal linking. Without an internal link, the page receives very little PageRank, if any.

In this case, by comparing the URLs in your sitemap file with a crawl of your site, you can identify pages that receive no internal links. If some pages in your sitemap are completely missing from the crawl, they are orphan pages. If you detect any, correct the anomaly.
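
This comparison boils down to a set difference. Here is a minimal sketch in Python, assuming you already have the sitemap’s XML content and the list of URLs found by your crawler; the function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap's XML content."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def find_orphans(sitemap_xml, crawled_urls):
    """Return the URLs listed in the sitemap that the crawl never reached."""
    return sitemap_urls(sitemap_xml) - set(crawled_urls)


# Hypothetical sitemap and crawl result, for illustration only.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-landing-page</loc></url>
</urlset>"""

CRAWLED_URLS = {"https://www.example.com/"}

print(sorted(find_orphans(SITEMAP_XML, CRAWLED_URLS)))
```

For the comparison to be reliable, both lists should use the same URL form (protocol, trailing slash, and so on), since the sketch compares the strings exactly.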

How to Create a Sitemap?

Once a sitemap or sitemap index file has been created, it must be declared in webmaster tools such as Google Search Console and Bing Webmaster Tools.

How to Add Sitemap for SEO in Google Search Console?

Simply go to the Google Search Console interface to declare your sitemap files. It is specifically designed to let you make this declaration and view all the associated statistics.

To submit a sitemap, the steps are as follows:

  • Select the right property for your site.
  • Click on the Sitemaps tab.
  • Indicate the URL of your sitemap in the appropriate field.
  • Click on the “Submit” button.
[Image: adding a sitemap in Google Search Console]

As soon as your sitemap file is crawled, it will be analyzed by Google.

How to Add Sitemap in Bing Webmaster Tools?

Here, the process is very similar to that of Google Search Console. Once your property is validated, you just need to apply the following steps:

  • Select the right property for your site.
  • Click on the Sitemaps tab.
  • Click on “Submit Sitemap”.
  • Indicate the full URL of your sitemap in the appropriate field.
  • Click on the “Submit” button.
[Image: adding a sitemap in Bing Webmaster Tools]

Conclusion

As you have seen, a website’s success and optimization do not depend on the presence of a sitemap. However, it is a valuable tool for improving the crawling and indexing of your pages by the various search engines.

Thanks to the sitemap, crawlers have easier access to certain pages, which is valuable for any site.

Given the growing number of sites on the Internet and the ease of generating sitemaps today, you might as well take advantage of them!
