Canonical Tag: How to protect yourself from duplicate content

The Canonical Tag, also called the Canonical Link Attribute, is a tool for consolidating several URLs that carry identical content. It is a way to avoid duplicate content. In this blog post, I’ll show you exactly how it works and in which cases you should use Canonical links. Plus: the most common Canonical mistakes.

What is the Canonical Tag?

Canonical tags help you control which URL ranks when identical content exists on your website. This is the case, for example, if the same content is available under several URLs.

Technically, the canonical tag is a link in the source code of a page. On each duplicate, this link points to the original URL (the so-called canonical URL) and thereby marks it as the URL you prefer in the index. The canonical URL carries a canonical tag as well, but this one points to itself. This is called a self-referencing canonical.

Why is the Canonical Tag important for SEO?

If you have several pages with identical or very similar content on your website, Google cannot always clearly recognize which URL is relevant and should rank.

Examples of identical content are URLs with and without parameters, or different formats such as print or PDF versions of an HTML page.

Instead of choosing your preferred page, it can happen that Google:

  • indexes all versions. Then you have duplicate content in the Google index, and none of these versions ranks as well as a single version could on its own.
  • ranks one version in top positions, but not your preferred one. For example, a PDF version of your landing page.
  • does not index any version, but excludes all of them from the index.

With the Canonical Tag you show Google: Here is a main version of the content, please index it, transfer the signals of the duplicates to it and rank only the original. So the Canonical Tag is an important SEO tool.

How do I include the Canonical Tag correctly?

There are two ways to technically integrate the Canonical Tag on your website. For both types of integration, Google recommends using absolute URLs rather than relative ones.

rel="canonical" link tag in the head area of the HTML code

In HTML documents the canonical tag is integrated in the head area of the page:

The original resource refers to itself (Self referencing Canonical).

Duplicates refer to the canonical URL.

Example: The page https://seotwix.com/ refers to itself as the canonical URL with the canonical link. This is how it looks in the source code:

<link rel="canonical" href="https://seotwix.com/" />

Broken down, the individual elements say the following:

<link: opens the link tag

rel="canonical": is the actual canonical attribute

href="https://seotwix.com/": is the link to the canonical URL

/>: closes the link tag
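If you want to check this for a page yourself, the following is a minimal Python sketch that pulls the canonical link out of a page’s HTML using only the standard library. The names CanonicalFinder and find_canonicals are my own illustrative choices and not part of any SEO tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is case-insensitive and may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"].strip())


def find_canonicals(html):
    """Return all canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


page = '<html><head><link rel="canonical" href="https://seotwix.com/" /></head></html>'
print(find_canonicals(page))  # prints ['https://seotwix.com/']
```

The function returns a list on purpose: a page can accidentally contain more than one canonical link, and you want to notice that.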

rel="canonical" in the HTTP header

Non-HTML files like PDFs have no head section, so you need an alternative: you send the canonical link in the HTTP header of the file’s response.

To insert the canonical tag of a PDF in the HTTP header, add the appropriate code to the .htaccess file (on Apache, this requires the mod_headers module):

<Files "content-a.pdf">

Header add Link "<https://www.beispiel.com/inhalt-a/>; rel=\"canonical\""

</Files>

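Note: The header value itself contains quotation marks, and in .htaccess an unescaped inner quotation mark would end the value early. You can work around this by escaping the inner quotation marks with a backslash (\") or by enclosing the whole value in apostrophes.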

Broken down, the elements mean:

<Files "content-a.pdf">: opens the Files block and names your PDF document.

Header add Link "<https://www.beispiel.com/inhalt-a/>; rel=\"canonical\"": sets the canonical URL and the rel attribute itself.

</Files>: closes the Files block.

Once this is in place, you will see the following in the HTTP response header of the PDF URL:

Link: <https://www.beispiel.com/inhalt-a/>; rel="canonical"
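To read the canonical URL back out of such a header value, a small Python sketch like the following can help. The function name canonical_from_link_header is hypothetical, and the parsing is deliberately simplified: it assumes that no parameter value contains a comma.

```python
import re


def canonical_from_link_header(header_value):
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    # A Link header may list several links: <url>; param=value, <url>; param=value
    for match in re.finditer(r"<([^>]*)>([^,]*)", header_value):
        url, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params, flags=re.IGNORECASE)
        if rel and "canonical" in rel.group(1).lower().split():
            return url
    return None


header = '<https://www.beispiel.com/inhalt-a/>; rel="canonical"'
print(canonical_from_link_header(header))  # prints https://www.beispiel.com/inhalt-a/
```

You would pass in the value of the Link response header as returned by your HTTP client of choice.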

Check Canonical Tags: Here’s how

You have several ways to check whether the canonical tag of a single URL, or the canonical tags of a whole domain, are used correctly. More importantly, you can check whether Google follows your Canonical suggestions.

Screaming Frog: Are the Canonicals correctly integrated?

With Screaming Frog you can find out two things relatively easily:

You can see if the tags are technically correct.

You can get a quick overview of the overall status of a domain: Do Canonicals exist at all? Which types of pages have canonical tags? It is also important whether the implementation is coherent.

For example, it is problematic if hreflang links point to canonicalized URLs instead of canonical ones. The same applies if all internal links point to canonicalized instead of canonical URLs.

This is how you proceed:

1. Crawl the domain.

2. In Screaming Frog, choose the “Canonicals” tab.

3. Use the filters to find technical errors such as multiple or non-indexable canonicals. You can also see at a glance which URLs are canonicalized and whether self-referencing canonicals are included.
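The statuses you are looking for in such a crawl can be sketched in a few lines of Python. This is only an illustration of the logic: the function name classify_canonical and the four labels are my own and do not mirror Screaming Frog’s filter names exactly.

```python
def classify_canonical(page_url, canonicals):
    """Classify a page by the canonical URLs found on it.

    canonicals is the list of canonical targets declared on page_url.
    """
    if not canonicals:
        return "missing"
    if len(canonicals) > 1:
        return "multiple"
    if canonicals[0] == page_url:
        return "self-referencing"
    return "canonicalized"


original = "https://your-shop.com/skirts/purple-skirt"
variant = "https://your-shop.com/summer-fashion/purple-skirt"

print(classify_canonical(original, [original]))  # prints self-referencing
print(classify_canonical(variant, [original]))   # prints canonicalized
print(classify_canonical(variant, []))           # prints missing
```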

3 ways to check the canonical status of single URLs

If you only want to check the status of a single URL, there are several ways:

The browser plugin “Link Redirect Trace” shows you clearly whether a canonical is used and if so, to which URL it refers.

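To check whether Google accepts your canonical tag, you can use the “info:” search operator in Google search. After the operator, enter the URL of the duplicate; the result should be its canonical URL. Note that Google announced the retirement of the “info:” operator in 2019, so this check may no longer return results.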

If you want to know more about Google search operators, check out the blog post of my colleague Stefan.

Use the URL Inspection tool in Google Search Console: just paste the URL you want to check and you will see its indexing and canonical status at a glance.

Does Google generally follow my Canonical strategy?

In the coverage report of the Google Search Console you can find those URLs via the tab “Excluded” that Google currently does not index for various reasons.

Ideally, if Google follows all your wishes, your canonicalized URLs are listed under the item “Alternate page with proper canonical tag”.

But as SEOs we know: Google never follows all our wishes, so you will also find URLs where Google ignores your suggestion and prefers a page other than your canonical one (red frame in the screenshot).

This alone is no reason to panic. You should pay attention if only a few of your canonicals are followed by Google. In this case, you have to find out:

Why does Google consider other URLs than your preferred ones as worthy of indexing?

This is the case if the bot assumes that these pages are relevant for searchers, among other things because:

  • the canonicalized variants of the URLs receive significantly more traffic than the canonical page.
  • many internal links point to the canonicalized URLs. Especially if you link to canonicalized URLs from high-traffic pages, Google assumes that these pages are more relevant than the originals.
  • external domains link to your canonicalized pages.
  • hreflang tags link to canonicalized URLs.

The more consistently and clearly (and, of course, technically correctly) you tell Google which pages of your domain are important, the more likely the search engine bot is to follow your “suggestion”.
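One of these inconsistencies, internal links that point to canonicalized URLs, is easy to detect once you have crawl data. The following Python sketch assumes two hypothetical inputs: a list of (source, target) link pairs and a dictionary that maps each URL to its declared canonical. Both would come from your own crawl export.

```python
def links_to_canonicalized(internal_links, canonical_of):
    """Return (source, target, preferred) for links whose target is canonicalized."""
    findings = []
    for source, target in internal_links:
        # URLs without an entry are treated as their own canonical.
        preferred = canonical_of.get(target, target)
        if preferred != target:
            findings.append((source, target, preferred))
    return findings


canonical_of = {
    "https://your-shop.com/skirts/purple-skirt": "https://your-shop.com/skirts/purple-skirt",
    "https://your-shop.com/summer-fashion/purple-skirt": "https://your-shop.com/skirts/purple-skirt",
}
internal_links = [
    ("https://your-shop.com/", "https://your-shop.com/summer-fashion/purple-skirt"),
    ("https://your-shop.com/skirts/", "https://your-shop.com/skirts/purple-skirt"),
]

for source, target, preferred in links_to_canonicalized(internal_links, canonical_of):
    print(f"{source} links to {target}, better link to {preferred}")
```

The same check works for hreflang targets if you feed them in as link pairs.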

Use cases for the Canonical Link

The use cases for canonical tags are straightforward. They can help your website in these four scenarios:

#1 Dynamic URLs

Many websites use dynamic URLs that do not significantly change the content of the page: For example, session IDs or different representations of a category page.

In the example, you can see a category page of Hagebaumarkt, which is accessible via several URLs: Once without and once with a parameter.

This is a positive example of the use of the Canonical tag with dynamic URLs: The appended parameter does not change the content of the page, so the Canonical of the parameter URL points to the original. The Canonical of the original page points to itself.
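How a system could derive the canonical URL from a parameter URL can be sketched in Python as follows. The shop URL and the parameter names in IGNORED_PARAMS are made-up examples; which parameters leave the content unchanged depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the content of a page.
IGNORED_PARAMS = {"sessionid", "sort", "view"}


def canonical_for(url):
    """Drop content-neutral parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_for("https://your-shop.com/skirts/?sort=price&sessionid=abc"))
# prints https://your-shop.com/skirts/
print(canonical_for("https://your-shop.com/skirts/?page=2&sort=price"))
# prints https://your-shop.com/skirts/?page=2
```

Parameters that do change the content, such as the page number in the second call, are kept, because those URLs are not duplicates.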

#2 Content is accessible under several URLs

Depending on your system, your articles may be accessible under different URLs, for example if you sort them into different categories:

  • your-shop.com/skirts/purple-skirt
  • your-shop.com/summer-fashion/purple-skirt

In this case, too, you should decide on a canonical URL and let the variant refer to it via rel="canonical".

But: If it is feasible both technically and in terms of usability, it is always best to have only one version of the content, apart from a few exceptions. The “second best” variant is a 301 redirect. The canonical is the stopgap solution. Depending on the case, it may even make more sense to set the duplicate to “noindex”.

If your content is reachable under multiple URLs, for example with “www”, without “www”, with trailing slash, without trailing slash, AND your developers are technically NOT able to implement a 301 redirect, you should use the Canonical attribute to set a preferred version.
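Whatever version you prefer, every variant must name exactly the same one. As a minimal sketch, the following Python function maps the “www” and trailing-slash variants onto a single form. The function name preferred_version and its two switches are assumptions for illustration, and the sketch treats every path as a directory-style URL, so it is not suitable for file URLs such as PDFs.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_version(url, use_www=True, trailing_slash=True):
    """Map www/non-www and slash/no-slash variants onto one preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    host = "www." + bare_host if use_www else bare_host

    path = parts.path or "/"
    if trailing_slash and not path.endswith("/"):
        path += "/"
    elif not trailing_slash and path != "/":
        path = path.rstrip("/") or "/"

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "https://your-shop.com/skirts",
    "https://your-shop.com/skirts/",
    "https://www.your-shop.com/skirts",
    "https://www.your-shop.com/skirts/",
]
print({preferred_version(url) for url in variants})
# prints {'https://www.your-shop.com/skirts/'}
```

All four variants collapse to one URL, and that URL is the one to put into the canonical link of each variant.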

301 redirect vs. Canonical tag

With the 301 status code, you tell search engines that the page has moved permanently. Google crawls only the redirect destination, not the original page. Users are also led directly to the new page. If you use canonical tags, Google will continue to crawl the duplicates and users will still be able to access these pages. The canonical is only a hint for Google, which it usually follows but is not obliged to. A 301 redirect, on the other hand, cannot be ignored by Google.

#3 Content in different formats

In some cases, it makes sense to offer your content in different formats, for example a checklist that is additionally available as a print or PDF version. Then it can happen that Google prefers the “wrong” version from your point of view.

To ensure that the HTML document is ranked instead of the PDF, you can set a canonical link to the HTML URL in the HTTP header of the PDF. In addition, the canonical link in the HTML document should point to its own URL. This way Google understands: the PDF is the “copy” of the HTML document.

#4 Content on other domains

Cross-domain canonicals are a special case. If you publish your content on other domains as well, you can have those pages point a canonical tag to the original content on your website. This helps to avoid duplicate content across domains.

The associated positive user signals can be transferred to your original article, according to Rand Fishkin at Moz: “So something above 90% of the link authority and ranking signals will transfer […].”

When using cross-domain canonical tags, make sure that the content of the page does not change, i.e. text, images and videos must be (almost) identical. Ideally, the links in the content should match as well. It is fine if the URL, design and navigation differ. The meta title can also differ.

Don’t: 7 common Canonical mistakes

In our SEO analyses, we see the same mistakes with canonicals over and over again. These are the most common ones:

#1 The canonical URL is not indexable/reachable

Always make sure that the canonical URL is reachable. You can easily check this by checking the status codes of the canonical targets, for example via a crawl.
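Once a crawl has given you the status codes, the check itself is simple. Here is a small Python sketch, assuming two hypothetical dictionaries from your crawl export: canonical_of maps each page to its canonical target, and status_of maps each crawled URL to its HTTP status code.

```python
def unreachable_canonicals(canonical_of, status_of):
    """Return pages whose canonical target did not answer with status 200."""
    problems = {}
    for page, target in canonical_of.items():
        status = status_of.get(target)  # None means the target was never crawled
        if status != 200:
            problems[page] = (target, status)
    return problems


canonical_of = {
    "https://your-shop.com/summer-fashion/purple-skirt": "https://your-shop.com/skirts/purple-skirt",
    "https://your-shop.com/sale/red-skirt": "https://your-shop.com/skirts/red-skirt",
}
status_of = {
    "https://your-shop.com/skirts/purple-skirt": 200,
    "https://your-shop.com/skirts/red-skirt": 404,
}

print(unreachable_canonicals(canonical_of, status_of))
# prints {'https://your-shop.com/sale/red-skirt': ('https://your-shop.com/skirts/red-skirt', 404)}
```

Redirecting canonical targets (301, 302) are flagged as well, since the canonical should point directly at the final URL.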

#2 Simultaneous use of rel="canonical" & “noindex”

The simultaneous use of canonicals and “noindex” is undesirable for the following reason: While the canonical says that two pages are identical, the “noindex” tag says that the page should not be indexed. Google must then ask itself the following question: If this page should not be indexed, then neither should its original, right?
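To spot this combination in a page’s HTML, a standard-library Python sketch like the following is enough. The names SignalScanner and has_conflicting_signals are illustrative. The sketch only looks at the robots meta tag, so a “noindex” sent via HTTP header would go unnoticed.

```python
from html.parser import HTMLParser


class SignalScanner(HTMLParser):
    """Notes whether a page declares a canonical link and a robots noindex."""

    def __init__(self):
        super().__init__()
        self.has_canonical = False
        self.has_noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.has_canonical = True
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            directives = [d.strip() for d in attributes.get("content", "").lower().split(",")]
            if "noindex" in directives:
                self.has_noindex = True


def has_conflicting_signals(html):
    """Return True if the page uses rel="canonical" and noindex at the same time."""
    scanner = SignalScanner()
    scanner.feed(html)
    return scanner.has_canonical and scanner.has_noindex


conflicting = (
    '<head><link rel="canonical" href="https://your-shop.com/skirts/">'
    '<meta name="robots" content="noindex, follow"></head>'
)
print(has_conflicting_signals(conflicting))  # prints True
```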

#3 Canonical links on pagination pages

Do not use canonicals on subsequent pagination pages to point to the first page. The pages are not identical; their content is different. Ergo, a Canonical tag makes no sense.

#4 More than one Canonical

Make sure you have only one canonical tag on each page. If you use more than one, Google will simply ignore all canonical links.

#5 Canonical tag on pages that are too different

Use the Canonical attribute only if the pages are very similar (98% the same). If they differ too much, Google will ignore your Canonical statement.
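If you want a rough feeling for how similar two pages are, you can compare their extracted text with Python’s difflib. This is only a stand-in: how Google actually measures similarity is not public, and the 0.98 threshold below simply mirrors the rule of thumb above. The function names are my own.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Return a similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def similar_enough(text_a, text_b, threshold=0.98):
    """Rule-of-thumb check whether two texts are close enough for a canonical."""
    return similarity(text_a, text_b) >= threshold


original = "Purple skirt made of cotton, available in sizes S to XL."
print(similar_enough(original, original))  # prints True
print(similar_enough(original, "Red summer dress with floral print."))
```

For long pages, comparing only the main content and leaving out navigation and footer gives a more meaningful value.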

#6 Canonical link in <body> instead of <head>

The canonical link should always be placed in the head area or HTTP header. If you insert it somewhere else, Google will ignore it.
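A misplaced canonical can be detected with a short Python sketch like this one. The names PlacementScanner and canonicals_in_body are illustrative, and the sketch assumes that the page contains an explicit <body> tag; it does not emulate how browsers close the head implicitly.

```python
from html.parser import HTMLParser


class PlacementScanner(HTMLParser):
    """Collects canonical links that appear after the opening <body> tag."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "link" and self.in_body:
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                self.misplaced.append(attributes.get("href"))


def canonicals_in_body(html):
    """Return the href of every canonical link found inside <body>."""
    scanner = PlacementScanner()
    scanner.feed(html)
    return scanner.misplaced


broken = (
    "<html><head><title>Skirts</title></head>"
    '<body><p>Hello</p><link rel="canonical" href="https://seotwix.com/"></body></html>'
)
print(canonicals_in_body(broken))  # prints ['https://seotwix.com/']
```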

#7 Canonicalized URLs in the sitemap

The sitemap should only contain URLs that you want Google to index. That’s why canonicalized pages have no place in the sitemap.
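This can be checked by comparing the sitemap with your crawl data. In the following Python sketch, the function name and the canonical_of dictionary (URL to declared canonical) are assumptions for illustration; the sitemap is expected in the standard sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def canonicalized_urls_in_sitemap(sitemap_xml, canonical_of):
    """Return sitemap URLs whose canonical points to a different URL."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # URLs without an entry are treated as their own canonical.
        if canonical_of.get(url, url) != url:
            flagged.append(url)
    return flagged


sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://your-shop.com/skirts/purple-skirt</loc></url>
  <url><loc>https://your-shop.com/summer-fashion/purple-skirt</loc></url>
</urlset>
"""
canonical_of = {
    "https://your-shop.com/summer-fashion/purple-skirt": "https://your-shop.com/skirts/purple-skirt",
}

print(canonicalized_urls_in_sitemap(sitemap, canonical_of))
# prints ['https://your-shop.com/summer-fashion/purple-skirt']
```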

Conclusion

Canonical tags are important for SEO, especially in the e-commerce area, where many filter URLs are created. They give you the possibility to show Google the right way to your important pages. And even if Google doesn’t always follow this path 100% – without Canonicals our favourite bot would probably get lost in the URL jungle – to the detriment of your SEO performance.

Of course, Canonical Tags are not a panacea. That’s why, in many cases, you should weigh possible technical alternatives like the “noindex” tag, 301 redirects and co. against each other.
