Implement canonicals #13

@tomschr

Situation

When HTML pages are built, the HTML stylesheet does not create canonical links. The stylesheet does not know which translations are available for a particular deliverable.

This is what a full set of canonicals, including hreflangs, looks like:

<link href="https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-deployment.html" rel="canonical"/>
<link href="https://documentation.suse.com/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="x-default" rel="alternate"/>
<link href="https://documentation.suse.com/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="en-US" rel="alternate"/>
<link href="https://documentation.suse.com/de-de/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="de-DE" rel="alternate"/>
<link href="https://documentation.suse.com/es-es/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="es-ES" rel="alternate"/>
<link href="https://documentation.suse.com/fr-fr/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="fr-FR" rel="alternate"/>
<link href="https://documentation.suse.com/ja-jp/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="ja-JP" rel="alternate"/>
<link href="https://documentation.suse.com/pt-br/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="pt-BR" rel="alternate"/>
<link href="https://documentation.suse.com/zh-cn/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="zh-CN" rel="alternate"/>
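
As a rough illustration of how such a set could be generated, here is a minimal Python sketch. The function name `hreflang_links` and its parameters are hypothetical and not part of docbuild. It assumes that English (`en-us`) is served without a language prefix and that every other language directory is a lowercase `xx-yy` prefix, as in the example above.

```python
from html import escape


def hreflang_links(site, path, lang_dirs, canonical_url, default="en-us"):
    """Return the <link/> lines for one page (hypothetical helper).

    site:          e.g. "https://documentation.suse.com"
    path:          page path below the optional language prefix
    lang_dirs:     lowercase language directories such as "de-de"
    canonical_url: URL of the preferred version of the page
    """
    def url(lang):
        # Assumption: the default language has no prefix in the URL.
        prefix = "" if lang == default else f"{lang}/"
        return f"{site}/{prefix}{path}"

    def tag(lang):
        # "de-de" -> "de-DE"
        language, _, region = lang.partition("-")
        return f"{language}-{region.upper()}" if region else language

    lines = [
        f'<link href="{escape(canonical_url)}" rel="canonical"/>',
        f'<link href="{escape(url(default))}" hreflang="x-default" rel="alternate"/>',
    ]
    # Default language first, then the translations in alphabetical order.
    ordered = [default] + sorted(l for l in lang_dirs if l != default)
    for lang in ordered:
        lines.append(
            f'<link href="{escape(url(lang))}" hreflang="{tag(lang)}" rel="alternate"/>'
        )
    return lines
```

The canonical URL is passed in separately because, as in the example above, it may point to a different product version than the alternates.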

Use Case

Canonicals help search engines resolve duplicate content by telling them which version is the primary or preferred source. On international sites, they work together with hreflang attributes so that each language translation points to its correct, authoritative version.

Possible Implementation

Canonicals can be implemented in two ways:

  1. "On-the-fly"
    The docbuild script knows about all languages. With this knowledge, it can pass specific XSLT parameters to the stylesheet to create the respective set of canonical links.
    This approach speeds up the whole build, but it does not work with Cloudnative or Multi Linux Manager, because both are built by Antora.
  2. "After build"
    This approach traverses the target directory, reads each HTML file, looks for canonicals, and adds them if they are missing. It then stores the page in a temporary file and moves on to the next file. When all files have been processed, the final step renames the temporary files to the original names.
    This approach is costly because it has to traverse the whole directory structure and modify the HTML files. However, it is independent of the source format and works with both ADoc and DocBook.
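
The write-to-temporary-file-then-rename flow of the "After build" approach could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function name `add_links_after_build`, the `links_for` callback that returns the `<link/>` markup for a page, and the naive string checks for `rel="canonical"` and `</head>`, which a real implementation would likely replace with an HTML parser.

```python
import os
from pathlib import Path


def add_links_after_build(root, links_for):
    """Insert <link/> elements into every HTML page that lacks a canonical.

    links_for(page) is a hypothetical callback returning the markup to insert.
    Returns the list of pages that were changed.
    """
    pending = []
    for page in sorted(Path(root).rglob("*.html")):
        html = page.read_text(encoding="utf-8")
        # Skip pages that already have a canonical or have no <head> to extend.
        if 'rel="canonical"' in html or "</head>" not in html:
            continue
        patched = html.replace("</head>", links_for(page) + "\n</head>", 1)
        # Write next to the original so the later rename stays on one filesystem.
        tmp = page.with_name(page.name + ".tmp")
        tmp.write_text(patched, encoding="utf-8")
        pending.append((tmp, page))

    # Final step: rename all temporary files to the original names.
    for tmp, page in pending:
        os.replace(tmp, page)
    return [page for _, page in pending]
```

Deferring the renames until every page has been processed means that a failure while reading or patching leaves the original files untouched, at the cost of some leftover `.tmp` files to clean up.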

"On-the-fly" approach

The canonicals are introduced during the XSLT transformation from DocBook to HTML. This should be the preferred approach for DocBook or ADoc sources that are built through DAPS.

  1. Collect the translations.
  2. Pass the translations from step 1 to the stylesheet in an XSLT parameter, maybe canonicals.hreflangs='en-us de-de es-es fr-fr'?
  3. Let the HTML stylesheet parse the XSLT parameter and create the necessary <link/> structure (see above).
  4. Done.
  • Advantage: Direct. When the deliverable is being built, it's finished. No need for an extra step.
  • Disadvantage: ?
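
Steps 1 and 2 could be sketched as follows in Python. This is only an illustration under assumptions: the helper names are hypothetical, the parameter name `canonicals.hreflangs` is the tentative one suggested above, and a direct `xsltproc` call with `--stringparam` stands in for however docbuild or DAPS actually invokes the stylesheet.

```python
def hreflang_param(lang_dirs):
    """Join the collected translations into one space-separated parameter value."""
    # Normalize to lowercase and drop duplicates so the value is stable.
    return " ".join(sorted({lang.lower() for lang in lang_dirs}))


def xsltproc_command(stylesheet, source, output, lang_dirs):
    """Build an xsltproc command line that passes the translations as a string parameter."""
    return [
        "xsltproc",
        "--stringparam", "canonicals.hreflangs", hreflang_param(lang_dirs),
        "--output", output,
        stylesheet,
        source,
    ]
```

The resulting list can be handed to `subprocess.run()`. On the stylesheet side, step 3 would then split the parameter value on whitespace and emit one `<link/>` per token.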

"After-build" approach

The canonicals are introduced by traversing a directory structure with this algorithm:

  1. Start with the English directory and find all HTML files.
  2. Try to find the same files in the other language directories. Do this for all supported languages. Check that the <html lang="..." xml:lang="..."> attribute(s) are not English. Only then is it a "real" translation.
  3. Collect the result of the previous step. We end up with zero or more translations.
  4. Adapt the URLs of the translations and create the hreflang links.
  5. Store the HTML file.
  • Advantage: Independent of any configuration.
  • Disadvantage: The filenames need to be the same in English and in the translations. If no translation has been built, no hreflang links are created.
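
Steps 1 to 3 of this algorithm could be sketched like this in Python. The directory layout is an assumption (one subdirectory per language below a common root, with English in `en-us`), and the names `page_language` and `find_translations` are hypothetical. The language check uses a simple regular expression on the start of the file rather than a full HTML parser.

```python
import re
from pathlib import Path

# Matches lang="..." or xml:lang="..." on the <html> element, whichever comes first.
LANG_ATTR = re.compile(r'<html\b[^>]*?\blang="([^"]+)"', re.IGNORECASE)


def page_language(path):
    """Return the lowercase language of an HTML page, or None if it is not declared."""
    head = Path(path).read_text(encoding="utf-8", errors="replace")[:2048]
    match = LANG_ATTR.search(head)
    return match.group(1).lower() if match else None


def find_translations(root, english="en-us"):
    """Map each English page (relative path) to the language dirs with a real translation."""
    root = Path(root)
    english_dir = root / english
    lang_dirs = sorted(d for d in root.iterdir() if d.is_dir() and d.name != english)
    result = {}
    for page in sorted(english_dir.rglob("*.html")):
        rel = page.relative_to(english_dir)
        found = []
        for lang_dir in lang_dirs:
            candidate = lang_dir / rel
            if not candidate.is_file():
                continue
            lang = page_language(candidate)
            # A page whose content is still English is not a "real" translation.
            if lang and not lang.startswith("en"):
                found.append(lang_dir.name)
        result[rel.as_posix()] = found
    return result
```

A page with an empty list has no translations and would get no hreflang links beyond the English ones.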

Labels: enhancement (New feature or request)
