Description
Situation
When building HTML pages, the HTML stylesheet does not create the canonicals. The stylesheet does not know which translations are available for this particular deliverable.
This is what a full set of canonicals, including hreflang links, looks like:
<link href="https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-deployment.html" rel="canonical"/>
<link href="https://documentation.suse.com/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="x-default" rel="alternate"/>
<link href="https://documentation.suse.com/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="en-US" rel="alternate"/>
<link href="https://documentation.suse.com/de-de/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="de-DE" rel="alternate"/>
<link href="https://documentation.suse.com/es-es/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="es-ES" rel="alternate"/>
<link href="https://documentation.suse.com/fr-fr/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="fr-FR" rel="alternate"/>
<link href="https://documentation.suse.com/ja-jp/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="ja-JP" rel="alternate"/>
<link href="https://documentation.suse.com/pt-br/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="pt-BR" rel="alternate"/>
<link href="https://documentation.suse.com/zh-cn/sles/15-SP4/html/SLES-all/book-deployment.html" hreflang="zh-CN" rel="alternate"/>
Use Case
Canonicals help search engines resolve duplicate content by telling them which version of a page is the primary or preferred source. On international sites, they work together with hreflang attributes to ensure that each language translation points to its correct, authoritative version.
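The link set shown above follows a regular pattern, which a short Python sketch can make explicit. It assumes, based only on that example, that the English pages live at the site root without a language prefix and that each translation lives under its lower-cased language code. The function name `hreflang_links` and its parameters are hypothetical and not part of docbuild.

```python
from html import escape

BASE = "https://documentation.suse.com"


def hreflang_links(page, canonical_page, langs, default="en-us"):
    """Return the <link> elements for one page as a list of strings.

    page           -- path used for the alternates (assumed layout)
    canonical_page -- path of the preferred version of the page
    langs          -- lower-case language codes such as "de-de"
    default        -- language served without a URL prefix (assumption)
    """

    def url(lang, path):
        prefix = "" if lang == default else f"{lang}/"
        return escape(f"{BASE}/{prefix}{path}")

    def tag(lang):
        # "pt-br" -> "pt-BR", the form used in the hreflang attribute
        language, _, region = lang.partition("-")
        return f"{language}-{region.upper()}" if region else language

    links = [
        f'<link href="{url(default, canonical_page)}" rel="canonical"/>',
        f'<link href="{url(default, page)}" hreflang="x-default" rel="alternate"/>',
    ]
    for lang in langs:
        links.append(
            f'<link href="{url(lang, page)}" hreflang="{tag(lang)}" rel="alternate"/>'
        )
    return links
```

Calling it with `langs=["en-us", "de-de"]` yields the canonical link, the `x-default` link, and one alternate per language, in that order.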
Possible Implementation
Canonicals can be implemented in two ways:
- "On-the-fly"
  The docbuild script knows about all languages. With this knowledge, it can pass specific XSLT parameters to the stylesheet, which then creates the respective set of canonical links.
  This approach keeps the build fast, but it does not work for Cloudnative or Multi Linux Manager, because both are built by Antora.
- "After build"
  This approach traverses the target directory, reads each HTML file, and adds the canonicals if they are not already there. It then stores the page in a temporary file and goes to the next file. When all files are processed, the final step renames the temporary files to the original names.
  This approach is very costly because it has to traverse the whole directory structure and modify the HTML files. However, it is independent of the source format and works with both ADoc and DocBook.
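The temporary-file step of the "After build" approach could look roughly like the following Python sketch. It is an illustration under assumptions, not the docbuild implementation: the names `add_links` and `process_tree` and the `links_for` callback are hypothetical, and the HTML is handled with regular expressions rather than a real parser to keep the example short.

```python
import os
import re
import shutil
import tempfile
from pathlib import Path
from typing import Optional

CANONICAL_RE = re.compile(r'<link\b[^>]*\brel=["\']canonical["\']', re.IGNORECASE)
HEAD_END_RE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_links(html: str, links) -> Optional[str]:
    """Return html with links inserted before </head>, or None if unchanged."""
    if CANONICAL_RE.search(html):
        return None  # a canonical is already present, leave the page alone
    match = HEAD_END_RE.search(html)
    if match is None:
        return None  # no <head> to extend
    block = "\n".join(links) + "\n"
    return html[: match.start()] + block + html[match.start():]


def process_tree(root: Path, links_for) -> int:
    """Write changed pages to temporary files, then rename them in a final pass.

    links_for is a hypothetical callback that returns the <link> strings
    for a given page path. Returns the number of pages that were changed.
    """
    pending = []
    for page in sorted(root.rglob("*.html")):
        updated = add_links(page.read_text(encoding="utf-8"), links_for(page))
        if updated is None:
            continue
        fd, tmp = tempfile.mkstemp(dir=page.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(updated)
        shutil.copymode(page, tmp)  # mkstemp creates files readable by owner only
        pending.append((tmp, page))
    # Final step: replace the originals only after every page was processed.
    for tmp, page in pending:
        os.replace(tmp, page)
    return len(pending)
```

Creating each temporary file in the same directory as its page keeps the final rename on one file system, so `os.replace` can swap the file in a single step.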
"On-the-fly" approach
The canonicals are introduced during the XSLT transformation from DocBook to HTML. This should be the preferred approach for DocBook, or for ADoc that is built through DAPS.
1. Collect the translations.
2. Pass the translations from step 1 into an XSLT parameter, maybe canonicals.hreflangs='en-us de-de es-es fr-fr'?
3. Let the HTML stylesheet parse the XSLT parameter and create the necessary <link/> structure (see above).
4. Done.
- Advantage: Direct. When the deliverable is built, it is finished. No need for an extra step.
- Disadvantage: ?
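Steps 1 and 2 can be sketched in Python as follows. This is only an illustration: the parameter name `canonicals.hreflangs` is the tentative one proposed above, the helper names and file names are hypothetical, and `xsltproc` with its `--stringparam` option stands in for however docbuild and DAPS actually hand parameters to the stylesheet.

```python
def hreflangs_param(langs):
    """Join language codes into one space-separated parameter value.

    Codes are lower-cased, blanks are dropped, and duplicates are removed
    while keeping the original order.
    """
    cleaned = (lang.strip().lower() for lang in langs)
    unique = dict.fromkeys(code for code in cleaned if code)
    return " ".join(unique)


def xsltproc_command(stylesheet, source, langs):
    """Build an argument list that passes the languages as a string parameter."""
    return [
        "xsltproc",
        "--stringparam", "canonicals.hreflangs", hreflangs_param(langs),
        stylesheet,
        source,
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(xsltproc_command("html.xsl", "book.xml", ["en-us", "de-de", "es-es", "fr-fr"]))
```

A space-separated value is easy for the stylesheet to tokenize in step 3, and passing the command as a list avoids shell quoting problems with the embedded spaces.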
"After-build" approach
The canonicals are introduced by traversing the directory structure with this algorithm:
1. Start with the English directory and find all HTML files.
2. Try to find the same files in the directories of the other languages. Do this for all supported languages. Check that the <html lang="..." xml:lang="..."> attribute(s) do not contain English. Only then is it a "real" translation.
3. Collect the result of the previous step. This gives zero or more translations.
4. Adapt the URLs of the translations and create the hreflang links.
5. Store the HTML file.
- Advantage: Independent of any configuration.
- Disadvantage: The filenames need to be the same in English and in the translations. If no translation was built, no hreflang links are created.
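The discovery part of this algorithm (finding the English pages, looking for same-named files, and checking the language attribute) might be sketched in Python like this. The directory layout `root/<language>/...`, the default `en-us` directory name, and the function names are assumptions made for the example. Pages without a detectable language attribute are treated as untranslated, which is a cautious choice rather than something the description above prescribes.

```python
import re
from pathlib import Path

# Matches the first lang or xml:lang attribute on the <html> element.
LANG_RE = re.compile(r'<html\b[^>]*?\blang=["\']([^"\']+)["\']', re.IGNORECASE)


def page_language(page: Path):
    """Return the lower-cased language declared on <html>, or None."""
    head = page.read_text(encoding="utf-8", errors="replace")[:2048]
    match = LANG_RE.search(head)
    return match.group(1).lower() if match else None


def find_translations(root: Path, langs, english="en-us"):
    """Map each English page (relative path) to its real translations.

    A file counts as a translation only if it exists under the language
    directory with the same relative path and does not declare English.
    """
    result = {}
    english_dir = root / english
    for page in sorted(english_dir.rglob("*.html")):
        rel = page.relative_to(english_dir)
        found = []
        for lang in langs:
            candidate = root / lang / rel
            if not candidate.is_file():
                continue
            declared = page_language(candidate)
            if declared and not declared.startswith("en"):
                found.append(lang)
        result[rel.as_posix()] = found
    return result
```

The resulting mapping has an empty list for pages without any translation, which corresponds to the "zero translations" case.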