General abbreviation processing will be implemented in gh-422.
However, some special cases may remain, mostly from native news tags, that we should handle separately or differently.
Examples:
- u.s.a. -> usa
- u. s. a. -> usa
- u s a -> usa
- ph.d. -> phd (there are a lot of abbreviations with multi-letter parts, like ph)
- but .net -> dot-net
- but r programming -> r-programming
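A minimal sketch of how the examples above could be told apart, assuming a hypothetical helper `collapse_abbreviation` and hypothetical regexes (none of these names exist in the codebase). It only collapses tags that look like dotted or spaced-out abbreviations and returns `None` for everything else, so `.net` and `r programming` fall through to the regular processing:

```python
import re
from typing import Optional

# Hypothetical patterns, for illustration only.
# Dotted abbreviation: two or more groups of 1-2 letters, each followed by a dot
# and optional whitespace, e.g. "u.s.a.", "u. s. a.", "ph.d."
DOTTED = re.compile(r"(?:[a-z]{1,2}\.\s*){2,}")

# Spaced-out abbreviation: three or more single letters separated by single spaces,
# e.g. "u s a". Requiring three letters is a deliberately conservative assumption.
SPACED = re.compile(r"[a-z](?: [a-z]){2,}")


def collapse_abbreviation(raw_tag: str) -> Optional[str]:
    """Return the collapsed abbreviation, or None if the tag is not one."""
    tag = raw_tag.strip().lower()

    if DOTTED.fullmatch(tag):
        # Drop dots and whitespace: "u. s. a." -> "usa"
        return re.sub(r"[.\s]", "", tag)

    if SPACED.fullmatch(tag):
        # Drop spaces: "u s a" -> "usa"
        return tag.replace(" ", "")

    # Not an abbreviation by these rules: ".net", "r programming", ...
    return None
```

The limit of two letters per dotted group is a guess that covers `ph.d.`; the real limit should come from the tag statistics.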
We may need to add checks in multiple places:
- in tags.converters to detect some patterns in raw tags;
- in tags.normalizers to detect some patterns after initial normalization.
Attention: check the statistics of such tags in the DB before implementing anything, to avoid overengineering.
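One possible way to collect such statistics is sketched below. It assumes the raw tags have already been loaded from the DB into an iterable of strings (the loading itself is out of scope here), and the pattern names, regexes, and the `count_special_cases` function are all hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns for the special cases listed in this task.
# Order matters: the first matching pattern wins.
PATTERNS = {
    "dotted": re.compile(r"(?:[a-z]{1,2}\.\s*){2,}"),             # "u.s.a.", "ph.d."
    "spaced_letters": re.compile(r"[a-z](?: [a-z]){2,}"),         # "u s a"
    "leading_dot": re.compile(r"\.[a-z0-9]+"),                    # ".net"
    "single_letter_prefix": re.compile(r"[a-z] [a-z]{2,}.*"),     # "r programming"
}


def count_special_cases(raw_tags: Iterable[str]) -> Counter:
    """Count how many raw tags fall into each special-case pattern."""
    counts: Counter = Counter()

    for raw in raw_tags:
        tag = raw.strip().lower()

        for name, pattern in PATTERNS.items():
            if pattern.fullmatch(tag):
                counts[name] += 1
                break  # count each tag at most once

    return counts
```

If a pattern turns out to match only a handful of tags, it is probably not worth a dedicated check.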
Internal task id: ff-524