Skip to content

Resolve duplicates due to field length restrictions of old build process by updating class labels from archived / retired RxNORM RXCUIs #94

@hoganwr

Description

@hoganwr

Discussed in #91

Originally posted by hoganwr December 1, 2025
Some duplicate labels (approx. 90 of them) result from label truncation in the old Scala-based build process. How can we fix these?

Investigation thus far:

  1. The update_label process only updates active RxNorm "concepts" in the lastest monthly release
  2. The labels involved are "historical" or "retired" RxCuis, and thus the latest monthly releases do not include them and update them
  3. This RxAPI call can get the label from the RxCui: https://rxnav.nlm.nih.gov/REST/rxcui/805526/historystatus.json, and the RxCui 805526 is an example of one of the issues

So, we can try writing a script that takes the RxCuis as input, makes the API call, gets the "name" (aka 'label') out, and updates the template somehow.

Alternatively:
4. The RXNATOMARCHIVE.RRF folder has the name, we could either write a script to pull names out of that file or to load the RXNATOMARCHIVE.RRF file into sqlite and do a SQL-based update
5. The benefits of the latter in 4 is that it's easy to write out the templates from sqlite because that's the normal build process and we have a script for that already

To do that, we'd have to write a sql script that creates teh RXNATOMARCHIVE table and populates it, we could follow the examples for thte other tables.

Now that the build process loads RXNATOMARCHIVE, I did the following:

For all rxcui in rxnatomarchive, get the last updated information:
create view v_last_rxcui_update as select RXCUI, max(VSAB) as LAST_UPDATE from RXNATOMARCHIVE group by RXCUI;

This query gets everything where the class label differs from the last update in rxnorm:
select d.rxcui, d.curie, d.label, r.STR from dron.clinical_drug_form d, rxnorm.RXNATOMARCHIVE r, rxnorm.v_last_rxcui_update u where d.rxcui = r.RXCUI and r.rxcui = u.rxcui and r.VSAB = u.LAST_UPDATE and d.label != r.STR and r.RXCUI not in (SELECT RXCUI from rxnorm.RXNCONSO);

I exported those results and manually reviewed them.

Then I ran the query to do the update:
UPDATE dron.clinical_drug_form SET label = criteria_table.label FROM ( select d.rxcui as rxcui, d.curie, d.label as old_label, r.STR as label from dron.clinical_drug_form d, rxnorm.RXNATOMARCHIVE r, rxnorm.v_last_rxcui_update u where d.rxcui = r.RXCUI and r.rxcui = u.rxcui and r.VSAB = u.LAST_UPDATE and d.label != r.STR and r.RXCUI not in (SELECT RXCUI from rxnorm.RXNCONSO) ) AS criteria_table WHERE dron.clinical_drug_form.rxcui = criteria_table.rxcui;

Then I ran:
sh run.sh make update-rxnorm

Then to review / verify:
git diff ../templates

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions