Error in tokenizer package build: "casting &T to &mut T is undefined behavior" #51

@solarchemist

I tried to install iga locally to test it on my machine, which runs Ubuntu 24.04 with Python 3.12.3 and Rust (Cargo) 1.95.0.
Both pip install iga (in a venv) and pipx install iga failed with the same error. pip install git+https://github.com/caltechlibrary/iga.git (in a different folder, with a venv of its own) failed with that error as well:

  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
[...]
      error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happened here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
          = note: `#[deny(invalid_reference_casting)]` on by default

I don't know enough Python to tell for sure, but this looks to me like the Rust toolchain being too recent for the code in tokenizers. Is there anything I, as a user looking to get iga up and running, can do to fix this?
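
For context, here is a minimal sketch of what I understand the lint to be complaining about, together with two ways such code is usually written instead. This is not the tokenizers source: `Word` is a stand-in struct, and `bump` and `bump_shared` are hypothetical helpers I made up for illustration.

```rust
use std::cell::RefCell;

// Stand-in type for illustration only; not the real tokenizers `Word`.
#[derive(Debug, Default)]
struct Word {
    count: u32,
}

// The rejected pattern, per the error above, has this shape (left as a
// comment because the deny-by-default lint stops it from compiling):
//
//     let w = &words[i] as *const _ as *mut _;
//     let word: &mut Word = &mut (*w);

// Alternative 1 (hypothetical helper): borrow the slice mutably, so no
// pointer cast is needed at all.
fn bump(words: &mut [Word], i: usize) {
    words[i].count += 1;
}

// Alternative 2 (hypothetical helper): interior mutability. `RefCell` is
// built on `UnsafeCell`, which the compiler message points to, and it
// checks borrows at run time.
fn bump_shared(words: &[RefCell<Word>], i: usize) {
    words[i].borrow_mut().count += 1;
}

fn main() {
    let mut words = vec![Word::default(), Word::default()];
    bump(&mut words, 1);
    println!("{:?}", words);

    let shared = vec![RefCell::new(Word::default())];
    bump_shared(&shared, 0);
    println!("{:?}", shared);
}
```

If that reading is right, the fix belongs in the tokenizers source rather than in iga itself, which is why I am asking about workarounds on the user side.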

If downgrading Rust would help, which version should I use? And would that not cause errors in other pip packages that iga depends on? (I first tried installing with Rust 1.75, but that caused other errors.)

Software environment

  • Tested with latest release "CFF Updating" of iga (pip and pipx) and tested latest commit (pip).
  • Python 3.12.3, Rust 1.95.0, pip 26.0.1
  • Ubuntu 24.04.4

Sort of related existing reports
