
Refactor cli internals #54

Merged
fantazio merged 12 commits into LexiFi:master from fantazio:refactor_cli
Feb 18, 2026
Conversation

@fantazio (Collaborator) commented Feb 5, 2026

This is a significant change and part of an even larger refactor planned (#43).

Key changes

Introducing the Config module

Configuration-related code is now contained in its own module (and its submodule Sections). This will make it easier to debug the configuration of the analysis when necessary (debugging facilities such as pretty-printing are planned for later work).

This module provides an immutable API with private types, ensuring there is no risk of an involuntary update of the configuration. There are only two ways to obtain a configuration: either use Config.default_config or parse the command line with Config.parse_cli ().
The only updatable field is sections.style, via the Config.update_style function. This is necessary for now because the analysis must not track stylistic issues when processing the paths passed to the --references command line argument 1.
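A minimal sketch of what such an immutable, private-type API can look like in OCaml. All field, constructor, and value names here (style, verbose, Check_style, Ignore_style) are illustrative assumptions, not the actual ones from this PR:

```ocaml
(* Hypothetical sketch of an immutable config API with a private type. *)
module Config : sig
  type style = Check_style | Ignore_style             (* assumed variant *)
  type t = private { style : style; verbose : bool }  (* assumed fields *)
  val default_config : t
  val update_style : style -> t -> t
end = struct
  type style = Check_style | Ignore_style
  type t = { style : style; verbose : bool }
  let default_config = { style = Check_style; verbose = false }
  (* Returns a new config; the old one is untouched. *)
  let update_style style t = { t with style }
end

let () =
  let c = Config.(update_style Ignore_style default_config) in
  assert (c.Config.style = Config.Ignore_style)
```

The private annotation lets consumers read fields but prevents them from constructing or mutating a config outside the module, which is exactly the guarantee described above.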

State.t has been updated to hold the configuration, and the analysis state can only be initialized by providing a configuration 2.
Rather than accessing configuration information through globally accessible refs, the analyzer now reads the appropriate state.config fields.

Simplifying the command line argument order

Previously, the following command meant "analyze path1 for unused exported values only and path2 for both unused exported values and unused methods":
`dead_code_analyzer --nothing -E all path1 -M all path2`
Now all the paths are gathered before any analysis is run, so the configuration is the same for all files. Hence, the above command now means "analyze path1 and path2 for unused exported values and methods". Consequently, one may prefer its "normalized" form instead:
`dead_code_analyzer --nothing -E all -M all path1 path2`

Disabling the complex command line semantics is deliberate. It could be re-enabled in the future if there is demand for it, but for now the simplified meaning should lead to clearer documentation and easier maintenance.

Note

This change will facilitate the 5.3 update, which will require knowing which .cm* files are provided in order to find some location information. (The 5.3 update currently fails on cases like #53.)

In OCaml 5.3, cmt_infos.cmt_value_dependencies has been replaced by cmt_infos.cmt_declaration_dependencies, which uses Shape.Uid.t instead of Types.value_description. This change of type makes locations harder to obtain: they were directly accessible in Types.value_description but do not exist in Shape.Uid.t.
The analysis relies on locations. Finding a uid's location can be done using cmt_infos.cmt_uid_to_decl but, as described in Shape.mli, one needs to read the .cmt of external units to find the location of the corresponding "external" uids (even though they appear in cmt_declaration_dependencies).

Thus, knowing which files are given to the tool will indicate in which files "external" uids can be looked up, and the analyzer will be able to correctly associate locations as it does in OCaml 5.2.

Footnotes

  1. I think this rule should be changed to still track stylistic issues on "reference" paths. The idea behind --references was to not track new declarations (be it values, types, methods or optional arguments). This idea does not contradict the identification of stylistic issues.

  2. A default state exists and provides the default_config, but it is not explicitly exposed in the API. It is only used as the initial value for the current state before any State.update happens.

This is the first of a series of commits aiming at extracting
configuration related code from the analysis code.

NOTE:

There are multiple end goals at stake:
- for the user interface, its specification should be tightened to
  avoid confusion on command lines such as
  `dead_code_analyzer --nothing -E all path1 -M all path2`
  which would currently mean "analyze path1 for the exported values only
  and path2 for both the exported values and the methods".
  Ideally, it should be simplified to "analyze path1 and path2 for
  exported values and methods" and encourage the use of
  `dead_code_analyzer --nothing -E all -M all path1 path2`
  instead.
- for the configuration, it should appear as immutable in order to
  facilitate debugging and accessing its information.
- for the analysis, the end goal is to be able to split the analysis in
  2 passes:
  1. on the interfaces to gather all the declarations
  2. on the implementations to gather all the references.
  They are currently interleaved (for each compilation unit, we
  first process the interface and then the implementation). This split,
  combined with a more robust configuration, should help the transition
  to OCaml 5.3
The main sections config was a structure with fields `print`,
`threshold`, and `call_sites`. The opt args sections config was a
structure with fields `print`, `threshold`, and `call_sites`. The only
difference between the two is the type of `threshold`.

Hence, they can be grouped under a single record type parameterized by
`threshold`'s type.

Additionally, having the `threshold` and `call_sites` fields mandatory
does not make sense when the main usage of the tool does not require
them, and the `print` field would express if the reports were activated
for the section.
Hence, the sections configuration is simplified to express more
explicitly the 3 states it can be in:
- deactivated (`Off`)
- activated without threshold (`On`)
- activated with threshold (`Threshold _`)

The `threshold` and `call_sites` information are now only available in
the `Threshold` mode, where they make sense.
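The three states described above can be sketched as an OCaml variant; the record fields below are assumptions based on the description, not the actual definitions:

```ocaml
(* Hypothetical sketch of the three-state section configuration. *)
type threshold_info = { threshold : int; call_sites : bool }  (* assumed fields *)

type section_mode =
  | Off                          (* reports deactivated *)
  | On                           (* reports activated, no threshold *)
  | Threshold of threshold_info  (* reports filtered by a threshold *)

(* A report is emitted unless the section is Off. *)
let must_report = function
  | Off -> false
  | On | Threshold _ -> true
```

Making `threshold` and `call_sites` reachable only through the `Threshold` constructor encodes in the type system that they are meaningless in the other two modes.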
It was a structure with fields `percentage`, `exceptions` and
`optional`. The `optional` field was actually a mode selection (`Both`
or `Percent`). If the mode was `Both`, then the `exceptions` field
would be used. Otherwise, only `percentage` would.

Rather than keeping this implicit semantics with everything at hand all
the time, the type is now properly split into 2 constructors, `Both` and
`Percent`, which each hold only the relevant information.
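As a hedged sketch, the split might look like this in OCaml (field names and types are assumptions based on the description):

```ocaml
(* Hypothetical split of the opt-args configuration into two constructors. *)
type opt_args_mode =
  | Percent of float                                  (* only the percentage is used *)
  | Both of { percentage : float; exceptions : int }  (* exceptions are also consulted *)

(* Each constructor carries only the data that is meaningful for it. *)
let percentage = function
  | Percent p -> p
  | Both { percentage; _ } -> percentage
```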
It contains all the configuration. For now it simply regroups all the
configuration values that were previously independent in `Config`.
The extra indirection makes the code heavier to read for now.
The section configuration fields are still `ref`s, hence the
`!(config._)` constructs.
The other fields are now mutable.
`Config.config` is now a ref and its fields are neither `mutable` nor
`ref`.
Additionally, now that the cli parsing is in `Config`, `Config`'s API is
reduced as to not expose the "update" functions. Only `update_style`
remains for now.
The new `Config.Sections` module exposes an immutable API for the
sections configuration. The sections configuration is now exposed
through `Config.config.sections`.
This simplifies the code in consumers of the config API.
Paths considered for analysis, to gather references from, and those
excluded from the analysis are now available in the config. This step
will enable moving to an immutable config and an "out-of-arg-parsing"
analysis (analysis is currently triggered during the parsing of
arguments).
The `Config` API is now immutable. `parse_cli` will produce a new
`Config.t` each time rather than update the `Config.config`, and the old
`config : t ref` is now simply `default_config : t`.
`State.init` now expects a config as argument to produce the initial
state. During the analysis, the configuration that must be accounted for
is the one in the state.
This protects the configuration from updates outside the expected
channels (the functions in the API).
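The init-from-config flow described above can be sketched as follows; both modules here are simplified hypothetical stand-ins, not the actual implementation:

```ocaml
(* Hypothetical, minimal stand-ins for the Config and State modules. *)
module Config = struct
  type t = { verbose : bool }  (* placeholder field *)
  let default_config = { verbose = false }
end

module State = struct
  type t = { config : Config.t; files_seen : int }
  (* The state can only be created from a configuration. *)
  let init config = { config; files_seen = 0 }
end

let () =
  let st = State.init Config.default_config in
  assert (st.State.files_seen = 0)
```

Because `State.init` is the only way to obtain a state, every analysis run necessarily starts from an explicit configuration rather than from mutable globals.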
+ Fix some typos in comments.
+ Move `StringSet` to `Utils` (for now)
+ Easier to read `Arg.parse`.
+ Rename `Config.*_activated` into `Config.must_report_*`
+ Rewrite `Config.normalize_path` for clarity
@nojb (Contributor) commented Feb 11, 2026

Thanks for the useful description.

This change will facilitate the 5.3 update, which would require to know which .cm* files are provided to find out some location information. (The 5.3 update currently fails on cases like #53).

About this: when I last looked at it I thought that for an accurate analysis a full dependency tree of each module was needed (but perhaps I was missing something obvious). I considered using the output of dune describe to retrieve it from Dune itself. I thought I would just mention it in case it makes sense (but it may very well not :)).

@fantazio (Collaborator, Author) commented Feb 11, 2026

Thanks for the pointer to dune describe. Sorry for the lengthy response.

when I last looked at it I thought that for an accurate analysis a full dependency tree of each module was needed

You are right that the full dependency tree will help provide a more accurate analysis. What I usually do for projects built with dune is run dead_code_analyzer [options] _build and let the analyzer discover all the .cm* files it can analyze in _build.

In practice, we don't build the dependency tree (yet?) because the analysis is mostly order-insensitive. The only* order-sensitive part of the analysis relies on the fact that a compilation unit's interface is analyzed before its implementation.

* There may be some corner cases where order may have an influence. It was the case for re-exports (07408d8). To ensure the results of the analyzer are insensitive to the order of the paths in the command line, they are analyzed in lexicographical order, and the reports are also sorted in that order. Before the current change, the paths on the command line were analyzed in their order of appearance, and only the paths found when traversing directories would be analyzed in lexicographical order.
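Normalizing the order of analyzed paths, as described above, boils down to a lexicographic sort (a trivial sketch, with an illustrative function name):

```ocaml
(* Sort paths lexicographically so results do not depend on CLI order. *)
let normalize_order paths = List.sort String.compare paths
```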

I considered using the output of dune describe to retrieve it from Dune itself.

If I understand correctly, you mean parsing that output to pass all the right files to the analyzer. I am not sure this would provide value in most cases vs passing _build on the command line. There are probably situations where a .cm* file is not stored in the same _build directory, but wouldn't that mean it is independent from the rest?

Note

This is actually the case for the dead_code_analyzer: its build files are in _build, the examples/using_dune build files are in examples/using_dune/_build, the examples/using_make build files are spread in examples/using_make (not using dune), and the check/src build files are in check/src (not using dune).
This separation is deliberate and there is no dependency between the listed components.

If you meant using the output of dune describe to build the dependency tree for each module in the analyzer itself, this is a good idea. The dependency trees could probably be built using information stored in .cmi files as well. I think this is not critical for now, but I see 2 immediate ways such trees could help:

  • Scheduling the analysis: analyze the leaves first, then their parents, and group files for parallel analysis. In the long term I actually plan to have an incremental/differential-analysis mode (à la Infer), which would allow analyzing and reporting on only the subsets of a project impacted by a change, and this kind of scheduling would greatly help.
  • Dependency-aware caching: for the issue described in the Note section of this PR, and combined with a dependency-aware scheduler, .cm* files can be cached for exactly as long as they are needed, and information retrieval would be straightforward.

@fantazio (Collaborator, Author) commented
@nojb Let me know if you'd like to review this PR more in depth. Otherwise, I'll merge it Wednesday ;)

@fantazio fantazio merged commit f022714 into LexiFi:master Feb 18, 2026
5 checks passed