Conversation
15bbc95 to
cbe7178
Compare
This is the first of a series of commits aiming at extracting configuration related code from the analysis code. NOTE: There are multiple end goals at stake : - for the user interface, its specification should be rigidified to avoid confusion on command lines such as `dead_code_analyzer --nothing -E all path1 -M all path2` which would currently mean "analyze path1 for the exported values only and path2 for both the exported values and the methods". Ideally, it should be simplified to "analyze path1 and path2 for exported values and methods" and encourage the use of `dead_code_analyzer --nothing -E all -M all path1 path2` instead. - for the configuration, it should appear as immutable in order to facilitate debugging and accessing its information. - for the analysis, the end goal is to be able to split the analysis in 2 passes : 1. on the interfaces to gather all the declarations 2. on the implementations to gather all the references. They are currently interlaced (actually, for each compilation unit, we first process the interface and then the implementation). This split combined with a more robust configuration should help transition to OCaml 5.3
The main sections config was a structure with fields `print`, `threshold`, and `call_sites`. The opt args sections config was a structure with fields `print`, `threshold`, and `call_sites`. The only difference between the 2 is the type of `threshold`. Hence, they can be grouped under a same record type with `threshold`'s type as parameter. Additionally, having the `threshold` and `call_sites` fields mandatory does not make sense when the main usage of the tool does not require them, and the `print` field would express if the reports were activated for the section. Hence, the sections configuration is simplified to express the 3 states it can be in more explicitly : - deactivated (`Off`) - activated without threshold (`On`) - activated with threshold (`Threshold _`) The `threshold` and `call_sites` information are now only available in the `Threshold` mode, where they make sense.
It was a structure with fields `percentage`, `exceptions` and `optional`. The `optional` field was actually a mode selection (` `Both` or ` `Percent`). If the mode was ` `Both`, then the `exceptions` field would be used. Otherwise, only `percentage` would. Rather than having this implicit semantic with everything at hand all the time, the type is now properly split in 2 constructors : `Both` and `Percent`, which both only hold the relevant information.
It contains all the configuration. For now it simply regroup all the configuration values that were found independently `Config`. The extra indirection makes the code heavier to read for now. The section configuration fields are still `ref`. Hence the `!(config._)` constructs. The other ones are now mutable fields.
`Config.config` is now a ref and its fields are neither `mutable` nor `ref`. Additionally, now that the cli parsing is in `Config`, `Config`'s API is reduced as to not expose the "update" functions. Only `update_style` remains for now.
The new `Config.Sections` module exposes an immutable API for the sections configuration. The sections configuration is now exposed through `Config.config.sections`.
This simplifies the code in consumers of the config API
Peths considered for analysis, to gather references from, and those excluded from the analysis are now available in the config. This step will enable moving to an immutable config and an "out-of-arg-parsing" analysis (analysis is currently triggered during the parsing of arguments).
The `Config` API is now immutable. `parse_cli` will produce a new `Config.t` each time rather than update the `Config.config`, and the old `config : t ref` is now simply `default_config : t`. `State.init` now expects a config as argument to produce the initial state. During the analysis, the configuration that must be acounted for is the one in the state.
This protects the configuration from updates outside the expected canals (the functions in the API)
cbe7178 to
9055d06
Compare
+ Fix some typos in comments. + Move `StringSet` to `Utils` (for now) + Easier to read `Arg.parse`. + Rename `Config.*_activated` into `Config.must_report_*` + Rewrite `Config.normalize_path` for clarity
9055d06 to
07ca382
Compare
|
Thanks for the useful description.
About this: when I last looked at it I thought that for an accurate analysis a full dependency tree of each module was needed (but perhaps I was missing something obvious). I considered using the output of |
|
Thanks for the pointer to
You are right that the full dependency tree will help provide a more accurate analysis. What I usually do for projects built with dune is run In practice, we don't build the dependency tree (yet ?) because the analysis is mostly order-insensitive. The only* order-sensitive part of the analysis relies on the fact that a compilation unit's interface is analyzed before its implementation. * There may be some corner cases where order may have an influence. It was the case for re-exports (07408d8). To ensure the results of the analyzer are insensitive to the order of the paths in the command line, they are analyzed in lexicographical order, and the reports are also sorted in that order. Before the current change, the paths on the command line were analyzed in their order of appearance, and only the paths found when traversing directories would be analyzed in lexicographical order.
If I understand correctly, you mean parsing that output to pass all the right files to the analyzer. I am not sure this would provide value in most cases vs passing Note This is actually the case for the dead_code_analyzer : its build files are in If you meant using the output of
|
|
@nojb Let me know if you'd like to review this PR more in depth. Otherwise, I'll merge it Wednesday ;) |
This is a significant change and part of an even larger refactor planned (#43).
Key changes
Introducing the
ConfigmoduleConfiguration related code is now contained in its own module (and submodule
Sections). With this, it will be easier to debug the configuration of the analysis when necessary (debugging facilities such as pretty printing are planned for later work).This module provides an immutable API with
privatetypes, ensuring there is no risk of an involuntary update of the configuration. There are only 2 ways to obtain a configuration : either by using theConfig.default_configor by parsing the command line by usingConfig.parse_cli ().The only updatable field is
sections.styleby using theConfig.update_stylefunction. This is necessary for now because the analysis must not track stylistic issues when processing the paths passed to the--referencescommand line argument 1.The
State.thas been updated to hold the configuration. The analysis state can only be initialized by providing a configuration 2.Rather than accessing the different configuration information through globally accessible
refs, the analyzer would read the appropriatestate.configfields.Simplifying the command line arguments order
Previously, the following command would have meant "analyze path1 for the unused exported values only and path2 for both the unused exported values and the unused methods" :
dead_code_analyzer --nothing -E all path1 -M all path2Now all the paths are gathered together before any analysis is run, so the configuration is the same for all files. Hence, the above command now means "analyze path1 and path2 for unused exported values and methods". Consequently, one could prefer using its "normalized" form instead :
dead_code_analyzer --nothing -E all -M all path1 path2Disabling complex command line semantics is voluntary. It could be re-enabled in the future if there is a request for it but for now, the simplified meaning should lead to clearer documentation and easier maintenance.
Note
This change will facilitate the 5.3 update, which would require to know which
.cm*files are provided to find out some location information. (The 5.3 update currently fails on cases like #53).In OCaml 5.3, the
cmt_infos.cmt_value_dependencieshas been replaced bycmt_infos.cmt_declaration_dependencieswhich usesShape.Uid.tinstead ofTypes.value_description. This change of type makes location harder to get because they were directly accessible inTypes.value_descriptionbut are inexistant inShape.Uid.t.The analysis relies on locations. Finding an uid's location can be done using
cmt_infos.cmt_uid_to_declbut, as described in Shape.mli, one needs to read the.cmtof external units to find the location of corresponding "external" uids (although they appear in thecmt_declaration_dependencies).Thus, knowing which files are given to the tool will indicate which in files "external" uids can be looked for, and the analyzer will be able to correctly associate locations like it does in OCaml 5.2
Footnotes
I think this rule should be changed to still track stylistic issues on "reference" paths. The idea behind
--referenceswas to not track new declarations (be it values, types, methods or optional arguments). This idea does not contradict the identification of stylistic issues. ↩A default state exists and provides the
default_config, but is not explicitly exposed in the API. It is only used as the initial value for thecurrentstate before anyState.updatehappened. ↩