Skip to content

Conversation

@fgerlits
Copy link
Contributor

https://issues.apache.org/jira/browse/MINIFICPP-2342

Currently, MiNiFi modifies the installed config files (minifi.properties and config.yml) while it runs. This confuses the Windows installer, so changes to these config files get lost when MiNiFi is upgraded.

After this change, the minifi.properties, minifi-log.properties and minifi-uid.properties files are no longer modified by MiNiFi at runtime, so they can be safely replaced by new versions during upgrade. All changes to the settings should be put into new files in the minifi.properties.d (minifi-log.properties.d, minifi-uid.properties.d) directory; these new files will not be touched by the upgrade.

The config.yml file will no longer be installed as part of MiNiFi. If C2 is enabled, config.yml will be fetched from the C2 server; otherwise, MiNiFi will create a new file with an empty flow, and the user can edit this. Either way, config.yml will not be touched by an upgrade.


Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.

@fgerlits fgerlits force-pushed the MINIFICPP-2342_Do-not-overwrite-installed-config-files branch 2 times, most recently from ebca3b0 to 604baba Compare November 17, 2025 09:12
@fgerlits fgerlits changed the title MINIFICPP-2342 Do not overwrite installed config files MINIFICPP-2342 Do not overwrite config files during an upgrade Nov 17, 2025
#include "utils/Environment.h"

namespace {
bool fileContentsMatch(const std::filesystem::path& file_name, const std::unordered_set<std::string>& expected_contents) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function is a bit misleading, it could be something line propertiesMatchInFile to indicate that this is not a generic file content match check. Also this might be a bit shorter like this:

for (std::string line; std::getline(file, line); ) {
    actual_contents.insert(std::move(line));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the function to settingsInFileAreAsExpected() and changed the while loop to a for loop in fb3cdef

I don't think the move would be correct here, since we would reuse line after we moved from it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(thread reply) I think move is valid: the moved-from state is messy in general, some people say invalid, the standard says unspecified but valid state. But if it's reassigned (by getfile), then it probably doesn't matter. I would be a proponent of a destructive move / relocate, which would prevent reuse like this, but we don't have it yet.

I think it's fine either way.

Comment on lines 192 to 235
const auto extra_properties_files_dir = extra_properties_files_dir_name();
std::vector<std::filesystem::path> extra_properties_file_names;
if (utils::file::exists(extra_properties_files_dir) && utils::file::is_directory(extra_properties_files_dir)) {
utils::file::list_dir(extra_properties_files_dir, [&](const std::filesystem::path&, const std::filesystem::path& file_name) {
if (!file_name.string().ends_with(".bak")) {
extra_properties_file_names.push_back(file_name);
}
return true;
}, logger_, /* recursive = */ false);
}
std::ranges::sort(extra_properties_file_names);
for (const auto& file_name : extra_properties_file_names) {
properties_files_.push_back(extra_properties_files_dir / file_name);
}

std::ifstream file(properties_file_, std::ifstream::in);
if (!file.good()) {
logger_->log_error("load configure file failed {}", properties_file_);
return;
logger_->log_info("Using configuration file to load configuration for {} from {} (located at {})",
getName().c_str(), configuration_file.string(), base_properties_file_.string());
if (!extra_properties_file_names.empty()) {
auto list_of_files = utils::string::join(", ", extra_properties_file_names, [](const auto& path) { return path.string(); });
logger_->log_info("Also reading configuration from files {} in {}", list_of_files, extra_properties_files_dir.string());
}

properties_.clear();
dirty_ = false;
for (const auto& line : PropertiesFile{file}) {
auto key = line.getKey();
auto persisted_value = line.getValue();
auto value = utils::string::replaceEnvironmentVariables(persisted_value);
bool need_to_persist_new_value = false;
fixValidatedProperty(std::string(prefix) + key, persisted_value, value, need_to_persist_new_value, *logger_);
dirty_ = dirty_ || need_to_persist_new_value;
properties_[key] = {persisted_value, value, need_to_persist_new_value};
for (const auto& properties_file : properties_files_) {
std::ifstream file(properties_file, std::ifstream::in);
if (!file.good()) {
logger_->log_error("load configure file failed {}", properties_file);
continue;
}
for (const auto& line : PropertiesFile{file}) {
auto key = line.getKey();
auto persisted_value = line.getValue();
auto value = utils::string::replaceEnvironmentVariables(persisted_value);
bool need_to_persist_new_value = false;
fixValidatedProperty(std::string(prefix) + key, persisted_value, value, need_to_persist_new_value, *logger_);
dirty_ = dirty_ || need_to_persist_new_value;
properties_[key] = {persisted_value, value, need_to_persist_new_value};
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are 2 separate sections here, doing 2 separate things that can be encapsulated, I would extract them to 2 separate functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored loadConfigureFile by pulling out two self-contained sections: edf88de

Comment on lines 247 to 261
const auto output_file = (persist_to_ == PersistTo::SingleFile ? base_properties_file_ : extra_properties_files_dir_name() / C2PropertiesFileName);
if (!std::filesystem::exists(output_file)) {
logger_->log_debug("Configuration file {} does not exist yet, creating it", output_file);
utils::file::create_dir(output_file.parent_path(), /* recursive = */ true);
std::ofstream file{output_file};
}

std::ifstream file(output_file, std::ifstream::in);
if (!file) {
logger_->log_error("load configure file failed {}", properties_file_);
logger_->log_error("Failed to load configuration file {}", output_file);
return false;
}

auto new_file = properties_file_;
new_file += ".new";

PropertiesFile current_content{file};
for (const auto& prop : properties_) {
if (!prop.second.need_to_persist_new_value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract reading the PropertiesFile content to a separate function

Copy link
Contributor Author

@fgerlits fgerlits Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling out the file-reading part is not so easy because of the return false in line 219/257.

I have pulled out the next section, which sets the changed properties in current_content: 9d355e8

README.md Outdated
these are the default settings supplied by the latest MiNiFi version. If you would like to modify these, you should create a corresponding
.d directory (e.g. conf/minifi.properties.d) and put your settings in a new file inside this directory. These files are read and applied
in lexicographic order, after the default settings file.
The Windows installer creates a conf/minifi.properties.d/10_installer_properties file, which contains C2 connection settings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files under minifi.properties.d should have the .properties extension too by convention, but it's OK if the minifi service doesn't enforce this and reads all files under the directory.

Suggested change
The Windows installer creates a conf/minifi.properties.d/10_installer_properties file, which contains C2 connection settings.
The Windows installer creates a conf/minifi.properties.d/10_installer.properties file, which contains C2 connection settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about convention: most of the files under /etc/*.d on my machine have no extension, with .conf a distant second and many others like .driver, .sources etc. I have changed them to .properties anyway, in 12e6d40.

When reading them, we will continue to read all files in the .d directory except backup files.

Currently, MiNiFi modifies the installed config files (minifi.properties
and config.yml) while it runs. This confuses the Windows installer, so
changes to these config files get lost when MiNiFi is upgraded.

After this change, the minifi.properties, minifi-log.properties and
minifi-uid.properties files are no longer modified by MiNiFi at runtime,
so they can be safely replaced by new versions during upgrade.
All changes to the settings should be put into new files in the
minifi.properties.d (minifi-log.properties.d, minifi-uid.properties.d)
directory; these new files will not be touched by the upgrade.

The config.yml file will no longer be installed as part of MiNiFi. If C2
is enabled, config.yml will be fetched from the C2 server; otherwise,
MiNiFi will create a new file with an empty flow, and the user can
edit this. Either way, config.yml will not be touched by an upgrade.
... and use a for loop instead of while
@fgerlits fgerlits force-pushed the MINIFICPP-2342_Do-not-overwrite-installed-config-files branch from 12e6d40 to 5894882 Compare December 5, 2025 13:40

namespace {
void createDefaultFlowConfigFile(const std::filesystem::path& path) {
std::ofstream ostream(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a bit of error reporting

Suggested change
std::ofstream ostream(path);
std::ofstream ostream(path);
ostream.exceptions(std::ofstream::failbit | std::ofstream::badbit);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

Comment on lines +465 to +466
these are the default settings supplied by the latest MiNiFi version. If you would like to modify these, you should create a corresponding
.d directory (e.g. conf/minifi.properties.d) and put your settings in a new file inside this directory. These files are read and applied
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it would be more user-friendly if we shipped the directory. One option could be to ship a symlink in it to minifi.properties like /etc/sysctl.d. Alternatively we should ship a README or an empty file just to keep the directory.

Copy link
Contributor Author

@fgerlits fgerlits Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to ship all three directories: minifi.properties.d, minifi-log.properties.d and minifi-uid.properties.d, or only minifi.properties.d?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think more about this, let me get back to this after the holidays. Now I'm thinking about a single minifi.properties.d with things like 10-general.properties, 10-c2.properties, 10-log.properties, 10-uid.properties, and they're symlinks to the old filenames under conf/, but I'm not sure yet this would be the right approach.

#include "utils/Environment.h"

namespace {
bool fileContentsMatch(const std::filesystem::path& file_name, const std::unordered_set<std::string>& expected_contents) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(thread reply) I think move is valid: the moved-from state is messy in general, some people say invalid, the standard says unspecified but valid state. But if it's reassigned (by getfile), then it probably doesn't matter. I would be a proponent of a destructive move / relocate, which would prevent reuse like this, but we don't have it yet.

I think it's fine either way.

Comment on lines +165 to +167
utils::file::list_dir(extra_properties_files_dir, [&](const std::filesystem::path&, const std::filesystem::path& file_name) {
if (!file_name.string().ends_with(".bak")) {
extra_properties_file_names.push_back(file_name);
Copy link
Member

@szaszm szaszm Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed my mind about the pattern matching: I think I'd rather have a whitelist by matching extensions, rather than a blacklist by *.bak. It's more explicit about what we want to read, and it leaves room for things like a README.txt like in /etc/sysctl.d on my system. By the way, another pattern I was thinking about following was /etc/X11/xorg.conf.d.

What do you think? As always, I'm open to different viewpoints and to being wrong.

Comment on lines 44 to 67
return *file_name_;
gsl_Expects(!file_locations_.empty());
return file_locations_.front().filename();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function still make sense? If it's using a list of files now, and exposes a single file name, then I'd rather change it to not expose a single file name instead of just taking the first one in the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still used when we serialize the checksum into the heartbeat. I agree that e.g. using a "checksum name" passed to the constructor would be nicer, but it would add a lot of extra code. Instead, I have renamed the function to getFileNameOfFirstFileLocation() in d06d435.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants