[bp/1.37] Don't use sysroot when building using non-hermetic toolchain (#43681)#43768
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves persistent linking issues encountered when building Envoy with non-hermetic toolchains. Previously, the use of Envoy's hermetic sysroot, which contains an older Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to disable the hermetic sysroot when using a non-hermetic toolchain, to avoid linking issues, controlled by the BAZEL_USE_HOST_SYSROOT environment variable. However, it also introduces potential Starlark injection vulnerabilities in bazel/repo.bzl by directly formatting unsanitized environment variables into generated Bazel files, which could allow an attacker with control over the build environment to execute arbitrary code on the build machine.
| # NOTE: The cleanest way to provide this environment variable would be via Bazel's | ||
| # repo_env flag. It's particularly important when using remote build execution (aka | ||
| # RBE), as host environment variables are not directly passed to remote workers. | ||
| local_sysroot = repository_ctx.os.environ.get("BAZEL_USE_HOST_SYSROOT", "False") |
There was a problem hiding this comment.
The environment variable BAZEL_USE_HOST_SYSROOT is read and directly formatted into the generated compiler.bzl and BUILD files without any sanitization or validation. This allows an attacker who can control the environment variables of the build process to inject arbitrary Starlark code into these files. Since these files are loaded and executed by Bazel, this can lead to arbitrary command execution on the build machine (e.g., by injecting a genrule with a malicious command).
To remediate this, ensure the input is validated to be a boolean value before using it in the template.
| local_sysroot = repository_ctx.os.environ.get("BAZEL_USE_HOST_SYSROOT", "False") | |
| local_sysroot = "True" if repository_ctx.os.environ.get("BAZEL_USE_HOST_SYSROOT", "False").lower() == "true" else "False" |
|
please rebase when you land to keep the commits separate so its clear what was backported |
envoyproxy#43681) Commit Message: Currently, even when using non-hermetic toolchain we in a few places use Envoy sysroot. Depending on the environment you use when building it can cause linking issues. The problem is that sysroot contains a libc library, while libc++ comes with the toolchain. If libc++ coming with the toolchain was built against a different version of libc than the one we have in Envoy sysroot, libc++ may contain references to the symbols that do not exist in the libc that we have in Envoy sysroot - that results in linking failures. In practice it affects systems that use libc version much newere than what we have in Envoy sysroot. The newer version of libc exposes more functions and libc++, when linked againt it, takes advantage of those newer symbols. As a result, when you combine libc++ linked against a newer version of libc and older version of libc you get linking issues due to a bunch of undefined symbols that libc++ uses. Envoy sysroot currently supports libc version 2.31 (the default) and 2.28 (which seem to only exist for Istio, because Istio intentionally wants to use as old version of libc as practical, because it needs to distribute Envoy binaries that will run outside of container environments and using a very old version of libc makes it easier). glibc 2.31 is 6 years old at this point, so many distribution (both popular like Ubuntu and more rare like Azure Linux) moved on to newer versions of glibc. There are a few ways we can handle this situation: 1. We can add more sysroot versions with newer versions of libc - given that we don't test non-hermetic toolchain builts and different versions of sysroot on CI, I think it's better to not place the burden of supporting different sysroots on the community. 2. We can add libc++ to the sysroot - I didn't test it, but I'd imagine it's possible to make it work, but because hermetic toolchain already comes with libc++ linked against an old enough version of libc to not cause problems, that will introduce some duplication (both sysroot and toolchain will have a version of libc++ libraries) 3. We can say that non-hermetic builds for Envoy just aren't supported at all - that's undesirable, because LLVM toolchains on GitHub are only available for a few operating systems (Ubuntu, RedHat) and if you're not using any of those, you are out of luck. 4. When not using hermetic toolchain, just don't provide sysroot and let the host toolchain figure out how and where to find the libraries in the host. The option 4 makes most sense to me, so this PR implements that, but we can discuss other options as well. Additional Description: It would be nice to cherry pick this to 1.37 release branch as well, that is if we aggree that this approach makes sense. Risk Level: low Testing: I did test builds both with hermetic toolchain and with host toolchain for both FIPS and non-FIPS builds to verify that the changes work. And CI run regular tests with hermetic toolchain. Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
envoyproxy#43731) Commit Message: It's a followup for envoyproxy#43681 - after spllitting sysroot from the toolchain I didn't push the local change that declares `BAZEL_USE_HOST_SYSROOT` environment variable in the repository rule. As a result local tests passed (because I had that change locally) and remote tests passed (because they don't test setup without sysroot), but if you checkout state from the upstream repository building with host sysroot still uses Envoy provided sysroot and that results in error of finding the right libraries. Additional Description: It does not break any builds that worked before, but without this commit setting `BAZEL_USE_HOST_SYSROOT` also does not do anything, as it's ignored. Risk Level: low Testing: built locally Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
…roxy#43800) Commit Message: When I tried to backport envoyproxy#43681 and envoyproxy#43731 to 1.37 release branch Copilot pointed out that I didn't validate the value of BAZEL_USE_HOST_SYSROOT and just included it as-is into the generated Starlak script. It does not seem to be a particularly critical issue, but it's also easy to fix, so let's just fix it. Additionnally, while backporting I discovered that `./ci/do_ci.sh deps` fails currently with the following error: ``` Observed dataplane core deps {'rules_python', 'fips_go_linux_amd64', 'fips_go_linux_arm64', 'fips_cmake_linux_x86_64', 'com_github_nghttp2_nghttp2', 'zlib', 'com_github_nlohmann_json', 'googleurl', 'boringssl', 'com_googlesource_code_re2', 'envoy_repo', 'com_google_protobuf_protoc_osx_x86_64', 'com_github_axboe_liburing', 'com_github_libevent_libevent', 'com_google_protobuf_protoc_linux_ppcle_64', 'dev_cel', 'com_google_protobuf_protoc_win64', 'rules_foreign_cc', 'com_github_fmtlib_fmt', 'io_bazel_rules_go', 'com_github_jbeder_yaml_cpp', 'com_github_cncf_xds', 'com_google_protobuf_protoc_osx_aarch_64', 'prometheus_metrics_model', 'envoy_toolshed', 'aws_lc', 'com_google_googleapis', 'fips_cmake_linux_aarch64', 'rules_cc', 'com_github_zlib_ng_zlib_ng', 'fips_ninja', 'com_github_cyan4973_xxhash', 'com_google_absl', 'com_github_google_perfetto', 'com_github_grpc_grpc', 'com_google_protobuf_protoc_linux_aarch_64', 'com_google_protobuf', 'com_google_protobuf_protoc_linux_x86_64', 'com_google_protoconverter', 'com_github_gabime_spdlog', 'com_envoyproxy_protoc_gen_validate', 'com_github_openhistogram_libcircllhist', 'rules_license', 'com_github_google_quiche'} is not covered by "use_category" implied core deps {'com_github_openzipkin_zipkinapi', 'gperftools', 'rules_python', 'fips_go_linux_amd64', 'fips_go_linux_arm64', 'fips_cmake_linux_x86_64', 'com_github_nghttp2_nghttp2', 'zlib', 'com_github_nlohmann_json', 'googleurl', 'boringssl', 'com_googlesource_code_re2', 'com_github_cares_cares', 'com_google_protobuf_protoc_osx_x86_64', 'com_github_axboe_liburing', 'com_github_libevent_libevent', 'com_google_protobuf_protoc_linux_ppcle_64', 'rules_foreign_cc', 'com_google_protobuf_protoc_win64', 'dev_cel', 'com_github_fmtlib_fmt', 'com_github_jbeder_yaml_cpp', 'io_bazel_rules_go', 'com_google_protobuf_protoc_osx_aarch_64', 'prometheus_metrics_model', 'com_github_cncf_xds', 'rules_proto', 'envoy_toolshed', 'aws_lc', 'com_google_googleapis', 'fips_cmake_linux_aarch64', 'rules_cc', 'toolchains_llvm', 'rules_rust', 'com_github_zlib_ng_zlib_ng', 'com_github_google_tcmalloc', 'com_github_cyan4973_xxhash', 'com_google_absl', 'fips_ninja', 'com_github_google_perfetto', 'com_github_grpc_grpc', 'opentelemetry_proto', 'com_google_protobuf_protoc_linux_aarch_64', 'bazel_skylib', 'com_google_protobuf', 'rules_buf', 'com_google_protobuf_protoc_linux_x86_64', 'com_google_protoconverter', 'com_github_gabime_spdlog', 'com_envoyproxy_protoc_gen_validate', 'com_github_openhistogram_libcircllhist', 'rules_license', 'com_github_google_quiche'}: {'envoy_repo'} are missing ``` The failure is due to the fact that `envoy_repo` repository rule is listed when we query dependencies in Bazel, but because it's not actually an external repository we don't list it in `bazel/deps.yaml`. So the checker complains that we have an unexpected dependency. This PR adds `envoy_repo` to the list of ignored dependencies because it falls under the category of "internal repository structure", so it should be ok to exclude. Additional Description: n/a Risk Level: low Testing: build locally with local sysroot in FIPS mode and without FIPS. Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
169f55b to
0bfe6aa
Compare
Commit Message:
Currently, even when using non-hermetic toolchain we in a few places use
Envoy sysroot. Depending on the environment you use when building it can
cause linking issues.
The problem is that sysroot contains a libc library, while libc++ comes
with the toolchain. If libc++ coming with the toolchain was built
against a different version of libc than the one we have in Envoy
sysroot, libc++ may contain references to the symbols that do not exist
in the libc that we have in Envoy sysroot - that results in linking
failures.
In practice it affects systems that use libc version much newere than
what we have in Envoy sysroot. The newer version of libc exposes more
functions and libc++, when linked againt it, takes advantage of those
newer symbols. As a result, when you combine libc++ linked against a
newer version of libc and older version of libc you get linking issues
due to a bunch of undefined symbols that libc++ uses.
Envoy sysroot currently supports libc version 2.31 (the default) and
2.28 (which seem to only exist for Istio, because Istio intentionally
wants to use as old version of libc as practical, because it needs to
distribute Envoy binaries that will run outside of container
environments and using a very old version of libc makes it easier).
glibc 2.31 is 6 years old at this point, so many distribution (both
popular like Ubuntu and more rare like Azure Linux) moved on to newer
versions of glibc.
There are a few ways we can handle this situation:
that we don't test non-hermetic toolchain builts and different versions
of sysroot on CI, I think it's better to not place the burden of
supporting different sysroots on the community.
it's possible to make it work, but because hermetic toolchain already
comes with libc++ linked against an old enough version of libc to not
cause problems, that will introduce some duplication (both sysroot and
toolchain will have a version of libc++ libraries)
at all - that's undesirable, because LLVM toolchains on GitHub are only
available for a few operating systems (Ubuntu, RedHat) and if you're not
using any of those, you are out of luck.
the host toolchain figure out how and where to find the libraries in the
host.
The option 4 makes most sense to me, so this PR implements that, but we
can discuss other options as well.
Additional Description:
This PR includes 3 PRs from the main branch:
Those were supposed to be a single PR originally, but I made a mistake of not pushing all the local changes and, in the end, had to send another PR. And I also had to address a deps check failure, so in the end I had 3 PRs.
Risk Level: low
Testing: Tested FIPS and non-FIPS build with and without hermetic toolchain.
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a