
Redesign python build dependencies - For discussion #198

Draft
couteau wants to merge 11 commits into open-vcpkg:main from couteau:build_deps

Conversation

Contributor

@couteau commented Apr 6, 2026

I am starting this Pull Request to begin a discussion about a potential redesign of the build process for python modules in the python-registry repo. Now that QGIS4 is out, I've noticed that the python packages bundled with the app (at least on Mac) include not only the important data science packages QGIS users depend on but also all the build dependencies for those packages. But we don't really need those build dependencies bundled and distributed with QGIS. E.g., QGIS probably doesn't need wheel, gpep517, or setuptools bundled with the application, but they are being included in the bundle because they were necessary at build time to build other python packages.

In an attempt to build a version of QGIS that leaves out these packages, I tried building with different host and target triplets (same OS and same architecture, just a different triplet name to force installation of packages into a different build directory). My thought was that the build deps would be installed in the host build directory, but only the packages in the target build directory would be bundled into the final app. However, it appears that the design of the build scripts in vcpkg-python-scripts assumes that the host and target triplets are the same.

The first obstacle I ran into is that the packages in this repo are inconsistent about whether build dependencies are marked for host installation. So I went through all the packages and made sure dependencies like setuptools, py-meson, etc. were all marked as host dependencies. It also seems like vcpkg-python-scripts should be included as a host dependency, too, though it only depends on python3, so it probably doesn't affect the final QGIS bundle; for good measure, I marked vcpkg-python-scripts as a host dependency as well.

The next problem I ran into is that the various build scripts included in vcpkg-python-scripts are hardcoded to run in the VCPKG_INSTALLED_DIR environment, but after my changes, the build dependencies are installed in the VCPKG_HOST_INSTALLED_DIR environment. So the scripts can't find them and the build fails.
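In manifest terms, the host marking is a small change. Here is what it looks like for a hypothetical port's vcpkg.json (the port names are illustrative; only the `"host": true` fields are the point):

```json
{
  "name": "py-example",
  "version": "1.0",
  "dependencies": [
    { "name": "py-setuptools", "host": true },
    { "name": "vcpkg-python-scripts", "host": true },
    "python3"
  ]
}
```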

Fixing this would therefore require changes to the core python build and install scripts, and there are several ways to do it. Rather than pick one and implement it, I thought it might be more productive to raise the issue for discussion and see if there is interest in this change. Here are the possibilities I see as the most likely:

Easiest would be to run the scripts in the VCPKG_HOST_INSTALLED_DIR environment and implicitly require all build dependencies to be installed there. That should solve the problem but would retain the inflexibility of the current approach.

The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.
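A minimal sketch of what option two's detection could look like inside the helper scripts. `CURRENT_INSTALLED_DIR` and `CURRENT_HOST_INSTALLED_DIR` are vcpkg's own variables; the other variable names here are hypothetical:

```cmake
# If this helper file was installed by a host-triplet build, its own path
# lies under the host-installed tree; otherwise it is in the target tree.
cmake_path(IS_PREFIX CURRENT_HOST_INSTALLED_DIR "${CMAKE_CURRENT_LIST_DIR}"
           NORMALIZE z_python_scripts_in_host_tree)
if(z_python_scripts_in_host_tree)
    set(z_python_env_dir "${CURRENT_HOST_INSTALLED_DIR}")
else()
    set(z_python_env_dir "${CURRENT_INSTALLED_DIR}")
endif()
```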

And the third idea I had would be to use vcpkg's x_vcpkg_get_python_packages utility to create a virtual environment for building python packages rather than using the application's build environment. The virtual environment could be created with the most common and/or required build components such as gpep517, setuptools, wheel, flit-core, etc., and others could be available as features -- e.g. meson, hatchling. And if a package has unusual build requirements, the path to the build environment would be available in a CMake variable and could be used to install the additional dependencies using pip. Or we could create a utility function to do that more easily. Since the build deps would all be installed in a virtual environment, they wouldn't muck up the application's python environment, which will then have only the packages actually needed at run time.
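For reference, a hedged sketch of how the upstream helper could be invoked, assuming the signature documented for the vcpkg-get-python-packages port; the tool list and the feature mapping in the comment are my own illustration:

```cmake
x_vcpkg_get_python_packages(
    PYTHON_VERSION "3"
    PYTHON_EXECUTABLE "${PYTHON3}"
    PACKAGES gpep517 setuptools wheel flit-core
    OUT_PYTHON_VAR PYTHON3
)
# PYTHON3 now points at an interpreter inside a venv seeded with the
# requested build tools; extras such as meson or hatchling could be
# appended to PACKAGES when the corresponding feature is enabled.
```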

Related to these issues, I also noticed that py-gpep517 is currently being included inconsistently. It seems that most of the packages in the repo don't depend on it directly, but some of the other build packages (e.g. py-packaging) do depend on it, even though it isn't actually a build requirement for them. Notably, vcpkg-python-scripts doesn't depend on py-gpep517 even though it uses the gpep517 module to build and install python packages. The result is that packages that use the scripts but don't depend directly or indirectly on py-gpep517 will run into build-time errors. I think the solution is probably for vcpkg-python-scripts to depend on py-gpep517, since it does in fact need it. However, if we were to redesign the python build scripts to use a virtual environment, that would obviate this issue. But for now, this PR removes the py-gpep517 dependency from packages that don't require it and adds a dependency on py-gpep517 to the vcpkg-python-scripts port.
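The dependency move amounts to a one-line addition in the tooling port's own manifest (abridged; fields other than the new py-gpep517 entry are illustrative):

```json
{
  "name": "vcpkg-python-scripts",
  "dependencies": [
    "py-gpep517",
    "python3"
  ]
}
```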

@m-kuhn I'm eager to hear what you think.

@couteau changed the title Redesign python build dependencies - NOT READY TO MERGE Redesign python build dependencies - For discussion - NOT READY TO MERGE Apr 6, 2026
@couteau marked this pull request as draft April 6, 2026 16:39
@couteau changed the title Redesign python build dependencies - For discussion - NOT READY TO MERGE Redesign python build dependencies - For discussion Apr 6, 2026
@m-kuhn
Contributor

m-kuhn commented Apr 8, 2026

Very good points here. Indeed the separation between host and target is not consistent at the moment, and a few dependencies can clearly be moved to host dependencies.

For QGIS we have always used the same host and target triplets, as this reduces build times considerably (e.g. qt will be built twice otherwise). This is not ideal but a pragmatic choice for faster development roundtrips and limited CI resources.

> Easiest would be to run the scripts in the VCPKG_HOST_INSTALLED_DIR environment and implicitly require all build dependencies to be installed there. That should solve the problem but would retain the inflexibility of the current approach.

This would be the simplest path forward and my preference if we don't want to open up a big change.

> The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

> And the third idea I had would be to use vcpkg's x_vcpkg_get_python_packages utility to create a virtual environment for building python packages rather than using the application's build environment. The virtual environment could be created with the most common and/or required build components such as gpep517, setuptools, wheel, flit-core, etc., and others could be available as features -- e.g. meson, hatchling. And if a package has unusual build requirements, the path to the build environment would be available in a CMake variable and could be used to install the additional dependencies using pip. Or we could create a utility function to do that more easily. Since the build deps would all be installed in a virtual environment, they wouldn't muck up the application's python environment, which will then have only the packages actually needed at run time.

This is something that I find very interesting but never pursued. It would also have the advantage of always using one python version (system python) during the build process (I am pretty sure at the moment we still fall back to using system python in some code paths while we use vcpkg python for most others). The main uncertainty I had here is whether this will create a new venv for every package we build. IIRC it does, and the package cache doesn't work in this scenario, which adds network traffic, wait times, and an increased potential for instability due to network errors. Another thing to look at would be version pinning and SHA checks for the downloaded packages.

As for the specific proposal in this PR (moving py-gpep517 to a vcpkg-python-scripts dependency), this looks good to me, but as you mentioned it only makes sense if we don't generally switch to `x_vcpkg_get_python_packages`.

@couteau
Contributor Author

couteau commented Apr 13, 2026

@m-kuhn I just pushed a few commits that start to sketch out what I had in mind with the third option of using a virtual environment for building python packages. It is still a work in progress, but it works to build most packages on macOS. I haven't tested Windows yet. It currently uses the vcpkg-installed python, not the system python. Using the system python could be made an option, but I worry that if packages are built against a different version of the python binaries than the one bundled with QGIS, it could create problems.

> The main uncertainty I had here is whether this will create a new venv for every package we build.

The approach here creates a single virtual environment in the CURRENT_HOST_INSTALLED_DIR/tools directory and uses that to build all the packages; there is no separate environment for each package.
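As a rough model of that layout, the equivalent of what the port sets up can be sketched in plain Python. The directory name below is a stand-in for CURRENT_HOST_INSTALLED_DIR/tools, and the pip step is shown but commented out because it needs network access:

```python
import sys
import venv
from pathlib import Path

# Stand-in for CURRENT_HOST_INSTALLED_DIR/tools; the real path comes from vcpkg.
tools_dir = Path("host-installed/tools")
env_dir = tools_dir / "python-build-env"

# Create the venv once; every python port then reuses this one environment.
# (with_pip=True would bootstrap pip for the install step sketched below.)
venv.EnvBuilder(with_pip=False, clear=True).create(env_dir)

# The venv's interpreter is what the build scripts would invoke via gpep517.
bindir = "Scripts" if sys.platform == "win32" else "bin"
venv_python = env_dir / bindir / ("python.exe" if sys.platform == "win32" else "python")

# Seeding the shared environment with the common build tools would look like:
# subprocess.run([str(venv_python), "-m", "pip", "install",
#                 "gpep517", "setuptools", "wheel", "flit-core"], check=True)
```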

> Another thing to look at would be version pinning and SHA checks for the downloaded packages.

Yes, this could become an issue if packages being built need different or conflicting versions of the same build tool. The current approach doesn't account for that. For the tools installed through features of vcpkg-python-scripts, I currently just install the latest version available from PyPI, which so far has not caused any issues, though I don't yet have every package building successfully.

I see what you mean about PyQt needing to build qtbase for both host and target if they differ -- I had the same issue with py-qscintilla needing qscintilla in both host and target environments. I so far have not run into that for other packages. There are a few python build requirements that are needed at both build-time and runtime (e.g., numpy, pybind11, and PyQt6-sip), but using a virtual env allows those build requirements to be satisfied using pip so they don't take a lot of time or consume a lot of CPU resources (though they do, of course, require some network bandwidth to install). And you are correct that the binary cache doesn't help with this issue -- vcpkg seems to cache packages built against different triplets separately, even if they are identical. But it only builds Qt/qscintilla twice (once for the host environment, once for the target environment), not once for every package that depends on them.

> The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

> I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

Right, it shouldn't be necessary. But there are currently a number of ports that don't make vcpkg-python-scripts a host dependency. That's probably an error, and if so, then we can fix those, and the scripts can then safely assume they are always installed in the host environment.

@m-kuhn
Contributor

m-kuhn commented Apr 15, 2026

> @m-kuhn I just pushed a few commits that start to sketch out what I had in mind with the third option of using a virtual environment for building python packages. It is still a work in progress, but it works to build most packages on macOS. I haven't tested Windows yet. It currently uses the vcpkg-installed python, not the system python. Using the system python could be made an option, but I worry that if packages are built against a different version of the python binaries than the one bundled with QGIS, it could create problems.

I think this should not be a problem; in general, build tools and runtime tools tend to be fairly well separated. But using the vcpkg-installed python version certainly helps standardize the tooling.

> Another thing to look at would be version pinning and SHA checks for the downloaded packages.

> Yes, this could become an issue if packages being built need different or conflicting versions of the same build tool. The current approach doesn't account for that. For the tools installed through features of vcpkg-python-scripts, I currently just install the latest version available from PyPI, which so far has not caused any issues, though I don't yet have every package building successfully.

It's also about a proper audit trail (e.g. supply chain attacks). At the moment we know exactly what has been used to produce a specific build. If we just use "latest", that will no longer be the case. We could possibly use a requirements.txt with --require-hashes, but I don't have any experience with that.
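A hedged sketch of how that could look with pip-tools; the digest shown is a placeholder, not a real hash:

```sh
# Pin the build tools and record hashes for every pinned distribution:
pip-compile --generate-hashes --output-file=requirements.txt requirements.in

# requirements.txt then contains entries like:
#   setuptools==80.9.0 --hash=sha256:<digest-from-pypi>

# pip refuses anything unpinned or with a mismatched hash:
pip install --require-hashes -r requirements.txt
```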

> I see what you mean about PyQt needing to build qtbase for both host and target if they differ -- I had the same issue with py-qscintilla needing qscintilla in both host and target environments. I so far have not run into that for other packages. There are a few python build requirements that are needed at both build-time and runtime (e.g., numpy, pybind11, and PyQt6-sip), but using a virtual env allows those build requirements to be satisfied using pip so they don't take a lot of time or consume a lot of CPU resources (though they do, of course, require some network bandwidth to install). And you are correct that the binary cache doesn't help with this issue -- vcpkg seems to cache packages built against different triplets separately, even if they are identical. But it only builds Qt/qscintilla twice (once for the host environment, once for the target environment), not once for every package that depends on them.

Flagging things properly as host dependencies is a good move in any case, even if we don't make use of it in some build environments.
IIRC qtdeclarative was also a host dependency for some tools and takes a rather long time to build.

> The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

> I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

> Right, it shouldn't be necessary. But there are currently a number of ports that don't make vcpkg-python-scripts a host dependency. That's probably an error, and if so, then we can fix those, and the scripts can then safely assume they are always installed in the host environment.

Perfect.
