Skip to content

Conversation

@dorindabassey
Copy link
Collaborator

@dorindabassey dorindabassey commented Nov 18, 2025

Summary of the PR

Add a new --gpu-path CLI argument that allows users to specify which GPU device to use for rendering when using the virglrenderer backend.
This is useful for systems with multiple GPUs where a specific device needs to be selected.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@dorindabassey dorindabassey force-pushed the gpuflag branch 2 times, most recently from c557101 to 91bf7f3 Compare November 19, 2025 11:50
@dorindabassey dorindabassey force-pushed the gpuflag branch 2 times, most recently from 0dd8175 to ec80528 Compare November 19, 2025 13:47
Comment on lines +167 to +173
// Open the GPU device if a path was specified
// Note: The path is validated early in start_backend(), so this should not fail
let render_server_fd = config
.flags()
.gpu_path
.as_ref()
.map(|path| File::open(path).map(|f| f.into()).unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't see the point. Now we can propagate the error, so what we are gaining to check this in advance?

Also this is racy, because the file can disappear between the 2 opens and so here it can crash...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put that validation early on because without it, when File::open() fails in VirglRendererAdapter::new(), the error is propagated up through the event loop but the daemon just hangs waiting for events, silently ignoring that the initialization failed. So now we have a clear error message that the user can see.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is racy, because the file can disappear between the 2 opens and so here it can crash...

About this, it's strange that a GPU device node like /dev/dri/renderD128 would disappear, they're not regular files. Also what are the chances that someone unloads the GPU driver between both validations? the time between both operations is like nanoseconds.

Copy link
Member

@stefano-garzarella stefano-garzarella Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is racy, because the file can disappear between the 2 opens and so here it can crash...

About this, it's strange that a GPU device node like /dev/dri/renderD128 would disappear, they're not regular files. Also what are the chances that someone unloads the GPU driver between both validations? the time between both operations is like nanoseconds.

Looking better in the code after your concern, IIUC there is a lazy initialization of the backend with lazy_init_and_handle_event (that was the missing point, at least for me). So it's really not nanoseconds but depends on when the guest will send the first request IIUC. So between starting the vhost-user device, launching the VM, and having the guest driver loaded is more than several seconds and everything can happen (file permission changed, driver unloaded, etc).

In general, hoping that everything will be fine because we have "some time" is a red flag IMO. So again, I'm sorry, but opening a file/device or anything else twice, just to check if the path is accessible and hoping that between the 2 syscall nothing will happen, makes no sense to me; there are a thousand ways to attack this thing, and it is definitely not good programming practice.

That said, can we open the file while creating GpuConfig or VhostUserGpuBackend and pass the file (not the path) to VirglRendererAdapter::new() in some way ?

Change all GPU adapter new() methods to return
io::Result<Self>. This allows callers to decide
how to handle initialization failures.

Signed-off-by: Dorinda Bassey <[email protected]>
Add a new --gpu-path CLI argument that allows users
to specify which GPU device to use for rendering
when using the virglrenderer backend.
This is useful for systems with multiple GPUs where
a specific device needs to be selected.

Signed-off-by: Dorinda Bassey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants