Skip to content

Wayland: resolve session lock client crashes on sleep, unlock, and output changes#829

Open
Luke458 wants to merge 1 commit into
quickshell-mirror:masterfrom
Luke458:fix-session-lock-crash
Open

Wayland: resolve session lock client crashes on sleep, unlock, and output changes#829
Luke458 wants to merge 1 commit into
quickshell-mirror:masterfrom
Luke458:fix-session-lock-crash

Conversation

@Luke458
Copy link
Copy Markdown

@Luke458 Luke458 commented May 28, 2026

Under the ext-session-lock-v1 protocol, calling unlock_and_destroy immediately invalidates and destroys all associated lock surface objects on the compositor.

In WlSessionLock::unlock(), the manager was unlocked before deleting the surfaces. Because surface->deleteLater() is asynchronous, the destructors ran in the next event loop tick and sent ext_session_lock_surface_v1.destroy requests to the compositor for object IDs that the compositor had already discarded, leading to a fatal invalid object Wayland protocol error and session crash.

Similarly, when screens are removed, deleting them asynchronously can trigger the same race condition.

This PR resolves the issue by synchronously deleting the surfaces first (delete surface;) before unlocking the session, conforming to the correct destruction order.

Fixes #540
Fixes #786
Fixes #733
Fixes #727
Fixes #608
Fixes #571
Fixes #406
Fixes #366
Fixes #368
Fixes #370
Fixes #361
Fixes #832

@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 28, 2026

Update: Added a second commit to prevent a crash during sleep/DPMS transitions.

When display outputs are temporarily disconnected or in a transition state, updateSurfaces() is now prevented from creating lock surfaces on placeholder or invalid screens. This avoids triggering the qFatal check in the QSWaylandSessionLockSurface constructor.

@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 28, 2026

upon further testing, this is an improvement but it doesn't fully resolve the wake from suspend/dpms off crashing. It prevents all wayland clients from crashing but quickshell itself will still crash :/

@Luke458 Luke458 changed the title wayland: destroy lock surfaces synchronously before unlocking fix(wayland): resolve session lock client crashes on sleep, unlock, and output changes May 28, 2026
@Luke458 Luke458 changed the title fix(wayland): resolve session lock client crashes on sleep, unlock, and output changes Wayland: resolve session lock client crashes on sleep, unlock, and output changes May 28, 2026
@Luke458 Luke458 marked this pull request as draft May 28, 2026 06:59
@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 28, 2026

Okay final update (I hope), after the second commit there were still 3 issues remaining.

The first issue involved a screen state transition abort caused by a qFatal check. During suspend and wake transitions, displays are briefly disconnected and reconnected, causing Qt to temporarily advertise placeholder screens with null Wayland outputs. Quickshell’s lock surface constructor contained a hardcoded qFatal safety check that immediately aborted the process when these placeholder screens were encountered. This was resolved by filtering out placeholder screens and screens with null outputs in WlSessionLock::updateSurfaces() before initializing new lock surfaces, safely bypassing the abort condition.

The second issue was a Wayland protocol error: wl_display#1: error 0: invalid object XX. In the original implementation, lock surfaces were deleted asynchronously using deleteLater(). When unlocking, or when a monitor was disconnected, the server-side lock object was destroyed first. Later, when the Qt event loop processed the deferred surface deletion, the client sent a destroy request for the surface. Because the compositor had already freed the corresponding server-side surface ID, this resulted in an invalid object protocol violation. The fix involved converting the raw lock pointer in LockWindowExtension to a weak QPointer<QSWaylandSessionLock>, and updating QSWaylandSessionLockSurface::~QSWaylandSessionLockSurface() to check whether the lock was inactive or null. If the lock was no longer valid, the code now cleans up the client-side proxy locally using wl_proxy_destroy() instead of sending a server-bound destroy request.

The third issue was another protocol error: [destroyed object]: error 1: Null buffer attached. The ext-session-lock-v1 protocol strictly forbids committing a null buffer to a lock surface, triggering a null_buffer error if violated. When surfaces were deleted before unlocking, the lock role remained active on the compositor. Because surface cleanup occurred asynchronously, Qt’s normal window-hide routine attempted to clear the screen by committing a null buffer to the associated wl_surface before teardown had fully completed, causing a crash. This was fixed by changing the destruction order in WlSessionLock::unlock() to call manager->unlock() first, allowing the compositor to exit lock mode and remove lock roles from all surfaces before deleting the local surface objects. Additionally, WlSessionLockSurface::~WlSessionLockSurface() was modified to call this->window->destroy() synchronously before scheduling deleteLater(). This immediately releases the platform window resources and tears down the Wayland objects in the correct protocol order, preventing asynchronous paint or hide events from committing null buffers.

@Luke458 Luke458 marked this pull request as ready for review May 28, 2026 08:14
@Luke458 Luke458 marked this pull request as draft May 28, 2026 08:21
@Luke458 Luke458 force-pushed the fix-session-lock-crash branch from 92a0fdd to 1ea83fc Compare May 28, 2026 08:23
@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 28, 2026

FINAL FINAL commits this time to fix the last couple of issues.

Firstly, just fixed the CI failure with qt 6.6.3 as described in the commit.

More importantly, resolved a double-free crash with the error Proxy requested for unref has no references. In QtWayland, the C++ platform wrapper classes automatically call wl_proxy_destroy() on their associated Wayland proxies from their base destructors. The initial implementation added a manual wl_proxy_destroy() call inside QSWaylandSessionLockSurface::~QSWaylandSessionLockSurface(), which caused the proxy to be freed twice, once manually and once again by the base class destructors. This triggered a fatal libwayland-client assertion. The fix involved removing the manual wl_proxy_destroy() call from ~QSWaylandSessionLockSurface(). When shouldDestroyOnServer is false, cleanup is now left entirely to the base destructor. An early return guard, if (this->object() == nullptr) return;, was also added at the start of the destructor. Additionally, LockWindowExtension::destroySurface() was modified to destroy the surface early via surface->destroy() only when the lock is still active. If the lock is inactive during unlocking, it now skips early destruction and allows the window destructor to clean up the proxy exactly once.

@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 28, 2026

After extensive debugging and validation, I've identified why this crash may be highly reproducible on systems running amdgpu/Mesa with Hyprland, and how the final changes resolve it.

The issue is triggered by how quickly the open-source AMD GPU stack handles display power transitions. Unlike some proprietary driver setups(Nvidia), amdgpu with Mesa performs DPMS off, suspend, and wake transitions almost immediately.

Similarly, Hyprland is extremely strict about Wayland object state and the moment unlock_and_destroy is processed, Hyprland immediately invalidates the surface IDs. If the client sends a destroy or commit on those IDs a split-second later, Hyprland raises a fatal protocol error ( invalid object or Null buffer attached ) and kills the client.

This exposes a race condition in Quickshell’s client-side surface lifecycle. If the client calls destroy on a lock surface after Hyprland has already freed it, the Wayland connection terminates with an invalid object error. Conversely, if the client destroys the window while the lock role is still active, Qt’s default hide routine may commit a null buffer, violating the ext-session-lock-v1 protocol and producing a Null buffer attached error.

The final fix resolves this by enforcing a strict teardown order. In WlSessionLock::unlock(), each lock surface is first explicitly destroyed through surface->ext->destroySurface() before any windows are hidden or deleted. Since the lock is still active at this point, this sends ext_session_lock_surface_v1.destroy to the compositor and strips the lock role server-side.

After that, the window is destroyed synchronously with window->destroy(). Any null buffers committed by Qt during this phase are ignored by the compositor because the surface is no longer acting as a lock surface.

The destructor also now begins with an early-return guard, if (this->object() == nullptr) return;. If the proxy was already destroyed while the lock was active, the destructor exits immediately. If the lock was inactive during unlock, QtWayland’s base class is left to clean up the proxy locally exactly once, avoiding the double-free crash reported as Proxy requested for unref has no references.

Finally, unlock_and_destroy is sent to the compositor only after the surfaces and windows have been handled. This matches the required protocol order, avoids desktop exposure flicker, and prevents both the invalid object and Null buffer attached crashes.

@Luke458
Copy link
Copy Markdown
Author

Luke458 commented May 30, 2026

AvengeMedia/DankMaterialShell#1468 (comment)

Many thanks to @bzbetty for testing and proposing this final change ^^

The original reason for switching to synchronous delete was to ensure that the Wayland lock surface role was destroyed before unlock_and_destroy was sent. However, now that destroySurface() explicitly and synchronously handles destruction of the lock surface role before any object deletion occurs, deletion of the WlSessionLockSurface itself becomes purely a local cleanup concern.

WlSessionLockSurface is a QML-exposed QObject with active property bindings (visible, width, height, and screen) as well as signal connections. Deleting it synchronously with delete while the QML engine may still have pending property evaluations or signal handlers referencing the object introduces a risk of use-after-free crashes within the QML engine. Using deleteLater() avoids this by deferring destruction until the next event loop iteration, allowing QML to safely complete any outstanding work before the object is destroyed.

The current unlock sequence is:

  1. ext->destroySurface() — synchronously sends ext_session_lock_surface_v1.destroy to the compositor, removing the lock role.
  2. surface->deleteLater() — schedules cleanup of the QML object and associated window resources for the next event loop iteration.
  3. manager->unlock() — sends unlock_and_destroy to the compositor.

By the time the deferred deletion from step 2 actually executes, the lock role has already been removed in step 1 and the session has already been unlocked in step 3. As a result, any null-buffer commits generated by Qt's normal window-hide or teardown routines are no longer subject to the restrictions of the ext-session-lock-v1 protocol, because the surface is no longer acting as a lock surface. This preserves correct protocol behavior while also maintaining Qt/QML's preferred object-lifetime semantics.

Copy link
Copy Markdown
Member

@outfoxxed outfoxxed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed some reasoning for these. I took a look through qtbase and I'm not seeing the null surface commit on close either.

Comment on lines -142 to +157
this->manager->unlock();

for (auto* surface: this->surfaces) {
surface->deleteLater();
if (surface->ext != nullptr) {
surface->ext->destroySurface();
}
}

for (auto* surface: this->surfaces) {
surface->deleteLater();
}
this->surfaces.clear();

this->manager->unlock();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://wayland.app/protocols/ext-session-lock-v1#ext_session_lock_v1:request:unlock_and_destroy and https://wayland.app/protocols/ext-session-lock-v1#ext_session_lock_surface_v1:request:destroy, it should be entirely legal and even preferable destroy the surfaces after the lock.

After this request is made, lock surfaces created through this object should be destroyed by the client as they will no longer be used by the compositor.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the spec says the client should destroy the surfaces after unlocking, doing so asynchronously in Qt's event loop (via deleteLater()) triggers a race condition.

If we send unlock_and_destroy first, the compositor immediately destroys the server-side lock objects and surfaces. When Qt's event loop subsequently runs the destructors for the window/surfaces, the client tries to send ext_session_lock_surface_v1.destroy. Because the compositor has already torn them down, this request targets a non-existent object ID and triggers a fatal Wayland connection crash (protocol error: invalid object).

Explicitly calling destroySurface() on the extension prior to unlock_and_destroy ensures the client cleanly tears down its Wayland surface role while they are still valid on the server.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compositor immediately destroys the server-side lock objects and surfaces

Because the compositor has already torn them down, this request targets a non-existent object ID and triggers a fatal Wayland connection crash (protocol error: invalid object).

This shouldn't happen for the same reasons mentioned in the surface destructor, only on a double destroy of the same surface.

I don't have a problem with moving destroy here if there is some case which commits a null surface like you mentioned but it should be after the unlock.

Comment on lines -90 to +92
QSWaylandSessionLock* lock = nullptr;
QPointer<QObject> lock = nullptr;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be a pointer to QObject+casting over QsWaylandSessionLock, and what destroy sequence is making the QPointer valuable?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a QPointer because when the session is unlocked, the parent QSWaylandSessionLock is deleted. Since the surfaces themselves are deleted asynchronously (deleteLater()), their destructors run afterwards and need to check if the parent lock is still alive (via this->ext->lock->active()) to avoid a use-after-free segfault on a dangling raw pointer.

Using QPointer<QObject> and casting inside the source files is a workaround to pass compilation under older Qt versions (like 6.6.3 in CI). In those versions, instantiating QPointer<T> on an incomplete, forward-declared type fails compile-time checks. Because QSWaylandSessionLock relies on a generated Wayland header (qwayland-ext-session-lock-v1.h), we cannot include its definition in the public session_lock.hpp header without breaking the include paths of other targets. Using QPointer<QObject> avoids this but please let me know if there's something I'm missing here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the surfaces themselves are deleted asynchronously (deleteLater()), their destructors run afterwards and need to check if the parent lock is still alive (via this->ext->lock->active()) to avoid a use-after-free segfault on a dangling raw pointer.

since we resolved that thread this should no longer be true

Comment thread src/wayland/session_lock/surface.cpp Outdated
Comment thread src/wayland/session_lock/surface.cpp Outdated
@Luke458 Luke458 force-pushed the fix-session-lock-crash branch 2 times, most recently from d92828d to 6247351 Compare June 7, 2026 02:17
…anges

- Destroys Wayland lock surface roles synchronously prior to sending unlock_and_destroy.
- Avoids creating lock surfaces on invalid/placeholder/removed outputs to prevent early initialization failures.
- Fixes Qt 6.6.3 compilation compatibility when QPointer uses incomplete forward-declared types.
- Reverts memory cleanup to asynchronous deleteLater() for QML safety.
@Luke458 Luke458 force-pushed the fix-session-lock-crash branch from 6247351 to 66b5c24 Compare June 7, 2026 03:02
@Luke458
Copy link
Copy Markdown
Author

Luke458 commented Jun 7, 2026

Hi @bzbetty

Sorry to bother you, do you mind rebuilding and testing on your side again?

I've simplified the PR to more strictly align with the wayland protocol spec, but I want to make sure it doesn't reintroduce any real-world crashes or race conditions on your end.

Seems to be working on my end after testing wake from suspend, dpms off/on and monitor unplug/plug while in sleep and while awake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment