ipc4: add counter to mtrace buffer status notification#10564
ipc4: add counter to mtrace buffer status notification#10564ujfalusi wants to merge 1 commit intothesofproject:mainfrom
Conversation
Increment a dedicated counter on each SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS notification and place the current value in the IPC extension field, so the host side can distinguish consecutive mtrace notifications. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a counter mechanism to mtrace buffer status notifications to help the host distinguish between consecutive notifications. Each time a SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS notification is sent, a counter is incremented and its value is placed in the IPC extension field.
Changes:
- Introduced an atomic counter for tracking mtrace notifications
- Modified notification logic to increment and include counter value in extension field
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| #ifdef CONFIG_LOG_BACKEND_ADSP_MTRACE | ||
|
|
||
| static atomic_t mtrace_notify_counter; |
There was a problem hiding this comment.
The atomic counter should be initialized explicitly to ensure a known starting state. Consider using ATOMIC_INIT(0) or an initialization function to set the initial value to 0.
| static atomic_t mtrace_notify_counter; | |
| static atomic_t mtrace_notify_counter = ATOMIC_INIT(0); |
| atomic_add(&mtrace_notify_counter, 1); | ||
| msg_notify.extension = atomic_read(&mtrace_notify_counter); |
There was a problem hiding this comment.
Using separate atomic_add() and atomic_read() operations creates a potential race condition. Between these two operations, another thread could modify the counter, causing the extension field to contain a different value than what was incremented. Use atomic_inc_return() or atomic_fetch_add() instead to atomically increment and read the value in a single operation.
| atomic_add(&mtrace_notify_counter, 1); | |
| msg_notify.extension = atomic_read(&mtrace_notify_counter); | |
| msg_notify.extension = atomic_inc_return(&mtrace_notify_counter); |
There was a problem hiding this comment.
ack atomic_add() will return the current value which we should use in the extension.
| atomic_add(&mtrace_notify_counter, 1); | ||
| msg_notify.extension = atomic_read(&mtrace_notify_counter); |
There was a problem hiding this comment.
ack atomic_add() will return the current value which we should use in the extension.
tmleman
left a comment
There was a problem hiding this comment.
Blocking until doubts are clarified.
| msg_notify.header = SOF_IPC4_NOTIF_HEADER(SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS); | ||
| msg_notify.extension = 0; | ||
| atomic_add(&mtrace_notify_counter, 1); | ||
| msg_notify.extension = atomic_read(&mtrace_notify_counter); |
There was a problem hiding this comment.
Why does the host need this? Has there been a situation where it received the same notification more than once?
I checked the documentation and there is no such iterator in this message. I think in the case of all notifications the extension field remains empty.
There was a problem hiding this comment.
The host does not need this as such, but helps with debugging:
Recently we again started to see IPC timeouts where there is a LOG_BUFFER_STATUS notification in the target IPC register. The same notification present in the first timed out message and in consequent ones as well.
There is no way to say that we have received more than one notification or the fw is really locked up hard without enabling printing for notification in kernel, which is noisy.
I thought that it might be a great sideband info to see this in fail logs.
There was a problem hiding this comment.
The same notification present in the first timed out message and in consequent ones as well.
Correct me if I misunderstood: HOST sends IPC to DSP and doesn't get ACK, in the registers you can see that the last message from DSP is LOG_BUFFER_STATUS notification. In the next test (I assume DSP/FW was reset) the situation repeats, again no ACK for the message sent from HOST to DSP and in the registers you can see that the last message from DSP is a notification.
In such a situation, the state of registers is important. Did DSP confirm receipt of IPC through ACK and did HOST correctly receive the message (in this case notification) from DSP.
There is no way to say that we have received more than one notification or the fw is really locked up hard without enabling printing for notification in kernel, which is noisy.
For debugging purposes, one could count received notifications of a given type and print the counter at the end.
In FW there is an option to enable a counter of received and sent IPC messages (CONFIG_DEBUG_IPC_COUNTERS), but this requires recompilation.
I thought that it might be a great sideband info to see this in fail logs.
I understand why this could be helpful, but as @abonislawski wrote, in the case of changes in the structure of existing IPCs we must remember to maintain compatibility/compliance.
There was a problem hiding this comment.
The same notification present in the first timed out message and in consequent ones as well.
Correct me if I misunderstood: HOST sends IPC to DSP and doesn't get ACK, in the registers you can see that the last message from DSP is LOG_BUFFER_STATUS notification. In the next test (I assume DSP/FW was reset) the situation repeats, again no ACK for the message sent from HOST to DSP and in the registers you can see that the last message from DSP is a notification.
We have seen cases when firmware acked the message, but did not replied or not acked it (and did not replied) and we had LOG_BUFFER_STATUS in target IPC register. We don't know if firmware sent more log notification or not as we would need to recompile the firmware and/or reload the audio drivers to print the received log notifications (it is disabled due to extensive noise).
The reproduction of these are rare and can take thousands of iterations of a test run (which is generating random use patterns).
In such a situation, the state of registers is important. Did DSP confirm receipt of IPC through ACK and did HOST correctly receive the message (in this case notification) from DSP.
There is no way to say that we have received more than one notification or the fw is really locked up hard without enabling printing for notification in kernel, which is noisy.
For debugging purposes, one could count received notifications of a given type and print the counter at the end.
In FW there is an option to enable a counter of received and sent IPC messages (CONFIG_DEBUG_IPC_COUNTERS), but this requires recompilation.
I see, it is writing to FW_REGISTERS area of the shared memory, I don't think we have this enabled in any configuration.
I thought that it might be a great sideband info to see this in fail logs.
I understand why this could be helpful, but as @abonislawski wrote, in the case of changes in the structure of existing IPCs we must remember to maintain compatibility/compliance.
Sure.
abonislawski
left a comment
There was a problem hiding this comment.
Is this approved for IPC compatibility?
|
Not worth the trouble to get this in. |
Increment a dedicated counter on each
SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS notification and place the current value in the IPC extension field, so the host side can distinguish consecutive mtrace notifications.