Skip to content

Fix delayed hw_error handling due to missing wakeup during SSR and Timer unit error#930

Open
shuaz-shuai wants to merge 2 commits intoqualcomm-linux:tech/net/bluetoothfrom
shuaz-shuai:ssr-33s
Open

Fix delayed hw_error handling due to missing wakeup during SSR and Timer unit error#930
shuaz-shuai wants to merge 2 commits intoqualcomm-linux:tech/net/bluetoothfrom
shuaz-shuai:ssr-33s

Conversation

@shuaz-shuai
Copy link
Copy Markdown

When Bluetooth controller encounters a coredump, it triggers
the Subsystem Restart (SSR) mechanism. The controller first
reports the coredump data, and once the data upload is complete,
it sends a hw_error event. The host relies on this event to
proceed with subsequent recovery actions.

If the host has not finished processing the coredump data
when the hw_error event is received,
it sets a timer to wait until either the data processing is complete
or the timeout expires before handling the event.

The current implementation lacks a wakeup trigger. As a result,
even if the coredump data has already been processed, the host
continues to wait until the timer expires, causing unnecessary
delays in handling the hw_error event.

To fix this issue, adds a clear_and_wake_up_bit() call after the host finishes
processing the coredump data. This ensures that the waiting thread is
promptly notified and can proceed to handle the hw_error event without
waiting for the timeout.

Test case:

  • Trigger controller coredump using the command: hcitool cmd 0x3f 0c 26.
  • Use btmon to capture HCI logs.
  • Observe the time interval between receiving the hw_error event
    and the execution of the power-off sequence in the HCI log.

CRs-Fixes: 4498534

Since the timer uses jiffies as its unit rather than ms, the timeout value
must be converted from ms to jiffies when configuring the timer. Otherwise,
the intended 8s timeout is incorrectly set to approximately 33s.

Cc: stable@vger.kernel.org
Fixes: d841502 ("Bluetooth: hci_qca: Collect controller memory dump during SSR")
Link: https://lore.kernel.org/all/f6a9419d-5e63-4c36-a7e9-aab6ac798703@oss.qualcomm.com/
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
…ring SSR

When Bluetooth controller encounters a coredump, it triggers
the Subsystem Restart (SSR) mechanism. The controller first
reports the coredump data, and once the data upload is complete,
it sends a hw_error event. The host relies on this event to
proceed with subsequent recovery actions.

If the host has not finished processing the coredump data
when the hw_error event is received,
it sets a timer to wait until either the data processing is complete
or the timeout expires before handling the event.

The current implementation lacks a wakeup trigger. As a result,
even if the coredump data has already been processed, the host
continues to wait until the timer expires, causing unnecessary
delays in handling the hw_error event.

To fix this issue, adds a `wake_up_bit()` call after the host finishes
processing the coredump data. This ensures that the waiting thread is
promptly notified and can proceed to handle the hw_error event without
waiting for the timeout.

Test case:
- Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
- Use `btmon` to capture HCI logs.
- Observe the time interval between receiving the hw_error event
and the execution of the power-off sequence in the HCI log.

Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant