Expand Up @@ -309,3 +309,388 @@
- Lower
- High, primarily due to the overhead of **CPU Hotplug** for non-boot CPUs

**********************************************
Low Power Mode Selection in S2Idle (OSI Mode)
**********************************************

S2Idle with OSI mode enables sophisticated low-power mode selection based on system constraints,
power domain hierarchy, and predicted idle duration. The system can automatically select between
multiple low-power modes without user intervention, adapting to the runtime requirements.

Power Domain Hierarchy in Device Tree
======================================

The power domain hierarchy in the device tree defines how different system components are grouped
and how their power states are coordinated. This hierarchical structure is fundamental to OSI mode's
"Last Man Standing" logic.

**Hierarchical Structure:**

.. code-block:: text

   MAIN_PD (System Level)
   ├──> CLUSTER_PD (Cluster Level)
   │    │
   │    ├──> CPU_PD (CPU Level)
   │    │    ├──> CPU0
   │    │    └──> CPU1
   │    │
   │    └──> Cluster-sensitive peripherals
   │         ├──> CPSW3G (Ethernet)
   │         └──> DSS0 (Display)
   └──> Main domain peripherals
        ├──> UART, I2C, SPI controllers
        ├──> Timers
        ├──> SDHCI controllers
        └──> USB controllers

**Device Tree Implementation:**

In the Device Tree, this hierarchy is established through power domain mappings:

.. code-block:: dts

   &psci {
       CPU_PD: power-controller-cpu {
           #power-domain-cells = <0>;
           power-domains = <&CLUSTER_PD>;
           domain-idle-states = <&cpu_sleep_0>, <&cpu_sleep_1>;
       };

       CLUSTER_PD: power-controller-cluster {
           #power-domain-cells = <0>;
           domain-idle-states = <&cluster_sleep_0>;
           power-domains = <&MAIN_PD>;
       };

       MAIN_PD: power-controller-main {
           #power-domain-cells = <0>;
           domain-idle-states = <&main_sleep_deep>, <&main_sleep_rtcddr>;
       };
   };

**Why Domain Grouping is Needed:**

The domain grouping serves several critical purposes:

1. **Hardware Dependency Management**: Certain peripherals must remain active for specific low-power
states. For example, DDR controllers must remain operational in RTC+DDR mode, but can be powered
down in Deep Sleep mode.

2. **Constraint Propagation**: When a device in the CLUSTER_PD is active (e.g., Display Subsystem),
the cluster cannot enter its deepest idle state. The constraint propagates up the hierarchy,
preventing both CLUSTER_PD and MAIN_PD from entering deeper states.

Review comment: Is the above point correct? With ``rtcwake``, the system enters the selected LPM even when the Display is connected.

3. **Automatic Mode Selection**: The cpuidle framework uses the hierarchy to automatically select
the deepest possible state. If any device in a power domain is active or has latency constraints,
shallower states are automatically chosen.

Review comment: Similarly, on forced s2idle the framework obeys the latency constraint for mode selection but not the power domain mapping.

4. **Race Condition Prevention**: The hierarchy ensures that the PSCI firmware can verify all
components in a domain are truly idle before powering down that domain.

**Peripheral Power Domain Mapping:**

The ``power-domain-map`` property explicitly assigns peripherals to power domains:

.. code-block:: dts

   &scmi_pds {
       power-domain-map = <3 &CLUSTER_PD>,   /* CPSW3G Ethernet */
                          <39 &CLUSTER_PD>,  /* DSS0 Display */
                          <38 &CLUSTER_PD>,  /* DSS_DSI0 */
                          <15 &MAIN_PD>,     /* TIMER0 */
                          <26 &MAIN_PD>,     /* SDHCI1 */
                          <89 &MAIN_PD>,     /* UART0 */
                          <95 &MAIN_PD>;     /* USBSS0 */
   };

This mapping ensures that when the Display (DSS0) is active, the system won't enter states that
would cause DDR Auto Self-Refresh issues. Similarly, active UART or USB connections prevent
deeper system states that would disconnect those interfaces.

Role in Mode Selection
=======================

During s2idle entry, the cpuidle framework traverses the power domain hierarchy from bottom to top:

.. code-block:: text

   Mode Selection Flow during S2Idle Entry
   ========================================

   1. Freeze user space tasks
   2. Suspend all devices (call runtime_suspend hooks)
   3. For each CPU (in cpuidle framework):

      CPU Level (CPU_PD):
      ├─> Check QoS latency constraints
      ├─> Check device activity in CPU_PD
      └─> Select CPU idle state: cpu_sleep_0 (Standby) or cpu_sleep_1 (PowerDown)

      Cluster Level (CLUSTER_PD):
      ├─> Check if this is the last CPU in cluster
      ├─> Check device activity in CLUSTER_PD (e.g., Display, Ethernet)
      ├─> If last CPU and no constraints:
      │   └─> Select cluster idle state: cluster_sleep_0
      └─> Else: Skip cluster power-down

      System Level (MAIN_PD):
      ├─> Check if last CPU in system
      ├─> Check device activity in MAIN_PD (e.g., UART, USB, Timers)
      ├─> Check QoS constraints for entire system
      ├─> Compare latency requirements to available states:
      │   ├─> main_sleep_rtcddr (exit latency: 600ms)
      │   └─> main_sleep_deep (exit latency: 10ms)
      └─> Select deepest state that meets all constraints

   4. Last CPU issues composite CPU_SUSPEND with selected state
   5. PSCI firmware verifies and executes power-down
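
The bottom-to-top walk in the flow above can be sketched in C. This is a minimal illustration under stated assumptions, not the kernel's genpd or cpuidle code: ``struct pd_node`` and ``deepest_idle_domain()`` are hypothetical names, device activity is reduced to a counter, and QoS checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one power domain in the hierarchy. */
struct pd_node {
    const char *name;
    struct pd_node *parent;   /* NULL for the top-level domain */
    int active_devices;       /* devices that block power-down */
    int idle_children;        /* children (CPUs or subdomains) already idle */
    int total_children;
};

/*
 * Record that one child of `pd` went idle and walk up the hierarchy.
 * Returns the highest domain that may power down, or NULL if the
 * caller was not the last idle child of its own domain.
 */
static const struct pd_node *deepest_idle_domain(struct pd_node *pd)
{
    const struct pd_node *deepest = NULL;

    assert(pd != NULL);
    while (pd != NULL) {
        pd->idle_children++;
        /* Stop if another child is still running or a device is active. */
        if (pd->idle_children < pd->total_children || pd->active_devices > 0)
            break;
        deepest = pd;
        pd = pd->parent;
    }
    return deepest;
}
```

With ``CPU_PD`` holding two CPUs, the first call returns ``NULL`` and the second (the last CPU) returns the highest domain with no active devices. An active display counted in ``CLUSTER_PD`` stops the walk at ``CPU_PD``.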

Idle State Definitions
=======================

The Device Tree defines multiple idle states at each level of the hierarchy, each with different
power/latency trade-offs. The key states are:

**CPU-Level Idle States:**

* **cpu_sleep_1 (PowerDown)**: CPU is powered down with context loss

  * ``arm,psci-suspend-param = <0x012233>``
  * Exit latency: ~100ms

**Cluster-Level Idle States:**

* **cluster_sleep_0 (Low-Latency Standby)**: Cluster enters low-power standby when all CPUs are idle

  * ``arm,psci-suspend-param = <0x01000021>``
  * Exit latency: ~300μs

**System-Level Idle States (Main Domain):**

* **main_sleep_deep (Deep Sleep)**: DDR in self-refresh, more peripherals remain powered for faster resume

  * ``arm,psci-suspend-param = <0x2012235>``
  * Exit latency: 10ms
  * Use case: Short to moderate idle periods with faster resume requirements

* **main_sleep_rtcddr (RTC+DDR)**: DDR in self-refresh, minimal peripherals powered (RTC, I/O retention only)

  * ``arm,psci-suspend-param = <0x2012234>``
  * Exit latency: 600ms
  * Use case: Long idle periods requiring maximum power savings

.. note::
   For complete Device Tree definitions including all latency parameters, refer to the platform's
   device tree source files (e.g., ``k3-am62l-main.dtsi``).

Understanding the Suspend Parameters
=====================================

The ``arm,psci-suspend-param`` values encode the target power state using the PSCI standard format
described earlier. Let's decode the key parameters for the main domain states:

**Deep Sleep Mode (main_sleep_deep):**

Parameter: ``0x2012235``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0101
   Hex:    0x02012235

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2235 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: The entire system, including the SoC, enters a low-power state
- **State Type = 1 (Power Down)**: Context is lost; firmware must restore state on resume
- **State ID = 0x2235**: Platform-specific identifier that the PSCI firmware (TF-A) recognizes
  as "Deep Sleep" mode where DDR is in Self-Refresh and more peripherals in the Main domain
  remain powered compared to RTC+DDR mode, providing faster resume at the cost of higher power

**RTC+DDR Mode (main_sleep_rtcddr):**

Parameter: ``0x2012234``

.. code-block:: text

   Binary: 0000 0010 0000 0001 0010 0010 0011 0100
   Hex:    0x02012234

   [31:26] = 0      → Reserved
   [25:24] = 2      → Power Level = System (0x2)
   [23:17] = 0      → Reserved
   [16]    = 1      → State Type = Power Down
   [15:0]  = 0x2234 → State ID (platform-specific)

**Interpretation:**

- **Power Level = 2 (System)**: System-level power state
- **State Type = 1 (Power Down)**: Power-down with context loss
- **State ID = 0x2234**: Platform-specific identifier for "RTC+DDR" mode where DDR is in
  Self-Refresh and only minimal peripherals (RTC, I/O retention) remain powered in the Main
  domain, providing maximum power savings at the cost of longer resume latency
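
The same bit-field decoding can be expressed as a short C helper. This is a sketch that assumes the field layout shown in the decode tables above; ``struct psci_power_state`` and ``psci_decode()`` are illustrative names, not part of any kernel or TF-A API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of an arm,psci-suspend-param value (layout as tabulated above). */
struct psci_power_state {
    unsigned int power_level;  /* bits [25:24]: 0 = CPU, 1 = cluster, 2 = system */
    unsigned int state_type;   /* bit  [16]:    0 = standby, 1 = power down */
    unsigned int state_id;     /* bits [15:0]:  platform-specific identifier */
};

/* Split a 32-bit suspend parameter into its fields. */
static void psci_decode(uint32_t param, struct psci_power_state *out)
{
    assert(out != NULL);
    out->power_level = (param >> 24) & 0x3u;
    out->state_type  = (param >> 16) & 0x1u;
    out->state_id    = param & 0xFFFFu;
}
```

Decoding ``0x2012235`` yields power level 2, state type 1, and state ID ``0x2235``. Decoding ``0x2012234`` differs only in the state ID (``0x2234``).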

The cpuidle governor uses these latency and residency values to automatically select the appropriate
mode. If predicted idle time is short and latency constraints are tight, Deep Sleep mode (the
shallower state) is chosen for faster resume. For longer predicted idle periods with relaxed
latency requirements, RTC+DDR mode (the deeper state) is preferred for maximum power savings.

QoS Latency Constraints and Mode Selection
===========================================

The Linux kernel's PM QoS (Quality of Service) framework allows drivers and applications to
specify maximum acceptable wakeup latency. These constraints directly influence which idle
state can be entered during s2idle.

**How QoS Constraints Work:**

1. Each device or CPU can register a latency constraint (in microseconds)
2. The cpuidle governor queries these constraints before selecting an idle state
3. Only idle states with ``exit-latency-us`` ≤ constraint are considered

Review comment: The governor compares the constraint with ``exit-latency-us`` + ``entry-latency-us``. This could cause confusion when setting the constraints.

4. The deepest eligible state is selected
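
The filtering in steps 3 and 4 can be sketched as follows. This is an illustration only, not the cpuidle governor's code: ``struct idle_state`` and ``select_deepest()`` are hypothetical names, and the comparison uses exit latency alone, as step 3 states. A review comment on this list suggests the governor compares against entry plus exit latency, so treat the comparison itself as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one idle state. */
struct idle_state {
    const char *name;
    unsigned int exit_latency_us;
};

/*
 * Return the deepest state whose exit latency fits within limit_us,
 * or NULL if none does. `states` must be ordered shallowest first.
 */
static const struct idle_state *
select_deepest(const struct idle_state *states, size_t count,
               unsigned int limit_us)
{
    const struct idle_state *best = NULL;

    assert(states != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (states[i].exit_latency_us <= limit_us)
            best = &states[i];
    }
    return best;
}
```

With the two main-domain states from this section (10,000 μs and 600,000 μs), a 20,000 μs limit selects ``main_sleep_deep``, while a limit of 600,000 μs or more selects ``main_sleep_rtcddr``.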

**Setting QoS Constraints from User Space:**

Review comment (@ti-scaria, Mar 20, 2026): The constraint could also be set with the ``exec`` approach that Kevin mentioned: ``exec 4<>/dev/cpu_wakeup_latency; echo 0x3e8 >&4``

Applications can constrain the system's low-power behavior by writing to the PM QoS device file.
Below is a C program that demonstrates this:

.. code-block:: c

   /* testqos.c - Set CPU wakeup latency constraint */
   #include <stdio.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <signal.h>

   #define QOS_DEV "/dev/cpu_wakeup_latency"
   /* 4096 us (about 4 ms) as a hex string.
    * Review comment: the latency value is in microseconds. */
   #define LATENCY_VAL "0x1000"

   /* Cleared by the signal handler to end the wait loop. */
   static volatile sig_atomic_t keep_running = 1;

   static void sig_handler(int sig)
   {
       (void)sig;
       keep_running = 0;
   }

   int main(void)
   {
       int fd;

       signal(SIGINT, sig_handler);
       signal(SIGTERM, sig_handler);

       fd = open(QOS_DEV, O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Write the hex string without its terminating NUL. */
       if (write(fd, LATENCY_VAL, sizeof(LATENCY_VAL) - 1) < 0) {
           perror("write");
           close(fd);
           return 1;
       }

       printf("QoS set to %s. Press Ctrl+C to exit.\n", LATENCY_VAL);

       /* The constraint stays in effect only while fd is open. */
       while (keep_running)
           sleep(1);

       close(fd); /* Closing the descriptor releases the constraint. */
       printf("Released.\n");
       return 0;
   }

**Why This Program is Needed:**

This program demonstrates how to control low-power mode selection by setting QoS latency constraints.
By applying a tight latency constraint (4096 μs, about 4 ms, in the example), you can force the system
to stay in shallow idle states, preventing entry into Deep Sleep or RTC+DDR modes. This is useful for
testing that the cpuidle governor respects QoS constraints and selects the appropriate idle state
based on latency requirements.

**Selecting Specific Low-Power Modes:**

To force selection of a specific mode, set the QoS constraint strategically based on the exit
latencies of the available states. The latency value must be provided as a **hex string**
(e.g., "0x7ef41").

* **To force Deep Sleep mode**: Set the constraint above Deep Sleep's exit latency (10 ms = 10,000 μs)
  but below RTC+DDR's exit latency (600 ms = 600,000 μs). For example, use **520 ms (520,001 μs)**:

Review comment: As mentioned above, the constraint is compared with exit + entry latency.


.. code-block:: c

   #define LATENCY_VAL "0x7ef41" /* 520,001 us (about 520 ms) in hex */

**Calculation:**

- Target latency: 520 ms = 520,000 μs (the example uses 520,001)

Review comment: The values here should be 520 ms and 520,000 μs.

- Convert to hex: 520,001₁₀ = 0x7EF41₁₆
- Write as hex string: ``"0x7ef41"``
- This allows Deep Sleep (10,000 μs exit latency) but blocks RTC+DDR (600,000 μs exit latency)

* **To allow RTC+DDR mode**: Set constraint higher than 600ms (600,000 μs) or don't apply any
constraint, allowing the cpuidle governor to select the deepest state (RTC+DDR) during long
idle periods.
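
The hex string can also be produced programmatically rather than by hand. The helper below is a small sketch; ``format_latency_hex()`` is a hypothetical name, and it only formats the number, leaving the unit to whatever the interface expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a latency limit as a lowercase "0x..." string.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int format_latency_hex(unsigned int latency, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "0x%x", latency);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Passing 520001 fills the buffer with ``0x7ef41``, the value used in the Deep Sleep example. Passing 4096 gives ``0x1000``.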

**How It Sets QoS Constraints:**

The program opens the special device file ``/dev/cpu_wakeup_latency``, which is part of the
kernel's PM QoS framework. Writing a latency value (in microseconds) to this file:

1. Registers a global CPU wakeup latency constraint
2. Causes the cpuidle governor to filter out any idle states with exit latency exceeding this value
3. Keeps the constraint active only while the file descriptor remains open

Review comment (@ti-scaria, Mar 20, 2026): These two points should be highlighted more, since the constraints are valid only while the file is open and this is how mode selection is enabled.

4. Automatically releases the constraint when the file descriptor is closed (on program exit)


**Example: Deep Sleep Mode Selection:**

Consider a scenario where the system has active I2C or SPI communication requiring responses
within 20ms. A QoS constraint of 20,000 μs (20ms) would be applied:

.. code-block:: text

   Available Main Domain States:
   ├─> main_sleep_rtcddr: exit-latency = 600,000 μs (600ms) → REJECTED (exceeds constraint)
   └─> main_sleep_deep:   exit-latency = 10,000 μs (10ms)   → SELECTED (meets constraint)

   Result: System enters Deep Sleep mode instead of RTC+DDR mode

In this example, even though RTC+DDR provides better power savings, the 20ms latency constraint
forces the system to use the shallower Deep Sleep mode. The selection is between the two main
domain idle states defined for s2idle suspend.

**Usage Example:**

.. code-block:: console

   root@am62lxx-evm:~# gcc testqos.c -o testqos
   root@am62lxx-evm:~# ./testqos
   QoS set to 0x1000. Press Ctrl+C to exit.

   # In another terminal, observe the constrained behavior:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0         # state0: WFI
   125       # state1: Standby
   350125    # state2: PowerDown (disabled by QoS)

   # Press Ctrl+C in the first terminal
   Released.

   # Now the deeper states are available again:
   root@am62lxx-evm:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
   0
   125
   350125    # state2: PowerDown (now enabled)

The value ``0x1000`` (4096 μs, about 4 ms) prevents any idle state with exit latency greater than
about 4 ms from being entered. In the example above, the Standby state (125 μs) remains available,
while the PowerDown state with its 350 ms exit latency is effectively disabled while the constraint
is active.
