812235 Commits

Author SHA1 Message Date
Wan Jiabing
fbc956a5e9 BACKPORT: cpuidle: teo: remove unneeded semicolon in teo_select()
Fix following coccicheck warning:
drivers/cpuidle/governors/teo.c:315:10-11: Unneeded semicolon

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
2e5905f712 BACKPORT: cpuidle: teo: Rework most recent idle duration values treatment
The TEO (Timer Events Oriented) cpuidle governor uses several most
recent idle duration values for a given CPU to refine the idle state
selection in case the previous long-term trends have not been
followed recently and a new trend appears to be forming.  That is
done by computing the average of the most recent idle duration
values falling below the time till the next timer event ("sleep
length"), provided that they are the majority of the most recent
idle duration values taken into account, and using it as the new
expected idle duration value.

However, idle state selection based on that value may not be optimal,
because the average does not really indicate which of the idle states
with target residencies less than or equal to it is likely to be the
best fit.

Thus, instead of computing the average, make the governor carry out
computations based on the distribution of the most recent idle
duration values among the bins corresponding to different idle
states.  Namely, if the majority of the most recent idle duration
values taken into consideration are less than the current sleep
length (which means that the CPU is likely to wake up early), find
the idle state closest to the "candidate" one "matching" the sleep
length whose target residency is less than or equal to the majority
of the most recent idle duration values that have fallen below the
current sleep length (which means that it is likely to be "shallow
enough" this time).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
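
For orientation, the bin-based treatment described above can be pictured with a small, self-contained C sketch: recent idle durations are counted into per-idle-state bins, and when most of them undershot the sleep length the selection walks down from the candidate state until the accumulated bins cover that majority. Every name below is invented for the illustration; this is not the backported code.

	#define NR_RECENT 8

	struct state_sketch { unsigned int target_residency_us; };

	static int pick_state_sketch(const struct state_sketch *states, int nr_states,
				     const unsigned int *recent_us,
				     unsigned int sleep_length_us, int candidate)
	{
		unsigned int bin_count[16] = { 0 };	/* assumes nr_states <= 16 */
		unsigned int below = 0;
		int i, j;

		/* Bin every recent idle duration that ended before the sleep length. */
		for (i = 0; i < NR_RECENT; i++) {
			if (recent_us[i] >= sleep_length_us)
				continue;
			below++;
			for (j = nr_states - 1; j > 0; j--) {
				if (states[j].target_residency_us <= recent_us[i])
					break;
			}
			bin_count[j]++;
		}

		/* If most recent wakeups came before the timer, walk down from the
		 * candidate until the accumulated bins cover that majority. */
		if (2 * below > NR_RECENT) {
			unsigned int sum = 0;

			for (i = candidate; i >= 0; i--) {
				sum += bin_count[i];
				if (2 * sum > below)
					return i;
			}
		}

		return candidate;
	}
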
Rafael J. Wysocki
08c3ff3eee BACKPORT: cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.

First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one.  Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval, which increases the likelihood of suboptimal idle state
selection.

Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made.  For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.

In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.

Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).

Also rename the states[] array in struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19, adapt variables and teo_middle_of_bin() return type to
	    microseconds as unit of time ]
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
24ccdab224 BACKPORT: cpuidle: teo: Cosmetic modification of teo_select()
Initialize local variables in teo_select() where they are declared.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
ef371e5767 BACKPORT: cpuidle: teo: Cosmetic modifications of teo_update()
Rename a local variable in teo_update() so that its purpose is better
reflected by its name and use one more local variable in the loop
over the CPU idle states in that function to make the code somewhat
easier to read.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 and use int data type for target_residency ]
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
ccbfc28eb3 BACKPORT: cpuidle: teo: Take negative "sleep length" values into account
Modify the TEO governor to take possible negative return values of
tick_nohz_get_next_hrtimer() into account by changing the data type
of some variables used by it to s64 which allows it to carry out
computations without potentially problematic data type conversions
into u64.

Also change the computations in teo_select() so that the negative
values themselves are handled in a natural way to avoid adding extra
negative value checks to that function.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 and change data type from unsigned int to int to allow negative values ]
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
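
The data-type hazard the commit guards against is easy to reproduce in isolation: converting a negative remaining time to an unsigned type turns it into an enormous value and breaks every subsequent comparison, while keeping the arithmetic in s64 lets negative results behave naturally. A standalone sketch, not kernel code:

	#include <stdio.h>
	#include <stdint.h>

	typedef int64_t s64;
	typedef uint64_t u64;

	int main(void)
	{
		s64 duration_ns = -1000;		/* e.g. the timer has already expired */
		u64 as_unsigned = (u64)duration_ns;	/* wraps to ~1.8e19, poisons comparisons */

		printf("as u64: %llu\n", (unsigned long long)as_unsigned);
		printf("as s64, < 0: %d\n", duration_ns < 0);
		return 0;
	}
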
Rafael J. Wysocki
2b14c54cef BACKPORT: cpuidle: teo: Adjust handling of very short idle times
If the time till the next timer event is shorter than the target
residency of the first idle state (state 0), the TEO governor does
not update its metrics for any idle states, but arguably it should
record a "hit" for idle state 0 in that case, so modify it to do
that.

Accordingly, also make it record an "early hit" for idle state 0 if
the measured idle duration is less than its target residency, which
allows one branch more to be dropped from teo_update().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Ikjoon Jang
4198741d6a BACKPORT: cpuidle: teo: Fix intervals[] array indexing bug
Fix a simple bug in the rotating array index.

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Ikjoon Jang <ikjn@chromium.org>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
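
A correct rotating index over a fixed-size intervals[] ring advances and wraps in one place, along the lines of the sketch below; the structure and function names are illustrative, not the patched code.

	#define INTERVALS 8

	struct teo_cpu_sketch {
		unsigned int intervals[INTERVALS];
		int interval_idx;
	};

	/* Store the latest measured idle duration, then advance the ring index,
	 * wrapping back to slot 0 after the last one. */
	static void save_interval(struct teo_cpu_sketch *c, unsigned int duration_us)
	{
		c->intervals[c->interval_idx] = duration_us;
		c->interval_idx = (c->interval_idx + 1) % INTERVALS;
	}
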
Rafael J. Wysocki
02c1d7202f BACKPORT: cpuidle: teo: Avoid code duplication in conditionals
There are three places in teo_select() where a given amount of time
is compared with TICK_NSEC if tick_nohz_tick_stopped() returns true,
which is a bit of duplicated code.

Avoid that code duplication by defining a helper function to do the
check and using it in all of the places in question.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 and adapt function to use microseconds as unit of time ]
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
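
The helper plausibly reduces to a single predicate of the following shape, used in place of the three open-coded comparisons. This is a kernel-context sketch based on the upstream change; the backport works in microseconds per the note above, so the unit conversion shown is an assumption.

	/* A duration is usable either while the tick is still running, or when it
	 * is at least one tick long, so stopping the tick would be justified. */
	static bool teo_time_ok(unsigned int interval_us)
	{
		return !tick_nohz_tick_stopped() ||
		       interval_us >= (unsigned int)(TICK_NSEC / NSEC_PER_USEC);
	}
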
Rafael J. Wysocki
57a1158f4b BACKPORT: cpuidle: teo: Avoid using "early hits" incorrectly
If the current state with the maximum "early hits" metric in
teo_select() is also the one "matching" the expected idle duration,
it will be used as the candidate one for selection even if its
"misses" metric is greater than its "hits" metric, which is not
correct.

In that case, the candidate state should be shallower than the
current one and its "early hits" metric should be the maximum
among the idle states shallower than the current one.

To make that happen, modify teo_select() to save the index of
the state whose "early hits" metric is the maximum for the
range of states below the current one and go back to that state
if it turns out that the current one should be rejected.

Fixes: 159e48560f51 ("cpuidle: teo: Fix "early hits" handling for disabled idle states")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
0047fa96ab BACKPORT: cpuidle: teo: Exclude cpuidle overhead from computations
One purpose of the computations in teo_update() is to determine
whether or not the (saved) time till the next timer event and the
measured idle duration fall into the same "bin", so avoid using
values that include the cpuidle overhead to obtain the latter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
d0674a6d37 BACKPORT: cpuidle: Consolidate disabled state checks
There are two reasons why CPU idle states may be disabled: either
because the driver has disabled them or because they have been
disabled by user space via sysfs.

In the former case, the state's "disabled" flag is set once during
the initialization of the driver and it is never cleared later (it
is read-only effectively).  In the latter case, the "disable" field
of the given state's cpuidle_state_usage struct is set and it may be
changed via sysfs.  Thus checking whether or not an idle state has
been disabled involves reading these two flags every time.

In order to avoid the additional check of the state's "disabled" flag
(which is effectively read-only anyway), use the value of it at the
init time to set a (new) flag in the "disable" field of that state's
cpuidle_state_usage structure and use the sysfs interface to
manipulate another (new) flag in it.  This way the state is disabled
whenever the "disable" field of its cpuidle_state_usage structure is
nonzero, whatever the reason, and it is the only place to look into
to check whether or not the state has been disabled.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
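
The consolidation can be pictured as two bits in the per-state usage "disable" field, one owned by the driver and one by sysfs, so a single non-zero test answers "is this state disabled?" regardless of the reason. The flag names follow the upstream scheme, but the snippet is only a self-contained sketch.

	#define CPUIDLE_STATE_DISABLED_BY_USER   (1U << 0)	/* toggled via sysfs */
	#define CPUIDLE_STATE_DISABLED_BY_DRIVER (1U << 1)	/* set once at driver init */

	struct state_usage_sketch {
		unsigned int disable;	/* non-zero => state unusable, whatever the reason */
	};

	static int state_disabled(const struct state_usage_sketch *u)
	{
		/* One read covers both the driver and the sysfs case. */
		return u->disable != 0;
	}
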
Rafael J. Wysocki
c59da5221a BACKPORT: cpuidle: teo: Fix "early hits" handling for disabled idle states
commit 159e48560f51d9c2aa02d762a18cd24f7868ab27 upstream.

The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state.  The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.

In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
the given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states.  It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.

If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.

As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question.  In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.

Modify the code to handle that case correctly.

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
3c8eac93fa BACKPORT: cpuidle: teo: Consider hits and misses metrics of disabled states
commit e43dcf20215f0287ea113102617ca04daa76b70e upstream.

The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state.  The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.

In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin.  Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.

If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.

For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into
even if the specific idle state corresponding to it is disabled and
if the "hits" values is greater than the "misses" one, select the
closest enabled shallower idle state in that case.

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
5525852cb2 BACKPORT: cpuidle: teo: Rename local variable in teo_select()
commit 4f690bb8ce4cc5d3fabe3a8e9c2401de1554cdc1 upstream.

Rename a local variable in teo_select() in preparation for subsequent
code modifications, no intentional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:59 -03:00
Rafael J. Wysocki
e5af7f73e9 BACKPORT: cpuidle: teo: Ignore disabled idle states that are too deep
commit 069ce2ef1a6dd84cbd4d897b333e30f825e021f0 upstream.

Prevent disabled CPU idle states with target residencies beyond the
anticipated idle duration from being taken into account by the TEO
governor.

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Rafael J. Wysocki
13a6873fff BACKPORT: cpuidle: teo: Get rid of redundant check in teo_update()
Notice that setting measured_us to UINT_MAX in teo_update() earlier
doesn't change the behavior of the following code, so do that and
eliminate a redundant check used for setting measured_us to UINT_MAX.

This change is not expected to alter functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Rafael J. Wysocki
8138e95761 BACKPORT: cpuidle: teo: Allow tick to be stopped if PM QoS is used
The TEO governor prevents the scheduler tick from being stopped (unless
stopped already) if there is a PM QoS latency constraint for the given
CPU and the target residency of the deepest idle state matching that
constraint is below the tick boundary.

However, that is problematic if CPUs with PM QoS latency constraints
are idle for long times, because it effectively causes the tick to
run on them all the time which is wasteful.  [It is also confusing
and questionable if they are full dynticks CPUs.]

To address that issue, modify the TEO governor to carry out the
entire search for the most suitable idle state (from the target
residency perspective) even if a latency constraint is present,
to allow it to determine the expected idle duration in all cases.

Also, when using the last several measured idle duration values
to refine the idle state selection, make it compare those values
with the current expected idle duration value (instead of
comparing them with the target residency of the idle state
selected so far) which should prevent the tick from being
retained when it makes sense to stop it sometimes (especially
in the presence of PM QoS latency constraints).

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Rafael J. Wysocki
4d2483cf76 BACKPORT: cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.

First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).

Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up.  Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.

Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.

A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that.  It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.

The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.

First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.

Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account.  It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ Tashar02: Backport to k4.19 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Lina Iyer
102886177b BACKPORT: cpuidle: governor: export cpuidle governor functions
Commit 83788c0caed3 ("cpuidle: remove unused exports") removed
the capability of registering cpuidle governors, which was unused at that
time. By exporting the symbol, let's allow platform specific modules to
register cpuidle governors and use cpuidle_governor_latency_req() to get
the QoS for the CPU.

Signed-off-by: Lina Iyer <ilina@codeaurora.org>

Teo Fix.
Signed-off-by: Carlos Jimenez (JavaShin-X) <javashin1986@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Rafael J. Wysocki
5e097c5c52 BACKPORT: UPSTREAM: cpuidle: governors: Consolidate PM QoS handling
There is some code duplication related to the PM QoS handling between
the existing cpuidle governors, so move that code to a common helper
function and call that from the governors.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

* Fix For TEO (cpuidle: New timer events oriented governor for tickless systems)

Partial Commit From Kernel 4.19
Signed-off-by: Carlos Jimenez (JavaShin-X) <javashin1986@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
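
The consolidated helper plausibly takes the more restrictive of the global CPU DMA latency request and the per-device PM QoS resume-latency value, roughly as below. This is a kernel-context sketch modeled on the upstream change, not the exact code carried in this tree.

	int cpuidle_governor_latency_req(unsigned int cpu)
	{
		struct device *device = get_cpu_device(cpu);
		int global_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
		int device_req = dev_pm_qos_raw_read_value(device);

		return device_req < global_req ? device_req : global_req;
	}
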
Rafael J. Wysocki
966317657d BACKPORT: UPSTREAM: cpuidle: menu: Fix wakeup statistics updates for polling state
[ Upstream commit 5f26bdceb9c0a5e6c696aa2899d077cd3ae93413 ]

If the CPU exits the "polling" state due to the time limit in the
loop in poll_idle(), this is not a real wakeup and it just means
that the "polling" state selection was not adequate.  The governor
mispredicted short idle duration, but had a more suitable state been
selected, the CPU might have spent more time in it.  In fact, there
is no reason to expect that there would have been a wakeup event
earlier than the next timer in that case.

Handling such cases as regular wakeups in menu_update() may cause the
menu governor to make suboptimal decisions going forward, but ignoring
them altogether would not be correct either, because every time
menu_select() is invoked, it makes a separate new attempt to predict
the idle duration taking distinct time to the closest timer event as
input and the outcomes of all those attempts should be recorded.

For this reason, make menu_update() always assume that if the
"polling" state was exited due to the time limit, the next proper
wakeup event for the CPU would be the next timer event (not
including the tick).

Fixes: a37b969a61c1 "cpuidle: poll_state: Add time limit to poll_idle()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

From Kernel 4.19
Signed-off-by: Carlos Jimenez (JavaShin-X) <javashin1986@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Rafael J. Wysocki
239e93294f BACKPORT: UPSTREAM: cpuidle: poll_state: Avoid invoking local_clock() too often
Rik reports that he sees an increase in CPU use in one benchmark
due to commit 612f1a22f067 "cpuidle: poll_state: Add time limit to
poll_idle()" that caused poll_idle() to call local_clock() in every
iteration of the loop.  Utilization increase generally means more
non-idle time with respect to total CPU time (on the average) which
implies reduced CPU frequency.

Doug reports that limiting the rate of local_clock() invocations
in there causes much less power to be drawn during a CPU-intensive
parallel workload (with idle states 1 and 2 disabled to enforce more
state 0 residency).

These two reports together suggest that executing local_clock() on
multiple CPUs in parallel at a high rate may cause chips to get hot
and trigger thermal/power limits on them to kick in, so reduce the
rate of local_clock() invocations in poll_idle() to avoid that issue.

Fixes: 612f1a22f067 "cpuidle: poll_state: Add time limit to poll_idle()"
Reported-by: Rik van Riel <riel@surriel.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>

From Kernel 4.19
Signed-off-by: Carlos Jimenez (JavaShin-X) <javashin1986@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
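
Taken together, the two poll_idle() changes in this series leave the polling loop with roughly this shape: spin on need_resched(), consult local_clock() only once every couple hundred iterations, and give up once a time budget is exceeded. Kernel-context sketch; the constants shown are the upstream ones and may differ in the backport.

	#define POLL_IDLE_TIME_LIMIT	(TICK_NSEC / 16)
	#define POLL_IDLE_RELAX_COUNT	200

		u64 time_start = local_clock();
		unsigned int loop_count = 0;

		while (!need_resched()) {
			cpu_relax();
			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
				continue;

			loop_count = 0;
			/* Pay for a clock read only here; stop polling once the
			 * budget for busy-waiting has been used up. */
			if (local_clock() - time_start > POLL_IDLE_TIME_LIMIT)
				break;
		}
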
Rafael J. Wysocki
56a8f286a4 BACKPORT: UPSTREAM: cpuidle: poll_state: Add time limit to poll_idle()
If poll_idle() is allowed to spin until need_resched() returns 'true',
it may actually spin for a much longer time than expected by the idle
governor, since set_tsk_need_resched() is not always called by the
timer interrupt handler.  If that happens, the CPU may spend much
more time than anticipated in the "polling" state.

To prevent that from happening, limit the time of the spinning loop
in poll_idle().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Doug Smythies <dsmythies@telus.net>

From Kernel 4.19
Signed-off-by: Carlos Jimenez (JavaShin-X) <javashin1986@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iea6540fa03bdca659562b9b47361beda8d5db4ee
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-22 23:56:58 -03:00
Richard Raya
98358ce487 build.sh: Rework repo management
Change-Id: Ic74095861e4402bd7740307790ad92feb46ceb94
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-19 03:24:15 -03:00
Angelo G. Del Regno
69fb1f355d scripts: Lower kernel gzip compression to fastest
First of all, this is a downstream kernel - always keep that in mind!
Now, this kernel is targeting new *very powerful* Qualcomm platforms
like SM8250 and the Sony Edo platform - which has a very fast UFS card.

Keep in mind that the bootloader sets the CPU at a frequency that is
slightly faster than the "in the middle" ones, which is anyway not
veeeery fast - but that's good, really. I agree.

So.. check this out:   for Image.gz-dtb.....
COMP_LEVEL    SIZE
9             20116171
5             20220479
2             20940223
1             21231290

Remember again that we're loading from a UFS card and that
we are loading ~1.1MB more out of a 20MB file.
If you're smart enough you surely know already about RAM and CPU
overhead of very high compression levels.

If you still disagree with what I just did, read this commit description
another 20 times, or more, until you understand it. :)))

Change-Id: Ic28bff2011b40631fc81b582a25029ac8d12d48e
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-19 03:24:15 -03:00
Jebaitedneko
5a0532fc09 subsys-pil-tz: Use memcpy_toio() for pil_init_image_trusted()
The new optimized memcpy doesn't work well on device memory, and when subsys tries to load any FW, we are met with:

[   11.111213] ueventd: firmware: loading 'cdsp.mdt' for '/devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt'
[   11.113128] ueventd: loading /devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt took 2ms
[   11.113170] subsys-pil-tz 8300000.qcom,turing: cdsp: loading from 0x0000000099100000 to 0x000000009a500000
[   11.117481] ueventd: firmware: loading 'adsp.mdt' for '/devices/platform/soc/17300000.qcom,lpass/firmware/adsp.mdt'
[   11.117518] Unable to handle kernel paging request at virtual address ffffff801f7c6d5c
[   11.117522] Mem abort info:
[   11.117525]   Exception class = DABT (current EL), IL = 32 bits
[   11.117527]   SET = 0, FnV = 0
[   11.117529]   EA = 0, S1PTW = 0
[   11.117530]   FSC = 33
[   11.117532] Data abort info:
[   11.117534]   ISV = 0, ISS = 0x00000061
[   11.117536]   CM = 0, WnR = 1
[   11.117539] swapper pgtable: 4k pages, 39-bit VAs, pgd = 000000003e4fd651
[   11.117541] [ffffff801f7c6d5c] *pgd=00000001f8883003, *pud=00000001f8883003, *pmd=00000001f30fe003, *pte=00680000fd475703
[   11.117547] Internal error: Oops: 96000061 [#1] PREEMPT SMP
[   11.117551] Modules linked in:
[   11.117554] Process init (pid: 568, stack limit = 0x000000005be89f40)
[   11.117558] CPU: 4 PID: 568 Comm: init Tainted: G S      W       4.14.239-MOCHI #1
[   11.117560] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 VAYU (DT)
[   11.117562] task: 00000000e65c9d8d task.stack: 000000005be89f40
[   11.117570] pc : memcpy+0x188/0x2a0
[   11.117577] lr : pil_init_image_trusted+0x130/0x234
[   11.117579] sp : ffffff80229937c0 pstate : 80000145
[   11.117581] x29: ffffff80229937c0 x28: ffffffeaebf93a00
[   11.117584] x27: ffffff8599334000 x26: 0000000000000040
[   11.117586] x25: ffffff859a0aeaa0 x24: ffffff859a63a000
[   11.117589] x23: ffffffeaf557d880 x22: ffffff801f7c5000
[   11.117592] x21: 0000000000001d9c x20: ffffff801f65d000
[   11.117594] x19: ffffff859933cb08 x18: 0000007c2b138000
[   11.117597] x17: 0000007ebf1ef1e4 x16: ffffff8597c02274
[   11.117599] x15: ffffffffffffffff x14: ffffffffffffffff
[   11.117602] x13: ffffffffffffffff x12: ffffffffffffffff
[   11.117604] x11: ffffffffffffffff x10: ffffffffffffffff
[   11.117607] x9 : ffffffffffffffff x8 : ffffffffffffffff
[   11.117609] x7 : ffffffffffffffff x6 : ffffffffffffffff
[   11.117612] x5 : ffffff801f7c6d9c x4 : ffffff801f65ed9c
[   11.117614] x3 : ffffff801f7c6d40 x2 : ffffffffffffffcc
[   11.117617] x1 : ffffff801f65ed80 x0 : ffffff801f7c5000
[   11.117620]
[   11.117620] PC: 0xffffff8598c00e28:
[   11.117622] 0e28  a9022468 a9422428 a9032c6a a9432c2a a984346c a9c4342c f1010042 54fffee8
[   11.117628] 0e48  a97c3c8e a9011c66 a97d1c86 a9022468 a97e2488 a9032c6a a97f2c8a a904346c
[   11.117634] 0e68  a93c3cae a93d1ca6 a93e24a8 a93f2caa d65f03c0 d503201f a97f348c 92400cae
[   11.117639] 0e88  cb0e0084 cb0e0042 a97f1c86 a93f34ac a97e2488 a97d2c8a a9fc348c cb0e00a5
[   11.117645]
[   11.117645] LR: 0xffffff8597ef067c:
[   11.117647] 067c  97eeeccc f9400265 b4fffe65 52801803 910143e2 aa1503e1 910183e0 d2804004
[   11.117652] 069c  72a02803 d63f00a0 aa0003f6 17ffffe9 aa1403e1 aa1503e2 aa1603e0 9434418a
[   11.117658] 06bc  97ff9590 72001c1f b9404ae1 f9402be0 54000241 910133e4 910163e2 d2800085
[   11.117663] 06dc  d2800103 290b03e1 52800021 52800040 97ff9657 2a0003f3 f9413bf4 f9402bf7
[   11.117669]
[   11.117669] SP: 0xffffff8022993780:
[   11.117671] 3780  98c00e68 ffffff85 80000145 00000000 9a0aeaa0 ffffff85 00000040 00000000
[   11.117677] 37a0  ffffffff 0000007f 97ef0644 ffffff85 229937c0 ffffff80 98c00e68 ffffff85
[   11.117682] 37c0  22993ba0 ffffff80 97eeec4c ffffff85 f557d918 ffffffea 00000000 00000000
[   11.117688] 37e0  f6264460 ffffffea f6264400 ffffffea f55b9480 ffffffea 97b704d8 ffffff85
[   11.117693]
[   11.117695] Call trace:
[   11.117698]  memcpy+0x188/0x2a0
[   11.117701]  pil_boot+0x358/0x730
[   11.117704]  subsys_powerup+0x28/0x30
[   11.117709]  subsys_start+0x38/0x134
[   11.117711]  __subsystem_get+0xb0/0x11c
[   11.117713]  subsystem_get+0x10/0x18
[   11.117716]  cdsp_loader_do.isra.0+0xe4/0x1a8
[   11.117718]  cdsp_boot_store+0x8c/0x168
[   11.117722]  kobj_attr_store+0x14/0x24
[   11.117726]  sysfs_kf_write+0x34/0x44
[   11.117729]  kernfs_fop_write+0x118/0x184
[   11.117733]  __vfs_write+0x2c/0xd8
[   11.117735]  vfs_write+0x80/0xec
[   11.117738]  SyS_write+0x54/0xac
[   11.117741]  el0_svc_naked+0x34/0x38
[   11.117744] Code: a97e2488 a9032c6a a97f2c8a a904346c (a93c3cae)
[   11.117747] ---[ end trace fc45fc8b1fa34513 ]---

Fixes booting with the new optimized memcpy routine.

Change-Id: Ia95b740ec0ce2fd90e7eea5d5ee162d26f354179
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-18 23:08:26 -03:00
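
The fix itself boils down to using the MMIO-safe copy helper for the ioremap()'d destination instead of plain memcpy(), which the newly imported optimized routine makes unsafe on device memory. A kernel-context sketch of the pattern; the function and variable names here are illustrative.

	static int copy_fw_to_device_mem(phys_addr_t dest_pa, const void *fw, size_t len)
	{
		void __iomem *dst = ioremap(dest_pa, len);

		if (!dst)
			return -ENOMEM;

		/* memcpy_toio() only issues accesses that device memory tolerates. */
		memcpy_toio(dst, fw, len);
		iounmap(dst);
		return 0;
	}
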
Robin Murphy
0e51f2e288 arm64: lib: Import latest memcpy()/memmove() implementation
Import the latest implementation of memcpy(), based on the
upstream code of string/aarch64/memcpy.S at commit afd6244 from
https://github.com/ARM-software/optimized-routines, and subsuming
memmove() in the process.

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Note also that the needs of the usercopy routines vs. regular memcpy()
have now diverged so far that we abandon the shared template idea
and the damage which that incurred to the tuning of LDP/STP loops.
We'll be back to tackle those routines separately in future.

Link: https://lore.kernel.org/r/3c953af43506581b2422f61952261e76949ba711.1622128527.git.robin.murphy@arm.com
Change-Id: I78b4d7bf65b1a4eebf509b087d0120b0f99e51c4
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-18 23:08:21 -03:00
EmanuelCN
6993932a60 schedutil: Implement tapered dvfs_headroom
Inspired by: LineageOS/android_kernel_google_gs201@752c5f9

Change-Id: I2426f750416cbf9a7cb6876bcd386ae4c40825ca
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-17 04:17:29 -03:00
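
The referenced gs201 change is not reproduced here; conceptually, "tapered" headroom means the usual fixed DVFS margin shrinks as utilization approaches capacity. One plausible shape of that idea, as a purely hypothetical standalone sketch, not the actual patch:

	#define SCHED_CAPACITY_SCALE 1024

	/* Conventional schedutil headroom: a flat 25% boost on top of util. */
	static unsigned long headroom_flat(unsigned long util)
	{
		return util + (util >> 2);
	}

	/* Tapered variant: scale the extra 25% by the remaining capacity, so a
	 * nearly full CPU gets little boost and a lightly loaded one gets the
	 * full margin. Assumes util <= SCHED_CAPACITY_SCALE. */
	static unsigned long headroom_tapered(unsigned long util)
	{
		unsigned long margin = (util >> 2) *
				       (SCHED_CAPACITY_SCALE - util) / SCHED_CAPACITY_SCALE;

		return util + margin;
	}
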
Samuel Pascua
d9f0298279 schedutil: Use map_util_freq()
Change-Id: If9cf1b47dee3b9bd0663c88034da8edc98bd28f6
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-17 04:17:21 -03:00
Samuel Pascua
a48f50b6bf cpufreq: Add map_util_freq()
Change-Id: Ibe3ac6ab685c97874fd381482a1e0cbd60fba806
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-17 04:17:21 -03:00
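
For context, map_util_freq() is the small helper that turns a utilization value into a frequency request with schedutil's usual 25% headroom; upstream defines it essentially as below, shown here as a sketch with the 1.25 factor expressed as freq + freq/4.

	/* next_freq ~= 1.25 * freq * util / capacity */
	static inline unsigned long map_util_freq(unsigned long util,
						  unsigned long freq,
						  unsigned long cap)
	{
		return (freq + (freq >> 2)) * util / cap;
	}
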
Wei Wang
0942717376 defconfig: Enable CONFIG_FAIR_GROUP_SCHED
Enabling CONFIG_FAIR_GROUP_SCHED with proper tuning can help prioritize
important work in Android. The feature was previously disabled due to an
improper setting of the background tasks' share. Now that tasks in the root
group have been moved into a newly created system subgroup, the shares can
be properly set.

+----------------------------------------------------------------------------------------+
| Cold App Launch Time (* w/ prio120 8 threads running in root cpuset)                   |
+----------------------------------------+--------+------+---------+-----------+---------+
|                                        | chrome | maps | youtube | playstore | setting |
+----------------------------------------+--------+------+---------+-----------+---------+
| No CONFIG_FAIR_GROUP_SCHED support(*)  |    591 | 1314 |     887 |      1952 |     551 |
+----------------------------------------+--------+------+---------+-----------+---------+
| CONFIG_FAIR_GROUP_SCHED w/ 1% limit(*) |    567 |  637 |     668 |      1450 |     529 |
+----------------------------------------+--------+------+---------+-----------+---------+
| No stress running (best case)          |    416 |  463 |     484 |      1075 |     363 |
+----------------------------------------+--------+------+---------+-----------+---------+

xNombre: It is needed by UClamp

Bug: 171740453
Test: Build and boot
Change-Id: Ibb7e48c93136e3967da6381d7c0c94d0cdaee443
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-17 04:17:21 -03:00
Samuel Pascua
da52819107 defconfig: Switch to UClamp
Change-Id: I10a70d27d1f7a4baf2b697ee560bfe39ce29e774
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-17 02:39:52 -03:00
Qais Yousef
de8ca9bcc7 sched/uclamp: Fix rq->uclamp_max not set on first enqueue
[ Upstream commit 315c4f884800c45cb6bd8c90422fad554a8b9588 ]

Commit d81ae8aac85c ("sched/uclamp: Fix initialization of struct
uclamp_rq") introduced a bug where uclamp_max of the rq is not reset to
match the woken up task's uclamp_max when the rq is idle.

The code was relying on rq->uclamp_max initialized to zero, so on first
enqueue

	static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
					    enum uclamp_id clamp_id)
	{
		...

		if (uc_se->value > READ_ONCE(uc_rq->value))
			WRITE_ONCE(uc_rq->value, uc_se->value);
	}

was actually resetting it. But since commit d81ae8aac85c changed the
default to 1024, this no longer works. And since rq->uclamp_flags is
also initialized to 0, neither the above code path nor uclamp_idle_reset()
updates the rq->uclamp_max on first wake up from idle.

This is only visible from first wake up(s) until the first dequeue to
idle after enabling the static key. And it only matters if the
uclamp_max of this task is < 1024 since only then its uclamp_max will be
effectively ignored.

Fix it by properly initializing rq->uclamp_flags = UCLAMP_FLAG_IDLE to
ensure uclamp_idle_reset() is called which then will update the rq
uclamp_max value as expected.

Fixes: d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20211202112033.1705279-1-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
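
The fix described above amounts to initializing the rq flags to the idle state so the very first enqueue goes through uclamp_idle_reset() and overwrites the stale default. A kernel-context sketch of the initialization, simplified rather than the literal diff:

	static void init_uclamp_rq(struct rq *rq)
	{
		enum uclamp_id clamp_id;

		for_each_clamp_id(clamp_id) {
			rq->uclamp[clamp_id] = (struct uclamp_rq) {
				.value = uclamp_none(clamp_id)
			};
		}

		/* Was 0 before the fix, so uclamp_idle_reset() never ran on the
		 * first wakeup and rq->uclamp_max stayed at the 1024 default. */
		rq->uclamp_flags = UCLAMP_FLAG_IDLE;
	}
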
Quentin Perret
56ed3a51c0 sched: Fix UCLAMP_FLAG_IDLE setting
The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last
uclamp active task (that is, when buckets.tasks reaches 0 for all
buckets) to maintain the last uclamp.max and prevent blocked util from
suddenly becoming visible.

However, there is an asymmetry in how the flag is set and cleared which
can lead to having the flag set whilst there are active tasks on the rq.
Specifically, the flag is cleared in the uclamp_rq_inc() path, which is
called at enqueue time, but set in uclamp_rq_dec_id() which is called
both when dequeueing a task _and_ in the update_uclamp_active() path. As
a result, when both uclamp_rq_{dec,inc}_id() are called from
update_uclamp_active(), the flag ends up being set but not cleared,
hence leaving the runqueue in a broken state.

Fix this by clearing the flag in update_uclamp_active() as well.

Fixes: e496187da710 ("sched/uclamp: Enforce last task's UCLAMP_MAX")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20210805102154.590709-2-qperret@google.com
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
Quentin Perret
89fd7c7fa5 ANDROID: sched: Make uclamp changes depend on CAP_SYS_NICE
There is currently nothing preventing tasks from changing their per-task
clamp values in anyway that they like. The rationale is probably that
system administrators are still able to limit those clamps thanks to the
cgroup interface. However, this causes pain in a system where both
per-task and per-cgroup clamp values are expected to be under the
control of core system components (as is the case for Android).

To fix this, let's require CAP_SYS_NICE to change per-task clamp values.
There are ongoing discussions upstream about more flexible approaches
than this using the RLIMIT API -- see [1]. But the upstream discussion
has not converged yet, and this is way too late for UAPI changes in
android12-5.10 anyway, so let's apply this change which provides the
behaviour we want without actually impacting UAPIs.

[1] https://lore.kernel.org/lkml/20210623123441.592348-4-qperret@google.com/

Bug: 187186685
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I749312a77306460318ac5374cf243d00b78120dd
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
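
The enforcement described above reduces to a capability check on the uclamp-related sched_setattr() flags; where exactly the check sits in the setscheduler/uclamp validation path is not shown here, so treat this as a kernel-context sketch of the idea only.

	static int uclamp_perm_check(const struct sched_attr *attr)
	{
		if ((attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) &&
		    !capable(CAP_SYS_NICE))
			return -EPERM;

		return 0;
	}
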
Xuewen Yan
1d6a30daff sched/uclamp: Ignore max aggregation if rq is idle
[ Upstream commit 3e1493f46390618ea78607cb30c58fc19e2a5035 ]

When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with rq value. But since there is no task enqueued yet, the
values are stale based on the last task that was running. When the new
task actually wakes up and enqueued, then the rq uclamp values should
reflect that of the newly woken up task effective uclamp values.

This is a problem particularly for uclamp_max because it default to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by ignoring max aggregation if the rq is idle since in that
case the effective uclamp value of the rq will be the ones of the task
that will wake up.

Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
[qias: Changelog]
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
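
The fix can be sketched as a guard in the rq-side aggregation: when the rq is flagged idle, its stored clamp values describe the previously running task, so they are skipped and the waking task's own effective clamps win. Kernel-context sketch, close to but not literally the upstream code.

	static unsigned long
	uclamp_rq_util_with_sketch(struct rq *rq, unsigned long util,
				   struct task_struct *p)
	{
		unsigned long min_util = uclamp_eff_value(p, UCLAMP_MIN);
		unsigned long max_util = uclamp_eff_value(p, UCLAMP_MAX);

		/* Only aggregate with the rq values if they are not stale. */
		if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE)) {
			min_util = max(min_util,
				       (unsigned long)READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
			max_util = max(max_util,
				       (unsigned long)READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
		}

		if (min_util > max_util)
			return min_util;

		return clamp(util, min_util, max_util);
	}
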
Qais Yousef
4683c606d0 sched/uclamp: Fix uclamp_tg_restrict()
[ Upstream commit 0213b7083e81f4acd69db32cb72eb4e5f220329a ]

Now cpu.uclamp.min acts as a protection, we need to make sure that the
uclamp request of the task is within the allowed range of the cgroup,
that is it is clamp()'ed correctly by tg->uclamp[UCLAMP_MIN] and
tg->uclamp[UCLAMP_MAX].

As reported by Xuewen [1] we can have some corner cases where there's
inversion between uclamp requested by task (p) and the uclamp values of
the taskgroup it's attached to (tg). The following table demonstrates
2 corner cases:

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	CASE 1
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  60%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	CASE 2
	-----------+-----+------+-----------
	uclamp_min | 0%  | 30%  |  30%
	-----------+-----+------+-----------
	uclamp_max | 20% | 50%  |  20%
	-----------+-----+------+-----------

With this fix we get:

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	CASE 1
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  50%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	CASE 2
	-----------+-----+------+-----------
	uclamp_min | 0%  | 30%  |  30%
	-----------+-----+------+-----------
	uclamp_max | 20% | 50%  |  30%
	-----------+-----+------+-----------

Additionally uclamp_update_active_tasks() must now unconditionally
update both UCLAMP_MIN/MAX because changing the tg's UCLAMP_MAX for
instance could have an impact on the effective UCLAMP_MIN of the tasks.

	           |  p  |  tg  |  effective
	-----------+-----+------+-----------
	old
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   |  50%
	-----------+-----+------+-----------
	uclamp_max | 80% | 50%  |  50%
	-----------+-----+------+-----------
	*new*
	-----------+-----+------+-----------
	uclamp_min | 60% | 0%   | *60%*
	-----------+-----+------+-----------
	uclamp_max | 80% |*70%* | *70%*
	-----------+-----+------+-----------

[1] https://lore.kernel.org/lkml/CAB8ipk_a6VFNjiEnHRHkUMBKbA+qzPQvhtNjJ_YNzQhqV_o8Zw@mail.gmail.com/

Fixes: 0c18f2ecfcc2 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
Reported-by: Xuewen Yan <xuewen.yan94@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210617165155.3774110-1-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
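
The corrected semantics in the tables above are exactly what a clamp() of the task's request into the cgroup's [min, max] window produces. A standalone C check of both corner cases, illustration only:

	#include <stdio.h>

	static unsigned int clamp_val(unsigned int v, unsigned int lo, unsigned int hi)
	{
		return v < lo ? lo : (v > hi ? hi : v);
	}

	int main(void)
	{
		/* CASE 1: p.uclamp_min = 60%, tg = [0%, 50%] -> effective 50% */
		printf("case 1 min: %u%%\n", clamp_val(60, 0, 50));
		/* CASE 2: p.uclamp_max = 20%, tg = [30%, 50%] -> effective 30% */
		printf("case 2 max: %u%%\n", clamp_val(20, 30, 50));
		return 0;
	}
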
Qais Yousef
6756e1b42a sched/uclamp: Fix wrong implementation of cpu.uclamp.min
[ Upstream commit 0c18f2ecfcc274a4bcc1d122f79ebd4001c3b445 ]

cpu.uclamp.min is a protection as described in cgroup-v2 Resource
Distribution Model

	Documentation/admin-guide/cgroup-v2.rst

which means we try our best to preserve the minimum performance point of
tasks in this group. See full description of cpu.uclamp.min in the
cgroup-v2.rst.

But the current implementation makes it a limit, which is not what was
intended.

For example:

	tg->cpu.uclamp.min = 20%

	p0->uclamp[UCLAMP_MIN] = 0
	p1->uclamp[UCLAMP_MIN] = 50%

	Previous Behavior (limit):

		p0->effective_uclamp = 0
		p1->effective_uclamp = 20%

	New Behavior (Protection):

		p0->effective_uclamp = 20%
		p1->effective_uclamp = 50%

Which is inline with how protections should work.

With this change the cgroup and per-task behaviors are the same, as
expected.

Additionally, we remove the confusing relationship between cgroup and
!user_defined flag.

We don't want for example RT tasks that are boosted by default to max to
change their boost value when they attach to a cgroup. If a cgroup wants
to limit the max performance point of tasks attached to it, then
cpu.uclamp.max must be set accordingly.

Or if they want to set different boost value based on cgroup, then
sysctl_sched_util_clamp_min_rt_default must be used to NOT boost to max
and set the right cpu.uclamp.min for each group to let the RT tasks
obtain the desired boost value when attached to that group.

As it stands the dependency on !user_defined flag adds an extra layer of
complexity that is not required now cpu.uclamp.min behaves properly as
a protection.

The propagation model of effective cpu.uclamp.min in child cgroups as
implemented by cpu_util_update_eff() is still correct. The parent
protection sets an upper limit of what the child cgroups will
effectively get.

Fixes: 3eac870a3247 (sched/uclamp: Use TG's clamps to restrict TASK's clamps)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
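
The limit-versus-protection distinction in the example is just min() versus max() against the cgroup value, which a tiny standalone program makes concrete; this illustrates the semantics, not the kernel code.

	#include <stdio.h>

	static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
	static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }

	int main(void)
	{
		unsigned int tg_min = 20;	/* tg->cpu.uclamp.min = 20% */
		unsigned int p0 = 0, p1 = 50;	/* per-task uclamp_min requests */

		/* Old behavior (limit): the cgroup value caps the request. */
		printf("limit:      p0=%u%% p1=%u%%\n", min_u(p0, tg_min), min_u(p1, tg_min));

		/* New behavior (protection): the cgroup value only raises it. */
		printf("protection: p0=%u%% p1=%u%%\n", max_u(p0, tg_min), max_u(p1, tg_min));
		return 0;
	}
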
Quentin Perret
f871554be4 FROMLIST: sched: Fix out-of-bound access in uclamp
Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.

For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
correct indexes are in range [0,19], hence leading to an out of bound
memory access.

Clamp the bucket id to fix the issue.

Bug: 186415778
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20210430151412.160913-1-qperret@google.com
Change-Id: Ibc28662de5554f80f97533b60e747f8a6e871c56
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
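
The off-by-one and its fix can be reproduced with the numbers from the message; clamping the computed id to the last valid bucket is all that is needed. Standalone sketch whose constants mirror the description rather than the kernel's exact macros:

	#include <stdio.h>

	#define UCLAMP_BUCKETS		20
	#define SCHED_CAPACITY_SCALE	1024
	/* Rounding division as described: 1024/20 -> 51 */
	#define UCLAMP_BUCKET_DELTA \
		((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS / 2) / UCLAMP_BUCKETS)

	static unsigned int bucket_id(unsigned int clamp_value)
	{
		unsigned int id = clamp_value / UCLAMP_BUCKET_DELTA;

		/* The fix: 1024 now maps to bucket 19 instead of indexing
		 * one element past the array. */
		if (id >= UCLAMP_BUCKETS)
			id = UCLAMP_BUCKETS - 1;

		return id;
	}

	int main(void)
	{
		printf("bucket_id(1024) = %u\n", bucket_id(1024));	/* 19, not 20 */
		return 0;
	}
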
Qais Yousef
860c8f0032 sched/uclamp: Fix locking around cpu_util_update_eff()
cpu_cgroup_css_online() calls cpu_util_update_eff() without holding the
uclamp_mutex or rcu_read_lock() like other call sites, which is
a mistake.

The uclamp_mutex is required to protect against concurrent reads and
writes that could update the cgroup hierarchy.

The rcu_read_lock() is required to traverse the cgroup data structures
in cpu_util_update_eff().

Surround the caller with the required locks and add some asserts to
better document the dependency in cpu_util_update_eff().

Fixes: 7226017ad37a ("sched/uclamp: Fix a bug in propagating uclamp value in new cgroups")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210510145032.1934078-3-qais.yousef@arm.com
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
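
The shape of the fix is simply to take the same locks the other callers hold before walking the hierarchy; a kernel-context sketch of the call site in cpu_cgroup_css_online(), with the surrounding function omitted:

	mutex_lock(&uclamp_mutex);
	rcu_read_lock();
	cpu_util_update_eff(css);
	rcu_read_unlock();
	mutex_unlock(&uclamp_mutex);
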
Qais Yousef
4b629cbf3a FROMGIT: sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
When a new cgroup is created, the effective uclamp value wasn't updated
with a call to cpu_util_update_eff() that looks at the hierarchy and
update to the most restrictive values.

Fix it by ensuring to call cpu_util_update_eff() when a new cgroup
becomes online.

Without this change, the newly created cgroup uses the default
root_task_group uclamp values, which is 1024 for both uclamp_{min, max},
which will cause the rq to be clamped to max, hence cause the
system to run at max frequency.

The problem was observed on Ubuntu server and was reproduced on Debian
and Buildroot rootfs.

By default, Ubuntu and Debian create a cpu controller cgroup hierarchy
and add all tasks to it - which creates enough noise to keep the rq
uclamp value at max most of the time. Imitating this behavior makes the
problem visible in Buildroot too which otherwise looks fine since it's a
minimal userspace.

Bug: 120440300
Fixes: 0b60ba2dd342 ("sched/uclamp: Propagate parent clamps")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Doug Smythies <dsmythies@telus.net>
Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/
(cherry picked from commit 7226017ad37a888915628e59a84a2d1e57b40707
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I9636c60e04d58bbfc5041df1305b34a12b5a3f46
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:43 -03:00
Qais Yousef
babc23d9b6 UPSTREAM: sched/uclamp: Fix incorrect condition
uclamp_update_active() should perform the update when
p->uclamp[clamp_id].active is true. But when the logic was inverted in
[1], the if condition wasn't inverted correctly too.

[1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/

Bug: 120440300
Reported-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6e1ff0773f49c7d38e8b4a9df598def6afb9f415)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I51b58a6089290277e08a0aaa72b86f852eec1512
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2024-12-16 14:46:42 -03:00
Valentin Schneider
862c9ce4e6 BACKPORT: sched/fair: Make task_fits_capacity() consider uclamp restrictions
task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.

There are a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high capacity
CPUs at wakeup (and, should that fail, later steered via misfit balancing),
so such "boosted" tasks would favor CPUs of higher capacity.

Introduce uclamp_task_util() and make task_fits_capacity() use it.

[QP: fixed missing dependency on fits_capacity() by using the open coded
alternative]

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-5-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a7008c07a568278ed2763436404752a98004c7ff)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iabde2eda7252c3bcc273e61260a7a12a7de991b1
2024-12-16 14:46:42 -03:00
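
The helper being introduced clamps the task's estimated utilization by its effective uclamp values before the capacity comparison; a kernel-context sketch, with fits_capacity() open coded as the backport note mentions. The ~20% margin shown is the usual upstream value and is an assumption here.

	static inline unsigned long uclamp_task_util(struct task_struct *p)
	{
		return clamp(task_util_est(p),
			     uclamp_eff_value(p, UCLAMP_MIN),
			     uclamp_eff_value(p, UCLAMP_MAX));
	}

	static inline int task_fits_capacity(struct task_struct *p, long capacity)
	{
		/* util * 1280 < capacity * 1024, i.e. leave ~20% headroom. */
		return capacity * 1024 > uclamp_task_util(p) * 1280;
	}
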
Satya Durga Srinivasu Prabhala
1a15c7c1b3 sched/fair: honor uclamp restrictions in fbt()
While calculating the utilization of a CPU during task placement in fbt(),
the current code doesn't take uclamp into account, which can lead to
selection of an incorrect CPU for the task when uclamp restrictions
are in place for it.

Change-Id: I8371affe3b37733d222e5c57953e53f91fc19a53
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2024-12-16 14:46:42 -03:00
Dietmar Eggemann
6bc2a06eae sched/uclamp: Allow to reset a task uclamp constraint value
In case the user wants to stop controlling a uclamp constraint value
for a task, use the magic value -1 in sched_util_{min,max} with the
appropriate sched_flags (SCHED_FLAG_UTIL_CLAMP_{MIN,MAX}) to indicate
the reset.

The advantage over the 'additional flag' approach (i.e. introducing
SCHED_FLAG_UTIL_CLAMP_RESET) is that no additional flag has to be
exported via uapi. This avoids the need to document how this new flag
has be used in conjunction with the existing uclamp related flags.

The following subtle issue is fixed as well. When a uclamp constraint
value is set on a !user_defined uclamp_se it is currently first reset
and then set.
Fix this by AND'ing !user_defined with !SCHED_FLAG_UTIL_CLAMP which
stands for the 'sched class change' case.
The related condition 'if (uc_se->user_defined)' moved from
__setscheduler_uclamp() into uclamp_reset().

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yun Hsiang <hsiang023167@gmail.com>
Link: https://lkml.kernel.org/r/20201113113454.25868-1-dietmar.eggemann@arm.com
2024-12-16 14:46:42 -03:00
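
From userspace, the reset described above would be requested roughly as follows; this is a hypothetical sketch that assumes uapi headers new enough to carry the sched_util_* fields and the SCHED_FLAG_UTIL_CLAMP_* / SCHED_FLAG_KEEP_ALL flags (there is no glibc wrapper for sched_setattr, hence the raw syscall).

	#define _GNU_SOURCE
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/sched.h>		/* SCHED_FLAG_* */
	#include <linux/sched/types.h>		/* struct sched_attr */

	int main(void)
	{
		struct sched_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		/* -1 is the magic "stop controlling this constraint" value;
		 * the flag selects which constraint the request refers to. */
		attr.sched_util_min = (unsigned int)-1;
		attr.sched_flags = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_UTIL_CLAMP_MIN;

		return syscall(SYS_sched_setattr, 0 /* self */, &attr, 0);
	}
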
Qinglang Miao
5e192458aa sched/uclamp: Remove unnecessary mutex_init()
The uclamp_mutex lock is initialized statically via DEFINE_MUTEX(),
it is unnecessary to initialize it runtime via mutex_init().

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20200725085629.98292-1-miaoqinglang@huawei.com
Signed-off-by: RuRuTiaSaMa <1009087450@qq.com>
2024-12-16 14:46:42 -03:00
Hridaya Prajapati
c48b14d697 sched/cpupri: Checkout changes from redbull
Branch: android-msm-redbull-4.19-u-beta5.3

Change-Id: I0283b176f3308459473973ca7df4eee2db1ca644
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-12-16 14:46:39 -03:00
Qais Yousef
cce1c19561 FROMGIT: sched/rt: Re-instate old behavior in select_task_rq_rt()
When RT Capacity Aware support was added, the logic in select_task_rq_rt
was modified to force a search for a fitting CPU if the task currently
doesn't run on one.

But if the search failed, and the search was only triggered to fulfill
the fitness request, we could end up selecting a new CPU unnecessarily.

Fix this and re-instate the original behavior by ensuring we bail out
in that case.

This behavior change only affected asymmetric systems that are using
util_clamp to implement capacity awareness. Non-asymmetric systems were
not affected.

Bug: 120440300
LINK: https://lore.kernel.org/lkml/20200218041620.GD28029@codeaurora.org/
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-3-qais.yousef@arm.com
(cherry picked from commit b28bc1e002c23ff8a4999c4a2fb1d4d412bc6f5e
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I670ab7f95a3bd8b4790e1cafe89308ead524367e
2024-12-16 14:46:39 -03:00
Qais Yousef
8c6e539b2e FROMGIT: sched/rt: Remove unnecessary push for unfit tasks
In task_woken_rt() and switched_to_rto() we try to trigger push-pull if the
task is unfit.

But the logic is found lacking because if the task was the only one
running on the CPU, then rt_rq is not in overloaded state and won't
trigger a push.

The necessity of this logic was under a debate as well, a summary of
the discussion can be found in the following thread:

  https://lore.kernel.org/lkml/20200226160247.iqvdakiqbakk2llz@e107158-lin.cambridge.arm.com/

Remove the logic for now until a better approach is agreed upon.

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-6-qais.yousef@arm.com
(cherry picked from commit d94a9df49069ba8ff7c4aaeca1229e6471a01a15
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id120ada4a89972b3feb8d8b022babb98db1a157f
2024-12-16 14:46:39 -03:00