793506 Commits

Author SHA1 Message Date
Can Guo
3879476438 scsi: ufs-qcom: Turn off PHY only if link is not active
Do not assume that AH8 always puts the link into the hibern8 state before
turning off the PHY during suspend. Turn off the PHY only if the link is
not active.

Change-Id: Ia50fdbe95e29e825b40679ba4519b49e2806b67a
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2021-06-05 14:55:41 +05:30
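A hedged sketch of the suspend-path check described in the commit above, assuming the mainline ufshcd link-state helper and the generic PHY API (the actual code in this tree may differ):

    /* Do not rely on AH8 having already put the link into hibern8;
     * power the PHY down only when the link is no longer active.
     */
    if (!ufshcd_is_link_active(hba))
        phy_power_off(phy);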
laoyi
bb32c75f6c fs: proc: Update perms of process_reclaim node
Other userspace apps like AppCompaction would like to use this node,
so update its permissions.

Change-Id: Ied22bd6ad489bef4028cde943ac185d1354ab971
Signed-off-by: laoyi <laoyi@codeaurora.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2021-06-05 14:55:41 +05:30
John Dias
f1033fb035 fs: Improve eventpoll logging to stop indicting timerfd
timerfd doesn't create any wakelocks; eventpoll can, and is creating the
wakelocks we see called "[timerfd]".  eventpoll creates two kinds of
wakelocks: a single top-level lock associated with the eventpoll fd
itself, and one additional lock for each fd it is polling that needs such
a lock (e.g. those using EPOLLWAKEUP).  Current code names the per-fd
locks using the undecorated names of the fds' associated files (hence
"[timerfd]"), and is naming the top-level lock after the PID of the caller
and the name of the file behind the first fd for which a per-fd lock is
created.  To make things clearer, the top-level lock is now named using
the caller PID and an "epollfd" designation, while the per-fd locks are
also named with the caller's PID (to associate them with the top-level
lock) and their respective fds' file names.

Port of fix already applied to previous 2 generations.  Note that this
set of changes does not fully solve the problem of eventpoll/timerfd
wakelock attribution to the original process, since most activity is
relayed through system_server, but it does at least ensure that different
eventpoll wakelocks - and their stats - are properly disambiguated.

Test: Ran on device and observed new wakelock naming in
/d/wakeup_sources and (file naming in) lsof output.
Bug: 116363986
Change-Id: I34bada5ddab04cf3830762c745f46bfcd1549cb8
Signed-off-by: John Dias <joaodias@google.com>
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2021-06-05 14:55:41 +05:30
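A hedged sketch of the naming scheme described above; the format strings are illustrative, ep/epi are eventpoll's internal structures, and the 4.14-era wakeup_source_register() takes just a name:

    char name[64];

    /* top-level wakeup source: caller PID plus an "epollfd" designation */
    snprintf(name, sizeof(name), "epollfd:%d", task_pid_nr(current));
    ep->ws = wakeup_source_register(name);

    /* per-fd wakeup source: caller PID plus the polled file's name */
    snprintf(name, sizeof(name), "epollitem:%d.%s", task_pid_nr(current),
             epi->ffd.file->f_path.dentry->d_name.name);
    epi->ws = wakeup_source_register(name);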
Sultan Alsawaf
94598231de bpf: Eliminate CONFIG_MODULES limitation from JIT for arm64
CONFIG_MODULES is only needed for its memory allocator, which is trivial
to add back into arm64 when modules aren't enabled. Do so in order to
take advantage of JIT compilation when CONFIG_MODULES=n.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2021-06-05 14:55:40 +05:30
Adithya R
1cd596b249 Merge tag 'LA.UM.9.1.r1-10200-SMxxx0.0' of https://source.codeaurora.org/quic/la/platform/vendor/opensource/audio-kernel into staging/LA.UM.9.x
"LA.UM.9.1.r1-10200-SMxxx0.0"
2021-06-05 14:54:16 +05:30
Adithya R
e3f73a5a8d Merge tag 'LA.UM.9.1.r1-10200-SMxxx0.0' of https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/wlan/qca-wifi-host-cmn into staging/LA.UM.9.x
"LA.UM.9.1.r1-10200-SMxxx0.0"
2021-06-05 14:41:40 +05:30
Adithya R
bbd4ca491c Merge tag 'LA.UM.9.1.r1-10200-SMxxx0.0' of https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/wlan/qcacld-3.0 into staging/LA.UM.9.x
"LA.UM.9.1.r1-10200-SMxxx0.0"
2021-06-05 14:40:33 +05:30
Adithya R
4ad4df234e Merge tag 'LA.UM.9.1.r1-10200-SMxxx0.0' of https://source.codeaurora.org/quic/la/kernel/msm-4.14 into staging/LA.UM.9.x
"LA.UM.9.1.r1-10200-SMxxx0.0"
2021-06-05 14:39:57 +05:30
Adithya R
9ea5ae0229 drm/msm: dsi_display: Prevent GPU boosting with battery saver
refer 3db0705104385b945dfa6f475c60d00b38b082db
2021-06-04 13:34:58 +05:30
Adithya R
fa86a4a095 Revert "ARM64/configs: surya: Disable RELR relocations"
This reverts commit a90526691613bc755d985c3fcc32081ff496f6a2.
2021-06-04 13:01:58 +05:30
FlyFrog
3c2d7070a3 lib: int_sqrt: Improve integer sqrt to be 3x faster.
Results for 10,000,000 calls:
Old:
sqrt(12345689) = 3513
real	0m0.768s
user	0m0.760s
sys	0m0.004s

New:
sqrt(12345689) = 3513
real	0m0.222s
user	0m0.224s
sys	0m0.000s

Signed-off-by: Vaisakh Murali <mvaisakh@statixos.com>
2021-06-01 17:06:34 +05:30
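For context, a sketch of the usual fast bitwise integer square root (essentially the mainline lib/int_sqrt.c shape; the exact implementation picked up here may differ):

    unsigned long int_sqrt(unsigned long x)
    {
        unsigned long b, m, y = 0;

        if (x <= 1)
            return x;

        /* start at the highest even bit position of x */
        m = 1UL << (__fls(x) & ~1UL);
        while (m != 0) {
            b = y + m;
            y >>= 1;

            if (x >= b) {
                x -= b;
                y += m;
            }
            m >>= 2;
        }

        return y;
    }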
Mohammed Nayeem Ur Rahman
b5d4c7f447 char: adsprpc: Set QoS only to silver cluster
The Glink IRQ is mostly taken by the silver cluster. Restricting the RPC
driver's QoS vote to the silver cluster keeps it from blocking power
collapse of the gold cluster. This saves significant power.

Change-Id: Ic24bddceb7ae37d1182d2fca683c622b4ab71a55
Acked-by: Tadakamalla Krishnaiah <ktadakam@qti.qualcomm.com>
Signed-off-by: Mohammed Nayeem Ur Rahman <mohara@codeaurora.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2021-06-01 17:06:32 +05:30
Vaisakh Murali
7a6332d901 msm: ipa: Only include emulation init with CONFIG_IPA_EMULATION
* Native code doesn't seem to have any use for this.

Change-Id: I59b33af3a67b5b5a7f4b42dbffa6f41f21aae567
Signed-off-by: Vaisakh Murali <mvaisakh@statixos.com>
2021-06-01 17:06:32 +05:30
Linux Build Service Account
d2fd228796 Merge 516a0a6251f23be4dc7136a996b762d586423079 on remote branch
Change-Id: Ie0361b7309d4b15904757569ecd7b461f12d1825
2021-05-27 10:46:33 -07:00
Linux Build Service Account
857bd2e080 Merge f8526eed1d13bee359929006ea9f3f7f0fe26438 on remote branch
Change-Id: I49addd030b5fdc3586d37dda20b25956097b7d0f
2021-05-27 10:44:41 -07:00
Linux Build Service Account
c0206fbd70 Merge 407d2eafc7aca94bb28098710cbc48555c3dfb55 on remote branch
Change-Id: I4f0901a343384e33bcda53a96231e7522ed18e1d
2021-05-27 10:44:34 -07:00
Linux Build Service Account
9838220d29 Merge 42140f1656cd1a6b073540039d116f7c2f6f78f5 on remote branch
Change-Id: I69f41511ff63fc6651655f1bd832ab5c3cd0a665
2021-05-27 10:40:20 -07:00
Adithya R
8207e6c43c build.sh: Switch to transfer.sh for zip upload 2021-05-25 19:26:37 +05:30
Adithya R
18b2b5faa4 ARM64/configs: surya: Enable fuse short circuit 2021-05-25 19:26:37 +05:30
Adhitya Mohan
8a4166ccd4 fs/fuse: shortcircuit: Make it compile 2021-05-25 19:26:37 +05:30
LibXZR
0f1df8b6c0 fs/fuse: shortcircuit: Disable logging 2021-05-25 19:26:37 +05:30
LibXZR
a650eef6fc fs: fuse: Implement fuse short circuit
* This significantly improves i/o performance under /sdcard
* From OnePlus 8T Oxygen OS 11.0.8.11.KB05AA and OnePlus 8 Oxygen OS 11.0.5.5.IN21AA and OnePlus 8 Pro Oxygen OS 11.0.5.5.IN11AA

RealJohnGalt: make proper Kconfig, add back dependencies from OnePlus
source onto our CAF tree.

Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
2021-05-25 19:26:37 +05:30
Theodore Ts'o
b8328115cd ext4: Improve smp scalability for inode generation
->s_next_generation is protected by s_next_gen_lock but its usage
pattern is very primitive.  We don't actually need sequentially
increasing new generation numbers, so let's use prandom_u32() instead.

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2021-05-25 19:26:37 +05:30
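A sketch of the change as it looks in mainline fs/ext4/ialloc.c (spinlocked counter replaced by a random value, since generations only need to be unlikely to repeat):

    /* before: sequential generation, serialized on s_next_gen_lock */
    spin_lock(&sbi->s_next_gen_lock);
    inode->i_generation = sbi->s_next_generation++;
    spin_unlock(&sbi->s_next_gen_lock);

    /* after: no lock needed, generations need not be sequential */
    inode->i_generation = prandom_u32();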
Vincent Guittot
9f4d369621 sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
adds an optimization in the last for_each_sched_entity loop.

Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
Reported-by: Tao Zhou <zohooouoto@zoho.com.cn>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20200513135528.4742-1-vincent.guittot@linaro.org
2021-05-25 19:26:37 +05:30
Muchun Song
1bbdcf23b4 sched/fair: Mark sched_init_granularity __init
Function sched_init_granularity() is only called from __init
functions, so mark it __init as well.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200406074750.56533-1-songmuchun@bytedance.com
2021-05-25 19:26:37 +05:30
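The change itself is just the section annotation; roughly (body per mainline fair.c):

    /* called only from __init code, so the function can be discarded after boot */
    void __init sched_init_granularity(void)
    {
        update_sysctl();
    }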
Huaixin Chang
0951f4d996 sched/fair: Refill bandwidth before scaling
In order to prevent possible hardlockup of sched_cfs_period_timer()
loop, loop count is introduced to denote whether to scale quota and
period or not. However, scale is done between forwarding period timer
and refilling cfs bandwidth runtime, which means that period timer is
forwarded with old "period" while runtime is refilled with scaled
"quota".

Move do_sched_cfs_period_timer() before scaling to solve this.

Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200420024421.22442-3-changhuaixin@linux.alibaba.com
2021-05-25 19:26:37 +05:30
Peng Wang
aa3d11385a sched/fair: Simplify the code of should_we_balance()
We only consider group_balance_cpu() when there is no idle
CPU. So, just do the comparison before returning in these two cases.

Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/245c792f0e580b3ca342ad61257f4c066ee0f84f.1586594833.git.rocking@linux.alibaba.com
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
2021-05-25 19:26:37 +05:30
Paul Turner
24619225af sched/fair: Eliminate bandwidth race between throttling and distribution
There is a race window in which an entity begins throttling before quota
is added to the pool, but does not finish throttling until after we have
finished with distribute_cfs_runtime(). This entity is not observed by
distribute_cfs_runtime() because it was not on the throttled list at the
time that distribution was running. This race manifests as rare
period-length stalls for such entities.

Rather than heavy-weight the synchronization with the progress of
distribution, we can fix this by aborting throttling if bandwidth has
become available. Otherwise, we immediately add the entity to the
throttled list so that it can be observed by a subsequent distribution.

Additionally, we can remove the case of adding the throttled entity to
the head of the throttled list, and simply always add to the tail.
Thanks to 26a8b12747c97, distribute_cfs_runtime() no longer holds onto
its own pool of runtime. This means that if we do hit the !assign and
distribute_running case, we know that distribution is about to end.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200410225208.109717-2-joshdon@google.com
2021-05-25 19:26:36 +05:30
Peter Zijlstra
77c9280087 sched,rt: Use cpumask_any*_distribute()
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(). By injecting this little bit of randomness into
CPU selection, we reduce the chance that two competing balance operations
working off the same lowest_mask pick the same CPU.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org
2021-05-25 19:26:36 +05:30
Paul Turner
3155932a9e sched/core: Distribute tasks within affinity masks
Currently, when updating the affinity of tasks via either cpusets.cpus,
or, sched_setaffinity(); tasks not currently running within the newly
specified mask will be arbitrarily assigned to the first CPU within the
mask.

This (particularly in the case that we are restricting masks) can
result in many tasks being assigned to the first CPUs of their new
masks.

This:
 1) Can induce scheduling delays while the load-balancer has a chance to
    spread them between their new CPUs.
 2) Can antagonize a poor load-balancer behavior where it has a
    difficult time recognizing that a cross-socket imbalance has been
    forced by an affinity mask.

This change adds a new cpumask interface to allow iterated calls to
distribute within the intersection of the provided masks.

The cases that this mainly affects are:
 - modifying cpuset.cpus
 - when tasks join a cpuset
 - when modifying a task's affinity via sched_setaffinity(2)

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Tested-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
2021-05-25 19:26:36 +05:30
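A sketch of the distributing helper these two scheduler patches rely on, close to the mainline lib/cpumask.c version (a per-CPU rotor so successive callers don't all land on the first bit of the mask):

    static DEFINE_PER_CPU(int, distribute_cpu_mask_prev);

    int cpumask_any_and_distribute(const struct cpumask *src1p,
                                   const struct cpumask *src2p)
    {
        int next, prev;

        /* NOTE: our first selection will skip 0. */
        prev = __this_cpu_read(distribute_cpu_mask_prev);

        next = cpumask_next_and(prev, src1p, src2p);
        if (next >= nr_cpu_ids)
            next = cpumask_first_and(src1p, src2p);

        if (next < nr_cpu_ids)
            __this_cpu_write(distribute_cpu_mask_prev, next);

        return next;
    }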
Peter Zijlstra
171dc61962 sched: core: Fix balance_callback()
The intent of balance_callback() has always been to delay executing
balancing operations until the end of the current rq->lock section.
This is because balance operations must often drop rq->lock, and that
isn't safe in general.

However, as noted by Scott, there were a few holes in that scheme;
balance_callback() was called after rq->lock was dropped, which means
another CPU can interleave and touch the callback list.

Rework code to call the balance callbacks before dropping rq->lock
where possible, and otherwise splice the balance list onto a local
stack.

This guarantees that the balance list must be empty when we take
rq->lock. IOW, we'll only ever run our own balance callbacks.

Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.203901269@infradead.org
2021-05-25 19:26:36 +05:30
Vincent Guittot
957309028f sched/fair: Optimize update_blocked_averages()
commit 31bc6aeaab1d1de8959b67edbed5c7a4b3cdbe7c upstream.

Removing a cfs_rq from rq->leaf_cfs_rq_list can break the parent/child
ordering of the list when it will be added back. In order to remove an
empty and fully decayed cfs_rq, we must remove its children too, so they
will be added back in the right order next time.

With a normal decay of PELT, a parent will be empty and fully decayed
if all children are empty and fully decayed too. In such a case, we just
have to ensure that the whole branch will be added when a new task is
enqueued. This is the default behavior since:

  commit f6783319737f ("sched/fair: Fix insertion in rq->leaf_cfs_rq_list")

In case of throttling, the PELT of throttled cfs_rq will not be updated
whereas the parent will. This breaks the assumption made above unless we
remove the children of a cfs_rq that is throttled. Then, they will be
added back when unthrottled and a sched_entity will be enqueued.

As throttled cfs_rq are now removed from the list, we can remove the
associated test in update_blocked_averages().

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sargun@sargun.me
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Link: https://lkml.kernel.org/r/1549469662-13614-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Vishnu Rangayyan <vishnu.rangayyan@apple.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-25 19:26:36 +05:30
Vincent Guittot
1c5a28a2b9 sched/fair: Move the rq_of() helper function
Move rq_of() helper function so it can be used in pelt.c

[ mingo: Improve readability while at it. ]

Bug: 120440300
Change-Id: I2133979476631d68baaffcaa308f4cdab94f22b1
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: patrick.bellasi@arm.com
Cc: pjt@google.com
Cc: pkondeti@codeaurora.org
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Link: https://lkml.kernel.org/r/1548257214-13745-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 62478d9911fab9694c195f0ca8e4701de09be98e)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: DennySPB <dennyspb@gmail.com>
2021-05-25 19:26:36 +05:30
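The helper in question, per mainline kernel/sched/sched.h (both the group-scheduling and flat variants):

    #ifdef CONFIG_FAIR_GROUP_SCHED
    /* A group cfs_rq carries an explicit back-pointer to its runqueue. */
    static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
    {
        return cfs_rq->rq;
    }
    #else
    /* Without group scheduling, the cfs_rq is embedded directly in struct rq. */
    static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
    {
        return container_of(cfs_rq, struct rq, cfs);
    }
    #endif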
Srikar Dronamraju
c40b2cdda2 BACKPORT: sched/fair: Optimize select_idle_core()
Currently we loop through all threads of a core to evaluate if the core is
idle or not. This is unnecessary. If a thread of a core is not idle, skip
evaluating other threads of a core. Also while clearing the cpumask, bits
of all CPUs of a core can be cleared in one-shot.

Collecting ticks on a Power 9 SMT 8 system around select_idle_core
while running schbench shows us

(units are in ticks, hence lower is better)
Without patch
    N        Min     Max     Median         Avg      Stddev
x 130        151    1083        284   322.72308   144.41494

With patch
    N        Min     Max     Median         Avg      Stddev   Improvement
x 164         88     610        201   225.79268   106.78943        30.03%

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20191206172422.6578-1-srikar@linux.vnet.ibm.com
Signed-off-by: DennySPB <dennyspb@gmail.com>
2021-05-25 19:26:36 +05:30
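A hedged sketch of the optimized scan described above (helper names follow mainline fair.c; this tree's backport may use idle_cpu() instead of available_idle_cpu()):

    for_each_cpu_wrap(core, cpus, target) {
        bool idle = true;
        int cpu;

        for_each_cpu(cpu, cpu_smt_mask(core)) {
            if (!available_idle_cpu(cpu)) {
                idle = false;
                break;          /* no need to check the remaining siblings */
            }
        }
        /* clear every CPU of this core from the candidate mask in one shot */
        cpumask_andnot(cpus, cpus, cpu_smt_mask(core));

        if (idle)
            return core;
    }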
bsegall@google.com
282c967bd3 sched/fair: Don't push cfs_bandwith slack timers forward
When a cfs_rq sleeps and returns its quota, we delay for 5ms before
waking any throttled cfs_rqs to coalesce with other cfs_rqs going to
sleep, as this has to be done outside of the rq lock we hold.

The current code waits for 5ms without any sleeps, instead of waiting
for 5ms from the first sleep, which can delay the unthrottle more than
we want. Switch this around so that we can't push this forward forever.

This requires an extra flag rather than using hrtimer_active, since we
need to start a new timer if the current one is in the process of
finishing.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/xm26a7euy6iq.fsf_-_@bsegall-linux.svl.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: DennySPB <dennyspb@gmail.com>
2021-05-25 19:26:36 +05:30
Vincent Guittot
cc0fe39ac0 sched/fair: Fix runnable_avg for throttled cfs
When a cfs_rq is throttled, its group entity is dequeued and its running
tasks are removed. We must update runnable_avg with the old h_nr_running
and update group_se->runnable_weight with the new h_nr_running at each
level of the hierarchy.

Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 9f68395333ad ("sched/pelt: Add a new runnable average signal")
Link: https://lkml.kernel.org/r/20200227154115.8332-1-vincent.guittot@linaro.org
2021-05-25 19:26:36 +05:30
Vincent Guittot
fb07d11ba5 sched/fair: Don't set LBF_ALL_PINNED unnecessarily
Setting LBF_ALL_PINNED during active load balance is only valid when there
is only 1 running task on the rq otherwise this ends up increasing the
balance interval whereas other tasks could migrate after the next interval
once they become cache-cold as an example.

The LBF_ALL_PINNED flag is now always set by default. It is then cleared
when we find one task that can be pulled when calling detach_tasks() or
during active migration.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210107103325.30851-3-vincent.guittot@linaro.org
2021-05-25 19:26:36 +05:30
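A hedged sketch of the new flag handling (flow per mainline load_balance()/detach_tasks(); the details in this tree may differ):

    /* load_balance(): assume everything is pinned until proven otherwise */
    env.flags |= LBF_ALL_PINNED;

    /* detach_tasks() / active migration: the first pullable task clears it */
    env->flags &= ~LBF_ALL_PINNED;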
Anna-Maria Behnsen
d068c8d169 sched: Prevent raising SCHED_SOFTIRQ when CPU is !active
SCHED_SOFTIRQ is raised to trigger periodic load balancing. When CPU is not
active, CPU should not participate in load balancing.

The scheduler uses nohz.idle_cpus_mask to keep track of the CPUs which can
do idle load balancing. When bringing a CPU up the CPU is added to the mask
when it reaches the active state, but on teardown the CPU stays in the mask
until it goes offline and invokes sched_cpu_dying().

When SCHED_SOFTIRQ is raised on a !active CPU, there might be a pending
softirq when stopping the tick which triggers a warning in NOHZ code. The
SCHED_SOFTIRQ can also be raised by the scheduler tick which has the same
issue.

Therefore remove the CPU from nohz.idle_cpus_mask when it is marked
inactive and also prevent the scheduler_tick() from raising SCHED_SOFTIRQ
after this point.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201215104400.9435-1-anna-maria@linutronix.de
2021-05-25 19:26:36 +05:30
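A hedged sketch of the resulting guard in trigger_load_balance() (shape per the mainline change; the surrounding code in this tree may differ):

    /* Don't need to rebalance while attached to NULL domain or
     * while the runqueue's CPU is not active.
     */
    if (unlikely(on_null_domain(rq) || !cpu_active(cpu_of(rq))))
        return;

    if (time_after_eq(jiffies, rq->next_balance))
        raise_softirq(SCHED_SOFTIRQ);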
Joel Fernandes
42b731c077 sched/fair: Remove impossible condition from find_idlest_group_cpu()
find_idlest_group_cpu() goes through CPUs of a group previously selected by
find_idlest_group(). find_idlest_group() returns NULL if the local group is the
selected one and doesn't execute find_idlest_group_cpu() if the group to which
'cpu' belongs is chosen. So we're always guaranteed to call
find_idlest_group_cpu() with a group to which 'cpu' is non-local.

This makes one of the conditions in find_idlest_group_cpu() an impossible one,
which we can get rid of.

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Android Kernel <kernel-team@android.com>
Cc: Atish Patra <atish.patra@oracle.com>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: EAS Dev <eas-dev@lists.linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Ramussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rohit Jain <rohit.k.jain@oracle.com>
Cc: Saravana Kannan <skannan@quicinc.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20171215153944.220146-3-joelaf@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-25 19:26:36 +05:30
Wen Yang
0a6a693e3f sched/rt: Make update_curr_rt() more accurate
rq->clock_task may be updated between the two calls of
rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only
once makes it more accurate and efficient, taking update_curr() as
reference.

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1517800721-42092-1-git-send-email-wen.yang99@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-25 19:26:36 +05:30
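A sketch of the single-read pattern described above (per mainline update_curr_rt(); statistics updates elided):

    u64 now = rq_clock_task(rq);
    u64 delta_exec;

    /* read the clock once and reuse the value */
    delta_exec = now - curr->se.exec_start;
    if (unlikely((s64)delta_exec <= 0))
        return;

    /* ... account delta_exec ... */

    curr->se.exec_start = now;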
Peter Zijlstra
42a8cffcad sched/fair: Do not migrate due to a sync wakeup on exit
When a task exits, it notifies the parent that it has exited. This is a
sync wakeup and the exiting task may pull the parent towards the wakers
CPU. For simple workloads like using a shell, it was observed that the
shell is pulled across nodes by exiting processes. This is daft as the
parent may be long-lived and properly placed. This patch special cases a
sync wakeup on exit to avoid pulling tasks across nodes. Testing on a range
of workloads and machines showed very little differences in performance
although there was a small 3% boost on some machines running a shellscript
intensive workload (git regression test suite).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-25 19:26:36 +05:30
Paul Walmsley
b8206629a6 sched: Reinitialize rq->next_balance when a CPU is hot-added
Reinitialize rq->next_balance when a CPU is hot-added.  Otherwise,
scheduler domain rebalancing may be skipped if rq->next_balance was
set to a future time when the CPU was last active, and the
newly-re-added CPU is in idle_balance().  As a result, the
newly-re-added CPU will remain idle with no tasks scheduled until the
softlockup watchdog runs - potentially 4 seconds later.  This can
waste energy and reduce performance.

This behavior can be observed in some SoC kernels, which use CPU
hotplug to dynamically remove and add CPUs in response to load.  In
one case that triggered this behavior,

0. the system started with all cores enabled, running multi-threaded
   CPU-bound code;

1. the system entered some single-threaded code;

2. a CPU went idle and was hot-removed;

3. the system started executing a multi-threaded CPU-bound task;

4. the CPU from event 2 was re-added, to respond to the load.

The time interval between events 2 and 4 was approximately 300
milliseconds.

Of course, ideally CPU hotplug would not be used in this manner,
but this patch does appear to fix a real bug.

Nvidia folks: this patch is submitted as at least a partial fix for
bug 1243368 ("[sched] Load-balancing not happening correctly after
cores brought online")

Change-Id: Iabac21e110402bb581b7db40c42babc951d378d0
Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
Cc: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-on: http://git-master/r/206918
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Amit Kamath <akamath@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-by: Diwakar Tundlam <dtundlam@nvidia.com>
2021-05-25 19:26:36 +05:30
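A hedged sketch of the fix: when the CPU comes back online, throw away the stale balance timestamp (the exact hook point is illustrative):

    /* in the CPU-online path for the hot-added CPU's runqueue */
    rq->next_balance = jiffies;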
John Dias
cd063abd00 sched/fair: vruntime should normalize when switching from fair
When rt_mutex_setprio changes a task's scheduling class to RT,
we're seeing cases where the task's vruntime is not updated
correctly upon return to the fair class.
Specifically, the following is being observed:
- task is deactivated while still in the fair class
- task is boosted to RT via rt_mutex_setprio, which changes
  the task to RT and calls check_class_changed.
- check_class_changed leads to detach_task_cfs_rq, at which point
  the vruntime_normalized check sees that the task's state is TASK_WAKING,
  which results in skipping the subtraction of the rq's min_vruntime
  from the task's vruntime
- later, when the prio is deboosted and the task is moved back
  to the fair class, the fair rq's min_vruntime is added to
  the task's vruntime, even though it wasn't subtracted earlier.
The immediate result is inflation of the task's vruntime, giving
it lower priority (starving it if there's enough available work).
The longer-term effect is inflation of all vruntimes because the
task's vruntime becomes the rq's min_vruntime when the higher
priority tasks go idle. That leads to a vicious cycle, where
the vruntime inflation repeatedly doubles.

The change here is to detect when vruntime_normalized is being
called when the task is waking but is waking in another class,
and to conclude that this is a case where vruntime has not
been normalized.

Bug: 80502612
Change-Id: If0bb02eb16939ca5e91ef282b7f9119ff68622c4
Signed-off-by: John Dias <joaodias@google.com>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
2021-05-25 19:26:36 +05:30
Linus Torvalds
cbf26b2f83 sched/fair: Fix infinite loop in update_blocked_averages()
commit c40f7d74c741a907cfaeb73a7697081881c497d0 upstream.

Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
scheduler under high loads, starting at around the v4.18 time frame,
and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
manipulation.

Do a (manual) revert of:

  a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")

It turns out that the list_del_leaf_cfs_rq() introduced by this commit
is a surprising property that was not considered in followup commits
such as:

  9c2791f936ef ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")

As Vincent Guittot explains:

 "I think that there is a bigger problem with commit a9e7f6544b9c and
  cfs_rq throttling:

  Let take the example of the following topology TG2 --> TG1 --> root:

   1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
      cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in
      one path because it has never been used and can't be throttled so
      tmp_alone_branch will point to leaf_cfs_rq_list at the end.

   2) Then TG1 is throttled

   3) and we add TG3 as a new child of TG1.

   4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1
      cfs_rq and tmp_alone_branch will stay  on rq->leaf_cfs_rq_list.

  With commit a9e7f6544b9c, we can del a cfs_rq from rq->leaf_cfs_rq_list.
  So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1
  cfs_rq is removed from the list.
  Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
  but tmp_alone_branch still points to TG3 cfs_rq because its throttled
  parent can't be enqueued when the lock is released.
  tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.

  So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
  points on another TG cfs_rq, the next TG cfs_rq that will be added,
  will be linked outside rq->leaf_cfs_rq_list - which is bad.

  In addition, we can break the ordering of the cfs_rq in
  rq->leaf_cfs_rq_list but this ordering is used to update and
  propagate the update from leaf down to root."

Instead of trying to work through all these cases and trying to reproduce
the very high loads that produced the lockup to begin with, simplify
the code temporarily by reverting a9e7f6544b9c - which change was clearly
not thought through completely.

This (hopefully) gives us a kernel that doesn't lock up so people
can continue to enjoy their holidays without worrying about regressions. ;-)

[ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]

Analyzed-by: Xie XiuQi <xiexiuqi@huawei.com>
Analyzed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Reported-by: Sargun Dhillon <sargun@sargun.me>
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Tested-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <stable@vger.kernel.org> # v4.13+
Cc: Bin Li <huawei.libin@huawei.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-25 19:26:35 +05:30
Cheng Jian
4f395300de sched/fair: Optimize select_idle_cpu
select_idle_cpu() will scan the LLC domain for idle CPUs;
it's always expensive, so the next commit:

	1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")

introduces a way to limit how many CPUs we scan.

But it consumes some of the 'nr' attempts on CPUs that are not allowed
for the task and thus wastes them. The function can then
end up returning nr_cpumask_bits without finding a CPU on
which our task is allowed to run.

The cpumask may be too big to put on the stack, so, similar to
select_idle_core(), use the per-CPU 'select_idle_mask' (via per_cpu_ptr)
to prevent stack overflow.

Fixes: 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com
Signed-off-by: DennySPB <dennyspb@gmail.com>
2021-05-25 19:26:35 +05:30
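A hedged sketch of the fixed scan (names follow the mainline patch; this 4.14-based tree would use &p->cpus_allowed rather than p->cpus_ptr):

    struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
    int cpu;

    /* only spend the 'nr' attempts on CPUs the task may actually run on */
    cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);

    for_each_cpu_wrap(cpu, cpus, target) {
        if (!--nr)
            return -1;
        if (idle_cpu(cpu))
            break;
    }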
Tejun Heo
f0773485ab sched/fair: Fix O(nr_cgroups) in load balance path
Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_blocked_averages() O(# total cgroups) which isn't
scalable at all.

This shows up as a small CPU consumption and scheduling latency
increase in the load balancing path in systems with CPU controller
enabled across most cgroups.  In an edge case where temporary cgroups
were leaking, this caused the kernel to consume good several tens of
percents of CPU cycles running update_blocked_averages(), each run
taking multiple millisecs.

This patch fixes the issue by taking empty and fully decayed cfs_rqs
off the rq->leaf_cfs_rq_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
[ Added cfs_rq_is_decayed() ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170426004350.GB3222@wtj.duckdns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-25 19:26:35 +05:30
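The helper added by this patch, roughly as in mainline fair.c (the exact set of signals checked varies with the kernel version):

    static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
    {
        if (cfs_rq->load.weight)
            return false;

        if (cfs_rq->avg.load_sum)
            return false;

        if (cfs_rq->avg.util_sum)
            return false;

        return true;
    }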
Vincent Guittot
aa2e2889cc sched/fair: Fix load_balance redo for !imbalance
It can happen that load_balance() finds a busiest group and then a
busiest rq but the calculated imbalance is in fact 0.

In such a situation, detach_tasks() returns immediately and leaves the
LBF_ALL_PINNED flag set. The busiest CPU is then wrongly assumed to
have pinned tasks and is removed from the load balance mask. Then, we
redo a load balance without the busiest CPU. This creates a wrong load
balance situation and generates wrong task migrations.

If the calculated imbalance is 0, it's useless to try to find a
busiest rq as no task will be migrated and we can return immediately.

This situation can happen on a heterogeneous or SMP system when
RT tasks are decreasing the capacity of some CPUs.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: jhugo@codeaurora.org
Link: http://lkml.kernel.org/r/1536306664-29827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: atndko <z1281552865@gmail.com>
Signed-off-by: Zlatan Radovanovic <zlatan.radovanovic@fet.ba>
2021-05-25 19:26:35 +05:30
Sai Gurrappadi
8b2fae44ed tick: Don't clear idle and iowait sums on CPU down
NOHZ related per-cpu data is cleared on CPU down. This was introduced by
4b0c0f294 "tick: Cleanup NOHZ per cpu data on cpu down" which breaks
/proc/stat because the idle and iowait sums are now non-monotonic
across a CPU down/up cycle.

Fix this by not clearing the idle_sleeptime and iowait_sleeptime fields
on CPU down.

Change-Id: Ifb755e15b601c74dad81655ebb25e037dac2afd0
Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Patch-mainline: linux-kernel @ 30 Apr 2014 13:18:34
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2021-05-25 19:26:35 +05:30
Mel Gorman
a3f3189165 mm: slub: Default slub_max_order to 0
To avoid locking and per-cpu overhead, SLUB optimistically uses
high-order allocations up to order-3 by default and falls back to
lower allocations if they fail. While care is taken that the caller
and kswapd take no unusual steps in response to this, there are
further consequences like shrinkers who have to free more objects to
release any memory. There is anecdotal evidence that significant time
is being spent looping in shrinkers with insufficient progress being
made (https://lkml.org/lkml/2011/4/28/361) and keeping kswapd awake.

SLUB is now the default allocator and some bug reports have been
pinned down to SLUB using high orders during operations like
copying large amounts of data. SLUBs use of high-orders benefits
applications that are sized to memory appropriately but this does not
necessarily apply to large file servers or desktops.  This patch
causes SLUB to use order-0 pages like SLAB does by default.
There is further evidence that this keeps kswapd's usage lower
(https://lkml.org/lkml/2011/5/10/383).

Signed-off-by: Mel Gorman <mgorman@suse.de>
2021-05-25 19:26:35 +05:30
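The change boils down to the default value in mm/slub.c, roughly (the old default is PAGE_ALLOC_COSTLY_ORDER, i.e. order-3):

    /* was: static int slub_max_order = PAGE_ALLOC_COSTLY_ORDER; */
    static int slub_max_order = 0;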
Kefeng Wang
682d77d038 arm64: Add support ARCH_SUPPORTS_INT128
GCC supports __SIZEOF_INT128__ and __int128 on arm64, so enable
ARCH_SUPPORTS_INT128 to make mul_u64_u32_shr() a bit more
efficient in the scheduler.

Change-Id: I56abffc9acb4c519acdd1be3dc2aa1a7a66c385d
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: mydongistiny <jaysonedson@gmail.com>
Signed-off-by: DennySPB <dennyspb@gmail.com>
2021-05-25 19:26:35 +05:30