802645 Commits

Author SHA1 Message Date
Al Viro
5e4c4c5f6b
BACKPORT: FROMGIT: [PATCH] __inode_security_revalidate() never gets NULL opt_dentry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:58 +07:00
Al Viro
95f5c735e3
BACKPORT: FROMGIT: [PATCH] fix breakage caused by d_find_alias() semantics change
"VFS: don't keep disconnected dentries on d_anon" had a non-trivial
side-effect - d_unhashed() now returns true for those dentries,
making d_find_alias() skip them altogether.  For most of its callers
that's fine - we really want a connected alias there.  However,
there is a codepath where we relied upon picking such aliases
if nothing else could be found - selinux delayed initialization
of contexts for inodes on already mounted filesystems used to
rely upon that.

Cc: stable@kernel.org # f1ee616214cb "VFS: don't keep disconnected dentries on d_anon"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:57 +07:00
Richard Guy Briggs
0d9be5f3fc
BACKPORT: FROMGIT: [PATCH] audit: normalize MAC_POLICY_LOAD record
The audit MAC_POLICY_LOAD record had redundant dangling keywords and was
missing information about which LSM was responsible and its completion
status.  While this record is only issued on success, the parser expects
the res= field to be present.

Old record:
type=MAC_POLICY_LOAD msg=audit(1479299795.404:43): policy loaded auid=0 ses=1

Delete the redundant dangling keywords, add the lsm= field and the res=
field.

New record:
type=MAC_POLICY_LOAD msg=audit(1523293846.204:894): auid=0 ses=1 lsm=selinux res=1

See: https://github.com/linux-audit/audit-kernel/issues/47

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:57 +07:00
Richard Guy Briggs
c178d7130e
BACKPORT: FROMGIT: [PATCH] audit: normalize MAC_STATUS record
There were two formats of the audit MAC_STATUS record, one of which was more
standard than the other.  One listed enforcing status changes and the
other listed enabled status changes with a non-standard label.  In
addition, the record was missing information about which LSM was
responsible and the operation's completion status.  While this record is
only issued on success, the parser expects the res= field to be present.

old enforcing/permissive:
type=MAC_STATUS msg=audit(1523312831.378:24514): enforcing=0 old_enforcing=1 auid=0 ses=1
old enable/disable:
type=MAC_STATUS msg=audit(1523312831.378:24514): selinux=0 auid=0 ses=1

List both sets of status and old values and add the lsm= field and the
res= field.

Here is the new format:
type=MAC_STATUS msg=audit(1523293828.657:891): enforcing=0 old_enforcing=1 auid=0 ses=1 enabled=1 old-enabled=1 lsm=selinux res=1
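
As a rough illustration of the new field order, the record body could be assembled as below in user-space C. This is only a sketch: struct mac_status and format_mac_status() are hypothetical names, not the kernel's audit API. Only the field names and their ordering are taken from the record above.

```c
#include <stdio.h>

/* Hypothetical holder for the values shown in the new record. */
struct mac_status {
    int enforcing, old_enforcing;
    unsigned int auid, ses;
    int enabled, old_enabled;
};

/*
 * Write the body of a new-format MAC_STATUS record into buf.
 * Returns the snprintf() result, i.e. the length that would be written.
 * lsm= and res= are fixed here because the record is only issued by
 * SELinux and only on success.
 */
static int format_mac_status(char *buf, size_t len, const struct mac_status *s)
{
    return snprintf(buf, len,
                    "enforcing=%d old_enforcing=%d auid=%u ses=%u"
                    " enabled=%d old-enabled=%d lsm=selinux res=1",
                    s->enforcing, s->old_enforcing, s->auid, s->ses,
                    s->enabled, s->old_enabled);
}
```

Both the enforcing and the enabled pair are always emitted, so a parser sees a single layout regardless of which status changed.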

This record already accompanied a SYSCALL record.

See: https://github.com/linux-audit/audit-kernel/issues/46

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: 80-char fixes, merge fuzz, use new SELinux state functions]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:56 +07:00
Stephen Smalley
3f24797324
BACKPORT: FROMGIT: [PATCH] selinux: fix missing dput() before selinuxfs unmount
Commit 0619f0f5e36f ("selinux: wrap selinuxfs state") triggers a BUG
when SELinux is runtime-disabled (i.e. systemd or equivalent disables
SELinux before initial policy load via /sys/fs/selinux/disable based on
/etc/selinux/config SELINUX=disabled).

This does not manifest if SELinux is disabled via kernel command line
argument or if SELinux is enabled (permissive or enforcing).

Before:
  SELinux:  Disabled at runtime.
  BUG: Dentry 000000006d77e5c7{i=17,n=null}  still in use (1) [unmount of selinuxfs selinuxfs]

After:
  SELinux:  Disabled at runtime.

Fixes: 0619f0f5e36f ("selinux: wrap selinuxfs state")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:56 +07:00
Kirill Tkhai
264647a7ef
BACKPORT: FROMGIT: [PATCH] security: Remove rtnl_lock() in
selinux_xfrm_notify_policyload()

rt_genid_bump_all() consists of an ipv4 part and an ipv6 part.
The ipv4 part increments net::ipv4::rt_genid,
and I see many places where it's read without rtnl_lock().

The ipv6 part calls __fib6_clean_all(), which is also
called without rtnl_lock() in other places.

So, rtnl_lock() here was used to iterate net_namespace_list only,
and we can remove it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:55 +07:00
Matthew Garrett
dfeb1b9925
BACKPORT: FROMGIT: [PATCH] security: Add a cred_getsecid hook
For IMA purposes, we want to be able to obtain the prepared secid in the
bprm structure before the credentials are committed. Add a cred_getsecid
hook that makes this possible.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:55 +07:00
Eric W. Biederman
9ee79460d0
BACKPORT: FROMGIT: [PATCH] msg/security: Pass kern_ipc_perm not msg_queue into the
msg_queue security hooks

All of the implementations of security hooks that take msg_queue only
access q_perm, the struct kern_ipc_perm member.  This means the
dependencies of the msg_queue security hooks can be simplified by
passing the kern_ipc_perm member of msg_queue.

Making this change will allow struct msg_queue to become private to
ipc/msg.c.
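
A minimal sketch of the idea, using simplified stand-in structures rather than the kernel's real definitions. The function names example_msg_queue_msgctl() and example_call_hook() are hypothetical; the point is only that the hook receives the embedded kern_ipc_perm and never needs the layout of msg_queue.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; the fields are illustrative. */
struct kern_ipc_perm {
    int id;
    void *security;
};

struct msg_queue {
    struct kern_ipc_perm q_perm;
    unsigned long q_qnum;   /* private detail that the hook never needs */
};

/* Hypothetical hook in the new style: it only sees the permission member. */
static int example_msg_queue_msgctl(const struct kern_ipc_perm *perm, int cmd)
{
    (void)cmd;
    /* Toy policy: allow only objects that carry a security label. */
    return perm->security ? 0 : -1;
}

/* The caller, which knows msg_queue, passes &msq->q_perm instead of msq. */
static int example_call_hook(struct msg_queue *msq, int cmd)
{
    return example_msg_queue_msgctl(&msq->q_perm, cmd);
}
```

Because the hook signature no longer mentions struct msg_queue, the security code does not need to include its definition at all.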

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:54 +07:00
Eric W. Biederman
a34d86144a
BACKPORT: FROMGIT: [PATCH] shm/security: Pass kern_ipc_perm not shmid_kernel into the
shm security hooks

All of the implementations of security hooks that take shmid_kernel only
access shm_perm, the struct kern_ipc_perm member.  This means the
dependencies of the shm security hooks can be simplified by passing
the kern_ipc_perm member of shmid_kernel.

Making this change will allow struct shmid_kernel to become private to ipc/shm.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:53 +07:00
Eric W. Biederman
72372af9d6
BACKPORT: FROMGIT: sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
All of the implementations of security hooks that take sem_array only
access sem_perm, the struct kern_ipc_perm member.  This means the
dependencies of the sem security hooks can be simplified by passing
the kern_ipc_perm member of sem_array.

Making this change will allow struct sem and struct sem_array
to become private to ipc/sem.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:53 +07:00
Stephen Smalley
92941cb641
BACKPORT: FROMGIT: [PATCH] selinux: fix handling of uninitialized selinux state in
get_bools/classes

If security_get_bools/classes are called before the selinux state is
initialized (i.e. before first policy load), then they should just
return immediately with no booleans/classes.
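
The guard described above might look roughly like this. It is a user-space sketch under stated assumptions: struct example_state and example_get_bools() are hypothetical and only model an "initialized" flag, not the real SELinux state or the security_get_bools() signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SELinux state; only the flag matters here. */
struct example_state {
    int initialized;      /* set once the first policy has been loaded */
    int nbools;
    char **bool_names;
};

/* Returns 0 and reports zero booleans when no policy has been loaded yet. */
static int example_get_bools(const struct example_state *state,
                             int *len, char ***names)
{
    if (!state->initialized) {
        /* Nothing to walk yet: report an empty result, not an error. */
        *len = 0;
        *names = NULL;
        return 0;
    }

    *len = state->nbools;
    *names = state->bool_names;
    return 0;
}
```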

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:52 +07:00
Eric Biggers
d9022c4e5d
BACKPORT: FROMGIT: [PATCH] selinux: constify write_op[]
write_op[] is never modified, so make it 'const'.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:51 +07:00
Peter Enderborg
74089ec737
BACKPORT: FROMGIT: selinux: Squash cleanup printk loggings
Replace printk with pr_* to avoid checkpatch warnings.

Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:51 +07:00
Kent Overstreet
ad1bcf9435
BACKPORT: FROMGIT: selinux: convert to kvmalloc
The flex arrays were being used for constant sized arrays, so there's no
benefit to using flex_arrays over something simpler.

Link: http://lkml.kernel.org/r/20181217131929.11727-4-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:50 +07:00
Deepak Kumar Singh
813474a82b
rpmsg: glink: reset should_wakeup before calling system wakeup
Multiple packets may arrive while the system is suspended.
This may cause glink to call pm_system_wakeup for each packet.

To avoid this, set should_wakeup to false after the first packet
is received in the suspend state.
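
The effect can be sketched in user-space C as follows. The struct and function names are hypothetical and the wakeup_calls counter merely stands in for calls to pm_system_wakeup(); this is not the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical per-link state; names are illustrative, not the driver's. */
struct example_glink {
    bool should_wakeup;   /* armed when the system enters suspend */
    int wakeup_calls;     /* stands in for calls to pm_system_wakeup() */
};

/* Called for every packet received. */
static void example_rx_packet(struct example_glink *g)
{
    if (g->should_wakeup) {
        /* Clear the flag first so later packets do not repeat the wakeup. */
        g->should_wakeup = false;
        g->wakeup_calls++;
    }
}
```

With the flag cleared on the first packet, a burst of packets during suspend results in a single wakeup request.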

Change-Id: Ifa2bd13229ec756c11d02c1891f105596697e87b
Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:49 +07:00
Andrzej Perczak
6813115336
drivers: power/irqchip: Add wakeup irq loggers
Currently wakeup irq logging is broken which ends up with "Resume cause
unknown" in logs. This makes battery stats unreadable thus turning power
usage analysis into divination from a crystal ball.

Fix this by logging wakeup irq to Google wakeup_stats driver.

Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:34 +07:00
Peter Zijlstra (Intel)
547fb63531
idle: Prevent late-arriving interrupts from disrupting offline
[ Upstream commit e78a761 ]

Scheduling-clock interrupts can arrive late in the CPU-offline process,
after idle entry and the subsequent call to cpuhp_report_idle_dead().
Once execution passes the call to rcu_report_dead(), RCU is ignoring
the CPU, which results in lockdep complaints when the interrupt handler
uses RCU:

------------------------------------------------------------------------

=============================
WARNING: suspicious RCU usage
5.2.0-rc1+ #681 Not tainted
-----------------------------
kernel/sched/fair.c:9542 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/5/0.

stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.2.0-rc1+ #681
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
Call Trace:
 <IRQ>
 dump_stack+0x5e/0x8b
 trigger_load_balance+0xa8/0x390
 ? tick_sched_do_timer+0x60/0x60
 update_process_times+0x3b/0x50
 tick_sched_handle+0x2f/0x40
 tick_sched_timer+0x32/0x70
 __hrtimer_run_queues+0xd3/0x3b0
 hrtimer_interrupt+0x11d/0x270
 ? sched_clock_local+0xc/0x74
 smp_apic_timer_interrupt+0x79/0x200
 apic_timer_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:delay_tsc+0x22/0x50
Code: ff 0f 1f 80 00 00 00 00 65 44 8b 05 18 a7 11 48 0f ae e8 0f 31 48 89 d6 48 c1 e6 20 48 09 c6 eb 0e f3 90 65 8b 05 fe a6 11 48 <41> 39 c0 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
RSP: 0000:ffff8f92c0157ed0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000005 RBX: ffff8c861f356400 RCX: ffff8f92c0157e64
RDX: 000000321214c8cc RSI: 00000032120daa7f RDI: 0000000000260f15
RBP: 0000000000000005 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8c861ee18000 R15: ffff8c861ee18000
 cpuhp_report_idle_dead+0x31/0x60
 do_idle+0x1d5/0x200
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 cpu_startup_entry+0x14/0x20
 start_secondary+0x151/0x170
 secondary_startup_64+0xa4/0xb0

------------------------------------------------------------------------

This happens rarely, but can be forced to happen more often by
placing delays in cpuhp_report_idle_dead() following the call to
rcu_report_dead().  With this in place, the following rcutorture
scenario reproduces the problem within a few minutes:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TREE04"

This commit uses the crude but effective expedient of moving the disabling
of interrupts within the idle loop to precede the cpu_is_offline()
check.  It also invokes tick_nohz_idle_stop_tick() instead of
tick_nohz_idle_stop_tick_protected() to shut off the scheduling-clock
interrupt.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
[ paulmck: Revert tick_nohz_idle_stop_tick_protected() removal, new callers. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:33 +07:00
Juhyung Park
15c03df8c0
sched: promote nodes out of CONFIG_SCHED_DEBUG
xNombre: Android modifies some scheduler parameters on boot.
Applying these manually resulted in better hackbench performance.

Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:27:29 +07:00
Minchan Kim
be7a87c7d6
mm: introduce deactivate_page
Per-process reclaim needs to deactivate file pages from the active LRU
when running echo file > /proc/<pid>/reclaim.
Add deactivate_page for that purpose.

Bug: 131016077
Bug: 153444106
(cherry picked from b07eab27085611203ad359b7f4eecd138d7d771a)
Change-Id: I06fed20103671e4ca6fb8663d5029736442162a5
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:44 +07:00
Minchan Kim
6e93bc3e86
mm: reclaim more pages to find free pages in compaction
There were many reports of failed order-3 allocations while the VM had lots of
*reclaimable* memory.

[17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0
[17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0
[17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S      WC O      4.19.95-g8137b6ce669e-ab6554412 #1
[17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT)
[17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3]
[17353.434197] Call trace:
[17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18
[17353.434210] dump_stack+0xbc/0xf8
[17353.434217] dump_header+0xc8/0x250
[17353.434220] oom_kill_process+0x130/0x538
[17353.434222] out_of_memory+0x320/0x444
[17353.434226] __alloc_pages_nodemask+0x1124/0x13b4
[17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3]
[17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3]
[17353.434407] process_one_work+0x3a8/0x6e4
[17353.434410] worker_thread+0x394/0x820
[17353.434413] kthread+0x19c/0x1ac
[17353.434417] ret_from_fork+0x10/0x18
[17353.434419] Mem-Info:
[17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0
[17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB
[17353.434441] lowmem_reserve[]: 0 0
[17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB
[17353.434451] 300317 total pagecache pages
[17353.434454] 4228 pages in swap cache
[17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370
[17353.434459] Free swap  = 103732kB
[17353.434460] Total swap = 2097148kB
[17353.434462] 1441070 pages RAM
[17353.434465] 0 pages HighMem/MovableOnly
[17353.434466] 44838 pages reserved
[17353.434469] 73728 pages cma reserved

In the trace, compaction finished with COMPACT_COMPLETE (i.e. it had
already scanned the whole zone but failed to create an order-3 page),
so should_compact_retry returned "false".

           <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal   order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0
           <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0

The earlier part of the trace shows that compaction had a hard time finding
free pages in the zone, so the free scanner moved quickly toward the migration
scanner until the two (migration scanner and free page scanner) crossed over.

           <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0
           <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0
           <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1
           <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0
           <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0
..
..
           <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal   order=3 ret=complete
           <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete

The earlier part of the trace that covers reclaim activity shows that
it was not hard to reclaim memory.

           <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0
           <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon
           <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file
           <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon
           <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon
           <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
           <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file
           <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file
           <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
           <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE
           <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON
           <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70

Based on that, we can assume that if the reclaimer had reclaimed more pages,
compaction could have found free pages easily, so the free scanner would not
have moved that fast. That means the non-costly high-order allocation
would not have failed.

With this patch, a non-costly high-order allocation keeps retrying
migration together with reclaim as long as the system has enough
reclaimable memory.

Bug: 156785617
Bug: 158449887
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ic02146be8acc4334b51be6cea54411432547608d
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:44 +07:00
Vlastimil Babka
1ce0e31661
mm, compaction: raise compaction priority after it withdrawns
Mike Kravetz reports that "hugetlb allocations could stall for minutes or
hours when should_compact_retry() would return true more often then it
should.  Specifically, this was in the case where compact_result was
COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being
made."

The problem is that the compaction_withdrawn() test in
should_compact_retry() includes compaction outcomes that are only possible
on low compaction priority, and results in a retry without increasing the
priority.  This may result in further reclaim, and more incomplete
compaction attempts.

With this patch, compaction priority is raised when possible, or
should_compact_retry() returns false.

The COMPACT_SKIPPED result doesn't really fit together with the other
outcomes in compaction_withdrawn(), as that's a result caused by
insufficient order-0 pages, not due to low compaction priority.  With this
patch, it is moved to a new compaction_needs_reclaim() function, and for
that outcome we keep the current logic of retrying if it looks like
reclaim will be able to help.
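
The decision described above can be modeled in a few lines of user-space C. This is a deliberately simplified sketch, not the kernel's should_compact_retry(): all names prefixed with example_ or EX_ are hypothetical, retry counters are omitted, and outcomes not discussed in this message simply return false.

```c
#include <stdbool.h>

/* Simplified mirror of the outcomes named above; values are illustrative. */
enum example_result {
    EX_COMPACT_SKIPPED,
    EX_COMPACT_DEFERRED,
    EX_COMPACT_PARTIAL_SKIPPED,
    EX_COMPACT_COMPLETE,
};

/* In this sketch a lower value means a higher priority. */
enum example_prio { EX_PRIO_SYNC_FULL, EX_PRIO_SYNC_LIGHT, EX_PRIO_ASYNC };

/* Skipped for lack of order-0 pages: only reclaim can help. */
static bool example_needs_reclaim(enum example_result r)
{
    return r == EX_COMPACT_SKIPPED;
}

/* Outcomes that are only possible at low compaction priority. */
static bool example_withdrawn(enum example_result r)
{
    return r == EX_COMPACT_DEFERRED || r == EX_COMPACT_PARTIAL_SKIPPED;
}

static bool example_should_retry(enum example_result r,
                                 enum example_prio *prio,
                                 bool reclaim_can_help)
{
    if (example_needs_reclaim(r))
        return reclaim_can_help;

    if (example_withdrawn(r)) {
        /* Retry only if the priority can still be raised. */
        if (*prio > EX_PRIO_SYNC_FULL) {
            *prio = (enum example_prio)(*prio - 1);
            return true;
        }
        return false;
    }

    /* Other outcomes are outside the scope of this sketch. */
    return false;
}
```

The key property is that a withdrawn result can no longer cause a retry at the same priority, which is what allowed the endless retries in the report.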

Bug: 156785617
Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com
Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I67134003597caa963d5ecff7e2a42ef101e3aa4a
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:43 +07:00
Jaewon Kim
9ad9dbeb35
BACKPORT: page_alloc: consider highatomic reserve in watermark fast
zone_watermark_fast was introduced by commit 48ee5f3696f6 ("mm,
page_alloc: shortcut watermark checks for order-0 pages").  The commit
simply checks whether free pages exceed the watermark, without additional
calculation such as reducing the watermark.

It considered free cma pages but it did not consider the highatomic reserve.
This may exhaust all free pages other than the high-order atomic free
pages.

Assume that the reserved_highatomic pageblock is bigger than watermark min,
and there are only a few free pages other than the high-order atomic free
pages.  Because zone_watermark_fast passes the allocation without considering
the high-order atomic free pages, normal reclaimable allocations like
GFP_HIGHUSER will consume all the free pages.  Eventually an order-0 atomic
allocation may fail.

This means watermark min is not protected against non-atomic allocations.
An order-0 atomic allocation with ALLOC_HARDER can fail unexpectedly.
Additionally, a __GFP_MEMALLOC allocation with ALLOC_NO_WATERMARKS can
also fail.

To avoid the problem, zone_watermark_fast should consider the highatomic
reserve.  If the actual size of high atomic free were counted accurately
like cma free, we could use it.  This patch just uses
nr_reserved_highatomic.  Additionally, introduce
__zone_watermark_unusable_free to factor out the common parts between
zone_watermark_fast and __zone_watermark_ok.
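
A rough user-space model of the order-0 fast check with the highatomic reserve taken into account is shown below. Everything here is an assumption made for illustration: the struct, the flag values and the example_ functions are hypothetical, lowmem_reserve is left out, and the exact kernel logic may differ.

```c
#include <stdbool.h>

#define EX_ALLOC_HARDER 0x1u   /* illustrative flag values, not the kernel's */
#define EX_ALLOC_CMA    0x2u

/* Hypothetical zone counters, all in pages. */
struct example_zone {
    long free_pages;
    long free_cma;
    long nr_reserved_highatomic;
};

/* Pages that must not be counted as free for this allocation. */
static long example_unusable_free(const struct example_zone *z,
                                  unsigned int order, unsigned int flags)
{
    long unusable = (1L << order) - 1;

    /* Ordinary allocations may not dip into the highatomic reserve. */
    if (!(flags & EX_ALLOC_HARDER))
        unusable += z->nr_reserved_highatomic;

    /* Allocations that cannot use CMA must not count free CMA pages. */
    if (!(flags & EX_ALLOC_CMA))
        unusable += z->free_cma;

    return unusable;
}

/* Order-0 fast check: usable free pages must exceed the watermark. */
static bool example_watermark_fast(const struct example_zone *z,
                                   unsigned int flags, long mark)
{
    return z->free_pages - example_unusable_free(z, 0, flags) > mark;
}
```

In this model a zone whose free pages are mostly highatomic reserve fails the fast check for ordinary allocations, so they can no longer drain the pages below watermark min.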

This is an example of an ALLOC_HARDER allocation failure on a v4.19-based
kernel.

 Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
 Call trace:
 [<ffffff8008f40f8c>] dump_stack+0xb8/0xf0
 [<ffffff8008223320>] warn_alloc+0xd8/0x12c
 [<ffffff80082245e4>] __alloc_pages_nodemask+0x120c/0x1250
 [<ffffff800827f6e8>] new_slab+0x128/0x604
 [<ffffff800827b0cc>] ___slab_alloc+0x508/0x670
 [<ffffff800827ba00>] __kmalloc+0x2f8/0x310
 [<ffffff80084ac3e0>] context_struct_to_string+0x104/0x1cc
 [<ffffff80084ad8fc>] security_sid_to_context_core+0x74/0x144
 [<ffffff80084ad880>] security_sid_to_context+0x10/0x18
 [<ffffff800849bd80>] selinux_secid_to_secctx+0x20/0x28
 [<ffffff800849109c>] security_secid_to_secctx+0x3c/0x70
 [<ffffff8008bfe118>] binder_transaction+0xe68/0x454c
 Mem-Info:
 active_anon:102061 inactive_anon:81551 isolated_anon:0
  active_file:59102 inactive_file:68924 isolated_file:64
  unevictable:611 dirty:63 writeback:0 unstable:0
  slab_reclaimable:13324 slab_unreclaimable:44354
  mapped:83015 shmem:4858 pagetables:26316 bounce:0
  free:2727 free_pcp:1035 free_cma:178
 Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
 Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB
 lowmem_reserve[]: 0 0
 Normal: 505*4kB (H) 357*8kB (H) 201*16kB (H) 65*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10236kB
 138826 total pagecache pages
 5460 pages in swap cache
 Swap cache stats: add 8273090, delete 8267506, find 1004381/4060142

This is an example of an ALLOC_NO_WATERMARKS allocation failure on a
v4.14-based kernel.

 kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null)
 kswapd0 cpuset=/ mems_allowed=0
 CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1
 Call trace:
 [<0000000000000000>] dump_backtrace+0x0/0x248
 [<0000000000000000>] show_stack+0x18/0x20
 [<0000000000000000>] __dump_stack+0x20/0x28
 [<0000000000000000>] dump_stack+0x68/0x90
 [<0000000000000000>] warn_alloc+0x104/0x198
 [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0
 [<0000000000000000>] zs_malloc+0x148/0x3d0
 [<0000000000000000>] zram_bvec_rw+0x410/0x798
 [<0000000000000000>] zram_rw_page+0x88/0xdc
 [<0000000000000000>] bdev_write_page+0x70/0xbc
 [<0000000000000000>] __swap_writepage+0x58/0x37c
 [<0000000000000000>] swap_writepage+0x40/0x4c
 [<0000000000000000>] shrink_page_list+0xc30/0xf48
 [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c
 [<0000000000000000>] shrink_node_memcg+0x23c/0x618
 [<0000000000000000>] shrink_node+0x1c8/0x304
 [<0000000000000000>] kswapd+0x680/0x7c4
 [<0000000000000000>] kthread+0x110/0x120
 [<0000000000000000>] ret_from_fork+0x10/0x18
 Mem-Info:
 active_anon:111826 inactive_anon:65557 isolated_anon:0
  active_file:44260 inactive_file:83422 isolated_file:0
  unevictable:4158 dirty:117 writeback:0 unstable:0
  slab_reclaimable:13943 slab_unreclaimable:43315
  mapped:102511 shmem:3299 pagetables:19566 bounce:0
  free:3510 free_pcp:553 free_cma:0
 Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB dirty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
 Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768kB unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB
 lowmem_reserve[]: 0 0
 Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB

This trace log shows GFP_HIGHUSER allocations consuming free pages right
before the ALLOC_NO_WATERMARKS failure.

  <...>-22275 [006] ....   889.213383: mm_page_alloc: page=00000000d2be5665 pfn=970744 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213385: mm_page_alloc: page=000000004b2335c2 pfn=970745 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213387: mm_page_alloc: page=00000000017272e1 pfn=970278 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213389: mm_page_alloc: page=00000000c4be79fb pfn=970279 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213391: mm_page_alloc: page=00000000f8a51d4f pfn=970260 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213393: mm_page_alloc: page=000000006ba8f5ac pfn=970261 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213395: mm_page_alloc: page=00000000819f1cd3 pfn=970196 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
  <...>-22275 [006] ....   889.213396: mm_page_alloc: page=00000000f6b72a64 pfn=970197 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO
kswapd0-1207  [005] ...1   889.213398: mm_page_alloc: page= (null) pfn=0 order=0 migratetype=1 nr_free=3650 gfp_flags=GFP_NOWAIT|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_MOVABLE

[jaewon31.kim@samsung.com: remove redundant code for high-order]
  Link: http://lkml.kernel.org/r/20200623035242.27232-1-jaewon31.kim@samsung.com

Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: Michal Hocko <mhocko@kernel.org>
Link: http://lkml.kernel.org/r/20200619235958.11283-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f27ce0e14088b23f8d54ae4a44f70307ec420e64)
Change-Id: I2638d575f809e885272c3b2a4e5100f2d6b8934d
Bug: 175184106
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:43 +07:00
Minchan
6e6e36e8ce
mm: abort per-process reclaim
A user may launch an app while the platform is reclaiming that app's
memory via per-process reclaim. In that case, the platform should
stop the reclaim and release mmap_sem so the app can launch.

Bug: 158479061
Signed-off-by: Minchan <minchan@google.com>
Change-Id: I7315031981629f32eb16a465e0e6da3cd13d6373
(cherry picked from commit 2cd57b68c235a8b5cbd7c970cddc1a0fdd643fd1)
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:42 +07:00
Minchan Kim
a9b4957653
mm: perproc-reclaim: do not scanning anonymous vma
If we don't have enough swap space or there are no more anonymous pages,
it's pointless to scan anonymous vmas. Let's skip them.

Bug: 131016077
Bug: 152499875
Test: boot
(cherry picked from f57c6720030cf839e2bef149796d7e9f86c0d7d6)
Change-Id: If9d0831e9e712a2a335c3f3e771eb8ed4af94c6e
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:41 +07:00
Minchan Kim
feff085d5c
mm: perproc-reclaim: do not discarding file-backed pages
While testing, I found that perproc-reclaim sometimes shows a regression
with more LMKD kills, because LMKD still relies on the file-LRU size
while perproc-reclaim can make the file LRU smaller.

This patch changes the policy for shrinking the file-backed LRU.
Instead of discarding pages during per-process reclaim, it just makes
them easily reclaimable, for instance by moving active file pages
to the inactive list and clearing PG_referenced and the pte access bits.

With that, we can keep the file LRU bigger, so there is more chance to
hit in the cache, while the VM can still shrink it quickly when a memory
spike happens.

Bug: 131016077
Bug: 153444106
Test: boot
(cherry picked from 0f412d789861f06b14737d1e681b06e95cefda62)
Change-Id: I6b054feb223ac66977ddcf92a669f032d4030de1
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:40 +07:00
Minchan Kim
72220ad54d
mm: per-process reclaim
These days, there are many platforms available in the embedded market,
and they are smarter than the kernel, which has very limited information
about the working set, so they want to be involved in memory management
more heavily, as with Android's low memory killer and ashmem or the many
recent low-memory notifiers.

One simple scenario for userspace intelligence is that the platform can
manage tasks as foreground and background, so it would be better to
reclaim a background task's pages for the end user's *responsiveness*
even though they are frequently referenced.

This patch adds a new knob, "reclaim", under /proc/<pid>/ so a task
manager can reclaim any target process anytime, anywhere. It gives the
platform another method for using memory efficiently.

It can avoid killing processes to get free memory, which was a really
terrible experience: I lost my best-ever score in a game after I
switched to a phone call while I was enjoying the game.

Reclaim file-backed pages only.
	echo file > /proc/PID/reclaim
Reclaim anonymous pages only.
	echo anon > /proc/PID/reclaim
Reclaim all pages
	echo all > /proc/PID/reclaim
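
For a task manager that drives the knob from a program rather than a
shell, the three commands above could be wrapped roughly as follows. This
is only a sketch: reclaim_process() and its proc_root parameter are
hypothetical names, and it assumes a kernel that carries this patch and a
caller with permission to write the file.

```python
from pathlib import Path

# The three modes listed in the commit message above.
RECLAIM_MODES = ("file", "anon", "all")


def reclaim_process(pid, mode, proc_root="/proc"):
    """Write `mode` to <proc_root>/<pid>/reclaim (hypothetical helper)."""
    if mode not in RECLAIM_MODES:
        raise ValueError(f"unknown reclaim mode: {mode!r}")
    knob = Path(proc_root) / str(pid) / "reclaim"
    # Same payload as `echo <mode> > /proc/PID/reclaim`; echo adds the newline.
    knob.write_text(mode + "\n")
    return knob
```

The proc_root argument exists only so the helper can be exercised against
a scratch directory instead of the real /proc.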

Bug: 131016077
Bug: 153444106
Test: boot
(cherry picked from 18c2af05a553f17d354b88b3a45dadc114c8c72c)
Change-Id: I99b51544f79202c097214d3856678cac4449a743
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:40 +07:00
Andrzej Perczak
dc764e5ee6
mm: Remove process reclaim
It will be backported from redbull.

Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:38 +07:00
Matthew Wilcox
215918c2f1
mm: get 7% more pages in a pagevec
We don't have to use an entire 'long' for the number of elements in the
pagevec; we know it's a number between 0 and 14 (now 15).  So we can
store it in a char, and then the bool packs next to it and we still have
two or six bytes of padding for more elements in the header.  That gives
us space to cram in an extra page.
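
The size argument can be checked with a small layout model. The sketch
below uses Python's ctypes to mimic the two layouts; the field names and
the exact "before" layout (a long-sized count followed by the bool) are
assumptions for illustration, not copied from the kernel headers.

```python
import ctypes

OLD_PAGEVEC_SIZE = 14
NEW_PAGEVEC_SIZE = 15


class OldPagevec(ctypes.Structure):
    # Assumed layout before the change: a long-sized count, then the bool.
    _fields_ = [
        ("nr", ctypes.c_size_t),  # stands in for 'unsigned long'
        ("percpu_pvec_drained", ctypes.c_bool),
        ("pages", ctypes.c_void_p * OLD_PAGEVEC_SIZE),
    ]


class NewPagevec(ctypes.Structure):
    # Assumed layout after the change: a one-byte count next to the bool.
    _fields_ = [
        ("nr", ctypes.c_ubyte),
        ("percpu_pvec_drained", ctypes.c_bool),
        ("pages", ctypes.c_void_p * NEW_PAGEVEC_SIZE),
    ]


# Bytes of padding between the two one-byte header fields and the array.
header_padding = NewPagevec.pages.offset - 2
# Relative gain in pages per pagevec.
gain_percent = (NEW_PAGEVEC_SIZE / OLD_PAGEVEC_SIZE - 1) * 100

print("old size:", ctypes.sizeof(OldPagevec))
print("new size:", ctypes.sizeof(NewPagevec))
print("header padding:", header_padding)
print("gain: %.1f%%" % gain_percent)
```

Under these assumptions both structures occupy sixteen pointer-sized
slots, so the extra page costs no additional memory.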

Link: http://lkml.kernel.org/r/20171206022521.GM26021@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:57 +07:00
Mel Gorman
b698bab3fb
mm, pagevec: rename pagevec drained field
According to Vlastimil Babka, the drained field in pagevec is
potentially misleading because it might be interpreted as draining this
pagevec instead of the percpu lru pagevecs.  Rename the field for
clarity.

Link: http://lkml.kernel.org/r/20171019093346.ylahzdpzmoriyf4v@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:57 +07:00
Mel Gorman
9443cf7bcc
mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:56 +07:00
Mel Gorman
8fbf24ee0e
mm: only drain per-cpu pagevecs once per pagevec usage
When a pagevec is initialised on the stack, it is generally used
multiple times over a range of pages, looking up entries and then
releasing them.  On each pagevec_release, the per-cpu deferred LRU
pagevecs are drained on the grounds the page being released may be on
those queues and the pages may be cache hot.  In many cases only the
first drain is necessary as it's unlikely that the range of pages being
walked is racing against LRU addition.  Even if there is such a race,
the impact is marginal, whereas constantly redraining the lru pagevecs
is costly.

This patch ensures that pagevec is only drained once in a given
lifecycle without increasing the cache footprint of the pagevec
structure.  Only sparsetruncate tiny is shown here as large files have
many exceptional entries and call pagecache_release less frequently.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                        batchshadow-v1r1          onedrain-v1r1
Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)

The impact on bonnie is marginal and within the noise because a
significant percentage of the file being truncated has been reclaimed
and consists of shadow entries which reduce the hotness of the
pagevec_release path.
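
The "drain once per lifecycle" idea can be pictured with a toy model. The
Python sketch below is not the kernel code: the Pagevec class, the drain
callback (standing in for the per-cpu LRU drain) and walk_range() are
illustrative names only.

```python
class Pagevec:
    """Toy pagevec that drains the per-cpu LRU queues at most once."""

    def __init__(self, drain):
        self.pages = []
        self.percpu_pvec_drained = False
        self._drain = drain  # callback standing in for the LRU drain

    def release(self):
        # Only the first release in this pagevec's lifetime pays for a drain.
        if not self.percpu_pvec_drained:
            self._drain()
            self.percpu_pvec_drained = True
        self.pages.clear()


def walk_range(batches, drain):
    """Reuse one pagevec across several lookup/release rounds."""
    pvec = Pagevec(drain)
    for batch in batches:
        pvec.pages.extend(batch)
        pvec.release()
    return pvec
```

Walking any number of batches with one Pagevec invokes the drain callback
a single time, which is the saving the benchmark above measures.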

Link: http://lkml.kernel.org/r/20171018075952.10627-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:55 +07:00
celtare21
82bca10d95
sched/core: Fix rq clock warning in sched_migrate_to_cpumask_end()
The following warning occurs because we don't update the runqueue's
clock when taking rq->lock in sched_migrate_to_cpumask_end():

rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 0 PID: 991 at update_curr+0x1c8/0x2bc
[...]
Call trace:
update_curr+0x1c8/0x2bc
dequeue_task_fair+0x7c/0x1238
do_set_cpus_allowed+0x64/0x28c
sched_migrate_to_cpumask_end+0xa8/0x1b4
m_stop+0x40/0x78
seq_read+0x39c/0x4ac
__vfs_read+0x44/0x12c
vfs_read+0xf0/0x1d8
SyS_read+0x6c/0xcc
el0_svc_naked+0x34/0x38

Fix it by adding an update_rq_clock() call when taking rq->lock.
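
As a rough picture of why the warning fires, the Python sketch below
models the debug check. The flag values and the RunQueue class are
assumptions made for illustration; the real check lives in the
scheduler's debug code and is more involved.

```python
import warnings

# Assumed flag values, modeled on the scheduler's clock_update_flags bits.
RQCF_REQ_SKIP = 0x01
RQCF_ACT_SKIP = 0x02
RQCF_UPDATED = 0x04


class RunQueue:
    """Toy runqueue tracking whether its clock was updated under the lock."""

    def __init__(self):
        self.clock = 0
        self.clock_update_flags = 0

    def lock(self):
        # Taking the lock keeps only the skip bits, so RQCF_UPDATED is lost.
        self.clock_update_flags &= RQCF_REQ_SKIP | RQCF_ACT_SKIP

    def update_clock(self, now):
        self.clock = now
        self.clock_update_flags |= RQCF_UPDATED

    def read_clock(self):
        # Mirrors the condition quoted in the warning text above.
        if self.clock_update_flags < RQCF_ACT_SKIP:
            warnings.warn("rq->clock_update_flags < RQCF_ACT_SKIP")
        return self.clock
```

In this model, calling lock() followed directly by read_clock() warns,
while lock(), update_clock(), read_clock() does not.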

Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:52 +07:00
Danny Lin
209041fb24
qcacld: Fix regulatory domain country names
Clang warns:

../drivers/staging/qcacld-3.0/core/cds/src/cds_regdomain.c:284:43: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
        {CTRY_TURKS_AND_CAICOS, FCC3_WORLD, "TC" "TURKS AND CAICOS"},
                                                 ^
                                                ,
../drivers/staging/qcacld-3.0/core/cds/src/cds_regdomain.c:284:38: note: place parentheses around the string literal to silence warning
        {CTRY_TURKS_AND_CAICOS, FCC3_WORLD, "TC" "TURKS AND CAICOS"},
                                            ^
../drivers/staging/qcacld-3.0/core/cds/src/cds_regdomain.c:296:45: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
        {CTRY_WALLIS_AND_FUTUNA, ETSI1_WORLD, "WF" "WALLIS"},
                                                   ^
                                                  ,
../drivers/staging/qcacld-3.0/core/cds/src/cds_regdomain.c:296:40: note: place parentheses around the string literal to silence warning
        {CTRY_WALLIS_AND_FUTUNA, ETSI1_WORLD, "WF" "WALLIS"},
                                              ^
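
Python merges adjacent string literals the same way C does, so the
pitfall behind this warning can be reproduced in a few lines. The tuples
below are stand-ins for the C table rows (the enum constants are written
as strings), and the "fixed" row assumes the fix is the comma that the
warning suggests.

```python
# Adjacent literals merge, so this row has three fields instead of four.
broken_row = ("CTRY_TURKS_AND_CAICOS", "FCC3_WORLD", "TC" "TURKS AND CAICOS")

# A comma keeps the ISO code and the country name as separate fields.
fixed_row = ("CTRY_TURKS_AND_CAICOS", "FCC3_WORLD", "TC", "TURKS AND CAICOS")

print(len(broken_row), repr(broken_row[2]))
print(len(fixed_row), repr(fixed_row[2]), repr(fixed_row[3]))
```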

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: atndko <z1281552865@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:51 +07:00
Aditya Bavanari
7d99ca60c8
dsp: Fix improper mutex unlock in afe close
During SSR use cases, when AFE APR handle is NULL
and AFE close is invoked, mutex unlock is done without
locking. Fix it and bail out without unlocking the
mutex in this scenario.

Change-Id: Ia2988b56425d8c2d5c726d5860c13e655e7e4ed1
Signed-off-by: Aditya Bavanari <abavanar@codeaurora.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:50 +07:00
Vincent Guittot
4199755a1c
sched/fair: Fix load_balance redo for !imbalance
It can happen that load_balance() finds a busiest group and then a
busiest rq but the calculated imbalance is in fact 0.

In such a situation, detach_tasks() returns immediately and leaves the
flag LBF_ALL_PINNED set. The busiest CPU is then wrongly assumed to
have pinned tasks and is removed from the load balance mask. Then we
redo a load balance without the busiest CPU. This creates a wrong load
balance situation and generates wrong task migrations.

If the calculated imbalance is 0, it's useless to try to find a
busiest rq as no task will be migrated and we can return immediately.

This situation can happen on a heterogeneous system, or on an SMP system
when RT tasks are decreasing the capacity of some CPUs.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: jhugo@codeaurora.org
Link: http://lkml.kernel.org/r/1536306664-29827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:50 +07:00
Peter Zijlstra
5a6759a303
sched/fair: Do not migrate due to a sync wakeup on exit
When a task exits, it notifies the parent that it has exited. This is a
sync wakeup and the exiting task may pull the parent towards the waker's
CPU. For simple workloads like using a shell, it was observed that the
shell is pulled across nodes by exiting processes. This is daft as the
parent may be long-lived and properly placed. This patch special cases a
sync wakeup on exit to avoid pulling tasks across nodes. Testing on a range
of workloads and machines showed very little difference in performance,
although there was a small 3% boost on some machines running a shellscript
intensive workload (git regression test suite).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:49 +07:00
Xunlei Pang
91be8a1686
sched/fair: Advance global expiration when period timer is restarted
When the period timer is restarted after some idle time,
start_cfs_bandwidth() doesn't update the expiration information, so
expire_cfs_rq_runtime() sees cfs_rq->runtime_expires smaller than the rq
clock and goes into the clock drift logic, wasting CPU cycles on the
scheduler hot path.

Update the global expiration in start_cfs_bandwidth() to avoid frequent
expire_cfs_rq_runtime() calls once a new period begins.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:48 +07:00
Juri Lelli
f497440c3c
BACKPORT: sched/deadline: Fix switched_from_dl() warning
Mark noticed that syzkaller is able to reliably trigger the following warning:

  dl_rq->running_bw > dl_rq->this_bw
  WARNING: CPU: 1 PID: 153 at kernel/sched/deadline.c:124 switched_from_dl+0x454/0x608
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 1 PID: 153 Comm: syz-executor253 Not tainted 4.18.0-rc3+ #29
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x458
   show_stack+0x20/0x30
   dump_stack+0x180/0x250
   panic+0x2dc/0x4ec
   __warn_printk+0x0/0x150
   report_bug+0x228/0x2d8
   bug_handler+0xa0/0x1a0
   brk_handler+0x2f0/0x568
   do_debug_exception+0x1bc/0x5d0
   el1_dbg+0x18/0x78
   switched_from_dl+0x454/0x608
   __sched_setscheduler+0x8cc/0x2018
   sys_sched_setattr+0x340/0x758
   el0_svc_naked+0x30/0x34

syzkaller reproducer runs a bunch of threads that constantly switch
between DEADLINE and NORMAL classes while interacting through futexes.

The splat above is caused by the fact that if a DEADLINE task is setattr
back to NORMAL while in non_contending state (blocked on a futex -
inactive timer armed), its contribution to running_bw is not removed
before sub_rq_bw() gets called (!task_on_rq_queued() branch) and the
latter sees running_bw > this_bw.

Fix it by removing a task contribution from running_bw if the task is
not queued and in non_contending state while switched to a different
class.
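
The accounting problem can be illustrated with a toy model of the two
counters. The Python sketch below is a simplification under stated
assumptions: DlRq, the function signature and the `fixed` switch are
illustrative names, and the real code handles more state (such as the
inactive timer) than is shown here.

```python
class DlRq:
    """Toy deadline runqueue with the two bandwidth counters."""

    def __init__(self, this_bw=0, running_bw=0):
        self.this_bw = this_bw
        self.running_bw = running_bw

    def invariant_holds(self):
        # The warning above fires when running_bw exceeds this_bw.
        return self.running_bw <= self.this_bw


def switched_from_dl(rq, task_bw, queued, non_contending, fixed):
    """Model a task leaving the DEADLINE class; return the invariant."""
    if not queued:
        if fixed and non_contending:
            # The fix: drop the task's share of running_bw first.
            rq.running_bw -= task_bw
        # Then drop its share of this_bw (the sub_rq_bw() step).
        rq.this_bw -= task_bw
    return rq.invariant_holds()
```

With a single blocked, non_contending task whose bandwidth is still
counted in both totals, the unfixed path leaves running_bw above this_bw,
while the fixed path brings both back to zero.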

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: claudio@evidence.eu.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180711072948.27061-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:48 +07:00
Daniel Bristot de Oliveira
9943a95c87
sched/deadline: Update rq_clock of later_rq when pushing a task
Daniel Casini got this warning while running a DL task here at RetisLab:

  [  461.137582] ------------[ cut here ]------------
  [  461.137583] rq->clock_update_flags < RQCF_ACT_SKIP
  [  461.137599] WARNING: CPU: 4 PID: 2354 at kernel/sched/sched.h:967 assert_clock_updated.isra.32.part.33+0x17/0x20
      [a ton of modules]
  [  461.137646] CPU: 4 PID: 2354 Comm: label_image Not tainted 4.18.0-rc4+ #3
  [  461.137647] Hardware name: ASUS All Series/Z87-K, BIOS 0801 09/02/2013
  [  461.137649] RIP: 0010:assert_clock_updated.isra.32.part.33+0x17/0x20
  [  461.137649] Code: ff 48 89 83 08 09 00 00 eb c6 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 98 7a 6c a5 c6 05 bc 0d 54 01 01 48 89 e5 e8 a9 84 fb ff <0f> 0b 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 83 7e 60 01 74 0a 48 3b
  [  461.137673] RSP: 0018:ffffa77e08cafc68 EFLAGS: 00010082
  [  461.137674] RAX: 0000000000000000 RBX: ffff8b3fc1702d80 RCX: 0000000000000006
  [  461.137674] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8b3fded164b0
  [  461.137675] RBP: ffffa77e08cafc68 R08: 0000000000000026 R09: 0000000000000339
  [  461.137676] R10: ffff8b3fd060d410 R11: 0000000000000026 R12: ffffffffa4e14e20
  [  461.137677] R13: ffff8b3fdec22940 R14: ffff8b3fc1702da0 R15: ffff8b3fdec22940
  [  461.137678] FS:  00007efe43ee5700(0000) GS:ffff8b3fded00000(0000) knlGS:0000000000000000
  [  461.137679] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  461.137680] CR2: 00007efe30000010 CR3: 0000000301744003 CR4: 00000000001606e0
  [  461.137680] Call Trace:
  [  461.137684]  push_dl_task.part.46+0x3bc/0x460
  [  461.137686]  task_woken_dl+0x60/0x80
  [  461.137689]  ttwu_do_wakeup+0x4f/0x150
  [  461.137690]  ttwu_do_activate+0x77/0x80
  [  461.137692]  try_to_wake_up+0x1d6/0x4c0
  [  461.137693]  wake_up_q+0x32/0x70
  [  461.137696]  do_futex+0x7e7/0xb50
  [  461.137698]  __x64_sys_futex+0x8b/0x180
  [  461.137701]  do_syscall_64+0x5a/0x110
  [  461.137703]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [  461.137705] RIP: 0033:0x7efe4918ca26
  [  461.137705] Code: 00 00 00 74 17 49 8b 48 20 44 8b 59 10 41 83 e3 30 41 83 fb 20 74 1e be 85 00 00 00 41 ba 01 00 00 00 41 b9 01 00 00 04 0f 05 <48> 3d 01 f0 ff ff 73 1f 31 c0 c3 be 8c 00 00 00 49 89 c8 4d 31 d2
  [  461.137738] RSP: 002b:00007efe43ee4928 EFLAGS: 00000283 ORIG_RAX: 00000000000000ca
  [  461.137739] RAX: ffffffffffffffda RBX: 0000000005094df0 RCX: 00007efe4918ca26
  [  461.137740] RDX: 0000000000000001 RSI: 0000000000000085 RDI: 0000000005094e24
  [  461.137741] RBP: 00007efe43ee49c0 R08: 0000000005094e20 R09: 0000000004000001
  [  461.137741] R10: 0000000000000001 R11: 0000000000000283 R12: 0000000000000000
  [  461.137742] R13: 0000000005094df8 R14: 0000000000000001 R15: 0000000000448a10
  [  461.137743] ---[ end trace 187df4cad2bf7649 ]---

This warning happened in push_dl_task(), because
__add_running_bw()->cpufreq_update_util() reads the rq_clock of
the later_rq before it is updated, which takes place in activate_task().
The fix is to update the rq_clock before calling add_running_bw().

To avoid a double update_rq_clock() call, we pass the ENQUEUE_NOCLOCK
flag to activate_task().

Reported-by: Daniel Casini <daniel.casini@santannapisa.it>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Fixes: e0367b12674b sched/deadline: Move CPU frequency selection triggering points
Link: http://lkml.kernel.org/r/ca31d073a4788acf0684a8b255f14fea775ccf20.1532077269.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:47 +07:00
Juri Lelli
1a2c8cb99d
sched/deadline: Fix missing clock update
A missing clock update is causing the following warning:

 rq->clock_update_flags < RQCF_ACT_SKIP
 WARNING: CPU: 10 PID: 0 at kernel/sched/sched.h:963 inactive_task_timer+0x5d6/0x720
 Call Trace:
  <IRQ>
  __hrtimer_run_queues+0x10f/0x530
  hrtimer_interrupt+0xe5/0x240
  smp_apic_timer_interrupt+0x79/0x2b0
  apic_timer_interrupt+0xf/0x20
  </IRQ>
  do_idle+0x203/0x280
  cpu_startup_entry+0x6f/0x80
  start_secondary+0x1b0/0x200
  secondary_startup_64+0xa5/0xb0
 hardirqs last  enabled at (793919): [<ffffffffa27c5f6e>] cpuidle_enter_state+0x9e/0x360
 hardirqs last disabled at (793920): [<ffffffffa2a0096e>] interrupt_entry+0xce/0xe0
 softirqs last  enabled at (793922): [<ffffffffa20bef78>] irq_enter+0x68/0x70
 softirqs last disabled at (793921): [<ffffffffa20bef5d>] irq_enter+0x4d/0x70

This happens because inactive_task_timer() calls sub_running_bw() (if
TASK_DEAD and non_contending), which might trigger a schedutil update,
which in turn might access the clock. However, the clock is currently
updated only later in inactive_task_timer().

Fix the problem by updating the clock right after task_rq_lock().

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Claudio Scordino <claudio@evidence.eu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180530160809.9074-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:46 +07:00
Wen Yang
80974f58bd
sched/deadline: Make update_curr_dl() more accurate
rq->clock_task may be updated between the two calls of
rq_clock_task() in update_curr_dl(). Calling rq_clock_task() only
once makes it more accurate and efficient, taking update_curr() as
reference.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1517882148-44599-1-git-send-email-wen.yang99@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:46 +07:00
Juri Lelli
85aeed6e76
sched/deadline: Make bandwidth enforcement scale-invariant
Apply frequency and CPU scale-invariance correction factor to bandwidth
enforcement (similar to what we already do to fair utilization tracking).

Each delta_exec gets scaled considering the current frequency and the
maximum CPU capacity, which means that the reservation runtime parameter
(which needs to be specified by profiling the task's execution at max
frequency on the biggest-capacity core) gets scaled accordingly.
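
As a worked example of the scaling, the Python sketch below applies a
fixed-point correction twice, once for frequency and once for CPU
capacity. It assumes the usual 10-bit capacity scale (1024 means "full");
the function names are illustrative and not taken from the patch.

```python
SCHED_CAPACITY_SHIFT = 10
SCHED_CAPACITY_SCALE = 1 << SCHED_CAPACITY_SHIFT  # 1024 == full scale


def cap_scale(value, scale):
    """Scale `value` by `scale`/1024 using integer fixed-point math."""
    return (value * scale) >> SCHED_CAPACITY_SHIFT


def scaled_delta_exec(delta_exec_ns, scale_freq, scale_cpu):
    """Charge runtime in proportion to current frequency and CPU capacity."""
    return cap_scale(cap_scale(delta_exec_ns, scale_freq), scale_cpu)


# 1 ms of wall-clock execution at half frequency on a half-capacity CPU.
charged = scaled_delta_exec(1_000_000, 512, 512)
print(charged)
```

Under these assumptions, a task running at full frequency on the biggest
core is charged its whole delta_exec, while slower or smaller
configurations consume the reserved runtime proportionally more slowly.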

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Claudio Scordino <claudio@evidence.eu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: alessio.balsini@arm.com
Cc: bristot@redhat.com
Cc: dietmar.eggemann@arm.com
Cc: joelaf@google.com
Cc: juri.lelli@redhat.com
Cc: mathieu.poirier@linaro.org
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: rjw@rjwysocki.net
Cc: rostedt@goodmis.org
Cc: tkjos@android.com
Cc: tommaso.cucinotta@santannapisa.it
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/20171204102325.5110-9-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:45 +07:00
Juri Lelli
45ee5bb85f
sched/deadline: Fix bandwidth accounting at all levels after offline migration
[ Upstream commit 59d06cea1198d665ba11f7e8c5f45b00ff2e4812 ]

If a task happens to be throttled while the CPU it was running on gets
hotplugged off, the bandwidth associated with the task is not correctly
migrated with it when the replenishment timer fires (offline_migration).

Fix things up, for this_bw, running_bw and total_bw, when replenishment
timer fires and task is migrated (dl_task_offline_migration()).

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-5-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:44 +07:00
Prasad Sodagudi
7067d40574
sched: Take irq_sparse lock during the isolation
irq_migrate_all_off_this_cpu() migrates IRQs by checking every active
IRQ in the allocated_irqs mask, and it expects the caller to hold the
irq_sparse lock to avoid races while accessing allocated_irqs. Prevent
a race between IRQ alloc/free and IRQ migration by taking the
irq_sparse lock across CPU isolation.

Change-Id: I9edece1ecea45297c8f6529952d88b3133046467
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:42 +07:00
Satya Durga Srinivasu Prabhala
9fae95e5b2
sched: move watchdog_disable() call before isolation work
Commit be45bf5395e0886
("watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug")
added a wait_for_completion() call to watchdog_disable(), which leads to
the issue below when we try to isolate a CPU. Fix it by moving the
watchdog_disable() call before the isolation work.

[  207.300191] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[  208.006089]  ___might_sleep+0x1c8/0x1e0
[  208.010032]  __might_sleep+0x50/0x88
[  208.013709]  wait_for_completion+0x28/0x60
[  208.017919]  watchdog_disable+0x70/0x90
[  208.021860]  do_isolation_work_cpu_stop+0x54/0x200
[  208.026784]  cpu_stopper_thread+0xac/0x150
[  208.030993]  smpboot_thread_fn+0x1c8/0x2e8
[  208.035202]  kthread+0x11c/0x130
[  208.038526]  ret_from_fork+0x10/0x1c

Change-Id: I4d928dc03c71e68604c61c4986675fd629b69d1d
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:41 +07:00
Aniket Randive
535facd55d
Serial: msm_geni_serial: Use correct condition for device suspend check
Check whether the device is suspended by using the proper condition:
return true only if the usage count is zero and the runtime PM status is
suspended.
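
The condition described above can be sketched as a small user-space model.
This is only an illustration, not the driver's code: struct model_dev, enum
model_rpm_status and model_dev_is_suspended() are hypothetical names invented
for this sketch, and the real driver would use the kernel's runtime PM state
rather than these stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the runtime PM status of a device. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_SUSPENDED,
};

/* Hypothetical stand-in for the fields the check looks at. */
struct model_dev {
	int usage_count;              /* outstanding runtime PM references */
	enum model_rpm_status status; /* current runtime PM status */
};

/*
 * Report "suspended" only when both conditions hold: nobody holds a
 * usage reference and the runtime PM status is suspended.
 */
bool model_dev_is_suspended(const struct model_dev *dev)
{
	return dev->usage_count == 0 && dev->status == MODEL_RPM_SUSPENDED;
}
```

With this model, a device whose status is suspended but whose usage count is
still non-zero is not treated as suspended.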

Change-Id: Id7d99959966871da2a1bb405deb9d29cba1df408
Signed-off-by: Aniket Randive <arandive@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:21:15 +07:00
0ctobot
f9e7c5c566
HACK: lib: Compile out nmi_backtrace for ARM64
This silences the following compilation warning, presumably emitted by llvm-ar in conjunction with Clang (Thin)LTO:
lib/nmi_backtrace.o: no symbols

This is a watchdog support library which is a no-op on this
architecture, hence the empty object file, so let's avoid building
it entirely until a more aesthetically pleasing solution presents
itself.

Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:19:55 +07:00
Andrzej Perczak
3dfdbb7afe
arm64: Optimize for Cortex A76
Following the code that derives the flags for -mcpu=native [1], I found
that the most appropriate optimization target for our SoC is cortex-a76.

/proc/cpuinfo says:
 * implementer: 0x51
 * part: 0x805 (LITTLE), 0x804 (big)

[1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Host.cpp
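
The lookup described above can be sketched as a table keyed on the
implementer and part values from /proc/cpuinfo. This is a simplified
illustration, not LLVM's code: struct cpu_id_entry, cpu_id_table and
guess_mcpu() are hypothetical names, the table holds only the two parts listed
in this message, and mapping both of them to cortex-a76 is an assumption that
follows this commit's conclusion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: cpuinfo identifiers -> -mcpu value. */
struct cpu_id_entry {
	const char *implementer;
	const char *part;
	const char *mcpu;
};

/* Only the parts named in the commit message; assumed to map to cortex-a76. */
static const struct cpu_id_entry cpu_id_table[] = {
	{ "0x51", "0x804", "cortex-a76" }, /* big cores */
	{ "0x51", "0x805", "cortex-a76" }, /* LITTLE cores */
};

/* Return the -mcpu value for a known pair, or "generic" as a fallback. */
const char *guess_mcpu(const char *implementer, const char *part)
{
	size_t n = sizeof(cpu_id_table) / sizeof(cpu_id_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(cpu_id_table[i].implementer, implementer) == 0 &&
		    strcmp(cpu_id_table[i].part, part) == 0)
			return cpu_id_table[i].mcpu;
	}
	return "generic";
}
```

Any pair not in the table falls back to "generic" in this sketch, so an
unknown part never selects a CPU-specific tuning by accident.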

Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:19:27 +07:00
davidchao
ca8c30120d
arm64: dts: Remove QC BCL default settings for performance impact
Bug: 145785063
Bug: 152664241
Test: thermistor reading works normally

Change-Id: I55cd31b190fbc4995da8bc690aa6a83871eefcc1
Signed-off-by: davidchao <davidchao@google.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:04:10 +07:00
Sultan Alsawaf
bbb36e45c3
kernel: Allow wakeup IRQs to cancel ongoing suspend
Wakeup IRQs are only "armed" to cancel suspend very late into the suspend
process, meaning that they cannot stop a suspend that's ongoing. This can
be particularly painful due to how long the freezer may spend trying to
freeze processes, during which time a wakeup IRQ cannot make the freezer
abort. Wakeup IRQs should be honored throughout the entire suspend process
rather than just at the end, so tweak the IRQ PM wakeup check to allow
unarmed wakeup IRQs to cancel suspend partway through.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-06 12:24:38 +07:00