282 Commits

Sultan Alsawaf
7b98484a22 simple_lmk: Run reclaim kthread on big CPU cluster
Simple LMK's reclaim thread needs to run as quickly as possible to
reduce memory allocation latency when memory pressure is high. Run it
on fast, big cluster CPUs.
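
As a rough sketch of the idea (the thread function name and the big-cluster
CPU numbers below are assumptions, not taken from this tree; a real SoC
would derive the mask from its topology):

  struct task_struct *thread;
  struct cpumask big_cpus;
  int cpu;

  thread = kthread_run(simple_lmk_reclaim_thread, NULL, "simple_lmkd");
  if (!IS_ERR(thread)) {
          /* Hypothetical 4+4 big.LITTLE layout: CPUs 4-7 are the big cluster. */
          cpumask_clear(&big_cpus);
          for (cpu = 4; cpu < 8; cpu++)
                  cpumask_set_cpu(cpu, &big_cpus);
          set_cpus_allowed_ptr(thread, &big_cpus);
  }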

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: I6fef904710722b89b19cf119bf92779c364b2c2e
2023-11-14 20:23:14 -03:00
Richard Raya
6db22e1b63 Revert "simple_lmk: Set the victims to the highest priority"
This reverts commit 05293dbde43c7f464f3ad8b97dbbc05073b01d9b.

Change-Id: I0a597332b72577c1416087e55407659057cd38b3
2023-11-14 20:23:14 -03:00
Carlos Llamas
06b501a635
binderfs: add support for feature files
Provide userspace with a mechanism to discover features supported by
the binder driver to refrain from using any unsupported ones in the
first place. Starting with "oneway_spam_detection", only new features
are to be listed under binderfs and all previous ones are assumed to
be supported.

Assuming an instance of binderfs has been mounted at /dev/binderfs,
binder feature files can be found under /dev/binderfs/features/.
Usage example:

$ mkdir /dev/binderfs
$ mount -t binder binder /dev/binderfs
$ cat /dev/binderfs/features/oneway_spam_detection
1

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20210715031805.1725878-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-08-25 14:44:16 +00:00
Greg Kroah-Hartman
5ec9bbb9fa
ANDROID: binder: fix up use of MAX_USER_RT_PRIO
Commit e00eb41c0c7b ("ANDROID: binder: add support for RT prio
inheritance.") added the use of MAX_USER_RT_PRIO to the binder.c code,
but that commit was never sent upstream.

In commit ae18ad281e82 ("sched: Remove MAX_USER_RT_PRIO"), that define
was taken away, so to fix up the build breakage, move the binder code to
use MAX_RT_PRIO instead of the now-removed MAX_USER_RT_PRIO define.

Hopefully this is correct, who knows, it's binder RT code!  :)
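
For context (a sketch, not the exact binder hunk): before ae18ad281e82 the
scheduler headers defined MAX_RT_PRIO in terms of MAX_USER_RT_PRIO, so
swapping one define for the other keeps the same numeric bound:

  /* include/linux/sched/prio.h before ae18ad281e82:
   *   #define MAX_USER_RT_PRIO  100
   *   #define MAX_RT_PRIO       MAX_USER_RT_PRIO
   * so an (illustrative) clamp like this keeps the same numeric bound: */
  prio = min_t(int, prio, MAX_RT_PRIO - 1);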

Bug: 34461621
Bug: 37293077
Bug: 120446518
Fixes: e00eb41c0c7b ("ANDROID: binder: add support for RT prio inheritance.")
Fixes: ae18ad281e82 ("sched: Remove MAX_USER_RT_PRIO")
Cc: Martijn Coenen <maco@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I66b85e99697fdde462ec2d2ade5c92d5917896a3
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-08-25 14:44:15 +00:00
Kazuki Hashimoto
05293dbde4
simple_lmk: Set the victims to the highest priority
Signed-off-by: Kazuki Hashimoto <kazukih@tuta.io>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-08-25 14:43:15 +00:00
azrim
6ac613581b
Merge remote-tracking branch 'google/android-4.14-stable' into richelieu
* google/android-4.14-stable:
  FROMLIST: binder: fix UAF of ref->proc caused by race condition
  UPSTREAM: x86/pci: Fix the function type for check_reserved_t
  Linux 4.14.290
  PCI: hv: Fix interrupt mapping for multi-MSI
  PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
  PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
  PCI: hv: Fix multi-MSI to allow more than one MSI vector
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
  tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
  tty: drop tty_schedule_flip()
  tty: the rest, stop using tty_schedule_flip()
  tty: drivers/tty/, stop using tty_schedule_flip()
  Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
  Bluetooth: SCO: Fix sco_send_frame returning skb->len
  Bluetooth: Fix passing NULL to PTR_ERR
  Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
  Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
  Bluetooth: Add bt_skb_sendmmsg helper
  Bluetooth: Add bt_skb_sendmsg helper
  ALSA: memalloc: Align buffer allocations in page size
  tilcdc: tilcdc_external: fix an incorrect NULL check on list iterator
  drm/tilcdc: Remove obsolete crtc_mode_valid() hack
  bpf: Make sure mac_header was set before using it
  mm/mempolicy: fix uninit-value in mpol_rebind_policy()
  Revert "Revert "char/random: silence a lockdep splat with printk()""
  be2net: Fix buffer overflow in be_get_module_eeprom
  tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
  igmp: Fix a data-race around sysctl_igmp_max_memberships.
  igmp: Fix data-races around sysctl_igmp_llm_reports.
  net: stmmac: fix dma queue left shift overflow issue
  i2c: cadence: Change large transfer count reset logic to be unconditional
  tcp: Fix a data-race around sysctl_tcp_probe_interval.
  tcp: Fix a data-race around sysctl_tcp_probe_threshold.
  tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
  ip: Fix a data-race around sysctl_fwmark_reflect.
  perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
  power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
  xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()
  xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
2022-08-03 08:32:56 +00:00
Carlos Llamas
7c57e75125 FROMLIST: binder: fix UAF of ref->proc caused by race condition
A transaction of type BINDER_TYPE_WEAK_HANDLE can fail to increment the
reference for a node. In this case, the target proc normally releases
the failed reference upon close as expected. However, if the target is
dying in parallel the call will race with binder_deferred_release(), so
the target could have released all of its references by now leaving the
cleanup of the new failed reference unhandled.

The transaction then ends and the target proc gets released making the
ref->proc now a dangling pointer. Later on, ref->node is closed and we
attempt to take spin_lock(&ref->proc->inner_lock), which leads to the
use-after-free bug reported below. Let's fix this by cleaning up the
failed reference on the spot instead of relying on the target to do so.

  ==================================================================
  BUG: KASAN: use-after-free in _raw_spin_lock+0xa8/0x150
  Write of size 4 at addr ffff5ca207094238 by task kworker/1:0/590

  CPU: 1 PID: 590 Comm: kworker/1:0 Not tainted 5.19.0-rc8 #10
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events binder_deferred_func
  Call trace:
   dump_backtrace.part.0+0x1d0/0x1e0
   show_stack+0x18/0x70
   dump_stack_lvl+0x68/0x84
   print_report+0x2e4/0x61c
   kasan_report+0xa4/0x110
   kasan_check_range+0xfc/0x1a4
   __kasan_check_write+0x3c/0x50
   _raw_spin_lock+0xa8/0x150
   binder_deferred_func+0x5e0/0x9b0
   process_one_work+0x38c/0x5f0
   worker_thread+0x9c/0x694
   kthread+0x188/0x190
   ret_from_fork+0x10/0x20

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Bug: 239630375
Link: https://lore.kernel.org/all/20220801182511.3371447-1-cmllamas@google.com/
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: I5085dd0dc805a780a64c057e5819f82dd8f02868
(cherry picked from commit ae3fa5d16a02ba7c7b170e0e1ab56d6f0ba33964)
2022-08-02 22:09:29 +00:00
John Galt
c00c654a9f
treewide: selectively over inline
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-13 09:19:54 +00:00
azrim
06161f8fc7
Revert "slmk: lower vmpressure trigger threshold"
This reverts commit dfbc35d68b62a4d793d81672caf5a28b84f9517c.
2022-07-07 15:04:57 +00:00
Juhyung Park
1e9a1be8f8
simple_lmk: expose simple_lmk_trigger()
vmpressure won't be the only external caller.

Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 16:55:28 +00:00
Juhyung Park
dfbc35d68b
slmk: lower vmpressure trigger threshold
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 16:55:28 +00:00
Sultan Alsawaf
7fade643e7
binder: Stub out debug prints by default
Binder code is very hot, so checking frequently to see if a debug
message should be printed is a waste of cycles. We're not debugging
binder, so just stub out the debug prints to compile them out entirely.
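
A minimal sketch of the stubbing (binder_debug() normally tests
binder_debug_mask before printing):

  /* Turn the runtime mask check into nothing at compile time. */
  #define binder_debug(mask, x...)        do { } while (0)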

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-03 06:17:08 +07:00
Sultan Alsawaf
f6aec2c5eb
simple_lmk: Be extra paranoid if tasks can have no pages
If it's possible for a task to have no pages, then there could be a case
where `pages_found` is zero while `nr_found` isn't, which would cause
the found tasks' locks to never be unlocked, and thus mayhem. We can
change the `pages_found` check to use `nr_found` instead in order to
naturally defend against this scenario, in case it is indeed possible.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:26 +07:00
Sultan Alsawaf
2c81993096
simple_lmk: Optimize victim finder to eliminate hard-coded adj ranges
Hard-coding adj ranges to search for victims results in a few problems.
Firstly, the hard-coded adjs must be vigilantly updated to match what
userspace uses, which makes long-term support a headache. Secondly, a
full traversal of every running process must be done for each adj range,
which can turn out to be quite expensive, especially if userspace
assigns many different adj values and we want to enumerate them all.
This leads us to the final problem, which is that processes with
different adjs within the same hard-coded adj range will be treated the
same, even though they're not: the process with a higher adj is less
important, and the process with a lower adj is more important. This
could be fixed by enumerating every possible adj, but again, that would
necessitate several scans through the active process list, which is bad
for performance, especially since latency is critical here.

Since adjs are only 16 bits, and we only care about positive adjs, that
leaves us with 15 bits of the adj that matter. This is a relatively
small number of potential adjs (32,768), which makes it possible to
allocate a static array that's indexed using the adj. Each entry in this
array is a pointer to the first task_struct in a singly-linked list of
task_structs sharing an adj. A `simple_lmk_next` member is added to
task_struct to accommodate this linked list. The victim finder now
iterates downward through the array searching for linked lists of tasks,
starting from the highest adj found, so that the lowest-priority
processes are always considered first for reclaim. This fixes all of the
problems mentioned above, and now there is only one traversal through
every running process. The array itself only takes up 256 KiB of memory
on 64-bit, which is a very small price to pay for the advantages gained.
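
A rough sketch of the structure described above (variable and helper names
are illustrative, not the driver's exact ones):

  static struct task_struct *adj_buckets[SHRT_MAX + 1]; /* 256 KiB on 64-bit */
  struct task_struct *tsk;
  int adj, max_adj = 0;

  /* Single pass: bucket every process with a positive adj. */
  for_each_process(tsk) {
          adj = tsk->signal->oom_score_adj;
          if (adj < 0)
                  continue;
          tsk->simple_lmk_next = adj_buckets[adj];
          adj_buckets[adj] = tsk;
          if (adj > max_adj)
                  max_adj = adj;
  }

  /* Walk downward from the highest adj so the least important are considered first. */
  for (adj = max_adj; adj >= 0; adj--)
          for (tsk = adj_buckets[adj]; tsk; tsk = tsk->simple_lmk_next)
                  consider_victim(tsk); /* hypothetical helper */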

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:24 +07:00
Sultan Alsawaf
a02dc5f94f
simple_lmk: Cacheline-align the victims array and mm_free_lock on SMP
The victims array and mm_free_lock data structures can be used very
heavily in parallel on SMP, in which case they would benefit from being
cacheline-aligned. Make it so for SMP.
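
A sketch of the alignment (assuming mm_free_lock is an rwlock and MAX_VICTIMS
is the array bound used by the driver):

  static struct victim_info victims[MAX_VICTIMS] __cacheline_aligned_in_smp;
  static rwlock_t mm_free_lock __cacheline_aligned_in_smp =
          __RW_LOCK_UNLOCKED(mm_free_lock);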

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:23 +07:00
Sultan Alsawaf
daf3bb197d
simple_lmk: Pass a custom swap function to sort()
When sort() isn't provided with a custom swap function, it falls back
onto its generic implementation of just swapping one byte at a time,
which is quite slow. Since we know the type of the objects being sorted,
we can provide our own swap function which simply uses the swap() macro.
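
A sketch of the callback (assuming the array holds struct victim_info entries
and a victim_cmp() comparator already exists):

  static void victim_swap(void *lhs, void *rhs, int size)
  {
          /* Both elements are victim_info structs; swap them in one shot. */
          swap(*(struct victim_info *)lhs, *(struct victim_info *)rhs);
  }

  sort(victims, nr_victims, sizeof(*victims), victim_cmp, victim_swap);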

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:23 +07:00
Sultan Alsawaf
c8a84516ef
simple_lmk: Skip victim reduction when all victims need to be killed
When there aren't enough pages found, it means all of the victims that
were found need to be killed. The additional processing that attempts to
reduce the number of victims can be skipped in this case.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:23 +07:00
Sultan Alsawaf
37c7f9ddc3
simple_lmk: Use MIN_FREE_PAGES wherever pages_needed is used
There's no reason to pass this constant around in a parameter.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:22 +07:00
Sultan Alsawaf
797d1ab6bd
simple_lmk: Don't block in simple_lmk_mm_freed() on mm_free_lock
When the mm_free_lock write lock is held, it means that reclaim is
either starting or ending, in which case there's nothing that needs to
be done in simple_lmk_mm_freed(). We can use a trylock here instead to
avoid blocking.
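
A sketch of the non-blocking path (the loop bound name is an assumption;
mm_free_lock is the rwlock guarding the victims array):

  void simple_lmk_mm_freed(struct mm_struct *mm)
  {
          int i;

          /* A writer means reclaim is starting or ending; nothing to do. */
          if (!read_trylock(&mm_free_lock))
                  return;

          for (i = 0; i < nr_victims; i++) {
                  /* ... clear the matching victim's mm pointer ... */
          }
          read_unlock(&mm_free_lock);
  }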

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:22 +07:00
Sultan Alsawaf
8c606531b4
simple_lmk: Update Kconfig description for VM pressure change
Simple LMK uses VM pressure now, not a kswapd hook like before. Update
the Kconfig description to reflect this.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:21 +07:00
Sultan Alsawaf
020c04f42e
simple_lmk: Add !PSI dependency
When PSI is enabled, lmkd in userspace will use PSI notifications to
perform low memory kills. Therefore, to ensure that Simple LMK is the
only active LMK implementation, add a !PSI dependency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:21 +07:00
Sultan Alsawaf
3c05635f4f
simple_lmk: Print a message when the timeout is reached
This aids in selecting an adequate timeout. If the timeout is hit often
and Simple LMK is killing too much, then the timeout should be
lengthened. If the timeout is rarely hit and Simple LMK is not killing
fast enough under pressure, then the timeout should be shortened.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:21 +07:00
Sultan Alsawaf
ed8581696d
simple_lmk: Remove unnecessary clean-up when timeout is reached
Zeroing out the mm struct pointers when the timeout is hit isn't needed
because mm_free_lock prevents any readers from accessing the mm struct
pointers while clean-up occurs, and since the simple_lmk_mm_freed() loop
bound is set to zero during clean-up, there is no possibility of dying
processes ever reading stale mm struct pointers.

Therefore, it is unnecessary to clear out the mm struct pointers when
the timeout is reached. Now the only step to do when the timeout is
reached is to re-init the completion, but since reinit_completion() just
sets a struct member to zero, call reinit_completion() unconditionally
as it is faster than encapsulating it within a conditional statement.

Also take this opportunity to rename some variables and tidy up some
code indentation.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:20 +07:00
Sultan Alsawaf
f6ef330e3e
simple_lmk: Hold an RCU read lock instead of the tasklist read lock
We already check to see if each eligible process isn't already dying, so
an RCU read lock can be used to speed things up instead of holding the
tasklist read lock.
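
A sketch of the scan under RCU (the eligibility checks are elided):

  struct task_struct *tsk;

  rcu_read_lock();
  for_each_process(tsk) {
          if (tsk->flags & PF_KTHREAD)
                  continue;
          /* Skip tasks that are already dying, then size up the rest. */
  }
  rcu_read_unlock();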

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:20 +07:00
Sultan Alsawaf
263d64db01
simple_lmk: Consider all positive adjs when finding victims
We are allowed to kill any process with a positive adj, so we shouldn't
exclude any processes with adjs greater than 999. This would present a
problem with quirky applications that set their own adj score, such as
stress-ng. In the case of stress-ng, it would set its adj score to 1000
and thus exempt itself from being killed by Simple LMK. This shouldn't
be allowed; any process with a positive adj, up to the highest positive
adj possible (32767) should be killable.

Reported-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:20 +07:00
Sultan Alsawaf
dd1405e35e
simple_lmk: Update adj targeting for Android 10
Android 10 changed its adj assignments. Update Simple LMK to use the
new adjs, which also requires looking at each pair of adjs as a range.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:18 +07:00
Sultan Alsawaf
08eda9b371
simple_lmk: Use vmpressure notifier to trigger kills
Using kswapd's scan depth to trigger task kills is inconsistent and
unreliable. When memory pressure quickly spikes, the kswapd scan depth
trigger fails to kick off Simple LMK fast enough, causing severe lag.
Additionally, kswapd could stop scanning prematurely before reaching the
desired scan depth to trigger Simple LMK, which could also cause stalls.

To remedy this, use the vmpressure framework instead, since it provides
more consistent and accurate readings on memory pressure. This is not
very tunable though, so remove CONFIG_ANDROID_SIMPLE_LMK_AGGRESSION.
Triggering Simple LMK to kill when the reported memory pressure is 100
should yield good results on all setups.
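
The hook itself is small; a hypothetical version (variable and function
names illustrative) at the point where vmpressure finishes computing its
level:

  /* Kick Simple LMK only at the maximum reported pressure. */
  if (pressure == 100)
          simple_lmk_trigger();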

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:18 +07:00
Sultan Alsawaf
fd7495b5ad
simple_lmk: Include swap memory usage in the size of victims
Swap memory usage is important when determining what to kill, so include
it in the victim size calculation.
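
A sketch of the sizing change (assuming the victim size was previously just
the resident set size):

  /* Count swapped-out pages along with the resident set. */
  unsigned long tasksize = get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS);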

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:17 +07:00
Sultan Alsawaf
f26d26304f
simple_lmk: Relax memory barriers and clean up some styling
wake_up() executes a full memory barrier when waking a process up, so
there's no need for the acquire in the wait event. Additionally,
because of this, the atomic_cmpxchg() only needs a read barrier.

The cmpxchg() in simple_lmk_mm_freed() is atomic when it doesn't need to
be, so replace it with an extra line of code.

The atomic_inc_return() in simple_lmk_mm_freed() lies within a lock, so
it doesn't need explicit memory barriers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:17 +07:00
Sultan Alsawaf
f418a9b2db
simple_lmk: Place victims onto SCHED_RR
Just increasing the victim's priority to the maximum niceness isn't
enough to make it totally preempt everything in SCHED_FAIR, which is
important to make sure victims die quickly. Resource-wise, this isn't
very burdensome since the RT priority is just set to zero, and because
dying victims don't have much to do: they only need to finish whatever
they're doing quickly. SCHED_RR is used over SCHED_FIFO so that CPU time
between the victims is divided evenly to help them all finish at around
the same time, as fast as possible.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:17 +07:00
Sultan Alsawaf
2930c9cda7
simple_lmk: Add a timeout to stop waiting for victims to die
Simple LMK tries to wait until all of the victims it kills have their
memory freed; however, sometimes victims can take a while to die, which
can block Simple LMK from killing more processes in time when needed.
After the specified timeout elapses, Simple LMK will stop waiting and
make itself available to kill more processes.
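
A sketch of the bounded wait (the completion and timeout names are
assumptions):

  /* Give up waiting for victims to free their memory after the timeout. */
  wait_for_completion_timeout(&victims_freed,
                              msecs_to_jiffies(kill_timeout_ms));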

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:16 +07:00
Sultan Alsawaf
3db71fb3c3
simple_lmk: Ignore tasks that won't free memory
Dying processes aren't going to help free memory, so ignore them.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:16 +07:00
Sultan Alsawaf
65e0c8e158
simple_lmk: Simplify tricks used to speed up the death process
set_user_nice() doesn't schedule, and although set_cpus_allowed_ptr()
can schedule, it will only do so when the specified task cannot run on
the new set of allowed CPUs. Since cpu_all_mask is used,
set_cpus_allowed_ptr() will never schedule. Therefore, both the priority
elevation and cpus_allowed change can be moved to inside the task lock
to simplify and speed things up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:16 +07:00
Sultan Alsawaf
207207f7e5
simple_lmk: Mark victim thread group with TIF_MEMDIE
The OOM killer sets the TIF_MEMDIE thread flag for its victims to alert
other kernel code that the current process was killed due to memory
pressure, and needs to finish whatever it's doing quickly. In the page
allocator this allows victim processes to quickly allocate memory using
emergency reserves. This is especially important when memory pressure is
high; if all processes are taking a while to allocate memory, then our
victim processes will face the same problem and can potentially get
stuck in the page allocator for a while rather than die expeditiously.

To ensure that victim processes die quickly, set TIF_MEMDIE for the
entire victim thread group.
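
A sketch of flagging the whole thread group (vtsk being the chosen victim):

  struct task_struct *t;

  /* Give every thread in the victim's group access to memory reserves. */
  for_each_thread(vtsk, t)
          set_tsk_thread_flag(t, TIF_MEMDIE);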

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:15 +07:00
Sultan Alsawaf
1462193257
simple_lmk: Print a message when there are no processes to kill
Makes it clear that Simple LMK tried its best but there was nothing it
could do.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:15 +07:00
Sultan Alsawaf
2cf93c2aeb
simple_lmk: Remove compat cruft not specific to 4.14
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:14 +07:00
Sultan Alsawaf
866771f719
simple_lmk: Update copyright to 2020
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:14 +07:00
Sultan Alsawaf
e2cb78a57e
simple_lmk: Don't queue up new reclaim requests during reclaim
Queuing up reclaim requests while a reclaim is in progress doesn't make
sense, since the additional reclaims may not be needed after the
existing reclaim completes. This would cause Simple LMK to go berserk
during periods of high memory pressure where kswapd would fire off
reclaim requests nonstop.

Make Simple LMK ignore new reclaim requests until an existing reclaim is
finished to prevent a slaughter-fest.
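
One simple way to express the gating (names and the exact mechanism here are
illustrative):

  static atomic_t reclaim_pending = ATOMIC_INIT(0);

  /* Drop the request if a reclaim is already in flight. */
  if (atomic_cmpxchg(&reclaim_pending, 0, 1))
          return;

  simple_lmk_reclaim();                   /* hypothetical kill pass */
  atomic_set(&reclaim_pending, 0);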

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:14 +07:00
Sultan Alsawaf
7e86f1e61f
simple_lmk: Increase default minfree value
After commit "simple_lmk: Make reclaim deterministic", Simple LMK's
behavior changed and thus requires some slight re-tuning to make it work
well again.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:14 +07:00
Sultan Alsawaf
dcfe07c47c
simple_lmk: Clean up some code style nitpicks
Using a parameter to pass around an unmodified pointer to a global
variable is crufty; just use the `victims` variable directly instead.
Also, compress the code in simple_lmk_init_set() a bit to make it look
cleaner.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:14 +07:00
Sultan Alsawaf
c2bf2617ab
simple_lmk: Make reclaim deterministic
The 20 ms delay in the reclaim thread is a hacky fudge factor that can
cause Simple LMK to behave wildly differently depending on the
circumstances of when it is invoked. When kswapd doesn't get enough CPU
time to finish up and go back to sleep within 20 ms, Simple LMK performs
superfluous reclaims.

This is suboptimal, so make Simple LMK more deterministic by eliminating
the delay and instead queuing up reclaim requests from kswapd.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:13 +07:00
Sultan Alsawaf
35822c67a2
simple_lmk: Fix broken multicopy atomicity for victims_to_kill
When the reclaim thread writes to victims_to_kill on one CPU, it expects
the updated value to be immediately reflected on all CPUs in order for
simple_lmk_mm_freed() to work correctly. Due to the lack of memory
barriers to guarantee multicopy atomicity, simple_lmk_mm_freed() can be
given a victim's mm without knowing the correct victims_to_kill value,
which can cause the reclaim thread to remain stuck waiting forever for
all victims to be freed. This scenario, despite being rare, has been
observed.

Fix this by using proper atomic helpers with memory barriers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:13 +07:00
Sultan Alsawaf
0ba3e147b9
simple_lmk: Use proper atomic_* operations where needed
cmpxchg() is only atomic with respect to the local CPU, so it cannot be
relied on with how it's used in Simple LMK. Switch to fully atomic
operations instead for full atomic guarantees.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:13 +07:00
Sultan Alsawaf
e541f19aa1
simple_lmk: Remove kthread_should_stop() exit condition
Simple LMK's reclaim thread should never stop; there's no need to have
this check.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:13 +07:00
Sultan Alsawaf
b9bddacca1
simple_lmk: Fix pages_found calculation
Previously, pages_found would be calculated using an uninitialized
variable. Fix it.

Reported-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:12 +07:00
Sultan Alsawaf
7b77adf0d3
simple_lmk: Introduce Simple Low Memory Killer for Android
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.

The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:20:12 +07:00
John Dias
415363aa10
binder: Set binder_debug_mask=0 to suppress logging
Excessive logging -- not present on angler -- is affecting
performance, contributing to missed audio deadlines and likely other
latency-dependent tasks.
Bug: 30375418

Change-Id: I88b9c7fa4540ad46e564f44a0e589b5215e8487d
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
[nullxception: extend it to binder_alloc_debug_mask]
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:18:15 +07:00
Sherry Yang
83f9e8f603
android: binder: Rate-limit debug and userspace triggered err msgs
Use rate-limited debug messages where userspace can trigger
excessive log spams.
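
The debug macros end up in a rate-limited form along these lines (a sketch of
the pattern, not the exact hunk):

  #define binder_debug(mask, x...) \
          do { \
                  if (binder_debug_mask & mask) \
                          pr_info_ratelimited(x); \
          } while (0)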

Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:18:15 +07:00
Carlos Llamas
16a960f5b0
ANDROID: binder: retry security_secid_to_secctx()
security_secid_to_secctx() can fail because of a GFP_ATOMIC allocation.
This needs to be retried from userspace. However, the binder driver doesn't
propagate specific enough error codes just yet (WIP b/28321379). We'll
retry in the binder driver as a temporary workaround until userspace
can do this instead.

Bug: 174806915
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: Ifebddeb7adf9707613512952b97ab702f0d2d592
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:18:14 +07:00
Luca Stefani
d202f98d0b
UPSTREAM: binder: Return EFAULT if we fail BINDER_ENABLE_ONEWAY_SPAM_DETECTION
All the other ioctl paths return EFAULT in case the
copy_from_user/copy_to_user call fails; make oneway spam detection
follow the same paradigm.
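
A sketch of the ioctl branch after the change (surrounding error labels
abbreviated):

  case BINDER_ENABLE_ONEWAY_SPAM_DETECTION: {
          uint32_t enable;

          if (copy_from_user(&enable, ubuf, sizeof(enable))) {
                  ret = -EFAULT;          /* was -EINVAL before this fix */
                  goto err;
          }
          binder_inner_proc_lock(proc);
          proc->oneway_spam_detection_enabled = (bool)enable;
          binder_inner_proc_unlock(proc);
          break;
  }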

Fixes: a7dc1e6f99df ("binder: tell userspace to dump current backtrace when detected oneway spamming")
Acked-by: Todd Kjos <tkjos@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Link: https://lore.kernel.org/r/20210506193726.45118-1-luca.stefani.ge1@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ced081a436d21a7d34d4d42acb85058f9cf423f2)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I7c5e6ec7108c42721de6c82f4c1e9ff3d4f0e88d
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:18:14 +07:00