An srcu_struct structure that is initialized before rcu_init_geometry()
will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once
rcu_init_geometry() is called, this hierarchy is compressed as needed
for the actual maximum number of CPUs for this system.
Later on, that srcu_struct structure is confused, sometimes referring
to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead
to the new num_possible_cpus() hierarchy. For example, each of its
->mynode fields continues to reference the original leaf rcu_node
structures, some of which might no longer exist. On the other hand,
srcu_for_each_node_breadth_first() traverses to the new node hierarchy.
There are at least two bad possible outcomes to this:
1) a) A callback enqueued early on an srcu_data structure (call it
*sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in
srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf
(say 3 levels).
b) The grace period ends after rcu_init_geometry() shrinks the
nodes level to a single one. srcu_gp_end() walks through the new
srcu_node hierarchy without ever reaching the old leaves so the
callback is never executed.
This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32
and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests
verification never completes and the boot hangs:
[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
[ 5413.147564] Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
[ 5413.168099] Call Trace:
[ 5413.170555] __schedule+0x36c/0x930
[ 5413.174057] ? wait_for_completion+0x88/0x110
[ 5413.178423] schedule+0x46/0xf0
[ 5413.181575] schedule_timeout+0x284/0x380
[ 5413.185591] ? wait_for_completion+0x88/0x110
[ 5413.189957] ? mark_held_locks+0x61/0x80
[ 5413.193882] ? mark_held_locks+0x61/0x80
[ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173] ? wait_for_completion+0x88/0x110
[ 5413.206535] wait_for_completion+0xb4/0x110
[ 5413.210724] ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610] srcu_barrier+0x187/0x200
[ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244] ? rdinit_setup+0x2b/0x2b
[ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700] do_one_initcall+0x63/0x310
[ 5413.236541] ? rdinit_setup+0x2b/0x2b
[ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912] kernel_init_freeable+0x253/0x28f
[ 5413.249273] ? rest_init+0x250/0x250
[ 5413.252846] kernel_init+0xa/0x110
[ 5413.256257] ret_from_fork+0x22/0x30
2) An srcu_struct structure that is initialized before rcu_init_geometry()
and used afterward will always have stale rdp->mynode references,
resulting in callbacks to be missed in srcu_gp_end(), just like in
the previous scenario.
This commit therefore causes init_srcu_struct_nodes to initialize the
geometry, if needed. This ensures that the srcu_node hierarchy is
properly built and distributed from the get-go.
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(),
which works, except that rcu_spawn_gp_kthread() is also invoked via an
early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to
kthread_prio that are carried out by rcu_spawn_gp_kthread(). There is
no guaranttee of ordering among early_initcall() handlers, and thus no
guarantee that kthread_prio will be properly checked and range-limited
at the time that rcu_spawn_core_kthreads() needs it.
In most cases, this bug is harmless. After all, the only reason that
rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user
specified a nonsensical value for this boot parameter, which experience
indicates is rare.
Nevertheless, a bug is a bug. This commit therefore causes the
rcu_spawn_core_kthreads() function to be invoked directly from
rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have
been carried out.
Fixes: 48d07c04b4cc ("rcu: Enable elimination of Tree-RCU softirq processing")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.
Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.
Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.
Fix this with splitting the rcuog wakeup handling from rcu_idle_enter()
and place it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.
Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
Signed-off-by: azrim <mirzaspc@gmail.com>
Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently whether initiated by idle, user or guest. Prepare
with pulling that control up to rcu_eqs_enter() callers.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
Signed-off-by: azrim <mirzaspc@gmail.com>
Joel Fernandes found that the synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:
# cd /sys/kernel/tracing
# while [ 1 ]; do x=1; done &
# echo '__schedule_bug:traceon' > set_ftrace_filter
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m1.064s
user 0m0.000s
sys 0m0.004s
Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.
After discussion we came up with a simple way to wait for holdouts but
increase the time for each iteration of the loop but no more than a
full second.
With the new patch we have:
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m0.131s
user 0m0.000s
sys 0m0.004s
Which drops it down to 13% of what the original wait time was.
Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
wakeup_source_device_create creates a symlink to the directory of the
device associated with the wakelock. Not only is this unnecessary, this
also results in selinux denials and the following errors in logcat if
the directory isn't whitelisted in sepolicy:
"Error opening kernel wakelock stats for:
wakeup[n]: Permission denied"
Stop creating the symlink to silence the errors as it's useless anyway.
Signed-off-by: Kazuki Hashimoto <kaz205@tuta.io>
Signed-off-by: K A R T H I K <karthik.lal558@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Excessive logging -- not present on angler -- is affecting
performance, contributing to missed audio deadlines and likely other
latency-dependent tasks.
Bug: 30375418
Change-Id: I88b9c7fa4540ad46e564f44a0e589b5215e8487d
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
[nullxception: extend it to binder_alloc_debug_mask]
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
security_secid_to_secctx() can fail because of a GFP_ATOMIC allocation
This needs to be retried from userspace. However, binder driver doesn't
propagate specific enough error codes just yet (WIP b/28321379). We'll
retry on the binder driver as a temporary work around until userspace
can do this instead.
Bug: 174806915
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: Ifebddeb7adf9707613512952b97ab702f0d2d592
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When async binder buffer got exhausted, some normal oneway transactions
will also be discarded and may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause. And this issue is difficult to debug if without the
backtrace of the thread sending spam.
This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, request to dump current backtrace. Oneway spamming
will be reported only once when exceeding the threshold (target process
dips below 80% of its oneway space, and current process is responsible
for either more than 50 transactions, or more than 50% of the oneway
space). And the detection will restart when the async buffer has
returned to a healthy state.
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 181190340
Change-Id: Id3d2526099bc89f04d8ad3ad6e48141b2a8f2515
(cherry picked from commit a7dc1e6f99df59799ab0128d9c4e47bbeceb934d)
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Currently cgroup freezer is used to freeze the application threads, and
BINDER_FREEZE is used to freeze the corresponding binder interface.
There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
existing transactions to drain out before actually freezing the binder
interface.
But freezing an app requires 2 steps, freezing the binder interface with
ioctl(BINDER_FREEZE) and then freezing the application main threads with
cgroupfs. This is not an atomic operation. The following race issue
might happen.
1) Binder interface is frozen by ioctl(BINDER_FREEZE);
2) Main thread A initiates a new sync binder transaction to process B;
3) Main thread A is frozen by "echo 1 > cgroup.freeze";
4) The response from process B reaches the frozen thread, which will
unexpectedly fail.
This patch provides a mechanism to check if there's any new pending
transaction happening between ioctl(BINDER_FREEZE) and freezing the
main thread. If there's any, the main thread freezing operation can
be rolled back to finish the pending transaction.
Furthermore, the response might reach the binder driver before the
rollback actually happens. That will still cause failed transaction.
As the other process doesn't wait for another response of the response,
the response transaction failure can be fixed by treating the response
transaction like an oneway/async one, allowing it to reach the frozen
thread. And it will be consumed when the thread gets unfrozen later.
NOTE: This patch reuses the existing definition of struct
binder_frozen_status_info but expands the bit assignments of __u32
member sync_recv.
To ensure backward compatibility, bit 0 of sync_recv still indicates
there's an outstanding sync binder transaction. This patch adds new
information to bit 1 of sync_recv, indicating the binder transaction
happens exactly when there's a race.
If an existing userspace app runs on a new kernel, a sync binder call
will set bit 0 of sync_recv so ioctl(BINDER_GET_FROZEN_INFO) still
return the expected value (true). The app just doesn't check bit 1
intentionally so it doesn't have the ability to tell if there's a race.
This behavior is aligned with what happens on an old kernel which
doesn't set bit 1 at all.
A new userspace app can 1) check bit 0 to know if there's a sync binder
transaction happened when being frozen - same as before; and 2) check
bit 1 to know if that sync binder transaction happened exactly when
there's a race - a new information for rollback decision.
Fixes: 432ff1e91694 ("binder: BINDER_FREEZE ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Li Li <dualli@google.com>
Test: stress test with apps being frozen and initiating binder calls at
the same time, confirmed the pending transactions succeeded.
Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 198493121
(cherry picked from commit b564171ade70570b7f335fa8ed17adb28409e3ac
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
char-misc-linus)
Change-Id: I488ba75056f18bb3094ba5007027b76b5caebec9
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Add a per-transaction flag to indicate that the buffer
must be cleared when the transaction is complete to
prevent copies of sensitive data from being preserved
in memory.
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20201120233743.3617529-1-tkjos@google.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 171501513
Change-Id: Ic9338c85cbe3b11ab6f2bda55dce9964bb48447a
(cherry picked from commit 0f966cba95c78029f491b433ea95ff38f414a761)
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Releasing the procs lock while freezing a binder context allows for
other processes to modify the process list while the scan is still
ongoing.
Don't release the process locks during the scan operatoin, but store
matching processes in a dynamic array and process them at a later phase.
Signed-off-by: Marco Ballesio <balejs@google.com>
Bug: 176996063
Test: verified that all contexts are correctly frozen and unfrozen
Change-Id: Iea527e3b9188b04303f8b9b08b404e0c062a0189
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
binder_wait_for_work used to return -ERESTARTSYS if interrupted by a
signal. This error wasn't logged to avoid spamming the console. After
the return value changed to -EINTR to better conform to the kernel API,
the logging portion wasn't modified.
Filter EINTR from binder error logs.
Test: verified that the console isn't spammed anymore
Bug: 172330837
Signed-off-by: Marco Ballesio <balejs@google.com>
Change-Id: Ie7789bdbf5f0b3b0d55793d4f147c395de2c6641
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
binder freeze stops at the first context found for any pid, but
multiple ones are possible with the result that a process might end
up with inconsistent context states after freezing or unfreezing its
binder.
Freeze or unfreeze all contexts in a process upon a BINDER_FREEZE
ioctl.
Bug: 176996063
Test: verified that all contexts in a specific process with multiple
binders are frozen or unfrozen.
Signed-off-by: Marco Ballesio <balejs@google.com>
Change-Id: If0822e078e830e9fde10cc17b99e39ec7cf358d5
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
when interrupted by a signal, binder_wait_for_work currently returns
-ERESTARTSYS. This error code is usually restricted to the kernel.
Replace this instance of -ERESTARTSYS with -EINTR.
Bug: 143717177
Test: built, booted, interrupted a worker thread within
binder_wait_for_work
Signed-off-by: Marco Ballesio <balejs@google.com>
Change-Id: I0bd1be173e0a75c917399b773046e819babb9d4b
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
User space needs to know if binder transactions occurred to frozen
processes. Introduce a new BINDER_GET_FROZEN ioctl and keep track of
transactions occurring to frozen proceses. Also, allow async
transactions toward frozen processes and improve error hendling.
Bug: 143717177
Test: atest testBinderLib
Signed-off-by: Marco Ballesio <balejs@google.com>
Change-Id: I9ee1c2e5fe3d4ab31fc1a137d840bd4cd38a8704
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The most common cause of the binder transaction buffer filling up is a
client rapidly firing oneway transactions into a process, before it has
a chance to handle them. Yet the root cause of this is often hard to
debug, because either the system or the app will stop, and by that time
binder debug information we dump in bugreports is no longer relevant.
This change warns as soon as a process dips below 80% of its oneway
space (less than 100kB available in the configuration), when any one
process is responsible for either more than 50 transactions, or more
than 50% of the oneway space.
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20200821122544.1277051-1-maco@android.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 261e7818f06ec51e488e007f787ccd7e77272918
git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/
char-misc-next)
Bug: 147795659
Change-Id: Idc2b03ddc779880ca4716fdae47a70df43211f25
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Frozen tasks can't process binder transactions, so a way is required to
inform transmitting ends of communication failures due to the frozen
state of their receiving counterparts. Additionally, races are possible
between transitions to frozen state and binder transactions enqueued to a
specific process.
Implement BINDER_FREEZE ioctl for user space to inform the binder driver
about the intention to freeze or unfreeze a process. When the ioctl is
called, block the caller until any pending binder transactions toward the
target process are flushed. Return an error to transactions to processes
marked as frozen.
Bug: 143717177
Change-Id: Ie16f72b490bbe1785b82dee2442452f71ad7dc65
Signed-off-by: Marco Ballesio <balejs@google.com>
Co-developed-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
It's a normal thing when display setup is happening, no need to
verbosely prints it.
Signed-off-by: Nauval Rizky <enuma.alrizky@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Sometimes, it may be desirable to use CPU frequency tables different
from the ones in the hardware's OSM LUTs. This commit adds support for
overriding each CPU's frequency table with a list of allowed frequencies
defined in the OSM driver's DT node.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Instrumenting print_time is insufficient for adjusting the timestamp of
logged messages. Add the sleep time offset to true message timestamps as
well to address this issue.
Test: dmesg shows true boot time after several suspend/resume cycles
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
cpu_clock() uses monotonic time, which skews when the system suspends,
making it difficult to interpret kmsg timestamps.
Add the sleep time offset to cpu_clock() in order to make kmsg timestamps
reflect the actual boot time.
Signed-off-by: Sultanxda <sultanxda@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Returning before thermal trips due to lower battery should
fix the lag due to restricted thermals. This what BCL does.
test: check 2-3 cycles of battery percentage under 10%
result: Fixes device lag completely. Maintains constant performance
Signed-off-by: azrim <mirzaspc@gmail.com>