A few wakelocks tend to get stuck for no apparent reason. Blocking them
isn't necessary and sometimes blocking them breaks basic
functionality.
Wakelocks like "tx_swr_ctrl" tend to get stuck if earphones are kept
connected, which drains the battery massively.
Test: Keep earphones plugged in and leave the device for a few hours.
Expected result: "tx_swr_ctrl" does not get stuck.
Actual result: Patch is working as expected.
Change-Id: I5296990a84ab44cf6e449d6535b8b99408c415c8
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: azrim <mirzaspc@gmail.com>
This can't and shouldn't be negative
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
In order to improve the performance for display usecases,
remove the wait_for_completion bits in the TCS commands generated
when sending the batch requests for WAKE and SLEEP sets. This
will also result in a Fire-and-Forget batch request when sent
to RPMH.
Change-Id: I8b8bedf51bf086fcb83ae7fa9e21e70f036b4012
Signed-off-by: David Dai <daidavid1@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Instead of checking for blacklisted nodes for each node as
we traverse through the graph and only considering directly connected
nodes, create a list of blacklisted nodes at the source node that
persists throughout the path finding. The previous implementation
wasn't very useful in preventing circular paths for some masters
as it would block out every request from an upstream master, which is
equivalent to not listing the connection to start with. Listing
blacklisted nodes at the master allows certain connections to traverse
downstream through the gateway while blocking illegal paths from certain
masters.
Change-Id: I6ae4660b05b08a66835f7936a948dc1e17ba218d
Signed-off-by: David Dai <daidavid1@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Add branch prediction hints in an effort to make
the data path more efficient.
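As an illustration only: hints of this kind are usually written with the kernel's likely()/unlikely() macros, which wrap the compiler builtin __builtin_expect(). The sketch below is a userspace approximation for GCC/Clang. process_packet() and its arguments are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical data-path helper: returns 0 on success, -1 on bad input. */
static int process_packet(const unsigned char *buf, size_t len)
{
	/* The error path is rare, so hint the compiler to keep it off
	 * the straight-line path. */
	if (unlikely(buf == NULL || len == 0))
		return -1;

	/* Hot path: the common case falls straight through. */
	return 0;
}
```

The hints only influence code layout; they do not change the result of the condition.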
Acked-by: Tal Gelbard <tgelbard@qti.qualcomm.com>
Change-Id: I3bd2157ee6c263d89de9425c7a0249370ab918fc
Signed-off-by: Amir Levy <alevy@codeaurora.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
We have queued spin locks, so spin lock starvation isn't a problem.
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correctness
debug code is enabled.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-12-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
A side-effect to the old code is that now SCHED_DEADLINE is also
recognized.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171004154901.26904-2-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This helper returns true if a task has elevated priority which is true
for RT tasks (SCHED_RR and SCHED_FIFO) and also for SCHED_DEADLINE.
A task which runs at RT priority due to PI-boosting is not considered as
one with elevated priority.
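A minimal userspace sketch of such a check, assuming it looks only at the scheduling policy. struct fake_task and has_elevated_policy() are hypothetical names; the policy numbers mirror the Linux UAPI values and are redefined locally so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Policy numbers as used by the Linux UAPI, redefined for the sketch. */
enum { POL_NORMAL = 0, POL_FIFO = 1, POL_RR = 2, POL_DEADLINE = 6 };

struct fake_task {
	int policy;	/* policy the task was configured with */
	int prio;	/* effective priority, may be PI-boosted */
};

static bool has_elevated_policy(const struct fake_task *t)
{
	/* Only the policy is inspected, so a PI-boosted prio is ignored. */
	return t->policy == POL_FIFO || t->policy == POL_RR ||
	       t->policy == POL_DEADLINE;
}
```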
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171004154901.26904-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The handling of a priority inversion between timer cancelling and a not
well defined possible preemption of softirq kthread is not very clear.
Especially on the posix timers side it's unclear why there is a specific RT
wait callback.
All the nice explanations can be found in the initial changelog of
f61eff83cec9 ("hrtimer: Prepare support for PREEMPT_RT").
Extract the detailed information from there and put it into comments.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190820132656.GC2093@lenoir
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_force_reprogram() and hrtimer_interrupt() invoke
__hrtimer_get_next_event() to find the earliest expiry time of hrtimer
bases. __hrtimer_get_next_event() does not update
cpu_base::[softirq_]expires_next to preserve reprogramming logic. That
needs to be done at the callsites.
hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when
the first expiring timer is a softirq timer and the soft interrupt is not
activated. That's wrong because cpu_base::softirq_expires_next is left
stale when the first expiring timer of all bases is a timer which expires
in hard interrupt context. hrtimer_interrupt() never updates
cpu_base::softirq_expires_next which is wrong too.
That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and
the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting
CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that
timer before the stale cpu_base::softirq_expires_next.
cpu_base::softirq_expires_next is cached to make the check for raising the
soft interrupt fast. In the above case the soft interrupt won't be raised
until clock monotonic reaches the stale cpu_base::softirq_expires_next
value. That's incorrect, but what's worse is that if the softirq timer
becomes the first expiring timer of all clock bases after the hard expiry
timer has been handled the reprogramming of the clockevent from
hrtimer_interrupt() will result in an interrupt storm. That happens because
the reprogramming does not use cpu_base::softirq_expires_next, it uses
__hrtimer_get_next_event() which returns the actual expiry time. Once clock
MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is
raised and the storm subsides.
Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard
bases separately, update softirq_expires_next and handle the case when a
soft expiring timer is the first of all bases by comparing the expiry times
and updating the required cpu base fields. Split this functionality into a
separate function to be able to use it in hrtimer_interrupt() as well
without copy paste.
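The separate evaluation can be sketched in userspace as follows. This is a reduced model under stated assumptions, not the kernel function: struct sketch_base and sketch_update_next_event() are hypothetical, soft_first/hard_first stand in for the per-base lookups, and only softirq_expires_next is a field name taken from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_base {
	bool    softirq_activated;	/* softirq already raised/running */
	int64_t softirq_expires_next;	/* cached first soft expiry */
	int64_t soft_first;		/* earliest soft timer, INT64_MAX if none */
	int64_t hard_first;		/* earliest hard timer, INT64_MAX if none */
};

/* Returns the expiry time the clock event device should be armed for. */
static int64_t sketch_update_next_event(struct sketch_base *cb)
{
	int64_t soft = INT64_MAX;

	if (!cb->softirq_activated) {
		soft = cb->soft_first;
		/* Refresh the cache even when a hard timer expires first. */
		cb->softirq_expires_next = soft;
	}

	/* A soft timer that expires first decides the programmed expiry. */
	return cb->hard_first > soft ? soft : cb->hard_first;
}
```

Because the cached value is refreshed on every call, it cannot stay stale when the first expiring timer of all bases is a hard timer.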
Fixes: 5da70160462e ("hrtimer: Implement support for softirq based hrtimers")
Reported-by: Mikael Beckius <mikael.beckius@windriver.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
A hrtimer can be released in its callback, but lockdep_hrtimer_exit()
dereferences the pointer after the callback returns, i.e. a potential use
after free.
Retrieve the context in which the hrtimer expires before the callback is
invoked and use it in lockdep_hrtimer_exit().
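The pattern can be sketched in userspace as below. struct sketch_timer, free_self() and run_timer() are hypothetical names used only for illustration; the point is that the value needed after the callback is sampled before the callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_timer {
	bool is_hard;				/* expiry context of the timer */
	void (*fn)(struct sketch_timer *);	/* callback, may free the timer */
};

/* Callback that releases its own timer. */
static void free_self(struct sketch_timer *t)
{
	free(t);
}

/* Runs the callback and returns the context reported on exit. */
static bool run_timer(struct sketch_timer *t)
{
	bool expires_hard = t->is_hard;	/* sample before the callback */

	t->fn(t);			/* 't' may be gone after this call */

	return expires_hard;		/* no dereference of 't' here */
}
```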
Fixes: 40db173965c0 ("lockdep: Add hrtimer context tracing bits")
Reported-by: syzbot+62c155c276e580cfb606@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200331201849.fkp2siy3vcdqvqlz@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Set current->irq_config = 1 for hrtimers which are not marked to expire in
hard interrupt context during hrtimer_init(). These timers will expire in
softirq context on PREEMPT_RT.
Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context then the timer callback is not
supposed to acquire a regular spinlock instead of a raw_spinlock in the
expiry callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.534508206@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
do_div() does a 64-by-32 division at least on 32bit platforms, while the
divisor 'div' is explicitly cast to unsigned long, thus 64-bit on 64-bit
platforms.
The code already ensures that the divisor is less than 2^32. Hence the
proper cast type is u32.
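A small userspace model of why the u32 cast is safe, assuming the caller reduces the divisor with a shift loop first. div64_by_32() and scaled_div() are hypothetical helpers and only approximate the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of a 64-by-32 division: the divisor is 32 bits wide. */
static uint64_t div64_by_32(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Hypothetical caller: shift both operands until the divisor fits in
 * 32 bits, then do the 64-by-32 division. 'div' must be non-zero. */
static uint64_t scaled_div(uint64_t dividend, uint64_t div)
{
	while (div >> 32) {
		dividend >>= 1;
		div >>= 1;
	}
	/* div < 2^32 here, so the cast to a 32-bit type loses nothing. */
	return div64_by_32(dividend, (uint32_t)div);
}
```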
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200130130851.29204-1-wenyang@linux.alibaba.com
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Followup to commit dd2261ed45aa ("hrtimer: Protect lockless access
to timer->base")
lock_hrtimer_base() fetches timer->base without lock exclusion.
Compiler is allowed to read timer->base twice (even if considered dumb)
which could end up trying to lock migration_base and return
&migration_base.
  base = timer->base;
  if (likely(base != &migration_base)) {
       /* compiler reads timer->base again, and now (base == &migration_base) */
       raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
       if (likely(base == timer->base))
            return base; /* == &migration_base ! */
Similarly the write sides must use WRITE_ONCE() to avoid store tearing.
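For illustration, simplified userspace approximations of the two macros (GCC/Clang, relying on __typeof__) and a reader that loads the pointer exactly once. struct base, struct timer and peek_base() are reduced, hypothetical stand-ins and not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximations of the kernel macros: the volatile access
 * forces exactly one load or store of the object. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct base { int id; };
struct timer { struct base *base; };

static struct base migration_base = { -1 };

/* Returns the timer's base, or NULL while the timer is being migrated. */
static struct base *peek_base(struct timer *t)
{
	/* Single load: the comparison and the return use the same value. */
	struct base *b = READ_ONCE(t->base);

	return b != &migration_base ? b : NULL;
}
```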
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008173204.180879-1-edumazet@google.com
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The recent change to avoid taking the expiry lock when a timer is currently
migrated failed to add a bracket at the end of the if statement leading to
compile errors. Since that commit the variable `migration_base' is always
used but it is only available on SMP configuration thus leading to another
compile error. The changelog says "The timer base and base->cpu_base
cannot be NULL in the code path", so it is safe to limit this check to SMP
configurations only.
Add the missing bracket to the if statement and hide `migration_base'
behind CONFIG_SMP bars.
[ tglx: Mark the functions inline ... ]
Fixes: 68b2c8c1e4210 ("hrtimer: Don't take expiry_lock when timer is currently migrated")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190904145527.eah7z56ntwobqm6j@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
migration_base is used as a placeholder when an hrtimer is migrated to a
different CPU. In the case that hrtimer_cancel_wait_running() hits a timer
which is currently migrated it would pointlessly acquire the expiry lock of
the migration base, which is not even initialized.
Surely it could be initialized, but there is absolutely no point in
acquiring this lock because the timer is guaranteed not to run its
callback for which the caller waits to finish on that base. So it would
just do the inc/lock/dec/unlock dance for nothing.
As the base switch is short and non-preemptible, there is no issue when the
wait function returns immediately.
The timer base and base->cpu_base cannot be NULL in the code path which is
invoking that, so just replace those checks with a check whether base is
migration base.
[ tglx: Updated from RT patch. Massaged changelog. Added comment. ]
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190821092409.13225-4-julien.grall@arm.com
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The update to timer->base is protected by base->cpu_base->lock.
However, hrtimer_cancel_wait_running() accesses it locklessly. So the
compiler is allowed to refetch timer->base which can cause havoc when the
timer base is changed concurrently.
Use READ_ONCE() to prevent this.
[ tglx: Adapted from a RT patch ]
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190821092409.13225-2-julien.grall@arm.com
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Posix timer delete retry loops are affected by the same priority inversion
and live lock issues as the other timers.
Provide a RT specific synchronization function which keeps a reference to
the timer by holding rcu read lock to prevent the timer from being freed,
dropping the timer lock and invoking the timer specific wait function via a
new callback.
This does not yet cover posix CPU timers because they need more special
treatment on PREEMPT_RT.
[ This is folded into the original attempt which did not use a callback. ]
Originally-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190819143801.656864506@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
As a preparatory step for adding the PREEMPT RT specific synchronization
mechanism to wait for a running timer callback, rework the timer cancel
retry loops so they call a common function. This allows trivial
substitution in one place.
Originally-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190730223828.874901027@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:
- If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbound priority inversion.
- If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.
To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.
This addresses both the priority inversion and the live lock issues.
The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.
[ tglx: Refactored it for mainline ]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.737767218@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for
hard interrupt expiry mode are moved into soft interrupt context either for
latency reasons or because the hrtimer callback takes regular spinlocks or
invokes other functions which are not suitable for hard interrupt context
on PREEMPT_RT.
The hrtimer_sleeper callback is RT compatible in hard interrupt context,
but there is a latency concern: Untrusted userspace can spawn many threads
which arm timers for the same expiry time on the same CPU. On expiry that
causes a latency spike due to the wakeup of a gazillion threads.
OTOH, privileged real-time user space applications rely on the low latency
of hard interrupt wakeups. These syscall related wakeups are all based on
hrtimer sleepers.
If the current task is in a real-time scheduling class, mark the mode for
hard interrupt expiry.
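A minimal sketch of that decision, assuming it is taken when the sleeper is initialized. SK_MODE_HARD and sk_sleeper_mode() are hypothetical names and the bit value is local to the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mode bit; the value is local to this sketch. */
#define SK_MODE_HARD 0x08u

static unsigned int sk_sleeper_mode(unsigned int mode, bool preempt_rt,
				    bool task_is_rt)
{
	/* Only RT kernels defer unmarked timers, so only they need this. */
	if (preempt_rt && task_is_rt)
		mode |= SK_MODE_HARD;

	return mode;
}
```

Sleepers of non real-time tasks keep their mode and are therefore expired in soft interrupt context on PREEMPT_RT.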
[ tglx: Split out of a larger combo patch. Added changelog ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.645792403@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
On PREEMPT_RT not all hrtimers can be expired in hard interrupt context
even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they
take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer
most hrtimers' expiry into softirq context.
hrtimers marked with HRTIMER_MODE_HARD must be kept in hard interrupt
context expiry mode. Add the required logic.
No functional change for PREEMPT_RT=n kernels.
[ tglx: Split out of a larger combo patch. Added changelog ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.551967692@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer
which is marked for softirq expiry is not queued in the hard interrupt base
and vice versa.
When PREEMPT_RT is enabled, timers which are not explicitly marked to
expire in hard interrupt context are deferred to the soft interrupt. So
the regular check would trigger.
Change the check so that, when PREEMPT_RT is enabled, it verifies that
timers marked for hard interrupt expiry are not queued for soft interrupt
expiry, and that unmarked or softirq marked timers are not queued for
expiry in hard interrupt context.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_sleepers will gain a scheduling class dependent treatment on
PREEMPT_RT. Create a wrapper around hrtimer_start_expires() to make that
possible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer
object which is embedded into the hrtimer_sleeper.
Combine the initialization and spare a function call. Fixup all call sites.
This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper
specific initializations of the embedded hrtimer without modifying any of
the call sites.
No functional change.
[ anna-maria: Minor cleanups ]
[ tglx: Adapted to the removal of the task argument of
hrtimer_init_sleeper() and trivial polishing.
Folded a fix from Stephen Rothwell for the vsoc code ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
All callers hand in 'current' and that's the only task pointer which
actually makes sense. Remove the task argument and set current in the
function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
That gets rid of this warning:
./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent.
and displays nicely both at the source code and at the produced
documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+samsung@kernel.org
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Use timerqueue_iterate_next() to get to the next timer in
__hrtimer_next_event_base() without browsing the timerqueue
details directly.
No intentional changes in functionality.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The next set of changes will need to compute the time to the next
hrtimer event over all hrtimers except for the scheduler tick one.
To that end introduce a new helper function,
hrtimer_next_event_without(), for computing the time until the next
hrtimer event over all timers except for one and modify the underlying
code in __hrtimer_next_event_base() to prepare it for being called by
that new function.
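The idea can be sketched over a flat array instead of the kernel's timer queues. struct sketch_timer and next_event_without() are hypothetical; the real helper walks the per-base queues rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXPIRY_MAX INT64_MAX	/* stand-in for "no timer pending" */

struct sketch_timer { int64_t expires; };

/* Earliest expiry among timers[0..n), skipping 'exclude' (may be NULL). */
static int64_t next_event_without(const struct sketch_timer *timers, size_t n,
				  const struct sketch_timer *exclude)
{
	int64_t earliest = EXPIRY_MAX;

	for (size_t i = 0; i < n; i++) {
		if (&timers[i] == exclude)
			continue;	/* e.g. the scheduler tick timer */
		if (timers[i].expires < earliest)
			earliest = timers[i].expires;
	}
	return earliest;
}
```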
No intentional changes in functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents a long delayed hrtimer interrupt from causing a
continuous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
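A reduced model of the reset, for illustration only. hang_detected and expires_next follow names used in these changelogs; struct sketch_cpu_base, sketch_cpu_prepare() and sketch_can_reprogram() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the per-CPU hrtimer state. */
struct sketch_cpu_base {
	bool    hang_detected;
	int64_t expires_next;
};

/* Called when a CPU comes online: start from a clean state. */
static void sketch_cpu_prepare(struct sketch_cpu_base *cb)
{
	cb->hang_detected = false;	/* a stale flag would block rearming */
	cb->expires_next  = INT64_MAX;	/* nothing programmed yet */
}

/* A newly enqueued timer may only reprogram when no hang is pending. */
static bool sketch_can_reprogram(const struct sketch_cpu_base *cb)
{
	return !cb->hang_detected;
}
```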
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
All prerequisites to handle hrtimers for expiry in either hard or soft
interrupt context are in place.
Add the missing bit in hrtimer_init() which associates the timer to the
hard or the softirq clock base.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-30-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer callbacks are always invoked in hard interrupt context. Several
users in tree require soft interrupt context for their callbacks and
achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
the tasklet in hard interrupt context and the tasklet callback gets invoked
in softirq context later.
That's suboptimal and aside of that the real-time patch moves most of the
hrtimers into softirq context. So adding native support for hrtimers
expiring in softirq context is a valuable extension for both mainline and
the RT patch set.
Each valid hrtimer clock id has two associated hrtimer clock bases: one for
timers expiring in hardirq context and one for timers expiring in softirq
context.
Implement the functionality to associate a hrtimer with the hard or softirq
related clock bases and update the relevant functions to take them into
account when the next expiry time needs to be evaluated.
Add a check into the hard interrupt context handler functions to check
whether the first expiring softirq based timer has expired. If it's expired
the softirq is raised and the accounting of softirq based timers to
evaluate the next expiry time for programming the timer hardware is skipped
until the softirq processing has finished. At the end of the softirq
processing the regular processing is resumed.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The softirq based hrtimers can utilize most of the existing hrtimer
functions, but need to operate on a different data set.
Add an 'active_mask' parameter to various functions so the hard and soft bases
can be selected. Fixup the existing callers and hand in the ACTIVE_HARD
mask.
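A sketch of how such a mask can select between the two sets of bases. The layout (four hard bases followed by four soft bases), MASK_HARD, MASK_SOFT and next_expiry() are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum { NBASES = 4 };	/* illustrative: 4 hard bases, then 4 soft bases */

#define MASK_HARD ((1u << NBASES) - 1)
#define MASK_SOFT (MASK_HARD << NBASES)

/* Earliest expiry among bases that are both active and selected. */
static int64_t next_expiry(const int64_t expires[2 * NBASES],
			   unsigned int active, unsigned int active_mask)
{
	unsigned int pending = active & active_mask;
	int64_t earliest = INT64_MAX;

	for (unsigned int i = 0; i < 2 * NBASES; i++) {
		if ((pending & (1u << i)) && expires[i] < earliest)
			earliest = expires[i];
	}
	return earliest;
}
```

Existing callers that only care about hard interrupt expiry would pass the hard mask and see unchanged behaviour.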
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-28-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
__run_hrtimer() is called with the hrtimer_cpu_base.lock held and
interrupts disabled. Before invoking the timer callback the base lock is
dropped, but interrupts stay disabled.
The upcoming support for softirq based hrtimers requires that interrupts
are enabled before the timer callback is invoked.
To avoid code duplication, take hrtimer_cpu_base.lock with
raw_spin_lock_irqsave(flags) at the call site and hand in the flags as
a parameter. So raw_spin_unlock_irqrestore() before the callback invocation
will either keep interrupts disabled in interrupt context or restore to
interrupt enabled state when called from softirq context.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-26-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Preparatory patch for softirq based hrtimers. To avoid code duplication,
factor out the __hrtimer_start_range_ns() function from hrtimer_start_range_ns().
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-24-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_reprogram() must have access to the hrtimer_clock_base of the new
first expiring timer to access hrtimer_clock_base.offset for adjusting the
expiry time to CLOCK_MONOTONIC. This is required to evaluate whether the
new left most timer in the hrtimer_clock_base is the first expiring timer
of all clock bases in a hrtimer_cpu_base.
The only user of hrtimer_reprogram() is hrtimer_start_range_ns(), which has
a pointer to the hrtimer_clock_base already and hands it in as a parameter. But
hrtimer_start_range_ns() will be split for the upcoming support for softirq
based hrtimers to avoid code duplication and will lose the direct access to
the clock base pointer.
Instead of handing in timer and timer->base as parameters, remove the base
parameter from hrtimer_reprogram() and retrieve the clock base internally.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-23-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The current decision whether a timer can be queued on a remote CPU checks
for timer->expiry <= remote_cpu_base.expires_next.
This is too restrictive because a timer with the same expiry time as an
existing timer will be enqueued on right-hand size of the existing timer
inside the rbtree, i.e. behind the first expiring timer.
So its safe to allow enqueuing timers with the same expiry time as the
first expiring timer on a remote CPU base.
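The relaxed comparison can be sketched as follows. The helper names are
hypothetical and plain integers stand in for ktime_t; a return value of true
means the timer must not be queued on the remote CPU base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: an equal expiry time already forbids the remote enqueue. */
static bool must_stay_local_old(int64_t expiry, int64_t remote_expires_next)
{
	return expiry <= remote_expires_next;
}

/*
 * Relaxed check: only a strictly earlier expiry forbids the remote enqueue.
 * A timer with an equal expiry is queued behind the first expiring timer,
 * so the remote CPU's next event does not change.
 */
static bool must_stay_local_new(int64_t expiry, int64_t remote_expires_next)
{
	return expiry < remote_expires_next;
}
```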
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-22-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
hrtimer_reprogram() is conditionally invoked from hrtimer_start_range_ns()
when hrtimer_cpu_base.hres_active is true.
In the !hres_active case there is a special condition for the nohz_active
case:
If the newly enqueued timer expires before the first expiring timer on a
remote CPU then the remote CPU needs to be notified and woken up from a
NOHZ idle sleep to take the new first expiring timer into account.
Previous changes have already established the prerequisites to make the
remote enqueue behaviour the same whether high resolution mode is active or
not:
If the to be enqueued timer expires before the first expiring timer on a
remote CPU, then it cannot be enqueued there.
This was done for the high resolution mode because there is no way to
access the remote CPU timer hardware. The same is true for NOHZ, but was
handled differently by unconditionally enqueuing the timer and waking up
the remote CPU so it can reprogram its timer. Again there is no compelling
reason for this difference.
hrtimer_check_target(), which makes the 'can remote enqueue' decision, is
already unconditional, but not yet functional because nothing updates
hrtimer_cpu_base.expires_next in the !hres_active case.
To unify this the following changes are required:
 1) Make the store of the new first expiry time unconditional in
    hrtimer_reprogram() and check __hrtimer_hres_active() before proceeding
    to the actual hardware access. This check also lets the compiler
    eliminate the rest of the function in case of CONFIG_HIGH_RES_TIMERS=n.
 2) Invoke hrtimer_reprogram() unconditionally from
    hrtimer_start_range_ns().
 3) Remove the remote wakeup special case for the !high_res && nohz_active
    case.
Confine the timers_nohz_active static key to timer.c which is the only user
now.
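A rough sketch of the ordering described in step 1, under simplifying
assumptions: the struct and function below are hypothetical, integers stand in
for ktime_t, and a counter replaces the real clock event device access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for hrtimer_cpu_base. */
struct cpu_base_sketch {
	bool hres_active;	/* high resolution mode active? */
	int64_t expires_next;	/* first expiry known for this CPU */
	int hw_programmed;	/* counts simulated hardware accesses */
};

static void reprogram_sketch(struct cpu_base_sketch *cpu_base, int64_t expires)
{
	/* Nothing to do if the new timer is not the first one to expire. */
	if (expires >= cpu_base->expires_next)
		return;

	/* The store is unconditional, so the remote enqueue check can use it. */
	cpu_base->expires_next = expires;

	/* Only high resolution mode proceeds to the hardware access. */
	if (!cpu_base->hres_active)
		return;

	cpu_base->hw_programmed++;
}
```

With hres_active false the function still records the new first expiry time
but never reaches the hardware part, which mirrors the behaviour the commit
describes for CONFIG_HIGH_RES_TIMERS=n.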
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-21-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When the first hrtimer on the current CPU is removed,
hrtimer_force_reprogram() is invoked but only when
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set.
hrtimer_force_reprogram() updates hrtimer_cpu_base.expires_next and
reprograms the clock event device. When CONFIG_HIGH_RES_TIMERS=y and
hrtimer_cpu_base.hres_active is set, a pointless hrtimer interrupt can be
prevented.
hrtimer_check_target() makes the 'can remote enqueue' decision. As soon as
hrtimer_check_target() is unconditionally available and
hrtimer_cpu_base.expires_next is updated by hrtimer_reprogram(),
hrtimer_force_reprogram() needs to be available unconditionally as well to
prevent the following scenario with CONFIG_HIGH_RES_TIMERS=n:
 - the first hrtimer on this CPU is removed and hrtimer_force_reprogram() is
   not executed
 - CPU goes idle (next timer is calculated and hrtimers are taken into
   account)
 - a hrtimer is enqueued remotely on the idle CPU: hrtimer_check_target()
   compares expiry value and hrtimer_cpu_base.expires_next. The expiry value
   is after expires_next, so the hrtimer is enqueued. This timer will fire
   late, if it expires before the effective first hrtimer on this CPU and
   the comparison was with an outdated expires_next value.
To prevent this scenario, make hrtimer_force_reprogram() unconditional
except the effective reprogramming part, which gets eliminated by the
compiler in the CONFIG_HIGH_RES_TIMERS=n case.
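The scenario above can be illustrated with a small sketch. All names are
hypothetical, integers stand in for ktime_t, the remaining timers are passed
as a plain array instead of being walked in the rbtree, and the remote check
assumes a timer is refused only when it expires strictly before expires_next.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for hrtimer_cpu_base. */
struct idle_cpu_base_sketch {
	bool hres_active;	/* high resolution mode active? */
	int64_t expires_next;	/* first expiry known for this CPU */
	int hw_programmed;	/* counts simulated hardware accesses */
};

/* Recompute expires_next from the timers that are still queued. */
static void force_reprogram_sketch(struct idle_cpu_base_sketch *cpu_base,
				   const int64_t *remaining, size_t count)
{
	int64_t next = INT64_MAX;	/* no timer queued */
	size_t i;

	for (i = 0; i < count; i++)
		if (remaining[i] < next)
			next = remaining[i];

	/* The bookkeeping update happens in every configuration. */
	cpu_base->expires_next = next;

	/* Only high resolution mode reprograms the (simulated) hardware. */
	if (!cpu_base->hres_active)
		return;

	cpu_base->hw_programmed++;
}

/* True if a remote enqueue with this expiry has to be refused. */
static bool remote_enqueue_refused(const struct idle_cpu_base_sketch *cpu_base,
				   int64_t expiry)
{
	return expiry < cpu_base->expires_next;
}
```

If the first timer (expiry 100) is removed without the update, a remote timer
expiring at 300 is accepted against the stale value although the effective
first timer expires at 500. After the update the same timer is refused.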
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-20-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>