geo->keylen cannot be larger than 4. So we might as well make
fixed-size allocations.
Given the one remaining user, geo->keylen cannot even be larger than 1.
Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit
keys. But let's not break the code if we don't have to.
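For illustration only (the constant name below is hypothetical, not the
actual lib/btree.c identifier), bounding the key length lets the key
scratch buffers become fixed-size arrays instead of per-call allocations:

    /* Hypothetical sketch: with geo->keylen bounded, use a fixed-size
     * buffer instead of allocating a variable-length one per operation.
     */
    #define BTREE_MAX_KEYLEN 4    /* in longs; hypothetical bound */

    unsigned long key[BTREE_MAX_KEYLEN];

    memcpy(key, srckey, geo->keylen * sizeof(unsigned long));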
Change-Id: I124d7095003cd140c17b18c2038f67de9ffc9328
Signed-off-by: Joern Engel <joern@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 39c46e136d13e4382d3aea16ac77e12654a799e7.
Change-Id: If43f0f89953b5ac9992d227b7663fdafbac6082c
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 21f963a0969bfd0b4015a8a4b34abe704a351d00.
Change-Id: Ia24ae592faba8d21a344cfd0d14f76d8716d36e3
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 341180065f131963128c33fb75bd74d0ee551e5e.
Change-Id: I7ec85a9d90d689dcf82d080c2f6e5fdd805d1477
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Scheduler raises a SCHED_SOFTIRQ to trigger a load balancing event
from the IPI handler on the idle CPU. If the SMP function is invoked
from an idle CPU via flush_smp_call_function_queue(), then the HARD-IRQ
flag is not set and raise_softirq_irqoff() needlessly wakes ksoftirqd,
because soft interrupts are handled before ksoftirqd gets on the CPU.
Adding a trace_printk() in nohz_csd_func() at the spot of raising
SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
and softirq_entry (for the SCHED_SOFTIRQ vector alone) helps observe the
current behavior:
<idle>-0 [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ from nohz_csd_func
<idle>-0 [000] dN.4.: sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
<idle>-0 [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]
<idle>-0 [000] .Ns1.: softirq_exit: vec=7 [action=SCHED]
<idle>-0 [000] d..2.: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=16 next_prio=120
ksoftirqd/0-16 [000] d..2.: sched_switch: prev_comm=ksoftirqd/0 prev_pid=16 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
...
Use __raise_softirq_irqoff() to raise the softirq. The SMP function call
is always invoked on the requested CPU in an interrupt handler. It is
guaranteed that soft interrupts are handled at the end.
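As a minimal sketch of the change in nohz_csd_func() (kernel/sched/core.c;
only the raise itself is shown, the surrounding context is abridged):

    /* nohz_csd_func() runs in hard-IRQ context on the target CPU and
     * pending softirqs are flushed right after the SMP call handling,
     * so there is no need to involve ksoftirqd at all:
     */
    __raise_softirq_irqoff(SCHED_SOFTIRQ);   /* was: raise_softirq_irqoff() */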
Following are the observations with the changes when enabling the same
set of events:
<idle>-0 [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ for nohz_idle_balance
<idle>-0 [000] dN.1.: softirq_raise: vec=7 [action=SCHED]
<idle>-0 [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]
No unnecessary ksoftirqd wakeups are seen from idle task's context to
service the softirq.
Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119054432.6405-5-kprateek.nayak@amd.com
Change-Id: I52f3ccc2cca851e52f557f4c41a15e3b289d45e9
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
optimizes IPIs to idle CPUs in TIF_POLLING_NRFLAG mode by setting the
TIF_NEED_RESCHED flag in idle task's thread info and relying on
flush_smp_call_function_queue() in idle exit path to run the
call-function. A softirq raised by the call-function is handled shortly
after in do_softirq_post_smp_call_flush() but the TIF_NEED_RESCHED flag
remains set and is only cleared later when schedule_idle() calls
__schedule().
The need_resched() check in _nohz_idle_balance() exists to bail out of load
balancing if another task has woken up on the CPU currently in charge of
idle load balancing, which is being processed in SCHED_SOFTIRQ context.
Since the optimization mentioned above overloads the interpretation of
TIF_NEED_RESCHED, check for idle_cpu() before the existing need_resched()
check. The combination can catch a genuine task wakeup on an idle CPU
processing SCHED_SOFTIRQ from do_softirq_post_smp_call_flush(), as well as
the case where ksoftirqd needs to be preempted as a result of a new task
wakeup or slice expiry.
In the case of PREEMPT_RT or threadirqs, although idle load balancing
may be inhibited in some cases on the ilb CPU, the fact that ksoftirqd
is the only fair task going back to sleep will trigger a newidle balance
on the CPU, which will alleviate any existing imbalance that the
inhibited idle balance failed to address.
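A sketch of the resulting check in _nohz_idle_balance() (kernel/sched/fair.c),
with the enclosing loop and abort path omitted:

    /*
     * Bail out of the remaining nohz idle balance only on a genuine
     * wakeup: with the IPI optimization, TIF_NEED_RESCHED can be set
     * while this CPU is still idle.
     */
    if (!idle_cpu(this_cpu) && need_resched())
            goto abort;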
Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119054432.6405-4-kprateek.nayak@amd.com
Change-Id: I32090ea646a2fe7df8b74bb8aead3ca94dc05467
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The rmb() memory barrier in the generic idle loop function do_idle() is
not needed: it does not order any load instructions. Remove it, as a
needless rmb() can have a performance impact.
The rmb() was introduced by the tglx/history.git commit f2f1b44c75c4
("[PATCH] Remove RCU abuse in cpu_idle()") to order the loads between
cpu_idle_map and pm_idle. It pairs with wmb() in function cpu_idle_wait().
And then with the removal of cpu_idle_state in function cpu_idle() and
wmb() in function cpu_idle_wait() in commit 783e391b7b5b ("x86: Simplify
cpu_idle_wait"), rmb() no longer has a reason to exist.
After that, commit d16699123434 ("idle: Implement generic idle function")
implemented a generic idle function cpu_idle_loop() which resembles the
functionality found in arch/, and retained the rmb() in the generic idle
loop in kernel/cpu/idle.c.
Finally, commit cf37b6b48428 ("sched/idle: Move cpu/idle.c to
sched/idle.c") moved cpu/idle.c to sched/idle.c, and commit c1de45ca831a
("sched/idle: Add support for tasks that inject idle") renamed
cpu_idle_loop() to do_idle().
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241009093745.9504-1-quic_zhonhan@quicinc.com
Change-Id: I7a57f4796f2ab451b14290de2f7e6255823a928d
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
In extreme test scenarios, the 14th field (utime) in /proc/xx/stat can be
greater than sum_exec_runtime:
utime = 18446744073709518790 ns, rtime = 135989749728000 ns
In cputime_adjust(), stime becomes greater than rtime due to a
mul_u64_u64_div_u64() precision problem:
before calling mul_u64_u64_div_u64():
stime = 175136586720000, rtime = 135989749728000, utime = 1416780000
after calling mul_u64_u64_div_u64():
stime = 135989949653530
An unsigned underflow occurs because rtime is less than stime:
utime = rtime - stime = 135989749728000 - 135989949653530
                      = -199925530
                      = (u64)18446744073709518790
Trigger conditions:
1) The user task runs in kernel mode most of the time
2) ARM64 architecture
3) TICK_CPU_ACCOUNTING=y and CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
Fix the mul_u64_u64_div_u64() precision loss by clamping stime to rtime.
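A sketch of the clamp in cputime_adjust() (kernel/sched/cputime.c); the
rest of the function is unchanged:

    stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
    /*
     * mul_u64_u64_div_u64() can approximate on some architectures;
     * since stime <= stime + utime, the exact result can never exceed
     * rtime, so enforce that bound here.
     */
    if (unlikely(stime > rtime))
            stime = rtime;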
Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()")
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20240726023235.217771-1-zhengzucheng@huawei.com
Change-Id: Ic55a6ec98e36583f170f5ac660113b02ae607069
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.
This logic was inconsistently applied to the hrtimers, leading to delays
for realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
procfs reported incorrect (non-zero) timerslack_ns values.
This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":
Timer slack is not applied to threads that are scheduled under a
real-time scheduling policy (see sched_setscheduler(2)).
The special handling of timerslack on rt tasks in downstream users
is removed as well.
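A minimal sketch of the idea, assuming it is applied where the scheduling
policy is set; the exact placement and surrounding code may differ from
the actual patch:

    /* Sketch: rt tasks get zero slack; leaving rt restores the default. */
    if (rt_policy(policy))
            p->timer_slack_ns = 0;
    else if (p->timer_slack_ns == 0)
            p->timer_slack_ns = p->default_timer_slack_ns;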
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com
[Sultan Alsawaf: backport to 6.1]
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: I8b2c81ffdeea181ab729935de71d1b0131c16ffc
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.
b0defa7ae03ec changed the load balancing logic to ignore env.max_loop if
all tasks examined to that point were pinned. The goal of the patch was
to make it more likely to be able to detach a task buried in a long list
of pinned tasks. However, this has the unfortunate side effect of
creating an O(n) iteration in detach_tasks(), as we now must fully
iterate every task on a cpu if all or most are pinned. Since this load
balance code is done with rq lock held, and often in softirq context, it
is very easy to trigger hard lockups. We observed such hard lockups with
a user who affined O(10k) threads to a single cpu.
When I discussed this with Vincent he initially suggested that we keep
the limit on the number of tasks to detach, but increase the number of
tasks we can search. However, after some back and forth on the mailing
list, he recommended we instead revert the original patch, as it seems
likely no one was actually getting hit by the original issue.
Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240620214450.316280-1-joshdon@google.com
Change-Id: I71ef744d417501639bcef230d958870de5081ea8
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change se->load.weight to se_weight(se) in the calculation for the
initial util_avg to avoid unnecessarily inflating the util_avg by 1024
times.
The reason is that se->load.weight has the unit/scale of the scaled-up
load, while cfs_rq->avg.load_avg has the unit/scale of the true task
weight (as mapped directly from the task's nice/priority value). With
CONFIG_32BIT, the scaled-up load is equal to the true task weight. With
CONFIG_64BIT, the scaled-up load is 1024 times the true task weight.
Thus, the current code may inflate the util_avg by 1024 times. The
follow-up capping will not allow the util_avg value to go wild, but the
calculation should have the correct logic.
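A sketch of the corrected initialization in post_init_entity_util_avg()
(kernel/sched/fair.c), abridged; the capping against the usual bound
computed earlier in the function is omitted:

    if (cfs_rq->avg.util_avg != 0) {
            /* se_weight(se) is in the same scale as cfs_rq->avg.load_avg;
             * se->load.weight would be 1024 times larger on 64-bit.
             */
            sa->util_avg  = cfs_rq->avg.util_avg * se_weight(se);
            sa->util_avg /= (cfs_rq->avg.load_avg + 1);
    }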
Signed-off-by: Dawei Li <daweilics@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Link: https://lore.kernel.org/r/20240315015916.21545-1-daweilics@gmail.com
Change-Id: I77a0b4d7d7fb810720addde31baba417ef38af0e
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
I have an RT task X at a high priority and cyclictest on each CPU with a
lower priority than X's. If X is active and each CPU wakes their own
cyclictest thread then it ends in a longer rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority, cyclictest is
next in line so there is nothing that can be done since the task with
the higher priority is not touched.
tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active, it
does nothing because it has _no_ pushable tasks on its runqueue. It then
checks rto_next_cpu() and decides to queue irq_work on the local CPU
because another CPU requested a push by incrementing the counter.
I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.
Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the number of issued rto_push_irq_work_func() invocations if
nothing can be done. As a result, the overloaded CPU is blocked less
often.
There are still cases where the "same job" is repeated several times
(for instance the current CPU needs to resched but didn't yet because
the irq-work is repeated a few times and so the old task remains on the
CPU) but the majority of requests end in tell_cpu_to_push() before an IPI
is issued.
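A sketch of the added check in rto_next_cpu() (kernel/sched/rt.c); the
enclosing loop is abridged:

    cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
    rd->rto_cpu = cpu;

    if (cpu < nr_cpu_ids) {
            /* Skip CPUs that currently have nothing to push away. */
            if (!has_pushable_tasks(cpu_rq(cpu)))
                    continue;
            return cpu;
    }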
Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Change-Id: I51731f3bee90080170e45a548282cbd0a3ec2e85
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The pm_qos idle wake-up mechanism currently wakes up *all* idle CPUs when
there's a pm_qos request change, instead of just the CPUs which are
affected by the change. This is horribly suboptimal and increases power
consumption by needlessly waking idled CPUs.
Additionally, pm_qos may kick CPUs which aren't even idle, since
wake_up_all_idle_cpus() only checks if a CPU is running the idle task,
which says nothing about whether or not the CPU is really in an idle state.
Optimize the pm_qos wake-ups by only sending IPIs to CPUs that are idle,
and by using arch_send_wakeup_ipi_mask() instead of wake_up_if_idle()
which is used under the hood in wake_up_all_idle_cpus(). Using IPI_WAKEUP
instead of IPI_RESCHEDULE, which is what wake_up_if_idle() uses behind the
scenes, has the benefit of doing zero work upon receipt of the IPI;
IPI_WAKEUP is designed purely for sending an IPI without a payload.
Determining which CPUs are idle is done efficiently with an atomic bitmask
instead of using the wake_up_if_idle() API, which checks the CPU's runqueue
in an RCU read-side critical section and under a spin lock. Not very
efficient in comparison to a simple, atomic bitwise operation. A cpumask
isn't needed for this because NR_CPUS is guaranteed to fit within a word.
CPUs are marked as idle as soon as IRQs are disabled in the idle loop,
since any IPI sent after that point will cause the CPU's idle attempt to
immediately exit (like when executing the wfi instruction). CPUs are marked
as not-idle as soon as they wake up in order to avoid sending redundant
IPIs to CPUs that are already awake.
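A minimal sketch of the bitmask bookkeeping, assuming NR_CPUS fits in a
single long; the names here are illustrative, not the actual patch:

    static atomic_long_t idle_cpus_mask;    /* illustrative name */

    /* Idle entry, right after local IRQs are disabled: */
    atomic_long_or(BIT(smp_processor_id()), &idle_cpus_mask);

    /* First thing on wakeup, to avoid redundant IPIs later: */
    atomic_long_andnot(BIT(smp_processor_id()), &idle_cpus_mask);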
Change-Id: I04c9e2bd9317357e16d8184a104fe603d0d2dab2
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below. The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.
The invocation within the IRQ-off region is problematic on PREEMPT_RT
because the function uses a spin_lock_t, which must not be used within
IRQ-disabled regions.
The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().
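A sketch of the resulting order in exec_mmap() (fs/exec.c), as described
above; the surrounding context is abridged:

    local_irq_enable();
    lru_gen_add_mm(mm);     /* now outside the IRQ-off region */
    task_unlock(tsk);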
Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de
Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I63ef837e43e727fd9223ad0e30170465b826a4bb
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
wakeup_flusher_threads() was added under the assumption that if a system
runs out of clean cold pages, it might want to write back dirty pages more
aggressively so that they can become clean and be dropped.
However, doing so can breach the rate limit a system wants to impose on
writeback, resulting in early SSD wearout.
Link: https://lkml.kernel.org/r/YzSiWq9UEER5LKup@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Reported-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Change-Id: Ib4def4286264de926b11ec5247185edc3a780619
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
To save energy, CASS may prefer non-idle CPUs for uclamp-boosted tasks in
order to pack them onto a single performance domain rather than spreading
them across multiple performance domains. This way, it is more likely that
only one performance domain is boosted to a higher P-state when there is
more than one uclamp-boosted task running.
However, when a task has a uclamp boost value that is below a CPU's minimum
capacity, it is nearly the same thing as not having a uclamp boost at all.
In spite of that, CASS may still prefer non-idle CPUs for tasks with such
bogus uclamp boost values. This is not only worse for latency, but also for
energy efficiency, since the load is spread less evenly across CPUs as a
result.
Therefore, don't pack tasks with uclamp boosts below a CPU's minimum
configured capacity, since such tasks do not force the CPU to run at a
higher P-state.
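Conceptually (illustrative pseudologic, not the actual CASS code;
capacity_min_of() is a hypothetical helper for the CPU's capacity at its
lowest configured P-state):

    /* Only treat the task as "boosted" for packing purposes when its
     * effective uclamp.min exceeds what the CPU already provides at its
     * minimum configured capacity.
     */
    bool boost = uclamp_eff_value(p, UCLAMP_MIN) > capacity_min_of(cpu);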
Change-Id: Ide8f62162723dc0c509fa5cccf92b8124f20f4aa
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The scheduler is unaware of the min_freq limit applied to a CPU, which is
useful information when predicting the frequency a CPU will run at for
energy efficiency purposes.
Export this information via arch_scale_min_freq_capacity() and wire it up
for arm64.
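A sketch of one possible wiring, modeled on how per-CPU frequency scale
factors are typically maintained on arm64; the names, signature, and
placement are illustrative and may differ from the actual patch:

    /* Illustrative sketch, not the actual implementation. */
    DEFINE_PER_CPU(unsigned long, min_freq_scale) = SCHED_CAPACITY_SCALE;

    #define arch_scale_min_freq_capacity(cpu) per_cpu(min_freq_scale, (cpu))

    /* Updated whenever the cpufreq policy minimum changes: */
    scale = (policy->min << SCHED_CAPACITY_SHIFT) / policy->cpuinfo.max_freq;
    for_each_cpu(cpu, policy->cpus)
            per_cpu(min_freq_scale, cpu) = scale;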
Change-Id: Icdff7628c095185280e95dd965d497e6f740c871
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 0a0414f8a65655f816f5e04cb997ef8738106c05.
Change-Id: Ia14ba25e11bec4927c60502ee8c4dad40b71cf24
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit d6e561f94c2a1c83186116d4e35b8300a41d6a22.
Change-Id: I712d1a2c14b45ab522a815c5decd60b4389633e0
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
After turning location services off (to actually turn the Wi-Fi device
off or on), users started reporting a 30-second delay when their PCIe
Wi-Fi device is brought back up.
This is due to Qualcomm's improper conditional umac_stop_complete_cb()
implementation that simply neglects `qdf_event_set(stop_evt)` when
QDF_ENABLE_TRACING is not defined.
As our local modification disables QDF_ENABLE_TRACING, this
implementation bug triggers a 30-second delayed reset.
All functions/APIs used within umac_stop_complete_cb() are available
without QDF_ENABLE_TRACING defined, so remove the conditional
implementation.
Fixes: bcd3d019d8e1 ("qcacld-3.0: Execute sme_stop and mac_stop in mc thread context")
Change-Id: I6055404c5df4e0232ea344e1c2669871e61cb9a7
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 09427173765ecb63836d49e4608ee2b65eb947df.
Change-Id: Iab647d5a3fc56a6e84eaded70252a0d736a5cd88
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit d0661f464d00db0cce80068cb1ea3a3d462b2bf9.
Change-Id: I8a1c946ea23485d0a1aac6755a741d24f2c03ca6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
SBalance manages IRQ affinity automatically. However, if SBalance is
not enabled, we restore HIF_IRQ_AFFINITY to maintain optimized
IRQ distribution.
Change-Id: I0f99803959fc7fe080184ea4dcea7d16ab70997a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If SBalance is enabled, IRQ affinity should be managed automatically.
Prevent userspace from modifying it in this case, but allow changes
when SBalance is disabled.
Change-Id: Ibf37bf258a2358ad8b982704e8f035bd9739866b
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 602aa3bba862bb7ff51bdf2c9303db4b057f5353.
Change-Id: I4517bdb857e7e1ab02749596dedcaa8220dc040a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 1b396d869a6da9fa864d4de8235f2d0afc7164c1.
Change-Id: I13b4629e9aefcd23da2e58ef534c1057f81059cd
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ifa5949aa44c5f6ceb8001c3105f1b3cf92fbefd5
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 4043298f9af526f1703b88c3dfbeae7a16e88425.
Change-Id: Idd5ef50ccc82197606c2a1851072d9056308fb19
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 06287f3167d25f3328e90ba5bf86de28761c0c1b.
Change-Id: I0fe1113f5369c4c9bfe93418b8c68529baa51b04
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 318d4145ba4f21fe23bd96b998c0a170ab5a26b6.
Change-Id: I14279c6c33a7da292376204c6824b8178b74fd6a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our msg priorities became an rbtree as of d6629859b36d ("ipc/mqueue:
improve performance of send/recv"). However, consuming a msg in
msg_get() remains logarithmic (still better than the case before, of
course). By applying well-known techniques to cache pointers, we can
have the node with the highest priority in O(1), which is especially
nice for the rt cases. Furthermore, some callers can call msg_get() in a
loop.
A new msg_tree_erase() helper is also added to encapsulate the tree
removal and node_cache game. Passes ltp mq testcases.
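A generic sketch of the caching technique (not the exact ipc/mqueue.c
code): keep the maximum node cached and refresh the cache only when that
node itself is erased:

    struct rb_node *cached_max;     /* highest priority node, O(1) access */

    static void prio_tree_erase(struct rb_node *node, struct rb_root *root)
    {
            /* rb_prev() must run before rb_erase() unlinks the node. */
            if (node == cached_max)
                    cached_max = rb_prev(node);
            rb_erase(node, root);
    }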
Link: http://lkml.kernel.org/r/20190321190216.1719-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Change-Id: I234983728fbc30aba482a6b58b2a70b1c38f3145
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 31643d84b8c3d9c846aa0e20bc033e46c68c7e7d upstream.
With the introduction of binder_available_for_proc_work_ilocked() in
commit 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") a binder
thread can only "wait_for_proc_work" after its thread->looper has been
marked as BINDER_LOOPER_STATE_{ENTERED|REGISTERED}.
This means an unregistered reader risks waiting indefinitely for work
since it never gets added to proc->waiting_threads. If there are no
further references to its waitqueue either, the task will hang. The same
applies to readers using the (e)poll interface.
I couldn't find the rationale behind this restriction. So this patch
restores the previous behavior of allowing unregistered threads to
"wait_for_proc_work". Note that an error message for this scenario,
which had previously become unreachable, is now re-enabled.
Fixes: 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue")
Cc: stable@vger.kernel.org
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240711201452.2017543-1-cmllamas@google.com
Change-Id: I72954fb5fa749c7e0694fd036ed6862cff38cdb8
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Fixes gcc '-Wunused-but-set-variable' warning:
net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]
They have been unused since commit 77ddaff218fc ("fq_codel: Kill
useless per-flow dropped statistic").
Change-Id: I09426b4d4b41b9302e534e41fdcab109ef55c571
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
It is almost impossible to get anything other than 0 out of the
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.
It also conflates ECN marks with drops.
It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API; it just stops
tracking a stat and state that are impossible to measure and that
nobody uses.
Change-Id: Ibac1a0fd6825aa5bf862ec7cf20227de7a939ec9
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
In the field, fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifurcates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.
The scan through the 1024 queues also happens more often than it needs to.
This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.
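A sketch of one way this can be expressed in fq_codel_drop()
(net/sched/sch_fq_codel.c), where i packets were just dropped from the
fattest flow; the exact upstream change may differ:

    /* Tell codel to increase its signal strength as well, without
     * touching its drop-next schedule, so the next round escalates
     * instead of restarting from a low drop rate.
     */
    flow->cvars.count += i;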
Change-Id: I47562b843bb86abed0b502cea62368a1195eeb0f
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The new added "reciprocal_value_adv" implements the advanced version of the
algorithm described in Figure 4.2 of the paper except when
"divisor > (1U << 31)" whose ceil(log2(d)) result will be 32 which then
requires u128 divide on host. The exception case could be easily handled
before calling "reciprocal_value_adv".
The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations.
It makes no sense to use this advanced version for host divide emulation,
those extra complexities for calculating multiplier etc could completely
waive our saving on emulation operations.
However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends) for which we are willing to trade performance of
JITed code with that of host. As shown by the following pseudo code, the
required emulation operations could go down from 6 (the basic version) to 3
or 4.
To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d; the C-style pseudo code is as follows, and it could easily be turned
into real code generation for other JIT targets.
    struct reciprocal_value_adv rvalue;
    u8 pre_shift, exp;

    // handle the exception case.
    if (d >= (1U << 31)) {
            result = n >= d;
            return;
    }

    rvalue = reciprocal_value_adv(d, 32);
    exp = rvalue.exp;
    if (rvalue.is_wide_m && !(d & 1)) {
            // floor(log2(d & (2^32 - d)))
            pre_shift = fls(d & -d) - 1;
            rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
    } else {
            pre_shift = 0;
    }

    // code generation starts; d is the divisor known at JIT compile time.
    if (d == 1U << exp) {
            result = n >> exp;
    } else if (rvalue.is_wide_m) {
            // pre_shift must be zero when reached here.
            t = (n * rvalue.m) >> 32;
            result = n - t;
            result >>= 1;
            result += t;
            result >>= rvalue.sh - 1;
    } else {
            if (pre_shift)
                    result = n >> pre_shift;
            result = ((u64)result * rvalue.m) >> 32;
            result >>= rvalue.sh;
    }
Change-Id: I54385f0df42aa43355d940d20d6818d2fb3197d9
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway, and
that is also what int_sqrt() does implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that the bug is probably due to a
copy-and-paste error from the regular int_sqrt() case.
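For reference, a sketch of the fixed initial guess in int_sqrt64()
(lib/int_sqrt.c); the rest of the function is unchanged:

    if (x <= ULONG_MAX)
            return int_sqrt((unsigned long) x);

    /* fls64() numbers bits starting at 1; subtract 1 to get the raw
     * bit index and avoid the 1ULL << 64 overflow when bit 63 is set.
     */
    m = 1ULL << ((fls64(x) - 1) & ~1ULL);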
Change-Id: I5be5be3e03ddbe68cc8025a64698bbb49c57c3a5
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>