To save energy, CASS may prefer non-idle CPUs for uclamp-boosted tasks in
order to pack them onto a single performance domain rather than spreading
them across multiple performance domains. This way, it is more likely that
only one performance domain will be boosted to a higher P-state when more
than one uclamp-boosted task is running.
However, when a task has a uclamp boost value that is below a CPU's minimum
capacity, it is effectively the same as not having a uclamp boost at all.
In spite of that, CASS may still prefer non-idle CPUs for tasks with such
bogus uclamp boost values. This is worse not only for latency, but also for
energy efficiency, since the load ends up being spread less evenly as a
result.
Therefore, don't pack tasks with uclamp boosts below a CPU's minimum
configured capacity, since such tasks do not force the CPU to run at a
higher P-state.
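The intended check, as a rough sketch; uclamp_eff_value() is assumed to be
available, and cass_boost_can_raise_pstate()/cpu_min_freq_capacity() are
placeholder names, not CASS's actual internals:

  /* Illustrative only: a boost below the capacity the CPU already provides
   * at its lowest configured frequency cannot raise the P-state, so don't
   * treat the task as boosted for packing purposes. */
  static bool cass_boost_can_raise_pstate(struct task_struct *p, int cpu)
  {
          unsigned long boost = uclamp_eff_value(p, UCLAMP_MIN);

          return boost > cpu_min_freq_capacity(cpu); /* placeholder helper */
  }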
Change-Id: Ide8f62162723dc0c509fa5cccf92b8124f20f4aa
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The scheduler is unaware of the min_freq limit applied to a CPU, which is
useful information when predicting the frequency a CPU will run at for
energy-efficiency purposes.
Export this information via arch_scale_min_freq_capacity() and wire it up
for arm64.
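A rough sketch of the shape such a hook could take; the default stub below
and its exact signature are assumptions for illustration, not the patch
itself:

  /* Default: no min_freq limit is known, so report a capacity floor of 0. */
  #ifndef arch_scale_min_freq_capacity
  static __always_inline
  unsigned long arch_scale_min_freq_capacity(int cpu)
  {
          return 0;
  }
  #endif

The arm64 override would then return a per-CPU value kept in sync with the
cpufreq policy's current minimum frequency.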
Change-Id: Icdff7628c095185280e95dd965d497e6f740c871
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 0a0414f8a65655f816f5e04cb997ef8738106c05.
Change-Id: Ia14ba25e11bec4927c60502ee8c4dad40b71cf24
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit d6e561f94c2a1c83186116d4e35b8300a41d6a22.
Change-Id: I712d1a2c14b45ab522a815c5decd60b4389633e0
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
After turning location services off (which is needed to actually turn the
Wi-Fi device off or on), users started reporting a 30-second delay when
their PCIe Wi-Fi device is brought back up.
This is due to Qualcomm's improper conditional umac_stop_complete_cb()
implementation, which simply neglects `qdf_event_set(stop_evt)` when
QDF_ENABLE_TRACING is not defined.
As our local modification disables QDF_ENABLE_TRACING, this implementation
bug triggers a reset delayed by 30 seconds.
All functions/APIs used within umac_stop_complete_cb() are available
without QDF_ENABLE_TRACING defined, so remove the conditional
implementation.
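A sketch of the unconditional callback this describes; the exact body in
qcacld-3.0 may differ, but qdf_event_set() and QDF_ASSERT() are existing
QDF APIs:

  static void umac_stop_complete_cb(void *user_data)
  {
          qdf_event_t *stop_evt = (qdf_event_t *)user_data;
          QDF_STATUS status = qdf_event_set(stop_evt);

          /* Always signal completion so the waiter doesn't hit its timeout. */
          QDF_ASSERT(QDF_IS_STATUS_SUCCESS(status));
  }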
Fixes: bcd3d019d8e1 ("qcacld-3.0: Execute sme_stop and mac_stop in mc thread context")
Change-Id: I6055404c5df4e0232ea344e1c2669871e61cb9a7
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 09427173765ecb63836d49e4608ee2b65eb947df.
Change-Id: Iab647d5a3fc56a6e84eaded70252a0d736a5cd88
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit d0661f464d00db0cce80068cb1ea3a3d462b2bf9.
Change-Id: I8a1c946ea23485d0a1aac6755a741d24f2c03ca6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
SBalance manages IRQ affinity automatically. However, if SBalance is
not enabled, we restore HIF_IRQ_AFFINITY to maintain an optimized
IRQ distribution.
Change-Id: I0f99803959fc7fe080184ea4dcea7d16ab70997a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If SBalance is enabled, IRQ affinity should be managed automatically.
Prevent userspace from modifying it in this case, but allow changes
when SBalance is disabled.
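The guard is roughly of this shape; the hook point and signature below are
assumptions for illustration, not the actual diff:

  static ssize_t irq_affinity_store(struct kobject *kobj,
                                    struct kobj_attribute *attr,
                                    const char *buf, size_t count)
  {
          /* SBalance owns IRQ placement; reject manual changes. */
          if (IS_ENABLED(CONFIG_SBALANCE))
                  return -EPERM;

          /* ... existing affinity-parsing path runs when SBalance is off ... */
          return count;
  }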
Change-Id: Ibf37bf258a2358ad8b982704e8f035bd9739866b
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 602aa3bba862bb7ff51bdf2c9303db4b057f5353.
Change-Id: I4517bdb857e7e1ab02749596dedcaa8220dc040a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 1b396d869a6da9fa864d4de8235f2d0afc7164c1.
Change-Id: I13b4629e9aefcd23da2e58ef534c1057f81059cd
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ifa5949aa44c5f6ceb8001c3105f1b3cf92fbefd5
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 4043298f9af526f1703b88c3dfbeae7a16e88425.
Change-Id: Idd5ef50ccc82197606c2a1851072d9056308fb19
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 06287f3167d25f3328e90ba5bf86de28761c0c1b.
Change-Id: I0fe1113f5369c4c9bfe93418b8c68529baa51b04
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 318d4145ba4f21fe23bd96b998c0a170ab5a26b6.
Change-Id: I14279c6c33a7da292376204c6824b8178b74fd6a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our msg priorities became an rbtree as of d6629859b36d ("ipc/mqueue:
improve performance of send/recv"). However, consuming a msg in
msg_get() remains logarithmic (still better than the previous situation,
of course). By applying well-known pointer-caching techniques we can
have the node with the highest priority in O(1), which is especially nice
for the rt cases. Furthermore, some callers can call msg_get() in a
loop.
A new msg_tree_erase() helper is also added to encapsulate the tree
removal and node_cache game. Passes ltp mq testcases.
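A generic sketch of the pointer-caching idea (illustrative types; the real
patch wires this into the mqueue msg tree and the msg_tree_erase() helper):

  struct prio_tree {
          struct rb_root root;
          struct rb_node *rightmost;      /* cached highest-priority node */
  };

  static struct rb_node *prio_tree_pop_max(struct prio_tree *t)
  {
          struct rb_node *node = t->rightmost;

          if (!node)
                  return NULL;
          /* The new maximum is the in-order predecessor of the old one. */
          t->rightmost = rb_prev(node);
          rb_erase(node, &t->root);
          return node;
  }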
Link: http://lkml.kernel.org/r/20190321190216.1719-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Change-Id: I234983728fbc30aba482a6b58b2a70b1c38f3145
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 31643d84b8c3d9c846aa0e20bc033e46c68c7e7d upstream.
With the introduction of binder_available_for_proc_work_ilocked() in
commit 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") a binder
thread can only "wait_for_proc_work" after its thread->looper has been
marked as BINDER_LOOPER_STATE_{ENTERED|REGISTERED}.
This means an unregistered reader risks waiting indefinitely for work
since it never gets added to the proc->waiting_threads. If there are no
further references to its waitqueue either, the task will hang. The same
applies to readers using the (e)poll interface.
I couldn't find the rationale behind this restriction. So this patch
restores the previous behavior of allowing unregistered threads to
"wait_for_proc_work". Note that an error message for this scenario,
which had previously become unreachable, is now re-enabled.
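Per the description above, the restored check is roughly of this shape, with
the looper-state requirement simply dropped (sketch, not the verbatim diff):

  static bool binder_available_for_proc_work_ilocked(struct binder_thread *thread)
  {
          /* No registration requirement: any reader with no pending work
           * and no transaction stack may wait for proc work. */
          return !thread->transaction_stack &&
                  binder_worklist_empty_ilocked(&thread->todo);
  }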
Fixes: 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue")
Cc: stable@vger.kernel.org
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240711201452.2017543-1-cmllamas@google.com
Change-Id: I72954fb5fa749c7e0694fd036ed6862cff38cdb8
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Fixes gcc '-Wunused-but-set-variable' warning:
net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]
They have been unused since commit 77ddaff218fc ("fq_codel: Kill
useless per-flow dropped statistic").
Change-Id: I09426b4d4b41b9302e534e41fdcab109ef55c571
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
It is almost impossible to get anything other than 0 out of the
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.
It also conflates ECN marks with drops.
It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.
Change-Id: Ibac1a0fd6825aa5bf862ec7cf20227de7a939ec9
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifurcates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.
The scan through the 1024 queues happens more often than it needs to.
This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.
Change-Id: I47562b843bb86abed0b502cea62368a1195eeb0f
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The newly added "reciprocal_value_adv" implements the advanced version of
the algorithm described in Figure 4.2 of the paper, except when
"divisor > (1U << 31)", whose ceil(log2(d)) result will be 32, which then
requires a u128 divide on the host. That exception case can easily be
handled before calling "reciprocal_value_adv".
The advanced version requires a more complex calculation to get the
reciprocal multiplier and other control variables, but it can then reduce
the required emulation operations.
It makes no sense to use this advanced version for host divide emulation;
the extra complexity of calculating the multiplier etc. could completely
negate our savings on emulation operations.
However, it makes sense to use it for JIT divide code generation (for
example in eBPF JIT backends), where we are willing to trade host
performance for the performance of the JITed code. As shown by the
following pseudo code, the required emulation operations could go down
from 6 (the basic version) to 3 or 4.
To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d; the C-style pseudo code is as follows and could easily be turned into
real code generation for other JIT targets.
  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;

  // handle exception case.
  if (d >= (1U << 31)) {
          result = n >= d;
          return;
  }

  rvalue = reciprocal_value_adv(d, 32);
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
          // floor(log2(d & (2^32 - d)))
          pre_shift = fls(d & -d) - 1;
          rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
          pre_shift = 0;
  }

  // code generation starts.
  if (d == 1U << exp) {
          result = n >> exp;
  } else if (rvalue.is_wide_m) {
          // pre_shift must be zero when reached here.
          t = ((u64)n * rvalue.m) >> 32;
          result = n - t;
          result >>= 1;
          result += t;
          result >>= rvalue.sh - 1;
  } else {
          result = pre_shift ? n >> pre_shift : n;
          result = ((u64)result * rvalue.m) >> 32;
          result >>= rvalue.sh;
  }
Change-Id: I54385f0df42aa43355d940d20d6818d2fb3197d9
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway, and
that is also what int_sqrt() does implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.
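For reference, a sketch of the corrected helper, assuming the same
digit-by-digit algorithm as int_sqrt(); note the "- 1" in the initial guess
so fls64(x) == 64 no longer overflows the shift:

  u32 int_sqrt64(u64 x)
  {
          u64 b, m, y = 0;

          if (x <= ULONG_MAX)
                  return int_sqrt((unsigned long) x);

          /* Highest even bit position at or below the most significant bit. */
          m = 1ULL << ((fls64(x) - 1) & ~1ULL);
          while (m != 0) {
                  b = y + m;
                  y >>= 1;

                  if (x >= b) {
                          x -= b;
                          y += m;
                  }
                  m >>= 2;
          }

          return y;
  }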
Change-Id: I5be5be3e03ddbe68cc8025a64698bbb49c57c3a5
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
There is no way to perform a 64-bit integer sqrt on a 32-bit platform.
The newly added, more strongly typed int_sqrt64() enables 64-bit
calculations to be performed on 32-bit platforms. Using the same
algorithm as int_sqrt() with strong typing provides enough precision on
32-bit platforms as well, but it sacrifices some performance. In case
values are smaller than ULONG_MAX, the standard int_sqrt() is used for
the calculation to maximize performance thanks to more native calculations.
Change-Id: I8b22ef3fc9e63ea74fb1df14115fc374170549c3
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Crt Mori <cmo@melexis.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our current int_sqrt() is neither rough nor an approximation; it calculates
the exact value of floor(sqrt(x)). Document this.
Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Davidson <md@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Change-Id: Iea660f36312f879010d16028bc21b6bb50905078
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The function types for the swap, cmp and cmp_r functions are already
in use by modules.
Move them to types.h so that everybody in the kernel will be able to use
the generic types instead of custom ones.
This also makes the comment in bsearch() later on more meaningful.
Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.com
Change-Id: I4848ccb09bac73774e2b0071eb767d596e4f6f90
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for a similar thing in sort().
This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is
  static int cmp_wrapper(const void *a, const void *b, const void *ctx)
  {
          int (*real_cmp)(const void*, const void*) = ctx;
          return real_cmp(a, b);
  }

  sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }
but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds the cost of a
single easily predicted branch to each comparison call.
Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.
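A sketch of the single-branch approach, assuming a sentinel stands in for
the context-aware comparison function (names mirror the idea, not
necessarily the final code):

  #define _CMP_WRAPPER ((cmp_r_func_t)0L)

  static int do_cmp(const void *a, const void *b,
                    cmp_r_func_t cmp, const void *priv)
  {
          if (cmp == _CMP_WRAPPER)        /* sort() path: priv holds the 2-arg cmp */
                  return ((cmp_func_t)(long)priv)(a, b);
          return cmp(a, b, priv);         /* sort_r() path */
  }

  void sort(void *base, size_t num, size_t size,
            cmp_func_t cmp_func, swap_func_t swap_func)
  {
          return sort_r(base, num, size, _CMP_WRAPPER, swap_func, cmp_func);
  }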
Requested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Change-Id: I3ad240253956f6ec3f41833fc9ddefa5749fbc58
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.
lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'
Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: George Spelvin <lkml@sdf.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I40d3917918ee9a73ac983ecaf4d62abcd924a45f
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Similar to what's being done in the net code, this takes advantage of
the fact that most invocations use only a few common swap functions, and
replaces indirect calls to them with (highly predictable) conditional
branches. (The downside, of course, is that if you *do* use a custom
swap function, there are a few extra predicted branches on the code
path.)
This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.
x86-64 code size 767 -> 703 bytes (-64)
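A sketch of the dispatch described above, assuming small sentinel constants
are used in place of the built-in swap routines:

  #define SWAP_WORDS_64 ((swap_func_t)0)
  #define SWAP_WORDS_32 ((swap_func_t)1)
  #define SWAP_BYTES    ((swap_func_t)2)

  static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
  {
          if (swap_func == SWAP_WORDS_64)
                  swap_words_64(a, b, size);      /* inlined, no indirect call */
          else if (swap_func == SWAP_WORDS_32)
                  swap_words_32(a, b, size);
          else if (swap_func == SWAP_BYTES)
                  swap_bytes(a, b, size);
          else
                  swap_func(a, b, (int)size);     /* rare custom swap function */
  }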
Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I4f4850f79f2a1596ec4d19780f329cd073c4f11c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This uses fewer comparisons than the previous code (approaching half as
many for large random inputs), but produces identical results; it
actually performs the exact same series of swap operations.
Specifically, it reduces the average number of compares from
2*n*log2(n) - 3*n + o(n)
to
n*log2(n) + 0.37*n + o(n).
This is still 1.63*n worse than glibc qsort() which manages n*log2(n) -
1.26*n, but at least the leading coefficient is correct.
Standard heapsort, when sifting down, performs two comparisons per
level: one to find the greater child, and a second to see if the current
node should be exchanged with that child.
Bottom-up heapsort observes that it's better to postpone the second
comparison and search for the leaf where -infinity would be sent to,
then search back *up* for the current node's destination.
Since sifting down usually proceeds to the leaf level (that's where half
the nodes are), this does O(1) second comparisons rather than log2(n).
That saves a lot of (expensive since Spectre) indirect function calls.
The one time it's worse than the previous code is if there are large
numbers of duplicate keys, when the top-down algorithm is O(n) and
bottom-up is O(n log n). For distinct keys, it's provably always
better, doing 1.5*n*log2(n) + O(n) in the worst case.
(The code is not significantly more complex. This patch also merges the
heap-building and -extracting sift-down loops, resulting in a net code
size savings.)
x86-64 code size 885 -> 767 bytes (-118)
(I see the checkpatch complaint about "else if (n -= size)". The
alternative is significantly uglier.)
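To illustrate the bottom-up sift, here is a standalone sketch on an int
array (not the kernel's generic version): descend to a leaf using one
compare per level, then walk back up to find where the displaced root
value belongs.

  static void sift_down_bottom_up(int *heap, size_t n, size_t root)
  {
          size_t i = root, child;
          int saved = heap[root];

          /* Descend to a leaf, always following the larger child. */
          while ((child = 2 * i + 1) < n) {
                  if (child + 1 < n && heap[child + 1] > heap[child])
                          child++;
                  i = child;
          }
          /* Climb back to the deepest node on that path larger than 'saved'. */
          while (i > root && heap[i] <= saved)
                  i = (i - 1) / 2;
          /* Shift that subpath up one level and drop 'saved' into place. */
          for (; i > root; i = (i - 1) / 2) {
                  int t = heap[i];

                  heap[i] = saved;
                  saved = t;
          }
          heap[root] = saved;
  }

  static void bu_heapsort(int *a, size_t n)
  {
          size_t i;

          for (i = n / 2; i-- > 0; )              /* heapify */
                  sift_down_bottom_up(a, n, i);
          for (i = n; i-- > 1; ) {                /* repeatedly extract the max */
                  int t = a[0]; a[0] = a[i]; a[i] = t;
                  sift_down_bottom_up(a, i, 0);
          }
  }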
Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I370b088649c56ae9a0d8040c30ed5e13b847cc7c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Patch series "lib/sort & lib/list_sort: faster and smaller", v2.
Because CONFIG_RETPOLINE has made indirect calls much more expensive, I
thought I'd try to reduce the number made by the library sort functions.
The first three patches apply to lib/sort.c.
Patch #1 is a simple optimization. The built-in swap has special cases
for aligned 4- and 8-byte objects. But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop. This generalizes them to aligned *multiples* of 4
and 8 bytes. (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)
Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice
simple solid heapsort is preferable to more complex algorithms (sorry,
Andrey), but it's possible to implement heapsort with far fewer
comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
than the way it's been done up to now. And with some care, the code
ends up smaller, as well. This is the "big win" patch.
Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late. The great majority of the callers use the
builtin swap functions, so replace the indirect call to sort_func with a
(highly predictable) series of if() statements. Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue & epilogue code eliminated.
lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.
Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).
Patch #5 improves the algorithm. The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.
There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced by
commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges. This saves
0.2*n compares, averaged over all sizes.
The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document the
clever algorithm.
TESTING NOTES: I have some ugly user-space benchmarking code which I
used for testing before moving this code into the kernel. Shout if you
want a copy.
I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last
round of minor edits to quell checkpatch. I figure there will be at
least one round of comments and final testing.
This patch (of 5):
Rather than having special-case swap functions for 4- and 8-byte
objects, special-case aligned multiples of 4 or 8 bytes. This speeds up
most users of sort() by avoiding fallback to the byte copy loop.
Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims,
very few users of sort() sort pointers (or pointer-sized objects); most
sort structures containing at least two words. (E.g.
drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct
acpi_fan_fps.)
The functions also got renamed to reflect the fact that they support
multiple words. In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.
x86-64 code size 872 -> 886 bytes (+14)
With feedback from Andy Shevchenko, Rasmus Villemoes and Geert
Uytterhoeven.
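For example, the 4-byte-aligned helper becomes a loop over any multiple of
4 bytes rather than a single-u32 swap (sketch in the spirit of the
description above):

  static void swap_words_32(void *a, void *b, size_t n)
  {
          do {
                  u32 t = *(u32 *)(a + (n -= 4));

                  *(u32 *)(a + n) = *(u32 *)(b + n);
                  *(u32 *)(b + n) = t;
          } while (n);
  }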
Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>
Change-Id: I9f21e6eb4bcacf83d40cef3637a492b19db501fd
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Freezing tasks during periods of no interactivity may benefit power consumption.
Change-Id: I685806f9d39da1f3523ba70590797de926840e18
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
In order to complete a commit, we must first wait for the previous
commit to finish up. This is done by sleeping, during which time the CPU
can enter a deep idle state and take a while to finish processing the
commit after the wait is over. We can alleviate this by optimistically
assuming that the kthread this commit worker is running on won't migrate
to another CPU partway through. We only do this for the non-blocking
case where the commit completion is done in an asynchronous worker
because the generic DRM code already does this for atomic ioctls.
Change-Id: I55b822211b91f4a31c2bc6e65d7b31989e56aa7d
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Allowing the pm_qos notifier callbacks to execute without holding
pm_qos_lock can cause the callbacks to misbehave, e.g. the cpuidle
callback could erroneously send more IPIs than necessary.
Fix this by executing the pm_qos callbacks while pm_qos_lock is held.
Change-Id: I0f5b0de2b022997a8f7d88755d7b60070b9a091d
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Although data dependencies and one-way, semi-permeable barriers provided by
spin locks satisfy most ordering needs here, it is still possible for some
I/O writes to be reordered with respect to one another in a dangerous way.
One such example is that the interrupt status bit could be cleared *after*
the interrupt is unmasked when enabling the IRQ, potentially leading to a
spurious interrupt if there's an interrupt pending from when the IRQ was
disabled.
To prevent dangerous I/O write reordering, restore the minimum amount of
barriers needed to ensure writes are ordered as intended.
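As an illustration of the hazard and the fix; the register names and
offsets below are placeholders, not the driver's actual layout:

  static void example_unmask_irq(void __iomem *base, unsigned int irq_bit)
  {
          /* Clear any stale pending status *before* the unmask can take
           * effect, otherwise a leftover bit fires a spurious interrupt. */
          writel_relaxed(BIT(irq_bit), base + INTR_STATUS_CLR);
          wmb();  /* order the status clear before the unmask write */
          writel_relaxed(BIT(irq_bit), base + INTR_ENABLE);
  }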
Change-Id: I4c44eaa93f39591d5c963dba2b9dcaf33831bdbe
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
For the vast majority of mmio operations in this driver, explicit memory
barriers aren't needed either because a data dependency between a read
and write already exists, or because of the presence of the spin locks
which execute a full memory barrier.
Removing all the unneeded explicit barriers considerably reduces
overhead for pinctrl operations, which in turn benefits things like i2c.
Change-Id: I566e9189fa0d392f242395b43aefbb9f324c1900
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Do not rely solely on compiler optimizations to handle the workaround of
having macros do nothing via an empty do-while loop; it is inefficient.
Use ((void)0) instead, which is what the standard assert macro expands to
when NDEBUG is defined.
No functional change intended.
[mcdofrenchfreis]:
Implement this patch to tree using the command:
git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g"
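For example, a stubbed-out debug macro (name illustrative) changes as
follows:

  /* before */
  #define dbg_trace(fmt, ...) do {} while (0)
  /* after */
  #define dbg_trace(fmt, ...) ((void)0)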
Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* This is required for U QPR2
Change-Id: I0321c64f77fccf74ff2472c3abd29e8b6b4be1ce
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* Google is attempting to kill 4.14 in 0156d6e2ba
Change-Id: Ic87a66753a7acc89b0fe5b19158eea4c58ba980f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Thanks to kdrag0n @ GitHub for his original commit using vmalloc instead
of kmalloc (preventing a panic).
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Change-Id: I336835a0bf9abbbbad0b9a0d299b5c22eaf15abb
Signed-off-by: DennySPb <dennyspb@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>