With the use of the barrier implied by barrier_data(), there is no need
for memzero_explicit() to be extern. Making it inline saves the overhead
of a function call, and allows the code to be reused in arch/*/purgatory
without having to duplicate the implementation.
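For reference, the resulting inline is essentially the following sketch
(the usual linux/string.h definitions are assumed):
  static inline void memzero_explicit(void *s, size_t count)
  {
          memset(s, 0, count);
          barrier_data(s); /* keeps the compiler from eliding the memset */
  }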
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
Link: https://lkml.kernel.org/r/20191007220000.GA408752@rani.riverdale.lan
Change-Id: Ic2098c6a670c16c60444e107d35ab49a0bc27a90
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
As was already noted in rbtree.h, the logic to cache rb_first (or
rb_last) can easily be implemented externally to the core rbtree api.
Change the implementation to do just that. Previously the update of
rb_leftmost was wired deeper into the implementation, but there were some
disadvantages to that - mostly, lib/rbtree.c had separate instantiations
for rb_insert_color() vs rb_insert_color_cached(), as well as rb_erase()
vs rb_erase_cached(), which were doing exactly the same thing save for
the rb_leftmost update at the start of either function.
text data bss dec hex filename
5405 120 0 5525 1595 lib/rbtree.o-vanilla
3827 96 0 3923 f53 lib/rbtree.o-patch
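As a sketch of what "externally" means here (helper bodies are
illustrative, based on the description above), the cached variants become
thin inline wrappers in rbtree.h that maintain rb_leftmost themselves:
  static inline void rb_insert_color_cached(struct rb_node *node,
                                            struct rb_root_cached *root,
                                            bool leftmost)
  {
          if (leftmost)
                  root->rb_leftmost = node;
          rb_insert_color(node, &root->rb_root);
  }

  static inline void rb_erase_cached(struct rb_node *node,
                                     struct rb_root_cached *root)
  {
          if (root->rb_leftmost == node)
                  root->rb_leftmost = rb_next(node);
          rb_erase(node, &root->rb_root);
  }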
[dave@stgolabs.net: changelog addition]
Link: http://lkml.kernel.org/r/20190628171416.by5gdizl3rcxk5h5@linux-r8p5
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190628045008.39926-1-walken@google.com
Change-Id: Ifc2014f666553eb54dce2a69ec576c04928f779e
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
geo->keylen cannot be larger than 4. So we might as well make
fixed-size allocations.
Given the one remaining user, geo->keylen cannot even be larger than 1.
Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit
keys. But let's not break the code if we don't have to.
Change-Id: I124d7095003cd140c17b18c2038f67de9ffc9328
Signed-off-by: Joern Engel <joern@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The newly added "reciprocal_value_adv" implements the advanced version of
the algorithm described in Figure 4.2 of the paper, except when
"divisor > (1U << 31)", for which ceil(log2(d)) is 32 and would then
require a u128 divide on the host. That exception case can easily be
handled before calling "reciprocal_value_adv".
The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations.
It makes no sense to use this advanced version for host divide emulation;
the extra complexity of calculating the multiplier etc. could completely
cancel out our savings on emulation operations.
However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends) for which we are willing to trade performance of
JITed code with that of host. As shown by the following pseudo code, the
required emulation operations could go down from 6 (the basic version) to 3
or 4.
To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d; the C-style pseudo code would be the following, and it could easily
be turned into real code generation for other JIT targets.
  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;
  u32 t, result; // n is the dividend, d the divisor (both u32).

  // Handle the exception case.
  if (d >= (1U << 31)) {
          result = n >= d;
          return;
  }

  rvalue = reciprocal_value_adv(d, 32);
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
          // floor(log2(d & (2^32 - d)))
          pre_shift = fls(d & -d) - 1;
          rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
          pre_shift = 0;
  }

  // Code generation starts.
  if (d == 1U << exp) {
          result = n >> exp;
  } else if (rvalue.is_wide_m) {
          // pre_shift must be zero when reached here.
          t = ((u64)n * rvalue.m) >> 32;
          result = n - t;
          result >>= 1;
          result += t;
          result >>= rvalue.sh - 1;
  } else {
          result = pre_shift ? n >> pre_shift : n;
          result = ((u64)result * rvalue.m) >> 32;
          result >>= rvalue.sh;
  }
Change-Id: I54385f0df42aa43355d940d20d6818d2fb3197d9
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway and
that's what also done in int_sqrt() implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.
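A sketch of int_sqrt64() with the corrected initial estimate
(illustrative; small inputs still defer to int_sqrt()):
  u32 int_sqrt64(u64 x)
  {
          u64 b, m, y = 0;

          if (x <= ULONG_MAX)
                  return int_sqrt((unsigned long)x);

          /* fls64() is 1-based, so subtract 1 to get the top bit index;
           * this also avoids the 1ULL << 64 overflow when bit 63 is set.
           */
          m = 1ULL << ((fls64(x) - 1) & ~1ULL);
          while (m != 0) {
                  b = y + m;
                  y >>= 1;

                  if (x >= b) {
                          x -= b;
                          y += m;
                  }
                  m >>= 2;
          }

          return y;
  }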
Change-Id: I5be5be3e03ddbe68cc8025a64698bbb49c57c3a5
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
There is no option to perform a 64-bit integer sqrt on a 32-bit platform.
The added, more strongly typed int_sqrt64() enables the 64-bit
calculation to be performed on 32-bit platforms. Using the same algorithm
as int_sqrt() with strong typing provides enough precision on 32-bit
platforms as well, but it sacrifices some performance. In case values are
smaller than ULONG_MAX, the standard int_sqrt() is used for the
calculation to maximize performance thanks to more native calculations.
Change-Id: I8b22ef3fc9e63ea74fb1df14115fc374170549c3
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Crt Mori <cmo@melexis.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our current int_sqrt() is neither rough nor an approximation; it
calculates the exact value of floor(sqrt(x)). Document this.
Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Davidson <md@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Change-Id: Iea660f36312f879010d16028bc21b6bb50905078
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The function types for the swap, cmp and cmp_r functions are already
being used by modules.
Move them to types.h so that everybody in the kernel can use the generic
types instead of custom ones.
This also makes the comment in bsearch() later on make more sense.
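The shared typedefs are likely along these lines (sketch):
  typedef void (*swap_func_t)(void *a, void *b, int size);
  typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);
  typedef int (*cmp_func_t)(const void *a, const void *b);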
Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.com
Change-Id: I4848ccb09bac73774e2b0071eb767d596e4f6f90
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for the similar thing for sort().
This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is
  static int cmp_wrapper(const void *a, const void *b, const void *ctx)
  {
          int (*real_cmp)(const void *, const void *) = ctx;

          return real_cmp(a, b);
  }

  sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }
but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds a cost of a
single easily predicted branch to each comparison call.
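A sketch of that single-branch dispatch (the sentinel value here is
illustrative):
  #define _CMP_WRAPPER ((cmp_r_func_t)0L)

  static int do_cmp(const void *a, const void *b,
                    cmp_r_func_t cmp, const void *priv)
  {
          if (cmp == _CMP_WRAPPER)        /* sort() path: one predicted branch */
                  return ((cmp_func_t)priv)(a, b);
          return cmp(a, b, priv);         /* sort_r() path */
  }
sort() then simply calls sort_r(..., _CMP_WRAPPER, ...) with the real
two-argument comparison function passed through the priv slot.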
Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.
Requested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Change-Id: I3ad240253956f6ec3f41833fc9ddefa5749fbc58
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.
lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'
Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: George Spelvin <lkml@sdf.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I40d3917918ee9a73ac983ecaf4d62abcd924a45f
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Similar to what's being done in the net code, this takes advantage of
the fact that most invocations use only a few common swap functions, and
replaces indirect calls to them with (highly predictable) conditional
branches. (The downside, of course, is that if you *do* use a custom
swap function, there are a few extra predicted branches on the code
path.)
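A sketch of the resulting dispatch (the sentinel constants are
illustrative):
  static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
  {
          if (swap_func == SWAP_WORDS_64)
                  swap_words_64(a, b, size);
          else if (swap_func == SWAP_WORDS_32)
                  swap_words_32(a, b, size);
          else if (swap_func == SWAP_BYTES)
                  swap_bytes(a, b, size);
          else
                  swap_func(a, b, (int)size);
  }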
This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.
x86-64 code size 767 -> 703 bytes (-64)
Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I4f4850f79f2a1596ec4d19780f329cd073c4f11c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This uses fewer comparisons than the previous code (approaching half as
many for large random inputs), but produces identical results; it
actually performs the exact same series of swap operations.
Specifically, it reduces the average number of compares from
2*n*log2(n) - 3*n + o(n)
to
n*log2(n) + 0.37*n + o(n).
This is still 1.63*n worse than glibc qsort() which manages n*log2(n) -
1.26*n, but at least the leading coefficient is correct.
Standard heapsort, when sifting down, performs two comparisons per
level: one to find the greater child, and a second to see if the current
node should be exchanged with that child.
Bottom-up heapsort observes that it's better to postpone the second
comparison and search for the leaf where -infinity would be sent to,
then search back *up* for the current node's destination.
Since sifting down usually proceeds to the leaf level (that's where half
the nodes are), this does O(1) second comparisons rather than log2(n).
That saves a lot of (expensive since Spectre) indirect function calls.
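For illustration only (a plain int array rather than lib/sort.c's generic
version, where every comparison is an indirect cmp_func call), the
bottom-up sift-down looks roughly like this:
  static void sift_down_bottom_up(int *a, size_t n)
  {
          size_t i = 0, child, hole;
          int root = a[0];

          /* Walk down the "greater child" path to a leaf: one compare
           * per level, shifting each greater child up into the hole.
           */
          while (2 * i + 1 < n) {
                  child = 2 * i + 1;
                  if (child + 1 < n && a[child + 1] > a[child])
                          child++;
                  a[i] = a[child];
                  i = child;
          }

          /* Climb back up until the saved root fits; this usually takes
           * O(1) compares since the root tends to belong near a leaf.
           */
          hole = i;
          while (hole > 0 && a[(hole - 1) / 2] < root) {
                  a[hole] = a[(hole - 1) / 2];
                  hole = (hole - 1) / 2;
          }
          a[hole] = root;
  }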
The one time it's worse than the previous code is if there are large
numbers of duplicate keys, when the top-down algorithm is O(n) and
bottom-up is O(n log n). For distinct keys, it's provably always
better, doing 1.5*n*log2(n) + O(n) in the worst case.
(The code is not significantly more complex. This patch also merges the
heap-building and -extracting sift-down loops, resulting in a net code
size savings.)
x86-64 code size 885 -> 767 bytes (-118)
(I see the checkpatch complaint about "else if (n -= size)". The
alternative is significantly uglier.)
Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I370b088649c56ae9a0d8040c30ed5e13b847cc7c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Patch series "lib/sort & lib/list_sort: faster and smaller", v2.
Because CONFIG_RETPOLINE has made indirect calls much more expensive, I
thought I'd try to reduce the number made by the library sort functions.
The first three patches apply to lib/sort.c.
Patch #1 is a simple optimization. The built-in swap has special cases
for aligned 4- and 8-byte objects. But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop. This generalizes them to aligned *multiples* of 4
and 8 bytes. (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)
Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice
simple solid heapsort is preferable to more complex algorithms (sorry,
Andrey), but it's possible to implement heapsort with far fewer
comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
than the way it's been done up to now. And with some care, the code
ends up smaller, as well. This is the "big win" patch.
Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late. The great majority of the callers use the
builtin swap functions, so replace the indirect call to sort_func with a
(highly predictable) series of if() statements. Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue & epilogue code eliminated.
lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.
Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).
Patch #5 improves the algorithm. The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.
There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced by
commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges. This saves
0.2*n compares, averaged over all sizes.
The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document the
clever algorithm.
TESTING NOTES: I have some ugly user-space benchmarking code which I
used for testing before moving this code into the kernel. Shout if you
want a copy.
I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last
round of minor edits to quell checkpatch. I figure there will be at
least one round of comments and final testing.
This patch (of 5):
Rather than having special-case swap functions for 4- and 8-byte
objects, special-case aligned multiples of 4 or 8 bytes. This speeds up
most users of sort() by avoiding fallback to the byte copy loop.
Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims,
very few users of sort() sort pointers (or pointer-sized objects); most
sort structures containing at least two words. (E.g.
drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct
acpi_fan_fps.)
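A sketch of a multi-word swap of this kind (sizes that are multiples of 8
and 8-byte aligned; illustrative rather than the exact lib/sort.c code):
  static void swap_words_64(void *a, void *b, size_t n)
  {
          u64 *ua = a, *ub = b;

          n /= 8;
          do {
                  u64 t = ua[--n];

                  ua[n] = ub[n];
                  ub[n] = t;
          } while (n);
  }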
The functions also got renamed to reflect the fact that they support
multiple words. In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.
x86-64 code size 872 -> 886 bytes (+14)
With feedback from Andy Shevchenko, Rasmus Villemoes and Geert
Uytterhoeven.
Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>
Change-Id: I9f21e6eb4bcacf83d40cef3637a492b19db501fd
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Do not rely solely on compiler optimizations to make the do-nothing macro
workaround of an empty do-while loop disappear; it's inefficient.
Use ((void)0) instead, which is what the standard assert macro expands to
when NDEBUG is defined.
No functional change intended.
[mcdofrenchfreis]:
Implement this patch to tree using the command:
git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g"
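For example (hypothetical macro name), the transformation is simply:
  -#define debug_foo(x)    do {} while (0)
  +#define debug_foo(x)    ((void)0)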
Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
crc32_le and __crc32c_le can be overridden - extend this to crc32_be.
Change-Id: Ia51b0f97201903ba27bd22a345b781af13ef4a53
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Casts were added in commit 8f243af42ade ("sections: fix const sections
for crc32 table") to cope with the tables not being const. They are no
longer required since commit f5e38b9284e1 ("lib: crc32: constify crc32
lookup table").
Change-Id: If601e66d4f0a753fba9c72698ba83c645409b634
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Header was defining CRCPOLY_LE/BE and CRC32C_POLY_LE but in fact all of
them are CRC-32 polynomials so use consistent naming.
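The consistently named defines end up looking like this (values are the
standard CRC-32 and CRC-32C polynomials; layout is a sketch):
  /* CRC-32 polynomial, bit-reversed (LE) and normal (BE) forms. */
  #define CRC32_POLY_LE   0xedb88320
  #define CRC32_POLY_BE   0x04c11db7

  /* CRC-32C (Castagnoli) polynomial, bit-reversed form. */
  #define CRC32C_POLY_LE  0x82f63b78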
Change-Id: I21da6af43ebc69dcae24f2b97652728f739a2876
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Allow other drivers and parts of kernel to use the same define for
CRC32 polynomial, instead of duplicating it in many places. This code
does not bring any functional changes, except moving existing code.
Change-Id: Ibe919da197be32e1298d7b64c1602810bd3c0bb3
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This caused severe performance regressions in hackbench.
Change-Id: Ib72d4f4aca54ee00799809d4eb2fcb6cdb1f4971
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
xxhash's performance depends heavily on compiler optimizations including
inlines. Follow upstream's behavior and inline those helper functions.
Change-Id: I1bc08b7ef6a491817b9ed5e8daab0f1993081f71
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
These simply wrap memcpy().
Replace them with macros so that they are naturally inlined.
Change-Id: I32df8e35dd99611ab0cbd472146b0ef3ecb847d3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit af73483f4e8b6f5c68c9aa63257bdd929a9c194a upstream.
The IDA usually detects double-frees, but that detection failed to
consider the case when there are no nearby IDs allocated and so we have a
NULL bitmap rather than simply having a clear bit. Add some tests to the
test-suite to be sure we don't inadvertently reintroduce this problem.
Unfortunately they're quite noisy so include a message to disregard
the warnings.
Reported-by: Zhenghan Wang <wzhmmmmm@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 89db5346acb5a15e670c4fb3b8f3c30fa30ebc15)
[Vegard: remove changes to lib/test_ida.c which does not exist in 4.14.]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Merge tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts
This is the 4.14.353 OpenELA-Extended LTS stable release
* tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts: (173 commits)
LTS: Update to 4.14.353
net: fix __dst_negative_advice() race
selftests: make order checking verbose in msg_zerocopy selftest
selftests: fix OOM in msg_zerocopy selftest
Revert "selftests/net: reap zerocopy completions passed up as ancillary data."
Revert "selftests: fix OOM in msg_zerocopy selftest"
Revert "selftests: make order checking verbose in msg_zerocopy selftest"
nvme/pci: Add APST quirk for Lenovo N60z laptop
exec: Fix ToCToU between perm check and set-uid/gid usage
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
drm/i915: Try GGTT mmapping whole object as partial
netfilter: nf_tables: set element extended ACK reporting support
kbuild: Fix '-S -c' in x86 stack protector scripts
drm/mgag200: Set DDC timeout in milliseconds
drm/bridge: analogix_dp: properly handle zero sized AUX transactions
drm/bridge: analogix_dp: Properly log AUX CH errors
drm/bridge: analogix_dp: Reset aux channel if an error occurred
drm/bridge: analogix_dp: Check AUX_EN status when doing AUX transfer
x86/mtrr: Check if fixed MTRRs exist before saving them
tracing: Fix overflow in get_free_elt()
...
Change-Id: I0e92a979e31d4fa6c526c6b70a1b61711d9747bb
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit bf6acd5d16057d7accbbb1bf7dc6d8c56eeb4ecc upstream.
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com
Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 16b92b031b4da174342bd909130731c55f20c7ea)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
The environment buffer isn't very big; when it's allocated on the stack,
kobject_uevent_env's stack frame size increases to just over 2 KiB,
which is safe considering that we have a 16 KiB stack.
Allocate the environment buffer on the stack instead of using the slab
allocator in order to improve performance.
Change-Id: I175eda1be09007436f77dd6f185931e6a1d9facb
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Extracted from Huawei's kernel source drop, this patch adds ARM64 optimizations for LZ4 decompression:
- Adds ARM64 acceleration support.
- Introduces new ARM64-specific files and updates the Makefile.
- Enhances __LZ4_decompress_generic for partial decompression.
- Adds bounds checks for safer decompression.
Originally intended as optimizations for Huawei's EROFS driver.
Change-Id: I438b79e43bbacd08a047d037a1eeac8ff89b4aff
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This fixes the following warnings by sparse:
../lib/lz4/lz4_compress.c:838:5: warning: symbol 'LZ4_compress_fast_extState' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:141:8: warning: symbol 'read_long_length_no_check' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:904:5: warning: symbol 'LZ4_decompress_fast' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1052:5: warning: symbol 'LZ4_decompress_fast_continue' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1099:5: warning: symbol 'LZ4_decompress_safe_usingDict' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1118:5: warning: symbol 'LZ4_decompress_fast_usingDict' was not declared. Should it be static?
Since some of the functions have been marked as static now, there is no
need to export them. Remove the redundant export symbols as well.
Change-Id: Idf1e7a83ceba9f582e22df51ebe687585334abec
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
gcc can transform the loop in a naive implementation of memset/memcpy
etc into a call to the function itself. This optimization is enabled by
-ftree-loop-distribute-patterns.
This has been the case for a while, but gcc-10.x enables this option at
-O2 rather than -O3 as in previous versions.
Add -ffreestanding, which implicitly disables this optimization with
gcc. It is unclear whether clang performs such optimizations, but
hopefully it will also not do so in a freestanding environment.
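For example, a naive freestanding memset like the one below can be
pattern-matched by GCC into a call to memset() itself, i.e. infinite
recursion, unless the optimization is disabled:
  void *memset(void *s, int c, size_t n)
  {
          char *p = s;

          while (n--)
                  *p++ = c;       /* recognized as a memset() pattern */
          return s;
  }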
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
Change-Id: I9dc26de8617d7417de2a60c40fcc05687abb3f68
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.
These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32()
routines weak so they can be overridden").
Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.
Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.
Change-Id: I587626c4a9440aa11d45ab18fd025fd2e18e16fd
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Allow architectures to drop in accelerated CRC32 routines by making
the crc32_le/__crc32c_le entry points weak, and exposing non-weak
aliases for them that may be used by the accelerated versions as
fallbacks in case the instructions they rely upon are not available.
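As a rough illustration of the pattern (using a hypothetical checksum
instead of the real crc32_le body), the generic C entry point is weak and
a strong alias keeps it reachable as a fallback:
  u32 __weak my_csum(u32 seed, const u8 *p, size_t len)
  {
          while (len--)
                  seed = (seed << 5) + seed + *p++;
          return seed;
  }

  /* Non-weak name an arch-accelerated my_csum() can fall back to. */
  u32 my_csum_base(u32, const u8 *, size_t) __alias(my_csum);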
Change-Id: Ifd5e08ad5f0ebf4792986045d429704223c227c1
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I383e5f39ef6deccfe6f4126b00d29da3e2bdac3d
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Update the lz4 module using the official repository at revision [1].
Keep in mind that lz4hc wasn't updated and thus is not used; it may no
longer compile.
[1]: 4ebe313e00
Change-Id: Ica61c484bcc9a91f65752bd62f2bf3bb90ed1499
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Drop the backported single patches in favor of a full update to the latest v1.9.4.
Change-Id: Ib11b79efac81aede4645bbb317dc25b7878db5d6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (278 commits)
LTS: Update to 4.14.348
docs: kernel_include.py: Cope with docutils 0.21
serial: kgdboc: Fix NMI-safety problems from keyboard reset code
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
dm: limit the number of targets and parameter size area
Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
LTS: Update to 4.14.347
rds: Fix build regression.
RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: fix out-of-bounds access in ops_init
drm/vmwgfx: Fix invalid reads in fence signaled events
dyndbg: fix old BUG_ON in >control parser
tipc: fix UAF in error path
usb: gadget: f_fs: Fix a race condition when processing setup packets.
usb: gadget: composite: fix OS descriptors w_value logic
firewire: nosy: ensure user_length is taken into account when fetching packet contents
af_unix: Fix garbage collector racing against connect()
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
...
Change-Id: If329d39dd4e95e14045bb7c58494c197d1352d60
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 00e7d3bea2ce7dac7bee1cf501fb071fd0ea8f6c upstream.
Fix a BUG_ON from 2009. Even if it looks "unreachable" (I didn't
really look), let's make sure by removing it, doing a pr_err() and
returning -EINVAL instead.
Cc: stable <stable@kernel.org>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20240429193145.66543-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c718bddddca9cbef177ac475b94c5c91147fb38)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
This just does the "if the architecture does efficient unaligned
handling, start the memcmp using 'unsigned long' accesses", since
Nikolay Borisov found a load that cares.
This is basically the minimal patch, and limited to architectures that
are known to not have slow unaligned handling. We've had the stupid
byte-at-a-time version forever, and nobody has ever even noticed before,
so let's keep the fix minimal.
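Roughly, the change amounts to the following sketch (not the exact diff;
get_unaligned() comes from asm/unaligned.h): compare sizeof(unsigned long)
bytes at a time on architectures with efficient unaligned access and fall
back to the byte loop for the tail or once a difference is found.
  int memcmp(const void *cs, const void *ct, size_t count)
  {
          const unsigned char *su1 = cs, *su2 = ct;
          int res = 0;

  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
          while (count >= sizeof(unsigned long)) {
                  unsigned long a = get_unaligned((const unsigned long *)su1);
                  unsigned long b = get_unaligned((const unsigned long *)su2);

                  if (a != b)
                          break;          /* resolve the ordering byte-wise */
                  su1 += sizeof(unsigned long);
                  su2 += sizeof(unsigned long);
                  count -= sizeof(unsigned long);
          }
  #endif
          for (; count > 0; count--, su1++, su2++)
                  if ((res = *su1 - *su2) != 0)
                          break;
          return res;
  }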
A potential further improvement would be to align one of the sources in
order to at least minimize unaligned cases, but the only real case of
bigger memcmp() users seems to be the FIDEDUPERANGE ioctl(). As David
Sterba says, the dedupe ioctl is typically called on ranges spanning
many pages so the common case will all be page-aligned anyway.
All the relevant architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS,
so I'm not going to worry about the combination of a very rare use-case
and a rare architecture until somebody actually hits it. Particularly
since Nikolay also tested the more complex patch with extra alignment
handling code, and it only added overhead.
Link: https://lore.kernel.org/lkml/20210721135926.602840-1-nborisov@suse.com/
Change-Id: I31524a62feb09d31028e870cac46598d0fbfb9cc
Reported-by: Nikolay Borisov <nborisov@suse.com>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
When NR_CPUS fits in a long, it's possible to use compiler built-ins to
produce much faster code when operating on cpumasks compared to just using
the generic bitops APIs.
Therefore, add optimized helpers using compiler built-ins when NR_CPUS fits
in a long. This also turns nr_cpu_ids into a compile-time constant for
further optimization potential.
Note that compared to the upstream cpumask rewrite with this feature, these
optimized helpers perfectly preserve the semantics of the helpers they
replace. And this change is much smaller than the upstream version.
Change-Id: I1ac6058a19bd3b22a491176eef9d661cca78e521
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ib8c142f4aa1dbe169d36b0e826f5da66bc334a47
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This accommodates the common drivers shared between the ZSTD compressor and the ZSTD decompressor.
Change-Id: I2c498cbab6bae106923138750ca695a663b9e1c5
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.
When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for the following match.
In some extreme corner cases, when the compressed data is suitably
corrupted, a UAF will occur. As reported by KASAN [1],
LZ4_decompress_safe_partial may lead to a read-out-of-bounds problem
during decoding. lz4 upstream has fixed it [2] and this issue has been
discussed here [3] before.
The current decompression routine was ported from lz4 v1.8.3; bumping
lib/lz4 to v1.9.+ is certainly a huge piece of work to be done later, so
we'd better fix it first.
[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] c5d6f8a8be#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Change-Id: Idaa5c404d9084f64a4be50ef3d4615b962dcd3b4
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* 'deprecated/android-4.14-stable' of https://android.googlesource.com/kernel/common: (101 commits)
Linux 4.14.336
mmc: core: Cancel delayed work before releasing host
mmc: rpmb: fixes pause retune on all RPMB partitions.
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
i40e: fix use-after-free in i40e_aqc_add_filters()
net: bcmgenet: Fix FCS generation for fragmented skbuffs
net: sched: em_text: fix possible memory leak in em_text_destroy()
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
UPSTREAM: drm: Fix doc warning in drm_connector_attach_edid_property()
BACKPORT: lib/vsprintf: Hash legacy clock addresses
UPSTREAM: xfrm: fix gro_cells leak when remove virtual xfrm interfaces
UPSTREAM: xfrm: Make function xfrmi_get_link_net() static
UPSTREAM: cpuidle: menu: Retain tick when shallow state is selected
UPSTREAM: bpf: fix rcu annotations in compute_effective_progs()
UPSTREAM: bpf: bpf_prog_array_alloc() should return a generic non-rcu pointer
UPSTREAM: sched/util_est: Fix util_est_dequeue() for throttled cfs_rq
UPSTREAM: softirq: Reorder trace_softirqs_on to prevent lockdep splat
UPSTREAM: l2tp: fix refcount leakage on PPPoL2TP sockets
UPSTREAM: HID: steam: select CONFIG_POWER_SUPPLY
BACKPORT: mac80211_hwsim: fix a possible memory leak in hwsim_new_radio_nl()
...
Change-Id: I1c98fbb0918986a06bee16b0c11fe8bee003fd3f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
On platforms using the Common Clock Framework, "%pC" prints the clock's
name. On legacy platforms, it prints the unhashed clock's address,
potentially leaking sensitive information regarding the kernel layout in
memory.
Avoid this leak by printing the hashed address instead. To distinguish
between clocks, a 32-bit unique identifier is as good as an actual
pointer value.
Bug: 254441685
Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p")
Link: http://lkml.kernel.org/r/20181011084249.4520-3-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
(cherry picked from commit ec12bc2909f9759747ab5ad3709472353c43a750)
[Lee: Fixed a trivial conflict pertaining to original diff]
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8286a3b34ebd66ddef861dd969c68283ef569cf5
The BUG and stack protector reports were still using a raw %p. This
changes it to %pB for more meaningful output.
Bug: 254441685
Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast
Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>,
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0862ca422b79cb5aa70823ee0f07f6b468f86070)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I5ea62cb4f1e800fe0c71694a3af4f7f55f11de55