5063 Commits

Arvind Sankar
67b2692daa lib/string: Make memzero_explicit() inline instead of external
With the use of the barrier implied by barrier_data(), there is no need
for memzero_explicit() to be extern. Making it inline saves the overhead
of a function call, and allows the code to be reused in arch/*/purgatory
without having to duplicate the implementation.
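
For illustration, the inlined helper is essentially (a sketch matching the
shape of the include/linux/string.h definition):

  static inline void memzero_explicit(void *s, size_t count)
  {
    memset(s, 0, count);
    barrier_data(s);
  }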

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
Link: https://lkml.kernel.org/r/20191007220000.GA408752@rani.riverdale.lan
Change-Id: Ic2098c6a670c16c60444e107d35ab49a0bc27a90
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-09 23:14:46 -03:00
Michel Lespinasse
248be874fe lib/rbtree: Avoid generating code twice for the cached versions
As was already noted in rbtree.h, the logic to cache rb_first (or
rb_last) can easily be implemented externally to the core rbtree api.

Change the implementation to do just that.  Previously the update of
rb_leftmost was wired deeper into the implementation, but there were some
disadvantages to that - mostly, lib/rbtree.c had separate instantiations
for rb_insert_color() vs rb_insert_color_cached(), as well as rb_erase()
vs rb_erase_cached(), which were doing exactly the same thing save for
the rb_leftmost update at the start of either function.

   text	   data	    bss	    dec	    hex	filename
   5405	    120	      0	   5525	   1595	lib/rbtree.o-vanilla
   3827	     96	      0	   3923	    f53	lib/rbtree.o-patch
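
With the caching externalized, the cached variants become thin inline
wrappers in rbtree.h, along these lines (a sketch of the
rb_insert_color_cached() shape):

  static inline void rb_insert_color_cached(struct rb_node *node,
                                            struct rb_root_cached *root,
                                            bool leftmost)
  {
    if (leftmost)
      root->rb_leftmost = node;
    rb_insert_color(node, &root->rb_root);
  }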

[dave@stgolabs.net: changelog addition]
  Link: http://lkml.kernel.org/r/20190628171416.by5gdizl3rcxk5h5@linux-r8p5
[akpm@linux-foundation.org: coding-style fixes]
  Link: http://lkml.kernel.org/r/20190628045008.39926-1-walken@google.com

Change-Id: Ifc2014f666553eb54dce2a69ec576c04928f779e
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-09 23:14:37 -03:00
Joern Engel
6325cc3d94 lib/btree: Avoid variable-length allocations
geo->keylen cannot be larger than 4. So we might as well make
fixed-size allocations.

Given the one remaining user, geo->keylen cannot even be larger than 1.
Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit
keys. But let's not break the code if we don't have to.
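
In code terms, the change amounts to replacing keylen-sized allocations
with a fixed-size buffer (a hedged sketch; the bound of 4 follows from the
geometries described above):

  #define MAX_KEYLEN 4  /* no geometry has a longer key */
  unsigned long key[MAX_KEYLEN];  /* was: a geo->keylen-sized allocation */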

Change-Id: I124d7095003cd140c17b18c2038f67de9ffc9328
Signed-off-by: Joern Engel <joern@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-09 23:14:25 -03:00
Jiong Wang
09c7663480 lib/reciprocal_div: Implement the improved algorithm
The newly added "reciprocal_value_adv" implements the advanced version of
the algorithm described in Figure 4.2 of the paper, except when
"divisor > (1U << 31)", for which ceil(log2(d)) is 32 and a u128 divide
would then be required on the host. That exception case can easily be
handled before calling "reciprocal_value_adv".

The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations.

It makes no sense to use this advanced version for host divide emulation:
the extra complexity of calculating the multiplier and other control
variables could completely cancel out the savings in emulation operations.

However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends), for which we are willing to trade JIT
compilation effort for faster JITed code. As shown by the following pseudo
code, the required emulation operations could go down from 6 (the basic
version) to 3 or 4.

To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d; the C-style pseudo code would be the following, and it could easily
be turned into real code generation for other JIT targets.

  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;

  // handle exception case.
  if (d >= (1U << 31)) {
    result = n >= d;
    return;
  }
  rvalue = reciprocal_value_adv(d, 32);
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
    // floor(log2(d & (2^32 -d)))
    pre_shift = fls(d & -d) - 1;
    rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
    pre_shift = 0;
  }

  // code generation starts.
  if (d == 1U << exp) {
    result = n >> exp;
  } else if (rvalue.is_wide_m) {
    // pre_shift must be zero when reached here.
    t = (n * rvalue.m) >> 32;
    result = n - t;
    result >>= 1;
    result += t;
    result >>= rvalue.sh - 1;
  } else {
    if (pre_shift)
      result = n >> pre_shift;
    result = ((u64)result * rvalue.m) >> 32;
    result >>= rvalue.sh;
  }

Change-Id: I54385f0df42aa43355d940d20d6818d2fb3197d9
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:18 -03:00
Florian La Roche
06381819ff lib/int_sqrt: Fix int_sqrt64() for very large numbers
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64.  (1UL << 64) is an overflow and breaks the algorithm.

Subtracting 1 is a better guess for the initial value of m anyway, and
that is also what int_sqrt() does implicitly [*].

[*] Note how int_sqrt() uses __fls() with two underscores, which already
    returns the proper raw bit number.

    In contrast, int_sqrt64() used fls64(), which returns bit numbers
    illogically starting at 1, because of error handling for the "no
    bits set" case. Will points out that the bug is probably due to a
    copy-and-paste error from the regular int_sqrt() case.
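
A minimal sketch of the corrected function (assuming the 4.14-era
lib/int_sqrt.c shape):

  u32 int_sqrt64(u64 x)
  {
    u64 b, m, y = 0;

    if (x <= ULONG_MAX)
      return int_sqrt((unsigned long) x);

    /* Was: m = 1ULL << (fls64(x) & ~1ULL);
     * fls64(x) is 64 when the top bit is set, and 1ULL << 64
     * overflows. Subtracting 1 first avoids that. */
    m = 1ULL << ((fls64(x) - 1) & ~1ULL);
    while (m != 0) {
      b = y + m;
      y >>= 1;
      if (x >= b) {
        x -= b;
        y += m;
      }
      m >>= 2;
    }
    return y;
  }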

Change-Id: I5be5be3e03ddbe68cc8025a64698bbb49c57c3a5
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Crt Mori
f111e963e2 lib/int_sqrt: Add strongly typed 64bit int_sqrt
There was no way to perform a 64-bit integer sqrt on a 32-bit platform.
The newly added, strongly typed int_sqrt64() enables 64-bit calculations
to be performed on 32-bit platforms. Using the same algorithm as
int_sqrt(), the strong typing provides enough precision on 32-bit
platforms as well, but it sacrifices some performance. In case values are
smaller than ULONG_MAX, the standard int_sqrt() is used for the
calculation to maximize performance thanks to more native calculations.

Change-Id: I8b22ef3fc9e63ea74fb1df14115fc374170549c3
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Crt Mori <cmo@melexis.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Peter Zijlstra
83c2c2b980 lib/int_sqrt: Adjust comments
Our current int_sqrt() is neither rough nor an approximation; it
calculates the exact value of floor(sqrt(x)).  Document this.

Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Davidson <md@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Change-Id: Iea660f36312f879010d16028bc21b6bb50905078
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Andy Shevchenko
4d011c9c8c lib/sort: Move swap, cmp and cmp_r function types for wider use
The function types for the swap, cmp and cmp_r callbacks are already in
use by modules.

Move them to types.h so that everybody in the kernel can use these
generic types instead of custom ones.

This also makes the comment in bsearch() more meaningful later on.
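
The moved types are just the familiar callback signatures (as they read in
include/linux/types.h after this change):

  typedef void (*swap_func_t)(void *a, void *b, int size);
  typedef int (*cmp_func_t)(const void *a, const void *b);
  typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);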

Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.com

Change-Id: I4848ccb09bac73774e2b0071eb767d596e4f6f90
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Rasmus Villemoes
829cc32160 lib/sort: Implement sort() variant taking context argument
Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for the similar thing for sort().

This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is

static int cmp_wrapper(const void *a, const void *b, const void *ctx)
{
  int (*real_cmp)(const void*, const void*) = ctx;
  return real_cmp(a, b);
}

sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }

but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds a cost of a
single easily predicted branch to each comparison call.
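
Concretely, the single-branch scheme can look like this (a sketch in the
style of lib/sort.c, assuming a _CMP_WRAPPER sentinel):

  /* Sentinel value that cannot collide with a real function pointer. */
  #define _CMP_WRAPPER ((cmp_r_func_t)0L)

  static int do_cmp(const void *a, const void *b,
                    cmp_r_func_t cmp, const void *priv)
  {
    if (cmp == _CMP_WRAPPER)  /* one easily predicted branch */
      return ((cmp_func_t)(priv))(a, b);
    return cmp(a, b, priv);
  }

  /* sort() then forwards to sort_r() with the sentinel: */
  void sort(void *base, size_t num, size_t size,
            cmp_func_t cmp_func, swap_func_t swap_func)
  {
    sort_r(base, num, size, _CMP_WRAPPER, swap_func, cmp_func);
  }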

Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.

Requested-by: Boris Brezillon <boris.brezillon@collabora.com>

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Change-Id: I3ad240253956f6ec3f41833fc9ddefa5749fbc58
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Randy Dunlap
ea7577a782 lib/sort: Fix kernel-doc notation warnings
Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.

  lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
  lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
  lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'

Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: George Spelvin <lkml@sdf.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I40d3917918ee9a73ac983ecaf4d62abcd924a45f
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
George Spelvin
26e9d7f193 lib/sort: Avoid indirect calls to built-in swap
Similar to what's being done in the net code, this takes advantage of
the fact that most invocations use only a few common swap functions, and
replaces indirect calls to them with (highly predictable) conditional
branches.  (The downside, of course, is that if you *do* use a custom
swap function, there are a few extra predicted branches on the code
path.)

This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.

x86-64 code size 767 -> 703 bytes (-64)
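
The dispatch is a handful of pointer compares against small sentinel
values (a sketch; sentinel names as used in lib/sort.c):

  #define SWAP_WORDS_64 ((swap_func_t)0)
  #define SWAP_WORDS_32 ((swap_func_t)1)
  #define SWAP_BYTES    ((swap_func_t)2)

  static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
  {
    if (swap_func == SWAP_WORDS_64)
      swap_words_64(a, b, size);
    else if (swap_func == SWAP_WORDS_32)
      swap_words_32(a, b, size);
    else if (swap_func == SWAP_BYTES)
      swap_bytes(a, b, size);
    else
      swap_func(a, b, (int)size);  /* genuine custom swap */
  }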

Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I4f4850f79f2a1596ec4d19780f329cd073c4f11c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
George Spelvin
c93eb1b0d2 lib/sort: Use more efficient bottom-up heapsort variant
This uses fewer comparisons than the previous code (approaching half as
many for large random inputs), but produces identical results; it
actually performs the exact same series of swap operations.

Specifically, it reduces the average number of compares from
  2*n*log2(n) - 3*n + o(n)
to
    n*log2(n) + 0.37*n + o(n).

This is still 1.63*n worse than glibc qsort() which manages n*log2(n) -
1.26*n, but at least the leading coefficient is correct.

Standard heapsort, when sifting down, performs two comparisons per
level: one to find the greater child, and a second to see if the current
node should be exchanged with that child.

Bottom-up heapsort observes that it's better to postpone the second
comparison and search for the leaf where -infinity would be sent to,
then search back *up* for the current node's destination.

Since sifting down usually proceeds to the leaf level (that's where half
the nodes are), this does O(1) second comparisons rather than log2(n).
That saves a lot of (expensive since Spectre) indirect function calls.
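
For illustration, one bottom-up sift-down step on a plain int array (a
simplified sketch, not the kernel's byte-offset code):

  static void sift_down(int *a, size_t n, size_t root)
  {
    size_t b = root, c, dest;

    /* Descend to a leaf using one compare per level,
     * always following the greater child. */
    while ((c = 2 * b + 1) < n)
      b = (c + 1 < n && a[c + 1] > a[c]) ? c + 1 : c;

    /* Climb back up to where the displaced root belongs. */
    while (b != root && a[root] > a[b])
      b = (b - 1) / 2;

    /* Rotate the displaced root into place by swapping the
     * destination slot with each ancestor on the way up. */
    for (dest = b; b != root; ) {
      int t;
      b = (b - 1) / 2;
      t = a[b]; a[b] = a[dest]; a[dest] = t;
    }
  }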

The one time it's worse than the previous code is if there are large
numbers of duplicate keys, when the top-down algorithm is O(n) and
bottom-up is O(n log n).  For distinct keys, it's provably always
better, doing 1.5*n*log2(n) + O(n) in the worst case.

(The code is not significantly more complex.  This patch also merges the
heap-building and -extracting sift-down loops, resulting in a net code
size savings.)

x86-64 code size 885 -> 767 bytes (-118)

(I see the checkpatch complaint about "else if (n -= size)".  The
alternative is significantly uglier.)

Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Change-Id: I370b088649c56ae9a0d8040c30ed5e13b847cc7c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
George Spelvin
f00f930f82 lib/sort: Make swap functions more generic
Patch series "lib/sort & lib/list_sort: faster and smaller", v2.

Because CONFIG_RETPOLINE has made indirect calls much more expensive, I
thought I'd try to reduce the number made by the library sort functions.

The first three patches apply to lib/sort.c.

Patch #1 is a simple optimization.  The built-in swap has special cases
for aligned 4- and 8-byte objects.  But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop.  This generalizes them to aligned *multiples* of 4
and 8 bytes.  (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)

Patch #2 grabs a juicy piece of low-hanging fruit.  I agree that nice
simple solid heapsort is preferable to more complex algorithms (sorry,
Andrey), but it's possible to implement heapsort with far fewer
comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
than the way it's been done up to now.  And with some care, the code
ends up smaller, as well.  This is the "big win" patch.

Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late.  The great majority of the callers use the
builtin swap functions, so replace the indirect call to sort_func with a
(highly predictable) series of if() statements.  Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue & epilogue code eliminated.

lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.

Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).

Patch #5 improves the algorithm.  The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.

There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced by
commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges.  This saves
0.2*n compares, averaged over all sizes.

The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document the
clever algorithm.

TESTING NOTES: I have some ugly user-space benchmarking code which I
used for testing before moving this code into the kernel.  Shout if you
want a copy.

I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last
round of minor edits to quell checkpatch.  I figure there will be at
least one round of comments and final testing.

This patch (of 5):

Rather than having special-case swap functions for 4- and 8-byte
objects, special-case aligned multiples of 4 or 8 bytes.  This speeds up
most users of sort() by avoiding fallback to the byte copy loop.
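
For example, the generalized 4-byte swap walks the object one word at a
time (matching the shape of lib/sort.c's swap_words_32() after this
series):

  static void swap_words_32(void *a, void *b, size_t n)
  {
    do {
      u32 t = *(u32 *)(a + (n -= 4));
      *(u32 *)(a + n) = *(u32 *)(b + n);
      *(u32 *)(b + n) = t;
    } while (n);
  }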

Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims,
very few users of sort() sort pointers (or pointer-sized objects); most
sort structures containing at least two words.  (E.g.
drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct
acpi_fan_fps.)

The functions also got renamed to reflect the fact that they support
multiple words.  In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.

x86-64 code size 872 -> 886 bytes (+14)

With feedback from Andy Shevchenko, Rasmus Villemoes and Geert
Uytterhoeven.

Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>
Change-Id: I9f21e6eb4bcacf83d40cef3637a492b19db501fd
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 01:13:17 -03:00
Tashfin Shakeer Rhythm
a9566ccc56 msm-4.14: Make macros no-op using ((void)0)
Do not rely solely on compiler optimizations to make macros that do
nothing compile away; the empty do-while loop workaround is inefficient.

Use ((void)0), which is what the standard assert macro expands to when
NDEBUG is defined.
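
For example (trace_foo() being a hypothetical stand-in for such a macro):

  /* Before: an empty loop the compiler has to optimize away. */
  #define trace_foo(x) do {} while (0)

  /* After: a no-op expression by definition. */
  #define trace_foo(x) ((void)0)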

No functional change intended.

[mcdofrenchfreis]:
Implement this patch to tree using the command:
git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g"

Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 00:51:10 -03:00
Kevin Bracey
61832f5ca4 lib/crc32: Make crc32_be weak for arch override
crc32_le and __crc32c_le can be overridden - extend this to crc32_be.

Change-Id: Ia51b0f97201903ba27bd22a345b781af13ef4a53
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-26 01:06:22 -03:00
Kevin Bracey
c31cfade77 lib/crc32: Remove unneeded casts
Casts were added in commit 8f243af42ade ("sections: fix const sections
for crc32 table") to cope with the tables not being const. They are no
longer required since commit f5e38b9284e1 ("lib: crc32: constify crc32
lookup table").

Change-Id: If601e66d4f0a753fba9c72698ba83c645409b634
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-26 01:05:56 -03:00
Krzysztof Kozlowski
a986e0812e lib/crc32: Use consistent naming for CRC-32 polynomials
The header was defining CRCPOLY_LE/BE and CRC32C_POLY_LE, but in fact all
of them are CRC-32 polynomials, so use consistent naming.

Change-Id: I21da6af43ebc69dcae24f2b97652728f739a2876
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-26 01:05:48 -03:00
Krzysztof Kozlowski
41588c2407 lib/crc32: Move polynomial definition to separate header
Allow other drivers and parts of the kernel to use the same define for
the CRC32 polynomial, instead of duplicating it in many places.  This
brings no functional changes, except for moving existing code.
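
The new header reduces to the standard constants (a condensed sketch of
include/linux/crc32poly.h):

  #ifndef _LINUX_CRC32_POLY_H
  #define _LINUX_CRC32_POLY_H

  /* The standard CRC-32 polynomial, first popularized by Ethernet. */
  #define CRC32_POLY_LE 0xedb88320
  #define CRC32_POLY_BE 0x04c11db7

  /* The CRC-32c polynomial, as outlined by Castagnoli. */
  #define CRC32C_POLY_LE 0x82F63B78

  #endif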

Change-Id: Ibe919da197be32e1298d7b64c1602810bd3c0bb3
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-26 01:05:39 -03:00
darkhz
129663f2b0 msm-4.14: Disable some force-enabled debug options
These force-enabled options caused severe performance regressions in
hackbench.

Change-Id: Ib72d4f4aca54ee00799809d4eb2fcb6cdb1f4971
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-26 00:37:39 -03:00
Juhyung Park
a74d78d29f xxhash: Inline round() functions
xxhash's performance depends heavily on compiler optimizations, including
inlining. Follow upstream's behavior and inline those helper functions.
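
For instance, the 32-bit round helper (shape as in lib/xxhash.c, with the
inline hint this change adds):

  static __always_inline uint32_t xxh32_round(uint32_t seed, const uint32_t input)
  {
    seed += input * PRIME32_2;
    seed = xxh_rotl32(seed, 13);
    seed *= PRIME32_1;
    return seed;
  }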

Change-Id: I1bc08b7ef6a491817b9ed5e8daab0f1993081f71
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-20 02:27:39 -03:00
Juhyung Park
8a0a4124ea xxhash: Replace copy_state() wrappers with macros
These simply wrap memcpy().
Replace them with macros so that they are naturally inlined.
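
i.e., along these lines (a sketch; upstream keeps the same names):

  #define xxh32_copy_state(dst, src) memcpy((dst), (src), sizeof(*(dst)))
  #define xxh64_copy_state(dst, src) memcpy((dst), (src), sizeof(*(dst)))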

Change-Id: I32df8e35dd99611ab0cbd472146b0ef3ecb847d3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-20 02:27:36 -03:00
Richard Raya
2cd059fb56 This is the 4.14.354 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcgko0ZHHZlZ2FyZC5u
 b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WL/GD/0em+uP/O8QiPYqeGrEECpW
 bgRsBiN3XnyEsghAjplWX12G/zjxA0PY0u2zh9K9sdPw60n8nVZ1OxvPHINwuSC9
 kE9N60SCpJ88ju9OtU+4xz/nxtEmlel8fWy5elagB5wqbWbvsjT52ceZXqSxqhy7
 pQdIDHSiUUwx9JL6vDuJSL+Z/Y216qvBETZLnDSo90raFp/MDa5JmQsh81lLeUt8
 wGKwC/Olnbd21QTStNK34aQGyX5b+3YeACFVPud66Zs9airz9EE6Yq78gwL29L2k
 4jxzihXxSkkfa66eR63ap53+/mEqOZX72m2qEMVOvAcAwU0XsNDTdkXN7z8YQ5T3
 E1rJwr4Ox0hmM+hHBA20w9xRDXZoZmdrcjsU1aNKuK2zTJ0h9DBIvMM2XY5n5sWK
 I4F8E15KyKmu4nXBETreXZixqVLZMgjNFncRLf8XBIL1kxXm65LYCHypp3AgdVgo
 Ccdq5PbC6LAyNPrIOaftIaS9VlU15cqcalu7A+gSoWq55LGWAa3G9vX0ZtYQB9QX
 0R18fbzyjqG6Wa5J5KRDJ+HyS4IvdnEWS8hMR3jfosjMNgJhfDlDeev8NARBiDpX
 d26xogNA7xOOvtdpuwEbnxD5kR0zUdnC73pC4wxdMptYSK6ULKNPmTkA0dKE9qvl
 TDgw4DML8vXQqJ4P+w3Njw==
 =gX2R
 -----END PGP SIGNATURE-----

Merge tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts

This is the 4.14.354 OpenELA-Extended LTS stable release

* tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts: (90 commits)
  LTS: Update to 4.14.354
  drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var
  ipc: remove memcg accounting for sops objects in do_semtimedop()
  scsi: aacraid: Fix double-free on probe failure
  usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
  usb: dwc3: st: fix probed platform device ref count on probe error path
  usb: dwc3: core: Prevent USB core invalid event buffer address access
  usb: dwc3: omap: add missing depopulate in probe error path
  USB: serial: option: add MeiG Smart SRM825L
  cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  gtp: fix a potential NULL pointer dereference
  net: prevent mss overflow in skb_segment()
  ida: Fix crash in ida_free when the bitmap is empty
  net:rds: Fix possible deadlock in rds_message_put
  fbmem: Check virtual screen sizes in fb_set_var()
  fbcon: Prevent that screen size is smaller than font size
  printk: Export is_console_locked
  memcg: enable accounting of ipc resources
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  ...

Change-Id: I7da4d8d188dec9d2833216e5d6580dbd72b99240
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-10-29 20:17:04 -03:00
Matthew Wilcox (Oracle)
2cd2e32fc4 ida: Fix crash in ida_free when the bitmap is empty
commit af73483f4e8b6f5c68c9aa63257bdd929a9c194a upstream.

The IDA usually detects double-frees, but that detection failed to
consider the case when there are no nearby IDs allocated and so we have a
NULL bitmap rather than simply having a clear bit.  Add some tests to the
test-suite to be sure we don't inadvertently reintroduce this problem.
Unfortunately they're quite noisy so include a message to disregard
the warnings.
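
The shape of the fix is a NULL check ahead of the bit test (a hedged
sketch; the 4.14 backport differs in detail since it predates the
XArray-based IDA):

  if (!bitmap || !test_bit(bit, bitmap->bitmap))
    goto err;  /* report the double-free instead of crashing */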

Reported-by: Zhenghan Wang <wzhmmmmm@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 89db5346acb5a15e670c4fb3b8f3c30fa30ebc15)
[Vegard: remove changes to lib/test_ida.c which does not exist in 4.14.]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-24 10:07:41 +00:00
Richard Raya
1b9f971175 This is the 4.14.353 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcHrG8ZHHZlZ2FyZC5u
 b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WIQuEACCYf9xCGBALlKFb0pXX3eF
 oiRkceNyy5NWSndD7t9p/3d2g4YrVptGxtTZN12IltfG4wfCQ+qC/0g2Mu4ho0Yp
 2ExKVaIli1t2csIjXCUUyjh3jU0JOkDwJap9n5QemACsX8zrDfKVwdlj9hw+e7vi
 fBWwdfl1duK5cfVbbyvL74It4WeMnjuAYrBnMTxhYBTq56xFLrbBILl8BLxAV5NN
 5wGoNCeUtj8LxUrL2qs5QoT3Bf7uoDlLnu1Ly7jDMMX34/oNh5huOjZdDFbQYxS3
 DsEe6ljOYOyB/awdUhScERfxVPimumN3nHWnRJbsQhX36uXT6U7HNJah4zauchRk
 UlKUSfG3YyOqKIwFH+8oGmkuCm6wZbVjVsNNkYhT804BCCHrasJ1SHXsSB9R0MpU
 x3IQOoiuc33bUYrSqWAO7utvt+PwG++3GHz0XQwPfZn4DHY18/e+VNsGtQTPqzRG
 tsywZVTN0DC0nO7L772nkQDb7z2mhmJGgN8q3FPbMTfp/I1phIh9C17pckfpHKAl
 ippTmTMaIYDU3Rlc1g/cu363GOaXWRN4t03VSEu/BLV0IElRktUnmuBU3B/rMb+F
 ItaBmhnZGXHUrulMTxDtzItrYMwx00USw6IrG3iYjob0MhhxhLVxEh0vKc7Te2w5
 2FZEjj2BxinK66mJgAolZw==
 =BQd/
 -----END PGP SIGNATURE-----

Merge tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts

This is the 4.14.353 OpenELA-Extended LTS stable release

* tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts: (173 commits)
  LTS: Update to 4.14.353
  net: fix __dst_negative_advice() race
  selftests: make order checking verbose in msg_zerocopy selftest
  selftests: fix OOM in msg_zerocopy selftest
  Revert "selftests/net: reap zerocopy completions passed up as ancillary data."
  Revert "selftests: fix OOM in msg_zerocopy selftest"
  Revert "selftests: make order checking verbose in msg_zerocopy selftest"
  nvme/pci: Add APST quirk for Lenovo N60z laptop
  exec: Fix ToCToU between perm check and set-uid/gid usage
  drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
  drm/i915: Try GGTT mmapping whole object as partial
  netfilter: nf_tables: set element extended ACK reporting support
  kbuild: Fix '-S -c' in x86 stack protector scripts
  drm/mgag200: Set DDC timeout in milliseconds
  drm/bridge: analogix_dp: properly handle zero sized AUX transactions
  drm/bridge: analogix_dp: Properly log AUX CH errors
  drm/bridge: analogix_dp: Reset aux channel if an error occurred
  drm/bridge: analogix_dp: Check AUX_EN status when doing AUX transfer
  x86/mtrr: Check if fixed MTRRs exist before saving them
  tracing: Fix overflow in get_free_elt()
  ...

Change-Id: I0e92a979e31d4fa6c526c6b70a1b61711d9747bb
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-10-11 00:50:20 -03:00
Ross Lagerwall
a01900bb7d decompress_bunzip2: fix rare decompression failure
commit bf6acd5d16057d7accbbb1bf7dc6d8c56eeb4ecc upstream.

The decompression code parses a huffman tree and counts the number of
symbols for a given bit length.  In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.

Since the maximum number of symbols is 258, use unsigned short instead.
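
In diff terms the fix is a single type change in get_next_block() (a
sketch of the upstream change; array names as in
lib/decompress_bunzip2.c):

  -	unsigned char length[MAX_SYMBOLS], temp[MAX_HUFCODE_BITS+1];
  +	unsigned char length[MAX_SYMBOLS];
  +	unsigned short temp[MAX_HUFCODE_BITS+1];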

Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com
Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 16b92b031b4da174342bd909130731c55f20c7ea)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:24 +00:00
Richard Raya
be1ff8e638 msm-4.14: Revert some unsafe optimizations
Change-Id: I2c268f87ab8d9154758384c7a7639046c3784eb8
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-09-29 17:32:34 -03:00
Sultan Alsawaf
15c8a438c6 kobject_uevent: Allocate environment buffer on the stack
The environment buffer isn't very big; when it's allocated on the stack,
kobject_uevent_env's stack frame size increases to just over 2 KiB,
which is safe considering that we have a 16 KiB stack.

Allocate the environment buffer on the stack instead of using the slab
allocator in order to improve performance.
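
A sketch of the change (struct kobj_uevent_env embeds its ~2 KiB buffer,
hence the frame growth):

  struct kobj_uevent_env env;  /* was: kzalloc(sizeof(*env), GFP_KERNEL) */

  memset(&env, 0, sizeof(env));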

Change-Id: I175eda1be09007436f77dd6f185931e6a1d9facb
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-27 11:40:01 -03:00
Tashfin Shakeer Rhythm
7bd5b033d5 lz4: Use ARM64 v8 ASM to accelerate decompression
Change-Id: I2c14ffc6ee9dcc0f2170f9f90169a08505dc43ed
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-18 03:56:56 -03:00
Dark-Matter7232
539a2175c0 lz4: Add ARM64 acceleration and enhance decompression
Extracted from Huawei's kernel source drop, this patch adds ARM64 optimizations for LZ4 decompression:

- Adds ARM64 acceleration support.
- Introduces new ARM64-specific files and updates the Makefile.
- Enhances __LZ4_decompress_generic for partial decompression.
- Adds bounds checks for safer decompression.

Originally intended as optimizations for Huawei's EROFS driver.

Change-Id: I438b79e43bbacd08a047d037a1eeac8ff89b4aff
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-18 03:56:56 -03:00
Tashfin Shakeer Rhythm
42a7777993 lz4: Eliminate unused functions
This fixes the following warnings by Clang:

../lib/lz4/lz4_decompress.c:904:12: warning: unused function 'LZ4_decompress_fast' [-Wunused-function]
static int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
           ^
../lib/lz4/lz4_decompress.c:941:12: warning: unused function 'LZ4_decompress_fast_extDict' [-Wunused-function]
static int LZ4_decompress_fast_extDict(const char *source, char *dest,
           ^
../lib/lz4/lz4_decompress.c:1052:12: warning: unused function 'LZ4_decompress_fast_continue' [-Wunused-function]
static int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
           ^
../lib/lz4/lz4_decompress.c:1099:12: warning: unused function 'LZ4_decompress_safe_usingDict' [-Wunused-function]
static int LZ4_decompress_safe_usingDict(const char *source, char *dest,
           ^
../lib/lz4/lz4_decompress.c:1118:12: warning: unused function 'LZ4_decompress_fast_usingDict' [-Wunused-function]
static int LZ4_decompress_fast_usingDict(const char *source, char *dest,
           ^

Change-Id: Ieea187c98a10ee55da02f67615f25b6cd0ed13ba
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-18 03:55:22 -03:00
Tashfin Shakeer Rhythm
cf980ad5d1 lz4: Staticize some functions
This fixes the following warnings by sparse:

../lib/lz4/lz4_compress.c:838:5: warning: symbol 'LZ4_compress_fast_extState' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:141:8: warning: symbol 'read_long_length_no_check' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:904:5: warning: symbol 'LZ4_decompress_fast' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1052:5: warning: symbol 'LZ4_decompress_fast_continue' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1099:5: warning: symbol 'LZ4_decompress_safe_usingDict' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1118:5: warning: symbol 'LZ4_decompress_fast_usingDict' was not declared. Should it be static?

Since some of the functions have been marked as static now, there is no
need to export them. Remove the redundant export symbols as well.

Change-Id: Idf1e7a83ceba9f582e22df51ebe687585334abec
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-18 03:55:22 -03:00
Arvind Sankar
7afd08666e lib/string: Use freestanding environment
gcc can transform the loop in a naive implementation of memset/memcpy
etc into a call to the function itself.  This optimization is enabled by
-ftree-loop-distribute-patterns.

This has been the case for a while, but gcc-10.x enables this option at
-O2 rather than -O3 as in previous versions.

Add -ffreestanding, which implicitly disables this optimization with
gcc.  It is unclear whether clang performs such optimizations, but
hopefully it will also not do so in a freestanding environment.
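
The change itself is tiny (a sketch in kbuild terms; the actual rule may
be wrapped in cc-option):

  # lib/Makefile
  CFLAGS_string.o += -ffreestanding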

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
Change-Id: I9dc26de8617d7417de2a60c40fcc05687abb3f68
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
Miguel Ojeda
cc3eba949c lib/crc32: Mark crc32_le_base/__crc32c_le_base aliases as __pure
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.

In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.

These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32()
routines weak so they can be overridden").

Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.

Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.

Change-Id: I587626c4a9440aa11d45ab18fd025fd2e18e16fd
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
Ard Biesheuvel
8bdf1f09a3 lib/crc32: Make core crc32() routines weak so they can be overridden
Allow architectures to drop in accelerated CRC32 routines by making
the crc32_le/__crc32c_le entry points weak, and exposing non-weak
aliases for them that may be used by the accelerated versions as
fallbacks in case the instructions they rely upon are not available.
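
The pattern, roughly as it lands in lib/crc32.c (condensed to one
configuration; table and polynomial names per the surrounding code):

  u32 __pure __weak crc32_le(u32 crc, unsigned char const *p, size_t len)
  {
    return crc32_le_generic(crc, p, len, crc32table_le, CRCPOLY_LE);
  }

  /* Non-weak alias that accelerated versions can fall back on: */
  u32 __pure crc32_le_base(u32, unsigned char const *, size_t) __alias(crc32_le);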

Change-Id: Ifd5e08ad5f0ebf4792986045d429704223c227c1
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
Andrzej Perczak
bb7d7f8c81 lz4: Update to version 1.9.4
Change-Id: I383e5f39ef6deccfe6f4126b00d29da3e2bdac3d
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
Andrzej Perczak
4dcb0202e2 lz4: Update LZ4 module to v1.9.3+
Update the lz4 module using the official repository at revision [1].

Keep in mind lz4hc wasn't updated, thus it is not used; it may no longer
compile.

[1]: 4ebe313e00

Change-Id: Ica61c484bcc9a91f65752bd62f2bf3bb90ed1499
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
John Galt
923182605a lz4: Slight reset
Drop the backported single patches in favor of a full update to the latest v1.9.4

Change-Id: Ib11b79efac81aede4645bbb317dc25b7878db5d6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 21:03:24 -03:00
Richard Raya
3143685e95 Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (278 commits)
  LTS: Update to 4.14.348
  docs: kernel_include.py: Cope with docutils 0.21
  serial: kgdboc: Fix NMI-safety problems from keyboard reset code
  btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
  dm: limit the number of targets and parameter size area
  Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
  LTS: Update to 4.14.347
  rds: Fix build regression.
  RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
  af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
  net: fix out-of-bounds access in ops_init
  drm/vmwgfx: Fix invalid reads in fence signaled events
  dyndbg: fix old BUG_ON in >control parser
  tipc: fix UAF in error path
  usb: gadget: f_fs: Fix a race condition when processing setup packets.
  usb: gadget: composite: fix OS descriptors w_value logic
  firewire: nosy: ensure user_length is taken into account when fetching packet contents
  af_unix: Fix garbage collector racing against connect()
  af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
  ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
  ...

Change-Id: If329d39dd4e95e14045bb7c58494c197d1352d60
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 16:33:29 -03:00
Jim Cromie
8aa195d14b dyndbg: fix old BUG_ON in >control parser
commit 00e7d3bea2ce7dac7bee1cf501fb071fd0ea8f6c upstream.

Fix a BUG_ON from 2009.  Even if it looks "unreachable" (I didn't
really look), let's make sure by removing it, doing a pr_err() and
returning -EINVAL instead.

Cc: stable <stable@kernel.org>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20240429193145.66543-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c718bddddca9cbef177ac475b94c5c91147fb38)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-06-03 12:47:30 +00:00
Andrey Ryabinin
9485c0dda5 stackdepot: respect __GFP_NOLOCKDEP allocation flag
commit 6fe60465e1d53ea321ee909be26d97529e8f746c upstream.

If stack_depot_save_flags() allocates memory it always drops the
__GFP_NOLOCKDEP flag.  So when KASAN tries to track a __GFP_NOLOCKDEP
allocation we may end up with a lockdep splat like below:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.9.0-rc3+ #49 Not tainted
 ------------------------------------------------------
 kswapd0/149 is trying to acquire lock:
 ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs]

 but task is already holding lock:
 ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (fs_reclaim){+.+.}-{0:0}:
        __lock_acquire+0x7da/0x1030
        lock_acquire+0x15d/0x400
        fs_reclaim_acquire+0xb5/0x100
        prepare_alloc_pages.constprop.0+0xc5/0x230
        __alloc_pages+0x12a/0x3f0
        alloc_pages_mpol+0x175/0x340
        stack_depot_save_flags+0x4c5/0x510
        kasan_save_stack+0x30/0x40
        kasan_save_track+0x10/0x30
        __kasan_slab_alloc+0x83/0x90
        kmem_cache_alloc+0x15e/0x4a0
        __alloc_object+0x35/0x370
        __create_object+0x22/0x90
        __kmalloc_node_track_caller+0x477/0x5b0
        krealloc+0x5f/0x110
        xfs_iext_insert_raw+0x4b2/0x6e0 [xfs]
        xfs_iext_insert+0x2e/0x130 [xfs]
        xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs]
        xfs_btree_visit_block+0xfb/0x290 [xfs]
        xfs_btree_visit_blocks+0x215/0x2c0 [xfs]
        xfs_iread_extents+0x1a2/0x2e0 [xfs]
        xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs]
        iomap_iter+0x1d1/0x2d0
        iomap_file_buffered_write+0x120/0x1a0
        xfs_file_buffered_write+0x128/0x4b0 [xfs]
        vfs_write+0x675/0x890
        ksys_write+0xc3/0x160
        do_syscall_64+0x94/0x170
        entry_SYSCALL_64_after_hwframe+0x71/0x79

Always preserve __GFP_NOLOCKDEP to fix this.
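
The shape of the fix (a hedged sketch of the flag sanitization inside
stack_depot_save_flags(); the exact surrounding code may differ):

  /* Keep __GFP_NOLOCKDEP when narrowing the caller's flags. */
  alloc_flags &= (GFP_ATOMIC | GFP_KERNEL | __GFP_NOLOCKDEP);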

Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Xiubo Li <xiubli@redhat.com>
Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/
Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/
Suggested-by: Dave Chinner <david@fromorbit.com>
Tested-by: Xiubo Li <xiubli@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 79b25b1a58d0a6b53dfd685bca8a1984c86710dd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-31 12:57:28 +00:00
Linus Torvalds
73cd6c76f5 string: Improve default out-of-line memcmp() implementation
This just does the "if the architecture does efficient unaligned
handling, start the memcmp using 'unsigned long' accesses", since
Nikolay Borisov found a load that cares.

This is basically the minimal patch, and limited to architectures that
are known to not have slow unaligned handling.  We've had the stupid
byte-at-a-time version forever, and nobody has ever even noticed before,
so let's keep the fix minimal.

A potential further improvement would be to align one of the sources in
order to at least minimize unaligned cases, but the only real case of
bigger memcmp() users seems to be the FIDEDUPERANGE ioctl().  As David
Sterba says, the dedupe ioctl is typically called on ranges spanning
many pages so the common case will all be page-aligned anyway.

All the relevant architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS,
so I'm not going to worry about the combination of a very rare use-case
and a rare architecture until somebody actually hits it.  Particularly
since Nikolay also tested the more complex patch with extra alignment
handling code, and it only added overhead.
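
The minimal patch boils down to a word-at-a-time fast path at the top of
memcmp(), before the existing byte loop (a sketch close to the
lib/string.c change):

  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    if (count >= sizeof(unsigned long)) {
      const unsigned long *u1 = cs;
      const unsigned long *u2 = ct;
      do {
        if (get_unaligned(u1) != get_unaligned(u2))
          break;  /* fall through to the byte loop at the mismatch */
        u1++;
        u2++;
        count -= sizeof(unsigned long);
      } while (count >= sizeof(unsigned long));
      cs = u1;
      ct = u2;
    }
  #endif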

Link: https://lore.kernel.org/lkml/20210721135926.602840-1-nborisov@suse.com/
Change-Id: I31524a62feb09d31028e870cac46598d0fbfb9cc
Reported-by: Nikolay Borisov <nborisov@suse.com>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-22 20:43:53 -03:00
Sultan Alsawaf
0fa3521bf8 cpumask: Add optimized helpers when NR_CPUS fits in a long
When NR_CPUS fits in a long, it's possible to use compiler built-ins to
produce much faster code when operating on cpumasks compared to just using
the generic bitops APIs.

Therefore, add optimized helpers using compiler built-ins when NR_CPUS fits
in a long. This also turns nr_cpu_ids into a compile-time constant for
further optimization potential.
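
For a flavor of the idea (a hedged illustration, not Sultan's exact code;
fast_cpumask_weight() is a hypothetical name):

  #if NR_CPUS <= BITS_PER_LONG
  /* The whole mask fits in one word, so generic bitmap calls
   * collapse into single-word compiler built-ins. */
  static inline unsigned int fast_cpumask_weight(const struct cpumask *srcp)
  {
    return __builtin_popcountl(cpumask_bits(srcp)[0]);
  }
  #endif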

Note that compared to the upstream cpumask rewrite with this feature, these
optimized helpers perfectly preserve the semantics of the helpers they
replace. And this change is much smaller than the upstream version.

Change-Id: I1ac6058a19bd3b22a491176eef9d661cca78e521
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-21 20:44:45 -03:00
John Galt
c5b6927a6d lib: zstd: Update to v1.5.5
Change-Id: Ib8c142f4aa1dbe169d36b0e826f5da66bc334a47
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Cyber Knight
8a2ba0d644 lib: zstd: Update to v1.5.4
Syncs latest upstream ZSTD from [1].

This update retains the following commits:
edc41e9a5d {"lib: zstd: Fix attribute declaration"}
31ef7d2d75 {"lib: zstd: include a missing header"}
4927d31bfc {"lib: zstd: define UINTPTR_MAX"}

[1]: https://github.com/facebook/zstd/commits/v1.5.4

Change-Id: I03bf09ce96b0398043cfec5506cb036e13906d58
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Cyber Knight
56e1cb9ea2 lib: zstd: Introduce CONFIG_ZSTD_COMMON
This accommodates the drivers shared between the ZSTD compressor and the ZSTD decompressor.

Change-Id: I2c498cbab6bae106923138750ca695a663b9e1c5
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
John Galt
e9c33dabd0 msm-4.14: Selectively extend over inline optimization
Change-Id: I770a6de39a25b71cb9609343b93e8b26cf056017
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Guo Xuenan
23e0cffb68 lz4: Fix LZ4_decompress_safe_partial read out of bound
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.

When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for the following match.

In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur.  As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding.  lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.

The current decompression routine was ported from lz4 v1.8.3; bumping
lib/lz4 to v1.9.+ is certainly a huge amount of work to be done later, so
we'd better fix this issue first.

[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] c5d6f8a8be#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/

Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Change-Id: Idaa5c404d9084f64a4be50ef3d4615b962dcd3b4
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:32 -03:00
Richard Raya
669eb74484 Merge branch 'deprecated/android-4.14-stable' of https://android.googlesource.com/kernel/common into HEAD
* 'deprecated/android-4.14-stable' of https://android.googlesource.com/kernel/common: (101 commits)
  Linux 4.14.336
  mmc: core: Cancel delayed work before releasing host
  mmc: rpmb: fixes pause retune on all RPMB partitions.
  firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
  i40e: fix use-after-free in i40e_aqc_add_filters()
  net: bcmgenet: Fix FCS generation for fragmented skbuffs
  net: sched: em_text: fix possible memory leak in em_text_destroy()
  nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
  UPSTREAM: drm: Fix doc warning in drm_connector_attach_edid_property()
  BACKPORT: lib/vsprintf: Hash legacy clock addresses
  UPSTREAM: xfrm: fix gro_cells leak when remove virtual xfrm interfaces
  UPSTREAM: xfrm: Make function xfrmi_get_link_net() static
  UPSTREAM: cpuidle: menu: Retain tick when shallow state is selected
  UPSTREAM: bpf: fix rcu annotations in compute_effective_progs()
  UPSTREAM: bpf: bpf_prog_array_alloc() should return a generic non-rcu pointer
  UPSTREAM: sched/util_est: Fix util_est_dequeue() for throttled cfs_rq
  UPSTREAM: softirq: Reorder trace_softirqs_on to prevent lockdep splat
  UPSTREAM: l2tp: fix refcount leakage on PPPoL2TP sockets
  UPSTREAM: HID: steam: select CONFIG_POWER_SUPPLY
  BACKPORT: mac80211_hwsim: fix a possible memory leak in hwsim_new_radio_nl()
  ...

Change-Id: I1c98fbb0918986a06bee16b0c11fe8bee003fd3f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-03-25 23:54:05 -03:00
Geert Uytterhoeven
b88d825136 BACKPORT: lib/vsprintf: Hash legacy clock addresses
On platforms using the Common Clock Framework, "%pC" prints the clock's
name. On legacy platforms, it prints the unhashed clock's address,
potentially leaking sensitive information regarding the kernel layout in
memory.

Avoid this leak by printing the hashed address instead.  To distinguish
between clocks, a 32-bit unique identifier is as good as an actual
pointer value.

Bug: 254441685
Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p")
Link: http://lkml.kernel.org/r/20181011084249.4520-3-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
(cherry picked from commit ec12bc2909f9759747ab5ad3709472353c43a750)
[Lee: Fixed a trivial conflict pertaining to original diff]
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8286a3b34ebd66ddef861dd969c68283ef569cf5
2024-01-10 13:03:02 +00:00
Kees Cook
a71b40b5d7 UPSTREAM: bug: use %pB in BUG and stack protector failure
The BUG and stack protector reports were still using a raw %p.  This
changes it to %pB for more meaningful output.

Bug: 254441685
Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast
Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>,
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0862ca422b79cb5aa70823ee0f07f6b468f86070)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I5ea62cb4f1e800fe0c71694a3af4f7f55f11de55
2024-01-10 12:49:04 +00:00