Do not solely rely on compiler optimizations to get the workaround
of having macros do nothing using an empty do-while loop. It's
inefficient.
Use ((void)0) to which the standard assert macro expands when NDEBUG
is defined.
No functional change intended.
[mcdofrenchfreis]:
Implement this patch to tree using the command:
git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g"
Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit ffedeeb780dc554eff3d3b16e6a462a26a41d7ec upstream.
Introduce new C macros for annotations of functions and data in
assembly. There is a long-standing mess in macros like ENTRY, END,
ENDPROC and similar. They are used in different manners and sometimes
incorrectly.
So introduce macros with clear use to annotate assembly as follows:
a) Support macros for the ones below
SYM_T_FUNC -- type used by assembler to mark functions
SYM_T_OBJECT -- type used by assembler to mark data
SYM_T_NONE -- type used by assembler to mark entries of unknown type
They are defined as STT_FUNC, STT_OBJECT, and STT_NOTYPE
respectively. According to the gas manual, this is the most portable
way. I am not sure about other assemblers, so this can be switched
back to %function and %object if this turns into a problem.
Architectures can also override them by something like ", @function"
if they need.
SYM_A_ALIGN, SYM_A_NONE -- align the symbol?
SYM_L_GLOBAL, SYM_L_WEAK, SYM_L_LOCAL -- linkage of symbols
b) Mostly internal annotations, used by the ones below
SYM_ENTRY -- use only if you have to (for non-paired symbols)
SYM_START -- use only if you have to (for paired symbols)
SYM_END -- use only if you have to (for paired symbols)
c) Annotations for code
SYM_INNER_LABEL_ALIGN -- only for labels in the middle of code
SYM_INNER_LABEL -- only for labels in the middle of code
SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for
one function
SYM_FUNC_START_ALIAS -- use where there are two global names for one
function
SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function
SYM_FUNC_START -- use for global functions
SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment
SYM_FUNC_START_LOCAL -- use for local functions
SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o
alignment
SYM_FUNC_START_WEAK -- use for weak functions
SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment
SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START,
SYM_FUNC_START_WEAK, ...
For functions with special (non-C) calling conventions:
SYM_CODE_START -- use for non-C (special) functions
SYM_CODE_START_NOALIGN -- use for non-C (special) functions, w/o
alignment
SYM_CODE_START_LOCAL -- use for local non-C (special) functions
SYM_CODE_START_LOCAL_NOALIGN -- use for local non-C (special)
functions, w/o alignment
SYM_CODE_END -- the end of SYM_CODE_START_LOCAL or SYM_CODE_START
d) For data
SYM_DATA_START -- global data symbol
SYM_DATA_START_LOCAL -- local data symbol
SYM_DATA_END -- the end of the SYM_DATA_START symbol
SYM_DATA_END_LABEL -- the labeled end of SYM_DATA_START symbol
SYM_DATA -- start+end wrapper around simple global data
SYM_DATA_LOCAL -- start+end wrapper around simple local data
==========
The macros allow to pair starts and ends of functions and mark functions
correctly in the output ELF objects.
All users of the old macros in x86 are converted to use these in further
patches.
Link: https://lkml.kernel.org/r/20191011115108.12392-2-jslaby@suse.cz
Change-Id: Ia6a0259a8da7c67cb10eff32093e736fda07e1db
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Jian Cai <jiancai@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.
So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Change-Id: I5bf93880b158aa01f2b5155e7a9f6cd7b9088fc6
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcHrG8ZHHZlZ2FyZC5u
b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WIQuEACCYf9xCGBALlKFb0pXX3eF
oiRkceNyy5NWSndD7t9p/3d2g4YrVptGxtTZN12IltfG4wfCQ+qC/0g2Mu4ho0Yp
2ExKVaIli1t2csIjXCUUyjh3jU0JOkDwJap9n5QemACsX8zrDfKVwdlj9hw+e7vi
fBWwdfl1duK5cfVbbyvL74It4WeMnjuAYrBnMTxhYBTq56xFLrbBILl8BLxAV5NN
5wGoNCeUtj8LxUrL2qs5QoT3Bf7uoDlLnu1Ly7jDMMX34/oNh5huOjZdDFbQYxS3
DsEe6ljOYOyB/awdUhScERfxVPimumN3nHWnRJbsQhX36uXT6U7HNJah4zauchRk
UlKUSfG3YyOqKIwFH+8oGmkuCm6wZbVjVsNNkYhT804BCCHrasJ1SHXsSB9R0MpU
x3IQOoiuc33bUYrSqWAO7utvt+PwG++3GHz0XQwPfZn4DHY18/e+VNsGtQTPqzRG
tsywZVTN0DC0nO7L772nkQDb7z2mhmJGgN8q3FPbMTfp/I1phIh9C17pckfpHKAl
ippTmTMaIYDU3Rlc1g/cu363GOaXWRN4t03VSEu/BLV0IElRktUnmuBU3B/rMb+F
ItaBmhnZGXHUrulMTxDtzItrYMwx00USw6IrG3iYjob0MhhxhLVxEh0vKc7Te2w5
2FZEjj2BxinK66mJgAolZw==
=BQd/
-----END PGP SIGNATURE-----
Merge tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts
This is the 4.14.353 OpenELA-Extended LTS stable release
* tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts: (173 commits)
LTS: Update to 4.14.353
net: fix __dst_negative_advice() race
selftests: make order checking verbose in msg_zerocopy selftest
selftests: fix OOM in msg_zerocopy selftest
Revert "selftests/net: reap zerocopy completions passed up as ancillary data."
Revert "selftests: fix OOM in msg_zerocopy selftest"
Revert "selftests: make order checking verbose in msg_zerocopy selftest"
nvme/pci: Add APST quirk for Lenovo N60z laptop
exec: Fix ToCToU between perm check and set-uid/gid usage
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
drm/i915: Try GGTT mmapping whole object as partial
netfilter: nf_tables: set element extended ACK reporting support
kbuild: Fix '-S -c' in x86 stack protector scripts
drm/mgag200: Set DDC timeout in milliseconds
drm/bridge: analogix_dp: properly handle zero sized AUX transactions
drm/bridge: analogix_dp: Properly log AUX CH errors
drm/bridge: analogix_dp: Reset aux channel if an error occurred
drm/bridge: analogix_dp: Check AUX_EN status when doing AUX transfer
x86/mtrr: Check if fixed MTRRs exist before saving them
tracing: Fix overflow in get_free_elt()
...
Change-Id: I0e92a979e31d4fa6c526c6b70a1b61711d9747bb
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 919f18f961c03d6694aa726c514184f2311a4614 upstream.
MTRRs have an obsolete fixed variant for fine grained caching control
of the 640K-1MB region that uses separate MSRs. This fixed variant has
a separate capability bit in the MTRR capability MSR.
So far all x86 CPUs which support MTRR have this separate bit set, so it
went unnoticed that mtrr_save_state() does not check the capability bit
before accessing the fixed MTRR MSRs.
Though on a CPU that does not support the fixed MTRR capability this
results in a #GP. The #GP itself is harmless because the RDMSR fault is
handled gracefully, but results in a WARN_ON().
Add the missing capability check to prevent this.
Fixes: 2b1f6278d77c ("[PATCH] x86: Save the MTRRs of the BSP before booting an AP")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240808000244.946864-1-ak@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 34f36e6ee5bd7eff8b2adcd9fcaef369f752d82e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit ad97196379d0b8cb24ef3d5006978a6554e6467f ]
topa_entry->base is a bit-field. Bit-fields are not promoted to a 64-bit
type, even if the underlying type is 64-bit, and so, if necessary, must
be cast to a larger type when calculations are done.
Fix a topa_entry->base address calculation by adding a cast.
Without the cast, the address was limited to 36-bits i.e. 64GiB.
The address calculation is used on systems that do not support Multiple
Entry ToPA (only Broadwell), and affects physical addresses on or above
64GiB. Instead of writing to the correct address, the address comprising
the first 36 bits would be written to.
Intel PT snapshot and sampling modes are not affected.
Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624201101.60186-3-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 418f7db13405953c2d9223275d365d9828169076)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 38bb8d77d0b932a0773b5de2ef42479409314f96 ]
PT uses page sized ToPA tables, where the ToPA table resides at the bottom
and its driver-specific metadata taking up a few words at the top of the
page. The split is currently calculated manually and needs to be redone
every time a field is added to or removed from the metadata structure.
Also, the 32-bit version can be made smaller.
By splitting the table and metadata into separate structures, we are making
the compiler figure out the division of the page.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20190821124727.73310-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stable-dep-of: ad97196379d0 ("perf/x86/intel/pt: Fix a topa_entry base address calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e9d9ec1019a90aafdb54765a3b46f36f402b481a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 539f7c26b41d4ed7d88dd9756de3966ae7ca07b4 ]
Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by
casting pointers to addresses and performing ungainly subtractions and
divisions instead of a simpler pointer arithmetic, which would be perfectly
applicable in that case. Fix that.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20190821124727.73310-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stable-dep-of: ad97196379d0 ("perf/x86/intel/pt: Fix a topa_entry base address calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 67968b8c7603007751f140f3f9f8aa8e64fc26b2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit fffec50f541ace292383c0cbe9a2a97d16d201c6 ]
There are a few places in the PT driver that need to obtain the size of
a ToPA entry, some of them for the current ToPA entry in the buffer.
Use helpers for those, to make the lines shorter and more readable.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20190821124727.73310-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stable-dep-of: ad97196379d0 ("perf/x86/intel/pt: Fix a topa_entry base address calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e3fb71f7ecbf87228148c3287eac965927ef49be)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit f6d079ce867d679e4dffef5b3112c7634215fd88 ]
pt_cap_get() is required by the upcoming PT support in KVM guests.
Export it and move the capabilites enum to a global header.
As a global functions, "pt_*" is already used for ptrace and
other things, so it makes sense to use "intel_pt_*" as a prefix.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stable-dep-of: ad97196379d0 ("perf/x86/intel/pt: Fix a topa_entry base address calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit bea2d4588e90f56da62b0dd9099484a42498b08a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 5638bd722a44bbe97c1a7b3fae5b9efddb3e70ff upstream.
topa_entry->base needs to store a pfn. It obviously needs to be
large enough to store the largest possible x86 pfn which is
MAXPHYADDR-PAGE_SIZE (52-12). So it is 4 bits too small.
Increase the size of topa_entry->base from 36 bits to 40 bits.
Note, systems where physical addresses can be 256TiB or more are affected.
[ Adrian: Amend commit message as suggested by Dave Hansen ]
Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver")
Signed-off-by: Marco Cavenati <cavenati.marco@gmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624201101.60186-2-adrian.hunter@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b4030b619066aa1c20e075ce9382f103e0168145)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 7821fa101eab529521aa4b724bf708149d70820c ]
iosf_mbi_pci_{read,write}_mdr() use pci_{read,write}_config_dword()
that return PCIBIOS_* codes but functions also return -ENODEV which are
not compatible error codes. As neither of the functions are related to
PCI read/write functions, they should return normal errnos.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it.
Fixes: 46184415368a ("arch: x86: New MailBox support driver for Intel SOC's")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240527125538.13620-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3f4f08e59ddf359da5bc4226ba865a59177a3a50)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit e9d7b435dfaec58432f4106aaa632bf39f52ce9f ]
xen_pcifront_enable_irq() uses pci_read_config_byte() that returns
PCIBIOS_* codes. The error handling, however, assumes the codes are
normal errnos because it checks for < 0.
xen_pcifront_enable_irq() also returns the PCIBIOS_* code back to the
caller but the function is used as the (*pcibios_enable_irq) function
which should return normal errnos.
Convert the error check to plain non-zero check which works for
PCIBIOS_* return codes and convert the PCIBIOS_* return code using
pcibios_err_to_errno() into normal errno before returning it.
Fixes: 3f2a230caf21 ("xen: handled remapped IRQs when enabling a pcifront PCI device.")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240527125538.13620-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5294b91618250c7719e4c85096cafe8f76a1bc20)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 724852059e97c48557151b3aa4af424614819752 ]
intel_mid_pci_irq_enable() uses pci_read_config_byte() that returns
PCIBIOS_* codes. The error handling, however, assumes the codes are
normal errnos because it checks for < 0.
intel_mid_pci_irq_enable() also returns the PCIBIOS_* code back to the
caller but the function is used as the (*pcibios_enable_irq) function
which should return normal errnos.
Convert the error check to plain non-zero check which works for
PCIBIOS_* return codes and convert the PCIBIOS_* return code using
pcibios_err_to_errno() into normal errno before returning it.
Fixes: 5b395e2be6c4 ("x86/platform/intel-mid: Make IRQ allocation a bit more flexible")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240527125538.13620-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 600a520cc4e661aa712415e4a733924e9d22777d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit ec0b4c4d45cf7cf9a6c9626a494a89cb1ae7c645 ]
x86_of_pci_irq_enable() returns PCIBIOS_* code received from
pci_read_config_byte() directly and also -EINVAL which are not
compatible error types. x86_of_pci_irq_enable() is used as
(*pcibios_enable_irq) function which should not return PCIBIOS_* codes.
Convert the PCIBIOS_* return code from pci_read_config_byte() into
normal errno using pcibios_err_to_errno().
Fixes: 96e0a0797eba ("x86: dtb: Add support for PCI devices backed by dtb nodes")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240527125538.13620-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 56d64c36b2aac95c9c24e303fb746591ecfa096a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.
Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep. That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.
We can do much better though. Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held. Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range. Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.
This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false. This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.
I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that. The first
part can be done without a sleeping lock in most cases AFAICS.
The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode. A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.
The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom. This can be done e.g. after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small. Then we are looking for a proper process tear down.
[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: Ibf089b0ebbbfa7182eeca314b757caf456969758
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (133 commits)
LTS: Update to 4.14.350
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
arm64: dts: rockchip: Add sound-dai-cells for RK3368
tcp: Fix data races around icsk->icsk_af_ops.
ipv6: Fix data races around sk->sk_prot.
ipv6: annotate some data-races around sk->sk_prot
pwm: stm32: Refuse too small period requests
ftruncate: pass a signed offset
batman-adv: Don't accept TT entries for out-of-spec VIDs
batman-adv: include gfp.h for GFP_* defines
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_hd_modes
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_ld_modes
hexagon: fix fadvise64_64 calling conventions
tty: mcf: MCF54418 has 10 UARTS
usb: atm: cxacru: fix endpoint checking in cxacru_bind()
usb: musb: da8xx: fix a resource leak in probe()
usb: gadget: printer: SS+ support
net: usb: ax88179_178a: improve link status logs
iio: adc: ad7266: Fix variable checking bug
mmc: sdhci-pci: Convert PCIBIOS_* return codes to errnos
...
Change-Id: I0ab862e0932a48f13c64657e5456f3034b766445
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
[ Upstream commit 093d9603b60093a9aaae942db56107f6432a5dca ]
The 'profile_pc()' function is used for timer-based profiling, which
isn't really all that relevant any more to begin with, but it also ends
up making assumptions based on the stack layout that aren't necessarily
valid.
Basically, the code tries to account the time spent in spinlocks to the
caller rather than the spinlock, and while I support that as a concept,
it's not worth the code complexity or the KASAN warnings when no serious
profiling is done using timers anyway these days.
And the code really does depend on stack layout that is only true in the
simplest of cases. We've lost the comment at some point (I think when
the 32-bit and 64-bit code was unified), but it used to say:
Assume the lock function has either no stack frame or a copy
of eflags from PUSHF.
which explains why it just blindly loads a word or two straight off the
stack pointer and then takes a minimal look at the values to just check
if they might be eflags or the return pc:
Eflags always has bits 22 and up cleared unlike kernel addresses
but that basic stack layout assumption assumes that there isn't any lock
debugging etc going on that would complicate the code and cause a stack
frame.
It causes KASAN unhappiness reported for years by syzkaller [1] and
others [2].
With no real practical reason for this any more, just remove the code.
Just for historical interest, here's some background commits relating to
this code from 2006:
0cb91a229364 ("i386: Account spinlocks to the caller during profiling for !FP kernels")
31679f38d886 ("Simplify profile_pc on x86-64")
and a code unification from 2009:
ef4512882dbe ("x86: time_32/64.c unify profile_pc")
but the basics of this thing actually goes back to before the git tree.
Link: https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3 [1]
Link: https://lore.kernel.org/all/CAK55_s7Xyq=nh97=K=G1sxueOFrJDAvPOJAL4TPTCAYvmxO9_A@mail.gmail.com/ [2]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 65ebdde16e7f5da99dbf8a548fb635837d78384e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit c625dabbf1c4a8e77e4734014f2fde7aa9071a1f ]
AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.
However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.
Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.
However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.
Check for this condition, clear the return value, and set a proper error
code.
Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d4e52b36c73f44d2b5f41d0cd3f57b3d2efbf180)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmaYMNYZHHZlZ2FyZC5u
b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WI+2EACbJP/GYZL4iZezt3yp9J6y
ObeobshL3ODENH9J4Rpjo7EJNdRbiJmqK07C6g3gxfEBqYhMDxYCBbhwTTvvHmu7
ezr1rmQmUlyzf2qW905a+rTawUrKztZpvZ0ycRXgfQHjX8w64salq/G5X9kJ1CZQ
0TYwhDXXYRc1yuhJkVH0+ZUP+FvSBYXY42QZQ8tRzviBKgHUqyQ2JiLN7yGXStSp
PEOCeXuEsQxkzbFU1rG7J9KXfUYndih+fiGSvuUUZF6WTHNobfkh+nrGzsdadtUp
UW9nEdHjjEhTpTr125uOGc3H2Y1rWVPrcZ9kvJBhzf4WKNBFu2v7Bc5i2/Yz/jKU
5cz7bjqpSnFOAmNe1f+pOO2oIsBk/xhAbMrPHS1eTJfUJmVL21HgDS3nXfV3yYcR
0cHH10HGf7DEx2PRh3DM53XzaiumOXY3e/eFt+syYFWtsPY0XKHjsfwLeoujCVgh
Sb6yiV1HTNg2hkGck+CQKTvHKZhSs1uE+vGSHiSTpryrsXYCTRJySSXEdiU0QpeL
c9xzRE0PrUaUKNucdimGr6EqvXL11M1I59Z3ygk8vyLGI13vSmkRZ9Sl7m0tbirA
0K1Ws2PkwuYQEOut8Esp6DJ2n38Uz3j0lnb2lreC0KbfXMvPWQfP81M1Lc+Pkpn6
Zgbbs68F6jYs0KV/iRty2A==
=RvUO
-----END PGP SIGNATURE-----
Merge tag 'v4.14.349-openela' of https://github.com/openela/kernel-lts
This is the 4.14.349 OpenELA-Extended LTS stable release
* tag 'v4.14.349-openela' of https://github.com/openela/kernel-lts: (160 commits)
LTS: Update to 4.14.349
x86/kvm: Disable all PV features on crash
x86/kvm: Disable kvmclock on all CPUs on shutdown
x86/kvm: Teardown PV features on boot CPU as well
crypto: algif_aead - fix uninitialized ctx->init
nfs: fix undefined behavior in nfs_block_bits()
ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find()
sparc: move struct termio to asm/termios.h
kdb: Use format-specifiers rather than memset() for padding in kdb_read()
kdb: Merge identical case statements in kdb_read()
kdb: Fix console handling when editing and tab-completing commands
kdb: Use format-strings rather than '\0' injection in kdb_read()
kdb: Fix buffer overflow during tab-complete
sparc64: Fix number of online CPUs
intel_th: pci: Add Meteor Lake-S CPU support
net/9p: fix uninit-value in p9_client_rpc()
crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
netfilter: nft_dynset: relax superfluous check on set updates
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
...
Change-Id: Idb0053e6b2186ef17f31e15fdb601ae451c81283
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 3d6b84132d2a57b5a74100f6923a8feb679ac2ce upstream.
Crash shutdown handler only disables kvmclock and steal time, other PV
features remain active so we risk corrupting memory or getting some
side-effects in kdump kernel. Move crash handler to kvm.c and unify
with CPU offline.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit c02027b5742b5aa804ef08a4a9db433295533046 upstream.
Currenly, we disable kvmclock from machine_shutdown() hook and this
only happens for boot CPU. We need to disable it for all CPUs to
guard against memory corruption e.g. on restore from hibernate.
Note, writing '0' to kvmclock MSR doesn't clear memory location, it
just prevents hypervisor from updating the location so for the short
while after write and while CPU is still alive, the clock remains usable
and correct so we don't need to switch to some other clocksource.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 8b79feffeca28c5459458fe78676b081e87c93a4 upstream.
Various PV features (Async PF, PV EOI, steal time) work through memory
shared with hypervisor and when we restore from hibernation we must
properly teardown all these features to make sure hypervisor doesn't
write to stale locations after we jump to the previously hibernated kernel
(which can try to place anything there). For secondary CPUs the job is
already done by kvm_cpu_down_prepare(), register syscore ops to do
the same for boot CPU.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 66ee3636eddcc82ab82b539d08b85fb5ac1dff9b ]
It took me some time to understand the purpose of the tricky code at
the end of arch/x86/Kconfig.debug.
Without it, the following would be shown:
WARNING: unmet direct dependencies detected for FRAME_POINTER
because
81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
removed 'select ARCH_WANT_FRAME_POINTERS'.
The correct and more straightforward approach should have been to move
it where 'select FRAME_POINTER' is located.
Several architectures properly handle the conditional selection of
ARCH_WANT_FRAME_POINTERS. For example, 'config UNWINDER_FRAME_POINTER'
in arch/arm/Kconfig.debug.
Fixes: 81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20240204122003.53795-1-masahiroy@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 28a7a1f9571068bb2ddc8a11f0afe5dfa9863462)
[Vegard: fix trivial conflict in context due to missing commit
06ec64b84c357693e9a5540de8eedfc775dbae12 ("Kconfig: consolidate the
"Kernel hacking" menu").]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 59162e0c11d7257cde15f907d19fefe26da66692 ]
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.
Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.
Example:
$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
Before:
$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
After:
$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax
Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ef10bbdf4d59a98cf57ddf943756f14ef3cdbccd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 76e9762d66373354b45c33b60e9a53ef2a3c5ff2 ]
Commit:
aaa8736370db ("x86, relocs: Ignore relocations in .notes section")
... only started ignoring the .notes sections in print_absolute_relocs(),
but the same logic should also by applied in walk_relocs() to avoid
such relocations.
[ mingo: Fixed various typos in the changelog, removed extra curly braces from the code. ]
Fixes: aaa8736370db ("x86, relocs: Ignore relocations in .notes section")
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Fixes: da1a679cde9b ("Add /sys/kernel/notes")
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240317150547.24910-1-weiguixiong@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2487db16d4b9faead07b7825d33294e9e783791d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit eefb8c124fd969e9a174ff2bedff86aa305a7438 upstream.
Introduce a new READELF variable to top-level Makefile, so the name of
readelf binary can be specified.
Before this change the name of the binary was hardcoded to
"$(CROSS_COMPILE)readelf" which might not be present for every
toolchain.
This allows to build with LLVM Object Reader by using make parameter
READELF=llvm-readelf.
Link: https://github.com/ClangBuiltLinux/linux/issues/771
Change-Id: Idaa4259f5926233a0e5f09e8d79db1736e91b159
Signed-off-by: Dmitry Golovin <dima@golovin.in>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (278 commits)
LTS: Update to 4.14.348
docs: kernel_include.py: Cope with docutils 0.21
serial: kgdboc: Fix NMI-safety problems from keyboard reset code
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
dm: limit the number of targets and parameter size area
Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
LTS: Update to 4.14.347
rds: Fix build regression.
RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: fix out-of-bounds access in ops_init
drm/vmwgfx: Fix invalid reads in fence signaled events
dyndbg: fix old BUG_ON in >control parser
tipc: fix UAF in error path
usb: gadget: f_fs: Fix a race condition when processing setup packets.
usb: gadget: composite: fix OS descriptors w_value logic
firewire: nosy: ensure user_length is taken into account when fetching packet contents
af_unix: Fix garbage collector racing against connect()
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
...
Change-Id: If329d39dd4e95e14045bb7c58494c197d1352d60
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
When done from a virtual machine, instructions that touch APIC memory
must be emulated. By convention, MMIO accesses are typically performed
via io.h helpers such as readl() or writeq() to simplify instruction
emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention,
allowing the compiler to emit instructions other than the MOV
instruction generated by readl(). In particular, when the kernel is
compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
would emit a TESTL instruction which is not supported by the SEV-ES
emulator, causing a boot failure in that environment. It is likely the
same problem would happen in a TDX guest as that uses the same
instruction emulator as SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use
the readl() function in native_apic_mem_read(). It is expected that any
emulator would support MOV in any addressing mode as it is the most
generic and is what is usually emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined
into apic_mem_wait_icr_idle(). The emulator comes from
insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
extend insn_decode_mmio() to support more instructions since, in theory,
the compiler could choose to output nearly any instruction for such
reads which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
[ bp: Massage commit message, fix typos. ]
Signed-off-by: Adam Dunlap <acdunlap@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Kevin Loughlin <kevinloughlin@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 38ecf8d8a293c9677a4659ede4810ecacb06dcda)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f18681daaec9665a15c5e7e0f591aad5d0ac622b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit c567f2948f57bdc03ed03403ae0234085f376b7d upstream.
This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
While the original change tries to fix a bug, it also unintentionally broke
existing systems, see the regressions reported at:
https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
Since d794734c9bbf was also marked for -stable, let's back it out before
causing more damage.
Note that due to another upstream change the revert was not 100% automatic:
0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b66762945d3289d472cedfca81dd98f9d8efe3b7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit fd470a8beed88440b160d690344fbae05a0b9b1b upstream.
Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not
provide protection to processes running at CPL3/user mode, see section
"Extended Feature Enable Register (EFER)" in the APM v2 at
https://bugzilla.kernel.org/attachment.cgi?id=304652
Explicitly enable STIBP to protect against cross-thread CPL3
branch target injections on systems with Automatic IBRS enabled.
Also update the relevant documentation.
Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720194727.67022-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bb8cc9c34361714dd232700b3d5f1373055de610)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 1d30800c0c0ae1d086ffad2bdf0ba4403370f132 upstream.
Those mitigations are very talkative; use the printing helper which pays
attention to the buffer size.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220809153419.10182-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 205bf06008b8ea128ae8f643c21fb32fe4165416)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 6cb2b08ff92460290979de4be91363e5d1b6cec1 upstream.
Xen PV domain kernel is not by design affected by meltdown as it's
enforcing split CR3 itself. Let's not report such systems as "Vulnerable"
in sysfs (we're also already forcing PTI to off in X86_HYPER_XEN_PV cases);
the security of the system ultimately depends on presence of mitigation in
the Hypervisor, which can't be easily detected from DomU; let's report
that.
Reported-and-tested-by: Mike Latimer <mlatimer@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1806180959080.6203@cbobk.fhfr.pm
[ Merge the user-visible string into a single line. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6cb2b08ff92460290979de4be91363e5d1b6cec1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit e7862eda309ecfccc36bb5558d937ed3ace07f3f upstream.
The AMD Zen4 core supports a new feature called Automatic IBRS.
It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
h/w manages its IBRS mitigation resources automatically across CPL transitions.
The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
setting MSR C000_0080 (EFER) bit 21.
Enable Automatic IBRS by default if the CPU feature is present. It typically
provides greater performance over the incumbent generic retpolines mitigation.
Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and
Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to
cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.
The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
IBRS, if available.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a7268b3424863578814d99a1dd603f6bb5b9f977)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (186 commits)
LTS: Update to 4.14.344
binder: signal epoll threads of self-work
ANDROID: binder: Add thread->process_todo flag.
scsi: bnx2fc: Fix skb double free in bnx2fc_rcv()
scsi: bnx2fc: Remove set but not used variable 'oxid'
net: check dev->gso_max_size in gso_features_check()
driver: staging: count ashmem_range into SLAB_RECLAIMBLE
net: warn if gso_type isn't set for a GSO SKB
staging: android: ashmem: Remove use of unlikely()
ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5
ALSA: hda/realtek: Enable headset onLenovo M70/M90
ALSA: hda/realtek: Add quirk for Lenovo TianYi510Pro-14IOB
ALSA: hda/realtek - ALC897 headset MIC no sound
ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform
ALSA: hda/realtek: Fix the mic type detection issue for ASUS G551JW
ALSA: hda/realtek - The front Mic on a HP machine doesn't work
ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662
ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662
ALSA: hda/realtek - Add Headset Mic supported for HP cPC
ALSA: hda/realtek - More constifications
...
Change-Id: I3d093c0e457ab7e7e7b98b46eb44e82b6f4636f9
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
[ Upstream commit edc0a2b5957652f4685ef3516f519f84807087db ]
Traditionally, all CPUs in a system have identical numbers of SMT
siblings. That changes with hybrid processors where some logical CPUs
have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it
is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should
specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.
Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
Notice that the "before" 'package_cpus_list' has only one CPU. This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:
-Core(s) per socket: 1
-Socket(s): 11
+Core(s) per socket: 16
+Socket(s): 1
This is also expected to make the scheduler do rather wonky things
too.
[ dhansen: remove CPUID detail from changelog, add end user effects ]
CC: stable@kernel.org
Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (176 commits)
LTS: Update to 4.14.343
crypto: af_alg - Work around empty control messages without MSG_MORE
crypto: af_alg - Fix regression on empty requests
spi: spi-mt65xx: Fix NULL pointer access in interrupt handler
net/bnx2x: Prevent access to a freed page in page_pool
hsr: Handle failures in module init
rds: introduce acquire/release ordering in acquire/release_in_xmit()
hsr: Fix uninit-value access in hsr_get_node()
net: hsr: fix placement of logical operator in a multi-line statement
usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin
staging: greybus: fix get_channel_from_mode() failure path
serial: 8250_exar: Don't remove GPIO device on suspend
rtc: mt6397: select IRQ_DOMAIN instead of depending on it
rtc: mediatek: enhance the description for MediaTek PMIC based RTC
tty: serial: samsung: fix tx_empty() to return TIOCSER_TEMT
serial: max310x: fix syntax error in IRQ error message
clk: qcom: gdsc: Add support to update GDSC transition delay
NFS: Fix an off by one in root_nfs_cat()
net: sunrpc: Fix an off by one in rpc_sockaddr2uaddr()
scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfn
...
Change-Id: Ib9b7d4f4fbb66b54b4fc2d35e945418da4c02331
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
[ Upstream commit aaa8736370db1a78f0e8434344a484f9fd20be3b ]
When building with CONFIG_XEN_PV=y, .text symbols are emitted into
the .notes section so that Xen can find the "startup_xen" entry point.
This information is used prior to booting the kernel, so relocations
are not useful. In fact, performing relocations against the .notes
section means that the KASLR base is exposed since /sys/kernel/notes
is world-readable.
To avoid leaking the KASLR base without breaking unprivileged tools that
are expecting to read /sys/kernel/notes, skip performing relocations in
the .notes section. The values readable in .notes are then identical to
those found in System.map.
Reported-by: Guixiong Wei <guixiongwei@gmail.com>
Closes: https://lore.kernel.org/all/20240218073501.54555-1-guixiongwei@gmail.com/
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Fixes: da1a679cde9b ("Add /sys/kernel/notes")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 13edb509abc91c72152a11baaf0e7c060a312e03)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit 69143f60868b3939ddc89289b29db593b647295e ]
These local variables @{resched|pmu|callfunc...}_name saves the new
string allocated by kasprintf(), and when bind_{v}ipi_to_irqhandler()
fails, it goes to the @fail tag, and calls xen_smp_intr_free{_pv}() to
free resource, however the new string is not saved, which cause a memory
leak issue. fix it.
Fixes: 9702785a747a ("i386: move xen")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20221123155858.11382-2-xiujianfeng@huawei.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9209d2cbf5845b8bf8e722015d86f65a9d2bc76a)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit d04b1ae5a9b0c868dda8b4b34175ef08f3cb9e93 ]
xen_debug_interrupt() is specific to 2-level event handling. So don't
register it with fifo event handling being active.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-4-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Stable-dep-of: 69143f60868b ("x86/xen: Fix memory leak in xen_smp_intr_init{_pv}()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 475a13f3d41e422370420882ccf9dbb3f960d33c)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit 386093c68ba3e8bcfe7f46deba901e0e80713c29 ]
There doesn't seem to be any reason for the rpath being set in
the binaries, at on systems that I tested on. On the other hand,
setting rpath is actually harming binaries in some cases, e.g.
if using nix-based compilation environments where /lib & /lib64
are not part of the actual environment.
Add a new Kconfig option (under EXPERT, for less user confusion)
that allows disabling the rpath additions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Stable-dep-of: 846cfbeed09b ("um: Fix adding '-no-pie' for clang")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 90091bdf5df0195de0d2d8e3e4d43aaaee122d34)
[vegard: fix trivial conflict due to missing commit
1572497cb0e6d2016078bc9d5a95786bb878389f ("kconfig: include common
Kconfig files from top-level Kconfig")]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (350 commits)
LTS: Update to 4.14.340
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
PCI/MSI: Prevent MSI hardware interrupt number truncation
s390: use the correct count for __iowrite64_copy()
packet: move from strlcpy with unused retval to strscpy
ipv6: sr: fix possible use-after-free and null-ptr-deref
nouveau: fix function cast warnings
scsi: jazz_esp: Only build if SCSI core is builtin
RDMA/srpt: fix function pointer cast warnings
RDMA/srpt: Support specifying the srpt_service_guid parameter
IB/hfi1: Fix a memleak in init_credit_return
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
l2tp: pass correct message length to ip6_append_data
gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
dm-crypt: don't modify the data when using authenticated encryption
mm: memcontrol: switch to rcu protection in drain_all_stock()
s390/qeth: Fix potential loss of L3-IP@ in case of network issues
virtio-blk: Ensure no requests in virtqueues before deleting vqs.
firewire: core: send bus reset promptly on gap count error
...
Change-Id: Ieafdd459ee41343bf15ed781b3e45adc2be29cc1
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.
When ident_pud_init() uses only gbpages to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table; a 4K request will map a full GB. On UV systems, this
ends up including regions that will cause hardware to halt the system
if accessed (these are marked "reserved" by BIOS). Even processor
speculation into these regions is enough to trigger the system halt.
Only use gbpages when map creation requests include the full GB page
of space. Fall back to using smaller 2M pages when only portions of a
GB page are included in the request.
No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full
gbpage. Existing usage starts with larger regions and then adds
smaller regions, so this should not have any great consequence.
[ dhansen: fix up comment formatting, simplifty changelog ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9149fef02dc1c54d2b4b9a555e11e7482f6ab583)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
The stable kernel version backport of the patch disabling XSAVES on AMD
Zen family 0x17 applied this change to the wrong function (init_amd_k6()),
one which isn't called for Zen CPUs.
Move the erratum to the init_amd_zn() function instead.
Add an explicit family 0x17 check to the erratum so nothing will break if
someone naively makes this kernel version call init_amd_zn() also for
family 0x19 in the future (as the current upstream code does).
Fixes: f028a7db9824 ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 771df0145297a1a9f1e7f799da43f8b0f8850e7c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit a24d61c609813963aacc9f6ec8343f4fcaac7243 ]
tl;dr: The num_digits() function has a theoretical overflow issue.
But it doesn't affect any actual in-tree users. Fix it by using
a larger type for one of the local variables.
Long version:
There is an overflow in variable m in function num_digits when val
is >= 1410065408 which leads to the digit calculation loop to
iterate more times than required. This results in either more
digits being counted or in some cases (for example where val is
1932683193) the value of m eventually overflows to zero and the
while loop spins forever).
Currently the function num_digits is currently only being used for
small values of val in the SMP boot stage for digit counting on the
number of cpus and NUMA nodes, so the overflow is never encountered.
However it is useful to fix the overflow issue in case the function
is used for other purposes in the future. (The issue was discovered
while investigating the digit counting performance in various
kernel helper functions rather than any real-world use-case).
The simplest fix is to make m a long long, the overhead in
multiplication speed for a long long is very minor for small values
of val less than 10000 on modern processors. The alternative
fix is to replace the multiplication with a constant division
by 10 loop (this compiles down to an multiplication and shift)
without needing to make m a long long, but this is slightly slower
than the fix in this commit when measured on a range of x86
processors).
[ dhansen: subject and changelog tweaks ]
Fixes: 646e29a1789a ("x86: Improve the printout of the SMP bootup CPU table")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231102174901.2590325-1-colin.i.king%40gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit cd6382e261952a7c2f1b8326bb9c11b074168d6c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common: (2966 commits)
Linux 4.14.331
net: sched: fix race condition in qdisc_graft()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
ext4: correct return value of ext4_convert_meta_bg
ext4: correct offset of gdb backup in non meta_bg group to update_backups
ext4: apply umask if ACL support is disabled
media: venus: hfi: fix the check to handle session buffer requirement
media: sharp: fix sharp encoding
i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
net: dsa: lan9303: consequently nested-lock physical MDIO
ALSA: info: Fix potential deadlock at disconnection
parisc/pgtable: Do not drop upper 5 address bits of physical address
parisc: Prevent booting 64-bit kernels on PA1.x machines
mcb: fix error handling for different scenarios when parsing
jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
mmc: meson-gx: Remove setting of CMD_CFG_ERROR
PM: hibernate: Clean up sync_read handling in snapshot_write_next()
PM: hibernate: Use __get_safe_page() rather than touching the list
...
Change-Id: I755d2aa7c525ace28adc4aee433572b3110ea39b
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmVmGT0ACgkQONu9yGCS
aT5ERQ//Tx5hvAL4WlnyNLMshNB5Ep8cuB1JryM1pi5BbtQxToDFZv3aJKkqj2K3
CRFq1x5hO9dli5MK5RTaO4JwCSwOphBDEqswOrtIdI7nHHzkMGBF7UUwezc6M5TZ
7cjs3LFnsVJJITBUAM/f33HyYXUPiMw/TEcWcFnJLJgWQafpOQ4kRH5k5UOL8Kgm
LV+E9YhBikaRpPpsC6obxT7KnaSnOScdUjjD+DRBm+UNhx/F3HVSY2ZY/Mr1XTyJ
v0QhzMAgWdBVGja8+9qU2e8pPw36NcEli539iU4HfrmCUry4J0Mh+XFYbpzvhQLC
U72e0vIoievkxYM1krnI2+wIFh58qlFGwKEIYag+eg0DuJn4ttaTFG9+rkn2lcI9
+d6JqALAImPtd5ZdISj7mBI8mWoTl73Hl5RNnJQQBaBwdHZQc2IXXJQUSbfyDE8/
gor9eEls3E2FtucEtihbsCF/5M0IXs+tr4b67qo73HfS6lqGFGLAFQUlKvhPr0R/
baoEoIb6bsH9oTCLjNoH1vSRPM9VEj3+AFOzK4D3wlfEhDRYkNZDQ/MF3btv6HTp
ifLXerLLxSK56OOqn3yyGOmUhtpR+sPLBrjhrALrcWOjESH9i7zvmHRLCow9qbmx
bf6Qxz6L8/+JIkdDNCN/l7NuzNyCUj0U/ObR1WWXp/n8ZqUpGR0=
=rkdh
-----END PGP SIGNATURE-----
Merge 4.14.331 into android-4.14-stable
Changes in 4.14.331
locking/ww_mutex/test: Fix potential workqueue corruption
clocksource/drivers/timer-imx-gpt: Fix potential memory leak
clocksource/drivers/timer-atmel-tcb: Fix initialization on SAM9 hardware
x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
wifi: mac80211: don't return unset power in ieee80211_get_tx_power()
wifi: ath9k: fix clang-specific fortify warnings
wifi: ath10k: fix clang-specific fortify warning
net: annotate data-races around sk->sk_dst_pending_confirm
drm/amd: Fix UBSAN array-index-out-of-bounds for SMU7
drm/amd: Fix UBSAN array-index-out-of-bounds for Polaris and Tonga
selftests/efivarfs: create-read: fix a resource leak
crypto: pcrypt - Fix hungtask for PADATA_RESET
RDMA/hfi1: Use FIELD_GET() to extract Link Width
fs/jfs: Add check for negative db_l2nbperpage
fs/jfs: Add validity check for db_maxag and db_agpref
jfs: fix array-index-out-of-bounds in dbFindLeaf
jfs: fix array-index-out-of-bounds in diAlloc
ALSA: hda: Fix possible null-ptr-deref when assigning a stream
atm: iphase: Do PCI error checks on own line
scsi: libfc: Fix potential NULL pointer dereference in fc_lport_ptp_setup()
tty: vcc: Add check for kstrdup() in vcc_probe()
i2c: sun6i-p2wi: Prevent potential division by zero
media: gspca: cpia1: shift-out-of-bounds in set_flicker
media: vivid: avoid integer overflow
gfs2: ignore negated quota changes
pwm: Fix double shift bug
media: venus: hfi: add checks to perform sanity on queue pointers
randstruct: Fix gcc-plugin performance mode to stay in group
KVM: x86: Ignore MSR_AMD64_TW_CFG access
audit: don't take task_lock() in audit_exe_compare() code path
audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare()
hvc/xen: fix error path in xen_hvc_init() to always register frontend driver
PCI/sysfs: Protect driver's D3cold preference from user space
mmc: vub300: fix an error code
PM: hibernate: Use __get_safe_page() rather than touching the list
PM: hibernate: Clean up sync_read handling in snapshot_write_next()
mmc: meson-gx: Remove setting of CMD_CFG_ERROR
genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
mcb: fix error handling for different scenarios when parsing
parisc: Prevent booting 64-bit kernels on PA1.x machines
parisc/pgtable: Do not drop upper 5 address bits of physical address
ALSA: info: Fix potential deadlock at disconnection
net: dsa: lan9303: consequently nested-lock physical MDIO
i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
media: sharp: fix sharp encoding
media: venus: hfi: fix the check to handle session buffer requirement
ext4: apply umask if ACL support is disabled
ext4: correct offset of gdb backup in non meta_bg group to update_backups
ext4: correct return value of ext4_convert_meta_bg
ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
net: sched: fix race condition in qdisc_graft()
Linux 4.14.331
Change-Id: I1a1bce75363d3b2c731f3e947543c6506bed9817
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>