Writing to registers happens frequently enough that a measurable portion
of CPU time is spent checking the debug mask to decide whether to log.
Remove the check and the logging call altogether to eliminate the
overhead.
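As a sketch (hypothetical names, not the actual driver code), the hot
path after the change looks roughly like this:

  #include <linux/io.h>

  static inline void reg_write(void __iomem *base, u32 offset, u32 val)
  {
          /* Removed: the mask test and pr_debug() that used to sit here,
           * e.g. if (debug_mask & REG_WRITE_LOG) pr_debug(...); */
          writel_relaxed(val, base + offset);
  }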
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Remote register I/O amounts to a measurably significant portion of CPU
time due to how frequently this function is used. Cache the value of
each register on demand and reuse the cached value in future invocations
to mitigate the expensive I/O.
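A minimal sketch of the scheme (hypothetical names), assuming the cached
registers are not changed by hardware behind the driver's back and that
writes go through a path that updates or invalidates the cached value:

  #include <linux/io.h>

  struct cached_reg {
          u32 val;
          bool valid;
  };

  static u32 reg_read_cached(void __iomem *base, u32 offset,
                             struct cached_reg *cache)
  {
          if (!cache->valid) {
                  cache->val = readl_relaxed(base + offset); /* slow remote I/O */
                  cache->valid = true;
          }
          return cache->val;
  }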
Co-authored-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Speed up command transfers by reducing unnecessary intermediate buffer
allocation. The buffer allocation is only needed when using FIFO command
transfer; otherwise there's no need to allocate and memcpy into
intermediate buffers.
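A hedged sketch of the resulting flow; dsi_xfer_direct() and
dsi_xfer_fifo() are hypothetical stand-ins for the two transfer paths:

  #include <linux/slab.h>
  #include <linux/string.h>

  int dsi_xfer_direct(const void *buf, size_t len);
  int dsi_xfer_fifo(void *buf, size_t len);

  static int dsi_cmd_send(const void *payload, size_t len, bool use_fifo)
  {
          void *buf;
          int ret;

          if (!use_fifo)
                  return dsi_xfer_direct(payload, len); /* no alloc, no memcpy */

          buf = kmemdup(payload, len, GFP_KERNEL); /* only FIFO needs a copy */
          if (!buf)
                  return -ENOMEM;
          ret = dsi_xfer_fifo(buf, len);
          kfree(buf);
          return ret;
  }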
Bug: 136715342
Change-Id: Ie540c285655ec86deb046c187f1e27538fd17d1c
Signed-off-by: Adrian Salido <salidoa@google.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: azrim <mirzaspc@gmail.com>
sdmmagpie's CPUs (semi-custom derivations of Cortex-A55 and Cortex-A76)
support ARMv8.1's efficient LSE atomic instructions, as reported in
/proc/cpuinfo and confirmed by the CPU feature detection messages in
printk.
Since our CPUs support them, enable LSE atomics to speed up atomic
operations: they are implemented in hardware instead of being
synthesized from several instructions in software.
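For illustration only (not code from this patch), an atomic add becomes
a single instruction under LSE instead of a load-exclusive/
store-exclusive retry loop:

  #include <linux/atomic.h>

  static inline void atomic_add_lse_sketch(int i, atomic_t *v)
  {
          asm volatile("stadd %w[i], %[v]"
                       : [v] "+Q" (v->counter)
                       : [i] "r" (i));
  }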
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
CAF appears to have messed with some code in this kernel related to LSE
atomics and/or alternatives, causing the combined LSE + LL/SC
out-of-line calling code to be too big for its section when compiling
kernel/locking/spinlock.c. This causes gas to fail with a confusing error:
/tmp/spinlock-79343b.s: Assembler messages:
/tmp/spinlock-79343b.s:61: Error: attempt to move .org backwards
/tmp/spinlock-79343b.s:157: Error: attempt to move .org backwards
Clang's integrated assembler is more verbose and provides a more helpful
error that points to the alternatives code as being the culprit:
In file included from ../kernel/locking/spinlock.c:20:
In file included from ../include/linux/spinlock.h:88:
../arch/arm64/include/asm/spinlock.h:76:15: error: invalid .org offset '56' (at offset '60')
asm volatile(ARM64_LSE_ATOMIC_INSN(
^
../arch/arm64/include/asm/lse.h:36:2: note: expanded from macro 'ARM64_LSE_ATOMIC_INSN'
ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
^
../arch/arm64/include/asm/alternative.h:281:2: note: expanded from macro 'ALTERNATIVE'
_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
^
../arch/arm64/include/asm/alternative.h:83:2: note: expanded from macro '_ALTERNATIVE_CFG'
__ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0)
^
../arch/arm64/include/asm/alternative.h:73:16: note: expanded from macro '__ALTERNATIVE_CFG'
".popsection\n\t" \
^
<inline asm>:35:7: note: instantiated into assembly here
.org . - (664b-663b) + (662b-661b)
^
Omitting the alternatives code indeed reduces the size enough to make
everything compile successfully. We don't need the patching anyway
because we will only enable CONFIG_ARM64_LSE_ATOMICS when the target CPU
is known to support LSE atomics with 100% certainty, so kill all the
dynamic out-of-line LL/SC patching code.
This change also has the side-effect of reducing the I-cache footprint
of these critical locking and atomic paths, which can reduce cache
thrashing and increase overall performance.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon
855 (SM8150) SoC, switching from traditional LL/SC atomics to LSE
causes LKDTM's ATOMIC_TIMING test to regress by 2x:
LL/SC ATOMIC_TIMING: 34.14s 34.08s
LSE ATOMIC_TIMING: 70.84s 71.06s
Prefetching the target operands fixes the regression and makes LSE
perform better than LL/SC, as expected:
LSE+prfm ATOMIC_TIMING: 21.36s 21.21s
"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:
LL/SC: 3.3 3.2 3.3 s
LSE: 3.1 3.2 3.2 s
LSE+p: 2.3 2.3 2.3 s
Commit 0ea366f5e1b6413a6095dce60ea49ae51e468b61 applied the same change
to LL/SC atomics, but it was never ported to LSE.
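A sketch of the fix as applied to one LSE op: issue a streaming store
prefetch on the target before the atomic, mirroring the prfm that commit
0ea366f5e1b6 added to the LL/SC loops:

  #include <linux/atomic.h>

  static inline void atomic_add_lse_prfm_sketch(int i, atomic_t *v)
  {
          asm volatile(
          "       prfm    pstl1strm, %[v]\n"
          "       stadd   %w[i], %[v]"
          : [v] "+Q" (v->counter)
          : [i] "r" (i));
  }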
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
This adds support to arm64 for fast refcount checking, as contributed
by Kees for x86 based on the implementation by grsecurity/PaX.
The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out of line handler if overflow, decrement to zero or increment from
zero are detected.
One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support it.
As is the case with the x86 version, the performance gain is substantial
(ThunderX2 @ 2.2 GHz, using LSE), even though the arm64 implementation
incorporates an add-from-zero check as well:
perf stat -B -- echo ATOMIC_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
116252672661 cycles # 2.207 GHz
52.689793525 seconds time elapsed
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
127060259162 cycles # 2.207 GHz
57.243690077 seconds time elapsed
For comparison, the numbers below were captured using CONFIG_REFCOUNT_FULL,
which uses the validation routines implemented in C using cmpxchg():
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
Performance counter stats for 'cat /dev/fd/63':
191057942484 cycles # 2.207 GHz
86.568269904 seconds time elapsed
As a bonus, this code has been found to perform significantly better on
systems with many CPUs, due to the fact that it no longer relies on the
load/compare-and-swap combo performed in a tight loop, which is what we
emit for cmpxchg() on arm64.
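In rough C terms (the real implementation is inline asm, and
handle_refcount_error() here is a hypothetical stand-in for the
out-of-line handler), the checked increment behaves like:

  #include <linux/refcount.h>

  void handle_refcount_error(refcount_t *r); /* hypothetical */

  static inline void refcount_inc_sketch(refcount_t *r)
  {
          int old = atomic_fetch_add_relaxed(1, &r->refs);

          /* old == 0: increment-from-zero (possible use-after-free);
           * old < 0:  the counter has overflowed/saturated. */
          if (unlikely(old <= 0))
                  handle_refcount_error(r);
  }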
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jan Glauber <jglauber@cavium.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[kdrag0n]
- Backported to k4.14 from:
https://www.spinics.net/lists/arm-kernel/msg735992.html
- Benchmarked on sm8150 using perf and LKDTM REFCOUNT_TIMING:
https://docs.google.com/spreadsheets/d/14CctCmWzQAGhOmpHrBJfXQy_HuNFTpEkMEYSUGKOZR8/edit
         | Fast checking      | Generic checking
---------+--------------------+-----------------------
Cycles   | 79235532616        | 102554062037
         | 79391767237        | 99625955749
Time     | 32.99879212 sec    | 42.5354029 sec
         | 32.97133254 sec    | 41.31902045 sec
Average:
Cycles   | 79313649927        | 101090008893
Time     | 33 sec             | 42 sec
Conflicts:
arch/arm64/kernel/traps.c
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Mixing kernel and user debug hooks together is highly error-prone as it
relies on all of the hooks to figure out whether the exception came from
kernel or user, and then to act accordingly.
Make our debug hook code a little more robust by maintaining separate
hook lists for user and kernel, with separate registration functions
to force callers to be explicit about the exception levels that they
care about.
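A sketch of the split registration this introduces, assuming the
upstream naming; the BRK immediate and handler are hypothetical:

  #include <asm/debug-monitors.h>

  static int my_kernel_brk_fn(struct pt_regs *regs, unsigned int esr)
  {
          /* Only ever called for exceptions taken from EL1. */
          return DBG_HOOK_HANDLED;
  }

  static struct break_hook my_kernel_brk_hook = {
          .fn     = my_kernel_brk_fn,
          .imm    = 0x400,        /* hypothetical BRK immediate */
  };

  static int __init my_hooks_init(void)
  {
          register_kernel_break_hook(&my_kernel_brk_hook);
          return 0;
  }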
Conflicts:
arch/arm64/kernel/traps.c
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The implementation of flush_icache_range() includes instruction sequences
which are themselves patched at runtime, so it is not safe to call from
the patching framework.
This patch reworks the alternatives cache-flushing code so that it rolls
its own internal D-cache maintenance using DC CIVAC before invalidating
the entire I-cache after all alternatives have been applied at boot.
Modules don't cause any issues, since flush_icache_range() is safe to
call by the time they are loaded.
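A minimal sketch of the internal maintenance, assuming a 64-byte cache
line for brevity (the real code derives the line size from CTR_EL0):

  static void clean_dcache_range_sketch(unsigned long start, unsigned long end)
  {
          unsigned long line = 64, cur;

          for (cur = start & ~(line - 1); cur < end; cur += line)
                  asm volatile("dc civac, %0" : : "r" (cur) : "memory");
          asm volatile("dsb ish" : : : "memory");
  }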
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Rohit Khanna <rokhanna@nvidia.com>
Cc: Alexander Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
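A sketch of the resulting requirement; invalidate_icache_sketch() is a
hypothetical stand-in for the actual maintenance, while
kick_all_cpus_sync() provides the IPI (taking the interrupt is a
context-synchronizing event):

  #include <linux/smp.h>

  void invalidate_icache_sketch(unsigned long start, unsigned long end);

  static void flush_icache_and_sync(unsigned long start, unsigned long end)
  {
          invalidate_icache_sketch(start, end);
          kick_all_cpus_sync();   /* forces the equivalent of ISB everywhere */
  }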
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Commit 959bf2fd03b5 ("arm64: percpu: Rewrite per-cpu ops to allow use of
LSE atomics") introduced alternative code sequences for the arm64 percpu
atomics, so that the LSE instructions can be patched in at runtime if
they are supported by the CPU.
Unfortunately, when patching in the LSE sequence for a value-returning
pcpu atomic, the argument registers are the wrong way round. The
implementation of this_cpu_add_return() therefore ends up adding
uninitialised stack to the percpu variable and returning garbage.
As it turns out, there aren't very many users of the value-returning
percpu atomics in mainline and we only spotted this due to a failure in
the kprobes selftests. In this case, when attempting to single-step over
the out-of-line instruction slot, the debug monitors would not be
enabled because calling this_cpu_inc_return() on the kernel debug
monitor refcount would fail to detect the transition from 0. We would
consequently execute past the slot and take an undefined instruction
exception from the kernel, resulting in a BUG:
| kernel BUG at arch/arm64/kernel/traps.c:421!
| PREEMPT SMP
| pc : do_undefinstr+0x268/0x278
| lr : do_undefinstr+0x124/0x278
| Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
| Call trace:
| do_undefinstr+0x268/0x278
| el1_undef+0x10/0x78
| 0xffff00000803c004
| init_kprobes+0x150/0x180
| do_one_initcall+0x74/0x178
| kernel_init_freeable+0x188/0x224
| kernel_init+0x10/0x100
| ret_from_fork+0x10/0x1c
Fix the argument order to get the value-returning pcpu atomics working
correctly when implemented using the LSE instructions.
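For reference, a sketch of the pattern that broke (with a hypothetical
counter); with the operand order fixed, the 0 -> 1 transition is
observed again:

  #include <linux/percpu.h>

  static DEFINE_PER_CPU(int, dbg_ref_count);

  static void enable_monitors_sketch(void)
  {
          if (this_cpu_inc_return(dbg_ref_count) == 1)
                  ; /* first user on this CPU: enable the debug monitors */
  }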
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Our percpu code is a bit of an inconsistent mess:
* It rolls its own xchg(), but reuses cmpxchg_local()
* It uses various different flavours of preempt_{enable,disable}()
* It returns values even for the non-returning RmW operations
* It makes no use of LSE atomics outside of the cmpxchg() ops
* There are individual macros for different sizes of access, but these
are all funneled through a switch statement rather than dispatched
directly to the relevant case
This patch rewrites the per-cpu operations to address these shortcomings.
Whilst the new code is a lot cleaner, the big advantage is that we can
use the non-returning ST- atomic instructions when we have LSE.
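Illustrative only: with LSE, a non-returning percpu add can be a single
STADD, with no LL/SC retry loop and no register wasted on an unused
result:

  static inline void percpu_add_sketch(int *pcp, int val)
  {
          asm volatile("stadd %w[v], %[p]"
                       : [p] "+Q" (*pcp)
                       : [v] "r" (val));
  }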
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit 99836317ce2f622e1e70d29770a048c12848c765.
Revert CAF's mutant pick in favor of a fresh backport from mainline.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.
Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
We need linux/compiler.h for unreachable(), so #include it here.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
We want to avoid pulling linux/preempt.h into cmpxchg.h, since that can
introduce a circular dependency on linux/bitops.h. linux/preempt.h is
only needed by the per-cpu cmpxchg implementation, which is better off
alongside the per-cpu xchg implementation in percpu.h, so move it there
and add the missing #include.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Having asm/cmpxchg.h pull in linux/bug.h is problematic because this
ends up pulling in the atomic bitops which themselves may be built on
top of atomic.h and cmpxchg.h.
Instead, just include build_bug.h for the definition of BUILD_BUG.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
When the LL/SC atomics are moved out-of-line, they are annotated as
notrace and exported to modules. Ensure we pull in the relevant include
files so that these macros are defined when we need them.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
In cases where x30 is used as a temporary in the out-of-line ll/sc atomics
(e.g. atomic_fetch_add), the compiler tends to put out a full stackframe,
which includes pointing x29 at the new frame.
Since these things aren't traceable anyway, we can pass -fomit-frame-pointer
to reduce the work when spilling. Since this is incompatible with -pg, we
also remove that from the CFLAGS for this file.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
TODO: benchmark
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
The idea is obviously arch-agnostic. This commit moves the config entry
from arch/x86/Kconfig.debug to lib/Kconfig.debug so that all
architectures can benefit from it.
This can make a huge difference in kernel image size especially when
CONFIG_OPTIMIZE_FOR_SIZE is enabled.
For example, I got 3.5% smaller arm64 kernel for v5.1-rc1.
     dec  file
18983424  arch/arm64/boot/Image.before
18321920  arch/arm64/boot/Image.after
This also slightly improves the "Kernel hacking" Kconfig menu, as
e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen") suggested;
this config option would be a good fit in the "compiler option" menu.
Link: http://lkml.kernel.org/r/20190423034959.13525-12-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[kdrag0n: Backported to k4.14]
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Such relocations are fine if the erratum 843419 fix is disabled.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
ThinLTO is actually quite fast on multi-threaded systems now and doesn't
increase build times by much on my Threadripper 3960X system.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
With LTO, modules with a large number of compilation units may end up
exceeding the shell's argument-list limit in the for loop. Reduce the
probability of this happening by including only the modules that have
exported symbols.
Bug: 150234396
Change-Id: I4a289aff47e1444aca28d1bd00b125628f39bcd5
Suggested-by: Hsiu-Chang Chen <hsiuchangchen@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
[panchajanya1999] : backported to 4.14-android
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: azrim <mirzaspc@gmail.com>
Currently, there is a regression of 689 KiB in Image.gz's size when the
kernel is linked with LLD. This is reduced to 213 KiB when we use -pie
rather than -shared when invoking the linker.
Unfortunately, ld.bfd dislikes this change and regresses in size by 163
KiB with -pie as compared to using -shared. To address this problem, we
add checks so that -pie is used with LLD and -shared is used with
ld.bfd. That way, both linkers are able to perform their best.
List of Image.gz sizes:
ld.bfd -shared: 10,066,988 bytes
ld.bfd -pie: 10,230,316 bytes
LLD -shared: 10,796,872 bytes
LLD -pie: 10,280,168 bytes
Test: kernel compiles and boots with both ld.bfd and LLD
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The vDSO library is obviously not self-contained, so it doesn't qualify
for -fwhole-program; building it with that flag breaks it. Disable
-fwhole-program for the vDSO to fix this.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Subsequent to c251c5f ("Makefile: Use O3 for Clang"), ensure the vDSO
uses the same optimization level as the rest of the kernel, which is O3,
when compiling with Clang.
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Newer versions of clang only look for $(COMPAT_GCC_TOOLCHAIN_DIR)as [1],
rather than $(COMPAT_GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE_COMPAT)as,
resulting in the following build error:
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabi- LLVM=1 O=out/aarch64 distclean \
defconfig arch/arm64/kernel/vdso32/
...
/home/nathan/cbl/toolchains/llvm-binutils/bin/as: unrecognized option '-EL'
clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
make[3]: *** [arch/arm64/kernel/vdso32/Makefile:181: arch/arm64/kernel/vdso32/note.o] Error 1
...
Adding the value of CROSS_COMPILE_COMPAT (with notdir applied to account
for CROSS_COMPILE_COMPAT being a full path) fixes this issue, matching
the solution done for the main Makefile [2].
[1]: 3452a0d8c1
[2]: https://lore.kernel.org/lkml/20200721173125.1273884-1-maskray@google.com/
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
Link: https://lore.kernel.org/r/20200723041509.400450-1-natechancellor@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
[dl: Backported to 4.14, depends on commit 38253a0ed057c49dde77588eef05fdcb4008ce0b
("vdso32: Invoke clang with correct path to GCC toolchain")
from the Pixel 4 kernel]
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The vDSO needs to be built with x18 reserved in order to accommodate
userspace platform ABIs built on top of Linux that use the register
to carry inter-procedural state, as provided for by the AAPCS.
An example of such a platform ABI is the one that will be used by an
upcoming version of Android.
Although this change is currently a no-op due to the fact that the vDSO
is currently implemented in pure assembly on arm64, it is necessary
in order to prepare for another change [1] that will add C code to
the vDSO.
[1] https://patchwork.kernel.org/patch/10044501/
Change-Id: Icaac4b1c9127d81d754d3b8688274e9afc781760
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Mark Salyzyn <salyzyn@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Khusika Dhamar Gusti <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Deal with regression from 7b4edf240be4a86ede06af51caf056fb1e80682e
("clocksource: arch_timer: make virtual counter access configurable")
by selecting ARM_ARCH_TIMER_VCT_ACCESS if COMPAT_VDSO is selected.
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 72417836
Change-Id: Ie11498880941977a8014adb8b8a3b07a6ef82e27
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Clang needs to have access to a GCC toolchain which we advertise using
the command line option --gcc-toolchain=. Clang previously picked the
wrong toolchain which resulted in the following error message:
/..//bin/as: unrecognized option '-EL'
Bug: 123422077
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Change-Id: I3e339dd446b71e2c75eb9e2c186eba715b3771cd
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Currently, in order to build the compat VDSO with Clang, this format
has to be used:
PATH=${BIN_FOLDER}:${PATH} make CC=clang
Prior to the addition of this file, this format would also be
acceptable:
make CC=${BIN_FOLDER}/clang
This is because the vdso32 Makefile uses cc-name instead of CC. After
this patch, CC will still evaluate to clang for the first case as
expected, but now the second case will use the specified Clang rather
than the host's copy, which may not be compatible, as shown below.
/usr/bin/as: unrecognized option '-mfloat-abi=soft'
clang-6.0: error: assembler command failed with exit code 1
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
(cherry picked from https://patchwork.kernel.org/patch/10419665)
Bug: 80184372
Change-Id: If90a5a4edbc2b5883b4c78161081ebeafbebdcde
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
clock_gettime(CLOCK_BOOTTIME,) slows down after significant
accumulation of suspend time, creating a large offset between it and
CLOCK_MONOTONIC time. __iter_div_u64_rem() is only meant for adding a
few second+nanosecond times, saving cycles on the more expensive
remainder and division operations, but it iterates one second at a
time, which quickly gets out of scale in CLOCK_BOOTTIME's case since
the boot time was specified in nanoseconds only.
The fix is to split off seconds from the boot time and cap the
nanoseconds so that __iter_div_u64_rem() does not iterate.
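A sketch of the fix with hypothetical field names: once the boot offset
is pre-split into seconds plus a sub-second remainder, the nanosecond
sum stays below two seconds and __iter_div_u64_rem() iterates at most
once:

  #include <linux/math64.h>
  #include <linux/time.h>

  static void add_boottime_sketch(struct timespec *ts, u64 boot_sec,
                                  u64 boot_nsec /* < NSEC_PER_SEC */)
  {
          u64 ns = ts->tv_nsec + boot_nsec;

          ts->tv_sec += boot_sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
          ts->tv_nsec = ns;
  }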
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 72406285
Change-Id: Ia647ef1e76b7ba3b0c003028d4b3b955635adabb
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060447/)
Expose the new compat vDSO via the COMPAT_VDSO config option.
The option is not enabled in defconfig because we really need a 32-bit
compiler this time, and we rely on the user to provide it themselves
by setting CROSS_COMPILE_ARM32. Therefore enabling the option by
default would make little sense, since the user must explicitly set a
non-standard environment variable anyway.
CONFIG_COMPAT_VDSO is not directly used in the code, because we want
to ignore it (build as if it were not set) if the user didn't set
CROSS_COMPILE_ARM32. If the variable has been set to a valid prefix,
CONFIG_VDSO32 will be set; this is the option that the code and
Makefiles test.
For more flexibility, like CROSS_COMPILE, CROSS_COMPILE_ARM32 can also
be set via CONFIG_CROSS_COMPILE_ARM32 (the environment variable
overrides the config option, as expected).
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: Ie8a7d6c2b5ba3edca591a9a953ce99ec792da882
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060459/)
If the compat vDSO is enabled, install it in compat processes. In this
case, the compat vDSO replaces the sigreturn page (it provides its own
sigreturn trampolines).
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: Ia6acf4c3ffea636bc750ac00853ea762c182e5b5
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060445/)
Provide the files necessary for building a compat (AArch32) vDSO in
kernel/vdso32.
This is mostly an adaptation of the arm vDSO. The most significant
change in vgettimeofday.c is the use of the arm64 vdso_data struct,
allowing the vDSO data page to be shared between the 32 and 64-bit
vDSOs. Additionally, a different set of barrier macros is used (see
aarch32-barrier.h), as we want to support old 32-bit compilers that
may not support ARMv8 and its new barrier arguments (*ld).
In addition to the time functions, sigreturn trampolines are also
provided, aiming at replacing those in the sigreturn page as the
latter don't provide any unwinding information (and it's easier to
have just one "user code" page). arm-specific unwinding directives are
used, based on glibc's implementation. Symbol offsets are made
available to the kernel using the same method as the 64-bit vDSO.
There is unfortunately an important caveat: we cannot get away with
hand-coding 32-bit instructions like in kernel/kuser32.S; this time we
really need a 32-bit compiler. The compat vDSO Makefile relies on
CROSS_COMPILE_ARM32 to provide a 32-bit compiler; appropriate logic
will be added to the arm64 Makefile later on to ensure that an attempt
to build the compat vDSO is made only if this variable has been set
properly.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
This takes the effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, and reworks
it for use in both arm and arm64, overlapping any optimizations for
each architecture.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: I3fb9d21b29bd9fec1408f2274d090e6def546b0d
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060439/)
Move the logic for setting up mappings and pages for the vDSO into
static functions. This makes the vDSO setup code more consistent with
the compat side and will allow us to reuse it for the future compat vDSO.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: I13e84479591091669190360f2a7f4d04462e6344
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060431/)
If the compat vDSO is enabled, we need to set AT_SYSINFO_EHDR in the
auxiliary vector of compat processes to the address of the vDSO code
page, so that the dynamic linker can find it (just like the regular vDSO).
Note that we cast context.vdso to Elf64_Off, instead of elf_addr_t,
because elf_addr_t is Elf32_Off in compat_binfmt_elf.c, and casting
context.vdso to u32 would trigger a pointer narrowing warning.
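A sketch along the lines described above (the exact hook name and shape
may differ in this tree):

  #define COMPAT_ARCH_DLINFO                                      \
  do {                                                            \
          NEW_AUX_ENT(AT_SYSINFO_EHDR,                            \
                      (Elf64_Off)current->mm->context.vdso);      \
  } while (0)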
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: I5d0b191d3b2f4c0b2ec31fe9faef0246253635ce
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10060449/)
If the compat vDSO is enabled, it replaces the sigreturn page.
Therefore, we use the sigreturn trampolines the vDSO provides instead.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: Ic0933741e321e1bf66409b7e190a776f12948024
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10053549/)
Add time() vdso support to match up with the existing support in x86's
vdso. This currently benefits arm and arm64, which use the common
vgettimeofday.c implementation. On arm it provides about a ~14-fold
improvement in speed over the straight syscall, and about a ~5-fold
improvement in speed over an alternate library implementation that
relies on the vdso call to gettimeofday to fulfill the request.
We can provide __vdso_time even if we cannot provide a speed-enhanced
__vdso_gettimeofday.
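A sketch of why time() is cheap to provide: it only needs the seconds
value, so no sub-second arithmetic is involved. vdso_read_wall_sec() is
a hypothetical vDSO data-page accessor:

  extern time_t vdso_read_wall_sec(void);

  notrace time_t __vdso_time(time_t *t)
  {
          time_t result = vdso_read_wall_sec();

          if (t)
                  *t = result;
          return result;
  }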
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Bug: 63737556
Bug: 20045882
Change-Id: I0bb3c6bafe57f9ed69350e2dd54edaae58316e8f
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry picked from url https://patchwork.kernel.org/patch/10006025/)
This will be needed to provide unwinding information in compat
sigreturn trampolines, part of the future compat vDSO. There is no
obvious header the compat_sig* structs should be moved to, so let's
put them in signal32.h.
Also fix minor style issues reported by checkpatch.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Bug: 63737556
Bug: 20045882
Change-Id: I9c23dd6b56ca48c0953cbf78ccb7b49ded906052
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
(cherry pick from url https://patchwork.kernel.org/patch/10044539/)
This takes the effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, and reworks
it for use in both arm and arm64, overlapping any optimizations for
each architecture. But instead of landing it in arm64, land the result
into lib/vdso and unify both implementations to simplify future
maintenance.
If ARCH_PROVIDES_TIMER is not defined, do not expose gettimeofday; libc
will then fall back directly to the syscall. Also #ifdef out the
clock_gettime switch cases and stubs where they are not supported, along
with other unused components.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Bug: 63737556
Bug: 20045882
Change-Id: I362a7114db0aac800e16eb90d14a8739e18f42e4
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>