The pr_devel message in gic_raise_softirq() calls smp_processor_id(), which
forces gic_raise_softirq() to require being called with preemption disabled,
even though that isn't an actual requirement. When called with preemption
enabled, smp_processor_id() is thus used incorrectly and generates a warning
splat with the relevant kernel debug options enabled.
Get rid of the useless pr_devel message outright to fix the incorrect
smp_processor_id() usage.
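As a rough sketch (the exact message in the driver differs), the offending
pattern is simply a debug print that formats the current CPU number:

	pr_devel("%s: raising SGI on CPU%d\n", __func__, smp_processor_id());

With CONFIG_DEBUG_PREEMPT enabled, that smp_processor_id() call is what
triggers the "BUG: using smp_processor_id() in preemptible" splat when the
caller hasn't disabled preemption; dropping the message drops the only reason
preemption had to be disabled here.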
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
For the vast majority of mmio operations in this driver, explicit memory
barriers aren't needed either because a data dependency between a read
and write already exists, or because of the presence of the spin locks
which execute a full memory barrier.
Removing all the unneeded explicit barriers considerably reduces
overhead for pinctrl operations, which in turn benefits things like i2c.
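A minimal sketch of the pattern this relies on (the struct and function names
are illustrative, not the driver's):

	struct example_pinctrl {
		void __iomem *base;
		spinlock_t lock;
	};

	static void example_update_config(struct example_pinctrl *pctrl,
					  u32 offset, u32 mask, u32 bits)
	{
		unsigned long flags;
		u32 val;

		spin_lock_irqsave(&pctrl->lock, flags);
		/* read-modify-write: the write's data depends on the read */
		val = readl_relaxed(pctrl->base + offset);
		val = (val & ~mask) | bits;
		writel_relaxed(val, pctrl->base + offset);
		/* the lock/unlock pair already orders this against other CPUs */
		spin_unlock_irqrestore(&pctrl->lock, flags);
	}

The _relaxed accessors skip the heavyweight barriers implied by readl() and
writel(), which is where the overhead reduction comes from.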
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a rwsem. This can needlessly lengthen RCU
grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
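A minimal sketch of the idea, using illustrative names rather than the real
rwsem internals; the RCU read lock is held only around the owner dereference
instead of around the entire spin loop:

	struct example_sem {
		struct task_struct *owner;
	};

	static bool example_owner_running(struct example_sem *sem)
	{
		struct task_struct *owner;
		bool running;

		rcu_read_lock();
		owner = READ_ONCE(sem->owner);
		running = owner && owner->on_cpu;
		rcu_read_unlock();

		return running;
	}

	static void example_spin_on_owner(struct example_sem *sem)
	{
		/* each iteration opens and closes its own short RCU section */
		while (example_owner_running(sem) && !need_resched())
			cpu_relax();
	}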
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
There's no reason to hold an RCU read lock the entire time while
optimistically spinning for a mutex lock. This can needlessly lengthen
RCU grace periods and slow down synchronize_rcu() when it doesn't brute
force the RCU grace period via rcupdate.rcu_expedited=1.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(),
since it could abort early at the need_resched() check. In this case,
it's possible for an IPI to be sent to this "idle" CPU needlessly, thus
wasting power. For the same reason, it's also wasteful to keep a CPU
marked idle even after it's woken up.
Make the window in which CPUs are marked idle as small as possible in order
to improve power consumption.
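A sketch of the intended ordering (the idle mask and its helpers are
illustrative placeholders for however the idle state is actually tracked):

	static unsigned long example_idle_cpus;	/* one bit per CPU */

	static int example_cpuidle_enter(int cpu)
	{
		/* bail out before the CPU is ever advertised as idle */
		if (need_resched())
			return -EBUSY;

		set_bit(cpu, &example_idle_cpus);	/* marked as late as possible */
		cpu_do_idle();				/* architectural sleep */
		clear_bit(cpu, &example_idle_cpus);	/* cleared as early as possible */

		return 0;
	}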
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The pm_qos callback currently suffers from a number of pitfalls: it
sends IPIs to CPUs that may not be idle, waits for those IPIs to finish
propagating while preemption is disabled (resulting in a long busy wait
for the pm_qos_update_target() caller), and needlessly calls a no-op
function when the IPIs are processed.
Optimize the pm_qos notifier by only sending IPIs to CPUs that are
idle, and by using arch_send_wakeup_ipi_mask() instead of
smp_call_function_many(). Using IPI_WAKEUP instead of IPI_CALL_FUNC,
which is what smp_call_function_many() uses behind the scenes, has the
benefit of doing zero work upon receipt of the IPI; IPI_WAKEUP is
designed purely for sending an IPI without a payload, whereas
IPI_CALL_FUNC does unwanted extra work just to run the empty
smp_callback() function.
Determining which CPUs are idle is done efficiently with an atomic bitmask
instead of the wake_up_if_idle() API, which checks the CPU's runqueue in an
RCU read-side critical section and under a spin lock, and is therefore not
very efficient compared to a simple atomic bitwise operation. A cpumask isn't
needed for this because NR_CPUS is guaranteed to fit within a word.
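A hedged sketch of the mechanism; the variable and function names here are
illustrative, with arch_send_wakeup_ipi_mask() made usable by the follow-up
IPI_WAKEUP patch:

	static unsigned long example_idle_cpus;	/* NR_CPUS fits within one word */

	static void example_qos_notify(void)
	{
		unsigned long idle = READ_ONCE(example_idle_cpus);
		struct cpumask wake_mask;
		unsigned int cpu;

		cpumask_clear(&wake_mask);
		for_each_set_bit(cpu, &idle, nr_cpu_ids)
			cpumask_set_cpu(cpu, &wake_mask);

		/* IPI_WAKEUP carries no payload and does no work on receipt */
		if (!cpumask_empty(&wake_mask))
			arch_send_wakeup_ipi_mask(&wake_mask);
	}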
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
An empty IPI is useful for cpuidle to wake sleeping CPUs without causing
them to do unnecessary work upon receipt of the IPI. IPI_WAKEUP fills
this use-case nicely, so let it be used outside of the ACPI parking
protocol.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
None of the pm_qos functions actually run in interrupt context; if some
driver calls pm_qos_update_target in interrupt context then it's already
broken. There's no need to disable interrupts while holding pm_qos_lock,
so don't do it.
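A minimal before/after sketch of the locking change (the struct and field
names are illustrative):

	struct example_qos {
		spinlock_t lock;
		s32 target_value;
	};

	static void example_update_target(struct example_qos *q, s32 value)
	{
		spin_lock(&q->lock);	/* was: spin_lock_irqsave(&q->lock, flags) */
		q->target_value = value;
		spin_unlock(&q->lock);	/* was: spin_unlock_irqrestore(&q->lock, flags) */
	}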
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit 1e5a5b5e00e9706cd48e3c87de1607fcaa5214d2.
This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?
Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.
Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with clean up.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit 0db49c2550a09458db188fb7312c66783c5af104.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit a9a60c58e0fa21c41ac284282949187b13bdd756.
This results in kmalloc() abuse to find a large number of contiguous
pages, which thrashes the page allocator and hurts overall performance.
I couldn't reproduce the improved MTP throughput that this commit
claimed either, so just revert it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The memory allocated dynamically here is just used to store a single
instance of a struct. Allocate both possible structs on the stack
instead of allocating them dynamically to improve performance.
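Illustrative sketch of the pattern (the struct and helper are hypothetical
stand-ins for the driver's own):

	struct example_req {
		u32 arg;
	};

	static int example_submit(struct example_req *req)
	{
		return req->arg ? 0 : -EINVAL;	/* placeholder for the real work */
	}

	static int example_do_request(u32 arg)
	{
		/* was: kzalloc(sizeof(*req), GFP_KERNEL) + kfree() on every path */
		struct example_req req = { .arg = arg };

		return example_submit(&req);
	}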
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Waking the GPU upon touch wastes power when the screen is being touched
in a way that does not induce animation or any actual need for GPU usage.
Instead of preemptively waking the GPU on touch input, wake it up upon
receiving an IOCTL_KGSL_GPU_COMMAND ioctl, since that is a sign that the GPU
will soon be needed.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Since we have multiple CCIs with qcom,cam-res-mgr defined, the global
cam_res pointer gets overwritten each time a CCI probes, causing memory
to be leaked. Since it appears that the single global cam_res pointer is
intentional, let's just skip superfluous cam_res allocations to fix the
leak.
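A hedged sketch of the fix; the structure contents and probe function are
illustrative, only the idea of skipping the duplicate allocation is taken
from the change itself:

	struct example_res_mgr {
		struct device *dev;		/* contents illustrative */
	};

	static struct example_res_mgr *cam_res;	/* intentionally a single global */

	static int example_res_mgr_probe(struct platform_device *pdev)
	{
		/* an earlier CCI probe already allocated it; don't leak that one */
		if (cam_res)
			return 0;

		cam_res = kzalloc(sizeof(*cam_res), GFP_KERNEL);
		if (!cam_res)
			return -ENOMEM;

		cam_res->dev = &pdev->dev;
		return 0;
	}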
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The default frequency on Qualcomm CPUs is the lowest frequency supported
by the CPU. This hurts latency when waking from suspend, as each CPU
coming online runs at its lowest frequency until the governor can take
over later. To speed up waking from suspend, hijack the CPUHP_AP_ONLINE
hook and use it to set the highest available frequency on each CPU as
they come online. This is done behind the governor's back, but that's fine
because the governor isn't yet running for a CPU that's coming online.
This speeds up waking from suspend significantly.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: Ibb92aa78b858b00b6687340f2efe66f86b866514
Signed-off-by: azrim <mirzaspc@gmail.com>
Our kernel only runs on known systems where broken IRQs would already
have been discovered, so disable this to reduce overhead in the IRQ
handling path.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Arter made a similar commit in which the random fops routed the read hook to
the urandom_read method. However, that approach leads to a warning about
random_read being unused and still leaves the poll hook linked to
random_poll. This commit solves both of those issues at the root.
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
kgsl_3d_init takes a long time to execute. Create a kernel thread to run it
and save kernel boot time.
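A minimal sketch of the approach (the init body is a placeholder; only
kthread_run() is real kernel API here):

	static void example_do_slow_gpu_init(void)
	{
		/* placeholder for the time-consuming kgsl_3d_init work */
	}

	static int example_3d_init_thread(void *unused)
	{
		example_do_slow_gpu_init();
		return 0;
	}

	static int __init example_3d_init(void)
	{
		struct task_struct *t;

		/* boot continues while the thread does the heavy lifting */
		t = kthread_run(example_3d_init_thread, NULL, "kgsl_3d_init");
		return PTR_ERR_OR_ZERO(t);
	}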
Change-Id: I35e7a1525204b5be4301762aa0e41c9a159784d3
Signed-off-by: ankusa <ankusa@codeaurora.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.
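A hedged sketch of the fast path being added; only the dma_fence_* calls are
real kernel API, the wrapper itself is illustrative:

	static long example_wait_fence(struct dma_fence *fence, long timeout)
	{
		/* already signaled: skip the allocation-heavy wait setup */
		if (dma_fence_is_signaled(fence))
			return timeout;

		/* slow path: set up the waiter bookkeeping and block */
		return dma_fence_wait_timeout(fence, true, timeout);
	}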
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Same concept as here: fe23bc0887
Extended version that covers more cases.
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
[kdrag0n: Fixed compile error in Adreno driver when debugfs is enabled]
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
In some use cases, IRQ lists are added to user_event_list without being
initialized. Initialize the IRQ lists immediately after allocating the node
to avoid a NULL pointer access.
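Sketch of the fix (structure and field names are illustrative):

	struct example_node {
		struct list_head link;
		struct list_head irq_list;	/* must be initialized before use */
	};

	static LIST_HEAD(user_event_list);

	static struct example_node *example_add_node(void)
	{
		struct example_node *node = kzalloc(sizeof(*node), GFP_KERNEL);

		if (!node)
			return NULL;

		INIT_LIST_HEAD(&node->irq_list);	/* the previously missing init */
		list_add_tail(&node->link, &user_event_list);
		return node;
	}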
Bug: 129427630
Test: boot to Android without panel, and ADB command can work
Change-Id: I39c3b50e7c11cd6b22b7dc5e9461288608694e26
Signed-off-by: Ken Huang <kenbshuang@google.com>
(cherry picked from commit f7d5d71d72960c1fd637aecc5de36e413f37b92e)
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Writing to registers is frequent enough that there is a measurably
significant portion of CPU time spent on checking the debug mask for
whether to log. Remove the check and logging call altogether to
eliminate the overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Remote register I/O amounts to a measurably significant portion of CPU
time due to how frequently this function is used. Cache the value of
each register on-demand and use this value in future invocations to
mitigate the expensive I/O.
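A hedged sketch of on-demand register caching (names and layout are
illustrative): the first read of a register goes to the hardware and fills
the cache, later reads are served from memory, and writes update both:

	#define EXAMPLE_NUM_REGS	64

	struct example_reg_cache {
		void __iomem *base;
		u32 val[EXAMPLE_NUM_REGS];
		unsigned long valid[BITS_TO_LONGS(EXAMPLE_NUM_REGS)];
	};

	static u32 example_read(struct example_reg_cache *c, unsigned int reg)
	{
		if (!test_bit(reg, c->valid)) {
			c->val[reg] = readl_relaxed(c->base + reg * sizeof(u32));
			set_bit(reg, c->valid);
		}
		return c->val[reg];
	}

	static void example_write(struct example_reg_cache *c, unsigned int reg,
				  u32 val)
	{
		c->val[reg] = val;
		set_bit(reg, c->valid);
		writel_relaxed(val, c->base + reg * sizeof(u32));
	}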
Co-authored-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Speed up command transfers by reducing unnecessary intermediate buffer
allocation. The buffer allocation is only needed if using FIFO command
transfer, but otherwise there's no need to allocate and memcpy into
intermediate buffers.
Bug: 136715342
Change-Id: Ie540c285655ec86deb046c187f1e27538fd17d1c
Signed-off-by: Adrian Salido <salidoa@google.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: azrim <mirzaspc@gmail.com>
sdmmagpie's CPUs (semi-custom derivations of Cortex-A55 and Cortex-A76)
support ARMv8.1's efficient LSE atomic instructions, as reported in
/proc/cpuinfo and confirmed by the CPU feature detection messages in the
kernel log.
Since our CPUs support it, enable LSE atomics to speed up atomic operations:
they are implemented in hardware instead of being synthesized from several
instructions in software.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
CAF appears to have messed with some code in this kernel related to LSE
atomics and/or alternatives, causing the combined LSE + LL/SC
out-of-line calling code to be too big for its section when compiling
kernel/locking/spinlock.c. This causes gas to fail with a confusing error:
/tmp/spinlock-79343b.s: Assembler messages:
/tmp/spinlock-79343b.s:61: Error: attempt to move .org backwards
/tmp/spinlock-79343b.s:157: Error: attempt to move .org backwards
Clang's integrated assembler is more verbose and provides a more helpful
error that points to the alternatives code as being the culprit:
In file included from ../kernel/locking/spinlock.c:20:
In file included from ../include/linux/spinlock.h:88:
../arch/arm64/include/asm/spinlock.h:76:15: error: invalid .org offset '56' (at offset '60')
asm volatile(ARM64_LSE_ATOMIC_INSN(
^
../arch/arm64/include/asm/lse.h:36:2: note: expanded from macro 'ARM64_LSE_ATOMIC_INSN'
ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
^
../arch/arm64/include/asm/alternative.h:281:2: note: expanded from macro 'ALTERNATIVE'
_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
^
../arch/arm64/include/asm/alternative.h:83:2: note: expanded from macro '_ALTERNATIVE_CFG'
__ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0)
^
../arch/arm64/include/asm/alternative.h:73:16: note: expanded from macro '__ALTERNATIVE_CFG'
".popsection\n\t" \
^
<inline asm>:35:7: note: instantiated into assembly here
.org . - (664b-663b) + (662b-661b)
^
Omitting the alternatives code indeed reduces the size enough to make
everything compile successfully. We don't need the patching anyway
because we will only enable CONFIG_ARM64_LSE_ATOMICS when the target CPU
is known to support LSE atomics with 100% certainty, so kill all the
dynamic out-of-line LL/SC patching code.
This change also has the side-effect of reducing the I-cache footprint
of these critical locking and atomic paths, which can reduce cache
thrashing and increase overall performance.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon
855 (SM8150) SoC, switching from traditional LL/SC atomics to LSE
causes LKDTM's ATOMIC_TIMING test to regress by 2x:
LL/SC ATOMIC_TIMING: 34.14s 34.08s
LSE ATOMIC_TIMING: 70.84s 71.06s
Prefetching the target operands fixes the regression and makes LSE perform
better than LL/SC, as expected:
LSE+prfm ATOMIC_TIMING: 21.36s 21.21s
"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:
LL/SC: 3.3 3.2 3.3 s
LSE: 3.1 3.2 3.2 s
LSE+p: 2.3 2.3 2.3 s
Commit 0ea366f5e1b6413a6095dce60ea49ae51e468b61 applied the same change
to LL/SC atomics, but it was never ported to LSE.
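Roughly, the change amounts to putting a prefetch-for-store in front of each
LSE instruction; a simplified non-returning add looks like this (operand
details are illustrative, the real code covers the whole family of atomics):

	static inline void example_atomic_add(int i, atomic_t *v)
	{
		asm volatile(
		"	prfm	pstl1strm, %[c]\n"	/* prefetch line for store */
		"	stadd	%w[i], %[c]\n"		/* LSE non-returning add */
		: [c] "+Q" (v->counter)
		: [i] "r" (i));
	}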
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
This adds support to arm64 for fast refcount checking, as contributed
by Kees for x86 based on the implementation by grsecurity/PaX.
The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out of line handler if overflow, decrement to zero or increment from
zero are detected.
One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support it.
As is the case with the x86 version, the performance gain is substantial
(ThunderX2 @ 2.2 GHz, using LSE), even though the arm64 implementation
incorporates an add-from-zero check as well:
perf stat -B -- echo ATOMIC_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
116252672661 cycles # 2.207 GHz
52.689793525 seconds time elapsed
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
127060259162 cycles # 2.207 GHz
57.243690077 seconds time elapsed
For comparison, the numbers below were captured using CONFIG_REFCOUNT_FULL,
which uses the validation routines implemented in C using cmpxchg():
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
Performance counter stats for 'cat /dev/fd/63':
191057942484 cycles # 2.207 GHz
86.568269904 seconds time elapsed
As a bonus, this code has been found to perform significantly better on
systems with many CPUs, due to the fact that it no longer relies on the
load/compare-and-swap combo performed in a tight loop, which is what we
emit for cmpxchg() on arm64.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>,
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Cc: Jan Glauber <jglauber@cavium.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[kdrag0n]
- Backported to k4.14 from:
https://www.spinics.net/lists/arm-kernel/msg735992.html
- Benchmarked on sm8150 using perf and LKDTM REFCOUNT_TIMING:
https://docs.google.com/spreadsheets/d/14CctCmWzQAGhOmpHrBJfXQy_HuNFTpEkMEYSUGKOZR8/edit
         | Fast checking      | Generic checking
---------+--------------------+-----------------------
Cycles   | 79235532616        | 102554062037
         | 79391767237        | 99625955749
Time     | 32.99879212 sec    | 42.5354029 sec
         | 32.97133254 sec    | 41.31902045 sec
Average:
Cycles   | 79313649927        | 101090008893
Time     | 33 sec             | 42 sec
Conflicts:
arch/arm64/kernel/traps.c
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Mixing kernel and user debug hooks together is highly error-prone as it
relies on all of the hooks to figure out whether the exception came from
kernel or user, and then to act accordingly.
Make our debug hook code a little more robust by maintaining separate
hook lists for user and kernel, with separate registration functions
to force callers to be explicit about the exception levels that they
care about.
Conflicts:
arch/arm64/kernel/traps.c
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The implementation of flush_icache_range() includes instruction sequences
which are themselves patched at runtime, so it is not safe to call from
the patching framework.
This patch reworks the alternatives cache-flushing code so that it rolls
its own internal D-cache maintenance using DC CIVAC before invalidating
the entire I-cache after all alternatives have been applied at boot.
Modules don't cause any issues, since flush_icache_range() is safe to
call by the time they are loaded.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Rohit Khanna <rokhanna@nvidia.com>
Cc: Alexander Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Commit 959bf2fd03b5 ("arm64: percpu: Rewrite per-cpu ops to allow use of
LSE atomics") introduced alternative code sequences for the arm64 percpu
atomics, so that the LSE instructions can be patched in at runtime if
they are supported by the CPU.
Unfortunately, when patching in the LSE sequence for a value-returning
pcpu atomic, the argument registers are the wrong way round. The
implementation of this_cpu_add_return() therefore ends up adding
uninitialised stack to the percpu variable and returning garbage.
As it turns out, there aren't very many users of the value-returning
percpu atomics in mainline and we only spotted this due to a failure in
the kprobes selftests. In this case, when attempting to single-step over
the out-of-line instruction slot, the debug monitors would not be
enabled because calling this_cpu_inc_return() on the kernel debug
monitor refcount would fail to detect the transition from 0. We would
consequently execute past the slot and take an undefined instruction
exception from the kernel, resulting in a BUG:
| kernel BUG at arch/arm64/kernel/traps.c:421!
| PREEMPT SMP
| pc : do_undefinstr+0x268/0x278
| lr : do_undefinstr+0x124/0x278
| Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
| Call trace:
| do_undefinstr+0x268/0x278
| el1_undef+0x10/0x78
| 0xffff00000803c004
| init_kprobes+0x150/0x180
| do_one_initcall+0x74/0x178
| kernel_init_freeable+0x188/0x224
| kernel_init+0x10/0x100
| ret_from_fork+0x10/0x1c
Fix the argument order to get the value-returning pcpu atomics working
correctly when implemented using the LSE instructions.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Our percpu code is a bit of an inconsistent mess:
* It rolls its own xchg(), but reuses cmpxchg_local()
* It uses various different flavours of preempt_{enable,disable}()
* It returns values even for the non-returning RmW operations
* It makes no use of LSE atomics outside of the cmpxchg() ops
* There are individual macros for different sizes of access, but these
are all funneled through a switch statement rather than dispatched
directly to the relevant case
This patch rewrites the per-cpu operations to address these shortcomings.
Whilst the new code is a lot cleaner, the big advantage is that we can
use the non-returning ST- atomic instructions when we have LSE.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit 99836317ce2f622e1e70d29770a048c12848c765.
Revert CAF's mutant pick in favor of a fresh backport from mainline.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.
Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
We need linux/compiler.h for unreachable(), so #include it here.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
We want to avoid pulling linux/preempt.h into cmpxchg.h, since that can
introduce a circular dependency on linux/bitops.h. linux/preempt.h is
only needed by the per-cpu cmpxchg implementation, which is better off
alongside the per-cpu xchg implementation in percpu.h, so move it there
and add the missing #include.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
Having asm/cmpxchg.h pull in linux/bug.h is problematic because this
ends up pulling in the atomic bitops which themselves may be built on
top of atomic.h and cmpxchg.h.
Instead, just include build_bug.h for the definition of BUILD_BUG.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
When the LL/SC atomics are moved out-of-line, they are annotated as
notrace and exported to modules. Ensure we pull in the relevant include
files so that these macros are defined when we need them.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
In cases where x30 is used as a temporary in the out-of-line ll/sc atomics
(e.g. atomic_fetch_add), the compiler tends to put out a full stackframe,
which included pointing the x29 at the new frame.
Since these things aren't traceable anyway, we can pass -fomit-frame-pointer
to reduce the work when spilling. Since this is incompatible with -pg, we
also remove that from the CFLAGS for this file.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
TODO: benchmark
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>