This tames quite a bit of the log spam and makes dmesg readable.
Uses work from Danny Lin <danny@kdrag0n.dev>.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reverts commit 911ed9aadf134f633ec8c933acf06754a328b250.
This hurts battery and jitter without a noticeable real-world
performance improvement.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Change-Id: I4174bfae9046aae85054ada7a5ec5b25b111e827
Signed-off-by: azrim <mirzaspc@gmail.com>
As per Qualcomm: "SM8150/SM8250/SM8350/SM7250/SM7150/SM6150 - KPTI Not
required".
It can also help increase performance by a lot in some scenarios.
Signed-off-by: Kuba Wojciechowski <nullbytepl@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This reserved memory dump region is intended to be used with the memory
dump v2 driver, but we've disabled that and we don't need this memory
dumping functionality. Remove the unused region and associated driver
node to save 36 MiB of memory.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
* Libperfmgr increases the minimum frequency to 9999999 in order to boost
the CPU to the maximum frequency. This usually works because it also
increases the maximum frequency to 9999999 at init. However, if the
maximum frequency is decreased afterwards, which mi_thermald does, setting
the minimum frequency to 9999999 fails because it exceeds the maximum
frequency.
* Allow setting a minimum frequency higher than the maximum frequency (and
a maximum frequency lower than the minimum frequency) by adjusting the
minimum frequency whenever it exceeds the maximum frequency, as sketched
below.
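A minimal sketch of that adjustment, assuming a standard cpufreq policy store
hook (the function name and exact call site are illustrative, not the shipped
change):

  static int sketch_store_min_freq(struct cpufreq_policy *policy,
                                   unsigned int new_min)
  {
          /* libperfmgr may write 9999999 while mi_thermald has lowered max */
          if (new_min > policy->max)
                  new_min = policy->max; /* adjust instead of returning -EINVAL */

          policy->min = new_min;
          return 0;
  }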
Change-Id: I25b7ccde714aac14c8fdb9910857c3bd38c0aa05
Signed-off-by: azrim <mirzaspc@gmail.com>
Add a sysfs mechanism to track the idle state of the display subsystem.
This allows user space to poll on the idle state node to detect when the
display stays idle for longer than the configured time.
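A rough sketch of the mechanism, assuming a delayed work plus sysfs_notify()
(the struct, function, and node names here are hypothetical):

  struct idle_tracker {
          struct kobject *kobj;
          struct delayed_work work;
          bool idle;
  };

  static void idle_timeout_work(struct work_struct *work)
  {
          struct idle_tracker *t = container_of(to_delayed_work(work),
                                                struct idle_tracker, work);

          t->idle = true;
          /* wake up user space blocked in poll() on the idle_state node */
          sysfs_notify(t->kobj, NULL, "idle_state");
  }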
Bug: 142159002
Bug: 126304228
Change-Id: I21e3c7b0830a9695db9f65526c111ce5153d1764
Signed-off-by: Adrian Salido <salidoa@google.com>
Signed-off-by: Robb Glasser <rglasser@google.com>
(cherry picked from commit 11a2193b434cb3130743fbff89a161062883132e)
Signed-off-by: azrim <mirzaspc@gmail.com>
The msm_performance module is only used by QCOM perfd, so remove it.
Test: boot
Bug: 157242328
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I981561829c0f26dfe21a907de16a5665c1085775
Signed-off-by: azrim <mirzaspc@gmail.com>
This functionality is unused on this platform. Disable it to prevent
incurring unnecessary overhead.
Change-Id: Ia52ab5fb9a7119ba4495879fa755c846fdde498e
Signed-off-by: Steve Muckle <smuckle@google.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This feature is undesirable and not required by Android.
Bug: 153203661
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8adeb2ab1cac3041c812bbab7907df6bac57ac6d
Signed-off-by: azrim <mirzaspc@gmail.com>
As in previous projects, disabling sched autogroup helps
reduce jank in certain workloads.
Bug: 142549504
Test: build and boot to home
Change-Id: I5781468a2b584df93b8ee34b1af49ba6a78f340c
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The Android service iorapd uses mm tracing to check which files are loaded
by an app during launch. It then compiles the traces and performs madvise
syscalls to speed up app launches. This, however, forces tracepoints to be
enabled for the whole kernel, which results in a much larger image size and
some performance penalties.
It turns out that tracing can be disabled by passing the NOTRACE flag.
To make use of this flag, pass it globally and undefine it where tracing
needs to keep working; mm tracing is kept in place for iorapd.
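The switch itself already exists upstream in include/linux/tracepoint.h:
tracepoints only materialize when NOTRACE is not defined.

  #if defined(CONFIG_TRACEPOINTS) && !defined(NOTRACE)
  #define TRACEPOINTS_ENABLED
  #endif

So defining NOTRACE globally compiles tracepoints out, and removing the
define in the directories that still need tracing (e.g. mm/ for iorapd)
keeps those tracepoints alive.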
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: alanndz <alanndz7@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
There are some chunks of code in the kernel running in process context
where it may be helpful to run the code on a specific set of CPUs, such
as when reading some CPU-intensive procfs files. This is especially
useful when the code in question must run within the context of the
current process (so kthreads cannot be used).
Add an API to make this possible, which consists of the following:
sched_migrate_to_cpumask_start():
  @old_mask: pointer to output the current task's old cpumask
  @dest: pointer to a cpumask the current task should be moved to
sched_migrate_to_cpumask_end():
  @old_mask: pointer to the old cpumask generated earlier
  @dest: pointer to the dest cpumask provided earlier
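A hypothetical caller sketch (cpu_lp_mask is assumed from the MSM kernel's
CPU topology masks; the procfs handler is illustrative):

  static int heavy_procfs_show(struct seq_file *m, void *v)
  {
          cpumask_t old_mask;

          /* move the current task to the little cluster for the heavy work */
          sched_migrate_to_cpumask_start(&old_mask, cpu_lp_mask);
          /* ... expensive formatting that must run in this task's context ... */
          sched_migrate_to_cpumask_end(&old_mask, cpu_lp_mask);
          return 0;
  }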
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Exporting the IRQ of a SPI device's master controller can help device
drivers utilize the PM QoS API to force the SPI master IRQ to be
serviced with low latency.
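A sketch of one possible consumer, assuming the MSM kernel's IRQ-affine PM QoS
request type and that the exported IRQ ends up in spi->master->irq (both are
assumptions, not confirmed by this patch):

  static void spi_client_boost_irq(struct spi_device *spi,
                                   struct pm_qos_request *req)
  {
          req->type = PM_QOS_REQ_AFFINE_IRQ;   /* MSM-specific request type */
          req->irq = spi->master->irq;         /* the IRQ exported here */
          pm_qos_add_request(req, PM_QOS_CPU_DMA_LATENCY, 100);
  }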
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Although mm structs are not often freed from finish_task_switch() during
a context switch, they can still slow things down and waste CPU time on
high priority CPUs when freed. Since unbound workqueues are now affined
to the little CPU cluster, we can offload the mm struct frees away from
the current CPU entirely if it's a high-performance CPU, and defer them
onto a little CPU. This reduces the amount of time spent in context
switches and reclaims CPU time from more-important CPUs. This is
achieved without increasing the size of the mm struct by reusing the
mmput async work, which is guaranteed to not be in use by the time
mm_count reaches zero.
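Roughly, the deferral looks like this (function names are illustrative;
cpu_perf_mask is assumed from the MSM kernel, and async_put_work is the
reused field mentioned above):

  static void mmdrop_async_fn(struct work_struct *work)
  {
          __mmdrop(container_of(work, struct mm_struct, async_put_work));
  }

  static void mmdrop_deferred(struct mm_struct *mm)
  {
          if (unlikely(atomic_dec_and_test(&mm->mm_count))) {
                  if (cpumask_test_cpu(raw_smp_processor_id(), cpu_perf_mask)) {
                          /* big CPU: punt the free to an unbound (little) worker */
                          INIT_WORK(&mm->async_put_work, mmdrop_async_fn);
                          queue_work(system_unbound_wq, &mm->async_put_work);
                  } else {
                          __mmdrop(mm);
                  }
          }
  }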
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Task stacks are frequently freed from finish_task_switch() during a
context switch, in addition to the occasional task struct itself. This
not only slows down context switches, but also wastes CPU time on high
priority CPUs. Since unbound workqueues are now affined to the little
CPU cluster, we can offload the task frees away from the current CPU
entirely if it's a high-performance CPU, and defer them onto a little
CPU. This reduces the amount of time spent in context switches and
reclaims CPU time from more-important CPUs.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
When exiting the camera, there's a period of intense lag caused by all
of the buffer-free workers consuming all CPUs at once for a few seconds.
This isn't very good, and freeing the buffers isn't super time critical,
so we can lower the burden of the workers by marking the per-heap
workqueues as CPU intensive, which offloads the burden of balancing the
workers onto the scheduler.
Also, mark these workqueues with WQ_MEM_RECLAIM so forward progress is
guaranteed via a rescuer thread, since these are used to free memory.
The unnecessary WQ_UNBOUND_MAX_ACTIVE is removed as well, since it's
only used for increasing the active worker count on large-CPU systems.
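In workqueue terms the per-heap allocation ends up looking roughly like this
(the queue name format and heap field are illustrative):

  heap->wq = alloc_workqueue("%s-free", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
                             0, heap->name);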
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The ION driver suffers from massive code bloat caused by excessive
debug features, as well as poor lock usage as a result of that. Multiple
locks in ION exist to make the debug features thread-safe, which hurts
ION's actual performance when doing its job.
There are numerous code paths in ION that hold mutexes for no reason and
hold them for longer than necessary. This results in not only unwanted
lock contention, but also long delays when a mutex lock results in the
calling thread getting preempted for a while. All lock usage in ION
follows this pattern, which causes poor performance across the board.
Furthermore, a single big mutex lock is used almost everywhere, which
causes performance degradation due to unnecessary lock overhead.
Instead of having a big mutex lock, multiple fine-grained locks are now
used, improving performance.
Additionally, dup_sg_table is called very frequently, and lies within
the rendering path for the display. Speed it up by copying scatterlists
in page-sized chunks rather than iterating one at a time. Note that
sg_alloc_table zeroes out `table`, so there's no need to zero it out
using the memory allocator.
This also features a lock-less caching system for DMA attachments and
their respective sg_table copies, reducing overhead significantly for
code which frequently maps and unmaps DMA buffers and speeding up cache
maintenance since iteration through the list of buffer attachments is
now lock-free. This is safe since there is no interleaved DMA buffer
attaching or accessing for a single ION buffer.
Overall, just rewrite ION entirely to fix its deficiencies. This
optimizes ION for excellent performance and discards its debug cruft.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: I0a21435be1eb409cfe140eec8da507cc35f060dd
Signed-off-by: azrim <mirzaspc@gmail.com>
The scope of this driver's lock usage is extremely wide, leading to
excessively long lock hold times. Additionally, there is lots of
excessive linked-list traversal and unnecessary dynamic memory
allocation in a critical path, causing poor performance across the
board.
Fix all of this by greatly reducing the scope of the locks used and by
significantly reducing the amount of operations performed when
msm_dma_map_sg_attrs() is called. The entire driver's code is overhauled
for better cleanliness and performance.
Note that ION must be modified to pass a known structure via the private
dma_buf pointer, so that the IOMMU driver can prevent races when
operating on the same buffer concurrently. This is the only way to
eliminate said buffer races without hurting the IOMMU driver's
performance.
Some additional members are added to the device struct as well to make
these various performance improvements possible.
This also removes the manual cache maintenance since ION already handles
it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
The current delay implementation uses the yield instruction, which is a
hint that it is beneficial to schedule another thread. As this is a hint,
it may be implemented as a NOP, causing all delays to be busy loops. This
is the case for many existing CPUs.
Taking advantage of the generic timer sending periodic events to all
cores, we can use WFE during delays to reduce power consumption. This is
beneficial only for delays longer than the period of the timer event
stream.
If the timer event stream is not enabled, delays will behave as yield/busy
loops.
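The resulting delay loop looks roughly like this (close to the upstream
arch/arm64/lib/delay.c change; USECS_TO_CYCLES is the helper defined there):

  void __delay(unsigned long cycles)
  {
          cycles_t start = get_cycles();

          if (arch_timer_evtstrm_available()) {
                  const cycles_t timer_evt_period =
                          USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);

                  /* sleep in wfe while more than one event period remains */
                  while ((get_cycles() - start + timer_evt_period) < cycles)
                          wfe();
          }

          /* finish (or fall back entirely) with the classic busy loop */
          while ((get_cycles() - start) < cycles)
                  cpu_relax();
  }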
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The arch timer configuration for a CPU might get reset after suspending
said CPU.
In order to reliably use the event stream in the kernel (e.g. for delays),
we keep track of the state where we can safely consider the event stream as
properly configured. After writing to cntkctl, we issue an ISB to ensure
that subsequent delay loops can rely on the event stream being enabled.
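Roughly, the bookkeeping is a per-CPU availability mask consulted by the
delay code (abridged sketch of the clocksource driver change):

  static cpumask_t evtstrm_available = CPU_MASK_NONE;

  bool arch_timer_evtstrm_available(void)
  {
          return cpumask_test_cpu(raw_smp_processor_id(), &evtstrm_available);
  }

  static void arch_timer_evtstrm_enable(int divider)
  {
          /* ... program CNTKCTL, then: */
          isb();  /* ensure later delay loops see the event stream enabled */
          cpumask_set_cpu(smp_processor_id(), &evtstrm_available);
  }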
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
A measurably significant amount of CPU time is spent on logging events
for debugging purposes in lpm_cpuidle_enter. Kill the useless logging to
reduce overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
We want to reduce lock contention, so replace the global lock with an
atomic.
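Generic shape of the change (the driver's own symbols are not shown here): a
counter guarded by a global spinlock becomes a lock-free atomic.

  static atomic_t active_count = ATOMIC_INIT(0);

  static void mark_active(void)
  {
          /* was: spin_lock(&global_lock); count++; spin_unlock(&global_lock); */
          atomic_inc(&active_count);
  }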
bug: 127722781
Change-Id: I08ed3d55bf6bf17f31f4017c82c998fb513bad3e
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
There are cases where EXT4 is a bit too conservative in sending barriers down
to the disk: cases where the transaction in progress is not the one that sent
the barrier (in other words, the fsync is for a file whose IO happened longer
ago and whose data was already sent to the disk).
For that case, a better-performing tradeoff can be made on SSD devices (which
have the ability to flush their DRAM caches in a hurry on a power-fail event):
the barrier still gets sent to the disk, but we don't need to wait for it to
complete. Any subsequent IO will block on the barrier correctly.
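A hypothetical helper that captures the idea (not the exact patch): submit
the flush bio and return without waiting for it.

  static void async_flush_end_io(struct bio *bio)
  {
          bio_put(bio);   /* nobody waits on this barrier */
  }

  static void submit_async_flush(struct block_device *bdev)
  {
          struct bio *bio = bio_alloc(GFP_NOIO, 0);

          bio_set_dev(bio, bdev);
          bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
          bio->bi_end_io = async_flush_end_io;
          submit_bio(bio);        /* later IO still queues behind the flush */
  }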
Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
Signed-off-by: azrim <mirzaspc@gmail.com>
The two conditions are mutually exclusive, and the GCC compiler will optimise
this into an if-else-like pattern. Given that the majority of free_slowpath
is free_frozen, let's provide a hint to the compiler.
Tests (perf bench sched messaging -g 20 -l 400000, executed 10x
after reboot) are done and the summarized result:
           un-patched    patched
  max.        192.316    189.851
  min.        187.267    186.252
  avg.        189.154    188.086
  stdev.        1.37       0.99
Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
If the file already has underlying blocks/extents allocated,
then we don't need to start a journal transaction and can directly return
the underlying mapping. Currently ext4_iomap_begin() is used by
both the DAX & DIO paths. We can check whether the write request is an
overwrite & then directly return the mapping information.
This could give a significant perf boost for multi-threaded writes,
especially random overwrites.
On a PPC64 VM with a simulated pmem (DAX) device, a ~10x perf improvement
could be seen in random writes (overwrite), also because this optimizes
away the spinlock contention during jbd2 slab cache allocation
(jbd2_journal_handle). On an x86 VM, a ~2x perf improvement was observed.
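Conceptually the overwrite check is a map lookup without block allocation
(sketch; the helper name is hypothetical):

  static bool sketch_is_overwrite(struct inode *inode,
                                  struct ext4_map_blocks *map)
  {
          int want = map->m_len;
          int ret = ext4_map_blocks(NULL, inode, map, 0); /* no journal, no create */

          /* fully mapped and written: safe to skip starting a journal handle */
          return ret == want && (map->m_flags & EXT4_MAP_MAPPED);
  }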
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/88e795d8a4d5cd22165c7ebe857ba91d68d8813e.1600401668.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
This deadlock is hitting Android users (Pixel 3/3a/4) with Magisk, due
to frequent umount/mount operations that trigger quota_sync, hitting
the race. See https://github.com/topjohnwu/Magisk/issues/3171 for
additional impact discussion.
In commit db6ec53b7e03, we added a semaphore to protect quota flags.
As part of this commit, we changed f2fs_quota_sync to call
f2fs_lock_op, in an attempt to prevent an AB/BA type deadlock with
quota_sem locking in block_operation. However, rwsem in Linux is not
recursive. Therefore, the following deadlock can occur:
f2fs_quota_sync
  down_read(cp_rwsem) // f2fs_lock_op
  filemap_fdatawrite
    f2fs_write_data_pages
      ...
                         block_operation
                           down_write(cp_rwsem) - marks rwsem as
                                                  "writer pending"
      down_read_trylock(cp_rwsem) - fails as there is
                                    a writer pending.
                                    Code keeps on trying,
                                    live-locking the filesystem.
We solve this by creating a new rwsem, used specifically to
synchronize this case, instead of attempting to reuse an existing
lock.
Signed-off-by: Shachar Raindel <shacharr@gmail.com>
Fixes: db6ec53b7e03 ("f2fs: add a rw_sem to cover quota flag changes")
Signed-off-by: azrim <mirzaspc@gmail.com>
In OPPO's kernel:
enlarge min_fsync_blocks to optimize performance
- yanwu@TECH.Storage.FS.oF2FS, 2019/08/12
Huawei is also doing this in their production kernel.
If this optimization is good for them and shipped
with their devices, it should be good for us.
Signed-off-by: Jesse Chan <jc@linux.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
On high fs utilization, congestion is hit quite frequently and waiting for a
whopping 20 ms is too expensive, especially on critical paths.
Reduce it to an amount that is unlikely to affect UI rendering paths.
The new times are as follows:
100 Hz => 1 jiffy (effective: 10 ms)
250 Hz => 2 jiffies (effective: 8 ms)
300 Hz => 2 jiffies (effective: 6 ms)
1000 Hz => 6 jiffies (effective: 6 ms)
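The new values are consistent with simply asking for ~6 ms at the congestion
wait sites (the call shown is illustrative; the old value corresponds to 20 ms):

  congestion_wait(BLK_RW_ASYNC, msecs_to_jiffies(6)); /* was HZ/50 = 20 ms */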
Co-authored-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Change-Id: I2978c7de07e6fa8d8261b532d5bc1325006433f9
Signed-off-by: azrim <mirzaspc@gmail.com>
We don't want the background GC work causing UI jitter should it ever
collide with periods of user activity.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
GC should run as conservatively as possible to reduce latency spikes for the
user. Setting the ioprio to the idle class allows the kernel to schedule the
GC thread's I/O so that it does not affect any other process's I/O requests.
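In ioprio terms this amounts to the following (sketch; the gc_thread field
name assumes f2fs's struct f2fs_gc_kthread):

  set_task_ioprio(sbi->gc_thread->f2fs_gc_task,
                  IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));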
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Optimize modulo operation instruction generation by
using a single MSUB instruction instead of a MUL followed by a SUB
instruction.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Arm64 has a more optimized spinning loop (atomic_cond_read_acquire)
using wfe for spinlocks that can boost performance of sibling threads
by putting the current CPU into a wait state that is broken only when
the monitored variable changes or an external event happens.
OSQ has a more complicated spinning loop. Besides the lock value, it
also checks for need_resched() and vcpu_is_preempted(). The check for
need_resched() is not a problem as it is only set by the tick interrupt
handler. That will be detected by the spinning CPU right after iret.
The vcpu_is_preempted() check, however, is a problem as changes to the
preempt state of the previous node will not affect the wait state. For
ARM64, vcpu_is_preempted is not currently defined and so is a no-op.
Will has indicated that he is planning to para-virtualize wfe instead
of defining vcpu_is_preempted for PV support. So just add a comment in
arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted()
should not be defined as suggested.
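The resulting spin in osq_lock() is roughly:

  /*
   * Wait for the lock or cancellation; on arm64 smp_cond_load_relaxed()
   * sits in wfe until node->locked changes or an event/IPI arrives.
   */
  if (smp_cond_load_relaxed(&node->locked,
                            VAL || need_resched() ||
                            vcpu_is_preempted(node_cpu(node->prev))))
          return true;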
On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking
microbenchmark was run for 10s with and without the patch. The
performance numbers before patch were:
Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s
After patch, the numbers were:
Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s
So there was about a 20% performance improvement.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200113150735.21956-1-longman@redhat.com
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
Reported-by: Lingyan Huang <huanglingyan2@huawei.com>
Tested-by: Lingyan Huang <huanglingyan2@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
Whilst we currently provide smp_cond_load_acquire() and
atomic_cond_read_acquire(), there are cases where the ACQUIRE semantics are
not required because of a subsequent fence or release operation once the
conditional loop has exited.
This patch adds relaxed versions of the conditional spinning primitives
to avoid unnecessary barrier overhead on architectures such as arm64.
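The generic fallback added by this patch is essentially the existing acquire
variant minus the final barrier:

  #ifndef smp_cond_load_relaxed
  #define smp_cond_load_relaxed(ptr, cond_expr) ({        \
          typeof(ptr) __PTR = (ptr);                      \
          typeof(*ptr) VAL;                               \
          for (;;) {                                      \
                  VAL = READ_ONCE(*__PTR);                \
                  if (cond_expr)                          \
                          break;                          \
                  cpu_relax();                            \
          }                                               \
          VAL;                                            \
  })
  #endif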
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
It is probably safe to assume that all Armv8-A implementations have a
multiplier whose efficiency is comparable or better than a sequence of
three or so register-dependent arithmetic instructions. Select
ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
few dusty old corners which care.
In a contrived benchmark calling hweight64() in a loop, this does indeed
turn out to be a small win overall, with no measurable impact on
Cortex-A57 but about 5% performance improvement on Cortex-A53.
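What the option buys, abridged from lib/hweight.c: the final horizontal add
of the popcount collapses into a single multiply and shift.

  static unsigned int sw_hweight32_sketch(unsigned int w)
  {
          w -= (w >> 1) & 0x55555555;
          w  = (w & 0x33333333) + ((w >> 2) & 0x33333333);
          w  = (w + (w >> 4)) & 0x0f0f0f0f;
  #ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
          return (w * 0x01010101) >> 24;          /* one MUL + shift on Armv8-A */
  #else
          w += w >> 8;
          return (w + (w >> 16)) & 0xff;
  #endif
  }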
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>