810004 Commits

Author SHA1 Message Date
John Galt
76b1331352 devfreq: Drop WQ_UNBOUND
Due to an asymmetric arm64 latency regression observed with WQ_UNBOUND.

Change-Id: I2917305abaa017247950e0f5b3a73b1d166c3463
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Amir Vajid
8537663786 memlat: Simplify core-dev table parsing logic
In register_common(), there's unnecessary code used to find
which device tree node has the core-dev property. Refactor
this to reduce complexity.

Change-Id: Ib3475272b25e898ad23f9f5a4412d90cd889a356
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Sultan Alsawaf
2d26472b39 memlat: Read perf counters in parallel and reduce system jitter
Sending synchronous IPIs to other CPUs involves spinning with preemption
disabled in order to wait for each IPI to finish. Keeping preemption off
for long periods of time like this is bad for system jitter, not to mention
the perf event IPIs are sent and flushed one at a time for each event for
each CPU rather than all at once for all the CPUs.

Since the way perf events are currently read is quite naive, rewrite it to
make it exploit parallelism and go much faster. IPIs for reading each perf
event are now sent to all CPUs asynchronously so that each CPU can work on
reading the events in parallel, and the dispatching CPU now sleeps rather
than spins when waiting for the IPIs to finish. Before the dispatching CPU
starts waiting though, it works on reading events for itself and then
reading events which can be read from any CPU in order to derive further
parallelism, and then waits for the IPIs to finish afterwards if they
haven't already.

Furthermore, there's now only one IPI sent to read all of a CPU's events
rather than an IPI sent for reading each event, which significantly speeds
up the event reads and reduces the number of IPIs sent.

This also checks for active SCM calls on a per-CPU basis rather than a
global basis so that unrelated CPUs don't get their counter reads skipped
and so that some CPUs can still receive fresh counter readings.

Overall, this makes the memlat driver much faster and more efficient, and
eliminates significant system jitter previously caused by IPI abuse.

Change-Id: I238c4e57f672a0337e2377c8fd38d0f6a1dbc2d0
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
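
A minimal sketch of the dispatch pattern the commit above describes, assuming hypothetical per-CPU state and helper names (struct cpu_read_data, read_cpu_events()); the actual driver code differs in detail, and hotplug locking, the any-CPU event reads, and error handling are elided:

#include <linux/completion.h>
#include <linux/percpu.h>
#include <linux/smp.h>

struct cpu_read_data {
	call_single_data_t csd;
	struct completion done;	/* init_completion() at probe, elided */
	/* per-CPU perf event state would live here */
};

static DEFINE_PER_CPU(struct cpu_read_data, read_data);

static void read_cpu_events(void *info)
{
	struct cpu_read_data *d = info;

	/* Read all of this CPU's counters inside a single IPI */
	complete(&d->done);
}

static void read_all_events(void)
{
	int cpu, this_cpu = raw_smp_processor_id();

	/* Fire one asynchronous IPI per remote CPU; no spinning here */
	for_each_online_cpu(cpu) {
		struct cpu_read_data *d = &per_cpu(read_data, cpu);

		if (cpu == this_cpu)
			continue;
		reinit_completion(&d->done);
		d->csd.func = read_cpu_events;
		d->csd.info = d;
		smp_call_function_single_async(cpu, &d->csd);
	}

	/* Exploit parallelism: read our own events while the IPIs run */
	read_cpu_events(&per_cpu(read_data, this_cpu));

	/* Sleep, rather than spin, until every remote read finishes */
	for_each_online_cpu(cpu) {
		if (cpu != this_cpu)
			wait_for_completion(&per_cpu(read_data, cpu).done);
	}
}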
Sultan Alsawaf
aa6fa8433d soc: qcom: scm: Fix scm_call_count when used with LSE atomics
LSE atomic increments and decrements clobber the x0 and x1 registers,
and since these registers are used in volatile inline assembly for SCM
calls, GCC does not preserve their values across the atomic_inc() and
atomic_dec() calls. This results in x0 and x1 containing garbage values
before and after the SCM call, breaking it entirely.

Wrapping the atomic_inc() and atomic_dec() outside the SCM call
functions fixes the issue.

Change-Id: Icae5d4cf18118bd1a39b1270211c663063b96e35
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
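
A sketch of the shape of the fix, with hypothetical function names (__do_scm_call() stands in for the real SCM call primitives); the point is that the atomic ops live in a caller, outside the function containing the SMC inline asm:

#include <linux/atomic.h>
#include <linux/types.h>

static atomic_t scm_call_count = ATOMIC_INIT(0);

static int __do_scm_call(u64 fn_id, u64 arg)
{
	/*
	 * The volatile inline asm that marshals arguments into x0/x1
	 * and issues the SMC lives here, untouched by LSE atomics.
	 */
	return 0;
}

static int do_scm_call(u64 fn_id, u64 arg)
{
	int ret;

	atomic_inc(&scm_call_count);	/* LSE ldadd may clobber x0/x1... */
	ret = __do_scm_call(fn_id, arg);/* ...x0/x1 are set up fresh here */
	atomic_dec(&scm_call_count);

	return ret;
}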
Sultan Alsawaf
393c47521a drm/msm: Eliminate unnecessary snprintf() usage from hot paths
There's no reason to constantly use snprintf() to generate pretty debug
strings from hot paths. We don't need them, so remove them.

Change-Id: I523c45bf3e382cc926364634ce0362dad014ce94
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Sultan Alsawaf
316fc7da75 sched/core: Use SCHED_RR in place of SCHED_FIFO for all users
Although SCHED_FIFO is a real-time scheduling policy, it can be bad for
system latency, since a SCHED_FIFO task runs until it blocks or yields
before another task of the same priority can run. This can result in
visible micro-stalls when a SCHED_FIFO task hogs the CPU for too long.
On a system where latency is favored over throughput, SCHED_RR is a
better choice than SCHED_FIFO.

Change-Id: I11ef6efd89a73a4a090ed5d45e7b9d74c91f2f98
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
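
For illustration, swapping an in-kernel user from SCHED_FIFO to SCHED_RR is just a policy-constant change; the priority value below is an arbitrary placeholder:

#include <linux/sched.h>
#include <uapi/linux/sched/types.h>

static void make_rt_round_robin(struct task_struct *p)
{
	struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 };

	/*
	 * SCHED_RR preempts exactly like SCHED_FIFO, but equal-priority
	 * tasks rotate every RR_TIMESLICE (100ms by default), bounding
	 * how long one task can hog the CPU.
	 */
	sched_setscheduler(p, SCHED_RR, &param);
}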
Adithya R
9a96accad1 sched/fair: Make stune boost > 0 tasks stay in big cluster
We use a top-app stune boost of 1 now.

Change-Id: I36589cbca8799e61e24d29efc22de4abfa26dfd4
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Zachariah Kennedy
d5afaaf78f sched/fair: Don't allow boosted tasks to be migrated to small cores
We want boosted tasks to run on big cores, but CAF's load balancer
changes do not account for SchedTune boosting, which allows boosted
tasks to be migrated to a suboptimal core. Mitigate this by setting
the LBF_IGNORE_STUNE_BOOSTED_TASKS flag for tasks that are migrating
from a larger-capacity core to a min-capacity one and that have a
SchedTune boost > 10. If both conditions are true, the task is not
migrated. If the same task is selected the next time the load
balancer runs, the LBF_IGNORE_STUNE_BOOSTED_TASKS flag is cleared.

Change-Id: Ibd9f6616b482d446d5acce2a93418bfda4c35ffb
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
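
A minimal sketch of the check described in the commit above, as it might sit in fair.c's can_migrate_task() path. LBF_IGNORE_STUNE_BOOSTED_TASKS is named by the commit; schedtune_task_boost() and capacity_orig_of() are existing CAF/fair.c helpers, and the surrounding structure is simplified:

static bool should_skip_stune_boosted(struct task_struct *p,
				      struct lb_env *env)
{
	bool big_to_small = capacity_orig_of(env->src_cpu) >
			    capacity_orig_of(env->dst_cpu);

	/* First pass: refuse the move and remember that we did so */
	if (schedtune_task_boost(p) > 10 && big_to_small &&
	    !(env->flags & LBF_IGNORE_STUNE_BOOSTED_TASKS)) {
		env->flags |= LBF_IGNORE_STUNE_BOOSTED_TASKS;
		return true;
	}

	/*
	 * Second pass: the flag is already set, so the task may
	 * migrate and the flag is cleared for the next balance run.
	 */
	env->flags &= ~LBF_IGNORE_STUNE_BOOSTED_TASKS;
	return false;
}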
Richard Raya
2ef5980a4d sched/tune: Restrict stune boost to top-app
Change-Id: Ide30ce5cc10f441569f0b98de618e9e046c7db17
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Richard Raya
bf2fdd35c1 sched/tune: Restrict prefer_idle to top-app
Change-Id: Ia456ed5047f1a894dfcdbec2422af287e8d91639
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Adithya
8b2b5fedb9 sched/tune: Increase the cgroup limit to 8
Android 12 creates an audio-app group.

Change-Id: I3403d1f8b4e6cc904d1cb00b4d995807d6436348
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Wei Wang
345d419aaf sched/tune: Increase the cgroup limit to 7
We have six groups: background, camera-daemon, foreground, rt,
top-app, and the root group. Add another one for testing.

Bug: 144809570
Test: Build
Change-Id: I2d749a7bde4ad4c7c05f7218c9a5f39f8533acae
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Abhijeet Dharmapurikar
b8529d0dda sched/tune: Fix improper accounting of tasks
cgroup_migrate_execute() calls can_attach() and css_set_move_task()
separately without holding rq->lock.

The schedtune implementation breaks here, since can_attach() accounts
for the task move way before the group move is committed. If the task
sleeps right after can_attach(), the sleep is accounted towards the
previous group. This ends up in a disparity of counts between groups.

Consider this race:

TaskA is moved from root_grp to topapp_grp; right before the move,
root_grp.tasks = 1 and topapp_grp.tasks = 0, and TaskB performs the move.

On cpu X
TaskB runs
* cgroup_migrate_execute()
  schedtune_can_attach()
   root_grp.tasks--; topapp_grp.tasks++;
   (root_grp.tasks = 0 and topapp_grp.tasks = 1)

*Right at this moment the context is switched and TaskA runs.

*TaskA sleeps
 dequeue_task()
   schedtune_dequeue_task()
    schedtune_task_update()
     root_grp.tasks--; // TaskA has not really "switched" group, so it
     decrements from root_grp; however, can_attach() has already
     accounted the task move, which leaves us with
     root_grp.tasks = 0 (the count is protected against going negative)
     topapp_grp.tasks = 1

Now even if cpuX is idle (TaskA is long gone sleeping), its
topapp_grp.tasks stays positive and cpuX is subject to topapp's
boost unnecessarily.

An easy way to fix this is to move the group change accounting to the
attach() callback, which gets called _after_ css_set_move_task(). Also
maintain the task's current index in struct task_struct as it moves
between groups. The task's enqueue/dequeue is accounted towards the
cached index value. In the event that the task dequeues just before the
group changes, it gets subtracted from the old group, which is correct
because the task would have bumped up the old group's count. If the
task changes group while it's running, the attach() callback has to
decrement the old group's count and increment the new group's so that
the next dequeue will subtract from the new group. IOW, the attach()
callback has to account only for a running task, but has to update the
cached index for both running and sleeping tasks.

The current code uses a task->on_rq != 0 check to determine whether a
task is queued on the runqueue. This check is incorrect, because
task->on_rq is set to TASK_ON_RQ_MIGRATING (value = 2) during
migration. Fix this by using task_on_rq_queued() to check whether a
task is queued.

Change-Id: If412da5a239c18d9122cfad2be59b355c14c068f
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
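
A simplified sketch of the resulting attach() accounting described above; the cached index field (stune_idx) and the boost_group_inc()/boost_group_dec() helpers are illustrative names, and rq locking is elided:

static void schedtune_attach(struct cgroup_taskset *tset)
{
	struct task_struct *task;
	struct cgroup_subsys_state *css;

	cgroup_taskset_for_each(task, css, tset) {
		int old_idx = task->stune_idx;	/* cached group index */
		int new_idx = css_st(css)->idx;

		/*
		 * Runs after css_set_move_task(). Only a queued task
		 * has bumped the old group's count; task->on_rq may
		 * read TASK_ON_RQ_MIGRATING, so use task_on_rq_queued()
		 * rather than task->on_rq != 0.
		 */
		if (task_on_rq_queued(task)) {
			boost_group_dec(old_idx);
			boost_group_inc(new_idx);
		}

		/* Update the cache for running and sleeping tasks alike */
		task->stune_idx = new_idx;
	}
}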
Richard Raya
9e9c3f6cfd defconfig: Regenerate full defconfig
Change-Id: I7253529202a3751da97c886e6cf807ad61a76bea
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
John Galt
c5b6927a6d lib: zstd: Update to v1.5.5
Change-Id: Ib8c142f4aa1dbe169d36b0e826f5da66bc334a47
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Cyber Knight
8a2ba0d644 lib: zstd: Update to v1.5.4
Syncs latest upstream ZSTD from [1].

This update retains the following commits:
edc41e9a5d {"lib: zstd: Fix attribute declaration"}
31ef7d2d75 {"lib: zstd: include a missing header"}
4927d31bfc {"lib: zstd: define UINTPTR_MAX"}

[1]: https://github.com/facebook/zstd/commits/v1.5.4

Change-Id: I03bf09ce96b0398043cfec5506cb036e13906d58
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:35 -03:00
Cyber Knight
56e1cb9ea2 lib: zstd: Introduce CONFIG_ZSTD_COMMON
This accommodates the common sources shared between the ZSTD compressor and the ZSTD decompressor.

Change-Id: I2c498cbab6bae106923138750ca695a663b9e1c5
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Richard Raya
8bca0da49a defconfig: Regenerate full defconfig
Change-Id: I0262bd1b0273127918f0392048715ac42a4ba9ae
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
John Galt
aac1c25e24 Makefile: Enable MLGO and HCS optimizations
Change-Id: Iaadc6381960a95c11f6e875d60d99959fd6dc86a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Tashfin Shakeer Rhythm
af973d9452 Makefile: Add -polly-postopts=1 cmdline option
The -polly-postopts cmdline option applies post-rescheduling optimizations such as tiling.

Reference: ced20c6672

Change-Id: Icff506a133f8f063fa7b3370930043d252808b4e
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Tashfin Shakeer Rhythm
a7d4166f3a Makefile: Add -polly-reschedule=1 cmdline option
The -polly-reschedule cmdline option optimizes SCoPs using ISL (the Integer Set Library).

Reference: ced20c6672

Change-Id: I4284a596eb04ea29e70345078fca7fdd43921341
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Tashfin Shakeer Rhythm
6cd0cc3dab Makefile: Add -polly-loopfusion-greedy=1 cmdline option
The -polly-loopfusion-greedy cmdline option aggressively tries to fuse any loop regardless of profitability.

Reference: 64489255be

Change-Id: I15d7df63475f315fb2c42799bb5c7448999536d5
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Tashfin Shakeer Rhythm
b7ad330af9 Makefile: Drop -polly-detect-keep-going cmdline option
A recent Clang 16 update introduced a segfault when compiling the kernel
with the `-polly-invariant-load-hoisting` flag, which breaks compilation
when the kernel is built with full LTO.
Per a recent issue opened at [LLVM], `-polly-detect-keep-going` ignores
errors during SCoP verification rather than taking them into account,
causing Polly to segfault and the compiler to print the following
backtrace:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang -Wp,-MD,drivers/md/.md.o.d -nostdinc -isystem /tmp/cirrus-ci-build/toolchains/clang/lib/clang/16.0.0/include -I../arch/arm64/include -I./arch/arm64/include/generated -I../include -I./include -I../arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -I../drivers/md -Idrivers/md -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Werror=return-type -Wno-format-security -std=gnu89 --target=aarch64-linux-gnu --prefix=/tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin/aarch64-linux-gnu- --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -Wno-misleading-indentation -Wno-bool-operation -Werror=unknown-warning-option -Wno-unsequenced -opaque-pointers -fno-PIE -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -Wno-psabi -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-int-in-bool-context -Wno-address-of-packed-member -O3 -march=armv8.1-a+crypto+fp16+rcpc -mtune=cortex-a53 -mllvm -polly -mllvm -polly-ast-use-context -mllvm -polly-detect-keep-going -mllvm -polly-invariant-load-hoisting -mllvm -polly-run-inliner -mllvm -polly-vectorizer=stripmine -mllvm -polly-loopfusion-greedy=1 -mllvm -polly-reschedule=1 -mllvm -polly-postopts=1 -fstack-protector-strong --target=aarch64-linux-gnu --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -meabi gnu -Wno-format-invalid-specifier -Wno-gnu -Wno-duplicate-decl-specifier -Wno-asm-operand-widths -Wno-initializer-overrides -Wno-tautological-constant-out-of-range-compare -Wno-tautological-compare -mno-global-merge -Wno-void-ptr-dereference -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wvla -flto -fwhole-program-vtables -fvisibility=hidden -Wdeclaration-after-statement -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=../= -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -Wno-pointer-to-enum-cast -Wno-unaligned-access -DKBUILD_BASENAME=\"md\" -DKBUILD_MODNAME=\"md_mod\" -D__KBUILD_MODNAME=kmod_md_mod -c -o drivers/md/md.o ../drivers/md/md.c
1.	<eof> parser at end of file
2.	Optimizer
  CC      drivers/media/platform/msm/camera_v2/camera/camera.o
  AR      drivers/media/pci/intel/ipu3/built-in.a
  CC      drivers/md/dm-linear.o
 #0 0x0000559d3527073f (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a7073f)
 #1 0x0000559d352705bf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a705bf)
 #2 0x0000559d3523b198 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b198)
 #3 0x0000559d3523b33e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b33e)
 #4 0x00007f339dc3ea00 (/usr/lib/libc.so.6+0x38a00)
 #5 0x0000559d35affccf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x42ffccf)
 #6 0x0000559d35b01710 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301710)
 #7 0x0000559d35b01a12 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301a12)
 #8 0x0000559d35b09a9e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4309a9e)
 #9 0x0000559d35b14707 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4314707)
clang-16: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Neutron clang version 16.0.0 (https://github.com/llvm/llvm-project.git 598f5275c16049b1e1b5bc934cbde447a82d485e)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin

From the very nature of `-polly-detect-keep-going`, it can be concluded
that this is a potentially unsafe flag. Instead of using it conditionally
for Clang < 16, just remove it altogether for all. This should allow Polly
to run SCoP verification as intended.

Issue [LLVM]: https://github.com/llvm/llvm-project/issues/58484#issuecomment-1284887374

Change-Id: Icf6b6c62f4a1df9319c3e16962673c590615b79b
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Cyber Knight
3ce5ae2329 Makefile: Drop -polly-opt-fusion=max cmdline option
This polly flag has been deprecated since Clang 14.

Change-Id: I5942eea7f7443c98e5186540376a59eaeaadbfd7
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Diab Neiroukh
5f6f33d8cb Makefile: Add support for Clang's polyhedral loop optimizer
Polly is able to optimize various loops throughout the kernel for cache
locality. A mathematical representation of the program, based on
polyhedra, is analysed to find opportunities for optimisation in memory
access patterns, which then lead to loop transformations.

Polly is not built with LLVM by default, and requires LLVM to be compiled
with the Polly "project". This can be done by adding Polly to
-DLLVM_ENABLE_PROJECTS, for example:

-DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;polly"

Preliminary benchmarking seems to show an improvement of around two
percent across perf benchmarks:

Benchmark                         | Control    | Polly
--------------------------------------------------------
bonnie++ -x 2 -s 4096 -r 0        | 12.610s    | 12.547s
perf bench futex requeue          | 33.553s    | 33.094s
perf bench futex wake             |  1.032s    |  1.021s
perf bench futex wake-parallel    |  1.049s    |  1.025s
perf bench futex requeue          |  1.037s    |  1.020s

Furthermore, Polly does not produce a much larger image size, making it
effectively a "free" optimisation. A comparison of a bzImage for a kernel
with and without Polly is shown below:

bzImage        | stat --printf="%s\n"
-------------------------------------
Control        | 9333728
Polly          | 9345792

Compile times differed by one percent at best, which is well within
the range of noise. Therefore, I can say with certainty that Polly has
a minimal effect on compile times, if any.

[Tashar02]:
1. Rework on the flag passing format.
2. Pass Polly Flags to linker as well.
3. Add `-polly-detect-keep-going` cmdline option.

Change-Id: I588b3f0fedc10221383c9030c33f42d789b30fb9
Suggested-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Cyber Knight
de943f0747 Makefile: Add more optimization flags
- Create OPT_FLAGS which contains -O3 and CPU-specific optimizations.
- Pass OPT_FLAGS to compiler and assembler.
- Pass LTO-specific plugin-opt optimization to linker flags when LTO is enabled.
- Drop extraneous -O3.
- Optimize for armv8.2-a+dotprod.

Change-Id: I0b2f4217baec828ab393039419a723f092864287
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Diab Neiroukh
adcec81be1 Makefile: Add a config to optimize inlining
Increased inlining can provide an improvement in performance, but also
increases the size of the final kernel image. This config provides a bit
more control over how much is inlined, to further optimise the kernel
for certain workloads.

Change-Id: I276c10ae6722032b3d40831f4242e27cbf830c9a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
John Galt
e9c33dabd0 msm-4.14: Selectively extend over inline optimization
Change-Id: I770a6de39a25b71cb9609343b93e8b26cf056017
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Richard Raya
87265ff842 Revert "kbuild: Add support for LLVM's Polly optimizer"
This reverts commit fc33cc16b14c9a54cce17c6f9f96d8b3d542f7e3.

Change-Id: Ieff0a3f219e76351a355de03138320ccd4d7c360
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
kdrag0n
7b4e8ef7a3 ARM64/dts: qcom: Reduce early wakeups before vsync events
The default jitter is 2%, but modern panels should not have such high
jitter. Use a value of 0.8% instead to disable early wakeup for each
vsync and thus reduce power consumption.

Extracted from Razer Phone 2 aura kernel sources.
arch/arm64/boot/dts/fih/RC2_common/dsi-panel-nt36830-wqhd-dualmipi-extclk-cmd.dtsi

Change-Id: I55c540e4dae9229d84095c827be2bbd763757049
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Richard Raya
74d06a0759 ARM64/dts: sdmmagpie: Drop some frequencies
Change-Id: Ie41eedf9dcbadead6936455140a24009f2c79fd5
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
kdrag0n
240d13545c ARM64/dts: sdmmagpie: Suppress verbose output during boot
This should make the kernel initialization faster as it suppresses any
potential serial console output.

Change-Id: I3a1e7daba4a1202d09b23cf8b60afce694514b54
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Panchajanya1999
1ef4a2c265 ARM64/dts: sdmmagpie: Implement RHEL's Low Latency Kernel cmdline
nosoftlockup - disables logging of backtraces when a process executes on a CPU for
longer than the softlockup threshold (default 120 seconds). Typical low-latency
programming and tuning techniques might involve spinning on a core or modifying
scheduler priorities/policies, which can lead to a task reaching this threshold. If a task
has not relinquished the CPU for 120 seconds, the kernel prints a backtrace for
diagnostic purposes. Adding nosoftlockup to the cmdline disables the printing of this
backtrace (the printk itself can cause latency spikes), and does not in itself reduce
latency. Tasks that spin on a CPU must occasionally yield (especially if they are
SCHED_FIFO priority), or important per-cpu kernel threads may never execute,
potentially leading to unexpected behavior such as very large latency spikes or
interruptions in network traffic.

mce=ignore_ce - ignores corrected errors and associated scans that can cause
periodic latency spikes.

Documentation: https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf

Change-Id: Ieaedb58d018ddb85ec46e8fe69729596993aabf1
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Panchajanya1999
ff49d01dca ARM64/dts: sdmmagpie: Avoid run-time IPIs from expedited grace periods
Real-time systems desiring fast boot but wishing to avoid run-time
IPIs from expedited grace periods would therefore set both
rcupdate.rcu_expedited=1 and rcupdate.rcu_normal_after_boot=1.

Lookup: https://lwn.net/Articles/777214/

Change-Id: I0b1f41657963e2581e119a599cd9869f2f7d2c0e
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Aarqw12
bcb85c577f ARM64/dts: sdmmagpie-gpu: Enable GPU DDR 2.1GHz
Change-Id: Ia236d8ac626093c50db91cd8ed99ec72f6b7e372
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Forenche
690be66258 ARM64/dts: sdmmagpie-gpu: Update bus frequencies for high refresh rate
Change-Id: I1daa29b49780dea97dd16f02822d5fe897e5ceb2
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:34 -03:00
Forenche
3c4fc24f48 nt36xxx: Bump SPI bus frequency to 15MHz
Change-Id: Ie1b7504176f4d971d5bbafb8d5942245ec5afc2a
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Veera Sundaram Sankaran
56d8abf539 drm/msm/sde: Move some frame_events from crtc commit to event thread
Move frame data stats collection/notification during frame-done and
retire fence sysfs notification to event thread. This will free up
some interrupt time.

Change-Id: I2648ac4287ce8712e9a059edd408a59753aa6d32
Signed-off-by: Veera Sundaram Sankaran <quic_veeras@quicinc.com>
Signed-off-by: V S Ganga VaraPrasad (VARA) Adabala <quic_vadabala@quicinc.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Kalyan Thota
8e987df038 drm/msm: Reset thread priority work on every new run
Reinit thread priority work before queueing on multiple display
threads as the work stores the former worker thread. Also
flush work such the next init is serialized.

Change-Id: I51409d4d12d100be0cb30238f812a56ec064a339
Signed-off-by: Kalyan Thota <quic_kalyant@quicinc.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Dhaval Patel
0272b633c9 drm/msm: Move thread priority call from component bind
Move the thread priority call to a kernel worker thread, because the
component bind API may run from vendor_modeprobe process context once
all drivers have probed successfully, and a thread priority update is
not allowed from that context.

Change-Id: Iafac97ce02942d6a2134495232f3c395ba4a362f
Signed-off-by: Dhaval Patel <pdhaval@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Raghavendra Ambadas
fbbc610e4d drm/msm: Avoid dynamic mode switch during first commit
Dynamic mode switch (DMS) is not supported for video mode panels before the cont-splash handoff is handled for the first frame, so avoid a dynamic mode switch during cont-splash handoff for any DRM mode change.

This workaround was provided by QC for a GSI issue; as of observation, there are no side effects.

QC SR:05515278
QC CR:NA
QC Change ID:Icd5881af99afb3e398d3bba3746b7a35bcda4491

Change-Id: I97f5712ce5bb8448f1c600ccf306d0dac7fa6eae
Signed-off-by: maheshmk <maheshmk@motorola.com>
Reviewed-on: https://gerrit.mot.com/2113033
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Ashwin Kumar Pathmudi <jfxr63@motorola.com>
Reviewed-by: Shuo Yan <shuoyan@motorola.com>
Reviewed-by: Guobin Zhang <zhanggb@motorola.com>
Submit-Approved: Jira Key
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Sultanxda
fae4a4fed8 workqueue: Schedule workers on CPU0 or 0-5 by default
For regular bound workers that don't request to be queued onto a specific
CPU, just use CPU0 to save power. Additionally, adjust the CPU affinity of
unbound workqueues to force their workers onto the power cluster (CPUs 0-5)
to further improve power consumption.

Change-Id: Ib3aede9947c4a2c2673adc5f5b7c4e0c2c4520bf
Signed-off-by: Sultanxda <sultanxda@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
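
An illustrative sketch only (the actual change patches __queue_work() and the unbound-worker attrs inside kernel/workqueue.c); the cluster boundary is hardcoded per this SoC's topology:

#include <linux/cpumask.h>
#include <linux/workqueue.h>

/* CPUs 0-5 form the power cluster on this SoC */
static struct cpumask power_cluster_mask;

static void __init power_cluster_init(void)
{
	int cpu;

	for (cpu = 0; cpu <= 5; cpu++)
		cpumask_set_cpu(cpu, &power_cluster_mask);
	/* unbound workqueue workers are then restricted to this mask */
}

static int bound_work_cpu(int requested_cpu)
{
	/* WORK_CPU_UNBOUND here means "caller didn't pick a CPU" */
	if (requested_cpu == WORK_CPU_UNBOUND)
		return 0;		/* was: the local CPU */
	return requested_cpu;
}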
Valentin Schneider
a771b8ad1b sched/fair: Introduce a CPU capacity comparison helper
During load-balance, groups classified as group_misfit_task are filtered
out if they do not pass

  group_smaller_max_cpu_capacity(<candidate group>, <local group>);

which itself employs fits_capacity() to compare the sgc->max_capacity of
both groups.

Due to the underlying margin, fits_capacity(X, 1024) will return false for
any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are
{261, 871, 1024}. If a CPU-bound task ends up on one of those "medium"
CPUs, misfit migration will never intentionally upmigrate it to a CPU of
higher capacity due to the aforementioned margin.

One may argue the 20% margin of fits_capacity() is excessive with the advent
of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here
is that fits_capacity() is meant to compare a utilization value to a
capacity value, whereas here it is being used to compare two capacity
values. As CPU capacity and task utilization have different dynamics, a
sensible approach here would be to add a new helper dedicated to comparing
CPU capacities.

Also note that comparing capacity extrema of local and source sched_group's
doesn't make much sense when at the end of the day the imbalance will be
pulled by a known env->dst_cpu, whose capacity can be anywhere within the
local group's capacity extrema.

While at it, replace group_smaller_{min, max}_cpu_capacity() with
comparisons of the source group's min/max capacity and the destination
CPU's capacity.

Link: https://lkml.kernel.org/r/20210407220628.3798191-4-valentin.schneider@arm.com
Change-Id: Ia448773d82f5b2c36e94d608a15fb076d5af598f
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
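
For reference, the two margins involved, as they appear in the mainline version of this patch (the 4.14 backport may differ slightly):

#define fits_capacity(cap, max)		((cap) * 1280 < (max) * 1024)
#define capacity_greater(cap1, cap2)	((cap1) * 1024 > (cap2) * 1078)

/*
 * fits_capacity() keeps a 20% margin: fits_capacity(X, 1024) requires
 * X * 1280 < 1024 * 1024, i.e. X < 819.2 -- which a Pixel 4 "medium"
 * CPU (capacity_orig = 871) can never satisfy.
 *
 * capacity_greater() keeps only ~5%: capacity_greater(1024, 871) is
 * 1024 * 1024 > 871 * 1078 (1048576 > 938938), so a big CPU is still
 * recognized as meaningfully larger than a medium one, and misfit
 * upmigration can proceed.
 */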
Lingutla Chandrasekhar
d074e18d7d sched/fair: Fix excessive packing on the max capacity CPU
When boost is enabled or the mid-capacity cluster has a single CPU, we
traverse both the mid and max capacity CPUs. When all of these CPUs
are busy, the task should be placed on the CPU with the highest
spare capacity. However, the current code completely ignores
the mid-capacity CPU when it encounters the max-capacity CPU,
even though the former has more spare capacity.

Fix this issue by enforcing the spare capacity check across
different capacity CPUs.

Change-Id: I726d47f985f9e59d2bb1c6cf2b743796b57e3051
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
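
A simplified sketch of the enforced check (capacity_of() and cpu_util() are existing fair.c helpers; the surrounding placement structure is condensed): spare capacity is compared across every traversed cluster instead of being forgotten once the max-capacity cluster is reached.

static int best_spare_capacity_cpu(const struct cpumask *candidates)
{
	unsigned long best_spare = 0;
	int best_cpu = -1, cpu;

	for_each_cpu(cpu, candidates) {	/* mid + max cluster CPUs */
		unsigned long cap = capacity_of(cpu);
		unsigned long util = cpu_util(cpu);
		unsigned long spare = cap > util ? cap - util : 0;

		/* Compare across clusters, not just within one */
		if (spare > best_spare) {
			best_spare = spare;
			best_cpu = cpu;
		}
	}

	return best_cpu;
}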
Sultan Alsawaf
49caeb1ddc sched/fair: Always update CPU capacity when load balancing
Limiting CPU capacity updates, which are quite cheap, results in worse
balancing decisions during opportunistic balancing (e.g., SD_BALANCE_WAKE).
This causes opportunistic placement decisions to be skewed using stale CPU
capacity data, and when a CPU isn't idling much, its capacity suffers from
even more staleness since the only exception to the 100 ms capacity update
ratelimit is a CPU exiting idle.

Since the capacity updates are cheap, always do it when load balancing in
order to improve opportunistic task placement decisions.

Change-Id: I3727e5dcc00ebdbe57b967b51cd8df7ac26d61af
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Sultan Alsawaf
fd9b3b40ac sched/fair: Set asym priority equally for all CPUs in a performance domain
All CPUs in a performance domain share the same capacity, and therefore
aren't different from one another when distinguishing between which one is
better for asymmetric packing.

Instead of unfairly prioritizing lower-numbered CPUs within the same
performance domain, treat all CPUs in a performance domain equally for
asymmetric packing.

Change-Id: Ibe0c10034d237894d505c5022c73b2671a632004
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
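
Illustrative only: deriving the asym priority from CPU capacity rather than a per-CPU rank makes sched_asym_prefer() treat all CPUs of one performance domain as equals (the real sched_asym_prefer() compares arch_asym_cpu_priority() values):

static inline int pd_asym_prio(int cpu)
{
	/* same capacity => same priority within a performance domain */
	return arch_scale_cpu_capacity(NULL, cpu);
}

static inline bool sched_asym_prefer(int a, int b)
{
	return pd_asym_prio(a) > pd_asym_prio(b);	/* ties in-domain */
}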
Alexander Winkowski
77224feb50 Revert "sched: fair: Add strict skip buddy support"
This reverts commit 6f58caae21910a0700592a9acf12d7f6dda2e7bc.

It's not present in newer CAF kernels and Google removed it on their
4.14 devices as well.

Change-Id: I3675cbfe4a37ae9ed31bf3659a545965a0d59c6f
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Dark-Matter7232
d8423889bd Revert "ANDROID: sched: fair: balance for single core cluster"
SD732G does not have a single core cluster.

This reverts commit 5d72d20d1438b7c5e74c88250976b5d9570d7ed1.

Change-Id: Idedb27ca6e260552b351011e8970bce9758f48b6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Dark-Matter7232
3226034c5c Revert "ANDROID: update_group_capacity for single cpu in cluster"
This reverts commit ce00e4cfc5da07c93c25d8653e754ad2a6c9eab1.

Change-Id: Ie9635a191f1718cdd0c9a4a490d1aab5d1594eac
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Richard Raya
239aecb5d1 build.sh: Use depth for cloning AnyKernel3
Change-Id: I3ecaa0c0810fcece5c3df0af2ca3195d913dcca8
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00