Due to an asymmetric arm64 latency regression on WQ_UNBOUND.
Change-Id: I2917305abaa017247950e0f5b3a73b1d166c3463
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
In register_common(), there's unnecessary code used to find
which device tree node has the core-dev property. Refactor
this to reduce complexity.
Change-Id: Ib3475272b25e898ad23f9f5a4412d90cd889a356
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Sending synchronous IPIs to other CPUs involves spinning with preemption
disabled in order to wait for each IPI to finish. Keeping preemption off
for long periods of time like this is bad for system jitter, not to mention
the perf event IPIs are sent and flushed one at a time for each event for
each CPU rather than all at once for all the CPUs.
Since the way perf events are currently read is quite naive, rewrite it to
make it exploit parallelism and go much faster. IPIs for reading each perf
event are now sent to all CPUs asynchronously so that each CPU can work on
reading the events in parallel, and the dispatching CPU now sleeps rather
than spins when waiting for the IPIs to finish. Before the dispatching CPU
starts waiting though, it works on reading events for itself and then
reading events which can be read from any CPU in order to derive further
parallelism, and then waits for the IPIs to finish afterwards if they
haven't already.
Furthermore, there's now only one IPI sent to read all of a CPU's events
rather than an IPI sent for reading each event, which significantly speeds
up the event reads and reduces the number of IPIs sent.
This also checks for active SCM calls on a per-CPU basis rather than a
global basis so that unrelated CPUs don't get their counter reads skipped
and so that some CPUs can still receive fresh counter readings.
Overall, this makes the memlat driver much faster and more efficient, and
eliminates significant system jitter previously caused by IPI abuse.
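A minimal sketch of the dispatch pattern described above, using hypothetical
names (struct cpu_read_data, read_cpu_events(), read_all_events()) rather
than the actual memlat driver symbols:

    #include <linux/smp.h>
    #include <linux/completion.h>
    #include <linux/percpu.h>

    struct cpu_read_data {
    	call_single_data_t csd;
    	struct completion done;
    	/* per-CPU counter state would live here */
    };

    static DEFINE_PER_CPU(struct cpu_read_data, read_data);

    /* Runs in IPI context; reads all of that CPU's events in one go. */
    static void read_cpu_events(void *info)
    {
    	struct cpu_read_data *data = info;

    	/* ... read every event owned by this CPU ... */
    	complete(&data->done);
    }

    static void read_all_events(void)
    {
    	int cpu, this_cpu;

    	/* CPU hotplug locking omitted for brevity. */
    	this_cpu = get_cpu();
    	for_each_online_cpu(cpu) {
    		struct cpu_read_data *data = &per_cpu(read_data, cpu);

    		init_completion(&data->done);
    		if (cpu == this_cpu)
    			continue;
    		data->csd.func = read_cpu_events;
    		data->csd.info = data;
    		/* One asynchronous IPI per CPU, all in flight at once. */
    		smp_call_function_single_async(cpu, &data->csd);
    	}

    	/* Read our own events while the other CPUs work in parallel... */
    	read_cpu_events(this_cpu_ptr(&read_data));
    	put_cpu();

    	/* ...then sleep, rather than spin, until everyone is done. */
    	for_each_online_cpu(cpu) {
    		if (cpu != this_cpu)
    			wait_for_completion(&per_cpu(read_data, cpu).done);
    	}
    }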
Change-Id: I238c4e57f672a0337e2377c8fd38d0f6a1dbc2d0
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
LSE atomic increments and decrements clobber the x0 and x1 registers,
and since these registers are used in volatile inline assembly for SCM
calls, GCC does not preserve their values across the atomic_inc() and
atomic_dec() calls. This results in x0 and x1 containing garbage values
before and after the SCM call, breaking it entirely.
Performing the atomic_inc() and atomic_dec() in a wrapper outside the
SCM call functions fixes the issue.
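A rough sketch of the shape of the fix, with illustrative names
(scm_call(), __do_scm_call()) rather than the real qcom SCM symbols,
assuming the register-pinned inline asm stays confined to the inner
function:

    #include <linux/atomic.h>
    #include <linux/compiler.h>

    static atomic_t scm_call_count = ATOMIC_INIT(0);

    /* The x0..x7 register-pinned inline asm for the SMC lives here only. */
    static noinline int __do_scm_call(u64 fn_id)
    {
    	int ret = 0;

    	/* asm volatile(...) using explicit x0/x1 register variables */
    	return ret;
    }

    static int scm_call(u64 fn_id)
    {
    	int ret;

    	atomic_inc(&scm_call_count);	/* LSE atomic may clobber x0/x1 here */
    	ret = __do_scm_call(fn_id);	/* ...but never inside the asm body */
    	atomic_dec(&scm_call_count);

    	return ret;
    }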
Change-Id: Icae5d4cf18118bd1a39b1270211c663063b96e35
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
There's no reason to constantly use snprintf() to generate pretty debug
strings from hot paths. We don't need them, so remove them.
Change-Id: I523c45bf3e382cc926364634ce0362dad014ce94
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Although SCHED_FIFO is a real-time scheduling policy, it can hurt
system latency, since each SCHED_FIFO task will run to
completion before yielding to another task. This can result in visible
micro-stalls when a SCHED_FIFO task hogs the CPU for too long. On a
system where latency is favored over throughput, using SCHED_RR is a
better choice than SCHED_FIFO.
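Purely illustrative sketch of the policy change, assuming a hypothetical
kernel thread whose scheduling class is being set:

    #include <linux/sched.h>
    #include <uapi/linux/sched/types.h>

    static void set_rt_round_robin(struct task_struct *task, int prio)
    {
    	struct sched_param param = { .sched_priority = prio };

    	/*
    	 * SCHED_RR tasks of equal priority round-robin on a timeslice,
    	 * whereas SCHED_FIFO tasks run until they block or yield.
    	 */
    	sched_setscheduler(task, SCHED_RR, &param);
    }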
Change-Id: I11ef6efd89a73a4a090ed5d45e7b9d74c91f2f98
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
We want boosted tasks to run on big cores, but CAF's load balancer
changes do not account for SchedTune boosting, which allows boosted
tasks to be migrated to a suboptimal core. Mitigate this by setting
LBF_IGNORE_STUNE_BOOSTED_TASKS for tasks that are migrating from a
larger-capacity core to a minimum-capacity one and that have a
SchedTune boost > 10. If both conditions are true, do not migrate the
task. If the same task is selected the next time the load balancer
runs, clear the LBF_IGNORE_STUNE_BOOSTED_TASKS flag.
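A rough sketch of the check described above; helper names such as
is_min_capacity_cpu() and schedtune_task_boost() follow the CAF/EAS tree
this targets, and the exact placement in can_migrate_task() is an
assumption:

    /* Called from can_migrate_task(p, env); returns true to skip the task. */
    static bool deny_stune_boosted_downmigration(struct lb_env *env,
    					     struct task_struct *p)
    {
    	if (is_min_capacity_cpu(env->dst_cpu) &&
    	    !is_min_capacity_cpu(env->src_cpu) &&
    	    schedtune_task_boost(p) > 10 &&
    	    !(env->flags & LBF_IGNORE_STUNE_BOOSTED_TASKS)) {
    		/* First attempt: refuse to move the boosted task down. */
    		env->flags |= LBF_IGNORE_STUNE_BOOSTED_TASKS;
    		return true;
    	}

    	/* Same task picked again on a later run: clear the flag, allow it. */
    	env->flags &= ~LBF_IGNORE_STUNE_BOOSTED_TASKS;
    	return false;
    }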
Change-Id: Ibd9f6616b482d446d5acce2a93418bfda4c35ffb
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
We have 6 groups that is: background/ camera-daemon/ foreground/ rt/
top-app and root group. Adding another one for testing.
Bug: 144809570
Test: Build
Change-Id: I2d749a7bde4ad4c7c05f7218c9a5f39f8533acae
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
cgroup_migrate_execute() calls can_attach() and css_set_move_task()
separately without holding rq->lock.
The schedtune implementation breaks here, since can_attach() accounts
for the task move way before the group move is committed. If the task
sleeps right after can_attach(), the sleep is accounted towards the
previous group. This ends up in a disparity of counts between groups.
Consider this race: TaskA is moved from root_grp to topapp_grp;
root_grp.tasks = 1 and topapp_grp.tasks = 0 right before the move, and
TaskB is performing the move.
On CPU X:
  TaskB runs
    cgroup_migrate_execute()
      schedtune_can_attach()
        root_grp.tasks--; topapp_grp.tasks++;
        (root_grp.tasks = 0 and topapp_grp.tasks = 1)
  * Right at this moment the context is switched and TaskA runs.
  * TaskA sleeps:
    dequeue_task()
      schedtune_dequeue_task()
        schedtune_task_update()
          root_grp.tasks--; // TaskA has not really "switched" groups,
                            // so it decrements from root_grp
However, can_attach() has already accounted for the task move, which
leaves us with:
  root_grp.tasks = 0 (the count is protected from going negative)
  topapp_grp.tasks = 1
Now even if CPU X is idle (TaskA is long gone, sleeping), its
topapp_grp.tasks stays positive and the CPU is unnecessarily subject to
topapp's boost.
An easy way to fix this is to move the group change accounting to the
attach() callback, which gets called _after_ css_set_move_task(). Also
maintain the task's current idx in struct task_struct as it moves
between groups; the task's enqueue/dequeue is then accounted towards
the cached idx value. If the task dequeues just before the group
changes, it gets subtracted from the old group, which is correct
because the task would have bumped up the old group's count. If the
task changes group while it is running, the attach() callback has to
decrement the old group and increment the new group so that the next
dequeue subtracts from the new group. In other words, the attach()
callback has to account only for a running task, but has to update the
cached index for both running and sleeping tasks.
The current code uses a task->on_rq != 0 check to determine whether a
task is queued on the runqueue. This check is incorrect, because
task->on_rq is also set to TASK_ON_RQ_MIGRATING (value = 2) during
migration. Fix this by using task_on_rq_queued() to check whether a
task is queued.
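A condensed sketch of what the attach()-side accounting looks like under
this scheme; the field name p->stune_idx and schedtune_move_count() are
placeholders for the patch's real names, while cgroup_taskset_for_each(),
css_st(), task_rq() and task_on_rq_queued() are the existing interfaces
(rq locking omitted for brevity):

    static void schedtune_attach(struct cgroup_taskset *tset)
    {
    	struct cgroup_subsys_state *css;
    	struct task_struct *p;

    	cgroup_taskset_for_each(p, css, tset) {
    		int old_idx = p->stune_idx;	/* cached index */
    		int new_idx = css_st(css)->idx;	/* destination group */

    		/*
    		 * Only a task that is actually queued (and not merely
    		 * TASK_ON_RQ_MIGRATING) has bumped the old group's count,
    		 * so only then move the count to the new group.
    		 */
    		if (task_on_rq_queued(p))
    			schedtune_move_count(task_rq(p), old_idx, new_idx);

    		/* Update the cached index for running and sleeping tasks. */
    		p->stune_idx = new_idx;
    	}
    }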
Change-Id: If412da5a239c18d9122cfad2be59b355c14c068f
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ib8c142f4aa1dbe169d36b0e826f5da66bc334a47
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This accommodates the common drivers between the ZSTD Compressor and the ZSTD Decompressor.
Change-Id: I2c498cbab6bae106923138750ca695a663b9e1c5
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The -polly-postopts cmdline option applies post-rescheduling optimizations such as tiling.
Reference: ced20c6672
Change-Id: Icff506a133f8f063fa7b3370930043d252808b4e
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The -polly-loopfusion-greedy cmdline option aggressively tries to fuse any loop regardless of profitability.
Reference: 64489255be
Change-Id: I15d7df63475f315fb2c42799bb5c7448999536d5
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
A recent Clang 16 update introduced a segfault when compiling the kernel
with the `-polly-invariant-load-hoisting` flag, which breaks compilation
when the kernel is built with full LTO.
This is due to the flag ignoring errors during SCoP verification and not
taking them into account, as per a recent issue opened at [LLVM], causing
Polly to segfault and the compiler to print the following backtrace:
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: clang -Wp,-MD,drivers/md/.md.o.d -nostdinc -isystem /tmp/cirrus-ci-build/toolchains/clang/lib/clang/16.0.0/include -I../arch/arm64/include -I./arch/arm64/include/generated -I../include -I./include -I../arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -I../drivers/md -Idrivers/md -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Werror=return-type -Wno-format-security -std=gnu89 --target=aarch64-linux-gnu --prefix=/tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin/aarch64-linux-gnu- --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -Wno-misleading-indentation -Wno-bool-operation -Werror=unknown-warning-option -Wno-unsequenced -opaque-pointers -fno-PIE -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -Wno-psabi -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-int-in-bool-context -Wno-address-of-packed-member -O3 -march=armv8.1-a+crypto+fp16+rcpc -mtune=cortex-a53 -mllvm -polly -mllvm -polly-ast-use-context -mllvm -polly-detect-keep-going -mllvm -polly-invariant-load-hoisting -mllvm -polly-run-inliner -mllvm -polly-vectorizer=stripmine -mllvm -polly-loopfusion-greedy=1 -mllvm -polly-reschedule=1 -mllvm -polly-postopts=1 -fstack-protector-strong --target=aarch64-linux-gnu --gcc-toolchain=/tmp/cirrus-ci-build/toolchains/clang -meabi gnu -Wno-format-invalid-specifier -Wno-gnu -Wno-duplicate-decl-specifier -Wno-asm-operand-widths -Wno-initializer-overrides -Wno-tautological-constant-out-of-range-compare -Wno-tautological-compare -mno-global-merge -Wno-void-ptr-dereference -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wvla -flto -fwhole-program-vtables -fvisibility=hidden -Wdeclaration-after-statement -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=../= -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -Wno-pointer-to-enum-cast -Wno-unaligned-access -DKBUILD_BASENAME=\"md\" -DKBUILD_MODNAME=\"md_mod\" -D__KBUILD_MODNAME=kmod_md_mod -c -o drivers/md/md.o ../drivers/md/md.c
1. <eof> parser at end of file
2. Optimizer
CC drivers/media/platform/msm/camera_v2/camera/camera.o
AR drivers/media/pci/intel/ipu3/built-in.a
CC drivers/md/dm-linear.o
#0 0x0000559d3527073f (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a7073f)
#1 0x0000559d352705bf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a705bf)
#2 0x0000559d3523b198 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b198)
#3 0x0000559d3523b33e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x3a3b33e)
#4 0x00007f339dc3ea00 (/usr/lib/libc.so.6+0x38a00)
#5 0x0000559d35affccf (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x42ffccf)
#6 0x0000559d35b01710 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301710)
#7 0x0000559d35b01a12 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4301a12)
#8 0x0000559d35b09a9e (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4309a9e)
#9 0x0000559d35b14707 (/tmp/cirrus-ci-build/toolchains/clang/bin/clang-16+0x4314707)
clang-16: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Neutron clang version 16.0.0 (https://github.com/llvm/llvm-project.git 598f5275c16049b1e1b5bc934cbde447a82d485e)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /tmp/cirrus-ci-build/Kernel/../toolchains/clang/bin
Given the very nature of `-polly-detect-keep-going`, it can be concluded
that this is a potentially unsafe flag; instead of using it conditionally
for Clang < 16, just remove it altogether. This should allow Polly to run
SCoP verification as intended.
Issue [LLVM]: https://github.com/llvm/llvm-project/issues/58484#issuecomment-1284887374
Change-Id: Icf6b6c62f4a1df9319c3e16962673c590615b79b
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This polly flag has been deprecated since Clang 14.
Change-Id: I5942eea7f7443c98e5186540376a59eaeaadbfd7
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Polly is able to optimize various loops throughout the kernel for cache
locality. A mathematical representation of the program, based on
polyhedra, is analysed to find opportunistic optimisations in memory
access patterns, which then lead to loop transformations.
Polly is not built with LLVM by default, and requires LLVM to be compiled
with the Polly "project". This can be done by adding Polly to
-DLLVM_ENABLE_PROJECTS, for example:
-DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;polly"
Preliminary benchmarking seems to show an improvement of around two
percent across perf benchmarks:
Benchmark | Control | Polly
--------------------------------------------------------
bonnie++ -x 2 -s 4096 -r 0 | 12.610s | 12.547s
perf bench futex requeue | 33.553s | 33.094s
perf bench futex wake | 1.032s | 1.021s
perf bench futex wake-parallel | 1.049s | 1.025s
perf bench futex requeue | 1.037s | 1.020s
Furthermore, Polly does not produce a much larger image, making it
essentially a "free" optimisation. A comparison of a bzImage for a kernel with
and without Polly is shown below:
bzImage | stat --printf="%s\n"
-------------------------------------
Control | 9333728
Polly | 9345792
Compile times were one percent different at best, which is well within
the range of noise. Therefore, I can say with certainty that Polly has
a minimal effect on compile times, if any.
[Tashar02]:
1. Rework the flag passing format.
2. Pass Polly flags to the linker as well.
3. Add `-polly-detect-keep-going` cmdline option.
Change-Id: I588b3f0fedc10221383c9030c33f42d789b30fb9
Suggested-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
- Create OPT_FLAGS which contains -O3 and CPU-specific optimizations.
- Pass OPT_FLAGS to compiler and assembler.
- Pass LTO-specific plugin-opt optimization to linker flags when LTO is enabled.
- Drop extraneous -O3.
- Optimize for armv8.2-a+dotprod.
Change-Id: I0b2f4217baec828ab393039419a723f092864287
Co-authored-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Increased inlining can provide an improvement in performance, but also
increases the size of the final kernel image. This config provides a bit
more control over how much is inlined to further optimise the kernel for
certain workloads.
Change-Id: I276c10ae6722032b3d40831f4242e27cbf830c9a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit fc33cc16b14c9a54cce17c6f9f96d8b3d542f7e3.
Change-Id: Ieff0a3f219e76351a355de03138320ccd4d7c360
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The default jitter is 2%, but modern panels should not have such high
jitter. Use a value of 0.8% instead to disable early wakeup for each
vsync and thus reduce power consumption.
Extracted from Razer Phone 2 aura kernel sources.
arch/arm64/boot/dts/fih/RC2_common/dsi-panel-nt36830-wqhd-dualmipi-extclk-cmd.dtsi
Change-Id: I55c540e4dae9229d84095c827be2bbd763757049
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This should make the kernel initialization faster as it suppresses any
potential serial console output.
Change-Id: I3a1e7daba4a1202d09b23cf8b60afce694514b54
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
nosoftlockup - disables logging of backtraces when a process executes on a CPU for
longer than the softlockup threshold (default 120 seconds). Typical low-latency
programming and tuning techniques might involve spinning on a core or modifying
scheduler priorities/policies, which can lead to a task reaching this threshold. If a task
has not relinquished the CPU for 120 seconds, the kernel prints a backtrace for
diagnostic purposes. Adding nosoftlockup to the cmdline disables the printing of this
backtrace (the printk itself can cause latency spikes), and does not in itself reduce
latency. Tasks that spin on a CPU must occasionally yield (especially if they are
SCHED_FIFO priority), or important per-cpu kernel threads may never execute,
potentially leading to unexpected behavior such as very large latency spikes or
interruptions in network traffic.
mce=ignore_ce - ignores corrected errors and associated scans that can cause
periodic latency spikes.
Documentation: https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf
Change-Id: Ieaedb58d018ddb85ec46e8fe69729596993aabf1
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Real-time systems desiring fast boot but wishing to avoid run-time
IPIs from expedited grace periods would therefore set both
rcupdate.rcu_expedited=1 and rcupdate.rcu_normal_after_boot=1.
Lookup: https://lwn.net/Articles/777214/
Change-Id: I0b1f41657963e2581e119a599cd9869f2f7d2c0e
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Move frame data stats collection/notification during frame-done and
retire fence sysfs notification to event thread. This will free up
some interrupt time.
Change-Id: I2648ac4287ce8712e9a059edd408a59753aa6d32
Signed-off-by: Veera Sundaram Sankaran <quic_veeras@quicinc.com>
Signed-off-by: V S Ganga VaraPrasad (VARA) Adabala <quic_vadabala@quicinc.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Reinit the thread priority work before queueing it on multiple display
threads, as the work stores the former worker thread. Also flush the
work so that the next init is serialized.
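A minimal sketch of the described pattern using the generic kthread
worker API; the wrapper name is illustrative, not the actual sde/msm_drm
function:

    #include <linux/kthread.h>

    static void queue_priority_work(struct kthread_worker *worker,
    				struct kthread_work *work,
    				kthread_work_func_t fn)
    {
    	/* Serialize with any previously queued instance of this work... */
    	kthread_flush_work(work);
    	/* ...then re-init, since kthread_work remembers its last worker. */
    	kthread_init_work(work, fn);
    	kthread_queue_work(worker, work);
    }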
Change-Id: I51409d4d12d100be0cb30238f812a56ec064a339
Signed-off-by: Kalyan Thota <quic_kalyant@quicinc.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Move the thread priority call to a kernel worker thread, because the
component bind API may run from the vendor_modeprobe process context
once all drivers have probed successfully, and thread priority updates
are not allowed from the vendor_modeprobe process context.
Change-Id: Iafac97ce02942d6a2134495232f3c395ba4a362f
Signed-off-by: Dhaval Patel <pdhaval@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Dynamic mode switch (DMS) is not supported for video mode panels before
the cont-splash handoff is handled for the first frame, so avoid a
dynamic mode switch during cont-splash handoff for any DRM mode change.
This workaround was provided by QC for a GSI issue; as of observation,
there are no side effects.
QC SR:05515278
QC CR:NA
QC Change ID:Icd5881af99afb3e398d3bba3746b7a35bcda4491
Change-Id: I97f5712ce5bb8448f1c600ccf306d0dac7fa6eae
Signed-off-by: maheshmk <maheshmk@motorola.com>
Reviewed-on: https://gerrit.mot.com/2113033
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Ashwin Kumar Pathmudi <jfxr63@motorola.com>
Reviewed-by: Shuo Yan <shuoyan@motorola.com>
Reviewed-by: Guobin Zhang <zhanggb@motorola.com>
Submit-Approved: Jira Key
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
For regular bound workers that don't request to be queued onto a specific
CPU, just use CPU0 to save power. Additionally, adjust the CPU affinity of
unbound workqueues to force their workers onto the power cluster (CPUs 0-5)
to further reduce power consumption.
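A sketch of the unbound-workqueue half of this change, assuming it lives
in kernel/workqueue.c where wq_unbound_cpumask is visible; the routing of
bound work to CPU0 is a one-line tweak in __queue_work() and is not shown:

    #include <linux/cpumask.h>

    /* CPUs 0-5 form the power cluster on this SoC. */
    #define POWER_CLUSTER_LAST_CPU	5

    static void __init restrict_unbound_workers(void)
    {
    	int cpu;

    	cpumask_clear(wq_unbound_cpumask);
    	for (cpu = 0; cpu <= POWER_CLUSTER_LAST_CPU; cpu++)
    		cpumask_set_cpu(cpu, wq_unbound_cpumask);
    }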
Change-Id: Ib3aede9947c4a2c2673adc5f5b7c4e0c2c4520bf
Signed-off-by: Sultanxda <sultanxda@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
During load-balance, groups classified as group_misfit_task are filtered
out if they do not pass
group_smaller_max_cpu_capacity(<candidate group>, <local group>);
which itself employs fits_capacity() to compare the sgc->max_capacity of
both groups.
Due to the underlying margin, fits_capacity(X, 1024) will return false for
any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are
{261, 871, 1024}. If a CPU-bound task ends up on one of those "medium"
CPUs, misfit migration will never intentionally upmigrate it to a CPU of
higher capacity due to the aforementioned margin.
One may argue the 20% margin of fits_capacity() is excessive in the advent
of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here
is that fits_capacity() is meant to compare a utilization value to a
capacity value, whereas here it is being used to compare two capacity
values. As CPU capacity and task utilization have different dynamics, a
sensible approach here would be to add a new helper dedicated to comparing
CPU capacities.
Also note that comparing capacity extrema of local and source sched_group's
doesn't make much sense when at the end of the day the imbalance will be
pulled by a known env->dst_cpu, whose capacity can be anywhere within the
local group's capacity extrema.
While at it, replace group_smaller_{min, max}_cpu_capacity() with
comparisons of the source group's min/max capacity and the destination
CPU's capacity.
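For reference, the upstream helper this patch introduces is along these
lines, trading fits_capacity()'s ~20% utilization-vs-capacity margin for
a small (roughly 5%) capacity-vs-capacity one:

    /*
     * group_smaller_{min,max}_cpu_capacity() users are converted to compare
     * the source group's capacity against the destination CPU's capacity
     * with this helper instead of fits_capacity().
     */
    #define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)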
Link: https://lkml.kernel.org/r/20210407220628.3798191-4-valentin.schneider@arm.com
Change-Id: Ia448773d82f5b2c36e94d608a15fb076d5af598f
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
When boost is enabled or the mid capacity cluster has a single CPU, we
traverse both the mid and max capacity CPUs. When all of these CPUs are
busy, the task should be placed on the CPU which has the highest spare
capacity. However, the current code completely ignores the mid capacity
CPU when it encounters the max capacity CPU, even though the former may
have more spare capacity.
Fix this issue by enforcing the spare capacity check across
different capacity CPUs.
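Illustrative sketch only (not the actual CAF placement code): track the
best spare capacity across the combined mid and max capacity candidates
instead of deciding per cluster:

    static int best_spare_capacity_cpu(const struct cpumask *candidates)
    {
    	unsigned long spare, best_spare = 0;
    	int cpu, best_cpu = -1;

    	/* candidates spans both the mid and max capacity CPUs. */
    	for_each_cpu(cpu, candidates) {
    		if (cpu_util(cpu) >= capacity_of(cpu))
    			continue;
    		spare = capacity_of(cpu) - cpu_util(cpu);
    		if (spare > best_spare) {
    			best_spare = spare;
    			best_cpu = cpu;
    		}
    	}

    	return best_cpu;
    }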
Change-Id: I726d47f985f9e59d2bb1c6cf2b743796b57e3051
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Limiting CPU capacity updates, which are quite cheap, results in worse
balancing decisions during opportunistic balancing (e.g., SD_BALANCE_WAKE).
This causes opportunistic placement decisions to be skewed using stale CPU
capacity data, and when a CPU isn't idling much, its capacity suffers from
even more staleness since the only exception to the 100 ms capacity update
ratelimit is a CPU exiting idle.
Since the capacity updates are cheap, always do it when load balancing in
order to improve opportunistic task placement decisions.
Change-Id: I3727e5dcc00ebdbe57b967b51cd8df7ac26d61af
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
All CPUs in a performance domain share the same capacity, and therefore
aren't different from one another when distinguishing between which one is
better for asymmetric packing.
Instead of unfairly prioritizing lower-numbered CPUs within the same
performance domain, treat all CPUs in a performance domain equally for
asymmetric packing.
Change-Id: Ibe0c10034d237894d505c5022c73b2671a632004
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 6f58caae21910a0700592a9acf12d7f6dda2e7bc.
It's not present in newer CAF kernels and Google removed it on their
4.14 devices as well.
Change-Id: I3675cbfe4a37ae9ed31bf3659a545965a0d59c6f
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
SD732G does not have a single-core cluster.
This reverts commit 5d72d20d1438b7c5e74c88250976b5d9570d7ed1.
Change-Id: Idedb27ca6e260552b351011e8970bce9758f48b6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit ce00e4cfc5da07c93c25d8653e754ad2a6c9eab1.
Change-Id: Ie9635a191f1718cdd0c9a4a490d1aab5d1594eac
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>