It may lead to pstore contents being lost, which is not desirable.
Change-Id: I483db403c4abe9fc03119d3d31a049f9a6ec5754
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
ShadowCallStack protects return addresses against stack buffer overflows;
there is little point in keeping both SCS and Stack Protector enabled.
Runtime testing showed that SCS is faster than Stack Protector.
Change-Id: Ibea0ea09055f2d1bf3d65b8f34fdfce4a5ca470b
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Disabling CT NETLINK doesn't hurt anything on Android
and reduces NETLINK wakeups a bit.
Change-Id: I16670140779a2fe4f13f1e7849f7a77053396047
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iaeeeb1491e8940bf870aac0e36a75591a286776d
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Functions that use these values seem to suffer __A LOT__
after moving to 300 HZ, which causes terrible connectivity issues
on Mata and possibly other devices with weak antennas.
Change-Id: Ie2f4e4d3ce4beaedad733d58747cab592e2fb4e8
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
There is no need to "select BUG" when CONFIG_HARDENED_USERCOPY is enabled.
The kernel thread will always die, regardless of CONFIG_BUG.
Change-Id: I1e297a1c1d5fdf8ad48bed4088537e13066565a7
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This caused severe performance regressions in hackbench.
Change-Id: Ib72d4f4aca54ee00799809d4eb2fcb6cdb1f4971
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I512afa97c7cc07a9200f0ba3265fc9b3fbca44cf
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Since this will be part of the scheduler's hotpath in some cases, use
unlikely() for a few of the obvious conditionals.
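For illustration, a minimal sketch of the pattern under assumed names (the
struct and function below are hypothetical, not the code this patch touches):

#include <linux/compiler.h>
#include <linux/errno.h>
#include <linux/types.h>

struct sketch_ctx {
        bool enabled;
        unsigned long flags;
};

static int sketch_hotpath_update(struct sketch_ctx *ctx)
{
        /* The disabled case is rare; unlikely() hints the compiler to lay
         * out the enabled path as the straight-line (hot) path. */
        if (unlikely(!ctx->enabled))
                return -EINVAL;

        ctx->flags |= 1UL;
        return 0;
}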
Change-Id: I751f3189304326caeab7ccccc5df327f9be3c897
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
That's a LONG wait. Don't busy-wait there, to save power.
Change-Id: I14069b2aaf1872d276932f9904d2c1a20ee0845c
Signed-off-by: Kazuki Hashimoto <kazukih@tuta.io>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iee0c13a5f0773c31b8a896d650fe9b61ab50828e
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
For lower runtime overhead.
Change-Id: Ic4e59db7be8f90d92660c3462c566b4568929655
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I9e78135ea3c5644f3328da7b7424ea80a15c2f85
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The other stuff either doesn't do anything or is only there for debugging.
Change-Id: I2f2311dbafef0edcc89b9174605cb22b2169cf69
Signed-off-by: Kazuki Hashimoto <kazukih0205@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I27814ed92ac864c92393ee674058b2ab07708905
Signed-off-by: Kazuki Hashimoto <kazukih0205@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ib7a07ea499910cc32e41a05d65d2e2b6f9d15bbc
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
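As a rough sketch of the scheme described above (the struct and function
names are illustrative, not the actual KGSL page pool code):

#include <linux/atomic.h>
#include <linux/llist.h>
#include <linux/mm_types.h>
#include <linux/spinlock.h>

struct pool_sketch {
        struct llist_head free_pages;   /* lock-less list of free entries */
        atomic_t page_count;            /* lock-free per-pool page count */
        spinlock_t del_lock;            /* serializes llist_del_first() */
};

struct pool_entry {
        struct llist_node node;
        struct page *page;
};

/* Producers: any number may run concurrently, no lock needed. */
static void pool_add_sketch(struct pool_sketch *pool, struct pool_entry *e)
{
        llist_add(&e->node, &pool->free_pages);
        atomic_inc(&pool->page_count);
}

/* Consumer: llist permits only one llist_del_first() caller at a time,
 * so a spin lock is taken around just that call. */
static struct pool_entry *pool_del_sketch(struct pool_sketch *pool)
{
        struct llist_node *n;

        spin_lock(&pool->del_lock);
        n = llist_del_first(&pool->free_pages);
        spin_unlock(&pool->del_lock);
        if (!n)
                return NULL;

        atomic_dec(&pool->page_count);
        return llist_entry(n, struct pool_entry, node);
}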
Change-Id: I5a1b6cef1eba2172728037ff5b38a2729c9e1d3e
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit ea9ce4d947b9e7177cb32046f497405947622030.
Change-Id: I37f4fda9b19ee4d102a451cd031ebfabadc90228
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 91be8a168642a6431da272c5400b21b297281d29.
Change-Id: I7b5f89e3c79a26b5614c2fe25b983e268993651f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 032448d80056adeb00075e2883fc042fc334d1e8.
Change-Id: I142e5cf4220a4c093f6eb9537f9fcf948a5ad0e7
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 96a41f7e12eba124b77dee4919aa3f3c01f9b34d.
Change-Id: Ia3a354503f141325ac20fccadbc7d2be87d11dba
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit f110fe3f864f093698fc6237f4b5984d749ae432.
Change-Id: I178d04e2db3df94b11db852d399f6d0eeaaca0d4
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I77e5663fa00afba2211b52997e007a0f2e6364e2
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The idle load balancer (ILB) is kicked whenever a task is misfit, meaning
that the task doesn't fit on its CPU (i.e., fits_capacity() == false).
Since CASS makes no attempt to place tasks such that they'll fit on the CPU
they're placed upon, the ILB works harder to correct this and rebalances
misfit tasks onto a CPU with sufficient capacity.
By fighting the ILB like this, CASS degrades both energy efficiency and
performance.
Play nicely with the ILB by trying to place tasks onto CPUs that fit.
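As a minimal sketch of the idea (the comparison helper below is
hypothetical; fits_capacity() is the scheduler's existing fit check):

/* A CPU the task fits on always wins over one it does not fit on, so
 * the ILB has nothing left to correct; only among equally fitting CPUs
 * do we fall back to a capacity-scaled utilization comparison. */
static bool sketch_prefer_a(unsigned long util_a, unsigned long cap_a,
                            unsigned long util_b, unsigned long cap_b,
                            unsigned long task_util)
{
        bool fits_a = fits_capacity(util_a + task_util, cap_a);
        bool fits_b = fits_capacity(util_b + task_util, cap_b);

        if (fits_a != fits_b)
                return fits_a;

        return util_a * cap_b <= util_b * cap_a;
}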
Change-Id: Id2e9919fbd506dbbcddbe04a0ddf6c02ecb58ac3
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The same formula to check utilization against capacity (after
considering capacity_margin) is already used at 5 different locations.
This patch creates a new macro, fits_capacity(), which can be used from
all these locations without exposing its details, and hence simplifies
the code.
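For reference, the macro boils down to a one-line margin check (as in the
upstream patch, where capacity_margin defaults to 1280/1024, i.e. utilization
must stay below roughly 80% of the capacity):

/* Returns true iff "cap" (utilization) fits into "max" (capacity)
 * while preserving the capacity_margin headroom. */
#define fits_capacity(cap, max) ((cap) * capacity_margin < (max) * 1024)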
Link: https://lkml.kernel.org/r/b477ac75a2b163048bdaeb37f57b4c3f04f75a31.1559631700.git.viresh.kumar@linaro.org
Change-Id: Id857cefcc57b17e961e5f94bafe2b11c91b91354
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit f330ea9c94be15e326119436073f3c41ad206712.
Change-Id: I1bafe1424fb5b8454332938a15d7e402503b394f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
Change-Id: I7c910321b66c6877cbc5656b3b3e426557dc3314
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
The discard thread can only process 8 requests at a time by default,
so fstrim needs to handle the remaining discard requests when the
discard option is in use.
Change-Id: I5eac38c34182607e8dceeb13273522b10ce02af8
Signed-off-by: liuchao12 <liuchao12@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Some architectures, such as arm, implement an optimized copy_page for
full-page copying.
On my arm platform, using the copy_page helper for single-page copying is
about 10 percent faster than memcpy.
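A minimal before/after sketch of the change, assuming both pages are
kernel-mapped (the wrapper and variable names are illustrative):

#include <linux/mm.h>

static void sketch_copy_one_page(void *dst, void *src)
{
        /* Before: generic byte copy of exactly one page.
         *      memcpy(dst, src, PAGE_SIZE);
         * After: the arch-optimized whole-page copy helper. */
        copy_page(dst, src);
}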
Change-Id: Ie28de9ef5954d0c232b418f382471bc7c125563f
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Some architectures, such as arm, implement an optimized copy_page for
full-page copying.
On my arm platform, using the copy_page helper for single-page copying is
about 10 percent faster than memcpy.
Change-Id: I1d012a94f40f08a9cd83e28a9e7efea1ef1e2d70
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: If79fd167f5c6017a4d234145482df72781f1ae02
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The original dedup code does not handle collisions, based on the observation
that they practically never happen.
For additional peace of mind, use a bigger hash size to reduce the
possibility of a collision even further.
Change-Id: I83e740c63373a06c4f1bdb630adf9c8a9d4f15d9
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I27b3a26ae96f95a728610b8a7e6b6f8f0d418d8b
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 3a3600895214a4dfe032ee825b07e20582411f90.
crc32(c) is for checking data corruption, not for comparing with other data
and minimizing collisions.
Change-Id: Ic079f225d64be6db3c547749fd8b1a03a79dfed9
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
xxhash's performance depends heavily on compiler optimizations, including
inlining. Follow upstream's behavior and inline these helper functions.
Change-Id: I1bc08b7ef6a491817b9ed5e8daab0f1993081f71
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
These simply wrap memcpy().
Replace them with macros so that they are naturally inlined.
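A sketch of the kind of replacement meant here (the names are illustrative,
not the exact helpers in the xxhash sources):

#include <linux/string.h>

/* Before (illustrative): a helper that only forwards to memcpy() and
 * relies on the compiler choosing to inline it. */
static inline void sketch_memcpy_fn(void *dst, const void *src, size_t n)
{
        memcpy(dst, src, n);
}

/* After: a macro, so the memcpy() call is always expanded in place. */
#define sketch_memcpy(dst, src, n) memcpy((dst), (src), (n))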
Change-Id: I32df8e35dd99611ab0cbd472146b0ef3ecb847d3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Patch series "Currently used jhash are slow enough and replace it allow as
to make KSM", v8.
Speed (in kernel):
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s
Sioh Lee's testing (copy from other mail):
Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: xeon E5-2620 v3, memory 64gb)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200
Experiment process:
First, we turn off KSM and launch 4 VMs. Then we turn on KSM and
measure the checksum computation time until full_scans becomes two.
The experimental results (each value is the average of the measured values):
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns
In summary, the results show that crc32c_intel has an advantage over all of
the hash functions used in the experiment (time decreased by 84.54% compared
to crc32c, 78.86% compared to jhash2, 51.33% compared to xxhash32, and 23.28%
compared to xxhash64). The results are similar to those of Timofey.
But use only xxhash for now, because using crc32c requires the crypto API to
be initialized first - that requires some tricky solution to work well in all
situations.
So:
- The first patch implements compile-time selection of the fastest xxhash
implementation for the target platform.
- The second patch replaces jhash2 with xxhash.
This patch (of 2):
xxh32() - fast on both 32-bit and 64-bit platforms
xxh64() - fast only on 64-bit platforms
Create xxhash(), which picks the fastest version at compile time.
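The resulting helper is essentially the following (paraphrasing the
include/linux/xxhash.h addition):

static inline unsigned long xxhash(const void *input, size_t length,
                                   uint64_t seed)
{
#if BITS_PER_LONG == 64
        return xxh64(input, length, seed);      /* fastest on 64-bit */
#else
        return xxh32(input, length, seed);      /* fastest on 32-bit */
#endif
}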
Link: http://lkml.kernel.org/r/20181023182554.23464-2-nefelim4ag@gmail.com
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: leesioh <solee@os.korea.ac.kr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I70ea705120672baf63ccd01965480c528529b521
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iad838615d82eebd050c9a28b167f4bf3163ec0d2
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Running a tight loop on every packet add is wasteful. With the hotspot
turned on, I noticed that htt_htc_misc_pkt_list_trim() consumes at least
5% of CPU time. Cache the head of the packet queue and free multiple
packets at once to reduce CPU consumption.
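As an illustrative sketch of the batching idea only (the names below are
hypothetical, not the actual ath10k code):

#include <linux/list.h>
#include <linux/slab.h>

struct sketch_pkt {
        struct list_head list;
        void *buf;
};

/* Instead of trimming on every single add, walk the cached list once
 * and free everything beyond the limit in one batch. */
static void sketch_pkt_list_trim(struct list_head *pkt_head, int max_keep)
{
        struct sketch_pkt *pkt, *tmp;
        int kept = 0;

        list_for_each_entry_safe(pkt, tmp, pkt_head, list) {
                if (++kept <= max_keep)
                        continue;
                list_del(&pkt->list);
                kfree(pkt->buf);
                kfree(pkt);
        }
}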
Change-Id: I0d4b9a266b8def08a85fb41805f9368dd49649eb
Signed-off-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Alex Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
napi_disable() is subject to a hangup when threaded mode is enabled
and the napi is under heavy traffic.
If the relevant napi has been scheduled and napi_disable() kicks in
before the next napi_threaded_wait() completes - so that the latter
quits due to the napi_disable_pending() condition - the existing code
leaves the NAPI_STATE_SCHED bit set, and the napi_disable() loop
waiting for that bit will hang.
This patch addresses the issue by dropping the NAPI_STATE_DISABLE
bit test in napi_thread_wait(). The later napi_threaded_poll()
iteration will take care of clearing the NAPI_STATE_SCHED.
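Concretely, the fix amounts to dropping the napi_disable_pending() test from
the wait-loop condition in napi_thread_wait(); a sketch of the resulting
shape (body abridged):

#include <linux/kthread.h>
#include <linux/netdevice.h>
#include <linux/sched.h>

static int napi_thread_wait_sketch(struct napi_struct *napi)
{
        set_current_state(TASK_INTERRUPTIBLE);

        /* was: while (!kthread_should_stop() && !napi_disable_pending(napi)) */
        while (!kthread_should_stop()) {
                if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
                        __set_current_state(TASK_RUNNING);
                        return 0;
                }
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return -1;
}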
This also addresses a related problem reported by Jakub:
before this patch a napi_disable()/napi_enable() pair killed
the napi thread, effectively disabling the threaded mode.
On the patched kernel napi_disable() simply stops scheduling
the relevant thread.
v1 -> v2:
- let the main napi_thread_poll() loop clear the SCHED bit
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/883923fa22745a9589e8610962b7dc59df09fb1f.1617981844.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 27f0ad71699de41bae013c367b95a6b319cc46a9)
Change-Id: Ib586ca1f170c5321a37091c97d8ca710d8b21aad
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>