Change-Id: Iaeeeb1491e8940bf870aac0e36a75591a286776d
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
It does seem like functions that use those values suffer __A LOT__
after moving to 300 Hz, which causes terrible connectivity issues
on Mata and possibly other devices with bad antennas.
Change-Id: Ie2f4e4d3ce4beaedad733d58747cab592e2fb4e8
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
There is no need to "select BUG" when CONFIG_HARDENED_USERCOPY is enabled.
The kernel thread will always die, regardless of CONFIG_BUG.
Change-Id: I1e297a1c1d5fdf8ad48bed4088537e13066565a7
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This caused significant performance regressions in hackbench.
Change-Id: Ib72d4f4aca54ee00799809d4eb2fcb6cdb1f4971
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I512afa97c7cc07a9200f0ba3265fc9b3fbca44cf
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Since this will be part of the scheduler's hotpath in some cases, use
unlikely() for a few of the obvious conditionals.
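For illustration, a minimal sketch of the annotation pattern (helper names
are hypothetical, not the actual schedutil hunk):

  #include <linux/compiler.h>

  /* Hypothetical cold-path helper, for illustration only. */
  static void demo_slow_path(void)
  {
      /* rarely-executed work */
  }

  static inline void demo_hot_path(bool rare_condition)
  {
      /* unlikely() keeps the rarely-taken branch out of the fast path. */
      if (unlikely(rare_condition))
          demo_slow_path();
  }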
Change-Id: I751f3189304326caeab7ccccc5df327f9be3c897
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
That's a LONG wait. Don't busy-wait there, to save power.
Change-Id: I14069b2aaf1872d276932f9904d2c1a20ee0845c
Signed-off-by: Kazuki Hashimoto <kazukih@tuta.io>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iee0c13a5f0773c31b8a896d650fe9b61ab50828e
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
For lower runtime overhead.
Change-Id: Ic4e59db7be8f90d92660c3462c566b4568929655
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I9e78135ea3c5644f3328da7b7424ea80a15c2f85
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Other stuff doesn't do anything or is only there for debugging.
Change-Id: I2f2311dbafef0edcc89b9174605cb22b2169cf69
Signed-off-by: Kazuki Hashimoto <kazukih0205@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I27814ed92ac864c92393ee674058b2ab07708905
Signed-off-by: Kazuki Hashimoto <kazukih0205@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Ib7a07ea499910cc32e41a05d65d2e2b6f9d15bbc
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
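A minimal sketch of the scheme described above (structure and function names
are illustrative, not the actual kgsl pool code):

  #include <linux/llist.h>
  #include <linux/spinlock.h>
  #include <linux/atomic.h>
  #include <linux/mm.h>

  struct demo_page_pool {
      struct llist_head free_list;  /* lock-free list of free pages */
      spinlock_t del_lock;          /* serializes llist_del_first() users */
      atomic_t page_count;          /* lock-free per-pool page count */
  };

  /* Any number of concurrent adders: no lock needed. */
  static void demo_pool_add_page(struct demo_page_pool *pool, struct page *page)
  {
      llist_add((struct llist_node *)&page->lru, &pool->free_list);
      atomic_inc(&pool->page_count);
  }

  /* Only llist_del_first() must be serialized against other removers. */
  static struct page *demo_pool_get_page(struct demo_page_pool *pool)
  {
      struct llist_node *node;

      spin_lock(&pool->del_lock);
      node = llist_del_first(&pool->free_list);
      spin_unlock(&pool->del_lock);
      if (!node)
          return NULL;

      atomic_dec(&pool->page_count);
      return container_of((struct list_head *)node, struct page, lru);
  }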
Change-Id: I5a1b6cef1eba2172728037ff5b38a2729c9e1d3e
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit ea9ce4d947b9e7177cb32046f497405947622030.
Change-Id: I37f4fda9b19ee4d102a451cd031ebfabadc90228
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 91be8a168642a6431da272c5400b21b297281d29.
Change-Id: I7b5f89e3c79a26b5614c2fe25b983e268993651f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 032448d80056adeb00075e2883fc042fc334d1e8.
Change-Id: I142e5cf4220a4c093f6eb9537f9fcf948a5ad0e7
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 96a41f7e12eba124b77dee4919aa3f3c01f9b34d.
Change-Id: Ia3a354503f141325ac20fccadbc7d2be87d11dba
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit f110fe3f864f093698fc6237f4b5984d749ae432.
Change-Id: I178d04e2db3df94b11db852d399f6d0eeaaca0d4
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I77e5663fa00afba2211b52997e007a0f2e6364e2
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The idle load balancer (ILB) is kicked whenever a task is misfit, meaning
that the task doesn't fit on its CPU (i.e., fits_capacity() == false).
Since CASS makes no attempt to place tasks such that they'll fit on the CPU
they're placed upon, the ILB works harder to correct this and rebalances
misfit tasks onto a CPU with sufficient capacity.
By fighting the ILB like this, CASS degrades both energy efficiency and
performance.
Play nicely with the ILB by trying to place tasks onto CPUs that fit.
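A conceptual sketch of that preference, assuming the kernel/sched/fair.c
context (this is not the actual CASS selection code):

  /* Among the allowed CPUs, prefer one the task actually fits on, so the
   * idle load balancer does not have to migrate it again as misfit. */
  static int demo_pick_fitting_cpu(struct task_struct *p,
                                   const struct cpumask *cpus)
  {
      unsigned long util = task_util_est(p);
      int cpu, fallback = -1;

      for_each_cpu(cpu, cpus) {
          if (fallback < 0)
              fallback = cpu;
          if (fits_capacity(util, capacity_of(cpu)))
              return cpu;
      }
      return fallback;
  }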
Change-Id: Id2e9919fbd506dbbcddbe04a0ddf6c02ecb58ac3
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The same formula to check utilization against capacity (after
considering capacity_margin) is already used at 5 different locations.
This patch creates a new macro, fits_capacity(), which can be used from
all these locations without exposing its details, and hence simplifies
the code.
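For reference, the helper roughly takes this shape (the 1280/1024 ratio
encodes the ~25% capacity_margin; the exact constant may differ by tree):

  #define fits_capacity(cap, max)  ((cap) * 1280 < (max) * 1024)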
Link: https://lkml.kernel.org/r/b477ac75a2b163048bdaeb37f57b4c3f04f75a31.1559631700.git.viresh.kumar@linaro.org
Change-Id: Id857cefcc57b17e961e5f94bafe2b11c91b91354
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit f330ea9c94be15e326119436073f3c41ad206712.
Change-Id: I1bafe1424fb5b8454332938a15d7e402503b394f
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
Change-Id: I7c910321b66c6877cbc5656b3b3e426557dc3314
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
The discard thread can only process 8 requests at a time by default,
so fstrim needs to handle the remaining discard requests when the
discard option is in use.
Change-Id: I5eac38c34182607e8dceeb13273522b10ce02af8
Signed-off-by: liuchao12 <liuchao12@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Some architectures, such as arm, have implemented an optimized
copy_page for full-page copying.
On my arm platform, using the copy_page helper for single-page copying
is about 10 percent faster than memcpy.
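An illustrative sketch of the substitution (the helper name is hypothetical;
the pages are mapped with kmap_atomic for the copy):

  #include <linux/mm.h>
  #include <linux/highmem.h>

  static void demo_copy_one_page(struct page *to, struct page *from)
  {
      void *dst = kmap_atomic(to);
      void *src = kmap_atomic(from);

      copy_page(dst, src);  /* was: memcpy(dst, src, PAGE_SIZE); */

      kunmap_atomic(src);
      kunmap_atomic(dst);
  }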
Change-Id: Ie28de9ef5954d0c232b418f382471bc7c125563f
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Some architectures, such as arm, have implemented an optimized
copy_page for full-page copying.
On my arm platform, using the copy_page helper for single-page copying
is about 10 percent faster than memcpy.
Change-Id: I1d012a94f40f08a9cd83e28a9e7efea1ef1e2d70
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: If79fd167f5c6017a4d234145482df72781f1ae02
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The original dedup code does not handle collisions, based on the
observation that they practically never happen.
For additional peace of mind, use a bigger hash size to reduce the
possibility of a collision even further.
Change-Id: I83e740c63373a06c4f1bdb630adf9c8a9d4f15d9
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I27b3a26ae96f95a728610b8a7e6b6f8f0d418d8b
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 3a3600895214a4dfe032ee825b07e20582411f90.
crc32(c) is for checking data corruption, not for comparing with other data
and minimizing collisions.
Change-Id: Ic079f225d64be6db3c547749fd8b1a03a79dfed9
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
xxhash's performance depends heavily on compiler optimizations including
inlines. Follow upstream's behavior and inline those helper functions.
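Illustrative example of the kind of change, modelled on lib/xxhash.c
(PRIME32_* and xxh_rotl32 are assumed to be the constants/macro defined
there; the exact helper may differ):

  /* Force-inline the small per-lane round helper so the compiler can keep
   * the state in registers, as upstream xxHash does. */
  static __always_inline uint32_t xxh32_round(uint32_t seed, const uint32_t input)
  {
      seed += input * PRIME32_2;
      seed = xxh_rotl32(seed, 13);
      seed *= PRIME32_1;
      return seed;
  }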
Change-Id: I1bc08b7ef6a491817b9ed5e8daab0f1993081f71
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
These simply wrap memcpy().
Replace them with macros so that they are naturally inlined.
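A sketch of the replacement (names mirror the xxhash state-copy helpers;
treat this as illustrative rather than the exact hunk):

  /* Before: trivial out-of-line wrappers around memcpy().
   * After: macros, so the copy is inlined at each call site. */
  #define xxh32_copy_state(dst, src) memcpy((dst), (src), sizeof(*(dst)))
  #define xxh64_copy_state(dst, src) memcpy((dst), (src), sizeof(*(dst)))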
Change-Id: I32df8e35dd99611ab0cbd472146b0ef3ecb847d3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Patch series "Currently used jhash are slow enough and replace it allow as
to make KSM", v8.
Speed (in kernel):
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s
Sioh Lee's testing (copied from another mail):
Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: xeon E5-2620 v3, memory 64gb)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200
Experiment process:
Firstly, we turn off KSM and launch 4 VMs. Then we turn on the KSM and
measure the checksum computation time until full_scans become two.
The experimental results (each value is the average of the measured values):
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns
In summary, the results show that crc32c_intel has advantages over all of
the hash functions used in the experiment (a decrease of 84.54% compared
to crc32c, 78.86% compared to jhash2, 51.33% compared to xxhash32, and
23.28% compared to xxhash64); the results are similar to those of Timofey.
But use only xxhash for now, because using crc32c requires the crypto API
to be initialized first - that requires some tricky solution to work well
in all situations.
So:
- The first patch implements compile-time selection of the fastest
  xxhash implementation for the target platform.
- The second patch replaces jhash2 with xxhash.
This patch (of 2):
xxh32() - fast on both 32/64-bit platforms
xxh64() - fast only on 64-bit platform
Create xxhash() which will pick up the fastest version at compile time.
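The compile-time dispatch looks roughly like this (as added to
include/linux/xxhash.h upstream; shown here as a sketch):

  static inline unsigned long xxhash(const void *input, size_t length,
                                     uint64_t seed)
  {
  #if BITS_PER_LONG == 64
      return xxh64(input, length, seed);
  #else
      return xxh32(input, length, seed);
  #endif
  }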
Link: http://lkml.kernel.org/r/20181023182554.23464-2-nefelim4ag@gmail.com
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: leesioh <solee@os.korea.ac.kr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I70ea705120672baf63ccd01965480c528529b521
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iad838615d82eebd050c9a28b167f4bf3163ec0d2
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Doing a tight loop on every packet addition is wasteful. When the hotspot
is turned on, the htt_htc_misc_pkt_list_trim() function consumes at least
5% of CPU time. Cache the head of the pkt queue and free multiple pkts at
once to reduce CPU consumption.
Change-Id: I0d4b9a266b8def08a85fb41805f9368dd49649eb
Signed-off-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Alex Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
napi_disable() is subject to a hangup when the threaded
mode is enabled and the napi is under heavy traffic.
If the relevant napi has been scheduled and the napi_disable()
kicks in before the next napi_threaded_wait() completes - so
that the latter quits due to the napi_disable_pending() condition,
the existing code leaves the NAPI_STATE_SCHED bit set and the
napi_disable() loop waiting for such bit will hang.
This patch addresses the issue by dropping the NAPI_STATE_DISABLE
bit test in napi_thread_wait(). The later napi_threaded_poll()
iteration will take care of clearing the NAPI_STATE_SCHED.
This also addresses a related problem reported by Jakub:
before this patch a napi_disable()/napi_enable() pair killed
the napi thread, effectively disabling the threaded mode.
On the patched kernel napi_disable() simply stops scheduling
the relevant thread.
v1 -> v2:
- let the main napi_thread_poll() loop clear the SCHED bit
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/883923fa22745a9589e8610962b7dc59df09fb1f.1617981844.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 27f0ad71699de41bae013c367b95a6b319cc46a9)
Change-Id: Ib586ca1f170c5321a37091c97d8ca710d8b21aad
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.
Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb038357937ee4f589aab2469ec3896dce90f317)
Change-Id: Idd1d67f6f4620dc332fa61918616dcf29137a44f
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
This patch adds a new sysfs attribute to the network device class.
Said attribute provides a per-device control to enable/disable the
threaded mode for all the napi instances of the given network device,
without the need for a device up/down.
User sets it to 1 or 0 to enable or disable threaded mode.
Note: when switching between threaded and the current softirq based mode
for a napi instance, it will not immediately take effect if the napi is
currently being polled. The mode switch will happen for the next time
napi_schedule() is called.
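A minimal userspace sketch of flipping the new knob (the interface name
"eth0" and the error handling are illustrative):

  #include <fcntl.h>
  #include <unistd.h>

  /* Write "1" to /sys/class/net/<dev>/threaded to enable threaded NAPI,
   * "0" to fall back to softirq mode. */
  static int set_threaded_napi(const char *attr_path, int on)
  {
      int fd = open(attr_path, O_WRONLY);
      ssize_t ret;

      if (fd < 0)
          return -1;
      ret = write(fd, on ? "1" : "0", 1);
      close(fd);
      return ret == 1 ? 0 : -1;
  }

  /* Usage: set_threaded_napi("/sys/class/net/eth0/threaded", 1); */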
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5fdd2f0e5c64846bf3066689b73fc3b8dddd1c74)
Change-Id: I0a1616d1cc8a89ba9aa6500c1b7daa171c793632
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
This patch allows running each napi poll loop inside its own
kernel thread.
The kthread is created during netif_napi_add() if dev->threaded
is set. And threaded mode is enabled in napi_enable(). We will
provide a way to set dev->threaded and enable threaded mode
without a device up/down in the following patch.
Once threaded mode is enabled and the kthread is
started, napi_schedule() will wake up that thread instead
of scheduling the softirq.
The threaded poll loop behaves quite like net_rx_action,
but it does not have to manipulate local irqs and uses
an explicit scheduling point based on netdev_budget.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 29863d41bb6e1d969c62fdb15b0961806942960e)
Change-Id: Ifa5817efd5b1a999ae57e8c79accd8b390682e78
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Dark-Matter7232 <me@const.eu.org>