msm-4.14

mirror of https://github.com/rd-stuffs/msm-4.14.git synced 2025-02-20 11:45:48 +08:00

Author	SHA1	Message	Date
Waiman Long	dc4d0337e6	locking/rwsem: Remove arch specific rwsem files As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will not be that much faster than the generic code as long as the atomic functions are properly implemented. So we can remove those arch specific rwsem.h and stop building asm/rwsem.h to reduce maintenance effort. Currently, only x86, alpha and ia64 have implemented architecture specific fast paths. I don't have access to alpha and ia64 systems for testing, but they are legacy systems that are not likely to be updated to the latest kernel anyway. By using a rwsem microbenchmark, the total locking rates on a 4-socket 56-core 112-thread x86-64 system before and after the patch were as follows (mixed means equal # of read and write locks): Before Patch After Patch # of Threads wlock rlock mixed wlock rlock mixed ------------ ----- ----- ----- ----- ----- ----- 1 29,201 30,143 29,458 28,615 30,172 29,201 2 6,807 13,299 1,171 7,725 15,025 1,804 4 6,504 12,755 1,520 7,127 14,286 1,345 8 6,762 13,412 764 6,826 13,652 726 16 6,693 15,408 662 6,599 15,938 626 32 6,145 15,286 496 5,549 15,487 511 64 5,812 15,495 60 5,858 15,572 60 There were some run-to-run variations for the multi-thread tests. For x86-64, using the generic C code fast path seems to be a little bit faster than the assembly version with low lock contention. Looking at the assembly version of the fast paths, there are assembly to/from C code wrappers that save and restore all the callee-clobbered registers (7 registers on x86-64). The assembly generated from the generic C code doesn't need to do that. That may explain the slight performance gain here. The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h with no code change as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-riscv@lists.infradead.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linuxppc-dev@lists.ozlabs.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: uclinux-h8-devel@lists.sourceforge.jp Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [kdrag0n: Dropped conflicting architecture changes] Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	1edf200866	locking/rwsem: Make owner store task pointer of last owning reader Currently, when a reader acquires a lock, it only sets the RWSEM_READER_OWNED bit in the owner field. The other bits are simply not used. When debugging hanging cases involving rwsems and readers, the owner value does not provide much useful information at all. This patch modifies the current behavior to always store the task_struct pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This may be useful in debugging rwsem hanging cases especially if only one reader is involved. However, the task in the owner field may not the real owner or one of the real owners at all when the owner value is examined, for example, in a crash dump. So it is just an additional hint about the past history. If CONFIG_DEBUG_RWSEMS=y is enabled, the owner field will be checked at unlock time too to make sure the task pointer value is valid. That does have a slight performance cost and so is only enabled as part of that debug option. From the performance point of view, it is expected that the changes shouldn't have any noticeable performance impact. A rwsem microbenchmark (with 48 worker threads and 1:1 reader/writer ratio) was ran on a 2-socket 24-core 48-thread Haswell system. The locking rates on a 4.19-rc1 based kernel were as follows: 1) Unpatched kernel: 543.3 kops/s 2) Patched kernel: 549.2 kops/s 3) Patched kernel (CONFIG_DEBUG_RWSEMS on): 546.6 kops/s There was actually a slight increase in performance (1.1%) in this particular case. Maybe it was caused by the elimination of a branch or just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also had less than the expected impact on performance. The least significant 2 bits of the owner value are now used to designate the rwsem is readers owned and the owners are anonymous. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1536265114-10842-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	5789523bb6	locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS It was found that the use of up_read_non_owner() in NFS was causing the following warning when DEBUG_RWSEMS was configured. DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0))) Looking into the rwsem.c file, it was discovered that the corresponding down_read_non_owner() function was not setting the owner field properly. This is fixed now, and the warning should be gone. Fixes: 5149cbac4235 ("locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Gavin Schenk <g.schenk@eckelmann.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-nfs@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1527168398-4291-1-git-send-email-longman@redhat.com Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	9fc9683d14	locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches For a rwsem, locking can either be exclusive or shared. The corresponding exclusive or shared unlock must be used. Otherwise, the protected data structures may get corrupted or the lock may be in an inconsistent state. In order to detect such anomaly, a new configuration option DEBUG_RWSEMS is added which can be enabled to look for such mismatches and print warnings that that happens. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1522445280-7767-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Kirill Tkhai	4083654f4f	locking/rwsem: Add down_read_killable() Similar to down_read() and down_write_killable(), add killable version of down_read(), based on __down_read_killable() function, added in previous patches. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: avagin@virtuozzo.com Cc: davem@davemloft.net Cc: fenghua.yu@intel.com Cc: gorcunov@virtuozzo.com Cc: heiko.carstens@de.ibm.com Cc: hpa@zytor.com Cc: ink@jurassic.park.msu.ru Cc: mattst88@gmail.com Cc: rientjes@google.com Cc: rth@twiddle.net Cc: schwidefsky@de.ibm.com Cc: tony.luck@intel.com Cc: viro@zeniv.linux.org.uk Link: http://lkml.kernel.org/r/150670119884.23930.2585570605960763239.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	0904ccaaa1	locking/qspinlock: Remove unnecessary BUG_ON() call With the > 4 nesting levels case handled by the commit: d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels") the BUG_ON() call in encode_tail() will never actually be triggered. Remove it. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1551057253-3231-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	9a0758fda3	locking/qspinlock_stat: Track the no MCS node available case Track the number of slowpath locking operations that are being done without any MCS node available as well renaming lock_index[123] to make them more descriptive. Using these stat counters is one way to find out if a code path is being exercised. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Morse <james.morse@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SRINIVAS <srinivas.eeda@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/1548798828-16156-3-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	1079a7a0d6	locking/qspinlock: Handle > 4 slowpath nesting levels Four queue nodes per CPU are allocated to enable up to 4 nesting levels using the per-CPU nodes. Nested NMIs are possible in some architectures. Still it is very unlikely that we will ever hit more than 4 nested levels with contention in the slowpath. When that rare condition happens, however, it is likely that the system will hang or crash shortly after that. It is not good and we need to handle this exception case. This is done by spinning directly on the lock using repeated trylock. This alternative code path should only be used when there is nested NMIs. Assuming that the locks used by those NMI handlers will not be heavily contended, a simple TAS locking should work out. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Morse <james.morse@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: SRINIVAS <srinivas.eeda@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/1548798828-16156-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:06 +05:30
Waiman Long	3e70514901	locking/pvqspinlock: Extend node size when pvqspinlock is configured The qspinlock code supports up to 4 levels of slowpath nesting using four per-CPU mcs_spinlock structures. For 64-bit architectures, they fit nicely in one 64-byte cacheline. For para-virtualized (PV) qspinlocks it needs to store more information in the per-CPU node structure than there is space for. It uses a trick to use a second cacheline to hold the extra information that it needs. So PV qspinlock needs to access two extra cachelines for its information whereas the native qspinlock code only needs one extra cacheline. Freshly added counter profiling of the qspinlock code, however, revealed that it was very rare to use more than two levels of slowpath nesting. So it doesn't make sense to penalize PV qspinlock code in order to have four mcs_spinlock structures in the same cacheline to optimize for a case in the native qspinlock code that rarely happens. Extend the per-CPU node structure to have two more long words when PV qspinlock locks are configured to hold the extra data that it needs. As a result, the PV qspinlock code will enjoy the same benefit of using just one extra cacheline like the native counterpart, for most cases. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Matthew Wilcox	5d2164461e	locking/spinlocks: Remove an instruction from spin and write locks Both spin locks and write locks currently do: f0 0f b1 17 lock cmpxchg %edx,(%rdi) 85 c0 test %eax,%eax 75 05 jne [slowpath] This 'test' insn is superfluous; the cmpxchg insn sets the Z flag appropriately. Peter pointed out that using atomic_try_cmpxchg_acquire() will let the compiler know this is true. Comparing before/after disassemblies show the only effect is to remove this insn. Take this opportunity to make the spin & write lock code resemble each other more closely and have similar likely() hints. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Link: http://lkml.kernel.org/r/20180820162639.GC25153@bombadil.infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Andrea Parri	43e1f71f36	locking/spinlocks: Clean up comment and #ifndef for {,queued_}spin_is_locked() Removes "#ifndef queued_spin_is_locked" from the generic code: this is unused and it's reasonable to conclude that it will continue to be unused. Also removes the comment about spin_is_locked() from mutex_is_locked(): the comment remains valid but not particularly useful. Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: stern@rowland.harvard.edu Link: http://lkml.kernel.org/r/1526338889-7003-3-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Will Deacon	7c9f6bd5c9	locking/qspinlock: Use smp_store_release() in queued_spin_unlock() A qspinlock can be unlocked simply by writing zero to the locked byte. This can be implemented in the generic code, so do that and remove the arch-specific override for x86 in the !PV case. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1524738868-31318-11-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [kdrag0n: Stripped x86 part due to conflicts] Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Waiman Long	6e792005ec	locking/qspinlock_stat: Count instances of nested lock slowpaths Queued spinlock supports up to 4 levels of lock slowpath nesting - user context, soft IRQ, hard IRQ and NMI. However, we are not sure how often the nesting happens. So add 3 more per-CPU stat counters to track the number of instances where nesting index goes to 1, 2 and 3 respectively. On a dual-socket 64-core 128-thread Zen server, the following were the new stat counter values under different circumstances: State slowpath index1 index2 index3 ----- -------- ------ ------ ------- After bootup 1,012,150 82 0 0 After parallel build + perf-top 125,195,009 82 0 0 So the chance of having more than 2 levels of nesting is extremely low. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Peter Zijlstra	762e647595	locking/qspinlock, x86: Provide liveness guarantee commit 7aa54be2976550f17c11a1c3e3630002dea39303 upstream. On x86 we cannot do fetch_or() with a single instruction and thus end up using a cmpxchg loop, this reduces determinism. Replace the fetch_or() with a composite operation: tas-pending + load. Using two instructions of course opens a window we previously did not have. Consider the scenario: CPU0 CPU1 CPU2 1) lock trylock -> (0,0,1) 2) lock trylock /* fail */ 3) unlock -> (0,0,0) 4) lock trylock -> (0,0,1) 5) tas-pending -> (0,1,1) load-val <- (0,1,0) from 3 6) clear-pending-set-locked -> (0,0,1) FAIL: _2_ owners where 5) is our new composite operation. When we consider each part of the qspinlock state as a separate variable (as we can when _Q_PENDING_BITS == 8) then the above is entirely possible, because tas-pending will only RmW the pending byte, so the later load is able to observe prior tail and lock state (but not earlier than its own trylock, which operates on the whole word, due to coherence). To avoid this we need 2 things: - the load must come after the tas-pending (obviously, otherwise it can trivially observe prior state). - the tas-pending must be a full word RmW instruction, it cannot be an XCHGB for example, such that we cannot observe other state prior to setting pending. On x86 we can realize this by using "LOCK BTS m32, r32" for tas-pending followed by a regular load. Note that observing later state is not a problem: - if we fail to observe a later unlock, we'll simply spin-wait for that store to become visible. - if we observe a later xchg_tail(), there is no difference from that xchg_tail() having taken place before the tas-pending. Suggested-by: Will Deacon <will.deacon@arm.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Fixes: 59fb586b4a07 ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath") Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bigeasy: GEN_BINARY_RMWcc macro redo] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Peter Zijlstra	0f27e821bc	locking/qspinlock: Rework some comments While working my way through the code again; I felt the comments could use help. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Peter Zijlstra	d2bb67459c	locking/qspinlock: Re-order code Flip the branch condition after atomic_fetch_or_acquire(_Q_PENDING_VAL) such that we loose the indent. This also result in a more natural code flow IMO. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Waiman Long	e4145982d3	locking/qspinlock: Add stat tracking for pending vs. slowpath Currently, the qspinlock_stat code tracks only statistical counts in the PV qspinlock code. However, it may also be useful to track the number of locking operations done via the pending code vs. the MCS lock queue slowpath for the non-PV case. The qspinlock stat code is modified to do that. The stat counter pv_lock_slowpath is renamed to lock_slowpath so that it can be used by both the PV and non-PV cases. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1524738868-31318-14-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Will Deacon	8d54fbfa5f	locking/qspinlock: Use try_cmpxchg() instead of cmpxchg() when locking When reaching the head of an uncontended queue on the qspinlock slow-path, using a try_cmpxchg() instead of a cmpxchg() operation to transition the lock work to _Q_LOCKED_VAL generates slightly better code for x86 and pretty much identical code for arm64. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1524738868-31318-13-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Will Deacon	94f9902b78	locking/qspinlock: Elide back-to-back RELEASE operations with smp_wmb() The qspinlock slowpath must ensure that the MCS node is fully initialised before it can be reached by another other CPU. This is currently enforced by using a RELEASE operation when updating the tail and also when linking the node into the waitqueue, since the control dependency off xchg_tail is insufficient to enforce sufficient ordering, see: 95bcade33a8a ("locking/qspinlock: Ensure node is initialised before updating prev->next") Back-to-back RELEASE operations may be expensive on some architectures, particularly those that implement them using fences under the hood. We can replace the two RELEASE operations with a single smp_wmb() fence and use RELAXED operations for the subsequent publishing of the node. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1524738868-31318-12-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Will Deacon	bbc29076d2	locking/qspinlock: Use smp_cond_load_relaxed() to wait for next node When a locker reaches the head of the queue and takes the lock, a concurrent locker may enqueue and force the lock holder to spin whilst its node->next field is initialised. Rather than open-code a READ_ONCE/cpu_relax() loop, this can be implemented using smp_cond_load_relaxed() instead. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1524738868-31318-10-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Will Deacon	01d82ddeee	locking/qspinlock: Use atomic_cond_read_acquire() Rather than dig into the counter field of the atomic_t inside the qspinlock structure so that we can call smp_cond_load_acquire(), use atomic_cond_read_acquire() instead, which operates on the atomic_t directly. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1524738868-31318-8-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:05 +05:30
Danny Lin	74edecadf6	Revert "locking/qspinlock: Re-order code" This reverts commit 49849a651b8456263391ae767be6f7d02af8ef26. Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Danny Lin	4ba2989ee6	Revert "locking/qspinlock, x86: Provide liveness guarantee" This reverts commit 5d01e063296a80e93abe93803602e5cf37309865. Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Will Deacon	b6d8a53666	BACKPORT: arm64: locking: Replace ticket lock implementation with qspinlock It's fair to say that our ticket lock has served us well over time, but it's time to bite the bullet and start using the generic qspinlock code so we can make use of explicit MCS queuing and potentially better PV performance in future. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Will Deacon	5f0386c971	arm64: barrier: Implement smp_cond_load_relaxed We can provide an implementation of smp_cond_load_relaxed using READ_ONCE and __cmpwait_relaxed. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: kdrag0n <dragon@khronodragon.com> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Andrea Parri	f1a4cb9827	locking/spinlocks/arm64: Remove smp_mb() from arch_spin_is_locked() The following commit: 38b850a73034f ("arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks") ... added an smp_mb() to arch_spin_is_locked(), in order "to ensure that the lock value is always loaded after any other locks have been taken by the current CPU", and reported one example (the "insane case" in ipc/sem.c) relying on such guarantee. It is however understood that spin_is_locked() is not required to provide such an ordering guarantee (a guarantee that is currently not provided by all the implementations/archs), and that callers relying on such ordering should instead insert suitable memory barriers before acting on the result of spin_is_locked(). Following a recent auditing [1] of the callers of {,raw_}spin_is_locked(), revealing that none of them are relying on the ordering guarantee anymore, this commit removes the leading smp_mb() from the primitive thus reverting 38b850a73034f. [1] https://marc.info/?l=linux-kernel&m=151981440005264&w=2 https://marc.info/?l=linux-kernel&m=152042843808540&w=2 https://marc.info/?l=linux-kernel&m=152043346110262&w=2 Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: stern@rowland.harvard.edu Link: http://lkml.kernel.org/r/1526338889-7003-2-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: kdrag0n <dragon@khronodragon.com> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Will Deacon	0821289d2c	locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations The arch_{read,spin,write}_lock_flags() macros are simply mapped to the non-flags versions by the majority of architectures, so do this in core code and remove the dummy implementations. Also remove the implementation in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call local_irq_save(flags) anyway. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Will Deacon	fa482cfdeb	locking/arch: Remove dummy arch_{read,spin,write}_relax() implementations arch_{read,spin,write}_relax() are defined as cpu_relax() by the core code, so architectures that can't do better (i.e. most of them) don't need to bother with the dummy definitions. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1507055129-12300-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [kdrag0n: dropped changes to arch/sparc to due to conflicts] Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-13 21:03:04 +05:30
Jebaitedneko	10a274c7f6	pinctrl-msm: Fix build with SHOW_RESUME_IRQ disabled Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2021-06-13 21:02:55 +05:30
Jebaitedneko	9ef1185597	irqchip: Don't select QCOM_SHOW_RESUME_IRQ for ARM_GIC_V3	2021-06-13 18:49:46 +05:30
lybxlpsv	54c758c1f9	drm: edid: Make drm_match_cea_mode always return 0 Removed due to around 0.5-1% in perf top while 120fps gaming. Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2021-06-13 18:47:41 +05:30
Sultan Alsawaf	52116a451f	irqchip: qcom: pdc: Don't overwrite IRQ domain name on init The IRQ subsystem allocates a domain name dynamically on its own. By overwriting it, there is not only a memory leak created, but also a kfree() on a non-allocated pointer when the IRQ is freed. We can remedy all of this by simply not overwriting the IRQ domain name. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>	2021-06-13 18:44:52 +05:30
atndko	a4524b1512	icnss: Fix suspend errors caused by wrong paired PM operation Becauase of using wrong pair of PM operation device was not resuming if suspend fails. Also device gives errors in dmesg like below: [ 2420.308491] dpm_run_callback(): icnss_pm_suspend_noirq+0x0/0xc0 returns -11 [ 2420.308510] PM: Device 18800000.qcom,icnss failed to suspend async: error -11 [ 2420.312069] PM: noirq suspend of devices failed [ 2423.384002] dpm_run_callback(): icnss_pm_suspend_noirq+0x0/0xc0 returns -11 [ 2423.384020] PM: Device 18800000.qcom,icnss failed to suspend async: error -11 [ 2423.317523] PM: noirq suspend of devices failed [ 2426.444164] dpm_run_callback(): icnss_pm_suspend_noirq+0x0/0xc0 returns -11 [ 2426.444181] PM: Device 18800000.qcom,icnss failed to suspend async: error -11 [ 2426.447813] PM: noirq suspend of devices failed [ 2428.915643] dpm_run_callback(): icnss_pm_suspend_noirq+0x0/0xc0 returns -11 [ 2428.915659] PM: Device 18800000.qcom,icnss failed to suspend async: error -11 [ 2428.919208] PM: noirq suspend of devices failed [ 2429.529067] dpm_run_callback(): icnss_pm_suspend_noirq+0x0/0xc0 returns -11 [ 2429.529086] PM: Device 18800000.qcom,icnss failed to suspend async: error -11 [ 2423.532786] PM: noirq suspend of devices failed Adding changes to use correct set of PM operations and fix log spam. Signed-off-by: atndko <z1281552865@gmail.com>	2021-06-13 18:43:49 +05:30
Panchajanya1999	2f017429b1	include: blkdev: Enable Write Back Caching Write back caching is a popular technique which copies data to higher level caches, backing store or memory. As per an article in Techopedia [1], enabling Write-Back Caching improves or reduces write operations to and from a cache and it's data source. [1] - https://www.techopedia.com/definition/1138/write-back-cache Change-Id: I0140736d3f921d5e0275f6154219009f84b6cb8b Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2021-06-13 18:41:50 +05:30
Yaroslav Furman	fe557d2e3b	[SQUASH] power: supply: Silence massive debug logspam power/supply: qcom: Silence charging drivers Signed-off-by: Yaroslav Furman <yaro330@gmail.com> power: supply: ti: silence charging spam Signed-off-by: Yaroslav Furman <yaro330@gmail.com> qpnp-qg: silence qg_get_prop_soc_decimal spam Signed-off-by: Yaroslav Furman <yaro330@gmail.com> qpnp-smb5: silence another annoying logger Signed-off-by: Yaroslav Furman <yaro330@gmail.com> qcom: smb5-lib: stfu Signed-off-by: Yaroslav Furman <yaro330@gmail.com> smb5-lib: silence annoying loggers Signed-off-by: Yaroslav Furman <yaro330@gmail.com> power: maxim: silence drivers Leave actual errors enabled. Signed-off-by: Yaroslav Furman <yaro330@gmail.com> power_supply_sysfs: silence 'failed to report 'flash_trigger' spam Signed-off-by: Yaroslav Furman <yaro330@gmail.com> qpnp-smb5: silence some kernel spam Signed-off-by: Yaroslav Furman <yaro330@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2021-06-11 23:33:48 +05:30
Adam W. Willis	c0d3ec9799	msm: gsi: Don't warn on client callback in polling mode Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> Signed-off-by: Yaroslav Furman <yaro330@gmail.com>	2021-06-11 23:30:16 +05:30
Li RongQing	375bd3e62f	net: sched: Do not emit messages while holding spinlock move messages emitting out of sch_tree_lock to avoid holding this lock too long. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yaroslav Furman <yaro330@gmail.com>	2021-06-11 23:29:45 +05:30
Yaroslav Furman	4774e2b2fa	msm: vidc: Silence some debugfs spam Signed-off-by: Yaroslav Furman <yaro330@gmail.com>	2021-06-11 23:29:29 +05:30
Yaroslav Furman	cc224d6daa	cpufreq: schedutil: Remove useless tracing Signed-off-by: Yaroslav Furman <yaro330@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2021-06-11 23:28:40 +05:30
Sultan Alsawaf	54caffe3bd	treewide: Suppress overly verbose log spam This tames quite a bit of the log spam and makes dmesg readable. Uses work from Danny Lin <danny@kdrag0n.dev>. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>	2021-06-11 23:27:44 +05:30
Adithya R	80a6f938fc	ARM64/configs: surya: Enable BBR tcp congestion control * also disable useless tcpcongs and build westwood inline	2021-06-11 23:25:43 +05:30
Neal Cardwell	fdfe643a49	BACKPORT: FROMGIT: net-tcp_bbr: v2: shrink delivered_mstamp, first_tx_mstamp to u32 to free up 8 bytes Free up some space for tracking inflight and losses for each bw sample, in upcoming commits. These timestamps are in microseconds, and are now stored in 32 bits. So they can only hold time intervals up to roughly 2^12 = 4096 seconds. But Linux TCP RTT and RTO tracking has the same 32-bit microsecond implementation approach and resulting deployment limitations. So this is not introducing a new limit. And these should not be a limitation for the foreseeable future. Effort: net-tcp_bbr Origin-9xx-SHA1: 238a7e6b5d51625fef1ce7769826a7b21b02ae55 Change-Id: I3b779603797263b52a61ad57c565eb91fe42680c (cherry-picked from `6cabc5bdb5`) Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:28 +05:30
Yuchung Cheng	22ad80ebfe	BACKPORT: FROMGIT: net-tcp_rate: consolidate inflight tracking approaches in TCP In order to track CE marks per rate sample (one round trip), we'll need to snap the starting tcp delivered_ce acount in the packet meta header (tcp_skb_cb). But there's not enough space. Good news is that the "last_in_flight" in the header, used by NV congestion control, is almost equivalent as "delivered". In fact "delivered" is better by accounting out-of-order packets additionally. Therefore we can remove it to make room for the CE tracking. This would make delayed ACK detection slightly less accurate but the impact is negligible since it's not used for any critical control. Effort: net-tcp_rate Origin-9xx-SHA1: ddcd46ec85d5f1c4454258af0c54b3254c0d64a7 Change-Id: I1a184aad6d101c981ac7f2f275aa9417ff856910 (cherry-picked from `3de61ba5b8`) Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:28 +05:30
Neal Cardwell	ece09f3d68	FROMGIT: net-tcp_bbr: broaden app-limited rate sample detection This commit is a bug fix for the Linux TCP app-limited logic used for collecting rate (bandwidth) samples. Previously the app-limited logic only looked for "bubbles" of silence in between application writes, by checking at the start of each sendmsg. But "bubbles" of silence can also happen before retransmits: e.g. bubbles can happen between an application write and a retransmit, or between two retransmits. Retransmits are triggered by ACKs or timers. So this commit checks for bubbles of app-limited silence upon ACKs or timers. Why does this commit check for app-limited state at the start of ACKs and timer handling? Because at that point we know whether inflight was fully using the cwnd. During processing the ACK or timer event we often change the cwnd; after changing the cwnd we can't know whether inflight was fully using the old cwnd. Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc Change-Id: I37221506f5166877c2b110753d39bb0757985e68 (cherry-picked from `80d039a0a0`) Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:28 +05:30
Priyaranjan Jha	7e31b98e1c	tcp_bbr: adapt cwnd based on ack aggregation estimation Aggregation effects are extremely common with wifi, cellular, and cable modem link technologies, ACK decimation in middleboxes, and LRO and GRO in receiving hosts. The aggregation can happen in either direction, data or ACKs, but in either case the aggregation effect is visible to the sender in the ACK stream. Previously BBR's sending was often limited by cwnd under severe ACK aggregation/decimation because BBR sized the cwnd at 2BDP. If packets were acked in bursts after long delays (e.g. one ACK acking 5BDP after 5RTT), BBR's sending was halted after sending 2BDP over 2RTT, leaving the bottleneck idle for potentially long periods. Note that loss-based congestion control does not have this issue because when facing aggregation it continues increasing cwnd after bursts of ACKs, growing cwnd until the buffer is full. To achieve good throughput in the presence of aggregation effects, this algorithm allows the BBR sender to put extra data in flight to keep the bottleneck utilized during silences in the ACK stream that it has evidence to suggest were caused by aggregation. A summary of the algorithm: when a burst of packets are acked by a stretched ACK or a burst of ACKs or both, BBR first estimates the expected amount of data that should have been acked, based on its estimated bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max filter to estimate the recent level of observed ACK aggregation. Then cwnd is increased by the ACK aggregation estimate. The larger cwnd avoids BBR being cwnd-limited in the face of ACK silences that recent history suggests were caused by aggregation. As a sanity check, the ACK aggregation degree is upper-bounded by the cwnd (at the time of measurement) and a global max of BW 100ms. The algorithm is further described by the following presentation: https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00 In our internal testing, we observed a significant increase in BBR throughput (measured using netperf), in a basic wifi setup. - Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi) - 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps - 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps Also, this code is running globally on YouTube TCP connections and produced significant bandwidth increases for YouTube traffic. This is based on Ian Swett's max_ack_height_ algorithm from the QUIC BBR implementation. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:28 +05:30
Priyaranjan Jha	a14a3821ed	tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning Because bbr_target_cwnd() is really a general-purpose BBR helper for computing some volume of inflight data as a function of the estimated BDP, refactor it into following helper functions: - bbr_bdp() - bbr_quantization_budget() - bbr_inflight() Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:28 +05:30
Neal Cardwell	f1d4380a0e	tcp_bbr: centralize code to set gains Centralize the code that sets gains used for computing cwnd and pacing rate. This simplifies the code and makes it easier to change the state machine or (in the future) dynamically change the gain values and ensure that the correct gain values are always used. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:27 +05:30
Kevin Yang	3f1a3b04e0	tcp_bbr: apply PROBE_RTT cwnd cap even if acked==0 This commit fixes a corner case where TCP BBR would enter PROBE_RTT mode but not reduce its cwnd. If a TCP receiver ACKed less than one full segment, the number of delivered/acked packets was 0, so that bbr_set_cwnd() would short-circuit and exit early, without cutting cwnd to the value we want for PROBE_RTT. The fix is to instead make sure that even when 0 full packets are ACKed, we do apply all the appropriate caps, including the cap that applies in PROBE_RTT mode. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:27 +05:30
Kevin Yang	c71b4d1e16	tcp_bbr: in restart from idle, see if we should exit PROBE_RTT This patch fix the case where BBR does not exit PROBE_RTT mode when it restarts from idle. When BBR restarts from idle and if BBR is in PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its full value. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:27 +05:30
Kevin Yang	b04c00492e	tcp_bbr: add bbr_check_probe_rtt_done() helper This patch add a helper function bbr_check_probe_rtt_done() to 1. check the condition to see if bbr should exit probe_rtt mode; 2. process the logic of exiting probe_rtt mode. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Danny Lin <danny@kdrag0n.dev>	2021-06-11 23:23:27 +05:30

1 2 3 4 5 ...

793614 Commits