[ Upstream commit f558120cd709682b739207b48cf7479fd9568431 ]
iucv_sever_path() is called from process context and from bh context.
iucv->path is used as indicator whether somebody else is taking care of
severing the path (or it is already removed / never existed).
This needs to be done with atomic compare and swap, otherwise there is a
small window where iucv_sock_close() will try to work with a path that has
already been severed and freed by iucv_callback_connrej() called by
iucv_tasklet_fn().
Example:
[452744.123844] Call Trace:
[452744.123845] ([<0000001e87f03880>] 0x1e87f03880)
[452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138
[452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv]
[452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv]
[452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv]
[452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8
[452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48
[452744.124820] [<00000000d5421642>] __fput+0xba/0x268
[452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0
[452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90
[452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8
[452744.125319] Last Breaking-Event-Address:
[452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138
[452744.125324]
[452744.125325] Kernel panic - not syncing: Fatal exception in interrupt
Note that bh_lock_sock() is not serializing the tasklet context against
process context, because the check for sock_owned_by_user() and
corresponding handling is missing.
Ideas for a future clean-up patch:
A) Correct usage of bh_lock_sock() in tasklet context, as described in
Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/
Re-enqueue, if needed. This may require adding return values to the
tasklet functions and thus changes to all users of iucv.
B) Change iucv tasklet into worker and use only lock_sock() in af_iucv.
Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely")
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 84f40b46787ecb67c7ad08a5bb1376141fa10c01)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit fa96c6baef1b5385e2f0c0677b32b3839e716076 ]
tipc_udp_addr2str() should return non-zero value if the UDP media
address is invalid. Otherwise, a buffer overflow access can occur in
tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP
media address.
Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7ec3335dd89c8d169e9650e4bac64fde71fdf15b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit cc73bbab4b1fb8a4f53a24645871dafa5f81266a ]
The Record Route IP option records the addresses of the routers that
routed the packet. In the case of forwarded packets, the kernel performs
a route lookup via fib_lookup() and fills in the preferred source
address of the matched route.
The lookup is performed with the DS field of the forwarded packet, but
using the RT_TOS() macro which only masks one of the two ECN bits. If
the packet is ECT(0) or CE, the matched route might be different than
the route via which the packet was forwarded as the input path masks
both of the ECN bits, resulting in the wrong address being filled in the
Record Route option.
Fix by masking both of the ECN bits.
Fixes: 8e36360ae876 ("ipv4: Remove route key identity dependencies in ip_rt_get_source().")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240718123407.434778-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5c65e55e41e1300c4ebf4dda22a704b2beed2423)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit abb9a68d2c64dd9b128ae1f2e635e4d805e7ce64 upstream.
When the source address is selected, the scope must be checked. For
example, if a loopback address is assigned to the vrf device, it must not
be chosen for packets sent outside.
CC: stable@vger.kernel.org
Fixes: afbac6010aec ("net: ipv6: Address selection needs to consider L3 domains")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b4f67f09287392e0a2f7422199a193e37f2737af)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 79eecf631c14e7f4057186570ac20e2cfac3802e upstream.
The issue initially stems from libpcap. The ethertype will be overwritten
as the VLAN TPID if the network interface lacks hardware VLAN offloading.
In the outbound packet path, if hardware VLAN offloading is unavailable,
the VLAN tag is inserted into the payload but then cleared from the sk_buff
struct. Consequently, this can lead to a false negative when checking for
the presence of a VLAN tag, causing the packet sniffing outcome to lack
VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing
tool may be unable to parse packets as expected.
The TCI-TPID is missing because the prb_fill_vlan_info() function does not
modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the
payload and not in the sk_buff struct. The skb_vlan_tag_present() function
only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header
is stripped, preventing the packet capturing tool from determining the
correct TCI-TPID value. Additionally, the protocol in SLL is incorrect,
which means the packet capturing tool cannot parse the L3 header correctly.
Link: https://github.com/the-tcpdump-group/libpcap/issues/1105
Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u
Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240713114735.62360-1-chengen.du@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3dfd84aa72fa7329ed4a257c8f40e0c9aff4dc8f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 782161895eb4ac45cf7cfa8db375bd4766cb8299 ]
Delete expectation path is missing a call to the nf_expect_get_id()
helper function to calculate the expectation ID, otherwise LSB of the
expectation object address is leaked to userspace.
Fixes: 3c79107631db ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Reported-by: zdi-disclosures@trendmicro.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 66e7650dbbb8e236e781c670b167edc81e771450)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 53796b03295cf7ab1fc8600016fa6dfbf4a494a0 ]
In the context of the SCTP SNAT/DNAT handler, these calls can only
return true.
Fixes: e10d3ba4d434 ("ipvs: Fix checksumming on GSO of SCTP packets")
Signed-off-by: Ismael Luceno <iluceno@suse.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9340804ea465de0509a9afaeaaccf3fb74b14f9b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
As we don't have to care about any other security mechanism than SELinux,
inline its struct sk_security_struct into struct sock.
This way, the kernel no longer has to allocate and de-allocate
struct sk_security_struct, which happens extremely frequently under Android.
Change-Id: Ic1e584a7d31bbb225cbd3079a5002542a7683818
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmbMN2wZHHZlZ2FyZC5u
b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WI/OD/0df1vNkreJiEgYmrzoT5rG
FekNwYxJLpnN2cIk5qZv5SUSrZDfqPEMzhz4tuv0kVMv8gb+k8tXpBGr6rfZlYEw
UoDAZzAN0Ns0JwxtPoIIJrjfD1a0xq8teSGhbphUDLA/fH/M66pTPtJrDjia/99E
0Ymhc+8pv77UljmaxjEoQ8a0hXoe/TxoxwJpUdklwhcXZ+u4+FwJfxxHd4webnGs
XiC1rrq4fYOX5MaomrGhKQBFv/iQAM+U9VHn4m1yLqDXYXn3j7stOa8dPY5g6EJw
kzPsure193EXj9Tp5Y+sC+aExLGfaO7MB+hlXR0El/cHboTFJshafVuIhHp93zlI
4NVExnDe3Xmrb7U4hCJd2L2z3Vgq8jSH0R1FwZqfZruOo+B1zG9/SIiM96pD5YR1
ugk2P9XAXEpDUoMv4XPCNlpMetT7HBdUWIJCZQDCTl1axiGWnqs1MpGVHfavoTLz
CKuoluR/0vL5lVX7vrc1Pb6+7APaY0U9ehVDlMbumS2PmpkLsoaTbTLlJsnVBrmg
fkN4sjja0mszL4f5zzPCZ/9G8opf63+LLiheUNdnB4yWn7VoK4+kH3v3dW3e2gzJ
t5IYygJ8Ys8pCLAueuiRb3pce59HrEq8mFtoF+MXlLN4fVHFwKbTtJWiEvEl5x3E
Ks8j03Ywad/Zcl5GiwnnEw==
=AZG+
-----END PGP SIGNATURE-----
Merge tag 'v4.14.352-openela' of https://github.com/openela/kernel-lts
This is the 4.14.352 OpenELA-Extended LTS stable release
* tag 'v4.14.352-openela' of https://github.com/openela/kernel-lts: (32 commits)
LTS: Update to 4.14.352
filelock: Fix fcntl/close race recovery compat path
jfs: don't walk off the end of ealist
ocfs2: add bounds checking to ocfs2_check_dir_entry()
net: relax socket state check at accept time.
ACPI: processor_idle: Fix invalid comparison with insertion sort for latency
ARM: 9324/1: fix get_user() broken with veneer
filelock: Remove locks reliably when fcntl/close race is detected
hfsplus: fix uninit-value in copy_name
selftests/vDSO: fix clang build errors and warnings
spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
fs: better handle deep ancestor chains in is_subdir()
Bluetooth: hci_core: cancel all works upon hci_unregister_dev()
net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD()
net: usb: qmi_wwan: add Telit FN912 compositions
ALSA: dmaengine_pcm: terminate dmaengine before synchronize
s390/sclp: Fix sclp_init() cleanup on failure
Input: elantech - fix touchpad state on resume for Lenovo N24
wifi: cfg80211: wext: add extra SIOCSIWSCAN data check
mei: demote client disconnect warning on suspend to debug
...
Change-Id: I4cbdfa0321bf83d62ac62f386eb77d21c5785dec
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e upstream.
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[mheyne: contextual conflict in ip6_negative_advice due to missing
commit c3c14da0288d ("net/ipv6: add rcu locking to ip6_negative_advice") and
commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")]
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
[ Upstream commit 0d151a103775dd9645c78c97f77d6e2a5298d913 ]
syzbot is reporting that calling hci_release_dev() from hci_error_reset()
due to hci_dev_put() from hci_error_reset() can cause deadlock at
destroy_workqueue(), for hci_error_reset() is called from
hdev->req_workqueue which destroy_workqueue() needs to flush.
We need to make sure that hdev->{rx_work,cmd_work,tx_work} which are
queued into hdev->workqueue and hdev->{power_on,error_reset} which are
queued into hdev->req_workqueue are no longer running by the moment
destroy_workqueue(hdev->workqueue);
destroy_workqueue(hdev->req_workqueue);
are called from hci_release_dev().
Call cancel_work_sync() on these work items from hci_unregister_dev()
as soon as hdev->list is removed from hci_dev_list.
Reported-by: syzbot <syzbot+da0a9c9721e36db712e8@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=da0a9c9721e36db712e8
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 48542881997e17b49dc16b93fe910e0cfcf7a9f9)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit cf28ff8e4c02e1ffa850755288ac954b6ff0db8c ]
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
ila_output() is called from lwtunnel_output()
possibly from process context, and under rcu_read_lock().
We might be interrupted by a softirq, re-enter ila_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7435bd2f84a25aba607030237261b3795ba782da)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit 92ecbb3ac6f3fe8ae9edf3226c76aa17b6800699 ]
When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've
noticed the following:
UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1
Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...>
Call Trace:
<TASK>
dump_stack_lvl+0x2d/0x90
__ubsan_handle_out_of_bounds+0xe7/0x140
? timerqueue_add+0x98/0xb0
ieee80211_prep_hw_scan+0x2db/0x480 [mac80211]
? __kmalloc+0xe1/0x470
__ieee80211_start_scan+0x541/0x760 [mac80211]
rdev_scan+0x1f/0xe0 [cfg80211]
nl80211_trigger_scan+0x9b6/0xae0 [cfg80211]
...<the rest is not too useful...>
Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels'
uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't
be checked in 'ieee80211_prep_hw_scan()'. Although an initialization
of 'hw_scan_req->req.n_channels' introduces some confusion around
allocated vs. used VLA members, this shouldn't be a problem since
everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'.
Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the
convenient 'struct_size()' as well.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://msgid.link/20240517153332.18271-2-dmantipov@yandex.ru
[improve (imho) indentation a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit cd3212a9e0209dff7eda30f01ab8590f5e8d92fb)
[Harshit: add #include that was pulled in through some other unknown
header on 4.19.y]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit 6f6291f09a322c1c1578badac8072d049363f4e6 ]
With a ath9k device I can see that:
iw phy phy0 interface add mesh0 type mp
ip link set mesh0 up
iw dev mesh0 scan
Will start a scan with the Power Management bit set in the Frame Control Field.
This is because we set this bit depending on the nonpeer_pm variable of the mesh
iface sdata and when there are no active links on the interface it remains to
NL80211_MESH_POWER_UNKNOWN.
As soon as links starts to be established, it wil switch to
NL80211_MESH_POWER_ACTIVE as it is the value set by befault on the per sta
nonpeer_pm field.
As we want no power save by default, (as expressed with the per sta ini values),
lets init it to the expected default value of NL80211_MESH_POWER_ACTIVE.
Also please note that we cannot change the default value from userspace prior to
establishing a link as using NL80211_CMD_SET_MESH_CONFIG will not work before
NL80211_CMD_JOIN_MESH has been issued. So too late for our initial scan.
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Link: https://msgid.link/20240527141759.299411-1-nico.escande@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 46487275e810d1e7c99f36af9fdfae0909c4e200)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
Due to timer wheel implementation, a timer will usually fire
after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0fe6516462392ffe355a45a1ada8d264a783430f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 69c7b2fe4c9cc1d3b1186d1c5606627ecf0de883 upstream.
The way the delayed work is handled in ceph_monc_stop() is prone to
races with mon_fault() and possibly also finish_hunting(). Both of
these can requeue the delayed work which wouldn't be canceled by any of
the following code in case that happens after cancel_delayed_work_sync()
runs -- __close_session() doesn't mess with the delayed work in order
to avoid interfering with the hunting interval logic. This part was
missed in commit b5d91704f53e ("libceph: behave in mon_fault() if
cur_mon < 0") and use-after-free can still ensue on monc and objects
that hang off of it, with monc->auth and monc->monmap being
particularly susceptible to quickly being reused.
To fix this:
- clear monc->cur_mon and monc->hunting as part of closing the session
in ceph_monc_stop()
- bail from delayed_work() if monc->cur_mon is cleared, similar to how
it's done in mon_fault() and finish_hunting() (based on monc->hunting)
- call cancel_delayed_work_sync() after the session is closed
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/66857
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1177afeca833174ba83504688eec898c6214f4bf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]
Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly be interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:
+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp->undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
it only afterward.
Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels
that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.
Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Kevin Yang <yyd@google.com>
Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 83f5eb01c4beb9741bc1600bcd8b6e94a1774abe)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit cd9151b618da4723877bd94eae952f2e50acbc0e upstream.
In ext_adv_report_event rssi comes before data (not after data as
in legacy adv_report_evt) so "+ 1" is not required in the ptr arithmatic
to point to next report.
Signed-off-by: Jaganath Kanakkassery <jaganath.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b162f19e6603571061b19dbb604a9883f0fa4ecc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 66be40e622e177316ae81717aa30057ba9e61dff ]
I don't see anything checking that TCP_METRICS_ATTR_SADDR_IPV4
is at least 4 bytes long, and the policy doesn't have an entry
for this attribute at all (neither does it for IPv6 but v6 is
manually validated).
Reviewed-by: Eric Dumazet <edumazet@google.com>
Fixes: 3e7013ddf55a ("tcp: metrics: Allow selective get/del of tcp-metrics based on src IP")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 19d997b59fa1fd7a02e770ee0881c0652b9c32c9)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit a6458ab7fd4f427d4f6f54380453ad255b7fde83 ]
In some production workloads we noticed that connections could
sometimes close extremely prematurely with ETIMEDOUT after
transmitting only 1 TLP and RTO retransmission (when we would normally
expect roughly tcp_retries2 = TCP_RETR2 = 15 RTOs before a connection
closes with ETIMEDOUT).
From tracing we determined that these workloads can suffer from a
scenario where in fast recovery, after some retransmits, a DSACK undo
can happen at a point where the scoreboard is totally clear (we have
retrans_out == sacked_out == lost_out == 0). In such cases, calling
tcp_try_keep_open() means that we do not execute any code path that
clears tp->retrans_stamp to 0. That means that tp->retrans_stamp can
remain erroneously set to the start time of the undone fast recovery,
even after the fast recovery is undone. If minutes or hours elapse,
and then a TLP/RTO/RTO sequence occurs, then the start_ts value in
retransmits_timed_out() (which is from tp->retrans_stamp) will be
erroneously ancient (left over from the fast recovery undone via
DSACKs). Thus this ancient tp->retrans_stamp value can cause the
connection to die very prematurely with ETIMEDOUT via
tcp_write_err().
The fix: we change DSACK undo in fast recovery (TCP_CA_Recovery) to
call tcp_try_to_open() instead of tcp_try_keep_open(). This ensures
that if no retransmits are in flight at the time of DSACK undo in fast
recovery then we properly zero retrans_stamp. Note that calling
tcp_try_to_open() is more consistent with other loss recovery
behavior, since normal fast recovery (CA_Recovery) and RTO recovery
(CA_Loss) both normally end when tp->snd_una meets or exceeds
tp->high_seq and then in tcp_fastretrans_alert() the "default" switch
case executes tcp_try_to_open(). Also note that by inspection this
change to call tcp_try_to_open() implies at least one other nice bug
fix, where now an ECE-marked DSACK that causes an undo will properly
invoke tcp_enter_cwr() rather than ignoring the ECE mark.
Fixes: c7d9d6a185a7 ("tcp: undo on DSACK during recovery")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8b5fd51b3040ce2596d22a72767c66d7435853b6)
[Harshit: Minor conflict resolution due to missing commit: 737ff314563c
("tcp: use sequence distance to detect reordering") and more commits]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
[ Upstream commit e5c5f3596de224422561d48eba6ece5210d967b3 ]
This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].
As the "ids" variable is a pointer to "struct sctp_assoc_ids" and this
structure ends in a flexible array:
struct sctp_assoc_ids {
[...]
sctp_assoc_t gaids_assoc_id[];
};
the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the calculation "size + size * count" in
the kmalloc() function.
Also, refactor the code adding the "ids_size" variable to avoid sizing
twice.
This way, the code is more readable and safer.
This code was detected with the help of Coccinelle, and audited and
modified manually.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Signed-off-by: Erick Archer <erick.archer@outlook.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/PAXPR02MB724871DB78375AB06B5171C88B152@PAXPR02MB7248.eurprd02.prod.outlook.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 03f37e56305156bd25c5c237d1cc7f5c75495ef2)
[Vegard: add #include that was pulled in through some other unknown
header on 4.19.y]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (133 commits)
LTS: Update to 4.14.350
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
arm64: dts: rockchip: Add sound-dai-cells for RK3368
tcp: Fix data races around icsk->icsk_af_ops.
ipv6: Fix data races around sk->sk_prot.
ipv6: annotate some data-races around sk->sk_prot
pwm: stm32: Refuse too small period requests
ftruncate: pass a signed offset
batman-adv: Don't accept TT entries for out-of-spec VIDs
batman-adv: include gfp.h for GFP_* defines
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_hd_modes
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_ld_modes
hexagon: fix fadvise64_64 calling conventions
tty: mcf: MCF54418 has 10 UARTS
usb: atm: cxacru: fix endpoint checking in cxacru_bind()
usb: musb: da8xx: fix a resource leak in probe()
usb: gadget: printer: SS+ support
net: usb: ax88179_178a: improve link status logs
iio: adc: ad7266: Fix variable checking bug
mmc: sdhci-pci: Convert PCIBIOS_* return codes to errnos
...
Change-Id: I0ab862e0932a48f13c64657e5456f3034b766445
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit f49cd2f4d6170d27a2c61f1fecb03d8a70c91f57 upstream.
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To
avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
for the reads and writes.
Thanks to Eric Dumazet for providing the syzbot report:
BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/26/2022
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5bb642cc3355ffd3c8bca0a8bd8e6e65bcc2091c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 364f997b5cfe1db0d63a390fe7c801fa2b3115f6 upstream.
Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")
fixed some data-races around sk->sk_prot but it was not enough.
Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
load tearing.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fda6d62642a9c544a293d7ad7cb058f8c7f8f3dd)
[Vegard: fix conflict due to missing commit
d74bad4e74ee373787a9ae24197c17b7cdc428d5 ("bpf: Hooks for
sys_connect").]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 086d49058cd8471046ae9927524708820f5fd1c7 upstream.
Changes from the original is that the applied code to inet6_sendmsg
and inet6_recvmsg is ported to inet_sendmsg and inet_recvmsg because
the same functions are shared between ipv4 and v6 in 4.19 kernel.
The original commit message is as below.
IPv6 has this hack changing sk->sk_prot when an IPv6 socket
is 'converted' to an IPv4 one with IPV6_ADDRFORM option.
This operation is only performed for TCP and UDP, knowing
their 'struct proto' for the two network families are populated
in the same way, and can not disappear while a reader
might use and dereference sk->sk_prot.
If we think about it all reads of sk->sk_prot while
either socket lock or RTNL is not acquired should be using READ_ONCE().
Also note that other layers like MPTCP, XFRM, CHELSIO_TLS also
write over sk->sk_prot.
BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt
write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0:
do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline]
ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019
udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1:
inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659
____sys_recvmsg+0x16c/0x320
___sys_recvmsg net/socket.c:2674 [inline]
do_recvmmsg+0x3f5/0xae0 net/socket.c:2768
__sys_recvmmsg net/socket.c:2847 [inline]
__do_sys_recvmmsg net/socket.c:2870 [inline]
__se_sys_recvmmsg net/socket.c:2863 [inline]
__x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 13bda7ac5801f501bed6e21717dbf3b0df773847)
[Vegard: fix conflicts due to missing commit
3679d585bbc07a1ac4448d5b478b492cad3587ce ("net: Introduce __inet_bind()
and __inet6_bind").]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 537a350d14321c8cca5efbf0a33a404fec3a9f9e upstream.
The internal handling of VLAN IDs in batman-adv is only specified for
following encodings:
* VLAN is used
- bit 15 is 1
- bit 11 - bit 0 is the VLAN ID (0-4095)
- remaining bits are 0
* No VLAN is used
- bit 15 is 0
- remaining bits are 0
batman-adv was only preparing new translation table entries (based on its
soft interface information) using this encoding format. But the receive
path was never checking if entries in the roam or TT TVLVs were also
following this encoding.
It was therefore possible to create more than the expected maximum of 4096
+ 1 entries in the originator VLAN list. Simply by setting the "remaining
bits" to "random" values in corresponding TVLV.
Cc: stable@vger.kernel.org
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4d5a2d6b7a9a1140342c5229d1a427ec37a12fd4)
[Vegard: fix trivial conflicts due to missing commit 7e9a8c2ce7c5
("batman-adv: Use parentheses in function kernel-doc").]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
The linux/gfp.h provides the GFP_ATOMIC and GFP_KERNEL define. It should
therefore be included instead of linux/fs.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
(cherry picked from commit b92b94ac732f5c83c60be2825d8b5cec4dc469d3)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit be4e1304419c99a164b4c0e101c7c2a756b635b9 ]
For CONFIG_CPUMASK_OFFSTACK=y kernel, explicit allocation of cpumask
variable on stack is not recommended since it can cause potential stack
overflow.
Instead, kernel code should always use *cpumask_var API(s) to allocate
cpumask var in config-neutral way, leaving allocation strategy to
CONFIG_CPUMASK_OFFSTACK.
Use *cpumask_var API(s) to address it.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240331053441.1276826-2-dawei.li@shingroup.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2b085521be5292016097b5e7ca81b26be3f7098d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ]
register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.
Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 40188a25a9847dbeb7ec67517174a835a677752f)
[Vegard: fixed unrelated conflict due to missing commit
9c22bd1ab442c552e9481f1157589362887a7f47 ("netfilter: nf_tables: defer
gc run if previous batch is still pending").]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 0b9130247f3b6a1122478471ff0e014ea96bb735 ]
syzbot reported a memory leak in nr_create() [0].
Commit 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.")
added sock_hold() to the nr_heartbeat_expiry() function, where
a) a socket has a SOCK_DESTROY flag or
b) a listening socket has a SOCK_DEAD flag.
But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor
has already been closed and the nr_release() function has been called.
So it makes no sense to hold the reference count because no one will
call another nr_destroy_socket() and put it as in the case "b."
nr_connect
nr_establish_data_link
nr_start_heartbeat
nr_release
switch (nr->state)
case NR_STATE_3
nr->state = NR_STATE_2
sock_set_flag(sk, SOCK_DESTROY);
nr_rx_frame
nr_process_rx_frame
switch (nr->state)
case NR_STATE_2
nr_state2_machine()
nr_disconnect()
nr_sk(sk)->state = NR_STATE_0
sock_set_flag(sk, SOCK_DEAD)
nr_heartbeat_expiry
switch (nr->state)
case NR_STATE_0
if (sock_flag(sk, SOCK_DESTROY) ||
(sk->sk_state == TCP_LISTEN
&& sock_flag(sk, SOCK_DEAD)))
sock_hold() // ( !!! )
nr_destroy_socket()
To fix the memory leak, let's call sock_hold() only for a listening socket.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
[0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
Fixes: 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d616876256b38ecf9a1a1c7d674192c5346bc69c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 9f36169912331fa035d7b73a91252d7c2512eb1a ]
As evident from the definition of ip_options_get(), the IP option
IPOPT_END is used to pad the IP option data array, not IPOPT_NOP. Yet
the loop that walks the IP options to determine the total IP options
length in cipso_v4_delopt() doesn't take IPOPT_END into account.
Fix it by recognizing the IPOPT_END value as the end of actual options.
Fixes: 014ab19a69c3 ("selinux: Set socket NetLabel based on connection endpoint")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5d3b9efa04c0d8967e00cbc4b6b5aab79952f5d1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ]
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.
A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.
Fix the problem by switching the order of the two operations.
Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ebde6e8a52c68dc45b4ae354e279ba74788579e7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit f0fb9b288d0a7e9cc324ae362e2dfd2cc2217ded ]
While flushing the cache via ipv6_sysctl_rtcache_flush(), the call
to proc_dointvec() may fail. The fix adds a check that returns the
error, on failure.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 14a20e5b4ad9 ("net/ipv6: Fix the RT cache flush via sysctl using a previous delay")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f00c343be6f25a6a3736fd6b332c918da57d81d8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ]
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.
In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:
TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout
Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a1f9c8328219b9bb2c29846d6c74ea981e60b5a7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit d37fe4255abe8e7b419b90c5847e8ec2b8debb08 ]
tcp_v6_syn_recv_sock() calls ip6_dst_store() before
inet_sk(newsk)->pinet6 has been set up.
This means ip6_dst_store() writes over the parent (listener)
np->dst_cookie.
This is racy because multiple threads could share the same
parent and their final np->dst_cookie could be wrong.
Move ip6_dst_store() call after inet_sk(newsk)->pinet6
has been changed and after the copy of parent ipv6_pinfo.
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 06f208ae0670ef97b0bf253a5523863e113026b6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit efaf24e30ec39ebbea9112227485805a48b0ceb1 ]
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
Let's use READ_ONCE() to read sk->sk_shutdown.
Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9c2d206ddc697595ceb0e8e356c64f9e845a0c2f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 5d915e584d8408211d4567c22685aae8820bfc55 ]
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.
If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.
Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 141826272a2cd16436f9b6c241cf88b72feddb3b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 45d872f0e65593176d880ec148f41ad7c02e40a7 ]
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.
It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().
Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
Now we remove unix_recvq_full() as no one uses it.
Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:
(1) For SOCK_DGRAM, it is a written-once field in unix_create1()
(2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
listener's unix_state_lock() in unix_listen(), and we hold
the lock in unix_stream_connect()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b56ff0d20a4a60b9e6424d0609acccbbde573ea8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit bd9f2d05731f6a112d0c7391a0d537bfc588dbe6 ]
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.
Let's use READ_ONCE() in unix_create1().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f5cf5e139ec736f1ac18cda8d5e5fc80d7eef787)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 0aa3be7b3e1f8f997312cc4705f8165e02806f8f ]
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.
Let's use READ_ONCE() there.
Note that the result could be inconsistent if the socket is dumped
during the state change. This is common for other SOCK_DIAG and
similar interfaces.
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 123d798351c96a15456478558fccac0eb766f26d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>