349 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
e0d945721c This is the 4.14.299 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmNtDxsACgkQONu9yGCS
 aT47khAAoQQFXKBBT/NFUUyUwPR6mrB457eOofeBrdDgTQ2DwJklipJo2j9kLJTM
 M1O5uQfQsfKIPtmwhC+bJaVqhvMS2N4bapGmtEcNlb0ughr8rYzewtc3DK4A5BZn
 Lg2PlwlSdwkq+5YJDbBPt5886Cq2/ufuqd2nc27G6DewKXS79tJHPkO/EXaDznjh
 fINXBH4aodigYjFacnGugR3OkCssT7+pzdZjLz/D8cUJMJrBrxCB9bekX4aP6egg
 dLrY4zRhz4cHNN9yx2JnRPXth8xoZf42XnChAYw8OxJqReYB+2bfjE3zKlaELLVX
 6wiqdMuNaw8evvI2nR5WV1goFBaeyxJb30qLpbYAK6MhUZ+16VLvBm/cVsfod4RV
 l2Sb+Zhif0z03SSiFLBWZekL9NlDV7igq+JmZ6WkDKUi5Rf0nq/CFz7EIsw61oIA
 0fc525Cx8+EPC56ovYeWPHFXGKSc5k7m0fKa9nzE6fpKyMvoEy90SsjDdLim2Yg4
 hKT5lvE8bGC+4NaKa6BnqJoR85kU8YTP25+IGdmYnpqfXuWY1G4FGNJ8rqgQdclx
 3isN5vLFNseG1l0/PzuBN5ozGX0qizfy2ljbuERofz2M71jNOsgXzc926H1YKJfT
 nI4q5SzKSLSHywR0aVGKiUGLE7tBAQscK/4lRn69tIH2lpG8Yy8=
 =4f2m
 -----END PGP SIGNATURE-----

Merge 4.14.299 into android-4.14-stable

Changes in 4.14.299
	NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
	NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
	nfs4: Fix kmemleak when allocate slot failed
	net: dsa: Fix possible memory leaks in dsa_loop_init()
	nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
	nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
	net: fec: fix improper use of NETDEV_TX_BUSY
	ata: pata_legacy: fix pdc20230_set_piomode()
	net: sched: Fix use after free in red_enqueue()
	ipvs: use explicitly signed chars
	rose: Fix NULL pointer dereference in rose_send_frame()
	mISDN: fix possible memory leak in mISDN_register_device()
	isdn: mISDN: netjet: fix wrong check of device registration
	btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
	btrfs: fix ulist leaks in error paths of qgroup self tests
	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
	Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
	net: mdio: fix undefined behavior in bit shift for __mdiobus_register
	net, neigh: Fix null-ptr-deref in neigh_table_clear()
	media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE
	media: dvb-frontends/drxk: initialize err to 0
	i2c: xiic: Add platform module alias
	Bluetooth: L2CAP: Fix attempting to access uninitialized memory
	block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
	btrfs: fix type of parameter generation in btrfs_get_dentry
	tcp/udp: Make early_demux back namespacified.
	capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
	ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
	efi: random: reduce seed size to 32 bytes
	parisc: Make 8250_gsc driver dependend on CONFIG_PARISC
	parisc: Export iosapic_serial_irq() symbol for serial port driver
	ext4: fix warning in 'ext4_da_release_space'
	KVM: x86: Mask off reserved bits in CPUID.80000008H
	KVM: x86: emulator: em_sysexit should update ctxt->mode
	KVM: x86: emulator: introduce emulator_recalc_and_set_mode
	KVM: x86: emulator: update the emulation mode after CR0 write
	linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI
	linux/const.h: move UL() macro to include/linux/const.h
	linux/bits.h: make BIT(), GENMASK(), and friends available in assembly
	wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
	Linux 4.14.299

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id5fc7f49fbfe0ce07862d5484f49035b8c664206
2022-11-10 20:08:43 +01:00
Kuniyuki Iwashima
8783066210 tcp/udp: Make early_demux back namespacified.
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.

Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis.  Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for
tcp and udp").  However, the .proc_handler() was wrong and actually
prevented us from changing the behaviour in each netns.

We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic.  Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.

This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline.  So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.
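
The difference between the old and new decision logic described above can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel code: struct netns_sketch, the stub handler, and every function name below are hypothetical stand-ins, and only TCP is modelled (UDP would be analogous).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: only the knobs discussed above. */
struct netns_sketch {
	int ip_early_demux;  /* models net.ipv4.ip_early_demux */
	int tcp_early_demux; /* models net.ipv4.tcp_early_demux */
};

/* Old scheme: a single handler slot shared by every netns. */
typedef void (*early_demux_fn)(void);
static void tcp_early_demux_stub(void) { }
static early_demux_fn global_tcp_handler = tcp_early_demux_stub;

/*
 * Old sysctl write: the value lands in the writer's netns, but the
 * shared handler is switched according to init_net's value.
 */
static void old_sysctl_write(struct netns_sketch *net,
			     const struct netns_sketch *init_net, int val)
{
	assert(net != NULL && init_net != NULL);
	net->tcp_early_demux = val;
	global_tcp_handler = init_net->tcp_early_demux ?
			     tcp_early_demux_stub : NULL;
}

/* Old receive-path decision: the per-netns TCP knob is never consulted. */
static bool old_runs_tcp_early_demux(const struct netns_sketch *net)
{
	return net->ip_early_demux && global_tcp_handler != NULL;
}

/* New receive-path decision: both knobs come from the packet's netns. */
static bool new_runs_tcp_early_demux(const struct netns_sketch *net)
{
	return net->ip_early_demux && net->tcp_early_demux;
}
```

In this model, writing 0 to tcp_early_demux in a child netns leaves old_runs_tcp_early_demux() true for that netns as long as init_net keeps its knob on, while new_runs_tcp_early_demux() honours the child's own setting.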

Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10 15:47:22 +01:00
Greg Kroah-Hartman
c8ea89af5f This is the 4.14.296 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmNZF4YACgkQONu9yGCS
 aT64kw//UQC8bsE7DzcZIXoVwOVuKJ30nK33xw/cIzlowoTskkiBaGRaWe67X7ID
 zy/a3ZGoLlfcd82BYRzfcwhPDfoA3S1GkngDhz2k6X1VYwTMng+LSBCHScVY0Bc7
 InBLl6TYr/yegbZPHnMfPnxlhbUQfcYIQqJfy3kaEq93rl74B3Rz7qXZBTd8JhXu
 x7v3GAGPxFk7mGEDQc+ZKeKslLNiR3/gLcS9gEopkiVW597+v1o4WDfsyBbanhyb
 OqQITB6RD195l0heBAFvFT0o2CdxBhumfCzlMd12ylo8GAmpopAU8FcfDGwzDPhu
 gPG5jTuxp/1Hv9nxuuDv0rDBgFXw/bldQ5mkxOlwVUsxuXfhk8CdFRf13aiUHny/
 CfmofIWcyJczK5O6iZ/cTHfa+LXgoIBKCyPR9RXzaBG/+VC+W5Fjn3fVtfVZMxz6
 BJuockT44JD7fji3C/M1tdFWlj8o4Ji1+E8l38uE4BxHizOE3Hp3xb4sUo3uC4E2
 MX9952cO7j4EI07jRHY/i88kxigHljJwJZcmWIsrMTKXo7ZUliKeK36BNMWwPTjl
 l2uJjNRnmMIEv84tgx71dyy99B+Cz0XaYXOZ4rd7Er/k9Z8EQGykEx9EMPLa8kpf
 CIHs69/HZxjtgKy0yJvpeayhYbfT9vgivvP2d/HhTEwHVTFAHLw=
 =LsXL
 -----END PGP SIGNATURE-----

Merge 4.14.296 into android-4.14-stable

Changes in 4.14.296
	uas: add no-uas quirk for Hiksemi usb_disk
	usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
	uas: ignore UAS for Thinkplus chips
	net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
	ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
	mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
	mm/page_alloc: fix race condition between build_all_zonelists and page allocation
	mm: prevent page_frag_alloc() from corrupting the memory
	mm/migrate_device.c: flush TLB while holding PTL
	soc: sunxi: sram: Actually claim SRAM regions
	soc: sunxi: sram: Fix debugfs info for A64 SRAM C
	Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
	Input: melfas_mip4 - fix return value check in mip4_probe()
	usbnet: Fix memory leak in usbnet_disconnect()
	nvme: add new line after variable declatation
	nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
	selftests: Fix the if conditions of in test_extra_filter()
	clk: iproc: Minor tidy up of iproc pll data structures
	clk: iproc: Do not rely on node name for correct PLL setup
	Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
	i2c: dev: prevent ZERO_SIZE_PTR deref in i2cdev_ioctl_rdwr()
	ARM: fix function graph tracer and unwinder dependencies
	fs: fix UAF/GPF bug in nilfs_mdt_destroy
	dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
	dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure
	ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
	net/ieee802154: fix uninit value bug in dgram_sendmsg
	um: Cleanup syscall_handler_t cast in syscalls_32.h
	um: Cleanup compiler warning in arch/x86/um/tls_32.c
	usb: mon: make mmapped memory read only
	USB: serial: ftdi_sio: fix 300 bps rate for SIO
	mmc: core: Replace with already defined values for readability
	mmc: core: Terminate infinite loop in SD-UHS voltage switch
	rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
	netfilter: nf_queue: fix socket leak
	nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
	nilfs2: fix leak of nilfs_root in case of writer thread creation failure
	nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
	ceph: don't truncate file in atomic_open
	random: clamp credited irq bits to maximum mixed
	ALSA: hda: Fix position reporting on Poulsbo
	scsi: stex: Properly zero out the passthrough command structure
	USB: serial: qcserial: add new usb-id for Dell branded EM7455
	random: restore O_NONBLOCK support
	random: avoid reading two cache lines on irq randomness
	wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
	Input: xpad - add supported devices as contributed on github
	Input: xpad - fix wireless 360 controller breaking after suspend
	random: use expired timer rather than wq for mixing fast pool
	ALSA: oss: Fix potential deadlock at unregistration
	ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
	ALSA: usb-audio: Fix potential memory leaks
	ALSA: usb-audio: Fix NULL dererence at error path
	iio: dac: ad5593r: Fix i2c read protocol requirements
	fs: dlm: fix race between test_bit() and queue_work()
	fs: dlm: handle -EBUSY first in lock arg validation
	HID: multitouch: Add memory barriers
	quota: Check next/prev free block number after reading from quota file
	regulator: qcom_rpm: Fix circular deferral regression
	Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
	parisc: fbdev/stifb: Align graphics memory size to 4MB
	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
	PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
	fbdev: smscufx: Fix use-after-free in ufx_ops_open()
	nilfs2: fix use-after-free bug of struct nilfs_root
	nilfs2: fix lockdep warnings in page operations for btree nodes
	nilfs2: fix lockdep warnings during disk space reclamation
	ext4: avoid crash when inline data creation follows DIO write
	ext4: fix null-ptr-deref in ext4_write_info
	ext4: make ext4_lazyinit_thread freezable
	ext4: place buffer head allocation before handle start
	livepatch: fix race between fork and KLP transition
	ftrace: Properly unset FTRACE_HASH_FL_MOD
	ring-buffer: Allow splice to read previous partially read pages
	ring-buffer: Check pending waiters when doing wake ups as well
	ring-buffer: Fix race between reset page and reading page
	KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
	KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
	gcov: support GCC 12.1 and newer compilers
	selinux: use "grep -E" instead of "egrep"
	sh: machvec: Use char[] for section boundaries
	wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
	wifi: mac80211: allow bw change during channel switch in mesh
	wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
	wifi: rtl8xxxu: Fix skb misuse in TX queue selection
	wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
	net: fs_enet: Fix wrong check in do_pd_setup
	spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
	netfilter: nft_fib: Fix for rpath check with VRF devices
	spi: s3c64xx: Fix large transfers with DMA
	vhost/vsock: Use kvmalloc/kvfree for larger packets.
	mISDN: fix use-after-free bugs in l1oip timer handlers
	tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
	net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
	bnx2x: fix potential memory leak in bnx2x_tpa_stop()
	drm/mipi-dsi: Detach devices when removing the host
	platform/x86: msi-laptop: Fix old-ec check for backlight registering
	platform/x86: msi-laptop: Fix resource cleanup
	drm/bridge: megachips: Fix a null pointer dereference bug
	mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
	ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
	ALSA: dmaengine: increment buffer pointer atomically
	mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
	memory: of: Fix refcount leak bug in of_get_ddr_timings()
	soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
	soc: qcom: smem_state: Add refcounting for the 'state->of_node'
	ARM: dts: turris-omnia: Fix mpp26 pin name and comment
	ARM: dts: kirkwood: lsxl: fix serial line
	ARM: dts: kirkwood: lsxl: remove first ethernet port
	ARM: Drop CMDLINE_* dependency on ATAGS
	ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
	iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
	iio: inkern: only release the device node when done with it
	iio: ABI: Fix wrong format of differential capacitance channel ABI.
	clk: oxnas: Hold reference returned by of_get_parent()
	clk: tegra: Fix refcount leak in tegra210_clock_init
	clk: tegra: Fix refcount leak in tegra114_clock_init
	clk: tegra20: Fix refcount leak in tegra20_clock_init
	HSI: omap_ssi: Fix refcount leak in ssi_probe
	HSI: omap_ssi_port: Fix dma_map_sg error check
	media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
	tty: xilinx_uartps: Fix the ignore_status
	media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
	RDMA/rxe: Fix "kernel NULL pointer dereference" error
	RDMA/rxe: Fix the error caused by qp->sk
	dyndbg: fix module.dyndbg handling
	dyndbg: let query-modname override actual module name
	ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
	ata: fix ata_id_has_devslp()
	ata: fix ata_id_has_ncq_autosense()
	ata: fix ata_id_has_dipm()
	md/raid5: Ensure stripe_fill happens on non-read IO with journal
	xhci: Don't show warning for reinit on known broken suspend
	usb: gadget: function: fix dangling pnp_string in f_printer.c
	drivers: serial: jsm: fix some leaks in probe
	phy: qualcomm: call clk_disable_unprepare in the error handling
	firmware: google: Test spinlock on panic path to avoid lockups
	serial: 8250: Fix restoring termios speed after suspend
	fsi: core: Check error number after calling ida_simple_get
	mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
	mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
	mfd: lp8788: Fix an error handling path in lp8788_probe()
	mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
	mfd: sm501: Add check for platform_driver_register()
	dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
	spmi: pmic-arb: correct duplicate APID to PPID mapping logic
	clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
	clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
	mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
	powerpc/math_emu/efp: Include module.h
	powerpc/sysdev/fsl_msi: Add missing of_node_put()
	powerpc/pci_dn: Add missing of_node_put()
	powerpc/powernv: add missing of_node_put() in opal_export_attrs()
	powerpc: Fix SPE Power ISA properties for e500v1 platforms
	iommu/omap: Fix buffer overflow in debugfs
	iommu/iova: Fix module config properly
	crypto: cavium - prevent integer overflow loading firmware
	f2fs: fix race condition on setting FI_NO_EXTENT flag
	ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
	MIPS: BCM47XX: Cast memcmp() of function to (void *)
	powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
	thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
	NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
	wifi: brcmfmac: fix invalid address access when enabling SCAN log level
	openvswitch: Fix double reporting of drops in dropwatch
	openvswitch: Fix overreporting of drops in dropwatch
	tcp: annotate data-race around tcp_md5sig_pool_populated
	wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
	xfrm: Update ipcomp_scratches with NULL when freed
	wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
	Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
	Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
	can: bcm: check the result of can_send() in bcm_can_tx()
	wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
	wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
	wifi: rt2x00: set SoC wmac clock register
	wifi: rt2x00: correctly set BBP register 86 for MT7620
	net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
	Bluetooth: L2CAP: Fix user-after-free
	r8152: Rate limit overflow messages
	drm: Use size_t type for len variable in drm_copy_field()
	drm: Prevent drm_copy_field() to attempt copying a NULL pointer
	drm/vc4: vec: Fix timings for VEC modes
	platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
	drm/amdgpu: fix initial connector audio value
	ARM: dts: imx7d-sdb: config the max pressure for tsc2046
	ARM: dts: imx6q: add missing properties for sram
	ARM: dts: imx6dl: add missing properties for sram
	ARM: dts: imx6qp: add missing properties for sram
	ARM: dts: imx6sl: add missing properties for sram
	media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
	scsi: 3w-9xxx: Avoid disabling device if failing to enable it
	nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
	HID: roccat: Fix use-after-free in roccat_read()
	md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
	usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
	usb: musb: Fix musb_gadget.c rxstate overflow bug
	Revert "usb: storage: Add quirk for Samsung Fit flash"
	usb: idmouse: fix an uninit-value in idmouse_open
	perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
	net: ieee802154: return -EINVAL for unknown addr type
	net/ieee802154: don't warn zero-sized raw_sendmsg()
	ext4: continue to expand file system when the target size doesn't reach
	md: Replace snprintf with scnprintf
	efi: libstub: drop pointless get_memory_map() call
	inet: fully convert sk->sk_rx_dst to RCU rules
	thermal: intel_powerclamp: Use first online CPU as control_cpu
	Linux 4.14.296

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7d490d1d6185e26e23921167583f36793b87b9c1
2022-10-26 13:32:32 +02:00
Eric Dumazet
92e6e36ecd inet: fully convert sk->sk_rx_dst to RCU rules
commit 8f905c0e7354ef261360fb7535ea079b1082c105 upstream.

syzbot reported various issues around early demux,
one being included in this changelog [1]

sk->sk_rx_dst is using RCU protection without clearly
documenting it.

The following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
do not follow standard RCU rules.

[a]    dst_release(dst);
[b]    sk->sk_rx_dst = NULL;

They look wrong because a delete operation on an RCU-protected
pointer is supposed to clear the pointer before
the call_rcu()/synchronize_rcu() guarding the actual memory freeing.

In some cases indeed, dst could be freed before [b] is done.

We could cheat by clearing sk_rx_dst before calling
dst_release(), but this seems the right time to stick
to standard RCU annotations and debugging facilities.
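
The ordering rule above (unpublish the pointer first, release the object afterwards) can be sketched in user-space C. This is only an illustration under assumptions: C11 atomics stand in for the kernel's RCU accessors, free() stands in for the deferred call_rcu() free, and dst_sketch, sock_sketch, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct dst_entry and struct sock. */
struct dst_sketch {
	atomic_int refcnt;
};

struct sock_sketch {
	_Atomic(struct dst_sketch *) rx_dst; /* models sk->sk_rx_dst */
};

/* Allocate a dst holding one reference; returns NULL on allocation failure. */
static struct dst_sketch *dst_alloc_sketch(void)
{
	struct dst_sketch *dst = malloc(sizeof(*dst));

	if (dst)
		atomic_init(&dst->refcnt, 1);
	return dst;
}

/* Drop one reference. The kernel defers the final free via call_rcu();
 * this sketch frees immediately. */
static void dst_release_sketch(struct dst_sketch *dst)
{
	if (dst && atomic_fetch_sub(&dst->refcnt, 1) == 1)
		free(dst);
}

/*
 * Standard order: clear the published pointer first, so that no new
 * reader can pick it up, and only then release the object. The [a]/[b]
 * sequence quoted above does these two steps the other way round.
 */
static void sk_rx_dst_clear_sketch(struct sock_sketch *sk)
{
	struct dst_sketch *old;

	assert(sk != NULL);
	old = atomic_exchange(&sk->rx_dst, (struct dst_sketch *)NULL);
	dst_release_sketch(old);
}
```

Calling sk_rx_dst_clear_sketch() twice is harmless in this model, because the second call exchanges NULL for NULL and releases nothing.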

[1]
BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204

CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
 __kasan_report mm/kasan/report.c:433 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
 dst_check include/net/dst.h:470 [inline]
 tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
 ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
 common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
RIP: 0033:0x7f5e972bfd57
Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
RSP: 002b:00007fff8a413210 EFLAGS: 00000283
RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
 </TASK>

Allocated by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
 ip_route_input_rcu net/ipv4/route.c:2470 [inline]
 ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
 ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Freed by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track+0x21/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:1723 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
 slab_free mm/slub.c:3513 [inline]
 kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
 dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
 rcu_do_batch kernel/rcu/tree.c:2506 [inline]
 rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Last potentially related work creation:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
 __call_rcu kernel/rcu/tree.c:2985 [inline]
 call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
 dst_release net/core/dst.c:177 [inline]
 dst_release+0x79/0xe0 net/core/dst.c:167
 tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2768
 release_sock+0x54/0x1b0 net/core/sock.c:3300
 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_write_iter+0x289/0x3c0 net/socket.c:1057
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write+0x429/0x660 fs/read_write.c:503
 vfs_write+0x7cd/0xae0 fs/read_write.c:590
 ksys_write+0x1ee/0x250 fs/read_write.c:643
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88807f1cb700
 which belongs to the cache ip_dst_cache of size 176
The buggy address is located 58 bytes inside of
 176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
The buggy address belongs to the page:
page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
 prep_new_page mm/page_alloc.c:2418 [inline]
 get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
 alloc_slab_page mm/slub.c:1793 [inline]
 allocate_slab mm/slub.c:1930 [inline]
 new_slab+0x32d/0x4a0 mm/slub.c:1993
 ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
 slab_alloc_node mm/slub.c:3200 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 __mkroute_output net/ipv4/route.c:2564 [inline]
 ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
 ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
 __ip_route_output_key include/net/route.h:126 [inline]
 ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
 ip_route_output_key include/net/route.h:142 [inline]
 geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
 geneve_xmit_skb drivers/net/geneve.c:899 [inline]
 geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
 __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
 netdev_start_xmit include/linux/netdevice.h:5008 [inline]
 xmit_one net/core/dev.c:3590 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
 __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1338 [inline]
 free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
 free_unref_page_prepare mm/page_alloc.c:3309 [inline]
 free_unref_page+0x19/0x690 mm/page_alloc.c:3388
 qlink_free mm/kasan/quarantine.c:146 [inline]
 qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
 __alloc_skb+0x215/0x340 net/core/skbuff.c:414
 alloc_skb include/linux/skbuff.h:1126 [inline]
 alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
 sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
 mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
 add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
 add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
 mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
 mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
 mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445

Memory state around the buggy address:
 ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                        ^
 ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
 ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[cmllamas: fixed trivial merge conflict]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26 13:17:14 +02:00
Greg Kroah-Hartman
73f6c0fdd9 This is the 4.14.289 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmLZniwACgkQONu9yGCS
 aT7lPg/+NWwO6go0MBBlTEq0pTxDps4zLLCjQnhNWYKeDEEaSCNkL4DY3Pt0m57V
 VgLEk5V4KjrLFECOv8RjqpZz3mjSAgJR2EMjER9XP6ztTCRzWeLdNPgauOf58Kxy
 y8a1ZUx2f321oHnuf2u6X1Z2gnTSf9mTVxam5qRg8/87jriyiJrkNb22bv3ryOJ6
 OKivQxqWQd4Hz9QvoxOMJYCC0ldtyOCj4bcIKemrJKFG8rkLdoAG23vwsh9WXh7s
 SO6bL6nYezY9I4B8SvqKvlQ5iqf4I5j8n2tCyW0mrVMtq28REYdPHPZDDlxhEAlD
 URiZIBfZ5YmJ1Tm2XBNnoiSmWrLwcecEW3hwQsFuf2835bKRUVN2MojZlI6igbtd
 MhGmi/tF76AXP93rnhIokSuhKxkOpXBUUwrZKedV62X/lwj9e/Cuy6BaYW25ogOq
 5aoYxsvmvGofpkQoqKINiyAsV2EpC/y8nJrkL/OAtf0yVtUuHEv74CJMIBYnQpXR
 Ag1v+vJP3alTwXrHq4zRKKyUaVS4bLflodbkFriBb61duCDWQG+cIrkM/gvPx/vn
 ETCaV/t3J8+erS85PtFiEJJ0MwK/zsCqoJ7dFJyd5+fBmaHUVRdXbpjuh8/bI3jl
 MTknXHpppfmABXhtqPv2YzMmTEG04RwEQATjrm8iV3SE11AlDT0=
 =ySDW
 -----END PGP SIGNATURE-----

Merge 4.14.289 into android-4.14-stable

Changes in 4.14.289
	ALSA: hda - Add fixup for Dell Latitidue E5430
	ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
	xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
	net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
	ARM: 9213/1: Print message about disabled Spectre workarounds only once
	ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
	cgroup: Use separate src/dst nodes when preloading css_sets for migration
	nilfs2: fix incorrect masking of permission flags for symlinks
	net: dsa: bcm_sf2: force pause link settings
	xhci: bail out early if driver can't accress host in resume
	xhci: make xhci_handshake timeout for xhci_reset() adjustable
	ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
	inetpeer: Fix data-races around sysctl.
	net: Fix data-races around sysctl_mem.
	cipso: Fix data-races around sysctl.
	icmp: Fix data-races around sysctl.
	ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
	icmp: Fix a data-race around sysctl_icmp_ratelimit.
	icmp: Fix a data-race around sysctl_icmp_ratemask.
	ipv4: Fix data-races around sysctl_ip_dynaddr.
	sfc: fix use after free when disabling sriov
	seg6: fix skb checksum evaluation in SRH encapsulation/insertion
	seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
	sfc: fix kernel panic when creating VF
	virtio_mmio: Add missing PM calls to freeze/restore
	virtio_mmio: Restore guest page size on resume
	netfilter: br_netfilter: do not skip all hooks with 0 priority
	cpufreq: pmac32-cpufreq: Fix refcount leak bug
	platform/x86: hp-wmi: Ignore Sanitization Mode event
	net: tipc: fix possible refcount leak in tipc_sk_create()
	NFC: nxp-nci: don't print header length mismatch on i2c error
	net: sfp: fix memory leak in sfp_probe()
	ASoC: ops: Fix off by one in range control validation
	ASoC: wm5110: Fix DRE control
	irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
	x86: Clear .brk area at early boot
	signal handling: don't use BUG_ON() for debugging
	USB: serial: ftdi_sio: add Belimo device ids
	usb: dwc3: gadget: Fix event pending check
	tty: serial: samsung_tty: set dma burst_size to 1
	serial: 8250: fix return error code in serial8250_request_std_resource()
	mm: invalidate hwpoison page cache page in fault path
	can: m_can: m_can_tx_handler(): fix use after free of skb
	Linux 4.14.289

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1e9b12a81151982c15f4a71b01aff2f1ad2eb7e5
2022-07-21 21:55:35 +02:00
Kuniyuki Iwashima
fa4bb704b0 ipv4: Fix data-races around sysctl_ip_dynaddr.
[ Upstream commit e49e4aff7ec19b2d0d0957ee30e93dade57dab9e ]

While reading sysctl_ip_dynaddr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
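
As a rough userspace illustration of the reader-side pattern described above: the macro below is a simplified stand-in for the kernel's READ_ONCE() (it relies on the GCC/Clang `__typeof__` extension), and `sysctl_ip_dynaddr_sketch`, `sketch_sysctl_write()` and `dynaddr_mode()` are hypothetical names, not kernel symbols. The point is only that the reader takes one snapshot and never re-reads the variable.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE(): the volatile access
 * makes the compiler emit exactly one load for each use of the macro. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the sysctl value; not the kernel variable. */
static int sysctl_ip_dynaddr_sketch;

/* Hypothetical writer, standing in for a sysctl handler that may run
 * concurrently with readers. */
static void sketch_sysctl_write(int v)
{
	assert(v >= 0);
	*(volatile int *)&sysctl_ip_dynaddr_sketch = v;
}

/* Hypothetical reader: snapshot once, then branch only on the snapshot,
 * so both tests below see the same value even if a writer races. */
static int dynaddr_mode(void)
{
	int mode = READ_ONCE_SKETCH(sysctl_ip_dynaddr_sketch);

	if (mode > 1)
		return 2;	/* treated here as "enabled, verbose" */
	return mode ? 1 : 0;
}
```

Without the single snapshot, a compiler is free to load the variable separately for each comparison, and the two loads could observe different values.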

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21 20:42:45 +02:00
Greg Kroah-Hartman
0eec6f6001 This is the 4.14.269 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmIfSB4ACgkQONu9yGCS
 aT6eMQ//X/iNMO6+/ZRkVor0HrGYSfrGURpPzjSUEW11bf2Uzx1rV97SoMMKKXbK
 4QCDkK2UKCu41AyCFeDtzI96iAp7U36Z5ty1zJ/HVLp+9miYiNVF82E8IkNf1Imk
 OusVFTd/lZbNo57jlDQCJwf7z3ohv1sPnUddz2eFMVf6fOHA+kvN5+yvPECC5pG7
 ahPAMI8CuNfYqfEYHSI0ykfZ+dXuHVW7ag6fqxz8x6xucq5kw+yNCEPRr2QCkupH
 CdOrq55OVA1n/YY3sY5aAuvfHVthYwV303Vz+gurq4C+ZJ1+8HIUNVk0xI2xGj8U
 ORpIHx2OY7A3pzRacAxsxVg5cO1pgCv5X9Qoj4TCi9IURVQSxAI+wafahuFMDROI
 X24bI8xDf/gzMQoOtO7Pt5zKZxqfPE+CZpVVL9nchBCWuVKFqIPbyDdnVhLg4PIN
 2QscmSIU6gY6AIaKoCRAd8vJLkn3eOWsHak1CtVt8f+YtWXS6Vjf1LZgyPv2yk2T
 GeRnwRMhU/rMT+arU7T5R7TQzOhlqRaVAvrFDsemGFpxG/91eId4tVQHs3xhPlWs
 UNYSKZ41PuzZ235s6QJ67QsBD4DoHSoLsKu4gpn8vJG7OHFgYgmDniwCCYUIzmqv
 fk3vxOhsCy42wCBDoXP+BxRmLOqA3v5PldbKMAjBTjaE3/lvSyA=
 =e7C8
 -----END PGP SIGNATURE-----

Merge 4.14.269 into android-4.14-stable

Changes in 4.14.269
	cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
	vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
	parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
	parisc/unaligned: Fix ldw() and stw() unalignment handlers
	sr9700: sanity check for packet length
	USB: zaurus: support another broken Zaurus
	serial: 8250: of: Fix mapped region size when using reg-offset property
	ping: remove pr_err from ping_lookup
	net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
	gso: do not skip outer ip header in case of ipip and net_failover
	openvswitch: Fix setting ipv6 fields causing hw csum failure
	drm/edid: Always set RGB444
	net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
	configfs: fix a race in configfs_{,un}register_subsystem()
	RDMA/ib_srp: Fix a deadlock
	iio: adc: men_z188_adc: Fix a resource leak in an error handling path
	ata: pata_hpt37x: disable primary channel on HPT371
	Revert "USB: serial: ch341: add new Product ID for CH341A"
	usb: gadget: rndis: add spinlock for rndis response list
	USB: gadget: validate endpoint index for xilinx udc
	tracefs: Set the group ownership in apply_options() not parse_options()
	USB: serial: option: add support for DW5829e
	USB: serial: option: add Telit LE910R1 compositions
	usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
	xhci: re-initialize the HC during resume if HCE was set
	xhci: Prevent futile URB re-submissions due to incorrect return value.
	tty: n_gsm: fix encoding of control signal octet bit DV
	tty: n_gsm: fix proper link termination after failed open
	Revert "drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR"
	memblock: use kfree() to release kmalloced memblock regions
	fget: clarify and improve __fget_files() implementation
	Linux 4.14.269

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0c7a1a638cac0693161ad06dd369075a6dd42402
2022-03-02 15:28:01 +01:00
Tao Liu
7840e55979 gso: do not skip outer ip header in case of ipip and net_failover
commit cc20cced0598d9a5ff91ae4ab147b3b5e99ee819 upstream.

We encountered a TCP drop issue in our cloud environment. A packet GROed in
the host is forwarded to a VM virtio_net nic with net_failover enabled. The VM
acts as an IPVS LB with ipip encapsulation. The full path looks like:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
 -> ipip encap -> net_failover tx -> virtio_net tx

When net_failover transmits an ipip pkt (gso_type = 0x0103, which means
SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is done
because the device supports TSO and GSO_IPXIP4. But network_header points
to the inner ip header.
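
A small sketch of how the 0x0103 value decomposes into the three flags named above. The bit positions are an assumption taken from the mainline `SKB_GSO_*` enum of that era and should be checked against the tree in use; the `SKETCH_` names and `gso_has()` are illustrative only.

```c
#include <assert.h>

/* Assumed bit positions (mirroring the SKB_GSO_* enum in
 * include/linux/skbuff.h); verify against your kernel tree. */
#define SKETCH_GSO_TCPV4	(1u << 0)
#define SKETCH_GSO_DODGY	(1u << 1)
#define SKETCH_GSO_IPXIP4	(1u << 8)

/* Hypothetical helper: true when every bit of @flag is set in @gso_type. */
static int gso_has(unsigned int gso_type, unsigned int flag)
{
	assert(flag != 0);
	return (gso_type & flag) == flag;
}

/* The value quoted in the commit message, rebuilt from its parts. */
static unsigned int sketch_ipip_gso_type(void)
{
	return SKETCH_GSO_TCPV4 | SKETCH_GSO_DODGY | SKETCH_GSO_IPXIP4;
}
```

Under these assumed positions, `sketch_ipip_gso_type()` evaluates to 0x0103, matching the value in the report.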

Call Trace:
 tcp4_gso_segment        ------> return NULL
 inet_gso_segment        ------> inner iph, network_header points to
 ipip_gso_segment
 inet_gso_segment        ------> outer iph
 skb_mac_gso_segment

Afterwards, when virtio_net transmits the pkt, only the inner ip header is
modified and the outer one is left unchanged. The pkt is then dropped by the
remote host.

Call Trace:
 inet_gso_segment        ------> inner iph, outer iph is skipped
 skb_mac_gso_segment
 __skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 __qdisc_run
 __dev_queue_xmit        ------> virtio_net
 dev_hard_start_xmit
 __dev_queue_xmit        ------> net_failover
 ip_finish_output2
 ip_output
 iptunnel_xmit
 ip_tunnel_xmit
 ipip_tunnel_xmit        ------> ipip
 dev_hard_start_xmit
 __dev_queue_xmit
 ip_finish_output2
 ip_output
 ip_forward
 ip_rcv
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 napi_gro_receive
 receive_buf
 virtnet_poll
 net_rx_action

The root cause of this issue is specific to the rare combination of
SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
SKB_GSO_DODGY is set from external virtio_net. We need to reset the network
header when callbacks.gso_segment() returns NULL.

This patch also includes ipv6_gso_segment(), considering SIT, etc.

Fixes: cb32f511a70b ("ipip: add GSO/TSO support")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:33:55 +01:00
Greg Kroah-Hartman
f3a2f786eb This is the 4.14.261 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmHVgigACgkQONu9yGCS
 aT4LLhAA2LPgWT+YRh7rggiqYUHf/ZKENGgjsnqAQPS+FjKklvI5cCcs2nyVMGwz
 y2paHiyEAqnffJR6dHNi+f4GALdRQbzVDZN5OUDPh07LD6m7kfePGF3XvT0peGuN
 hmlTeDW2nABzdzQm5M47kDWwB0d4NHrqOu22VQ+BmEdRiHrCnLcZqvkO01fh9gYt
 DSFopfINvYLMUu97cFseP1ayKGteJN7uQtxYke8ElxL8DLjSC5c5iixAgaOiLqDq
 ieOHDpaR7j7exCic423sk6pFcgbu1iDYYLJADwaXcuMnzEM4uKIdctABn5Oe43D4
 P0ySQ2UJfRSHypyuvkTLt0YpYn6OhBrX/Tve09ot6Kk7ouLE4oqIvZImpEWZmiub
 8NGATLPQPTUrmC48QKu0BgIsVNQl1yp9KWyJzG1CFAM2jMjYuHY6Vn5DErkIlsyv
 CpdEpW7JQPiAyfR1VFB4WRMzQlTcjNoX6DssYFYU+N4vh0nB+noPfzOw5JLA002+
 85YWWYirSxBRdncWQy0Xpw+iMGqEh4Kx+8mF+8DeVf156UnhZuRmhxuBJTBDXibn
 uBo3nwsSzuc79VqbnOwzrJuJmLqlbHxyUwHzyubCbOuECWoAvfzb+Rg/p5u0Zt2J
 zdCGJGlJByzPG/Ver6pK3GGuiAwo6rHPvCZrWDHcyL1Ph/Ds4NQ=
 =BUz9
 -----END PGP SIGNATURE-----

Merge 4.14.261 into android-4.14-stable

Changes in 4.14.261
	HID: asus: Add depends on USB_HID to HID_ASUS Kconfig option
	tee: handle lookup of shm with reference count 0
	platform/x86: apple-gmux: use resource_size() with res
	recordmcount.pl: fix typo in s390 mcount regex
	selinux: initialize proto variable in selinux_ip_postroute_compat()
	scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
	net: usb: pegasus: Do not drop long Ethernet frames
	NFC: st21nfca: Fix memory leak in device probe and remove
	fsl/fman: Fix missing put_device() call in fman_port_probe
	nfc: uapi: use kernel size_t to fix user-space builds
	uapi: fix linux/nfc.h userspace compilation errors
	xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
	usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
	binder: fix async_free_space accounting for empty parcels
	scsi: vmw_pvscsi: Set residual data length conditionally
	Input: appletouch - initialize work before device registration
	Input: spaceball - fix parsing of movement data packets
	net: fix use-after-free in tw_timer_handler
	sctp: use call_rcu to free endpoint
	Linux 4.14.261

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I778bc28ac0835029328e2b503cb8fa241981c610
2022-01-05 13:21:58 +01:00
Muchun Song
5c2fe20ad3 net: fix use-after-free in tw_timer_handler
commit e22e45fc9e41bf9fcc1e92cfb78eb92786728ef0 upstream.

A real world panic issue was found as follows in Linux 5.4.

    BUG: unable to handle page fault for address: ffffde49a863de28
    PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0
    RIP: 0010:tw_timer_handler+0x20/0x40
    Call Trace:
     <IRQ>
     call_timer_fn+0x2b/0x120
     run_timer_softirq+0x1ef/0x450
     __do_softirq+0x10d/0x2b8
     irq_exit+0xc7/0xd0
     smp_apic_timer_interrupt+0x68/0x120
     apic_timer_interrupt+0xf/0x20

This issue has also been reported since 2017 in the thread [1].
Unfortunately, it can still be reproduced after fixing DCCP.

ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net
namespace is destroyed, since tcp_sk_ops is registered before
ipv4_mib_ops, which means tcp_sk_ops is in front of ipv4_mib_ops
in pernet_list. There will be a use-after-free on
net->mib.net_statistics in tw_timer_handler after ipv4_mib_exit_net
if there are some inflight time-wait timers.
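
The ordering argument above can be modelled with a toy list, under the assumption (consistent with the description) that exit handlers run in the reverse of registration order. Everything here, including `sketch_pernet_ops`, `sketch_register()` and `sketch_exit_order()`, is a hypothetical model and not the kernel's pernet implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_OPS 8

/* Toy stand-in for struct pernet_operations: only a name is kept. */
struct sketch_pernet_ops {
	const char *name;
};

static const struct sketch_pernet_ops *sketch_pernet_list[SKETCH_MAX_OPS];
static size_t sketch_pernet_count;

/* Append to the tail of the list, i.e. registration order. */
static void sketch_register(const struct sketch_pernet_ops *ops)
{
	assert(sketch_pernet_count < SKETCH_MAX_OPS);
	sketch_pernet_list[sketch_pernet_count++] = ops;
}

/* Write the names into out[] in the order their exit handlers would
 * run (last registered first) and return how many were written. */
static size_t sketch_exit_order(const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = sketch_pernet_count; i > 0 && n < max; i--)
		out[n++] = sketch_pernet_list[i - 1]->name;
	return n;
}
```

Registering a "tcp_sk_ops" entry first and an "ipv4_mib_ops" entry second makes the model report the mib exit before the tcp exit, which is the window in which an inflight timer can touch freed statistics.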

This bug was not introduced by commit f2bf415cfed7 ("mib: add net to
NET_ADD_STATS_BH"), since net_statistics was then a global variable
rather than being dynamically allocated and freed. Actually, commit
61a7e26028b9 ("mib: put net statistics on struct net") introduced
the bug, since it put net statistics on struct net and frees them when
the net namespace is destroyed.

Move init_ipv4_mibs() in front of tcp_init() to fix this bug,
and replace pr_crit() with panic() since continuing is meaningless
when init_ipv4_mibs() fails.

[1] https://groups.google.com/g/syzkaller/c/p1tn-_Kc6l4/m/smuL_FMAAgAJ?pli=1

Fixes: 61a7e26028b9 ("mib: put net statistics on struct net")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211228104145.9426-1-songmuchun@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:33:49 +01:00
Chenbo Feng
317336de57 ANDROID: Remove Android paranoid check for socket creation
For 4.14+ kernels, the eBPF cgroup socket filter is used to control socket
creation on devices. Remove this check since it is no longer useful.

Signed-off-by: Chenbo Feng <fengc@google.com>
Bug: 128944261
Test: CtsNetTestCasesInternetPermission
Change-Id: I2f353663389fc0f992e5a1b424c12215a2b074b0
2019-03-29 04:19:07 +00:00
Chenbo Feng
a03a547021 ANDROID: Remove xt_qtaguid module from new kernels.
For new devices shipping with the 4.14 kernel, the eBPF replacement should
cover all the functionality of xt_qtaguid, so it is now safe to remove this
Android-only module from the kernel.

Signed-off-by: Chenbo Feng <fengc@google.com>
Bug: 79938294
Test: kernel build
Change-Id: I032aecc048f7349f6a0c5192dd381f286fc7e5bf
2019-02-11 11:48:05 -08:00
Greg Kroah-Hartman
84ae3e35e1 This is the 4.14.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAluvTpMACgkQONu9yGCS
 aT7ujRAA2ITu5tdgwFN3UthGxHp5LvlKkeNpsZOzsQoTjgIEWMO1RPF6NFuDZUO0
 AdmovtfYgIF29PF9EALl8gIl0wNzI20TUGhx0t/6Ek2wsO3259RnNLFvaX14BfXF
 wmGg1uCVZo1HrWe/E2sh1+D/1gIKy5pl/sLW7N5imZ20AyRP4Q69AH4LwBQ0yAzT
 YgJK734i3p6Vf+nWeNAHFCl6m0lfwAJMTaMEZ567xM6HcZEnPqFoKOYvqA0znRau
 Sr3r5xH/YNyLQeTBpzOjJxIvWLOE9IxWgShQqvwcZ5PM35YPsNISmzp8AvmglM7x
 M9CuUQqr1+nawBnNAtHcUSy65VsGpPiLRne02Mc2ym7NAf1izFBHoAAkCtS9aGGy
 JQlZ0kos4A8VMO8j09gDguyLvdO0H1InWmfAWaAXtzDjjRHjYckFW99zgWRtN3Tn
 JUqPt5SkJzMU552tt7dGd7wTkXLjNxYpX24xw17ly7Z0oxArGlpCfF8XcxhNnRpX
 EgE3iNUcWxFbARSy4vbtSXqJDi/rB3o7YW74SxowAjivtTPsLeNXd6gFhiuOp2CQ
 Ak6XXgXI+Jzsu89edmM7V0vqw2UF6JdI9BJQvyGBC6KVUS6eYM00KQ+Rrwrvldsa
 KAKnV5cttAnwOZrAWof6PQLRJdGT1qXnNn4CWt3SX5LYFiLpFgI=
 =v/u2
 -----END PGP SIGNATURE-----

Merge 4.14.73 into android-4.14

Changes in 4.14.73
	gso_segment: Reset skb->mac_len after modifying network header
	ipv6: fix possible use-after-free in ip6_xmit()
	net/appletalk: fix minor pointer leak to userspace in SIOCFINDIPDDPRT
	net: hp100: fix always-true check for link up state
	pppoe: fix reception of frames with no mac header
	qmi_wwan: set DTR for modems in forced USB2 mode
	udp4: fix IP_CMSG_CHECKSUM for connected sockets
	neighbour: confirm neigh entries when ARP packet is received
	udp6: add missing checks on edumux packet processing
	net/sched: act_sample: fix NULL dereference in the data path
	tls: don't copy the key out of tls12_crypto_info_aes_gcm_128
	tls: zero the crypto information from tls_context before freeing
	tls: clear key material from kernel memory when do_tls_setsockopt_conf fails
	NFC: Fix possible memory corruption when handling SHDLC I-Frame commands
	NFC: Fix the number of pipes
	ASoC: cs4265: fix MMTLR Data switch control
	ASoC: rsnd: fixup not to call clk_get/set under non-atomic
	ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path
	ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping
	ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO
	ALSA: fireface: fix memory leak in ff400_switch_fetching_mode()
	ALSA: firewire-digi00x: fix memory leak of private data
	ALSA: firewire-tascam: fix memory leak of private data
	ALSA: fireworks: fix memory leak of response buffer at error path
	ALSA: oxfw: fix memory leak for model-dependent data at error path
	ALSA: oxfw: fix memory leak of discovered stream formats at error path
	ALSA: oxfw: fix memory leak of private data
	platform/x86: alienware-wmi: Correct a memory leak
	xen/netfront: don't bug in case of too many frags
	xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
	spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
	Revert "PCI: Add ACS quirk for Intel 300 series"
	ring-buffer: Allow for rescheduling when removing pages
	mm: shmem.c: Correctly annotate new inodes for lockdep
	Revert "rpmsg: core: add support to power domains for devices"
	Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name"
	scsi: target: iscsi: Use hex2bin instead of a re-implementation
	scsi: target: iscsi: Use bin2hex instead of a re-implementation
	Revert "ubifs: xattr: Don't operate on deleted inodes"
	ocfs2: fix ocfs2 read block panic
	drm/nouveau: Fix deadlocks in nouveau_connector_detect()
	drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
	drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
	drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
	drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
	drm/vc4: Fix the "no scaling" case on multi-planar YUV formats
	drm: udl: Destroy framebuffer only if it was initialized
	drm/amdgpu: add new polaris pci id
	tty: vt_ioctl: fix potential Spectre v1
	ext4: check to make sure the rename(2)'s destination is not freed
	ext4: avoid divide by zero fault when deleting corrupted inline directories
	ext4: avoid arithemetic overflow that can trigger a BUG
	ext4: recalucate superblock checksum after updating free blocks/inodes
	ext4: fix online resize's handling of a too-small final block group
	ext4: fix online resizing for bigalloc file systems with a 1k block size
	ext4: don't mark mmp buffer head dirty
	ext4: show test_dummy_encryption mount option in /proc/mounts
	sched/fair: Fix vruntime_normalized() for remote non-migration wakeup
	PCI: aardvark: Size bridges before resources allocation
	vmw_balloon: include asm/io.h
	iw_cxgb4: only allow 1 flush on user qps
	tick/nohz: Prevent bogus softirq pending warning
	spi: Fix double IDR allocation with DT aliases
	Linux 4.14.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-09-29 07:52:28 -07:00
Toke Høiland-Jørgensen
13a47054f0 gso_segment: Reset skb->mac_len after modifying network header
[ Upstream commit c56cae23c6b167acc68043c683c4573b80cbcc2c ]

When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjusted by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
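
To make the arithmetic concrete, here is a toy model of the calculation, assuming mac_len is derived as the distance between the two header offsets. `struct sketch_skb` and `sketch_reset_mac_len()` are illustrative stand-ins, not the kernel's sk_buff or skb_reset_mac_len(), and the offsets used (14-byte Ethernet header, 20-byte outer IPv4 header) are example values.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the few sk_buff fields involved. */
struct sketch_skb {
	uint16_t mac_header;		/* offset of the link-layer header */
	uint16_t network_header;	/* offset of the header currently treated as L3 */
	uint16_t mac_len;		/* derived: network_header - mac_header */
};

/* Recompute mac_len from the current header offsets. */
static void sketch_reset_mac_len(struct sketch_skb *skb)
{
	assert(skb->network_header >= skb->mac_header);
	skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}
```

If network_header still points at the inner header (offset 34 in this example) when mac_len is computed, the result is 34 instead of 14; recomputing after the outer handler has moved network_header back to offset 14 gives the expected value.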

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29 03:06:00 -07:00
Greg Kroah-Hartman
5adbbb16a5 This is the 4.14.7 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo2ek8ACgkQONu9yGCS
 aT77ZBAA00LgEemgczsewIMs/la6ztVfyMnL0FjZOFh4/X6vUd3jfonv6TplSVvt
 MkvsnGfw+YTH5PE5db38HAhruNRVLHo6xZOl3R5NeQghQoe8PNjxWm7FXsQ4H5Zp
 apyllIPBnck2XzlXR2iiiS41dem++/ktYum24Xu4Bre+rplt/6HXQ42osHyfrCbD
 jcwMdw4IesjdPaooZBBgJENZTBlft+NR8bkO+ZjcvameBqpIaEyeAAAYpd0+2SYE
 e/nbEP7FI3aSMcki7zaMkFLXJql/+mMuZK4I6ZsbfDSX1uqgBvxpOHJ9rpuzjkgm
 Bmv78c5YimppXsIF81+Raixvf7XW+xjydTL6T4I+phgLu8HaClbk3DNleWi60hdj
 nrd5q4SO3EMtT3T1Te54xTBW8gbNpKymp2QQWuOKxVaRuy1sppZxOqJVu0n6kpVp
 rsZnfxGHfxM0xCVy2mPH37xFsppn54TVbA7qnv/BbLtw80dmY+HGsCZ+EN9EzYh0
 f6ZHo487UZnzrHggwPQKdsiHBpITfEGxa1c/qG6nTRnUgpTv5ilmZFSD94w8PES6
 tqRRi7aZJbrGjTRWOwOX3Ot/4jpFhmnAIY8F+CKm/qWtD064FCfR4puutVk2ZbBe
 hLncVOO9JAuvn6Gcy8zKTBIkE7TE4hYlDM82NxEWLHfLkRnwOwk=
 =069a
 -----END PGP SIGNATURE-----

Merge 4.14.7 into android-4.14

Changes in 4.14.7
	net: qmi_wwan: add Quectel BG96 2c7c:0296
	net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts
	net: thunderx: Fix TCP/UDP checksum offload for IPv4 pkts
	net: realtek: r8169: implement set_link_ksettings()
	s390/qeth: fix early exit from error path
	tipc: fix memory leak in tipc_accept_from_sock()
	vhost: fix skb leak in handle_rx()
	rds: Fix NULL pointer dereference in __rds_rdma_map
	sit: update frag_off info
	tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
	packet: fix crash in fanout_demux_rollover()
	net/packet: fix a race in packet_bind() and packet_notifier()
	tcp: remove buggy call to tcp_v6_restore_cb()
	usbnet: fix alignment for frames with no ethernet header
	net: remove hlist_nulls_add_tail_rcu()
	stmmac: reset last TSO segment size after device open
	tcp/dccp: block bh before arming time_wait timer
	s390/qeth: build max size GSO skbs on L2 devices
	s390/qeth: fix thinko in IPv4 multicast address tracking
	s390/qeth: fix GSO throughput regression
	tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
	tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
	tcp: use current time in tcp_rcv_space_adjust()
	net: sched: cbq: create block for q->link.block
	tap: free skb if flags error
	tcp: when scheduling TLP, time of RTO should account for current ACK
	tun: free skb in early errors
	net: ipv6: Fixup device for anycast routes during copy
	tun: fix rcu_read_lock imbalance in tun_build_skb
	net: accept UFO datagrams from tuntap and packet
	net: openvswitch: datapath: fix data type in queue_gso_packets
	cls_bpf: don't decrement net's refcount when offload fails
	sctp: use right member as the param of list_for_each_entry
	ipmi: Stop timers before cleaning up the module
	usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
	fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
	fix kcm_clone()
	KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table
	kbuild: do not call cc-option before KBUILD_CFLAGS initialization
	powerpc/powernv/idle: Round up latency and residency values
	ipvlan: fix ipv6 outbound device
	ide: ide-atapi: fix compile error with defining macro DEBUG
	blk-mq: Avoid that request queue removal can trigger list corruption
	nvmet-rdma: update queue list during ib_device removal
	audit: Allow auditd to set pid to 0 to end auditing
	audit: ensure that 'audit=1' actually enables audit for PID 1
	dm raid: fix panic when attempting to force a raid to sync
	md: free unused memory after bitmap resize
	RDMA/cxgb4: Annotate r2 and stag as __be32
	x86/intel_rdt: Fix potential deadlock during resctrl unmount
	media: dvb-core: always call invoke_release() in fe_free()
	dvb_frontend: don't use-after-free the frontend struct
	Linux 4.14.7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-04 13:47:49 +01:00
Chenbo Feng
aa284cdfbf ANDROID: netfilter: xt_qtaguid: Add untag hacks to inet_release function
To prevent the potential risk of a memory leak caused by closing a socket
without untagging it from the qtaguid module, the qtaguid module no longer
holds any socket file reference count. Instead, it increases the sk_refcnt of
the sk struct to prevent reuse of the socket pointer. When a socket
is released, it deletes the tag if the socket was previously tagged, so
no more resources are held by the xt_qtaguid module. A flag is added to the
untag process to prevent a possible kernel crash caused by failing to delete
the corresponding socket_tag_entry list.
Bug: 36374484
Test: compile and run test under system/extra/test/iptables,
      run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest

Signed-off-by: Chenbo Feng <fengc@google.com>
Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52
2017-12-18 21:11:22 +05:30
Chia-chi Yeh
d846068e98 ANDROID: net: paranoid: Replace AID_NET_RAW checks with capable(CAP_NET_RAW).
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2017-12-18 21:11:22 +05:30
Robert Love
7bafcbf59a ANDROID: net: Paranoid network.
With CONFIG_ANDROID_PARANOID_NETWORK, require specific uids/gids to instantiate
network sockets.

Signed-off-by: Robert Love <rlove@google.com>

paranoid networking: Use in_egroup_p() to check group membership

The previous group_search() caused trouble for partners with module builds.
in_egroup_p() is also cleaner.

Signed-off-by: Nick Pelly <npelly@google.com>

Fix 2.6.29 build.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

net: Fix compilation of the IPv6 module

Fix compilation of the IPv6 module -- current->euid does not exist anymore,
current_euid() is what needs to be used.

Signed-off-by: Steinar H. Gunderson <sesse@google.com>

net: bluetooth: Remove the AID_NET_BT* gid numbers

Removed bluetooth checks for AID_NET_BT and AID_NET_BT_ADMIN
which are not useful anymore.
This is in preparation for getting rid of all the AID_* gids.

Change-Id: I879d7181f07532784499ef152288d12a03ab6354
Signed-off-by: JP Abgrall <jpa@google.com>

[AmitP: Folded following android-4.9 commit changes into this patch
        a2624d7b9d73 ("ANDROID: Add android_aid.h")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
Willem de Bruijn
60335608e2 net: accept UFO datagrams from tuntap and packet
[ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ]

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -u -p 8000 &
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 < payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17 15:07:58 +01:00
David Ahern
a8e3bb347d net: Add comment that early_demux can change via sysctl
Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2a8 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054137 and then 03157937fe0b5 ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea67.

Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:17:29 -07:00
David Ahern
03157937fe Revert "ipv4: make net_protocol const"
This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.

Early demux structs can not be made const. Doing so results in:
[   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
[   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
[   84.970544] PGD 1a0a067
[   84.970546] P4D 1a0a067
[   84.971212] PUD 1a0b063
[   84.971733] PMD 80000000016001e1

[   84.972669] Oops: 0003 [#1] SMP
[   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
[   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
[   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
[   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
[   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
[   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
[   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
[   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
[   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
[   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
[   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
[   84.984817] Call Trace:
[   84.985133]  proc_tcp_early_demux+0x29/0x30

I think this is the second time such a patch has been reverted.

Cc: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 14:30:46 -07:00
Bhumika Goyal
aa8db499ea ipv4: make net_protocol const
Make these const as they are only passed to a const argument of the
function inet_add_protocol.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:30:02 -07:00
David S. Miller
3b2b69efec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mainline had UFO fixes, but UFO is removed in net-next so we
take the HEAD hunks.

Minor context conflict in bcmsysport statistics bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10 12:11:16 -07:00
Nikolay Borisov
1714020e42 igmp: Fix regression caused by igmp sysctl namespace code.
Commit dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
function is only called as part of per-namespace initialization, and only if
CONFIG_IP_MULTICAST is defined; otherwise the igmp_mc_init() call in ip_init is
compiled out, causing the igmp pernet ops to not be registered and those sysctls
to be left initialized to 0. However, there are certain functions, such as
ip_mc_join_group which are always compiled and make use of some of those
sysctls. Let's do a partial revert of the aforementioned commit and move the
sysctl initialization into inet_init_net, that way they will always have
sane values.

Fixes: dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595
Reported-by: Gerardo Exequiel Pozzi <vmlinuz386@gmail.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 22:46:44 -07:00
Tonghao Zhang
93b1b31f87 ipv4: Introduce ipip_offload_init helper function.
It's convenient to init ipip offload. We will check
the return value, and print KERN_CRIT info on failure.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:27:07 -07:00
Tom Herbert
306b13eb3c proto_ops: Add locked held versions of sendmsg and sendpage
Add new proto_ops sendmsg_locked and sendpage_locked that can be
called when the socket lock is already held. Correspondingly, add
kernel_sendmsg_locked and kernel_sendpage_locked as front end
functions.

These functions will be used in zero proxy so that we can take
the socket lock in a ULP sendmsg/sendpage and then directly call the
backend transport proto_ops functions.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
David S. Miller
880388aa3c net: Remove all references to SKB_GSO_UDP.
Such packets are no longer possible.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
Reshetova, Elena
14afee4b60 net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
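
As a rough illustration of the overflow concern above, here is a small
userspace model in Python. It is only a sketch: the class names are made up
for this example, and the saturating behaviour is a simplification of what
refcount_t does, not the kernel implementation.

```python
UINT_MAX = 2**32 - 1


class WrappingCounter:
    """Toy model of a plain 32-bit counter that wraps on overflow."""

    def __init__(self, value=1):
        self.value = value

    def inc(self):
        # Modular arithmetic: UINT_MAX + 1 wraps around to 0.
        self.value = (self.value + 1) & UINT_MAX


class SaturatingRefcount:
    """Toy model of a reference counter that saturates instead of wrapping."""

    def __init__(self, value=1):
        self.value = value

    def inc(self):
        # Once saturated, the counter stays pinned at UINT_MAX.
        if self.value != UINT_MAX:
            self.value += 1

    def dec_and_test(self):
        # A saturated counter never reports "last reference dropped",
        # so the object is leaked rather than freed while still in use.
        if self.value == UINT_MAX:
            return False
        self.value -= 1
        return self.value == 0
```

With the wrapping counter, one increment too many makes the count look like
zero, which is the situation that can end in a use-after-free. The saturating
variant trades that for a leak.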

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 07:39:08 -07:00
Eric Dumazet
77d4b1d369 net: ping: do not abuse udp_poll()
Alexander reported various KASAN messages triggered in recent kernels.

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 22:56:55 -04:00
Linus Torvalds
8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfilter network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Steffen Klassert
9b83e03198 ipv4: Don't pass IP fragments to upper layer GRO handlers.
Upper layer GRO handlers can not handle IP fragments, so
exit GRO processing in this case.

This fixes ESP GRO because the packet must be reassembled
before we can decapsulate, otherwise we get authentication
failures.

It also aligns IPv4 to IPv6 where packets with fragmentation
headers are not passed to upper layer GRO handlers.
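
The fragment test itself can be sketched in Python as below. This is an
illustration only: it assumes the frag_off field has already been converted
to host byte order, and the function name is chosen for this example.

```python
IP_MF = 0x2000      # "more fragments" flag in the IPv4 frag_off field
IP_OFFSET = 0x1FFF  # mask for the 13-bit fragment offset


def is_fragment(frag_off: int) -> bool:
    """Return True if the IPv4 header describes a fragment.

    A packet is a fragment when more fragments follow (MF set) or when it
    sits at a non-zero offset. The "don't fragment" bit (0x4000) alone does
    not make a packet a fragment.
    """
    return (frag_off & (IP_MF | IP_OFFSET)) != 0
```

A GRO handler following the approach described above would stop aggregating
and hand the packet to the normal receive path whenever this test is true.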

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-28 16:00:38 -04:00
subashab@codeaurora.org
dddb64bcb3 net: Add sysctl to toggle early demux for tcp and udp
Certain systems process significant unconnected UDP workloads.
It would be preferable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP demux, we see these slight gains on an ARM64 system-
782 -> 788Mbps unconnected single stream UDPv4
633 -> 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecture and cache
sizes. There will not be much difference seen if the entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1->v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2->v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

v3->v4: Store and use read once result instead of querying pointer
again incorrectly.

v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Tom Herbert <tom@herbertland.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:17:07 -07:00
David Howells
cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_tcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.
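
     The effect of doubling the keys can be sketched with a toy model in
     Python. This is not how lockdep is implemented. It only records
     "A must be taken before B" orderings between lock classes and looks for
     a cycle. The "k-" prefix for kernel-created sockets and all function
     names are assumptions made for this illustration.

```python
def lock_class(family, kern, split_by_kern):
    """Name the lock class for a socket, optionally split by creator."""
    prefix = "k-" if (kern and split_by_kern) else ""
    return f"{prefix}sk_lock-{family}"


def has_cycle(edges):
    """Depth-first search for a cycle in a 'taken before' graph."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())
    state = dict.fromkeys(graph, 0)  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1:
                return True
            if state[nxt] == 0 and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)


def observed_orderings(split_by_kern):
    """The three orderings from the theory above, as class-level edges."""
    rxrpc = lock_class("AF_RXRPC", True, split_by_kern)   # opened by AFS
    udp = lock_class("AF_INET", True, split_by_kern)      # internal UDP socket
    tcp = lock_class("AF_INET", False, split_by_kern)     # userspace TCP socket
    return [("mmap_sem", rxrpc), (rxrpc, udp), (tcp, "mmap_sem")]
```

     With a single class per address family the three orderings close a
     loop. Once kernel and userspace sockets get separate classes, the
     internal UDP socket and the userspace TCP socket no longer share a
     node, and the loop disappears.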

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
Paolo Abeni
294acf1c01 net/tunnel: set inner protocol in network gro hooks
The gso code of several tunnel types (gre and udp tunnels)
takes for granted that skb->inner_protocol is properly
initialized and drops the packet otherwise.

On the forwarding path no one is initializing such field,
so gro encapsulated packets are dropped on forward.

Since commit 38720352412a ("gre: Use inner_proto to obtain
inner header protocol"), this can be reproduced when the
encapsulated packets use gre as the tunneling protocol.

The issue happens also with vxlan and geneve tunnels since
commit 8bce6d7d0d1e ("udp: Generalize skb_udp_segment"), if the
forwarding host's ingress nic has h/w offload for such tunnel
and a vxlan/geneve device is configured on top of it, regardless
of the configured peer address and vni.

To address the issue, this change initializes the inner_protocol
field for encapsulated packets in both ipv4 and ipv6 gro complete
callbacks.

Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 13:19:52 -08:00
Steffen Klassert
5f114163f2 net: Add a skb_gro_flush_final helper.
Add a skb_gro_flush_final helper to prepare for consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcoming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronously. The handler is used in all gro_receive functions
that can call the ESP gro handlers.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:39 +01:00
Willy Tarreau
3979ad7e82 net/tcp-fastopen: make connect()'s return case more consistent with non-TFO
Without TFO, any subsequent connect() call after a successful one returns
-1 EISCONN. The last API update ensured that __inet_stream_connect() can
return -1 EINPROGRESS in response to sendmsg() when TFO is in use to
indicate that the connection is now in progress. Unfortunately since this
function is used both for connect() and sendmsg(), it has the undesired
side effect of making connect() now return -1 EINPROGRESS as well after
a successful call, while at the same time poll() returns POLLOUT. This
can confuse some applications which happen to call connect() and to
check for -1 EISCONN to ensure the connection is usable, and for which
EINPROGRESS indicates a need to poll, causing a loop.

This problem was encountered in haproxy where a call to connect() is
precisely used in certain cases to confirm a connection's readiness.
While arguably haproxy's behaviour should be improved here, it seems
important to aim at a more robust behaviour when the goal of the new
API is to make it easier to implement TFO in existing applications.

This patch simply ensures that we preserve the same semantics as in
the non-TFO case on the connect() syscall when using TFO, while still
returning -1 EINPROGRESS on sendmsg(). For this we simply tell
__inet_stream_connect() whether we're doing a regular connect() or in
fact connecting for a sendmsg() call.
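
The application pattern described above, calling connect() a second time and
treating EISCONN as "ready", can be sketched in Python without TFO. This only
shows the baseline non-TFO semantics that the patch aims to preserve. The
function name is made up for this example, and the sketch assumes a Linux
host with a usable loopback interface.

```python
import socket


def repeated_connect_results():
    """Connect twice to a loopback listener and return both error codes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick one
        listener.listen(1)
        addr = listener.getsockname()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect_ex() returns 0 on success or an errno value.
            first = client.connect_ex(addr)
            second = client.connect_ex(addr)
    return first, second
```

On Linux the first call is expected to return 0 and the second
errno.EISCONN. An application that polls on EINPROGRESS and only stops on
EISCONN would loop if the second call returned EINPROGRESS instead.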

Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:12:21 -05:00
Wei Wang
19f6d3f3c8 net/tcp-fastopen: Add new API support
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications that want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences is not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
  s = socket()
    create a new socket
  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
    newly introduced sockopt
    If set, new functionality described below will be used.
    Return ENOTSUPP if TFO is not supported or not enabled in the
    kernel.

  connect()
    With cookie present, return 0 immediately.
    With no cookie, initiate 3WHS with TFO cookie-request option and
    return -1 with errno = EINPROGRESS.

  write()/sendmsg()
    With cookie present, send out SYN with data and return the number of
    bytes buffered.
    With no cookie, and 3WHS not yet completed, return -1 with errno =
    EINPROGRESS.
    No MSG_FASTOPEN flag is needed.

  read()
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
    write() is not called yet.
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
    established but no msg is received yet.
    Return number of bytes read if socket is established and there is
    msg received.

The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.
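
From userspace, the extra setsockopt() call might look like the Python sketch
below. The helper name is an assumption for this example. The numeric
fallback 30 is the value of TCP_FASTOPEN_CONNECT in Linux's
include/uapi/linux/tcp.h and is used only when the Python socket module does
not export the constant.

```python
import socket

# Fall back to the Linux UAPI value if this Python build lacks the constant.
TCP_FASTOPEN_CONNECT = getattr(socket, "TCP_FASTOPEN_CONNECT", 30)


def enable_tfo_connect(sock):
    """Try to enable TCP_FASTOPEN_CONNECT; return True on success.

    Returns False when the kernel rejects the option, for example because
    it predates this socket option or because Fast Open is disabled.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1)
    except OSError:
        return False
    return True
```

A client would call enable_tfo_connect() right after socket() and then keep
its usual connect() followed by write() sequence unchanged.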

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:38 -05:00
Krister Johansen
4548b683b7 Introduce a sysctl that modifies the value of PROT_SOCK.
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace.  To
disable all privileged ports set this to zero.  It also checks for
overlap with the local port range.  The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to privileged ports, but prevent them from ever being allowed to modify
their container's network configuration.  The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace.  This modification was needed to allow the container manager
to disable a namespace's privileged port restrictions without exposing
control of the network namespace to processes in the user namespace.
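
A minimal Python sketch of how the new knob could be read and interpreted
from userspace follows. The helper names are hypothetical, the /proc path is
derived from the sysctl name above, and the overlap rule is only one reading
of "the privileged and local range may not overlap".

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_unprivileged_port_start")
DEFAULT_START = 1024  # traditional PROT_SOCK value, used as a fallback


def read_unprivileged_port_start(path=SYSCTL_PATH):
    """Read the sysctl, falling back to 1024 if it is unavailable."""
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return DEFAULT_START


def needs_privilege(port, unprivileged_start):
    """True if binding to `port` requires extra privilege.

    Port 0 means "let the kernel choose" and never needs privilege.
    """
    return 0 < port < unprivileged_start


def overlaps_local_range(unprivileged_start, local_range_low):
    """True if the privileged range would reach into the local port range."""
    return local_range_low < unprivileged_start
```

Setting the start to zero makes needs_privilege() false for every port, which
matches the "disable all privileged ports" case described above.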

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 12:10:51 -05:00
Haishuang Yan
1946e672c1 ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob
Applications in different namespaces might require fast recycling
of TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:38:31 -05:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
David S. Miller
2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkmann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
David Ahern
6102365876 bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB, programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:08 -05:00
Arnaldo Carvalho de Melo
a510887824 GSO: Reload iph after pskb_may_pull
As it may get stale and lead to use after free.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <aduyck@mirantis.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 20:45:54 -05:00
WANG Cong
14135f30e3 inet: fix sleeping inside inet_wait_for_connect()
Andrey reported this kernel warning:

  WARNING: CPU: 0 PID: 4608 at kernel/sched/core.c:7724
  __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
  do not call blocking ops when !TASK_RUNNING; state=1 set at
  [<ffffffff811f5a5c>] prepare_to_wait+0xbc/0x210
  kernel/sched/wait.c:178
  Modules linked in:
  CPU: 0 PID: 4608 Comm: syz-executor Not tainted 4.9.0-rc2+ #320
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   ffff88006625f7a0 ffffffff81b46914 ffff88006625f818 0000000000000000
   ffffffff84052960 0000000000000000 ffff88006625f7e8 ffffffff81111237
   ffff88006aceac00 ffffffff00001e2c ffffed000cc4beff ffffffff84052960
  Call Trace:
   [<     inline     >] __dump_stack lib/dump_stack.c:15
   [<ffffffff81b46914>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
   [<ffffffff81111237>] __warn+0x1a7/0x1f0 kernel/panic.c:550
   [<ffffffff8111132c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:565
   [<ffffffff811922fc>] __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
   [<     inline     >] slab_pre_alloc_hook mm/slab.h:393
   [<     inline     >] slab_alloc_node mm/slub.c:2634
   [<     inline     >] slab_alloc mm/slub.c:2716
   [<ffffffff81508da0>] __kmalloc_track_caller+0x150/0x2a0 mm/slub.c:4240
   [<ffffffff8146be14>] kmemdup+0x24/0x50 mm/util.c:113
   [<ffffffff8388b2cf>] dccp_feat_clone_sp_val.part.5+0x4f/0xe0 net/dccp/feat.c:374
   [<     inline     >] dccp_feat_clone_sp_val net/dccp/feat.c:1141
   [<     inline     >] dccp_feat_change_recv net/dccp/feat.c:1141
   [<ffffffff8388d491>] dccp_feat_parse_options+0xaa1/0x13d0 net/dccp/feat.c:1411
   [<ffffffff83894f01>] dccp_parse_options+0x721/0x1010 net/dccp/options.c:128
   [<ffffffff83891280>] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644
   [<ffffffff838b8a94>] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681
   [<     inline     >] sk_backlog_rcv ./include/net/sock.h:872
   [<ffffffff82b7ceb6>] __release_sock+0x126/0x3a0 net/core/sock.c:2044
   [<ffffffff82b7d189>] release_sock+0x59/0x1c0 net/core/sock.c:2502
   [<     inline     >] inet_wait_for_connect net/ipv4/af_inet.c:547
   [<ffffffff8316b2a2>] __inet_stream_connect+0x5d2/0xbb0 net/ipv4/af_inet.c:617
   [<ffffffff8316b8d5>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:656
   [<ffffffff82b705e4>] SYSC_connect+0x244/0x2f0 net/socket.c:1533
   [<ffffffff82b72dd4>] SyS_connect+0x24/0x30 net/socket.c:1514
   [<ffffffff83fbf701>] entry_SYSCALL_64_fastpath+0x1f/0xc2
  arch/x86/entry/entry_64.S:209

Unlike commit 26cabd31259ba43f68026ce3f62b78094124333f
("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the
sleeping function is called before schedule_timeout(), so this is indeed
a bug. Fix this by moving the wait logic to the new API; it is similar
to commit ff960a731788a7408b6f66ec4fd772ff18833211
("netdev, sched/wait: Fix sleeping inside wait event").

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-03 15:18:07 -04:00
Sabrina Dubroca
fcd91dd449 net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
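
The idea can be sketched with a toy model in Python. This is not the kernel
code: the function name and return strings are made up for this
illustration, and the limit of 15 is an assumption about the constant the
patch uses.

```python
GRO_RECURSION_LIMIT = 15  # assumed value, for illustration only


def gro_receive(headers, depth=0, limit=GRO_RECURSION_LIMIT):
    """Walk nested protocol headers, one recursive call per header.

    Returns "aggregate" if every header was handled within the limit, or
    "flush" if the recursion counter hit the limit, in which case GRO is
    given up for this packet and it is processed normally.
    """
    if not headers:
        return "aggregate"
    depth += 1  # the per-packet recursion counter
    if depth == limit:
        return "flush"
    return gro_receive(headers[1:], depth, limit)
```

A packet made of a thousand stacked VLAN headers is abandoned after a bounded
number of nested calls instead of recursing once per header.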

Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:32:22 -04:00
Steffen Klassert
07b26c9454 gso: Support partial splitting at the frag_list pointer
Since commit 8a29111c7 ("net: gro: allow to build full sized skb")
gro may build buffers with a frag_list. This can hurt forwarding
because most NICs can't offload such packets, so they need to be
segmented in software. This patch splits buffers with a frag_list
at the frag_list pointer into buffers that can be TSO offloaded.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 20:59:34 -04:00
Tom Herbert
3203558589 tcp: Set read_sock and peek_len proto_ops
In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to
tcp_peek_len (which is just a stub function that calls tcp_inq).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:41 -04:00
Yuchung Cheng
cebc5cbab4 net-tcp: retire TFO_SERVER_WO_SOCKOPT2 config
TFO_SERVER_WO_SOCKOPT2 was intended for debugging purposes during
Fast Open development. Remove this config option and also
update/clean-up the documentation of the Fast Open sysctl.

Reported-by: Piotr Jurkiewicz <piotr.jerzy.jurkiewicz@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:01:01 -07:00
Paul Gortmaker
d3fc0353f7 ipv4: af_inet: make it explicitly non-modular
The Makefile controlling compilation of this file is obj-y,
meaning that it currently is never being built as a module.

Since MODULE_ALIAS is a no-op for non-modular code, we can simply
remove the MODULE_ALIAS_NETPROTO variant used here.

We replace module.h with kmod.h since the file does make use of
request_module() in order to load other modules from here.

We don't have to worry about init.h coming in via the removed
module.h since the file explicitly includes init.h already.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 22:44:26 -07:00
Ezequiel Garcia
049bbf589e ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n
Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).

The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The default TTL may be
used in ip_select_ttl, through the following flow:

  ip_select_ttl
    ip4_dst_hoplimit
      net->ipv4.sysctl_ip_default_ttl

This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in inet_init_net, called during ipv4's initialization.

Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).

Given a similar issue might appear on the other knobs that were
namespaceified, this commit also moves them.
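
The failure mode can be reproduced with a toy model in Python. The class and
function names below only mirror the flow quoted above and are otherwise
made up. The sketch assumes per-namespace fields start out as zero and uses
64 as the default TTL value.

```python
IPDEFTTL = 64  # default TTL assumed for this sketch


class Net:
    """Stand-in for a network namespace; fields start at zero."""

    def __init__(self):
        self.sysctl_ip_default_ttl = 0


def setup_namespace(config_sysctl, with_fix):
    """Initialise a namespace before or after the fix."""
    net = Net()
    if with_fix:
        # After the fix: assigned unconditionally during IPv4 namespace init.
        net.sysctl_ip_default_ttl = IPDEFTTL
    elif config_sysctl:
        # Before the fix: assigned only by the sysctl setup code.
        net.sysctl_ip_default_ttl = IPDEFTTL
    return net


def ip_select_ttl(net, socket_ttl=-1):
    """Use the socket's TTL if set, else the namespace default."""
    # A negative socket TTL means "not set explicitly".
    return socket_ttl if socket_ttl >= 0 else net.sysctl_ip_default_ttl
```

Without the fix and without sysctl support the selected TTL is 0 unless the
socket sets one explicitly. With the fix it is 64 in both configurations.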

Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 14:32:06 -07:00