450 Commits

Author SHA1 Message Date
Jaegeuk Kim
8b312f9552 Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.14.y/v5.5-rc1' into android-4.14
* origin/upstream-f2fs-stable-linux-4.14.y:
  fs/lock: skip lock owner pid translation in case we are in init_pid_ns
  f2fs: stop GC when the victim becomes fully valid
  f2fs: expose main_blkaddr in sysfs
  f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
  f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
  f2fs: show f2fs instance in printk_ratelimited
  f2fs: fix potential overflow
  f2fs: fix to update dir's i_pino during cross_rename
  f2fs: support aligned pinned file
  f2fs: avoid kernel panic on corruption test
  f2fs: fix wrong description in document
  f2fs: cache global IPU bio
  f2fs: fix to avoid memory leakage in f2fs_listxattr
  f2fs: check total_segments from devices in raw_super
  f2fs: update multi-dev metadata in resize_fs
  f2fs: mark recovery flag correctly in read_raw_super_block()
  f2fs: fix to update time in lazytime mode
  vfs: don't allow writes to swap files
  mm: set S_SWAPFILE on blockdev swap devices

Bug: 146023540
Change-Id: Ia24ce5f48f245dd7ba4fd94aa00a7d84615a8b22
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-12-16 17:39:13 -08:00
Darrick J. Wong
f1a7f58cf4 vfs: don't allow writes to swap files
Don't let userspace write to an active swap file because the kernel
effectively has a long term lease on the storage and things could get
seriously corrupted if we let this happen.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-12-02 15:55:55 -08:00
Catalin Marinas
424beb9af4 UPSTREAM: mm: untag user pointers in mmap/munmap/mremap/brk
(Upstream commit ce18d171cb7368557e6498a3ce111d7d3dc03e4d).

There isn't a good reason to differentiate between the user address space
layout modification syscalls and the other memory permission/attributes
ones (e.g.  mprotect, madvise) w.r.t.  the tagged address ABI.  Untag the
user addresses on entry to these functions.

Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Dave P Martin <Dave.Martin@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: If6a7be6d5f39f411bd65c49100088d0ceae530ee
2019-10-07 14:56:04 -04:00
Greg Kroah-Hartman
c680586c4f This is the 4.14.114 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzEBk4ACgkQONu9yGCS
 aT7oPg/+LqGEp+af4Q2623Y5tzG+pV580Xzzeyu+ZulmfTiG8yylSCxtVKvzjlmf
 omeCYxZXCNDtOn1aWFWvM+cZlNC90gOem2Xm2P7KEx25QZflFFI+Uzt+7sKrLr1l
 v/6YOf2cjvfOAlYF6euI98Ja6+m+OWXhWDUQUEUbl0X8Of2pXW9opWsf13LKT/BT
 p9WpVjDN+pow1kGl1Sk4zu11LBZsN0PI5ZW64PTSG2AuSIMQ9pHZzxrGD7/vhQMC
 50s2WsJxlIvuE3tmWDnpqfR0WjzaUk59hHrrBM9YLDlqjzFZNgD2ziRn0A0sfW1n
 us81cw6Wz+LcykK3D2qvIvhZkRkDVI7J6LQSzeNaBWl3AkEEjwYw3cSwD5jl5+xn
 cbTgaBjKursuBZU5rdXPcabAhFIlL6NIt43n6DYRl/MYSpFvzifLKnCso2fPNNgT
 lXZuwH1qDBepVVQ0YrTnOBf+7u822lPuGyIq1Nz4YUBhKAAlBTV/Hxv3gJCXTihO
 6NW42qk44VLjmu/Gpo5Q4Nc6EWeujwZRXNEZo8m5YfV92VteJTs3520iPRB0qFga
 aPOyiMNIKyhzZ3CPxxkDXgeRDh7AFznwcljlDE6DiCVmbPaUucJkvad/TwyFf4ul
 Wp1zZ2aCrt/oO5GK/MQfGNh4rmN/0qB9cxYoBDWbOJSG4R1+PTI=
 =dQgB
 -----END PGP SIGNATURE-----

Merge 4.14.114 into android-4.14

Changes in 4.14.114
	bonding: fix event handling for stacked bonds
	net: atm: Fix potential Spectre v1 vulnerabilities
	net: bridge: fix per-port af_packet sockets
	net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
	net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
	tcp: tcp_grow_window() needs to respect tcp_space()
	team: set slave to promisc if team is already in promisc mode
	vhost: reject zero size iova range
	ipv4: recompile ip options in ipv4_link_failure
	ipv4: ensure rcu_read_lock() in ipv4_link_failure()
	net: thunderx: raise XDP MTU to 1508
	net: thunderx: don't allow jumbo frames with XDP
	CIFS: keep FileInfo handle live during oplock break
	KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
	KVM: x86: svm: make sure NMI is injected after nmi_singlestep
	Staging: iio: meter: fixed typo
	staging: iio: ad7192: Fix ad7193 channel address
	iio: gyro: mpu3050: fix chip ID reading
	iio/gyro/bmg160: Use millidegrees for temperature scale
	iio: cros_ec: Fix the maths for gyro scale calculation
	iio: ad_sigma_delta: select channel when reading register
	iio: dac: mcp4725: add missing powerdown bits in store eeprom
	iio: Fix scan mask selection
	iio: adc: at91: disable adc channel interrupt in timeout case
	iio: core: fix a possible circular locking dependency
	io: accel: kxcjk1013: restore the range after resume.
	staging: comedi: vmk80xx: Fix use of uninitialized semaphore
	staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
	staging: comedi: ni_usb6501: Fix use of uninitialized mutex
	staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
	ALSA: hda/realtek - add two more pin configuration sets to quirk table
	ALSA: core: Fix card races between register and disconnect
	scsi: core: set result when the command cannot be dispatched
	Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
	Revert "svm: Fix AVIC incomplete IPI emulation"
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	crypto: x86/poly1305 - fix overflow during partial reduction
	arm64: futex: Restore oldval initialization to work around buggy compilers
	x86/kprobes: Verify stack frame on kretprobe
	kprobes: Mark ftrace mcount handler functions nokprobe
	kprobes: Fix error check when reusing optimized probes
	rt2x00: do not increment sequence number while re-transmitting
	mac80211: do not call driver wake_tx_queue op during reconfig
	perf/x86/amd: Add event map for AMD Family 17h
	x86/cpu/bugs: Use __initconst for 'const' init data
	perf/x86: Fix incorrect PEBS_REGS
	x86/speculation: Prevent deadlock on ssb_state::lock
	crypto: crypto4xx - properly set IV after de- and encrypt
	mmc: sdhci: Fix data command CRC error handling
	mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR
	mmc: sdhci: Handle auto-command errors
	modpost: file2alias: go back to simple devtable lookup
	modpost: file2alias: check prototype of handler
	tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
	ipv6: frags: fix a lockdep false positive
	net: IP defrag: encapsulate rbtree defrag code into callable functions
	ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module
	net: IP6 defrag: use rbtrees for IPv6 defrag
	net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
	Revert "kbuild: use -Oz instead of -Os when using clang"
	sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
	device_cgroup: fix RCU imbalance in error case
	mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
	ALSA: info: Fix racy addition/deletion of nodes
	percpu: stop printing kernel addresses
	tools include: Adopt linux/bits.h
	iomap: report collisions between directio and buffered writes to userspace
	xfs: add the ability to join a held buffer to a defer_ops
	xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
	i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array
	Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
	kernel/sysctl.c: fix out-of-bounds access when setting file-max
	Linux 4.14.114

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-30 12:56:41 +02:00
Andrea Arcangeli
bb461ad8e6 coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:35:37 +02:00
Greg Kroah-Hartman
9ba09a2171 This is the 4.14.105 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx+qpsACgkQONu9yGCS
 aT6+Aw//WiHXgsyk+jnwuOzJ0rvx38ky6l7RRm6uZ5r2M+aYQJ12FEc+AVomyu6T
 lA7ciLeIWNQFO0xQJ5Tg8LMPumL4JbA/EVElPu5h5yHMXySkQuC6WLXnH083d+dZ
 QbXEk6O9xuIxbk2sy58BmQnmqdXipTnpNIXpT/137Rlz1/jzxyv8PpsihitFI9/C
 3HrMHgSodKEOnZbWdruYv30Ac6nMrWolGmGaQiRszE7FfoFLyGl3KEX7kp6sMAKU
 P4L+9qUDswRVs4Zfa07XiNlXAmZUYDzZPkQVstsgKOmXNHtBrSLE94pYllsHqW8C
 GDbtW5+ZpUm89LObhlZWPcxLXQkOUJUOb2SJChtaPUmOT0rEoExTB0bH1SADQM5d
 Mb3jLGD9CVrCjgxB4J6871xeCZPIL1XIXEKs0lDz7YDOSF7WTYXTrQOAiA+PZLlR
 VcD7WO7Z25XYP15WcU4ypzswsLPGro3QQi/nzk9qyh/NQt65LlpW/USZ3hiyp/Sh
 u4Wh2rrvAWyrOi01ShRmn87S9X/IuzB9cuzthxM7rWrF+/uj+vHOQp+/BAEFkkub
 njis6oFbkB2xfykRA99w1tLe6to3qNcWFtJuOBxjVFdbB24pqIWYmj4dwyE0TQCD
 7QLfDCXx3vBDAg30Qbxx4Bh2w0ruggtHWa+D24Qz4YehJv1ppNY=
 =Ofi4
 -----END PGP SIGNATURE-----

Merge 4.14.105 into android-4.14

Changes in 4.14.105
	Revert "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"
	Revert "loop: Get rid of loop_index_mutex"
	Revert "loop: Fold __loop_release into loop_release"
	net: stmmac: Fix reception of Broadcom switches tags
	net: stmmac: Disable ACS Feature for GMAC >= 4
	scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
	drm/msm: Unblock writer if reader closes file
	ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
	ALSA: compress: prevent potential divide by zero bugs
	ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
	clk: vc5: Abort clock configuration without upstream clock
	thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
	usb: dwc3: gadget: synchronize_irq dwc irq in suspend
	usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
	usb: gadget: Potential NULL dereference on allocation error
	genirq: Make sure the initial affinity is not empty
	ASoC: dapm: change snprintf to scnprintf for possible overflow
	ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
	selftests: seccomp: use LDLIBS instead of LDFLAGS
	selftests: gpio-mockup-chardev: Check asprintf() for error
	ARC: fix __ffs return value to avoid build warnings
	drivers: thermal: int340x_thermal: Fix sysfs race condition
	staging: rtl8723bs: Fix build error with Clang when inlining is disabled
	mac80211: fix miscounting of ttl-dropped frames
	sched/wait: Fix rcuwait_wake_up() ordering
	futex: Fix (possible) missed wakeup
	locking/rwsem: Fix (possible) missed wakeup
	drm/amd/powerplay: OD setting fix on Vega10
	serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
	staging: android: ion: Support cpu access during dma_buf_detach
	direct-io: allow direct writes to empty inodes
	writeback: synchronize sync(2) against cgroup writeback membership switches
	scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
	net: altera_tse: fix connect_local_phy error path
	hv_netvsc: Fix ethtool change hash key error
	net: usb: asix: ax88772_bind return error when hw_reset fail
	net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
	ibmveth: Do not process frames after calling napi_reschedule
	mac80211: don't initiate TDLS connection if station is not associated to AP
	mac80211: Add attribute aligned(2) to struct 'action'
	cfg80211: extend range deviation for DMG
	svm: Fix AVIC incomplete IPI emulation
	KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
	powerpc: Always initialize input array when calling epapr_hypercall()
	mmc: spi: Fix card detection during probe
	mmc: tmio_mmc_core: don't claim spurious interrupts
	mmc: tmio: fix access width of Block Count Register
	mmc: sdhci-esdhc-imx: correct the fix of ERR004536
	mm: enforce min addr even if capable() in expand_downwards()
	MIPS: fix truncation in __cmpxchg_small for short values
	MIPS: eBPF: Fix icache flush end address
	x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
	Linux 4.14.105

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-05 18:04:21 +01:00
Jann Horn
f581706924 mm: enforce min addr even if capable() in expand_downwards()
commit 0a1d52994d440e21def1c2174932410b4f2a98a1 upstream.

security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.

This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.

Fixes: 8869477a49c3 ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05 17:58:02 +01:00
Greg Kroah-Hartman
818299f6bd This is the 4.14.56 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltNuVYACgkQONu9yGCS
 aT7kTA/+MRHC5oFvdnhSsF6jAHsY9rgJNQXPtZCFhZnHhhYHtubQ2OJOmSZ7IfM0
 9yhz/7vijC9+tLufXQxQnu2UUL3ojNu1+l+q9s0U1GUzNiONlJ9q/CyB4xjXFRCS
 1RdiDZaQbIqUCYs38UCTsEJF65uKjzQ6dpF21XdIXp5FPxgiZawo4HpjQRJswbAl
 Du97ybMEPN3XnAn207GjZwy58ubRLF5HDG1sqNGfjVWJ7oMTi+QJOCvY3PJtU3j2
 unS0qjxLU432rOyDfaJK7Yj9s61zu0PurbJrHo+dw3O3hd/Og7soqoqohUEjZWXd
 z7jjrntXZOZ/0st2yHmygfAPUJm/8jsh7Pd39Jgyfeu/3Clo51gO494rwATQsyE5
 mwIdllyzyMNBEJI2F2fxE60WlFsbTjeBOX3BaOwnF8pGRJWsCAfbFknRbuKh1fO5
 czFbUSOi00POw4WHT1rxV9u0yDBXmP47fy9zHquOim+PfK8pFvWuf6GSFjvqRTv8
 20w1w7eixMi09ZXOkgTJ3S00MKHSpxoaenI3n2NcEVVRgDEVfh3C/zelvvfCDMHD
 i36DN39Sj41PNA/R4n0TIA4W+ab9qBVzQl16yaj9JURR2rA92GyMVC1+Xjqo1Py3
 GRFOf2Gprlm0/vfkiRsMu9coAJuKV6+8fHXQU4mzHulKUaDWuJ0=
 =/wBU
 -----END PGP SIGNATURE-----

Merge 4.14.56 into android-4.14

Changes in 4.14.56
	media: rc: mce_kbd decoder: fix stuck keys
	ASoC: mediatek: preallocate pages use platform device
	MIPS: Call dump_stack() from show_regs()
	MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
	MIPS: Fix ioremap() RAM check
	mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
	mmc: dw_mmc: fix card threshold control configuration
	ibmasm: don't write out of bounds in read handler
	staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
	staging: r8822be: Fix RTL8822be can't find any wireless AP
	ata: Fix ZBC_OUT command block check
	ata: Fix ZBC_OUT all bit handling
	vmw_balloon: fix inflation with batching
	ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
	USB: serial: ch341: fix type promotion bug in ch341_control_in()
	USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
	USB: serial: keyspan_pda: fix modem-status error handling
	USB: yurex: fix out-of-bounds uaccess in read handler
	USB: serial: mos7840: fix status-register error handling
	usb: quirks: add delay quirks for Corsair Strafe
	xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
	devpts: hoist out check for DEVPTS_SUPER_MAGIC
	devpts: resolve devpts bind-mounts
	Fix up non-directory creation in SGID directories
	genirq/affinity: assign vectors to all possible CPUs
	scsi: megaraid_sas: use adapter_type for all gen controllers
	scsi: megaraid_sas: replace instance->ctrl_context checks with instance->adapter_type
	scsi: megaraid_sas: replace is_ventura with adapter_type checks
	scsi: megaraid_sas: Create separate functions to allocate ctrl memory
	scsi: megaraid_sas: fix selection of reply queue
	ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
	ALSA: hda - Handle pm failure during hotplug
	mm: do not drop unused pages when userfaultd is running
	fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
	fs, elf: make sure to page align bss in load_elf_library
	mm: do not bug_on on incorrect length in __mm_populate()
	tracing: Reorder display of TGID to be after PID
	kbuild: delete INSTALL_FW_PATH from kbuild documentation
	arm64: neon: Fix function may_use_simd() return error status
	tools build: fix # escaping in .cmd files for future Make
	IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
	i2c: tegra: Fix NACK error handling
	iw_cxgb4: correctly enforce the max reg_mr depth
	xen: setup pv irq ops vector earlier
	nvme-pci: Remap CMB SQ entries on every controller reset
	crypto: x86/salsa20 - remove x86 salsa20 implementations
	uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
	netfilter: nf_queue: augment nfqa_cfg_policy
	netfilter: x_tables: initialise match/target check parameter struct
	loop: add recursion validation to LOOP_CHANGE_FD
	PM / hibernate: Fix oops at snapshot_write()
	RDMA/ucm: Mark UCM interface as BROKEN
	loop: remember whether sysfs_create_group() was done
	f2fs: give message and set need_fsck given broken node id
	Linux 4.14.56

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-07-17 12:29:15 +02:00
Michal Hocko
81ebc9decd mm: do not bug_on on incorrect length in __mm_populate()
commit bb177a732c4369bb58a1fe1df8f552b6f0f7db5f upstream.

syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate

  kernel BUG at mm/gup.c:1242!
  invalid opcode: 0000 [#1] SMP
  CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
  RIP: 0010:__mm_populate+0x1e2/0x1f0
  Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
  Call Trace:
     vm_brk_flags+0xc3/0x100
     vm_brk+0x1f/0x30
     load_elf_library+0x281/0x2e0
     __ia32_sys_uselib+0x170/0x1e0
     do_fast_syscall_32+0xca/0x420
     entry_SYSENTER_compat+0x70/0x7f

The reason is that the length of the new brk is not page aligned when we
try to populate the it.  There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is
expanded as it should.  All we need is to tell mm_populate about it.
Besides that there is absolutely no reason to to bug_on in the first
place.  The worst thing that could happen is that the last page wouldn't
get populated and that is far from putting system into an inconsistent
state.

Fix the issue by moving the length sanitization code from do_brk_flags
up to vm_brk_flags.  The only other caller of do_brk_flags is brk
syscall entry and it makes sure to provide the proper length so t here
is no need for sanitation and so we can use do_brk_flags without it.

Also remove the bogus BUG_ONs.

[osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17 11:39:29 +02:00
Greg Kroah-Hartman
37f5b3d9c7 This is the 4.14.49 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlse4FIACgkQONu9yGCS
 aT5ZqxAAqCPguhXiTIWJPL0760M4I8C/cTLgl0JWpD946cHaQJpUhmuiXfc91+KO
 1mhgjW6PQjojWJmj5VYM2aCkvUoZ4sowtMaiGuhjLL1Hr+s879wI0wB+9uO2gy3O
 8si9gLV1qoa6QOAe5hbGiyOhkv0OYpRq290ar/NFVS+sKucwJEMFY/rLhjedNGlY
 BmT4C8xK5D/s/r6rRAGCyxQNar8RcFd7C+iEZXvQsybi4euJAd9DcpvileKORNxj
 bfBhHyFikbvN6Pfb6ooD9q88y8U3Tvk+sCZJI15acRePIJ4eJPtkjKkunsGms3jM
 cIA9WuC7NHSGMf8ZJAFAaxfNCepWpeb5lPSdkMuIjWhwQ22OhZR4eS+PQvhjojUl
 Z3Ry8fvxtMz4hseaeg+8mUJohLP1+v6GAIAGm9XRwElyTC0RKzKaUkK//zlrmmy3
 wR8vpgdPBZJlvqE61+aNWcnLpKxRp/aWMTklXNVBsnL2FVfl0+8iBz6ylDjfmAx5
 kRGdbWuCcHVfVq91fAwIwOIrHZg2Pi3/pIHQQXljqpSxG7vQrmPj2uiaYPTyAjZM
 LAVtE21TdtZX+4idHDZ8nNew204cr+ug1X373vugS8/NdLovpjb7vmxUusNy9ghG
 A5hif1fa8HEb3bCsUEBgZdkpD46uvk1IZc8js/e2zb5FmR3V8vM=
 =px2V
 -----END PGP SIGNATURE-----

Merge 4.14.49 into android-4.14

Changes in 4.14.49
	scsi: sd_zbc: Fix potential memory leak
	scsi: sd_zbc: Avoid that resetting a zone fails sporadically
	mmap: introduce sane default mmap limits
	mmap: relax file size limit for regular files
	btrfs: define SUPER_FLAG_METADUMP_V2
	kconfig: Avoid format overflow warning from GCC 8.1
	be2net: Fix error detection logic for BE3
	bnx2x: use the right constant
	dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
	enic: set DMA mask to 47 bit
	ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
	ip6_tunnel: remove magic mtu value 0xFFF8
	ipmr: properly check rhltable_init() return value
	ipv4: remove warning in ip_recv_error
	ipv6: omit traffic class when calculating flow hash
	isdn: eicon: fix a missing-check bug
	kcm: Fix use-after-free caused by clonned sockets
	netdev-FAQ: clarify DaveM's position for stable backports
	net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
	net: metrics: add proper netlink validation
	net/packet: refine check for priv area size
	net: phy: broadcom: Fix bcm_write_exp()
	net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
	packet: fix reserve calculation
	qed: Fix mask for physical address in ILT entry
	sctp: not allow transport timeout value less than HZ/5 for hb_timer
	team: use netdev_features_t instead of u32
	vhost: synchronize IOTLB message with dev cleanup
	vrf: check the original netdevice for generating redirect
	ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
	net: phy: broadcom: Fix auxiliary control register reads
	net-sysfs: Fix memory leak in XPS configuration
	virtio-net: correctly transmit XDP buff after linearizing
	net/mlx4: Fix irq-unsafe spinlock usage
	tun: Fix NULL pointer dereference in XDP redirect
	virtio-net: correctly check num_buf during err path
	net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
	virtio-net: fix leaking page for gso packet during mergeable XDP
	rtnetlink: validate attributes in do_setlink()
	cls_flower: Fix incorrect idr release when failing to modify rule
	PCI: hv: Do not wait forever on a device that has disappeared
	drm: set FMODE_UNSIGNED_OFFSET for drm files
	Linux 4.14.49

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-06-12 18:33:52 +02:00
Linus Torvalds
af760b568e mmap: relax file size limit for regular files
commit 423913ad4ae5b3e8fb8983f70969fb522261ba26 upstream.

Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was
introduced to catch problems in various ad-hoc character device drivers
doing mmap and getting the size limits wrong.  In the process, it used
"known good" limits for the normal cases of mapping regular files and
block device drivers.

It turns out that the "s_maxbytes" limit was less "known good" than I
thought.  In particular, /proc doesn't set it, but exposes one regular
file to mmap: /proc/vmcore.  As a result, that file got limited to the
default MAX_INT s_maxbytes value.

This went unnoticed for a while, because apparently the only thing that
needs it is the s390 kernel zfcpdump, but there might be other tools
that use this too.

Vasily suggested just changing s_maxbytes for all of /proc, which isn't
wrong, but makes me nervous at this stage.  So instead, just make the
new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
affect anything else.  It wasn't the regular file case I was worried
about.

I'd really prefer for maxsize to have been per-inode, but that is not
how things are today.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11 22:49:18 +02:00
Linus Torvalds
16d7ceb04b mmap: introduce sane default mmap limits
commit be83bbf806822b1b89e0a0f23cd87cddc409e429 upstream.

The internal VM "mmap()" interfaces are based on the mmap target doing
everything using page indexes rather than byte offsets, because
traditionally (ie 32-bit) we had the situation that the byte offset
didn't fit in a register.  So while the mmap virtual address was limited
by the word size of the architecture, the backing store was not.

So we're basically passing "pgoff" around as a page index, in order to
be able to describe backing store locations that are much bigger than
the word size (think files larger than 4GB etc).

But while this all makes a ton of sense conceptually, we've been dogged
by various drivers that don't really understand this, and internally
work with byte offsets, and then try to work with the page index by
turning it into a byte offset with "pgoff << PAGE_SHIFT".

Which obviously can overflow.

Adding the size of the mapping to it to get the byte offset of the end
of the backing store just exacerbates the problem, and if you then use
this overflow-prone value to check various limits of your device driver
mmap capability, you're just setting yourself up for problems.

The correct thing for drivers to do is to do their limit math in page
indices, the way the interface is designed.  Because the generic mmap
code _does_ test that the index doesn't overflow, since that's what the
mmap code really cares about.

HOWEVER.

Finding and fixing various random drivers is a sisyphean task, so let's
just see if we can just make the core mmap() code do the limiting for
us.  Realistically, the only "big" backing stores we need to care about
are regular files and block devices, both of which are known to do this
properly, and which have nice well-defined limits for how much data they
can access.

So let's special-case just those two known cases, and then limit other
random mmap users to a backing store that still fits in "unsigned long".
Realistically, that's not much of a limit at all on 64-bit, and on
32-bit architectures the only worry might be the GPU drivers, which can
have big physical address spaces.

To make it possible for drivers like that to say that they are 64-bit
clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the
file flags to allow drivers to mark their file descriptors as safe in
the full 64-bit mmap address space.

[ The timing for doing this is less than optimal, and this should really
  go in a merge window. But realistically, this needs wide testing more
  than it needs anything else, and being main-line is the only way to do
  that.

  So the earlier the better, even if it's outside the proper development
  cycle        - Linus ]

Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11 22:49:18 +02:00
Greg Kroah-Hartman
04f740d4da This is the 4.14.41 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlr753gACgkQONu9yGCS
 aT7p/Q//TIC9EKe21E2Lb1Kh4lL5SDjmwe/rkA3PxiqxbkXfUDBehMCfDk4YVNVG
 TlH1TXOubzpS/8cZJPRFHEkrYXPKIA3+hKlAvJukUJCBQqmW1ILEAX5m7jrSmf+B
 tLe/r0ijOtlfB1xQdUs5RxXGIndw0gMGhpo/QTXPAC0hGh0Ykd8v2s4YAjxOvdKw
 z4DaUKtZGEPBWFVK/Bx1Fv3iAmJMt2yerERUqz8MVegYXJt+2RUGoJtsxHuvOk1p
 9q0lzHBWYihQVt1tJ0es/8cB7WsYt8txnVmeN907sryUhDjvTWIxQJb5jEV0gxxK
 AL89PHy4Hfki6l6r+tqYi92frFda8aLfsaSseOhlmqsv0MlwngW2dx3UbjaYd4If
 IQA6n0hWHuxUvjrjsPpsMAa4lvTW+/kFilb0mD6Vixy3ru+/RelKnuawJm6kbMNu
 Cb8QSVSJrhvC/UZLvwO7a3viJdKoI5B9pTh5FTKcY5wUPI1k01pg3WlWNxmnv4ZJ
 LPImR06aoJYhvbutf94AvxbCOt/au8sY4s/yk9oHgvGUEIccrGYf3BwX6ciWRt4b
 r4ZN92C9ZuD+u/ATFgi/akngtjjixw5YrZ20aX86dYcBZ25hYOiIMoc482tYQ12Z
 1vqyvKg9o1oMypG9orF09PWstbNRu3ihGATKdXL9lfAhDklOTKc=
 =zWTK
 -----END PGP SIGNATURE-----

Merge 4.14.41 into android-4.14

Changes in 4.14.41
	ipvs: fix rtnl_lock lockups caused by start_sync_thread
	netfilter: ebtables: don't attempt to allocate 0-sized compat array
	kcm: Call strp_stop before strp_done in kcm_attach
	crypto: af_alg - fix possible uninit-value in alg_bind()
	netlink: fix uninit-value in netlink_sendmsg
	net: fix rtnh_ok()
	net: initialize skb->peeked when cloning
	net: fix uninit-value in __hw_addr_add_ex()
	dccp: initialize ireq->ir_mark
	ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
	soreuseport: initialise timewait reuseport field
	inetpeer: fix uninit-value in inet_getpeer
	memcg: fix per_node_info cleanup
	perf: Remove superfluous allocation error check
	tcp: fix TCP_REPAIR_QUEUE bound checking
	bdi: wake up concurrent wb_shutdown() callers.
	bdi: Fix oops in wb_workfn()
	KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
	KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
	arm64: Add work around for Arm Cortex-A55 Erratum 1024718
	compat: fix 4-byte infoleak via uninitialized struct field
	gpioib: do not free unrequested descriptors
	gpio: fix aspeed_gpio unmask irq
	gpio: fix error path in lineevent_create
	rfkill: gpio: fix memory leak in probe error path
	libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs
	dm integrity: use kvfree for kvmalloc'd memory
	tracing: Fix regex_match_front() to not over compare the test string
	z3fold: fix reclaim lock-ups
	mm: sections are not offlined during memory hotremove
	mm, oom: fix concurrent munlock and oom reaper unmap, v3
	ceph: fix rsize/wsize capping in ceph_direct_read_write()
	can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg()
	can: hi311x: Acquire SPI lock on ->do_get_berr_counter
	can: hi311x: Work around TX complete interrupt erratum
	drm/vc4: Fix scaling of uni-planar formats
	drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log
	drm/nouveau: Fix deadlock in nv50_mstm_register_connector()
	drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear()
	drm/atomic: Clean private obj old_state/new_state in drm_atomic_state_default_clear()
	net: atm: Fix potential Spectre v1
	atm: zatm: Fix potential Spectre v1
	PCI / PM: Always check PME wakeup capability for runtime wakeup support
	PCI / PM: Check device_may_wakeup() in pci_enable_wake()
	cpufreq: schedutil: Avoid using invalid next_freq
	Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
	Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table
	Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
	thermal: exynos: Reading temperature makes sense only when TMU is turned on
	thermal: exynos: Propagate error value from tmu_read()
	nvme: add quirk to force medium priority for SQ creation
	smb3: directory sync should not return an error
	sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
	tracing/uprobe_event: Fix strncpy corner case
	perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
	perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
	perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
	perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
	perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
	KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
	KVM: x86: remove APIC Timer periodic/oneshot spikes
	Linux 4.14.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-16 11:40:03 +02:00
David Rientjes
2270dfcc4b mm, oom: fix concurrent munlock and oom reaper unmap, v3
commit 27ae357fa82be5ab73b2ef8d39dcb8ca2563483a upstream.

Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:27 +02:00
Guenter Roeck
cb3422d0ff ANDROID: mm: Export do_munmap
The 0-day build bot reports the following build error, seen if SDCARD_FS
is built as module.

ERROR: "do_munmap" undefined!

Fixes: 84a1b7d3d312 ("Included sdcardfs source code for kernel 3.0")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2018-01-29 19:40:12 -08:00
Greg Kroah-Hartman
7237d3f322 This is the 4.14.8 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo6Km0ACgkQONu9yGCS
 aT7vKA//WO6PM1s+/eDNZu7IuBcvrvSuFI7FffvbDuWsIFk5fKwLen2M9RLT9oZF
 8XxnZG0fWKZCdvLTeC14C4oKedcOMSR6VZIg5CddMqibq5uWNLisfZUb0KKG4DE3
 Mljvw8AM8OJupmXtDHu/nG4/J1vP03jWi6liMAwLyzxTGOMjtq0a7Nbt3NiTArFz
 cI7Xs+zEnS2fgKX2SrJGB+6xYKQe4Q4JSjZ1lQAM3LW3G96n731HsU/oRKkU9FyQ
 ZBysxNITbPSwKP9SuvgAl5rYo7ZKIQ9UbUDyJmm82gvwxDupdaa4hkrBo/+imESN
 ngDO/EFirsPXTEj2H7SPw3+sRh9j41IDAcqDPZKFNuwEe0g0fwJ/g3/7EaZeXpR2
 kXT+sHrwkzXcDXeuEQkUtsPULCJuYfgVUXnSIGtuPv7sG1IOaeOyofXHYixGsONu
 HRZMNPQcdqT3aYQIlMIOWEJidpOtxGztwdz1fWWzUQwNPBBNH2tlgEXhyR0ofDaM
 VVRRw9V6IyS4/5HjaCs/nIp+WN+vyW0p+xijQIzzLTAFRITB5IMDWwggPT5z4JPk
 Yf41I8bjkTUA0oMWL/b737u2krM8wYmmwvdc+9+lrA+HM51B63ViuvWyXrEAd2Sv
 37X8oZa8dj4lIDZoTll+8wXcY80VHuz7+vvVYIRUwnSEg1bLKIU=
 =eZgu
 -----END PGP SIGNATURE-----

Merge 4.14.8 into android-4.14

Changes in 4.14.8
	mfd: fsl-imx25: Clean up irq settings during removal
	crypto: algif_aead - fix reference counting of null skcipher
	crypto: rsa - fix buffer overread when stripping leading zeroes
	crypto: hmac - require that the underlying hash algorithm is unkeyed
	crypto: salsa20 - fix blkcipher_walk API usage
	crypto: af_alg - fix NULL pointer dereference in
	cifs: fix NULL deref in SMB2_read
	string.h: workaround for increased stack usage
	autofs: fix careless error in recent commit
	kernel: make groups_sort calling a responsibility group_info allocators
	mm, oom_reaper: fix memory corruption
	tracing: Allocate mask_str buffer dynamically
	USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
	USB: core: prevent malicious bNumInterfaces overflow
	ovl: Pass ovl_get_nlink() parameters in right order
	ovl: update ctx->pos on impure dir iteration
	usbip: fix stub_rx: get_pipe() to validate endpoint number
	usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: prevent vhci_hcd driver from leaking a socket pointer address
	usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
	mmc: core: apply NO_CMD23 quirk to some specific cards
	ceph: drop negative child dentries before try pruning inode's alias
	usb: xhci: fix TDS for MTK xHCI1.1
	xhci: Don't add a virt_dev to the devs array before it's fully allocated
	IB/core: Bound check alternate path port number
	IB/core: Don't enforce PKey security on SMI MADs
	nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
	arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
	arm64: Initialise high_memory global variable earlier
	arm64: fix CONFIG_DEBUG_WX address reporting
	scsi: core: Fix a scsi_show_rq() NULL pointer dereference
	scsi: libsas: fix length error in sas_smp_handler()
	sched/rt: Do not pull from current CPU if only one CPU to pull
	dm: fix various targets to dm_register_target after module __init resources created
	SUNRPC: Fix a race in the receive code path
	iw_cxgb4: only insert drain cqes if wq is flushed
	x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
	x86/boot/compressed/64: Print error if 5-level paging is not supported
	eeprom: at24: change nvmem stride to 1
	posix-timer: Properly check sigevent->sigev_notify
	dmaengine: dmatest: move callback wait queue to thread context
	Revert "exec: avoid RLIMIT_STACK races with prlimit()"
	ext4: support fast symlinks from ext3 file systems
	ext4: fix fdatasync(2) after fallocate(2) operation
	ext4: add missing error check in __ext4_new_inode()
	ext4: fix crash when a directory's i_size is too small
	IB/mlx4: Fix RSS's QPC attributes assignments
	HID: cp2112: fix broken gpio_direction_input callback
	sfc: don't warn on successful change of MAC
	fbdev: controlfb: Add missing modes to fix out of bounds access
	video: udlfb: Fix read EDID timeout
	video: fbdev: au1200fb: Release some resources if a memory allocation fails
	video: fbdev: au1200fb: Return an error code if a memory allocation fails
	rtc: pcf8563: fix output clock rate
	scsi: aacraid: use timespec64 instead of timeval
	drm/amdgpu: bypass lru touch for KIQ ring submission
	PM / s2idle: Clear the events_check_enabled flag
	ASoC: Intel: Skylake: Fix uuid_module memory leak in failure case
	dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
	mlxsw: spectrum: Fix error return code in mlxsw_sp_port_create()
	PCI/PME: Handle invalid data when reading Root Status
	powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
	PCI: Do not allocate more buses than available in parent
	iommu/mediatek: Fix driver name
	thunderbolt: tb: fix use after free in tb_activate_pcie_devices
	netfilter: ipvs: Fix inappropriate output of procfs
	powerpc/opal: Fix EBUSY bug in acquiring tokens
	powerpc/ipic: Fix status get and status clear
	powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister
	platform/x86: intel_punit_ipc: Fix resource ioremap warning
	target/iscsi: Detect conn_cmd_list corruption early
	target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
	iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
	target:fix condition return in core_pr_dump_initiator_port()
	target/file: Do not return error for UNMAP if length is zero
	badblocks: fix wrong return value in badblocks_set if badblocks are disabled
	iommu/amd: Limit the IOVA page range to the specified addresses
	xfs: truncate pagecache before writeback in xfs_setattr_size()
	arm-ccn: perf: Prevent module unload while PMU is in use
	crypto: tcrypt - fix buffer lengths in test_aead_speed()
	mm: Handle 0 flags in _calc_vm_trans() macro
	net: hns3: fix for getting advertised_caps in hns3_get_link_ksettings
	net: hns3: Fix a misuse to devm_free_irq
	staging: rtl8188eu: Revert part of "staging: rtl8188eu: fix comments with lines over 80 characters"
	clk: mediatek: add the option for determining PLL source clock
	clk: imx: imx7d: Fix parent clock for OCRAM_CLK
	clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
	media: camss-vfe: always initialize reg at vfe_set_xbar_cfg()
	clk: hi6220: mark clock cs_atb_syspll as critical
	blk-mq-sched: dispatch from scheduler IFF progress is made in ->dispatch
	clk: tegra: Use readl_relaxed_poll_timeout_atomic() in tegra210_clock_init()
	clk: tegra: Fix cclk_lp divisor register
	ppp: Destroy the mutex when cleanup
	ASoC: rsnd: rsnd_ssi_run_mods() needs to care ssi_parent_mod
	thermal/drivers/step_wise: Fix temperature regulation misbehavior
	misc: pci_endpoint_test: Fix failure path return values in probe
	misc: pci_endpoint_test: Avoid triggering a BUG()
	scsi: scsi_debug: write_same: fix error report
	GFS2: Take inode off order_write list when setting jdata flag
	media: usbtv: fix brightness and contrast controls
	rpmsg: glink: Initialize the "intent_req_comp" completion variable
	bcache: explicitly destroy mutex while exiting
	bcache: fix wrong cache_misses statistics
	Ib/hfi1: Return actual operational VLs in port info query
	Bluetooth: hci_ldisc: Fix another race when closing the tty.
	arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
	btrfs: fix false EIO for missing device
	btrfs: Explicitly handle btrfs_update_root failure
	btrfs: undo writable superblocke when sprouting fails
	btrfs: avoid null pointer dereference on fs_info when calling btrfs_crit
	btrfs: tests: Fix a memory leak in error handling path in 'run_test()'
	qtnfmac: modify full Tx queue error reporting
	mtd: spi-nor: stm32-quadspi: Fix uninitialized error return code
	ARM64: dts: meson-gxbb-odroidc2: fix usb1 power supply
	Bluetooth: btusb: Add new NFA344A entry.
	samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp1
	liquidio: fix kernel panic in VF driver
	platform/x86: hp_accel: Add quirk for HP ProBook 440 G4
	nvme: use kref_get_unless_zero in nvme_find_get_ns
	l2tp: cleanup l2tp_tunnel_delete calls
	xfs: fix log block underflow during recovery cycle verification
	xfs: return a distinct error code value for IGET_INCORE cache misses
	xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
	net: dsa: lan9303: Do not disable switch fabric port 0 at .probe
	net: hns3: fix a bug in hclge_uninit_client_instance
	net: hns3: add nic_client check when initialize roce base information
	net: hns3: fix the bug of hns3_set_txbd_baseinfo
	RDMA/cxgb4: Declare stag as __be32
	PCI: Detach driver before procfs & sysfs teardown on device remove
	scsi: hisi_sas: fix the risk of freeing slot twice
	scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
	scsi: hpsa: destroy sas transport properties before scsi_host
	mfd: mxs-lradc: Fix error handling in mxs_lradc_probe()
	net: hns3: fix the TX/RX ring.queue_index in hns3_ring_get_cfg
	net: hns3: fix the bug when map buffer fail
	net: hns3: fix a bug when alloc new buffer
	serdev: ttyport: enforce tty-driver open() requirement
	powerpc/perf/hv-24x7: Fix incorrect comparison in memord
	powerpc/xmon: Check before calling xive functions
	soc: mediatek: pwrap: fix compiler errors
	ipv4: ipv4_default_advmss() should use route mtu
	KVM: nVMX: Fix EPT switching advertising
	tty fix oops when rmmod 8250
	dev/dax: fix uninitialized variable build warning
	pinctrl: adi2: Fix Kconfig build problem
	raid5: Set R5_Expanded on parity devices as well as data.
	scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
	IB/core: Fix use workqueue without WQ_MEM_RECLAIM
	IB/core: Fix calculation of maximum RoCE MTU
	vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
	IB/hfi1: Mask out A bit from psn trace
	rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_createbss_cmd
	rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_disassoc_cmd
	ipmi_si: fix memory leak on new_smi
	nullb: fix error return code in null_init()
	scsi: sd: change manage_start_stop to bool in sysfs interface
	scsi: sd: change allow_restart to bool in sysfs interface
	scsi: bfa: integer overflow in debugfs
	raid5-ppl: check recovery_offset when performing ppl recovery
	md-cluster: fix wrong condition check in raid1_write_request
	xprtrdma: Don't defer fencing an async RPC's chunks
	udf: Avoid overflow when session starts at large offset
	macvlan: Only deliver one copy of the frame to the macvlan interface
	IB/core: Fix endianness annotation in rdma_is_multicast_addr()
	RDMA/cma: Avoid triggering undefined behavior
	IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
	icmp: don't fail on fragment reassembly time exceeded
	lightnvm: pblk: prevent gc kicks when gc is not operational
	lightnvm: pblk: fix changing GC group list for a line
	lightnvm: pblk: use right flag for GC allocation
	lightnvm: pblk: initialize debug stat counter
	lightnvm: pblk: fix min size for page mempool
	lightnvm: pblk: protect line bitmap while submitting meta io
	ath9k: fix tx99 potential info leak
	ath10k: fix core PCI suspend when WoWLAN is supported but disabled
	ath10k: fix build errors with !CONFIG_PM
	usb: musb: da8xx: fix babble condition handling
	Linux 4.14.8

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-04 13:48:52 +01:00
Greg Kroah-Hartman
c5c36272cd This is the 4.14.4 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlomdF8ACgkQONu9yGCS
 aT668xAAtbsoX6y/g11RT5/DsBoeIYgvNTzfcU3dGJC2/1rEx+pBOLFkGVNcK7nR
 wXD/DUFHWQRSsynke+gP8mjmWsRxwmoo0bv04eZ3Xdf8GGAVKIJQjUXV5jXOCPtw
 fMWshZkQlM11aus/bxEW0H7vqBK4DBLoYJ7H21i5SKkWubyUmDV6rX0So1w6sKYo
 RSvVG1MTkLjRSrSStgBKTBMoOdj6PfCKcQRmaqjPNZRP2+uqD+8NuUlbMZijxuYw
 U3lhXv8czRt0NSyA3pc9ucFR6DwAvc6VgVRvLec1+XzKHlvmCgBo9Tmsq5DcfT1B
 9owFlS53yzyEMk8o7FYznX5rDd32MBIejjAgpCKyXxurkv58NiwSs6VJIzHcNHJK
 2xc1nmZH8wIrUaYo7ecq6e7hN+TMvPK9wWyhsKauiofaJUY4c7pI2Qb37ddNPxpE
 11j3Vb0OlqxK3rAc+ElDmTe6GZ3rd2hLZU6nyPIqIWOrwgXf2zlB5X9ytZzR4gMi
 rZrzDyKNO3lRNhteb5qzGzT6bH5wMvDZUp6DhviSBd4FVSXfTT53AEDoYgk9OLE2
 rhaMVTu4zgRQi7AvM1PRyiVisQHwnXQUU6pGiXDWltFJMz9uPvHmMT8iZlCODePG
 3x/Hj4ZAXHARNKkDQCwvPz3zWffwugRdXzMiPN1oyDzxgzQuC/c=
 =bxe4
 -----END PGP SIGNATURE-----

Merge 4.14.4 into android-4.14

Changes in 4.14.4
	platform/x86: hp-wmi: Fix tablet mode detection for convertibles
	mm, memory_hotplug: do not back off draining pcp free pages from kworker context
	mm, oom_reaper: gather each vma to prevent leaking TLB entry
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/cma: fix alloc_contig_range ret code/potential leak
	mm: fix device-dax pud write-faults triggered by get_user_pages()
	mm, hugetlbfs: introduce ->split() to vm_operations_struct
	device-dax: implement ->split() to catch invalid munmap attempts
	mm: introduce get_user_pages_longterm
	mm: fail get_vaddr_frames() for filesystem-dax mappings
	v4l2: disable filesystem-dax mapping support
	IB/core: disable memory registration of filesystem-dax vmas
	exec: avoid RLIMIT_STACK races with prlimit()
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	mm: migrate: fix an incorrect call of prep_transhuge_page()
	mm, memcg: fix mem_cgroup_swapout() for THPs
	fs/fat/inode.c: fix sb_rdonly() change
	autofs: revert "autofs: take more care to not update last_used on path walk"
	autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
	mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
	btrfs: clear space cache inode generation always
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
	crypto: algif_aead - skip SGL entries with NULL page
	crypto: af_alg - remove locking in async callback
	crypto: skcipher - Fix skcipher_walk_aead_common
	lockd: lost rollback of set_grace_period() in lockd_down_net()
	s390: revert ELF_ET_DYN_BASE base changes
	drm: omapdrm: Fix DPI on platforms using the DSI VDDS
	omapdrm: hdmi4: Correct the SoC revision matching
	apparmor: fix oops in audit_signal_cb hook
	arm64: module-plts: factor out PLT generation code for ftrace
	arm64: ftrace: emit ftrace-mod.o contents through code
	powerpc/powernv: Fix kexec crashes caused by tlbie tracing
	powerpc/kexec: Fix kexec/kdump in P9 guest kernels
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	KVM: lapic: Split out x2apic ldr calculation
	KVM: lapic: Fixup LDR on load in x2apic
	mmc: sdhci: Avoid swiotlb buffer being full
	mmc: block: Fix missing blk_put_request()
	mmc: block: Check return value of blk_get_request()
	mmc: core: Do not leave the block driver in a suspended state
	mmc: block: Ensure that debugfs files are removed
	mmc: core: prepend 0x to pre_eol_info entry in sysfs
	mmc: core: prepend 0x to OCR entry in sysfs
	ACPI / EC: Fix regression related to PM ops support in ECDT device
	eeprom: at24: fix reading from 24MAC402/24MAC602
	eeprom: at24: correctly set the size for at24mac402
	eeprom: at24: check at24_read/write arguments
	i2c: i801: Fix Failed to allocate irq -2147483648 error
	cxl: Check if vphb exists before iterating over AFU devices
	bcache: Fix building error on MIPS
	bcache: only permit to recovery read error when cache device is clean
	bcache: recover data from backing when data is clean
	hwmon: (jc42) optionally try to disable the SMBUS timeout
	nvme-pci: add quirk for delay before CHK RDY for WDC SN200
	Revert "drm/radeon: dont switch vt on suspend"
	drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
	drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
	drm/amdgpu: correct reference clock value on vega10
	drm/amdgpu: fix error handling in amdgpu_bo_do_create
	drm/amdgpu: Properly allocate VM invalidate eng v2
	drm/amdgpu: Remove check which is not valid for certain VBIOS
	drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
	dma-buf: make reservation_object_copy_fences rcu save
	drm/amdgpu: reserve root PD while releasing it
	drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
	drm/vblank: Fix flip event vblank count
	drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
	drm/tilcdc: Precalculate total frametime in tilcdc_crtc_set_mode()
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	drm/hisilicon: Ensure LDI regs are properly configured.
	drm/ttm: once more fix ttm_buffer_object_transfer
	drm/amd/pp: fix typecast error in powerplay.
	drm/fb_helper: Disable all crtc's when initial setup fails.
	drm/fsl-dcu: Don't set connector DPMS property
	drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
	drm/amdgpu: move UVD/VCE and VCN structure out from union
	drm/amdgpu: Set adev->vcn.irq.num_types for VCN
	include/linux/compiler-clang.h: handle randomizable anonymous structs
	IB/core: Do not warn on lid conversions for OPA
	IB/hfi1: Do not warn on lid conversions for OPA
	e1000e: fix the use of magic numbers for buffer overrun issue
	md: forbid a RAID5 from having both a bitmap and a journal.
	drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
	drm/i915: Re-register PMIC bus access notifier on runtime resume
	drm/i915/fbdev: Serialise early hotplug events with async fbdev config
	drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()"
	Linux 4.14.4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-04 13:44:57 +01:00
Michal Hocko
55fe4698d8 mm, oom_reaper: fix memory corruption
commit 4837fe37adff1d159904f0c013471b1ecbcb455e upstream.

David Rientjes has reported the following memory corruption while the
oom reaper tries to unmap the victims address space

  BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000
  addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
  file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
  CPU: 2 PID: 1001 Comm: oom_reaper
  Call Trace:
     unmap_page_range+0x1068/0x1130
     __oom_reap_task_mm+0xd5/0x16b
     oom_reaper+0xff/0x14c
     kthread+0xc1/0xe0

Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient.  We only synchronize with the oom reaper if
tsk_is_oom_victim which is not true if the final __mmput is called from
a different context than the oom victim exit path.  This can trivially
happen from context of any task which has grabbed mm reference (e.g.  to
read /proc/<pid>/ file which requires mm etc.).

The race would look like this

  oom_reaper		oom_victim		task
						mmget_not_zero
			do_exit
			  mmput
  __oom_reap_task_mm				mmput
  						  __mmput
						    exit_mmap
						      remove_vma
    unmap_page_range

Fix this issue by providing a new mm_is_oom_victim() helper which
operates on the mm struct rather than a task.  Any context which
operates on a remote mm struct should use this helper in place of
tsk_is_oom_victim.  The flag is set in mark_oom_victim and never cleared
so it is stable in the exit_mmap path.

Debugged by Tetsuo Handa.

Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:10:18 +01:00
Colin Cross
8392add7be ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>

[AmitP: Fix get_user_pages_remote() call to align with upstream commit
        5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
Dan Williams
c6c78a1d45 mm, hugetlbfs: introduce ->split() to vm_operations_struct
commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint.  Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units.  Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:26:28 +01:00
Davidlohr Bueso
f808c13fd3 lib/interval_tree: fast overlap detection
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
  Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00
Andrea Arcangeli
2129258024 mm: oom: let oom_reap_task and exit_mmap run concurrently
This is purely required because exit_aio() may block and exit_mmap() may
never start, if the oom_reap_task cannot start running on a mm with
mm_users == 0.

At the same time if the OOM reaper doesn't wait at all for the memory of
the current OOM candidate to be freed by exit_mmap->unmap_vmas, it would
generate a spurious OOM kill.

If it wasn't because of the exit_aio or similar blocking functions in
the last mmput, it would be enough to change the oom_reap_task() in the
case it finds mm_users == 0, to wait for a timeout or to wait for
__mmput to set MMF_OOM_SKIP itself, but it's not just exit_mmap the
problem here so the concurrency of exit_mmap and oom_reap_task is
apparently warranted.

It's a non standard runtime, exit_mmap() runs without mmap_sem, and
oom_reap_task runs with the mmap_sem for reading as usual (kind of
MADV_DONTNEED).

The race between the two is solved with a combination of
tsk_is_oom_victim() (serialized by task_lock) and MMF_OOM_SKIP
(serialized by a dummy down_write/up_write cycle on the same lines of
the ksm_exit method).

If the oom_reap_task() may be running concurrently during exit_mmap,
exit_mmap will wait it to finish in down_write (before taking down mm
structures that would make the oom_reap_task fail with use after free).

If exit_mmap comes first, oom_reap_task() will skip the mm if
MMF_OOM_SKIP is already set and in turn all memory is already freed and
furthermore the mm data structures may already have been taken down by
free_pgtables.

[aarcange@redhat.com: incremental one liner]
  Link: http://lkml.kernel.org/r/20170726164319.GC29716@redhat.com
[rientjes@google.com: remove unused mmput_async]
  Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708141733130.50317@chino.kir.corp.google.com
[aarcange@redhat.com: microoptimization]
  Link: http://lkml.kernel.org/r/20170817171240.GB5066@redhat.com
Link: http://lkml.kernel.org/r/20170726162912.GA29716@redhat.com
Fixes: 26db62f179d1 ("oom: keep mm of the killed task available")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
Andrea Arcangeli
2376dd7ced userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
A __split_vma is not a worthy event to report, and it's definitely not a
unmap so it would be incorrect to report unmap for the whole region to
the userfaultfd manager if a __split_vma fails.

So only call userfaultfd_unmap_prep after the __vma_splitting is over
and do_munmap cannot fail anymore.

Also add unlikely because it's better to optimize for the vast majority
of apps that aren't using userfaultfd in a non cooperative way.  Ideally
we should also find a way to eliminate the branch entirely if
CONFIG_USERFAULTFD=n, but it would complicate things so stick to
unlikely for now.

Link: http://lkml.kernel.org/r/20170802165145.22628-5-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Michal Hocko
c41f012ade mm: rename global_page_state to global_zone_page_state
global_page_state is error prone as a recent bug report pointed out [1].
It only returns proper values for zone based counters as the enum it
gets suggests.  We already have global_node_page_state so let's rename
global_page_state to global_zone_page_state to be more explicit here.
All existing users seems to be correct:

$ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
      2 NR_BOUNCE
      2 NR_FREE_CMA_PAGES
     11 NR_FREE_PAGES
      1 NR_KERNEL_STACK_KB
      1 NR_MLOCK
      2 NR_PAGETABLE

This patch shouldn't introduce any functional change.

[1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp

Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Helge Deller
37511fb5c9 mm: fix overflow check in expand_upwards()
Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.

Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.

Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.

Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14 15:05:12 -07:00
Krzysztof Opasiak
24c79d8e0a mm: use dedicated helper to access rlimit value
Use rlimit() helper instead of manually writing whole chain from current
task to rlim_cur.

Link: http://lkml.kernel.org/r/20170705172811.8027-1-k.opasiak@samsung.com
Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Oleg Nesterov
32e4e6d5cb mm/mmap.c: expand_downwards: don't require the gap if !vm_prev
expand_stack(vma) fails if address < stack_guard_gap even if there is no
vma->vm_prev.  I don't think this makes sense, and we didn't do this
before the recent commit 1be7107fbe18 ("mm: larger stack guard gap,
between vmas").

We do not need a gap in this case, any address is fine as long as
security_mmap_addr() doesn't object.

This also simplifies the code, we know that address >= prev->vm_end and
thus underflow is not possible.

Link: http://lkml.kernel.org/r/20170628175258.GA24881@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Michal Hocko
561b5e0709 mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
introduced a regression in some rust and Java environments which are
trying to implement their own stack guard page.  They are punching a new
MAP_FIXED mapping inside the existing stack Vma.

This will confuse expand_{downwards,upwards} into thinking that the
stack expansion would in fact get us too close to an existing non-stack
vma which is a correct behavior wrt safety.  It is a real regression on
the other hand.

Let's work around the problem by considering PROT_NONE mapping as a part
of the stack.  This is a gros hack but overflowing to such a mapping
would trap anyway an we only can hope that usespace knows what it is
doing and handle it propely.

Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Linus Torvalds
09b56d5a41 Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - add support for ftrace-with-registers, which is needed for kgraft and
   other ftrace tools

 - support for mremap() for the sigpage/vDSO so that checkpoint/restore
   can work

 - add timestamps to each line of the register dump output

 - remove the unused KTHREAD_SIZE from nommu

 - align the ARM bitops APIs with the generic API (using unsigned long
   pointers rather than void pointers)

 - make the configuration of userspace Thumb support an expert option so
   that we can default it on, and avoid some hard to debug userspace
   crashes

* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8684/1: NOMMU: Remove unused KTHREAD_SIZE definition
  ARM: 8683/1: ARM32: Support mremap() for sigpage/vDSO
  ARM: 8679/1: bitops: Align prototypes to generic API
  ARM: 8678/1: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS
  ARM: make configuration of userspace Thumb support an expert option
  ARM: 8673/1: Fix __show_regs output timestamps
2017-07-08 12:17:25 -07:00
Daniel Micay
ac34ceaf1c mm/mmap.c: mark protection_map as __ro_after_init
The protection map is only modified by per-arch init code so it can be
protected from writes after the init code runs.

This change was extracted from PaX where it's part of KERNEXEC.

Link: http://lkml.kernel.org/r/20170510174441.26163-1-danielmicay@gmail.com
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:31 -07:00
Helge Deller
bd726c90b6 Allow stack to grow up to address space limit
Fix expand_upwards() on architectures with an upward-growing stack (parisc,
metag and partly IA-64) to allow the stack to reliably grow exactly up to
the address space limit given by TASK_SIZE.

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-21 11:07:18 -07:00
Hugh Dickins
f4cb767d76 mm: fix new crash in unmapped_area_topdown()
Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing.  That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown().  Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there.  Fix that, and
the similar case in its alternative, unmapped_area().

Cc: stable@vger.kernel.org
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-21 10:56:11 -07:00
Dmitry Safonov
280e87e98c ARM: 8683/1: ARM32: Support mremap() for sigpage/vDSO
CRIU restores application mappings on the same place where they
were before Checkpoint. That means, that we need to move vDSO
and sigpage during restore on exactly the same place where
they were before C/R.

Make mremap() code update mm->context.{sigpage,vdso} pointers
during VMA move. Sigpage is used for landing after handling
a signal - if the pointer is not updated during moving, the
application might crash on any signal after mremap().

vDSO pointer on ARM32 is used only for setting auxv at this moment,
update it during mremap() in case of future usage.

Without those updates, current work of CRIU on ARM32 is not reliable.
Historically, we error Checkpointing if we find vDSO page on ARM32
and suggest user to disable CONFIG_VDSO.
But that's not correct - it goes from x86 where signal processing
is ended in vDSO blob. For arm32 it's sigpage, which is not disabled
with `CONFIG_VDSO=n'.

Looks like C/R was working by luck - because userspace on ARM32 at
this moment always sets SA_RESTORER.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-21 13:02:58 +01:00
Hugh Dickins
1be7107fbe mm: larger stack guard gap, between vmas
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-19 21:50:20 +08:00
Anshuman Khandual
20ac28933c mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
Commit 091d0d55b286 ("shm: fix null pointer deref when userspace
specifies invalid hugepage size") had replaced MAP_HUGE_MASK with
SHM_HUGE_MASK.  Though both of them contain the same numeric value of
0x3f, MAP_HUGE_MASK flag sounds more appropriate than the other one in
the context.  Hence change it back.

Link: http://lkml.kernel.org/r/20170404045635.616-1-khandual@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:10 -07:00
Linus Torvalds
94e877d0fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile two from Al Viro:

 - orangefs fix

 - series of fs/namei.c cleanups from me

 - VFS stuff coming from overlayfs tree

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: Use RCU for destroy_inode
  vfs: use helper for calling f_op->fsync()
  mm: use helper for calling f_op->mmap()
  vfs: use helpers for calling f_op->{read,write}_iter()
  vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
  vfs: extract common parts of {compat_,}do_readv_writev()
  vfs: wrap write f_ops with file_{start,end}_write()
  vfs: deny copy_file_range() for non regular files
  vfs: deny fallocate() on directory
  vfs: create vfs helper vfs_tmpfile()
  namei.c: split unlazy_walk()
  namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
  lookup_fast(): clean up the logics around the fallback to non-rcu mode
  namei: fold unlazy_link() into its sole caller
2017-03-02 15:20:00 -08:00
Al Viro
653a7746fa Merge remote-tracking branch 'ovl/for-viro' into for-linus
Overlayfs-related series from Miklos and Amir
2017-03-02 06:41:22 -05:00
David Rientjes
def5efe037 mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count
If madvise(2) advice will result in the underlying vma being split and
the number of areas mapped by the process will exceed
/proc/sys/vm/max_map_count as a result, return ENOMEM instead of EAGAIN.

EAGAIN is returned by madvise(2) when a kernel resource, such as slab,
is temporarily unavailable.  It indicates that userspace should retry
the advice in the near future.  This is important for advice such as
MADV_DONTNEED which is often used by malloc implementations to free
memory back to the system: we really do want to free memory back when
madvise(2) returns EAGAIN because slab allocations (for vmas, anon_vmas,
or mempolicies) cannot be allocated.

Encountering /proc/sys/vm/max_map_count is not a temporary failure,
however, so return ENOMEM to indicate this is a more serious issue.  A
followup patch to the man page will specify this behavior.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701241431120.42507@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Mike Rapoport
897ab3e0c4 userfaultfd: non-cooperative: add event for memory unmaps
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
  Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
  Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Mike Rapoport
846b1a0f1d mm: call vm_munmap in munmap syscall instead of using open coded version
Patch series "userfaultfd: non-cooperative: better tracking for mapping
changes", v2.

These patches try to address issues I've encountered during integration
of userfaultfd with CRIU.

Previously added userfaultfd events for fork(), madvise() and mremap()
unfortunately do not cover all possible changes to a process virtual
memory layout required for uffd monitor.

When one or more VMAs is removed from the process mm, the external uffd
monitor has no way to detect those changes and will attempt to fill the
removed regions with userfaultfd_copy.

Another problematic event is the exit() of the process.  Here again, the
external uffd monitor will try to use userfaultfd_copy, although mm
owning the memory has already gone.

The first patch in the series is a minor cleanup and it's not strictly
related to the rest of the series.

The patches 2 and 3 below add UFFD_EVENT_UNMAP and UFFD_EVENT_EXIT to
allow the uffd monitor track changes in the memory layout of a process.

The patches 4 and 5 amend error codes returned by userfaultfd_copy to
make the uffd monitor able to cope with races that might occur between
delivery of unmap and exit events and outstanding userfaultfd_copy's.

This patch (of 5):

Commit dc0ef0df7b6a ("mm: make mmap_sem for write waits killable for mm
syscalls") replaced call to vm_munmap in munmap syscall with open coded
version to allow different waits on mmap_sem in munmap syscall and
vm_munmap.

Now both functions use down_write_killable, so we can restore the call
to vm_munmap from the munmap system call.

Link: http://lkml.kernel.org/r/1485542673-24387-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
seokhoon.yoon
3edf41d845 mm: fix comments for mmap_init()
mmap_init() is no longer associated with VMA slab.  So fix it.

Link: http://lkml.kernel.org/r/1485182601-9294-1-git-send-email-iamyooon@gmail.com
Signed-off-by: seokhoon.yoon <iamyooon@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Denys Vlasenko
16e72e9b30 powerpc: do not make the entire heap executable
On 32-bit powerpc the ELF PLT sections of binaries (built with
--bss-plt, or with a toolchain which defaults to it) look like this:

  [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
  [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
  [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4

Which results in an ELF load header:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000

This is all correct, the load region containing the PLT is marked as
executable.  Note that the PLT starts at 0002b00c but the file mapping
ends at 0002aff8, so the PLT falls in the 0 fill section described by
the load header, and after a page boundary.

Unfortunately the generic ELF loader ignores the X bit in the load
headers when it creates the 0 filled non-file backed mappings.  It
assumes all of these mappings are RW BSS sections, which is not the case
for PPC.

gcc/ld has an option (--secure-plt) to not do this, this is said to
incur a small performance penalty.

Currently, to support 32-bit binaries with PLT in BSS kernel maps
*entire brk area* with executable rights for all binaries, even
--secure-plt ones.

Stop doing that.

Teach the ELF loader to check the X bit in the relevant load header and
create 0 filled anonymous mappings that are executable if the load
header requests that.

Test program showing the difference in /proc/$PID/maps:

int main() {
	char buf[16*1024];
	char *p = malloc(123); /* make "[heap]" mapping appear */
	int fd = open("/proc/self/maps", O_RDONLY);
	int len = read(fd, buf, sizeof(buf));
	write(1, buf, len);
	printf("%p\n", p);
	return 0;
}

Compiled using: gcc -mbss-plt -m32 -Os test.c -otest

Unpatched ppc64 kernel:
00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
0x10690008

Patched ppc64 kernel:
00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                  ^^^^ this has changed
f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
0x10180008

The patch was originally posted in 2012 by Jason Gunthorpe
and apparently ignored:

https://lkml.org/lkml/2012/9/30/138

Lightly run-tested.

Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:29 -08:00
Miklos Szeredi
f74ac01520 mm: use helper for calling f_op->mmap()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Andrea Arcangeli
8f26e0b176 mm: vma_merge: correct false positive from __vma_unlink->validate_mm_rb
The old code was always doing:

   vma->vm_end = next->vm_end
   vma_rb_erase(next) // in __vma_unlink
   vma->vm_next = next->vm_next // in __vma_unlink
   next = vma->vm_next
   vma_gap_update(next)

The new code still does the above for remove_next == 1 and 2, but for
remove_next == 3 it has been changed and it does:

   next->vm_start = vma->vm_start
   vma_rb_erase(vma) // in __vma_unlink
   vma_gap_update(next)

In the latter case, while unlinking "vma", validate_mm_rb() is told to
ignore "vma" that is being removed, but next->vm_start was reduced
instead. So for the new case, to avoid the false positive from
validate_mm_rb, it should be "next" that is ignored when "vma" is
being unlinked.

"vma" and "next" in the above comment, considered pre-swap().

Link: http://lkml.kernel.org/r/1474492522-2261-4-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
86d12e471d mm: vma_adjust: minor comment correction
The cases are three not two.

Link: http://lkml.kernel.org/r/1474492522-2261-3-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
97a42cd439 mm: vma_adjust: remove superfluous check for next not NULL
If next would be NULL we couldn't reach such code path.

Link: http://lkml.kernel.org/r/1474309513-20313-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
e86f15ee64 mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
The rmap_walk can access vm_page_prot (and potentially vm_flags in the
pte/pmd manipulations).  So it's not safe to wait the caller to update
the vm_page_prot/vm_flags after vma_merge returned potentially removing
the "next" vma and extending the "current" vma over the
next->vm_start,vm_end range, but still with the "current" vma
vm_page_prot, after releasing the rmap locks.

The vm_page_prot/vm_flags must be transferred from the "next" vma to the
current vma while vma_merge still holds the rmap locks.

The side effect of this race condition is pte corruption during migrate
as remove_migration_ptes when run on a address of the "next" vma that
got removed, used the vm_page_prot of the current vma.

  migrate   	      	        mprotect
  ------------			-------------
  migrating in "next" vma
				vma_merge() # removes "next" vma and
			        	    # extends "current" vma
					    # current vma is not with
					    # vm_page_prot updated
  remove_migration_ptes
  read vm_page_prot of current "vma"
  establish pte with wrong permissions
				vm_set_page_prot(vma) # too late!
				change_protection in the old vma range
				only, next range is not updated

This caused segmentation faults and potentially memory corruption in
heavy mprotect loads with some light page migration caused by compaction
in the background.

Hugh Dickins pointed out the comment about the Odd case 8 in vma_merge
which confirms the case 8 is only buggy one where the race can trigger,
in all other vma_merge cases the above cannot happen.

This fix removes the oddness factor from case 8 and it converts it from:

      AAAA
  PPPPNNNNXXXX -> PPPPNNNNNNNN

to:

      AAAA
  PPPPNNNNXXXX -> PPPPXXXXXXXX

XXXX has the right vma properties for the whole merged vma returned by
vma_adjust, so it solves the problem fully.  It has the added benefits
that the callers could stop updating vma properties when vma_merge
succeeds however the callers are not updated by this patch (there are
bits like VM_SOFTDIRTY that still need special care for the whole range,
as the vma merging ignores them, but as long as they're not processed by
rmap walks and instead they're accessed with the mmap_sem at least for
reading, they are fine not to be updated within vma_adjust before
releasing the rmap_locks).

Link: http://lkml.kernel.org/r/1474309513-20313-1-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Aditya Mandaleeka <adityam@microsoft.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
fb8c41e9ad mm: vma_adjust: remove superfluous confusing update in remove_next == 1 case
mm->highest_vm_end doesn't need any update.

After finally removing the oddness from vma_merge case 8 that was
causing:

1) constant risk of trouble whenever anybody would check vma fields
   from rmap_walks, like it happened when page migration was
   introduced and it read the vma->vm_page_prot from a rmap_walk

2) the callers of vma_merge to re-initialize any value different from
   the current vma, instead of vma_merge() more reliably returning a
   vma that already matches all fields passed as parameter

.. it is also worth to take the opportunity of cleaning up superfluous
code in vma_adjust(), that if not removed adds up to the hard
readability of the function.

Link: http://lkml.kernel.org/r/1474492522-2261-5-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00