26810 Commits

Author SHA1 Message Date
Mark Rutland
cc7a64cd0d BACKPORT: sched/core / kcov: avoid kcov_area during task switch
(Upstream commit 0ed557aa813922f6f32adec69e266532091c895b.)

During a context switch, we first switch_mm() to the next task's mm,
then switch_to() that new task.  This means that vmalloc'd regions which
had previously been faulted in can transiently disappear in the context
of the prev task.

Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
during this window, and as the fault handling code is instrumented, this
results in a recursive fault.

We must avoid accessing any kcov_area during this window.  We can do so
with a new flag in kcov_mode, set prior to switching the mm, and cleared
once the new task is live.  Since task_struct::kcov_mode isn't always a
specific enum kcov_mode value, this is made an unsigned int.

The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers,
which are empty for !CONFIG_KCOV kernels.

The code uses macros because I can't use static inline functions without a
circular include dependency between <linux/sched.h> and <linux/kcov.h>,
since the definition of task_struct uses things defined in <linux/kcov.h>

Link: http://lkml.kernel.org/r/20180504135535.53744-4-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I58f8a16210e6e58b3cca5b9e6976da305bc9a83a
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
2020-01-13 19:39:11 +00:00
Mark Rutland
bef2ed2af6 UPSTREAM: kcov: prefault the kcov_area
(Upstream commit dc55daff9040a90adce97208e776ee0bf515ab12.)

On many architectures the vmalloc area is lazily faulted in upon first
access.  This is problematic for KCOV, as __sanitizer_cov_trace_pc
accesses the (vmalloc'd) kcov_area, and fault handling code may be
instrumented.  If an access to kcov_area faults, this will result in
mutual recursion through the fault handling code and
__sanitizer_cov_trace_pc(), eventually leading to stack corruption
and/or overflow.

We can avoid this by faulting in the kcov_area before
__sanitizer_cov_trace_pc() is permitted to access it.  Once it has been
faulted in, it will remain present in the process page tables, and will
not fault again.

[akpm@linux-foundation.org: code cleanup]
[akpm@linux-foundation.org: add comment explaining kcov_fault_in_area()]
[akpm@linux-foundation.org: fancier code comment from Mark]
Link: http://lkml.kernel.org/r/20180504135535.53744-3-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 147413187
Change-Id: Id90248e11b7a0ea2c5d28faf6e55515cd7dc4987
2020-01-13 19:38:58 +00:00
Dmitry Vyukov
13ffffd3ba UPSTREAM: kcov: fix comparison callback signature
(Upstream commit 689d77f001cd22da31cc943170e1f6f2e8197035.)

Fix a silly copy-paste bug.  We truncated u32 args to u16.

Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com
Fixes: ded97d2c2b2c ("kcov: support comparison operands collection")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: syzkaller@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
Change-Id: If15ae4d86c445e908c355975ccf5cf53e296b27d
2020-01-13 19:38:31 +00:00
Victor Chibotaru
fccb16d96f BACKPORT: kcov: support comparison operands collection
(Upstream commit ded97d2c2b2c5f1dcced0bc57133f7753b037dfc.)

Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I227775c812f342423102cd28fd68b235579c60d3
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
2020-01-13 19:38:07 +00:00
Andrey Ryabinin
ac137d538b UPSTREAM: kcov: remove pointless current != NULL check
(Upstream commit fcf4edac049a8bca41658970292e2dfdbc9d5f62.)

__sanitizer_cov_trace_pc() is a hot code, so it's worth to remove
pointless '!current' check.  Current is never NULL.

Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
Change-Id: I2c07ed1a805ff2e9e569adc5aa59386e47f5bb23
2020-01-13 19:37:58 +00:00
Greg Kroah-Hartman
d2905c6a0e This is the 4.14.164 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4a/2MACgkQONu9yGCS
 aT4BwA//diCficMfLINrc/9bMq3VS2Y+/lnuURMXEM9MJibjQCUS1spc6YhhNFrE
 8m3aavAYywjjD3zGHj8KEaKQFDrPQxYQDzPOPK9rxjpxlUFpnYWUGlI2krpwBV6c
 8xAekM62sMEIq09EHqqhKVls+WmYi47/pdfGAAt3PUR8c2eTOlxiFsiwq4nuZDdv
 rcMkQm87V8Wn1Nq+Dfp6R3U+X9f4DcU5n5cKiGq6ujoalT7h5/jj36JIFxBwMapF
 WjpqXMUUeylXxXnNFMUbEMg+lEqJlWfvj1sxdxyMdgS+L9rc9bXk/NTub4TZPaXu
 odwMl9RKWjJvFsvn26Pc4s31K2raEhCDYdkVoFTXWsc7vbE4A/h/yAw4Wq+cuBI4
 H4fBXYYZ3D0Il9kxYYbfSaki5z1YbI54tkWcrs8f8jli5C0M3Wkkux1TA4HPj2Ja
 8zJFH0++cyfpuKRiYXro+H2Tq4KxBwsWEtync8230MEywlTxkz4IIue+SCgVV+WD
 jmg/enRjbnkpYBSH1pKOdAAga0kHSxtwWlfLFrjhcgGse8y6sCJhUOPPcQMnf/k0
 Jrmc3InHg+mtLiSsJXAp4iGABJlW+W/ouaxaxYoA9wucwQlcgxXpkigl5rOgFTma
 153RYc1TSZJAe+cjx42qZxRxcD8/Vg5d6D2tL1otbMSIsD3e7Gk=
 =sq63
 -----END PGP SIGNATURE-----

Merge 4.14.164 into android-4.14

Changes in 4.14.164
	USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein
	USB: dummy-hcd: increase max number of devices to 32
	locking/spinlock/debug: Fix various data races
	netfilter: ctnetlink: netns exit must wait for callbacks
	mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()
	libtraceevent: Fix lib installation with O=
	x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
	efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
	efi/gop: Return EFI_SUCCESS if a usable GOP was found
	efi/gop: Fix memory leak in __gop_query32/64()
	ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
	netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
	netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END
	ARM: dts: Cygnus: Fix MDIO node address/size cells
	spi: spi-cavium-thunderx: Add missing pci_release_regions()
	ASoC: topology: Check return value for soc_tplg_pcm_create()
	ARM: dts: bcm283x: Fix critical trip point
	bpf, mips: Limit to 33 tail calls
	ARM: dts: am437x-gp/epos-evm: fix panel compatible
	samples: bpf: Replace symbol compare of trace_event
	samples: bpf: fix syscall_tp due to unused syscall
	powerpc: Ensure that swiotlb buffer is allocated from low memory
	bnx2x: Do not handle requests from VFs after parity
	bnx2x: Fix logic to get total no. of PFs per engine
	net: usb: lan78xx: Fix error message format specifier
	rfkill: Fix incorrect check to avoid NULL pointer dereference
	ASoC: wm8962: fix lambda value
	regulator: rn5t618: fix module aliases
	kconfig: don't crash on NULL expressions in expr_eq()
	perf/x86/intel: Fix PT PMI handling
	fs: avoid softlockups in s_inodes iterators
	net: stmmac: Do not accept invalid MTU values
	net: stmmac: RX buffer size must be 16 byte aligned
	s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
	s390/dasd: fix memleak in path handling error case
	block: fix memleak when __blk_rq_map_user_iov() is failed
	parisc: Fix compiler warnings in debug_core.c
	llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
	hv_netvsc: Fix unwanted rx_table reset
	bpf: reject passing modified ctx to helper functions
	bpf: Fix passing modified ctx to ld/abs/ind instruction
	PCI/switchtec: Read all 64 bits of part_event_bitmap
	mmc: block: Convert RPMB to a character device
	mmc: block: Delete mmc_access_rpmb()
	mmc: block: Fix bug when removing RPMB chardev
	mmc: core: Prevent bus reference leak in mmc_blk_init()
	mmc: block: propagate correct returned value in mmc_rpmb_ioctl
	gtp: fix bad unlock balance in gtp_encap_enable_socket
	macvlan: do not assume mac_header is set in macvlan_broadcast()
	net: dsa: mv88e6xxx: Preserve priority when setting CPU port.
	net: stmmac: dwmac-sun8i: Allow all RGMII modes
	net: stmmac: dwmac-sunxi: Allow all RGMII modes
	net: usb: lan78xx: fix possible skb leak
	pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
	USB: core: fix check for duplicate endpoints
	USB: serial: option: add Telit ME910G1 0x110a composition
	sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
	tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
	vxlan: fix tos value before xmit
	vlan: vlan_changelink() should propagate errors
	net: sch_prio: When ungrafting, replace with FIFO
	vlan: fix memory leak in vlan_dev_set_egress_priority
	Linux 4.14.164

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifbce6635b5a3df896c29e23dd15098e80ecddeba
2020-01-12 12:24:05 +01:00
Daniel Borkmann
b454ac1b22 bpf: Fix passing modified ctx to ld/abs/ind instruction
commit 6d4f151acf9a4f6fab09b615f246c717ddedcf0c upstream.

Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b50c ("bpf/verifier: rework value tracking").

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12 12:12:02 +01:00
Daniel Borkmann
7fed98f4a1 bpf: reject passing modified ctx to helper functions
commit 58990d1ff3f7896ee341030e9a7c2e4002570683 upstream.

As commit 28e33f9d78ee ("bpf: disallow arithmetic operations on
context pointer") already describes, f1174f77b50c ("bpf/verifier:
rework value tracking") removed the specific white-listed cases
we had previously where we would allow for pointer arithmetic in
order to further generalize it, and allow e.g. context access via
modified registers. While the dereferencing of modified context
pointers had been forbidden through 28e33f9d78ee, syzkaller did
recently manage to trigger several KASAN splats for slab out of
bounds access and use after frees by simply passing a modified
context pointer to a helper function which would then do the bad
access since verifier allowed it in adjust_ptr_min_max_vals().

Rejecting arithmetic on ctx pointer in adjust_ptr_min_max_vals()
generally could break existing programs as there's a valid use
case in tracing in combination with passing the ctx to helpers as
bpf_probe_read(), where the register then becomes unknown at
verification time due to adding a non-constant offset to it. An
access sequence may look like the following:

  offset = args->filename;  /* field __data_loc filename */
  bpf_probe_read(&dst, len, (char *)args + offset); // args is ctx

There are two options: i) we could special case the ctx and as
soon as we add a constant or bounded offset to it (hence ctx type
wouldn't change) we could turn the ctx into an unknown scalar, or
ii) we generalize the sanity test for ctx member access into a
small helper and assert it on the ctx register that was passed
as a function argument. Fwiw, latter is more obvious and less
complex at the same time, and one case that may potentially be
legitimate in future for ctx member access at least would be for
ctx to carry a const offset. Therefore, fix follows approach
from ii) and adds test cases to BPF kselftests.

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: syzbot+3d0b2441dbb71751615e@syzkaller.appspotmail.com
Reported-by: syzbot+c8504affd4fdd0c1b626@syzkaller.appspotmail.com
Reported-by: syzbot+e5190cb881d8660fb1a3@syzkaller.appspotmail.com
Reported-by: syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12 12:12:02 +01:00
Marco Elver
09226e5c38 locking/spinlock/debug: Fix various data races
[ Upstream commit 1a365e822372ba24c9da0822bc583894f6f3d821 ]

This fixes various data races in spinlock_debug. By testing with KCSAN,
it is observable that the console gets spammed with data races reports,
suggesting these are extremely frequent.

Example data race report:

  read to 0xffff8ab24f403c48 of 4 bytes by task 221 on cpu 2:
   debug_spin_lock_before kernel/locking/spinlock_debug.c:85 [inline]
   do_raw_spin_lock+0x9b/0x210 kernel/locking/spinlock_debug.c:112
   __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
   _raw_spin_lock+0x39/0x40 kernel/locking/spinlock.c:151
   spin_lock include/linux/spinlock.h:338 [inline]
   get_partial_node.isra.0.part.0+0x32/0x2f0 mm/slub.c:1873
   get_partial_node mm/slub.c:1870 [inline]
  <snip>

  write to 0xffff8ab24f403c48 of 4 bytes by task 167 on cpu 3:
   debug_spin_unlock kernel/locking/spinlock_debug.c:103 [inline]
   do_raw_spin_unlock+0xc9/0x1a0 kernel/locking/spinlock_debug.c:138
   __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:159 [inline]
   _raw_spin_unlock_irqrestore+0x2d/0x50 kernel/locking/spinlock.c:191
   spin_unlock_irqrestore include/linux/spinlock.h:393 [inline]
   free_debug_processing+0x1b3/0x210 mm/slub.c:1214
   __slab_free+0x292/0x400 mm/slub.c:2864
  <snip>

As a side-effect, with KCSAN, this eventually locks up the console, most
likely due to deadlock, e.g. .. -> printk lock -> spinlock_debug ->
KCSAN detects data race -> kcsan_print_report() -> printk lock ->
deadlock.

This fix will 1) avoid the data races, and 2) allow using lock debugging
together with KCSAN.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20191120155715.28089-1-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:11:49 +01:00
Joel Fernandes (Google)
ca96898263 BACKPORT: perf_event: Add support for LSM and SELinux checks
In current mainline, the degree of access to perf_event_open(2) system
call depends on the perf_event_paranoid sysctl.  This has a number of
limitations:

1. The sysctl is only a single value. Many types of accesses are controlled
   based on the single value thus making the control very limited and
   coarse grained.
2. The sysctl is global, so if the sysctl is changed, then that means
   all processes get access to perf_event_open(2) opening the door to
   security issues.

This patch adds LSM and SELinux access checking which will be used in
Android to access perf_event_open(2) for the purposes of attaching BPF
programs to tracepoints, perf profiling and other operations from
userspace. These operations are intended for production systems.

5 new LSM hooks are added:
1. perf_event_open: This controls access during the perf_event_open(2)
   syscall itself. The hook is called from all the places that the
   perf_event_paranoid sysctl is checked to keep it consistent with the
   systctl. The hook gets passed a 'type' argument which controls CPU,
   kernel and tracepoint accesses (in this context, CPU, kernel and
   tracepoint have the same semantics as the perf_event_paranoid sysctl).
   Additionally, I added an 'open' type which is similar to
   perf_event_paranoid sysctl == 3 patch carried in Android and several other
   distros but was rejected in mainline [1] in 2016.

2. perf_event_alloc: This allocates a new security object for the event
   which stores the current SID within the event. It will be useful when
   the perf event's FD is passed through IPC to another process which may
   try to read the FD. Appropriate security checks will limit access.

3. perf_event_free: Called when the event is closed.

4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.

5. perf_event_write: Called from the ioctl(2) syscalls for the event.

[1] https://lwn.net/Articles/696240/

Since Peter had suggest LSM hooks in 2016 [1], I am adding his
Suggested-by tag below.

To use this patch, we set the perf_event_paranoid sysctl to -1 and then
apply selinux checking as appropriate (default deny everything, and then
add policy rules to give access to domains that need it). In the future
we can remove the perf_event_paranoid sysctl altogether.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: rostedt@goodmis.org
Cc: Yonghong Song <yhs@fb.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: jeffv@google.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: primiano@google.com
Cc: Song Liu <songliubraving@fb.com>
Cc: rsavitski@google.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org

(cherry picked from commit da97e18458fb42d7c00fac5fd1c56a3896ec666e)
[ Ryan Savitski: adapted for older APIs, and folded in upstream
  ae79d5588a04 (perf/core: Fix !CONFIG_PERF_EVENTS build warnings and
  failures). This should fix the build errors from the previous backport
  attempt, where certain configurations would end up with functions
  referring to the perf_event struct prior to its declaration (and
  therefore declaring it with a different scope). ]
Bug: 137092007
Change-Id: Iece194b3519dc5016ccbe127fc4e5c425ee7c442
Signed-off-by: Ryan Savitski <rsavitski@google.com>
2020-01-10 15:17:57 +00:00
Greg Kroah-Hartman
1cfd841998 This is the 4.14.163 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4W78kACgkQONu9yGCS
 aT4NTA//YhoVBrpdfY8I2ELs1HZzQJmzZBI0U8eB3pGPy/PuBLpQLVl99Zmjm62u
 q2vFtIh7LNLS5f74Pl5qTkWPN/Zapk9nAsyJ9Iy8EUGXZer4mZaIILpBpG+N4OhB
 /qbK5hkrJg51z97uspk9jTfdzCJ6Sa7j5ssB1BTulT936Naa1wcVgIAoAk0ezNrW
 mNAfjSmUT0iYbdl60LQgRshFhfk6/S+EuDOWUa34zK9c8JU9ll+cpyqTI//d+gnp
 5ttBIx9h/ZDJUDYLe9LSFy56ZUpLkNoDgO7Cqgs+AgvVYVhu2yrZhVSYIpa9gHh9
 wxxrRJST2kUELTiKw3e1pEL4SmNv6RvldwqBXqjO7cdTp19aYBGUEKWVEJsHuPrs
 waW0FMvEZO7lo02sh3nlakOOj6IDqAJOI1ZvLyZiffxwp5LzxV+uh2AvkfQTYSst
 vEawCIomABPLynYfGil1TCBEe9N0zieNVFRxXLb89xn0KfMWzCvqLO2zr9wChCel
 KEe+yCViT4bvC+zpUgk3WCXYtk04mLzbVJeEPc6PBn8I4t1olzrztfWRPlK3ywqK
 XsWDIuOZ0E6advzv+7Ztwk2gcHgKo0vMXXkVUwUm/jOpeaEO9i5bedp37vsGld2m
 QCwYxGBTFNaASfHHrXQ0i2p/7V7dGTPB2ZR6pQvf96DZepLtS6I=
 =yTFw
 -----END PGP SIGNATURE-----

Merge 4.14.163 into android-4.14

Changes in 4.14.163
	nvme_fc: add module to ops template to allow module references
	iio: adc: max9611: Fix too short conversion time delay
	PM / devfreq: Don't fail devfreq_dev_release if not in list
	RDMA/cma: add missed unregister_pernet_subsys in init failure
	rxe: correctly calculate iCRC for unaligned payloads
	scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
	scsi: qla2xxx: Don't call qlt_async_event twice
	scsi: iscsi: qla4xxx: fix double free in probe
	scsi: libsas: stop discovering if oob mode is disconnected
	drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
	usb: gadget: fix wrong endpoint desc
	net: make socket read/write_iter() honor IOCB_NOWAIT
	md: raid1: check rdev before reference in raid1_sync_request func
	s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
	s390/cpum_sf: Avoid SBD overflow condition in irq handler
	IB/mlx4: Follow mirror sequence of device add during device removal
	xen-blkback: prevent premature module unload
	xen/balloon: fix ballooned page accounting without hotplug enabled
	PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
	xfs: fix mount failure crash on invalid iclog memory access
	taskstats: fix data-race
	drm: limit to INT_MAX in create_blob ioctl
	ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
	drm/sun4i: hdmi: Remove duplicate cleanup calls
	MIPS: Avoid VDSO ABI breakage due to global register variable
	media: pulse8-cec: fix lost cec_transmit_attempt_done() call
	media: cec: CEC 2.0-only bcast messages were ignored
	media: cec: avoid decrementing transmit_queue_sz if it is 0
	mm/zsmalloc.c: fix the migrated zspage statistics.
	memcg: account security cred as well to kmemcg
	pstore/ram: Write new dumps to start of recycled zones
	locks: print unsigned ino in /proc/locks
	dmaengine: Fix access to uninitialized dma_slave_caps
	compat_ioctl: block: handle Persistent Reservations
	compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
	ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
	ata: ahci_brcm: Allow optional reset controller to be used
	ata: ahci_brcm: Fix AHCI resources management
	gpiolib: fix up emulated open drain outputs
	tracing: Fix lock inversion in trace_event_enable_tgid_record()
	tracing: Have the histogram compare functions convert to u64 first
	ALSA: cs4236: fix error return comparison of an unsigned integer
	ALSA: firewire-motu: Correct a typo in the clock proc string
	exit: panic before exit_mm() on global init exit
	ftrace: Avoid potential division by zero in function profiler
	arm64: Revert support for execute-only user mappings
	PM / devfreq: Check NULL governor in available_governors_show
	nfsd4: fix up replay_matches_cache()
	scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
	xfs: don't check for AG deadlock for realtime files in bunmapi
	platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
	Bluetooth: btusb: fix PM leak in error case of setup
	Bluetooth: delete a stray unlock
	Bluetooth: Fix memory leak in hci_connect_le_scan
	media: flexcop-usb: ensure -EIO is returned on error condition
	regulator: ab8500: Remove AB8505 USB regulator
	media: usb: fix memory leak in af9005_identify_state
	dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
	tty: serial: msm_serial: Fix lockup for sysrq and oops
	fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
	scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails
	drm/mst: Fix MST sideband up-reply failure handling
	powerpc/pseries/hvconsole: Fix stack overread via udbg
	selftests: rtnetlink: add addresses with fixed life time
	rxrpc: Fix possible NULL pointer access in ICMP handling
	ath9k_htc: Modify byte order for an error message
	ath9k_htc: Discard undersized packets
	arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
	net: add annotations on hh->hh_len lockless accesses
	s390/smp: fix physical to logical CPU map for SMT
	xen/blkback: Avoid unmapping unmapped grant pages
	perf/x86/intel/bts: Fix the use of page_private()
	Linux 4.14.163

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-01-09 16:03:53 +01:00
Jeffrey Vander Stoep
7753861270 Revert "BACKPORT: perf_event: Add support for LSM and SELinux checks"
This reverts commit f81151cd3afda797c1f0871e42a19606277b414b.

Reason for revert: collides with aosp/1137243 and breaks build

Change-Id: I6d0216ccaa1a759fb1732c07601f5877b81a5f03
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
2020-01-09 10:37:25 +00:00
Wen Yang
7650b4b1df ftrace: Avoid potential division by zero in function profiler
commit e31f7939c1c27faa5d0e3f14519eaf7c89e8a69d upstream.

The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

Cc: stable@vger.kernel.org
Fixes: e330b3bcd8319 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc590f ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:17:56 +01:00
chenqiwu
66a10703d6 exit: panic before exit_mm() on global init exit
commit 43cf75d96409a20ef06b756877a2e72b10a026fc upstream.

Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-> exit_notify()
   -> forget_original_parent()
      -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:17:56 +01:00
Steven Rostedt (VMware)
bc5e8a8a58 tracing: Have the histogram compare functions convert to u64 first
commit 106f41f5a302cb1f36c7543fae6a05de12e96fa4 upstream.

The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.

To fix this, first derference the value as a u64 then convert it to the type
being compared.

Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
Acked-by: Tom Zanussi <zanussi@kernel.org>
Reported-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:17:56 +01:00
Prateek Sood
af24719234 tracing: Fix lock inversion in trace_event_enable_tgid_record()
commit 3a53acf1d9bea11b57c1f6205e3fe73f9d8a3688 upstream.

       Task T2                             Task T3
trace_options_core_write()            subsystem_open()

 mutex_lock(trace_types_lock)           mutex_lock(event_mutex)

 set_tracer_flag()

   trace_event_enable_tgid_record()       mutex_lock(trace_types_lock)

    mutex_lock(event_mutex)

This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.

Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com

Cc: stable@vger.kernel.org
Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:17:56 +01:00
Shakeel Butt
abb59358f1 memcg: account security cred as well to kmemcg
commit 84029fd04c201a4c7e0b07ba262664900f47c6f5 upstream.

The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage on our production and on further
inspection, we found a buggy application leaking processes.  Though that
buggy application was contained within its memcg but we observe much
more system memory overhead, couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/20191205223721.40034-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:17:54 +01:00
Christian Brauner
4fe7cb918a taskstats: fix data-race
[ Upstream commit 0b8d616fb5a8ffa307b1d3af37f55c15dae14f28 ]

When assiging and testing taskstats in taskstats_exit() there's a race
when setting up and reading sig->stats when a thread-group with more
than one thread exits:

write to 0xffff8881157bbe10 of 8 bytes by task 7951 on cpu 0:
 taskstats_tgid_alloc kernel/taskstats.c:567 [inline]
 taskstats_exit+0x6b7/0x717 kernel/taskstats.c:596
 do_exit+0x2c2/0x18e0 kernel/exit.c:864
 do_group_exit+0xb4/0x1c0 kernel/exit.c:983
 get_signal+0x2a2/0x1320 kernel/signal.c:2734
 do_signal+0x3b/0xc00 arch/x86/kernel/signal.c:815
 exit_to_usermode_loop+0x250/0x2c0 arch/x86/entry/common.c:159
 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
 do_syscall_64+0x2d7/0x2f0 arch/x86/entry/common.c:299
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffff8881157bbe10 of 8 bytes by task 7949 on cpu 1:
 taskstats_tgid_alloc kernel/taskstats.c:559 [inline]
 taskstats_exit+0xb2/0x717 kernel/taskstats.c:596
 do_exit+0x2c2/0x18e0 kernel/exit.c:864
 do_group_exit+0xb4/0x1c0 kernel/exit.c:983
 __do_sys_exit_group kernel/exit.c:994 [inline]
 __se_sys_exit_group kernel/exit.c:992 [inline]
 __x64_sys_exit_group+0x2e/0x30 kernel/exit.c:992
 do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by using smp_load_acquire() and smp_store_release().

Reported-by: syzbot+c5d03165a1bd1dead0c1@syzkaller.appspotmail.com
Fixes: 34ec12349c8a ("taskstats: cleanup ->signal->stats allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Will Deacon <will@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20191009114809.8643-1-christian.brauner@ubuntu.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09 10:17:53 +01:00
Andy Whitcroft
c0fb3b1b49 PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
[ Upstream commit da6043fe85eb5ec621e34a92540735dcebbea134 ]

When looking for a bit by number we make use of the cached result from the
preceding lookup to speed up operation.  Firstly we check if the requested
pfn is within the cached zone and if not lookup the new zone.  We then
check if the offset for that pfn falls within the existing cached node.
This happens regardless of whether the node is within the zone we are
now scanning.  With certain memory layouts it is possible for this to
false trigger creating a temporary alias for the pfn to a different bit.
This leads the hibernation code to free memory which it was never allocated
with the expected fallout.

Ensure the zone we are scanning matches the cached zone before considering
the cached node.

Deep thanks go to Andrea for many, many, many hours of hacking and testing
that went into cornering this bug.

Reported-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09 10:17:53 +01:00
Joel Fernandes (Google)
f81151cd3a BACKPORT: perf_event: Add support for LSM and SELinux checks
In current mainline, the degree of access to perf_event_open(2) system
call depends on the perf_event_paranoid sysctl.  This has a number of
limitations:

1. The sysctl is only a single value. Many types of accesses are controlled
   based on the single value thus making the control very limited and
   coarse grained.
2. The sysctl is global, so if the sysctl is changed, then that means
   all processes get access to perf_event_open(2) opening the door to
   security issues.

This patch adds LSM and SELinux access checking which will be used in
Android to access perf_event_open(2) for the purposes of attaching BPF
programs to tracepoints, perf profiling and other operations from
userspace. These operations are intended for production systems.

5 new LSM hooks are added:
1. perf_event_open: This controls access during the perf_event_open(2)
   syscall itself. The hook is called from all the places that the
   perf_event_paranoid sysctl is checked to keep it consistent with the
   systctl. The hook gets passed a 'type' argument which controls CPU,
   kernel and tracepoint accesses (in this context, CPU, kernel and
   tracepoint have the same semantics as the perf_event_paranoid sysctl).
   Additionally, I added an 'open' type which is similar to
   perf_event_paranoid sysctl == 3 patch carried in Android and several other
   distros but was rejected in mainline [1] in 2016.

2. perf_event_alloc: This allocates a new security object for the event
   which stores the current SID within the event. It will be useful when
   the perf event's FD is passed through IPC to another process which may
   try to read the FD. Appropriate security checks will limit access.

3. perf_event_free: Called when the event is closed.

4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.

5. perf_event_write: Called from the ioctl(2) syscalls for the event.

[1] https://lwn.net/Articles/696240/

Since Peter had suggest LSM hooks in 2016 [1], I am adding his
Suggested-by tag below.

To use this patch, we set the perf_event_paranoid sysctl to -1 and then
apply selinux checking as appropriate (default deny everything, and then
add policy rules to give access to domains that need it). In the future
we can remove the perf_event_paranoid sysctl altogether.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: rostedt@goodmis.org
Cc: Yonghong Song <yhs@fb.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: jeffv@google.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: primiano@google.com
Cc: Song Liu <songliubraving@fb.com>
Cc: rsavitski@google.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org

Bug: 137092007
Change-Id: Ibb356813b0b2f0cedab7806ee21ce4c00469be32
(cherry picked from commit da97e18458fb42d7c00fac5fd1c56a3896ec666e)
[ Ryan Savitski:
  Adapted for older APIs, e.g. hlist -> list, removed refs to
  selinux_state. No new functionality. ]
Signed-off-by: Ryan Savitski <rsavitski@google.com>
2020-01-08 14:29:48 +00:00
chenqiwu
7742890e23 UPSTREAM: exit: panic before exit_mm() on global init exit
Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-> exit_notify()
   -> forget_original_parent()
      -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
(cherry picked from commit 43cf75d96409a20ef06b756877a2e72b10a026fc)
Bug: 146789558
Change-Id: Icff81267e8c49bf1d332773351d1b47cb8cbac4a
Signed-off-by: Alistair Delva <adelva@google.com>
2020-01-06 15:53:08 +00:00
Greg Kroah-Hartman
c2bd4f8f0c This is the 4.14.162 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4QjoEACgkQONu9yGCS
 aT61Ig/9GTbv5+njbemhs01loMsA6H4u+BwFHjxJiTzfj+7TwKDZVDcllmiKkPSQ
 cS3+n6oV1G5VzzmTRU5WKBNQkgu2t6TmcxI4xiDTIZ+GlzdC7b7bp0uEv7bRGIMg
 lm6oHBoy753oMiB/Z4itA58tuLVsEw9sjZJ3O7wvlaFl4NzD8clGCc9iLQaLofDP
 7uXWPgtZ3yRDquOtjPV7c52qwbr/QUZs13iH6xwSHIK6kmTbuhKbQB2TqdrHlKrc
 FxlitA8NAjn8s7PrJd0NWQxxEW0by3W+pYZ6yvzF1zlY1UWkZB7WfKK8kW5A/5Jt
 alPtbHAZGbxuobVewObRosM/DZ6vYHNE78M6FUkyo7113lsvVNhz71h8YbO/beCc
 PPGzvQbbeaWGdVtTFVih75HwtGVktwRpgdA1H0NPZb4eWX9eZl8BrgMvo4EsAvl4
 BMYiWbYvR7ijWvbahwTHlpnpmce2acWD5H+oGE338lXvXfXjgrH5d2DlZ9bWTdKv
 h1YmINQ/cZuUoAe9vlUr/uXIflwza65TJWDRRjzXMZ7FOLwXTTCjqFO+36PZ5zRf
 4jdfZa4Uz0HmfH95bVJRbRuAt1Fny/mK3sx7vjTcu0qT9FpG8P3tSJR9rz8yEbVb
 X0dmyUHl2qNFj0Y/cV3AJJjTEuDbhmXfwPmXPgF4owR6R0rhfM4=
 =57Qt
 -----END PGP SIGNATURE-----

Merge 4.14.162 into android-4.14

Changes in 4.14.162
	scsi: lpfc: Fix discovery failures when target device connectivity bounces
	scsi: mpt3sas: Fix clear pending bit in ioctl status
	scsi: lpfc: Fix locking on mailbox command completion
	Input: atmel_mxt_ts - disable IRQ across suspend
	iommu/tegra-smmu: Fix page tables in > 4 GiB memory
	scsi: target: compare full CHAP_A Algorithm strings
	scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
	scsi: csiostor: Don't enable IRQs too early
	powerpc/pseries: Mark accumulate_stolen_time() as notrace
	powerpc/pseries: Don't fail hash page table insert for bolted mapping
	powerpc/tools: Don't quote $objdump in scripts
	dma-debug: add a schedule point in debug_dma_dump_mappings()
	clocksource/drivers/asm9260: Add a check for of_clk_get
	powerpc/security/book3s64: Report L1TF status in sysfs
	powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
	ext4: update direct I/O read lock pattern for IOCB_NOWAIT
	jbd2: Fix statistics for the number of logged blocks
	scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
	scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
	f2fs: fix to update dir's i_pino during cross_rename
	clk: qcom: Allow constant ratio freq tables for rcg
	irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
	irqchip: ingenic: Error out if IRQ domain creation failed
	fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
	scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
	scsi: ufs: fix potential bug which ends in system hang
	powerpc/pseries/cmm: Implement release() function for sysfs device
	powerpc/security: Fix wrong message when RFI Flush is disable
	scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
	clk: pxa: fix one of the pxa RTC clocks
	bcache: at least try to shrink 1 node in bch_mca_scan()
	HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
	libnvdimm/btt: fix variable 'rc' set but not used
	HID: Improve Windows Precision Touchpad detection.
	scsi: pm80xx: Fix for SATA device discovery
	scsi: ufs: Fix error handing during hibern8 enter
	scsi: scsi_debug: num_tgts must be >= 0
	scsi: NCR5380: Add disconnect_mask module parameter
	scsi: iscsi: Don't send data to unbound connection
	scsi: target: iscsi: Wait for all commands to finish before freeing a session
	gpio: mpc8xxx: Don't overwrite default irq_set_type callback
	apparmor: fix unsigned len comparison with less than zero
	scripts/kallsyms: fix definitely-lost memory leak
	cdrom: respect device capabilities during opening action
	perf script: Fix brstackinsn for AUXTRACE
	perf regs: Make perf_reg_name() return "unknown" instead of NULL
	s390/zcrypt: handle new reply code FILTERED_BY_HYPERVISOR
	libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
	s390/cpum_sf: Check for SDBT and SDB consistency
	ocfs2: fix passing zero to 'PTR_ERR' warning
	kernel: sysctl: make drop_caches write-only
	userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
	x86/mce: Fix possibly incorrect severity calculation on AMD
	net, sysctl: Fix compiler warning when only cBPF is present
	netfilter: nf_queue: enqueue skbs with NULL dst
	ALSA: hda - Downgrade error message for single-cmd fallback
	bonding: fix active-backup transition after link failure
	perf strbuf: Remove redundant va_end() in strbuf_addv()
	Make filldir[64]() verify the directory entry filename is valid
	filldir[64]: remove WARN_ON_ONCE() for bad directory entries
	netfilter: ebtables: compat: reject all padding in matches/watchers
	6pack,mkiss: fix possible deadlock
	netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
	inetpeer: fix data-race in inet_putpeer / inet_putpeer
	net: add a READ_ONCE() in skb_peek_tail()
	net: icmp: fix data-race in cmp_global_allow()
	hrtimer: Annotate lockless access to timer->state
	spi: fsl: don't map irq during probe
	tty/serial: atmel: fix out of range clock divider handling
	pinctrl: baytrail: Really serialize all register accesses
	net: ena: fix napi handler misbehavior when the napi budget is zero
	net/mlxfw: Fix out-of-memory error in mfa2 flash burning
	ptp: fix the race between the release of ptp_clock and cdev
	udp: fix integer overflow while computing available space in sk_rcvbuf
	vhost/vsock: accept only packets with the right dst_cid
	net: add bool confirm_neigh parameter for dst_ops.update_pmtu
	ip6_gre: do not confirm neighbor when do pmtu update
	gtp: do not confirm neighbor when do pmtu update
	net/dst: add new function skb_dst_update_pmtu_no_confirm
	tunnel: do not confirm neighbor when do pmtu update
	vti: do not confirm neighbor when do pmtu update
	sit: do not confirm neighbor when do pmtu update
	gtp: do not allow adding duplicate tid and ms_addr pdp context
	tcp/dccp: fix possible race __inet_lookup_established()
	tcp: do not send empty skb from tcp_write_xmit()
	gtp: fix wrong condition in gtp_genl_dump_pdp()
	gtp: fix an use-after-free in ipv4_pdp_find()
	gtp: avoid zero size hashtable
	spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
	Linux 4.14.162

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-01-04 19:09:37 +01:00
Vladis Dronov
2dece4d6d1 ptp: fix the race between the release of ptp_clock and cdev
[ Upstream commit a33121e5487b424339636b25c35d3a180eaa5f5e ]

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[   48.010809] general protection fault: 0000 [#1] SMP
[   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[   48.019470] ...                                              ^^^ a slub poison
[   48.023854] Call Trace:
[   48.024050]  __fput+0x21f/0x240
[   48.024288]  task_work_run+0x79/0x90
[   48.024555]  do_exit+0x2af/0xab0
[   48.024799]  ? vfs_write+0x16a/0x190
[   48.025082]  do_group_exit+0x35/0x90
[   48.025387]  __x64_sys_exit_group+0xf/0x10
[   48.025737]  do_syscall_64+0x3d/0x130
[   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   48.026479] RIP: 0033:0x7f53b12082f6
[   48.026792] ...
[   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[   48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{   ...
    if (file->f_op->release)
        file->f_op->release(inode, file); <<< cdev is kfree'd here
    if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
             !(mode & FMODE_PATH))) {
        cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
  posix_clock_release()
    kref_put(&clk->kref, delete_clock) <<< the last reference
      delete_clock()
        delete_ptp_clock()
          kfree(ptp) <<< cdev is embedded in ptp
  cdev_put
    module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7fda ("chardev: add
helper function to register char devs with a struct device").

Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04 14:00:12 +01:00
Eric Dumazet
a51afeedc6 hrtimer: Annotate lockless access to timer->state
commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream.

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
 tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
 tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
 __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
 tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657
 __sys_sendto+0x21f/0x320 net/socket.c:1952
 __do_sys_sendto net/socket.c:1964 [inline]
 __se_sys_sendto net/socket.c:1960 [inline]
 __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04 14:00:08 +01:00
Johannes Weiner
5dc89b665d kernel: sysctl: make drop_caches write-only
[ Upstream commit 204cb79ad42f015312a5bbd7012d09c93d9b46fb ]

Currently, the drop_caches proc file and sysctl read back the last value
written, suggesting this is somehow a stateful setting instead of a
one-time command.  Make it write-only, like e.g.  compact_memory.

While mitigating a VM problem at scale in our fleet, there was confusion
about whether writing to this file will permanently switch the kernel into
a non-caching mode.  This influences the decision making in a tense
situation, where tens of people are trying to fix tens of thousands of
affected machines: Do we need a rollback strategy?  What are the
performance implications of operating in a non-caching state for several
days?  It also caused confusion when the kernel team said we may need to
write the file several times to make sure it's effective ("But it already
reads back 3?").

Link: http://lkml.kernel.org/r/20191031221602.9375-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04 13:59:57 +01:00
Greg Kroah-Hartman
f960b38ecc This is the 4.14.159 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl35LxUACgkQONu9yGCS
 aT4dLRAAn62JwQdXIRD51BSeXOCxH/oTba3lec9YCf7ttnQojnBKg4Fzxps4R0eH
 a32uSWOfEb9P7CIIlSAKTx6XPJ3TAmsFjUO1KmO0kbMVmUw6G3yb0g/96/tNjSUt
 xJwyhYSasQDMmxV/5HvrkCbobeHP1Gf+eacKWfJHaVOUo1UXaW+25A++I5fYOdhn
 vrcHmJyJAQN38beSOVLDUJ+VXTyEO5ZUG+Pe7IwK1QiOF4JfWoIddCdxxyynT5PR
 R54x+xPLsaiFXQEjlVIATIsr9KgR3is4utpfSd7MYGxCD7yV4VNrZZighVKBIlV8
 39K0zmcpbSIu3PHvxVGxpdjTzPWErPKH6tjHJ/weMI+zy4tHpzUOvpooH46BvYsn
 XMhlqsYlWS0Nj9eCpUxxkDr1hyuZlpv5RPyW4xKFWor6zQvVi+cl1wiDu0tKCD7T
 gg3vB04mMOBnGUsEzTc0I/hPcWp6xThQg4N9Zh/MbdwqSkN5KHDgakIMa2yEYRB7
 ZLskhnvB2te1KVHvn5CsxR0ABPextALn/u/7qELgGIKoyJVzgmL/lF3wceGsUwz3
 hpcWmYKKu5nPg+L1bCHj05O3IcaUhCmvTBkV39nh4TshTTPU0PkvBv20UoChcgER
 /4QhKydpeLwKi5hTuBuHN6z3PuGrId3opf28KdGsHQ1KGPqd5os=
 =p3OE
 -----END PGP SIGNATURE-----

Merge 4.14.159 into android-4.14

Changes in 4.14.159
	rsi: release skb if rsi_prepare_beacon fails
	arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
	usb: gadget: u_serial: add missing port entry locking
	tty: serial: fsl_lpuart: use the sg count from dma_map_sg
	tty: serial: msm_serial: Fix flow control
	serial: pl011: Fix DMA ->flush_buffer()
	serial: serial_core: Perform NULL checks for break_ctl ops
	serial: ifx6x60: add missed pm_runtime_disable
	autofs: fix a leak in autofs_expire_indirect()
	RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
	iwlwifi: pcie: don't consider IV len in A-MSDU
	exportfs_decode_fh(): negative pinned may become positive without the parent locked
	audit_get_nd(): don't unlock parent too early
	NFC: nxp-nci: Fix NULL pointer dereference after I2C communication error
	xfrm: release device reference for invalid state
	Input: cyttsp4_core - fix use after free bug
	sched/core: Avoid spurious lock dependencies
	ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
	rsxx: add missed destroy_workqueue calls in remove
	net: ep93xx_eth: fix mismatch of request_mem_region in remove
	i2c: core: fix use after free in of_i2c_notify
	serial: core: Allow processing sysrq at port unlock time
	cxgb4vf: fix memleak in mac_hlist initialization
	iwlwifi: mvm: synchronize TID queue removal
	iwlwifi: mvm: Send non offchannel traffic via AP sta
	ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
	net/mlx5: Release resource on error flow
	clk: sunxi-ng: a64: Fix gate bit of DSI DPHY
	dlm: fix possible call to kfree() for non-initialized pointer
	extcon: max8997: Fix lack of path setting in USB device mode
	net: ethernet: ti: cpts: correct debug for expired txq skb
	rtc: s3c-rtc: Avoid using broken ALMYEAR register
	i40e: don't restart nway if autoneg not supported
	clk: rockchip: fix rk3188 sclk_smc gate data
	clk: rockchip: fix rk3188 sclk_mac_lbtest parameter ordering
	ARM: dts: rockchip: Fix rk3288-rock2 vcc_flash name
	dlm: fix missing idr_destroy for recover_idr
	MIPS: SiByte: Enable ZONE_DMA32 for LittleSur
	net: dsa: mv88e6xxx: Work around mv886e6161 SERDES missing MII_PHYSID2
	scsi: zfcp: drop default switch case which might paper over missing case
	crypto: ecc - check for invalid values in the key verification test
	crypto: bcm - fix normal/non key hash algorithm failure
	pinctrl: qcom: ssbi-gpio: fix gpio-hog related boot issues
	Staging: iio: adt7316: Fix i2c data reading, set the data field
	mm/vmstat.c: fix NUMA statistics updates
	clk: rockchip: fix I2S1 clock gate register for rk3328
	clk: rockchip: fix ID of 8ch clock of I2S1 for rk3328
	regulator: Fix return value of _set_load() stub
	net-next/hinic:fix a bug in set mac address
	iomap: sub-block dio needs to zeroout beyond EOF
	MIPS: OCTEON: octeon-platform: fix typing
	net/smc: use after free fix in smc_wr_tx_put_slot()
	math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning
	rtc: max8997: Fix the returned value in case of error in 'max8997_rtc_read_alarm()'
	rtc: dt-binding: abx80x: fix resistance scale
	ARM: dts: exynos: Use Samsung SoC specific compatible for DWC2 module
	media: pulse8-cec: return 0 when invalidating the logical address
	media: cec: report Vendor ID after initialization
	dmaengine: coh901318: Fix a double-lock bug
	dmaengine: coh901318: Remove unused variable
	dmaengine: dw-dmac: implement dma protection control setting
	usb: dwc3: debugfs: Properly print/set link state for HS
	usb: dwc3: don't log probe deferrals; but do log other error codes
	ACPI: fix acpi_find_child_device() invocation in acpi_preset_companion()
	f2fs: fix count of seg_freed to make sec_freed correct
	f2fs: change segment to section in f2fs_ioc_gc_range
	ARM: dts: rockchip: Fix the PMU interrupt number for rv1108
	ARM: dts: rockchip: Assign the proper GPIO clocks for rv1108
	f2fs: fix to allow node segment for GC by ioctl path
	sparc: Correct ctx->saw_frame_pointer logic.
	dma-mapping: fix return type of dma_set_max_seg_size()
	altera-stapl: check for a null key before strcasecmp'ing it
	serial: imx: fix error handling in console_setup
	i2c: imx: don't print error message on probe defer
	lockd: fix decoding of TEST results
	ASoC: rsnd: tidyup registering method for rsnd_kctrl_new()
	ARM: dts: sun5i: a10s: Fix HDMI output DTC warning
	ARM: dts: sun8i: v3s: Change pinctrl nodes to avoid warning
	dlm: NULL check before kmem_cache_destroy is not needed
	ARM: debug: enable UART1 for socfpga Cyclone5
	nfsd: fix a warning in __cld_pipe_upcall()
	ASoC: au8540: use 64-bit arithmetic instead of 32-bit
	ARM: OMAP1/2: fix SoC name printing
	arm64: dts: meson-gxl-libretech-cc: fix GPIO lines names
	arm64: dts: meson-gxbb-nanopi-k2: fix GPIO lines names
	arm64: dts: meson-gxbb-odroidc2: fix GPIO lines names
	arm64: dts: meson-gxl-khadas-vim: fix GPIO lines names
	net/x25: fix called/calling length calculation in x25_parse_address_block
	net/x25: fix null_x25_address handling
	ARM: dts: mmp2: fix the gpio interrupt cell number
	ARM: dts: realview-pbx: Fix duplicate regulator nodes
	tcp: fix off-by-one bug on aborting window-probing socket
	tcp: fix SNMP under-estimation on failed retransmission
	tcp: fix SNMP TCP timeout under-estimation
	modpost: skip ELF local symbols during section mismatch check
	kbuild: fix single target build for external module
	mtd: fix mtd_oobavail() incoherent returned value
	ARM: dts: pxa: clean up USB controller nodes
	clk: sunxi-ng: h3/h5: Fix CSI_MCLK parent
	ARM: dts: realview: Fix some more duplicate regulator nodes
	dlm: fix invalid cluster name warning
	net/mlx4_core: Fix return codes of unsupported operations
	pstore/ram: Avoid NULL deref in ftrace merging failure path
	powerpc/math-emu: Update macros from GCC
	clk: renesas: r8a77995: Correct parent clock of DU
	MIPS: OCTEON: cvmx_pko_mem_debug8: use oldest forward compatible definition
	nfsd: Return EPERM, not EACCES, in some SETATTR cases
	tty: Don't block on IO when ldisc change is pending
	media: stkwebcam: Bugfix for wrong return values
	firmware: qcom: scm: fix compilation error when disabled
	mlxsw: spectrum_router: Relax GRE decap matching check
	IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state
	IB/hfi1: Close VNIC sdma_progress sleep window
	mlx4: Use snprintf instead of complicated strcpy
	usb: mtu3: fix dbginfo in qmu_tx_zlp_error_handler
	ARM: dts: sunxi: Fix PMU compatible strings
	media: vimc: fix start stream when link is disabled
	net: aquantia: fix RSS table and key sizes
	tcp: exit if nothing to retransmit on RTO timeout
	sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
	fuse: verify nlink
	fuse: verify attributes
	ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236
	ALSA: pcm: oss: Avoid potential buffer overflows
	ALSA: hda - Add mute led support for HP ProBook 645 G4
	Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
	Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
	Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
	Input: goodix - add upside-down quirk for Teclast X89 tablet
	coresight: etm4x: Fix input validation for sysfs.
	Input: Fix memory leak in psxpad_spi_probe
	x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect
	CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
	CIFS: Fix SMB2 oplock break processing
	tty: vt: keyboard: reject invalid keycodes
	can: slcan: Fix use-after-free Read in slcan_open
	kernfs: fix ino wrap-around detection
	jbd2: Fix possible overflow in jbd2_log_space_left()
	drm/i810: Prevent underflow in ioctl
	KVM: arm/arm64: vgic: Don't rely on the wrong pending table
	KVM: x86: do not modify masked bits of shared MSRs
	KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
	crypto: crypto4xx - fix double-free in crypto4xx_destroy_sdr
	crypto: af_alg - cast ki_complete ternary op to int
	crypto: ccp - fix uninitialized list head
	crypto: ecdh - fix big endian bug in ECC library
	crypto: user - fix memory leak in crypto_report
	spi: atmel: Fix CS high support
	RDMA/qib: Validate ->show()/store() callbacks before calling them
	iomap: Fix pipe page leakage during splicing
	thermal: Fix deadlock in thermal thermal_zone_device_check
	binder: Handle start==NULL in binder_update_page_range()
	ASoC: rsnd: fixup MIX kctrl registration
	KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
	appletalk: Fix potential NULL pointer dereference in unregister_snap_client
	appletalk: Set error code if register_snap_client failed
	usb: gadget: configfs: Fix missing spin_lock_init()
	usb: gadget: pch_udc: fix use after free
	scsi: qla2xxx: Fix driver unload hang
	media: venus: remove invalid compat_ioctl32 handler
	USB: uas: honor flag to avoid CAPACITY16
	USB: uas: heed CAPACITY_HEURISTICS
	USB: documentation: flags on usb-storage versus UAS
	usb: Allow USB device to be warm reset in suspended state
	staging: rtl8188eu: fix interface sanity check
	staging: rtl8712: fix interface sanity check
	staging: gigaset: fix general protection fault on probe
	staging: gigaset: fix illegal free on probe errors
	staging: gigaset: add endpoint-type sanity check
	usb: xhci: only set D3hot for pci device
	xhci: Increase STS_HALT timeout in xhci_suspend()
	xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
	ARM: dts: pandora-common: define wl1251 as child node of mmc3
	iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
	USB: atm: ueagle-atm: add missing endpoint check
	USB: idmouse: fix interface sanity checks
	USB: serial: io_edgeport: fix epic endpoint lookup
	USB: adutux: fix interface sanity check
	usb: core: urb: fix URB structure initialization function
	usb: mon: Fix a deadlock in usbmon between mmap and read
	tpm: add check after commands attribs tab allocation
	mtd: spear_smi: Fix Write Burst mode
	virtio-balloon: fix managed page counts when migrating pages between zones
	usb: dwc3: ep0: Clear started flag on completion
	btrfs: check page->mapping when loading free space cache
	btrfs: use refcount_inc_not_zero in kill_all_nodes
	Btrfs: fix negative subv_writers counter and data space leak after buffered write
	btrfs: Remove btrfs_bio::flags member
	Btrfs: send, skip backreference walking for extents with many references
	btrfs: record all roots for rename exchange on a subvol
	rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
	rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
	rtlwifi: rtl8192de: Fix missing enable interrupt flag
	lib: raid6: fix awk build warnings
	ovl: relax WARN_ON() on rename to self
	ALSA: hda - Fix pending unsol events at shutdown
	md/raid0: Fix an error message in raid0_make_request()
	watchdog: aspeed: Fix clock behaviour for ast2600
	hwrng: omap - Fix RNG wait loop timeout
	dm zoned: reduce overhead of backing device checks
	workqueue: Fix spurious sanity check failures in destroy_workqueue()
	workqueue: Fix pwq ref leak in rescuer_thread()
	ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
	blk-mq: avoid sysfs buffer overflow with too many CPU cores
	cgroup: pids: use atomic64_t for pids->limit
	ar5523: check NULL before memcpy() in ar5523_cmd()
	s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
	media: bdisp: fix memleak on release
	media: radio: wl1273: fix interrupt masking on release
	media: cec.h: CEC_OP_REC_FLAG_ values were swapped
	cpuidle: Do not unset the driver if it is there already
	intel_th: Fix a double put_device() in error path
	intel_th: pci: Add Ice Lake CPU support
	intel_th: pci: Add Tiger Lake CPU support
	PM / devfreq: Lock devfreq in trans_stat_show
	cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
	ACPI: OSL: only free map once in osl.c
	ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
	ACPI: PM: Avoid attaching ACPI PM domain to certain devices
	pinctrl: samsung: Add of_node_put() before return in error path
	pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
	pinctrl: samsung: Fix device node refcount leaks in init code
	pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
	mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
	ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
	ppdev: fix PPGETTIME/PPSETTIME ioctls
	powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
	powerpc/xive: Prevent page fault issues in the machine crash handler
	powerpc: Allow flush_icache_range to work across ranges >4GB
	powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
	video/hdmi: Fix AVI bar unpack
	quota: Check that quota is not dirty before release
	ext2: check err when partial != NULL
	quota: fix livelock in dquot_writeback_dquots
	ext4: Fix credit estimate for final inode freeing
	reiserfs: fix extended attributes on the root directory
	block: fix single range discard merge
	scsi: zfcp: trace channel log even for FCP command responses
	scsi: qla2xxx: Fix DMA unmap leak
	scsi: qla2xxx: Fix session lookup in qlt_abort_work()
	scsi: qla2xxx: Fix qla24xx_process_bidir_cmd()
	scsi: qla2xxx: Always check the qla2x00_wait_for_hba_online() return value
	scsi: qla2xxx: Fix message indicating vectors used by driver
	xhci: Fix memory leak in xhci_add_in_port()
	xhci: make sure interrupts are restored to correct state
	iio: adis16480: Add debugfs_reg_access entry
	phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
	omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
	scsi: lpfc: Cap NPIV vports to 256
	scsi: lpfc: Correct code setting non existent bits in sli4 ABORT WQE
	drbd: Change drbd_request_detach_interruptible's return type to int
	e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait
	x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
	x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
	power: supply: cpcap-battery: Fix signed counter sample register
	mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead
	media: vimc: fix component match compare
	ath10k: fix fw crash by moving chip reset after napi disabled
	powerpc: Avoid clang warnings around setjmp and longjmp
	powerpc: Fix vDSO clock_getres()
	ext4: work around deleting a file with i_nlink == 0 safely
	firmware: qcom: scm: Ensure 'a0' status code is treated as signed
	mm/shmem.c: cast the type of unmap_start to u64
	ext4: fix a bug in ext4_wait_for_tail_page_commit
	mfd: rk808: Fix RK818 ID template
	blk-mq: make sure that line break can be printed
	workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
	sunrpc: fix crash when cache_head become valid before update
	net/mlx5e: Fix SFF 8472 eeprom length
	gfs2: fix glock reference problem in gfs2_trans_remove_revoke
	kernel/module.c: wakeup processes in module_wq on module unload
	gpiolib: acpi: Add Terra Pad 1061 to the run_edge_events_on_boot_blacklist
	raid5: need to set STRIPE_HANDLE for batch head
	of: unittest: fix memory leak in attach_node_and_children
	Linux 4.14.159

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-12-17 21:13:36 +01:00
Konstantin Khorenko
4faf1cc3db kernel/module.c: wakeup processes in module_wq on module unload
[ Upstream commit 5d603311615f612320bb77bd2a82553ef1ced5b7 ]

Fix the race between load and unload a kernel module.

sys_delete_module()
 try_stop_module()
  mod->state = _GOING
					add_unformed_module()
					 old = find_module_all()
					 (old->state == _GOING =>
					  wait_event_interruptible())

					 During pre-condition
					 finished_loading() rets 0
					 schedule()
					 (never gets waken up later)
 free_module()
  mod->state = _UNFORMED
   list_del_rcu(&mod->list)
   (dels mod from "modules" list)

return

The race above leads to modprobe hanging forever on loading
a module.

Error paths on loading module call wake_up_all(&module_wq) after
freeing module, so let's do the same on straight module unload.

Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17 20:40:02 +01:00
Tejun Heo
80797bdcc5 workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
commit 8efe1223d73c218ce7e8b2e0e9aadb974b582d7f upstream.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Qian Cai <cai@lca.pw>
Fixes: def98c84b6cd ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:59 +01:00
Aleksa Sarai
fc992138b3 cgroup: pids: use atomic64_t for pids->limit
commit a713af394cf382a30dd28a1015cbe572f1b9ca75 upstream.

Because pids->limit can be changed concurrently (but we don't want to
take a lock because it would be needlessly expensive), use atomic64_ts
instead.

Fixes: commit 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:29 +01:00
Tejun Heo
9e0f33da90 workqueue: Fix pwq ref leak in rescuer_thread()
commit e66b39af00f426b3356b96433d620cb3367ba1ff upstream.

008847f66c3 ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration.  Unfortunately, it doesn't check whether the pwq is
already on the mayday list and unconditionally gets the ref and moves
it onto the list.  This doesn't corrupt the list but creates an
additional reference to the pwq.  It got queued twice but will only be
removed once.

This leak later can trigger pwq refcnt warning on workqueue
destruction and prevent freeing of the workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:26 +01:00
Tejun Heo
05905c2f21 workqueue: Fix spurious sanity check failures in destroy_workqueue()
commit def98c84b6cdf2eeea19ec5736e90e316df5206b upstream.

Before actually destrying a workqueue, destroy_workqueue() checks
whether it's actually idle.  If it isn't, it prints out a bunch of
warning messages and leaves the workqueue dangling.  It unfortunately
has a couple issues.

* Mayday list queueing increments pwq's refcnts which gets detected as
  busy and fails the sanity checks.  However, because mayday list
  queueing is asynchronous, this condition can happen without any
  actual work items left in the workqueue.

* Sanity check failure leaves the sysfs interface behind too which can
  lead to init failure of newer instances of the workqueue.

This patch fixes the above two by

* If a workqueue has a rescuer, disable and kill the rescuer before
  sanity checks.  Disabling and killing is guaranteed to flush the
  existing mayday list.

* Remove sysfs interface before sanity checks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marcin Pawlowski <mpawlowski@fb.com>
Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:25 +01:00
Xuewei Zhang
c294780a80 sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.

The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:

  normalized_cfs_quota() = [(quota_us << 20) / period_us]

If the quota/period ratio was changed during this scaling due to
precision loss, it will cause inconsistency between parent and child
task groups.

See below example:

A userspace container manager (kubelet) does three operations:

 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
 2) Create a few children cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed. However, if the scaling of
147/128 happens before step 3, quota and period of the parent cgroup
will be changed:

  new_quota: 1148437ns,   1148us
 new_period: 11484375ns, 11484us

And when step 3 comes in, the ratio of the child cgroup will be
104857, which will be larger than the parent cgroup ratio (104821),
and will fail.

Scaling them by a factor of 2 will fix the problem.

Tested-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:38:43 +01:00
Peter Zijlstra
f506ed5538 sched/core: Avoid spurious lock dependencies
[ Upstream commit ff51ff84d82aea5a889b85f2b9fb3aa2b8691668 ]

While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJETS, can end up doing allocations.

This then results in the following lock order:

  rq->lock
    zone->lock.rlock
      batched_entropy_u64.lock

Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.

Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de
Cc: cl@linux.com
Cc: keescook@chromium.org
Cc: penberg@kernel.org
Cc: rientjes@google.com
Cc: thgarnie@google.com
Cc: tytso@mit.edu
Cc: will@kernel.org
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17 20:37:29 +01:00
Al Viro
0eda282779 audit_get_nd(): don't unlock parent too early
[ Upstream commit 69924b89687a2923e88cc42144aea27868913d0e ]

if the child has been negative and just went positive
under us, we want coherent d_is_positive() and ->d_inode.
Don't unlock the parent until we'd done that work...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17 20:37:27 +01:00
Sami Tolvanen
1c6394a973 FROMLIST: scs: add support for stack usage debugging
Implements CONFIG_DEBUG_STACK_USAGE for shadow stacks. When enabled,
also prints out the highest shadow stack usage per process.

Bug: 145210207
Change-Id: I4c085b51e1432e8d52e54126ffd8bf7b6e35b529
(am from https://lore.kernel.org/patchwork/patch/1149056/)
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-12-13 07:14:20 -08:00
Sami Tolvanen
ad751f33a5 FROMLIST: scs: add accounting
This change adds accounting for the memory allocated for shadow stacks.

Bug: 145210207
Change-Id: I51157fe0b23b4cb28bb33c86a5dfe3ac911296a4
(am from https://lore.kernel.org/patchwork/patch/1149055/)
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-12-13 07:14:20 -08:00
Sami Tolvanen
a7f2106930 FROMLIST: add support for Clang's Shadow Call Stack (SCS)
This change adds generic support for Clang's Shadow Call Stack,
which uses a shadow stack to protect return addresses from being
overwritten by an attacker. Details are available here:

  https://clang.llvm.org/docs/ShadowCallStack.html

Note that security guarantees in the kernel differ from the
ones documented for user space. The kernel must store addresses
of shadow stacks used by other tasks and interrupt handlers in
memory, which means an attacker capable reading and writing
arbitrary memory may be able to locate them and hijack control
flow by modifying shadow stacks that are not currently in use.

Bug: 145210207
Change-Id: Ia5f1650593fa95da4efcf86f84830a20989f161c
(am from https://lore.kernel.org/patchwork/patch/1149054/)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-12-13 07:14:20 -08:00
Yonghong Song
5179a6a673 UPSTREAM: bpf: permit multiple bpf attachments for a single perf event
This patch enables multiple bpf attachments for a
kprobe/uprobe/tracepoint single trace event.
Each trace_event keeps a list of attached perf events.
When an event happens, all attached bpf programs will
be executed based on the order of attachment.

A global bpf_event_mutex lock is introduced to protect
prog_array attaching and detaching. An alternative will
be introduce a mutex lock in every trace_event_call
structure, but it takes a lot of extra memory.
So a global bpf_event_mutex lock is a good compromise.

The bpf prog detachment involves allocation of memory.
If the allocation fails, a dummy do-nothing program
will replace to-be-detached program in-place.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e87c6bc3852b981e71c757be20771546ce9f76f3)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish; attach 2 progs to 1 tracepoint
Change-Id: I25ce1ed6c9512d0a6f2db7547e109958fe1619b6
2019-12-11 20:04:37 -08:00
Yonghong Song
10408722ba UPSTREAM: bpf: use the same condition in perf event set/free bpf handler
This is a cleanup such that doing the same check in
perf_event_free_bpf_prog as we already do in
perf_event_set_bpf_prog step.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0b4c6841fee03e096b735074a0c4aab3a8e92986)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: Id64d5a025d383fa3d3b16c5c74e8f9e86148efaa
2019-12-11 20:04:25 -08:00
Alexei Starovoitov
9e61c87b1f UPSTREAM: bpf: multi program support for cgroup+bpf
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.

The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
  the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
  that cgroup program gets run in addition to the program in this cgroup.

NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.

Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.

To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: If17b11a773f73d45ea565a947fc1bf7e158db98d
2019-12-11 20:04:15 -08:00
Greg Kroah-Hartman
84afceb668 This is the 4.14.158 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3pFmwACgkQONu9yGCS
 aT68qg//RlehfhDnOYveXC8iOlpnzUfE0gI0Ix5CbOuPk0pjYHD+pjC22QUK8fza
 LWoUH4XSmQ7k5v9xw9MXA45yEpsBajrF2uiOYEUbzEIeh2QetTa9+WlZ25wFnza9
 tICP2ct9lSs+E7bs3R8RW37cRLuYfhGtc9sskMfhAdTn9MQKOf9h7jIk0lFjhMB/
 GbK449Eo6+8Rh2Pai9EYhWCh70d8ZFHLN3UWZUqG8RfWj1041GwVIoNOhwh5fCOq
 susq/EZI58BKsUv614sUxQ+dMaY+AOLKZAeYcP49tn/aARl2MjQaYSO8wnyUSVwn
 F7VYN2uvDVKXZa1/vGNtF6Q6O3nuvVwOgaFFx0srH9rSA7s8se+ZQTHg9WqGo36l
 fl2u7VN40Lq3Hv53gDV9qLHaPaxtAh39lDG9UoGnefzdbNGPVQRTqypMeRLHidwQ
 CK5lmbCr9BHoOGTRE0jl147WHTXtzYxjPnUmhZlIT2vxxDXP1AQqOHLLjHviXFrp
 VclLhGbJUAcB3fGSZJtLHYgPlZms+AFLxDQN4l9e3Xqu+F/W9z+NlAX7bEfYLBm0
 v/x/b+BL+qtQ9DBIfc57uGxajgHzoI3ZtctiqZJ93IxFMRZEQVQsHYAh/pcK2AIh
 ONu4VvFjhdxWFQuzAZe8IEDyHbhcQSL+IMmKq+wu9KtGOfLNxWg=
 =D1w0
 -----END PGP SIGNATURE-----

Merge 4.14.158 into android-4.14

Changes in 4.14.158
	Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
	clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
	ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
	ASoC: compress: fix unsigned integer overflow check
	reset: Fix memory leak in reset_control_array_put()
	ASoC: kirkwood: fix external clock probe defer
	clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
	reset: fix reset_control_ops kerneldoc comment
	clk: at91: avoid sleeping early
	clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
	idr: Fix idr_alloc_u32 on 32-bit systems
	x86/resctrl: Prevent NULL pointer dereference when reading mondata
	clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
	net: fec: add missed clk_disable_unprepare in remove
	bridge: ebtables: don't crash when using dnat target in output chains
	can: peak_usb: report bus recovery as well
	can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
	can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
	can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
	can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
	can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
	can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
	can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
	watchdog: meson: Fix the wrong value of left time
	scripts/gdb: fix debugging modules compiled with hot/cold partitioning
	net: bcmgenet: reapply manual settings to the PHY
	ceph: return -EINVAL if given fsc mount option on kernel w/o support
	mac80211: fix station inactive_time shortly after boot
	block: drbd: remove a stray unlock in __drbd_send_protocol()
	pwm: bcm-iproc: Prevent unloading the driver module while in use
	scsi: lpfc: Fix kernel Oops due to null pring pointers
	scsi: lpfc: Fix dif and first burst use in write commands
	ARM: dts: Fix up SQ201 flash access
	ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
	ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
	parisc: Fix serio address output
	parisc: Fix HP SDC hpa address output
	arm64: mm: Prevent mismatched 52-bit VA support
	arm64: smp: Handle errors reported by the firmware
	ARM: OMAP1: fix USB configuration for device-only setups
	RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
	PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
	ARM: ks8695: fix section mismatch warning
	ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
	scsi: lpfc: Enable Management features for IF_TYPE=6
	crypto: user - support incremental algorithm dumps
	mwifiex: fix potential NULL dereference and use after free
	mwifiex: debugfs: correct histogram spacing, formatting
	rtl818x: fix potential use after free
	xfs: require both realtime inodes to mount
	ubi: Put MTD device after it is not used
	ubi: Do not drop UBI device reference before using
	microblaze: adjust the help to the real behavior
	microblaze: move "... is ready" messages to arch/microblaze/Makefile
	iwlwifi: move iwl_nvm_check_version() into dvm
	gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
	VSOCK: bind to random port for VMADDR_PORT_ANY
	mmc: meson-gx: make sure the descriptor is stopped on errors
	mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
	btrfs: only track ref_heads in delayed_ref_updates
	HID: intel-ish-hid: fixes incorrect error handling
	serial: 8250: Rate limit serial port rx interrupts during input overruns
	kprobes/x86/xen: blacklist non-attachable xen interrupt functions
	xen/pciback: Check dev_data before using it
	vfio-mdev/samples: Use u8 instead of char for handle functions
	pinctrl: xway: fix gpio-hog related boot issues
	net/mlx5: Continue driver initialization despite debugfs failure
	exofs_mount(): fix leaks on failure exits
	bnxt_en: Return linux standard errors in bnxt_ethtool.c
	bnxt_en: query force speeds before disabling autoneg mode.
	KVM: s390: unregister debug feature on failing arch init
	pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
	pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
	HID: doc: fix wrong data structure reference for UHID_OUTPUT
	dm flakey: Properly corrupt multi-page bios.
	gfs2: take jdata unstuff into account in do_grow
	xfs: Align compat attrlist_by_handle with native implementation.
	xfs: Fix bulkstat compat ioctls on x32 userspace.
	IB/qib: Fix an error code in qib_sdma_verbs_send()
	clocksource/drivers/fttmr010: Fix invalid interrupt register access
	vxlan: Fix error path in __vxlan_dev_create()
	powerpc/book3s/32: fix number of bats in p/v_block_mapped()
	powerpc/xmon: fix dump_segments()
	drivers/regulator: fix a missing check of return value
	Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
	serial: max310x: Fix tx_empty() callback
	openrisc: Fix broken paths to arch/or32
	RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
	scsi: qla2xxx: deadlock by configfs_depend_item
	scsi: csiostor: fix incorrect dma device in case of vport
	ath6kl: Only use match sets when firmware supports it
	ath6kl: Fix off by one error in scan completion
	powerpc/perf: Fix unit_sel/cache_sel checks
	powerpc/prom: fix early DEBUG messages
	powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
	powerpc/44x/bamboo: Fix PCI range
	vfio/spapr_tce: Get rid of possible infinite loop
	powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
	drbd: ignore "all zero" peer volume sizes in handshake
	drbd: reject attach of unsuitable uuids even if connected
	drbd: do not block when adjusting "disk-options" while IO is frozen
	drbd: fix print_st_err()'s prototype to match the definition
	IB/rxe: Make counters thread safe
	regulator: tps65910: fix a missing check of return value
	powerpc/83xx: handle machine check caused by watchdog timer
	powerpc/pseries: Fix node leak in update_lmb_associativity_index()
	crypto: mxc-scc - fix build warnings on ARM64
	pwm: clps711x: Fix period calculation
	net/netlink_compat: Fix a missing check of nla_parse_nested
	net/net_namespace: Check the return value of register_pernet_subsys()
	f2fs: fix to dirty inode synchronously
	um: Make GCOV depend on !KCOV
	net: (cpts) fix a missing check of clk_prepare
	net: stmicro: fix a missing check of clk_prepare
	net: dsa: bcm_sf2: Propagate error value from mdio_write
	atl1e: checking the status of atl1e_write_phy_reg
	tipc: fix a missing check of genlmsg_put
	net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
	ocfs2: clear journal dirty flag after shutdown journal
	vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
	lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
	lib/genalloc.c: use vzalloc_node() to allocate the bitmap
	fork: fix some -Wmissing-prototypes warnings
	drivers/base/platform.c: kmemleak ignore a known leak
	lib/genalloc.c: include vmalloc.h
	mtd: Check add_mtd_device() ret code
	tipc: fix memory leak in tipc_nl_compat_publ_dump
	net/core/neighbour: tell kmemleak about hash tables
	PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
	net/core/neighbour: fix kmemleak minimal reference count for hash tables
	serial: 8250: Fix serial8250 initialization crash
	gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
	sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
	ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
	decnet: fix DN_IFREQ_SIZE
	net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
	blktrace: Show requests without sector
	tipc: fix skb may be leaky in tipc_link_input
	sfc: initialise found bitmap in efx_ef10_mtd_probe
	net: fix possible overflow in __sk_mem_raise_allocated()
	sctp: don't compare hb_timer expire date before starting it
	bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
	net: dev: Use unsigned integer as an argument to left-shift
	kvm: properly check debugfs dentry before using it
	bpf: drop refcount if bpf_map_new_fd() fails in map_create()
	net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
	iommu/amd: Fix NULL dereference bug in match_hid_uid
	apparmor: delete the dentry in aafs_remove() to avoid a leak
	scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
	ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
	ACPI / APEI: Switch estatus pool to use vmalloc memory
	scsi: libsas: Check SMP PHY control function result
	powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
	mtd: Remove a debug trace in mtdpart.c
	mm, gup: add missing refcount overflow checks on s390
	clk: at91: fix update bit maps on CFG_MOR write
	clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
	staging: rtl8192e: fix potential use after free
	staging: rtl8723bs: Drop ACPI device ids
	staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
	USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
	mei: bus: prefix device names on bus with the bus name
	xfrm: Fix memleak on xfrm state destroy
	media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
	net: macb: fix error format in dev_err()
	pwm: Clear chip_data in pwm_put()
	media: atmel: atmel-isc: fix asd memory allocation
	media: atmel: atmel-isc: fix INIT_WORK misplacement
	macvlan: schedule bc_work even if error
	net: psample: fix skb_over_panic
	openvswitch: fix flow command message size
	slip: Fix use-after-free Read in slip_open
	openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
	openvswitch: remove another BUG_ON()
	tipc: fix link name length check
	sctp: cache netns in sctp_ep_common
	net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
	ext4: add more paranoia checking in ext4_expand_extra_isize handling
	watchdog: sama5d4: fix WDD value to be always set to max
	net: macb: Fix SUBNS increment and increase resolution
	net: macb driver, check for SKBTX_HW_TSTAMP
	mtd: rawnand: atmel: Fix spelling mistake in error message
	mtd: rawnand: atmel: fix possible object reference leak
	mtd: spi-nor: cast to u64 to avoid uint overflows
	y2038: futex: Move compat implementation into futex.c
	futex: Prevent robust futex exit race
	futex: Move futex exit handling into futex code
	futex: Replace PF_EXITPIDONE with a state
	exit/exec: Seperate mm_release()
	futex: Split futex_mm_release() for exit/exec
	futex: Set task::futex_state to DEAD right after handling futex exit
	futex: Mark the begin of futex exit explicitly
	futex: Sanitize exit state handling
	futex: Provide state handling for exec() as well
	futex: Add mutex around futex exit
	futex: Provide distinct return value when owner is exiting
	futex: Prevent exit livelock
	HID: core: check whether Usage Page item is after Usage ID items
	crypto: stm32/hash - Fix hmac issue more than 256 bytes
	media: stm32-dcmi: fix DMA corruption when stopping streaming
	hwrng: stm32 - fix unbalanced pm_runtime_enable
	mailbox: mailbox-test: fix null pointer if no mmio
	pinctrl: stm32: fix memory leak issue
	ASoC: stm32: i2s: fix dma configuration
	ASoC: stm32: i2s: fix 16 bit format support
	ASoC: stm32: i2s: fix IRQ clearing
	platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
	platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
	net: fec: fix clock count mis-match
	Linux 4.14.158

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-12-05 15:48:19 +01:00
Thomas Gleixner
61fa9f167c futex: Prevent exit livelock
commit 3ef240eaff36b8119ac9e2ea17cbf41179c930ba upstream.

Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			->	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk->futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --> wakes parent
  sys_futex()
    loop infinite because
    tsk->futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk->futex_state = FUTEX_STATE_EXITING;

  --> preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff8624 .. ba31c1a48538
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e71 ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:28 +01:00
Thomas Gleixner
e6e00df182 futex: Provide distinct return value when owner is exiting
commit ac31c7ff8624409ba3c4901df9237a616c187a5d upstream.

attach_to_pi_owner() returns -EAGAIN for various cases:

 - Owner task is exiting
 - Futex value has changed

The caller drops the held locks (hash bucket, mmap_sem) and retries the
operation. In case of the owner task exiting this can result in a live
lock.

As a preparatory step for seperating those cases, provide a distinct return
value (EBUSY) for the owner exiting case.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.935606117@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:28 +01:00
Thomas Gleixner
ac7e59a0c1 futex: Add mutex around futex exit
commit 3f186d974826847a07bc7964d79ec4eded475ad9 upstream.

The mutex will be used in subsequent changes to replace the busy looping of
a waiter when the futex owner is currently executing the exit cleanup to
prevent a potential live lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.845798895@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:28 +01:00
Thomas Gleixner
7d143b66d4 futex: Provide state handling for exec() as well
commit af8cbda2cfcaa5515d61ec500498d46e9a8247e2 upstream.

exec() attempts to handle potentially held futexes gracefully by running
the futex exit handling code like exit() does.

The current implementation has no protection against concurrent incoming
waiters. The reason is that the futex state cannot be set to
FUTEX_STATE_DEAD after the cleanup because the task struct is still active
and just about to execute the new binary.

While its arguably buggy when a task holds a futex over exec(), for
consistency sake the state handling can at least cover the actual futex
exit cleanup section. This provides state consistency protection accross
the cleanup. As the futex state of the task becomes FUTEX_STATE_OK after the
cleanup has been finished, this cannot prevent subsequent attempts to
attach to the task in case that the cleanup was not successfull in mopping
up all leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.753355618@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:27 +01:00
Thomas Gleixner
0633e316e2 futex: Sanitize exit state handling
commit 4a8e991b91aca9e20705d434677ac013974e0e30 upstream.

Instead of having a smp_mb() and an empty lock/unlock of task::pi_lock move
the state setting into to the lock section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.645603214@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:27 +01:00
Thomas Gleixner
1be36de0ac futex: Mark the begin of futex exit explicitly
commit 18f694385c4fd77a09851fd301236746ca83f3cb upstream.

Instead of relying on PF_EXITING use an explicit state for the futex exit
and set it in the futex exit function. This moves the smp barrier and the
lock/unlock serialization into the futex code.

As with the DEAD state this is restricted to the exit path as exec
continues to use the same task struct.

This allows to simplify that logic in a next step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:27 +01:00
Thomas Gleixner
32676552cf futex: Set task::futex_state to DEAD right after handling futex exit
commit f24f22435dcc11389acc87e5586239c1819d217c upstream.

Setting task::futex_state in do_exit() is rather arbitrarily placed for no
reason. Move it into the futex code.

Note, this is only done for the exit cleanup as the exec cleanup cannot set
the state to FUTEX_STATE_DEAD because the task struct is still in active
use.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.439511191@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:26 +01:00
Thomas Gleixner
a6dc90f43f futex: Split futex_mm_release() for exit/exec
commit 150d71584b12809144b8145b817e83b81158ae5f upstream.

To allow separate handling of the futex exit state in the futex exit code
for exit and exec, split futex_mm_release() into two functions and invoke
them from the corresponding exit/exec_mm_release() callsites.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.332094221@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:26 +01:00
Thomas Gleixner
7d79d1c681 exit/exec: Seperate mm_release()
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream.

mm_release() contains the futex exit handling. mm_release() is called from
do_exit()->exit_mm() and from exec()->exec_mm().

In the exit_mm() case PF_EXITING and the futex state is updated. In the
exec_mm() case these states are not touched.

As the futex exit code needs further protections against exit races, this
needs to be split into two functions.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:25 +01:00