291 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
6f67f3af36 This is the 4.14.268 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmIWEyUACgkQONu9yGCS
 aT779Q//dXx+poIk/ameTdi+EGNWVDjmSGKKAGHDEZyzUejX8BNv1PoC8z+wbRSC
 ra8WINeEDAd/mA+8/mvH6fJEKFAc2Ppapo4gSNh0BkkdXDDMJkVQKZKnq/+pfOJH
 f0ltZPtiownRg+rw02DdLY+A5tYp9t6Gwl0EIHieqlakj3JAE6SqJxRqUEFu2WvH
 UbjmgYp+vV3uOAdSnRzfyfcCKhMsJegFfiwy15wpO2AmODN0vOT4DbaHkkqNKs3T
 j1HseuHWyYg7/1RQsHrabPJk6H76SzwI6tbAHnHZ/mkvlqg70MWH1hTax3VF/dgz
 AfeWjQKK1fa4z8K6WKu1RAq6LAOjDbxfhTAE2wImjL0TXz86YoDL34E3TT4uB0no
 HeT9JbKO245jSWh+BXttgbQp+YFypiDlyOfHeBB1jpiHCdjt8Fb4FmSHUhL80DC5
 FdxNkBmZDoZXCfrNw/VBeLs8yU7lzWqps2tJTvyfJO8ZkCtZPO2EP5k2gib71KiP
 N/3I2QnRBlFJO/HRfSJ/jOhaM981S9RxduTNirbzvvu1CuL4ltpdd1JG5HOrkr2Q
 2ce1INFJvO38ftcqHj0rjc0y2UW2QvvspG8K0RLCfn3XKmlmiBh6KbRFa+3iHYQB
 T4aBatw8Z3pnxvHYKVO5/jIacotoMQsK+GIekOqrj4Iw0fGGoqE=
 =ViBY
 -----END PGP SIGNATURE-----

Merge 4.14.268 into android-4.14-stable

Changes in 4.14.268
	Makefile.extrawarn: Move -Wunaligned-access to W=1
	net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	btrfs: send: in case of IO error log it
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
	taskstats: Cleanup the use of task->exit_code
	vsock: remove vsock from connected table when connect is interrupted by a signal
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	powerpc/lib/sstep: fix 'ptesync' build error
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	i2c: brcmstb: fix support for DSL and CM variants
	lib/iov_iter: initialize "flags" in new pipe_buffer
	mtd: rawnand: brcmnand: Refactored code to introduce helper functions
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	NFS: Do not report writeback errors in nfs_getattr()
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ata: libata-core: Disable TRIM on M88V29
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	Linux 4.14.268

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I03f64df8d8bf977ed3fddd0277b1b0749c703480
2022-02-23 12:10:45 +01:00
Darrick J. Wong
ebd0a32848 vfs: make freeze_super abort when sync_filesystem returns error
[ Upstream commit 2719c7160dcfaae1f73a1c0c210ad3281c19022e ]

If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23 11:57:33 +01:00
Greg Kroah-Hartman
17f549b3ec This is the 4.14.209 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+893cACgkQONu9yGCS
 aT6mWRAAjiKe1iAgLi7RqIfnKhqRWK58Cwwrnj2Up7X/76irsz7cmQ6FWOWY1er3
 Hy+EN/FbuEeedvxR8qzhyUr0Zq+KmJCo34eOHvSojNnT5rPWDG0WTkktHPBEVmO0
 wNlYS9V8oMImZJ91oCyrzGkP59xNN0simhfnvBHvzoaS4ebIz1COAFdDir+uk8D/
 U/ZFjXi3Fb5yxqH+J29jBkgvznOmwv44BLlccnRnEuY9eZtoZCTPou1AKjY4eNEB
 rNi5OFQgMlzVCr9ts1aAXEKr5TeZi76IUOFANPXvMXzilvDP8v2F9B8iFsVxBMn8
 +5TVjD7SiPFqy+7+p1FMD6Nst1xUMcUByOshpJtCIWt0e8hFPapWEzpG3gPbzhsc
 U8NYMo1lUIyNK4TmeegK51ceWak6zCktDpFAHslnsdhtn7m93rUjg97sacYMVvCB
 TZZ7L8L2Od/l06q7AmJJFGhsBM+uIYHghmpy6y5wX/CWK2Lu71bbKewb1+DL6ptV
 dv9/BARzOxhxTOfds7Yp4wCDwzJlnL9JAkoUvYtw3HZnTdwfnAfNobr8jqP8JzYY
 2K3UBmftblY1Qzn73wVeI9mQSudb05xKMsM+Sg626gvd321dTaWvZCJc+96Ylg/9
 F3TQpQRQw7TjXg0bNRlULtcVrIP9LSgORkl3gup2K1MLm6M3n5Q=
 =2OuD
 -----END PGP SIGNATURE-----

Merge 4.14.209 into android-4.14-stable

Changes in 4.14.209
	ah6: fix error return code in ah6_input()
	atm: nicstar: Unmap DMA on send error
	bnxt_en: read EEPROM A2h address using page 0
	devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()
	inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill()
	mlxsw: core: Use variable timeout for EMAD retries
	net: b44: fix error return code in b44_init_one()
	net: bridge: add missing counters to ndo_get_stats64 callback
	net: dsa: mv88e6xxx: Avoid VTU corruption on 6097
	net: Have netpoll bring-up DSA management interface
	netlabel: fix our progress tracking in netlbl_unlabel_staticlist()
	netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist()
	net/mlx4_core: Fix init_hca fields offset
	net: x25: Increase refcnt of "struct x25_neigh" in x25_rx_call_request
	qlcnic: fix error return code in qlcnic_83xx_restart_hw()
	sctp: change to hold/put transport for proto_unreach_timer
	net/mlx5: Disable QoS when min_rates on all VFs are zero
	net: usb: qmi_wwan: Set DTR quirk for MR400
	tcp: only postpone PROBE_RTT if RTT is < current min_rtt estimate
	net: ftgmac100: Fix crash when removing driver
	pinctrl: rockchip: enable gpio pclk for rockchip_gpio_to_irq
	arm64: psci: Avoid printing in cpu_psci_cpu_die()
	vfs: remove lockdep bogosity in __sb_start_write
	Input: adxl34x - clean up a data type in adxl34x_probe()
	MIPS: export has_transparent_hugepage() for modules
	arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy
	ARM: dts: imx50-evk: Fix the chip select 1 IOMUX
	perf lock: Don't free "lock_seq_stat" if read_count isn't zero
	can: af_can: prevent potential access of uninitialized member in can_rcv()
	can: af_can: prevent potential access of uninitialized member in canfd_rcv()
	can: dev: can_restart(): post buffer from the right context
	can: ti_hecc: Fix memleak in ti_hecc_probe
	can: mcba_usb: mcba_usb_start_xmit(): first fill skb, then pass to can_put_echo_skb()
	can: peak_usb: fix potential integer overflow on shift of a int
	can: m_can: m_can_handle_state_change(): fix state change
	ASoC: qcom: lpass-platform: Fix memory leak
	MIPS: Alchemy: Fix memleak in alchemy_clk_setup_cpu
	regulator: ti-abb: Fix array out of bound read access on the first transition
	xfs: revert "xfs: fix rmap key and record comparison functions"
	libfs: fix error cast of negative value in simple_attr_write()
	powerpc/uaccess-flush: fix missing includes in kup-radix.h
	speakup: Do not let the line discipline be used several times
	ALSA: ctl: fix error path at adding user-defined element set
	ALSA: mixart: Fix mutex deadlock
	tty: serial: imx: keep console clocks always on
	efivarfs: fix memory leak in efivarfs_create()
	staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
	ext4: fix bogus warning in ext4_update_dx_flag()
	iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
	iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
	regulator: fix memory leak with repeated set_machine_constraints()
	regulator: avoid resolve_supply() infinite recursion
	regulator: workaround self-referent regulators
	xtensa: disable preemption around cache alias management calls
	mac80211: minstrel: remove deferred sampling code
	mac80211: minstrel: fix tx status processing corner case
	mac80211: free sta in sta_info_insert_finish() on errors
	s390/cpum_sf.c: fix file permission for cpum_sfb_size
	s390/dasd: fix null pointer dereference for ERP requests
	x86/microcode/intel: Check patch signature before saving microcode for early loading
	Linux 4.14.209

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I123c16d3246d340d95714ce8d89c280483cb4259
2020-11-24 14:56:00 +01:00
Darrick J. Wong
7fd1f1856a vfs: remove lockdep bogosity in __sb_start_write
[ Upstream commit 22843291efc986ce7722610073fcf85a39b4cb13 ]

__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs.  The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.

However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation.  Neither __sb_start_write nor its callers handle
this at all.  For example:

sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.

Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level.  *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.

Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom.  This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.

We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.

NOTE: Commit f4b554af9931 in 2015 created the current weird logic (which
used to exist in a different form in commit 5accdf82ba25c from 2012) in
__sb_start_write.  XFS solved this whole problem in the late 2.6 era by
creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
grab intwrite freeze protection, thus making lockdep's solution
unnecessary.  The commit claims that Dave Chinner explained that the
trylock hack + comment could be removed, but nobody ever did.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:05:44 +01:00
Jaegeuk Kim
791557a04a Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.14.y' into android-4.14
* origin/upstream-f2fs-stable-linux-4.14.y:
  ext4: add verity flag check for dax
  f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
  f2fs: fix to add missing F2FS_IO_ALIGNED() condition
  f2fs: fix to fallback to buffered IO in IO aligned mode
  f2fs: fix to handle error path correctly in f2fs_map_blocks
  f2fs: fix extent corrupotion during directIO in LFS mode
  f2fs: check all the data segments against all node ones
  f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
  f2fs: fix inode rwsem regression
  f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
  f2fs: avoid infinite GC loop due to stale atomic files
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: convert inline_data in prior to i_size_write
  f2fs: fix error path of f2fs_convert_inline_page()
  f2fs: add missing documents of reserve_root/resuid/resgid
  f2fs: fix flushing node pages when checkpoint is disabled
  f2fs: enhance f2fs_is_checkpoint_ready()'s readability
  f2fs: clean up __bio_alloc()'s parameter
  f2fs: fix wrong error injection path in inc_valid_block_count()
  f2fs: fix to writeout dirty inode during node flush
  f2fs: optimize case-insensitive lookups
  f2fs: introduce f2fs_match_name() for cleanup
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: allocate memory in batch in build_sit_info()
  f2fs: fix to avoid data corruption by forbidding SSR overwrite
  f2fs: Fix build error while CONFIG_NLS=m
  Revert "f2fs: avoid out-of-range memory access"
  f2fs: cleanup the code in build_sit_entries.
  f2fs: fix wrong available node count calculation
  f2fs: remove duplicate code in f2fs_file_write_iter
  f2fs: fix to migrate blocks correctly during defragment
  f2fs: use wrapped f2fs_cp_error()
  f2fs: fix to use more generic EOPNOTSUPP
  f2fs: use wrapped IS_SWAPFILE()
  f2fs: Support case-insensitive file name lookups
  f2fs: include charset encoding information in the superblock
  fs: Reserve flag for casefolding
  f2fs: fix to avoid call kvfree under spinlock
  fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status()
  f2fs: disallow direct IO in atomic write
  f2fs: fix to handle quota_{on,off} correctly
  f2fs: fix to detect cp error in f2fs_setxattr()
  f2fs: fix to spread f2fs_is_checkpoint_ready()
  f2fs: support fiemap() for directory inode
  f2fs: fix to avoid discard command leak
  f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly
  f2fs: fix to drop meta/node pages during umount
  f2fs: disallow switching io_bits option during remount
  f2fs: fix panic of IO alignment feature
  f2fs: introduce {page,io}_is_mergeable() for readability
  f2fs: fix livelock in swapfile writes
  f2fs: add fs-verity support
  ext4: update on-disk format documentation for fs-verity
  ext4: add fs-verity read support
  ext4: add basic fs-verity support
  fs-verity: support builtin file signatures
  fs-verity: add SHA-512 support
  fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
  fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
  fs-verity: add data verification hooks for ->readpages()
  fs-verity: add the hook for file ->setattr()
  fs-verity: add the hook for file ->open()
  fs-verity: add inode and superblock fields
  fs-verity: add Kconfig and the helper functions for hashing
  fs: uapi: define verity bit for FS_IOC_GETFLAGS
  fs-verity: add UAPI header
  fs-verity: add MAINTAINERS file entry
  fs-verity: add a documentation file
  ext4: fix kernel oops caused by spurious casefold flag
  ext4: fix coverity warning on error path of filename setup
  ext4: optimize case-insensitive lookups
  ext4: fix dcache lookup of !casefolded directories
  unicode: update to Unicode 12.1.0 final
  unicode: add missing check for an error return from utf8lookup()
  ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
  unicode: refactor the rule for regenerating utf8data.h
  ext4: Support case-insensitive file name lookups
  ext4: include charset encoding information in the superblock
  unicode: update unicode database unicode version 12.1.0
  unicode: introduce test module for normalized utf8 implementation
  unicode: implement higher level API for string handling
  unicode: reduce the size of utf8data[]
  unicode: introduce code for UTF-8 normalization
  unicode: introduce UTF-8 character database
  ext4 crypto: fix to check feature status before get policy
  fscrypt: document the new ioctls and policy version
  ubifs: wire up new fscrypt ioctls
  f2fs: wire up new fscrypt ioctls
  ext4: wire up new fscrypt ioctls
  fscrypt: require that key be added when setting a v2 encryption policy
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl
  fscrypt: allow unprivileged users to add/remove keys for v2 policies
  fscrypt: v2 encryption policy support
  fscrypt: add an HKDF-SHA512 implementation
  fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
  fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
  fscrypt: rename keyinfo.c to keysetup.c
  fscrypt: move v1 policy key setup to keysetup_v1.c
  fscrypt: refactor key setup code in preparation for v2 policies
  fscrypt: rename fscrypt_master_key to fscrypt_direct_key
  fscrypt: add ->ci_inode to fscrypt_info
  fscrypt: use FSCRYPT_* definitions, not FS_*
  fscrypt: use FSCRYPT_ prefix for uapi constants
  fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>
  fscrypt: use ENOPKG when crypto API support missing
  fscrypt: improve warnings for missing crypto API support
  fscrypt: improve warning messages for unsupported encryption contexts
  fscrypt: make fscrypt_msg() take inode instead of super_block
  fscrypt: clean up base64 encoding/decoding
  fscrypt: remove loadable module related code

 Conflicts:
	fs/ext4/inode.c
	fs/ext4/ioctl.c
	fs/ext4/readpage.c

Bug: 141329812
Change-Id: I7b5c255d3766b8c66b7802f5b4c6aabe2b834f65
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-10-07 20:22:46 +00:00
Eric Biggers
3a7ee916f3 fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY.  This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".

Why we need this
~~~~~~~~~~~~~~~~

The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.

Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions.  But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic.  This is because it depends on
whether the files are currently present in the inode cache.

Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work.  Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons.  First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.

Second, almost all users of fscrypt actually do need the keys to be
global.  The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process.  This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.

On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2].  This is ugly and raises security concerns.  Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).

By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.

Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:

- Supporting key removal with the semantics such that the secret is
  removed immediately and any unused inodes using the key are evicted;
  also, the eviction of any in-use inodes can be retried.

- Calculating a key-dependent cryptographic identifier and returning it
  to userspace.

- Allowing keys to be added and removed by non-root users, but only keys
  for v2 encryption policies; and to prevent denial-of-service attacks,
  users can only remove keys they themselves have added, and a key is
  only really removed after all users who added it have removed it.

Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.

However, to reuse code the implementation still uses the keyrings
service internally.  Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.

References:

    [1] https://github.com/google/fscrypt
    [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
    [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
    [4] https://github.com/google/fscrypt/issues/116
    [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
    [6] https://github.com/google/fscrypt/issues/128
    [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-09-23 19:06:59 -07:00
Greg Kroah-Hartman
503f6fecb8 This is the 4.14.45 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlsOPCoACgkQONu9yGCS
 aT4vYBAAoESFP3oUtpyrPQU2yWQx7sRq/Dd8WyNlHlq2nRU8Y42ynB8TdRpAIces
 3aP7vPwFLaK4H0SZt4oA+NialRMhC/bN6BmKaoTUXq2nmE2XzDkcPDu0zHnqQt9C
 vc5wa2hd+H95wj9cdkkPwdlmgVhHztowJ3uqqNaPql2MVjDLKxziNVMv7lAIGPk3
 TycD9SihGAEKFjI2WIXaX6hm+3gGRnuK2ovlqnlF24dLRFiGIBL+fUp5ZGoxVlRP
 W260tQnTv/TvWUJ7V3x6rZ04kgV7LcaZrwSyN7GLJmhoi9Bw0BmL1N3cEAfEZdy2
 YoGqDemLW9bEiHBhFuPOcFr7tyAz8EsVH4/KUwkIMgWNbV8DmTKT2nbfzG9ju6Hb
 q9q3OJyLPBamGxTuiXUspRhQJrVrMX6sahHQDj5786AVgBDoGVFw1d+v9kJCoSAv
 lnA7qTbCFeq288dJ3sU7OZhmApC1oMPjMjmfVWwuQKBz81xqsquAjQRkBY3Odw+j
 yreZ9PS2Krk3bpf9QoDf/NGM+zpFyyy3xbrHpMkIEv48VGYrpe0nP6TZRfEgF65L
 036uZCPzpH+vFdyjMPWUPPXGZCD7q6DGk+wKit2eMFKOXB477yKA2+qAWs0GAeKo
 g7N0Rql7YZQK+Zu+1YvtfqF4WUBBP0uAb7FSuyVKVIzI3LfPCQk=
 =m2qv
 -----END PGP SIGNATURE-----

Merge 4.14.45 into android-4.14

Changes in 4.14.45
	MIPS: c-r4k: Fix data corruption related to cache coherence
	MIPS: ptrace: Expose FIR register through FP regset
	MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
	KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
	affs_lookup(): close a race with affs_remove_link()
	fs: don't scan the inode cache before SB_BORN is set
	aio: fix io_destroy(2) vs. lookup_ioctx() race
	ALSA: timer: Fix pause event notification
	do d_instantiate/unlock_new_inode combinations safely
	mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
	mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
	mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
	libata: Blacklist some Sandisk SSDs for NCQ
	libata: blacklist Micron 500IT SSD with MU01 firmware
	xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
	drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
	arm64: lse: Add early clobbers to some input/output asm operands
	powerpc/64s: Clear PCR on boot
	IB/hfi1: Use after free race condition in send context error path
	IB/umem: Use the correct mm during ib_umem_release
	sr: pass down correctly sized SCSI sense buffer
	idr: fix invalid ptr dereference on item delete
	Revert "ipc/shm: Fix shmat mmap nil-page protection"
	ipc/shm: fix shmat() nil address after round-down when remapping
	mm/kasan: don't vfree() nonexistent vm_area
	kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
	kasan: fix memory hotplug during boot
	kernel/sys.c: fix potential Spectre v1 issue
	KVM/VMX: Expose SSBD properly to guests
	KVM: s390: vsie: fix < 8k check for the itdba
	KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
	kvm: x86: IA32_ARCH_CAPABILITIES is always supported
	x86/kvm: fix LAPIC timer drift when guest uses periodic mode
	powerpc/64s: Improve RFI L1-D cache flush fallback
	powerpc/pseries: Support firmware disable of RFI flush
	powerpc/powernv: Support firmware disable of RFI flush
	powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
	powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
	powerpc/rfi-flush: Always enable fallback flush on pseries
	powerpc/rfi-flush: Differentiate enabled and patched flush types
	powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration
	powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
	powerpc: Add security feature flags for Spectre/Meltdown
	powerpc/pseries: Set or clear security feature flags
	powerpc/powernv: Set or clear security feature flags
	powerpc/64s: Move cpu_show_meltdown()
	powerpc/64s: Enhance the information in cpu_show_meltdown()
	powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
	powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
	powerpc/64s: Wire up cpu_show_spectre_v1()
	powerpc/64s: Wire up cpu_show_spectre_v2()
	powerpc/pseries: Fix clearing of security feature flags
	powerpc: Move default security feature flags
	powerpc/pseries: Restore default security feature flags on setup
	powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
	powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
	MIPS: generic: Fix machine compatible matching
	mac80211: mesh: fix wrong mesh TTL offset calculation
	ARC: Fix malformed ARC_EMUL_UNALIGNED default
	ptr_ring: prevent integer overflow when calculating size
	arm64: dts: rockchip: fix rock64 gmac2io stability issues
	arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
	libata: Fix compile warning with ATA_DEBUG enabled
	selftests: sync: missing CFLAGS while compiling
	selftest/vDSO: fix O=
	selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
	selftests: memfd: add config fragment for fuse
	ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
	ARM: OMAP3: Fix prm wake interrupt for resume
	ARM: OMAP2+: Fix sar_base inititalization for HS omaps
	ARM: OMAP1: clock: Fix debugfs_create_*() usage
	ibmvnic: Wait until reset is complete to set carrier on
	ibmvnic: Free RX socket buffer in case of adapter error
	ibmvnic: Clean RX pool buffers during device close
	tls: retrun the correct IV in getsockopt
	xhci: workaround for AMD Promontory disabled ports wakeup
	IB/uverbs: Fix method merging in uverbs_ioctl_merge
	IB/uverbs: Fix possible oops with duplicate ioctl attributes
	IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
	arm64: dts: rockchip: Fix DWMMC clocks
	ARM: dts: rockchip: Fix DWMMC clocks
	iwlwifi: mvm: fix security bug in PN checking
	iwlwifi: mvm: fix IBSS for devices that support station type API
	iwlwifi: mvm: always init rs with 20mhz bandwidth rates
	NFC: llcp: Limit size of SDP URI
	rxrpc: Work around usercopy check
	MD: Free bioset when md_run fails
	md: fix md_write_start() deadlock w/o metadata devices
	s390/dasd: fix handling of internal requests
	xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos
	mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
	mac80211: fix a possible leak of station stats
	mac80211: fix calling sleeping function in atomic context
	cfg80211: clear wep keys after disconnection
	mac80211: Do not disconnect on invalid operating class
	mac80211: Fix sending ADDBA response for an ongoing session
	gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle
	gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle
	md raid10: fix NULL deference in handle_write_completed()
	drm/exynos: g2d: use monotonic timestamps
	drm/exynos: fix comparison to bitshift when dealing with a mask
	drm/meson: fix vsync buffer update
	arm64: perf: correct PMUVer probing
	RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
	RDMA/bnxt_re: Fix system crash during load/unload
	ibmvnic: Check for NULL skb's in NAPI poll routine
	net/mlx5e: Return error if prio is specified when offloading eswitch vlan push
	locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
	md: raid5: avoid string overflow warning
	virtio_net: fix XDP code path in receive_small()
	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
	bug.h: work around GCC PR82365 in BUG()
	selftests/memfd: add run_fuse_test.sh to TEST_FILES
	seccomp: add a selftest for get_metadata
	soc: imx: gpc: de-register power domains only if initialized
	powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
	s390/cio: fix ccw_device_start_timeout API
	s390/cio: fix return code after missing interrupt
	s390/cio: clear timer when terminating driver I/O
	selftests/bpf/test_maps: exit child process without error in ENOMEM case
	PKCS#7: fix direct verification of SignerInfo signature
	arm64: dts: cavium: fix PCI bus dtc warnings
	nfs: system crashes after NFS4ERR_MOVED recovery
	ARM: OMAP: Fix dmtimer init for omap1
	smsc75xx: fix smsc75xx_set_features()
	regulatory: add NUL to request alpha2
	integrity/security: fix digsig.c build error with header file
	x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
	locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
	x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
	mac80211: drop frames with unexpected DS bits from fast-rx to slow path
	arm64: fix unwind_frame() for filtered out fn for function graph tracing
	macvlan: fix use-after-free in macvlan_common_newlink()
	KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2
	kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
	ARM: dts: imx6dl: Include correct dtsi file for Engicam i.CoreM6 DualLite/Solo RQS
	fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
	fs: dcache: Use READ_ONCE when accessing i_dir_seq
	md: fix a potential deadlock of raid5/raid10 reshape
	md/raid1: fix NULL pointer dereference
	batman-adv: fix packet checksum in receive path
	batman-adv: invalidate checksum on fragment reassembly
	netfilter: ipt_CLUSTERIP: put config struct if we can't increment ct refcount
	netfilter: ipt_CLUSTERIP: put config instead of freeing it
	netfilter: ebtables: convert BUG_ONs to WARN_ONs
	batman-adv: Ignore invalid batadv_iv_gw during netlink send
	batman-adv: Ignore invalid batadv_v_gw during netlink send
	batman-adv: Fix netlink dumping of BLA claims
	batman-adv: Fix netlink dumping of BLA backbones
	nvme-pci: Fix nvme queue cleanup if IRQ setup fails
	clocksource/drivers/fsl_ftm_timer: Fix error return checking
	libceph, ceph: avoid memory leak when specifying same option several times
	ceph: fix dentry leak when failing to init debugfs
	xen/pvcalls: fix null pointer dereference on map->sock
	ARM: orion5x: Revert commit 4904dbda41c8.
	qrtr: add MODULE_ALIAS macro to smd
	selftests/futex: Fix line continuation in Makefile
	r8152: fix tx packets accounting
	virtio-gpu: fix ioctl and expose the fixed status to userspace.
	dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
	bcache: fix kcrashes with fio in RAID5 backend dev
	ip_gre: fix IFLA_MTU ignored on NEWLINK
	ip6_tunnel: fix IFLA_MTU ignored on NEWLINK
	sit: fix IFLA_MTU ignored on NEWLINK
	nbd: fix return value in error handling path
	ARM: dts: NSP: Fix amount of RAM on BCM958625HR
	ARM: dts: bcm283x: Fix unit address of local_intc
	powerpc/boot: Fix random libfdt related build errors
	clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
	gianfar: Fix Rx byte accounting for ndev stats
	net/tcp/illinois: replace broken algorithm reference link
	nvmet: fix PSDT field check in command format
	net/smc: use link_id of server in confirm link reply
	mlxsw: core: Fix flex keys scratchpad offset conflict
	mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast
	spectrum: Reference count VLAN entries
	ARC: mcip: halt GFRC counter when ARC cores halt
	ARC: mcip: update MCIP debug mask when the new cpu came online
	ARC: setup cpu possible mask according to possible-cpus dts property
	ipvs: remove IPS_NAT_MASK check to fix passive FTP
	IB/mlx: Set slid to zero in Ethernet completion struct
	RDMA/bnxt_re: Unconditionly fence non wire memory operations
	RDMA/bnxt_re: Fix incorrect DB offset calculation
	RDMA/bnxt_re: Fix the ib_reg failure cleanup
	xen/pirq: fix error path cleanup when binding MSIs
	drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
	xfrm: Fix ESN sequence number handling for IPsec GSO packets.
	arm64: dts: rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset)
	drm/sun4i: Fix dclk_set_phase
	btrfs: use kvzalloc to allocate btrfs_fs_info
	Btrfs: send, fix issuing write op when processing hole in no data mode
	Btrfs: fix log replay failure after linking special file and fsync
	ceph: fix potential memory leak in init_caches()
	block: display the correct diskname for bio
	nvme-pci: Fix EEH failure on ppc
	nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
	selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
	net: ethtool: don't ignore return from driver get_fecparam method
	iwlwifi: mvm: fix TX of CCMP 256
	iwlwifi: mvm: Fix channel switch for count 0 and 1
	iwlwifi: mvm: fix assert 0x2B00 on older FWs
	iwlwifi: avoid collecting firmware dump if not loaded
	iwlwifi: mvm: fix "failed to remove key" message
	iwlwifi: mvm: Direct multicast frames to the correct station
	iwlwifi: mvm: Correctly set the tid for mcast queue
	rds: Incorrect reference counting in TCP socket creation
	watchdog: f71808e_wdt: Fix magic close handling
	watchdog: sbsa: use 32-bit read for WCV
	batman-adv: Fix multicast packet loss with a single WANT_ALL_IPV4/6 flag
	hv_netvsc: use napi_schedule_irqoff
	hv_netvsc: filter multicast/broadcast
	hv_netvsc: propagate rx filters to VF
	ARM: dts: rockchip: Add missing #sound-dai-cells on rk3288
	perf record: Fix crash in pipe mode
	e1000e: Fix check_for_link return value with autoneg off
	e1000e: allocate ring descriptors with dma_zalloc_coherent
	ia64/err-inject: Use get_user_pages_fast()
	RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
	RDMA/qedr: Fix iWARP write and send with immediate
	IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
	IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
	IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
	fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
	fsl/fman: avoid sleeping in atomic context while adding an address
	qed: Free RoCE ILT Memory on rmmod qedr
	net: qcom/emac: Use proper free methods during TX
	net: smsc911x: Fix unload crash when link is up
	IB/core: Fix possible crash to access NULL netdev
	cxgb4: do not set needs_free_netdev for mgmt dev's
	xen-blkfront: move negotiate_mq to cover all cases of new VBDs
	xen: xenbus: use put_device() instead of kfree()
	hv_netvsc: fix filter flags
	hv_netvsc: fix locking for rx_mode
	hv_netvsc: fix locking during VF setup
	ARM: davinci: fix the GPIO lookup for omapl138-hawk
	arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
	selftests/vm/run_vmtests: adjust hugetlb size according to nr_cpus
	lib/test_kmod.c: fix limit check on number of test devices created
	dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
	netfilter: ebtables: fix erroneous reject of last rule
	can: m_can: change comparison to bitshift when dealing with a mask
	can: m_can: select pinctrl state in each suspend/resume function
	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
	workqueue: use put_device() instead of kfree()
	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
	sunvnet: does not support GSO for sctp
	KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending
	gpu: ipu-v3: prg: avoid possible array underflow
	drm/imx: move arming of the vblank event to atomic_flush
	drm/nouveau/bl: fix backlight regression
	xfrm: fix rcu_read_unlock usage in xfrm_local_error
	iwlwifi: mvm: set the correct tid when we flush the MCAST sta
	iwlwifi: mvm: Correctly set IGTK for AP
	iwlwifi: mvm: fix error checking for multi/broadcast sta
	net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
	vlan: Fix out of order vlan headers with reorder header off
	batman-adv: fix header size check in batadv_dbg_arp()
	net/sched: fix NULL dereference in the error path of tcf_sample_init()
	batman-adv: Fix skbuff rcsum on packet reroute
	vti4: Don't count header length twice on tunnel setup
	ip_tunnel: Clamp MTU to bounds on new link
	vti4: Don't override MTU passed on link creation via IFLA_MTU
	vti6: Fix dev->max_mtu setting
	iwlwifi: mvm: Increase session protection time after CS
	iwlwifi: mvm: clear tx queue id when unreserving aggregation queue
	iwlwifi: mvm: make sure internal station has a valid id
	iwlwifi: mvm: fix array out of bounds reference
	drm/tegra: Shutdown on driver unbind
	perf/cgroup: Fix child event counting bug
	brcmfmac: Fix check for ISO3166 code
	kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
	RDMA/ucma: Correct option size check using optlen
	RDMA/qedr: fix QP's ack timeout configuration
	RDMA/qedr: Fix rc initialization on CNQ allocation failure
	RDMA/qedr: Fix QP state initialization race
	net/sched: fix idr leak on the error path of tcf_bpf_init()
	net/sched: fix idr leak in the error path of tcf_simp_init()
	net/sched: fix idr leak in the error path of tcf_act_police_init()
	net/sched: fix idr leak in the error path of tcp_pedit_init()
	net/sched: fix idr leak in the error path of __tcf_ipt_init()
	net/sched: fix idr leak in the error path of tcf_skbmod_init()
	net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
	drm/ast: Fixed 1280x800 Display Issue
	mm/mempolicy.c: avoid use uninitialized preferred_node
	mm, thp: do not cause memcg oom for thp
	xfrm: Fix transport mode skb control buffer usage.
	selftests: ftrace: Add probe event argument syntax testcase
	selftests: ftrace: Add a testcase for string type with kprobe_event
	selftests: ftrace: Add a testcase for probepoint
	drm/amdkfd: Fix scratch memory with HWS enabled
	batman-adv: fix multicast-via-unicast transmission with AP isolation
	batman-adv: fix packet loss for broadcasted DHCP packets to a server
	ARM: 8748/1: mm: Define vdso_start, vdso_end as array
	lan78xx: Set ASD in MAC_CR when EEE is enabled.
	net: qmi_wwan: add BroadMobi BM806U 2020:2033
	bonding: fix the err path for dev hwaddr sync in bond_enslave
	net: dsa: mt7530: fix module autoloading for OF platform drivers
	net/mlx5: Make eswitch support to depend on switchdev
	perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
	x86/alternatives: Fixup alternative_call_2
	llc: properly handle dev_queue_xmit() return value
	builddeb: Fix header package regarding dtc source links
	qede: Fix barrier usage after tx doorbell write.
	mm, slab: memcg_link the SLAB's kmem_cache
	mm/page_owner: fix recursion bug after changing skip entries
	mm/vmstat.c: fix vmstat_update() preemption BUG
	mm/kmemleak.c: wait for scan completion before disabling free
	hv_netvsc: enable multicast if necessary
	qede: Do not drop rx-checksum invalidated packets.
	net: Fix untag for vlan packets without ethernet header
	vlan: Fix vlan insertion for packets without ethernet header
	net: mvneta: fix enable of all initialized RXQs
	sh: fix debug trap failure to process signals before return to user
	firmware: dmi_scan: Fix UUID length safety check
	nvme: don't send keep-alives to the discovery controller
	Btrfs: clean up resources during umount after trans is aborted
	Btrfs: fix loss of prealloc extents past i_size after fsync log replay
	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
	x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
	fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
	swap: divide-by-zero when zero length swap file on ssd
	z3fold: fix memory leak
	sr: get/drop reference to device in revalidate and check_events
	Force log to disk before reading the AGF during a fstrim
	cpufreq: CPPC: Initialize shared perf capabilities of CPUs
	powerpc/fscr: Enable interrupts earlier before calling get_user()
	perf tools: Fix perf builds with clang support
	perf clang: Add support for recent clang versions
	dp83640: Ensure against premature access to PHY registers after reset
	ibmvnic: Zero used TX descriptor counter on reset
	mm/ksm: fix interaction with THP
	mm: fix races between address_space dereference and free in page_evicatable
	mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one()
	Btrfs: bail out on error during replay_dir_deletes
	Btrfs: fix NULL pointer dereference in log_dir_items
	btrfs: Fix possible softlock on single core machines
	IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
	ocfs2/dlm: don't handle migrate lockres if already in shutdown
	powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
	sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
	x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
	KVM: VMX: raise internal error for exception during invalid protected mode state
	lan78xx: Connect phy early
	fscache: Fix hanging wait on page discarded by writeback
	sparc64: Make atomic_xchg() an inline function rather than a macro.
	net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
	net: bgmac: Correctly annotate register space
	powerpc/64s: sreset panic if there is no debugger or crash dump handlers
	btrfs: tests/qgroup: Fix wrong tree backref level
	Btrfs: fix copy_items() return value when logging an inode
	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
	btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
	rxrpc: Fix Tx ring annotation after initial Tx failure
	rxrpc: Don't treat call aborts as conn aborts
	xen/acpi: off by one in read_acpi_id()
	drivers: macintosh: rack-meter: really fix bogus memsets
	ACPI: acpi_pad: Fix memory leak in power saving threads
	powerpc/mpic: Check if cpu_possible() in mpic_physmask()
	ieee802154: ca8210: fix uninitialised data read
	ath10k: advertize beacon_int_min_gcd
	iommu/amd: Take into account that alloc_dev_data() may return NULL
	intel_th: Use correct method of finding hub
	m68k: set dma and coherent masks for platform FEC ethernets
	iwlwifi: mvm: check if mac80211_queue is valid in iwl_mvm_disable_txq
	parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
	hwmon: (nct6775) Fix writing pwmX_mode
	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
	powerpc/perf: Fix kernel address leak via sampling registers
	rsi: fix kernel panic observed on 64bit machine
	tools/thermal: tmon: fix for segfault
	selftests: Print the test we're running to /dev/kmsg
	net/mlx5: Protect from command bit overflow
	watchdog: davinci_wdt: fix error handling in davinci_wdt_probe()
	ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
	nvme-pci: disable APST for Samsung NVMe SSD 960 EVO + ASUS PRIME Z370-A
	ath9k: fix crash in spectral scan
	cxgb4: Setup FW queues before registering netdev
	ima: Fix Kconfig to select TPM 2.0 CRB interface
	ima: Fallback to the builtin hash algorithm
	watchdog: aspeed: Allow configuring for alternate boot
	virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
	arm: dts: socfpga: fix GIC PPI warning
	ext4: don't complain about incorrect features when probing
	drm/vmwgfx: Unpin the screen object backup buffer when not used
	iommu/mediatek: Fix protect memory setting
	cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
	IB/mlx5: Set the default active rate and width to QDR and 4X
	zorro: Set up z->dev.dma_mask for the DMA API
	bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
	remoteproc: imx_rproc: Fix an error handling path in 'imx_rproc_probe()'
	dt-bindings: add device tree binding for Allwinner H6 main CCU
	ACPICA: Events: add a return on failure from acpi_hw_register_read
	ACPICA: Fix memory leak on unusual memory leak
	ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
	cxgb4: Fix queue free path of ULD drivers
	i2c: mv64xxx: Apply errata delay only in standard mode
	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
	perf top: Fix top.call-graph config option reading
	perf stat: Fix core dump when flag T is used
	IB/core: Honor port_num while resolving GID for IB link layer
	drm/amdkfd: add missing include of mm.h
	coresight: Use %px to print pcsr instead of %p
	regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
	spi: bcm-qspi: fIX some error handling paths
	net/smc: pay attention to MAX_ORDER for CQ entries
	MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
	PCI: Restore config space on runtime resume despite being unbound
	watchdog: dw: RMW the control register
	watchdog: aspeed: Fix translation of reset mode to ctrl register
	ipmi_ssif: Fix kernel panic at msg_done_handler
	drm/meson: Fix some error handling paths in 'meson_drv_bind_master()'
	drm/meson: Fix an un-handled error path in 'meson_drv_bind_master()'
	powerpc: Add missing prototype for arch_irq_work_raise()
	powerpc/powernv/npu: Fix deadlock in mmio_invalidate()
	cxl: Check if PSL data-cache is available before issue flush request
	f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
	f2fs: fix to clear CP_TRIMMED_FLAG
	f2fs: fix to check extent cache in f2fs_drop_extent_tree
	perf/core: Fix installing cgroup events on CPU
	max17042: propagate of_node to power supply device
	perf/core: Fix perf_output_read_group()
	drm/panel: simple: Fix the bus format for the Ontat panel
	hwmon: (pmbus/max8688) Accept negative page register values
	hwmon: (pmbus/adm1275) Accept negative page register values
	perf/x86/intel: Properly save/restore the PMU state in the NMI handler
	cdrom: do not call check_disk_change() inside cdrom_open()
	efi/arm*: Only register page tables when they exist
	perf/x86/intel: Fix large period handling on Broadwell CPUs
	perf/x86/intel: Fix event update for auto-reload
	arm64: dts: qcom: Fix SPI5 config on MSM8996
	soc: qcom: wcnss_ctrl: Fix increment in NV upload
	gfs2: Fix fallocate chunk size
	x86/devicetree: Initialize device tree before using it
	x86/devicetree: Fix device IRQ settings in DT
	phy: rockchip-emmc: retry calpad busy trimming
	ALSA: vmaster: Propagate slave error
	phy: qcom-qmp: Fix phy pipe clock gating
	drm/bridge: sii902x: Retry status read after DDI I2C
	tools: hv: fix compiler warnings about major/target_fname
	block: null_blk: fix 'Invalid parameters' when loading module
	dmaengine: pl330: fix a race condition in case of threaded irqs
	dmaengine: rcar-dmac: Check the done lists in rcar_dmac_chan_get_residue()
	enic: enable rq before updating rq descriptors
	watchdog: asm9260_wdt: fix error handling in asm9260_wdt_probe()
	hwrng: stm32 - add reset during probe
	pinctrl: devicetree: Fix dt_to_map_one_config handling of hogs
	pinctrl: artpec6: dt: add missing pin group uart5nocts
	vfio-ccw: fence off transport mode
	dmaengine: qcom: bam_dma: get num-channels and num-ees from dt
	drm: omapdrm: dss: Move initialization code from component bind to probe
	ARM: dts: dra71-evm: Correct evm_sd regulator max voltage
	drm/amdgpu: disable GFX ring and disable PQ wptr in hw_fini
	drm/amdgpu: adjust timeout for ib_ring_tests(v2)
	net: stmmac: ensure that the device has released ownership before reading data
	net: stmmac: ensure that the MSS desc is the last desc to set the own bit
	cpufreq: Reorder cpufreq_online() error code path
	dpaa_eth: fix SG mapping
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
	udf: Provide saner default for invalid uid / gid
	ixgbe: prevent ptp_rx_hang from running when in FILTER_ALL mode
	sh_eth: fix TSU init on SH7734/R8A7740
	power: supply: ltc2941-battery-gauge: Fix temperature units
	ARM: dts: bcm283x: Fix probing of bcm2835-i2s
	ARM: dts: bcm283x: Fix pin function of JTAG pins
	PCMCIA / PM: Avoid noirq suspend aborts during suspend-to-idle
	audit: return on memory error to avoid null pointer dereference
	net: stmmac: call correct function in stmmac_mac_config_rx_queues_routing()
	rcu: Call touch_nmi_watchdog() while printing stall warnings
	pinctrl: sh-pfc: r8a7796: Fix MOD_SEL register pin assignment for SSI pins group
	dpaa_eth: fix pause capability advertisement logic
	MIPS: Octeon: Fix logging messages with spurious periods after newlines
	drm/rockchip: Respect page offset for PRIME mmap calls
	x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
	perf test: Fix test case inet_pton to accept inlines.
	perf report: Fix wrong jump arrow
	perf tests: Use arch__compare_symbol_names to compare symbols
	perf report: Fix memory corruption in --branch-history mode --branch-history
	perf tests: Fix dwarf unwind for stripped binaries
	selftests/net: fixes psock_fanout eBPF test case
	netlabel: If PF_INET6, check sk_buff ip header version
	drm: rcar-du: lvds: Fix LVDS startup on R-Car Gen3
	drm: rcar-du: lvds: Fix LVDS startup on R-Car Gen2
	ARM: dts: at91: tse850: use the correct compatible for the eeprom
	regmap: Correct comparison in regmap_cached
	i40e: Add delay after EMP reset for firmware to recover
	ARM: dts: imx7d: cl-som-imx7: fix pinctrl_enet
	ARM: dts: porter: Fix HDMI output routing
	regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
	pinctrl: msm: Use dynamic GPIO numbering
	pinctrl: mcp23s08: spi: Fix regmap debugfs entries
	kdb: make "mdr" command repeat
	drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
	Linux 4.14.45

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-30 13:17:17 +02:00
Dave Chinner
b9659ff375 fs: don't scan the inode cache before SB_BORN is set
commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.

We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land.  It produces
an oops down this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.

Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:51:47 +02:00
Greg Kroah-Hartman
85ab9a0468 This is the 4.14.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqaaf0ACgkQONu9yGCS
 aT7cDxAAxjZ8e9TGlix7q2wIWSFRfAaWpb4SyZYxP6pYnrdhrHr6IQ+U5ydtiRcz
 T+zYkpXGMTMdkmKogXITp8FUL9ztkABJ/RyHcYuTdxTSpSUN67KNrVwGbM5NobX/
 dPwPkkvUQDh1jyCUsqbYMoGfBSJVH5e7KgsfCtpcnckNzX3R2TOuwRb7aVjpyD63
 Nb2tY70o07bjQZ+M3iWM1cHQ5AaMkJcZeML7mc/40AAcDB0pPNr53LKfVjSFrwgK
 Od5tOHR//XF17Kdi1dtT+XSmHsXcocq4FEp6x4htJPD19uOou5KC31ceXi2k8UEG
 g6iCRrsijdTrsl0ajyrwvXRWtQFN5fUw6BjA1G1/82FE8Eovxv28VjEHFElS+jX3
 gQNDsyeJjQIP7Kpq2tRLmUTtFBGnBW7pcLRR/9jmZJdKsvTGa1BwOUbp9OO2FHip
 hiijnuqz8gpS9mEilALpAF7QLQk3dX8qLS1HZO3KKnFLxwSJqZhENvdfPZ2Fl7kr
 4zavBe7suEyj1+jEt6xqksNOEZh+KAqRIhOZVBry9bvxAG4VCiN6pxEx63uIimMC
 bN9OFZZACFlao/4MCOggS0M48/tWU15Hep+jstUZ3FarUfrNy4VcRjcrTKdDEPMX
 Z5kwJEi9p/J0cReQMagJ/Y63aG4lPHTW8wUxOlHcp+e1wi0q+Kc=
 =h0lU
 -----END PGP SIGNATURE-----

Merge 4.14.24 into android-4.14

Changes in 4.14.24
	hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
	exec: avoid gcc-8 warning for get_task_comm
	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
	scsi: aacraid: Fix I/O drop during reset
	dmaengine: fsl-edma: disable clks on all error paths
	phy: cpcap-usb: Fix platform_get_irq_byname's error checking.
	nvme-fc: remove double put reference if admin connect fails
	nvme: check hw sectors before setting chunk sectors
	net: aquantia: Fix actual speed capabilities reporting
	net: aquantia: Fix hardware DMA stream overload on large MRRS
	net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
	mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
	mtd: nand: brcmnand: Zero bitflip is not an error
	ipv6: icmp6: Allow icmp messages to be looped back
	parisc: Reduce thread stack to 16 kb
	ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
	x86/asm: Allow again using asm.h when building for the 'bpf' clang target
	sctp: fix the issue that a __u16 variable may overflow in sctp_ulpq_renege
	sget(): handle failures of register_shrinker()
	net: phy: xgene: disable clk on error paths
	drm/nouveau/pci: do a msi rearm on init
	xfrm: Reinject transport-mode packets through tasklet
	x86/stacktrace: Make zombie stack traces reliable
	mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
	spi: atmel: fixed spin_lock usage inside atmel_spi_remove
	ASoC: nau8825: fix issue that pop noise when start capture
	cgroup: Fix deadlock in cpu hotplug path
	staging: ion: Fix ion_cma_heap allocations
	x86-64/Xen: eliminate W+X mappings
	net: mediatek: setup proper state for disabled GMAC on the default
	net: arc_emac: fix arc_emac_rx() error paths
	vxlan: update skb dst pmtu on tx path
	ip_gre: remove the incorrect mtu limit for ipgre tap
	ip6_gre: remove the incorrect mtu limit for ipgre tap
	ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
	net: stmmac: Fix TX timestamp calculation
	net: stmmac: Fix bad RX timestamp extraction
	net/mlx5e: Fix ETS BW check
	net/mlx5: Cleanup IRQs in case of unload failure
	net/mlx5: Stay in polling mode when command EQ destroy fails
	ASoC: rsnd: fixup ADG register mask
	xen/balloon: Mark unallocated host memory as UNUSABLE
	netfilter: nf_tables: fix chain filter in nf_tables_dump_rules()
	scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
	netfilter: uapi: correct UNTRACKED conntrack state bit number
	i915: Reject CCS modifiers for pipe C on Geminilake
	RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
	ARM: dts: ls1021a: fix incorrect clock references
	crypto: af_alg - Fix race around ctx->rcvused by making it atomic_t
	lib/mpi: Fix umul_ppmm() for MIPS64r6
	arm64: dts: renesas: ulcb: Remove renesas, no-ether-link property
	crypto: inside-secure - per request invalidation
	crypto: inside-secure - free requests even if their handling failed
	crypto: inside-secure - fix request allocations in invalidation path
	netfilter: nf_tables: fix potential NULL-ptr deref in nf_tables_dump_obj_done()
	tipc: error path leak fixes in tipc_enable_bearer()
	tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
	tg3: Add workaround to restrict 5762 MRRS to 2048
	tg3: Enable PHY reset in MTU change path for 5720
	bnx2x: Improve reliability in case of nested PCI errors
	perf/x86/intel: Plug memory leak in intel_pmu_init()
	led: core: Fix brightness setting when setting delay_off=0
	IB/mlx5: Fix mlx5_ib_alloc_mr error flow
	genirq: Guard handle_bad_irq log messages
	afs: Fix missing error handling in afs_write_end()
	s390/dasd: fix wrongly assigned configuration data
	btrfs: Fix flush bio leak
	ip6_tunnel: allow ip6gre dev mtu to be set below 1280
	Input: xen-kbdfront - do not advertise multi-touch pressure support
	IB/mlx4: Fix mlx4_ib_alloc_mr error flow
	IB/ipoib: Fix race condition in neigh creation
	xfs: quota: fix missed destroy of qi_tree_lock
	xfs: quota: check result of register_shrinker()
	macvlan: Fix one possible double free
	e1000: fix disabling already-disabled warning
	NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
	drm/ttm: check the return value of kzalloc
	RDMA/netlink: Fix locking around __ib_get_device_by_index
	x86/efi: Fix kernel param add_efi_memmap regression
	uapi libc compat: add fallback for unsupported libcs
	i40e/i40evf: Account for frags split over multiple descriptors in check linearize
	i40e: don't remove netdev->dev_addr when syncing uc list
	net: ena: unmask MSI-X only after device initialization is completed
	nl80211: Check for the required netlink attribute presence
	mac80211: mesh: drop frames appearing to be from us
	can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
	wcn36xx: Fix dynamic power saving
	block: drain queue before waiting for q_usage_counter becoming zero
	ia64, sched/cputime: Fix build error if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
	bpf: sockmap missing NULL psock check
	leds: core: Fix regression caused by commit 2b83ff96f51d
	powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
	nvme-fabrics: initialize default host->id in nvmf_host_default()
	x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const"
	bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
	bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
	xen-netfront: enable device after manual module load
	mdio-sun4i: Fix a memory leak
	SolutionEngine771x: fix Ether platform data
	xen/gntdev: Fix off-by-one error when unmapping with holes
	xen/gntdev: Fix partial gntdev_mmap() cleanup
	sctp: add a ceiling to optlen in some sockopts
	sctp: make use of pre-calculated len
	net: gianfar_ptp: move set_fipers() to spinlock protecting area
	of_mdio: avoid MDIO bus removal when a PHY is missing
	nfp: always unmask aux interrupts at init
	mlxsw: pci: Wait after reset before accessing HW
	MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
	powerpc/pseries: Enable RAS hotplug events later
	arm64: dts: marvell: add comphy nodes on cp110 master and slave
	arm64: dts: marvell: mcbin: add comphy references to Ethernet ports
	net: sched: fix crash when deleting secondary chains
	net: sched: crash on blocks with goto chain action
	net_sched: get rid of rcu_barrier() in tcf_block_put_ext()
	net: sched: fix use-after-free in tcf_block_put_ext
	Linux 4.14.24

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-05 07:42:40 +01:00
Al Viro
ac4dc9f1af sget(): handle failures of register_shrinker()
[ Upstream commit 9ee332d99e4d5a97548943b81c54668450ce641b ]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03 10:24:24 +01:00
Daniel Rosenberg
04e61a5bd7 ANDROID: vfs: Allow filesystems to access their private mount data
Now we pass the vfsmount when mounting and remounting.
This allows the filesystem to actually set up the mount
specific data, although we can't quite do anything with
it yet. show_options is expanded to include data that
lives with the mount.

To avoid changing existing filesystems, these have
been added as new vfs functions.

Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2018-01-29 19:39:58 -08:00
Christian Poetzsch
078204d5b8 ANDROID: fs: Fix for in kernel emergency remount when loop mounts are used
adb reboot calls /proc/sysrq-trigger to force an emergency remount (ro) of all
mounted disks. This is executed in the order of the time the mount was originally
done. Because we have a test system which loop mount images from an extra
partition, we see errors cause the loop mounted partitions gets remounted after
this physical partition was set to read only already.

Fix this by reversing the order of the emergency remount. This will remount the
disk first which have been mounted last.

So instead of remounting in this order:
 /dev/sda1
 /dev/loop1
 /dev/loop2
we now remount in this order:
 /dev/loop2
 /dev/loop1
 /dev/sda1

Change-Id: I68fe7e16cc9400ab5278877af70c9ea1d9b57936
Signed-off-by: Christian Poetzsch <christian.potzsch@imgtec.com>
2017-12-18 21:11:22 +05:30
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Linus Torvalds
0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Jan Kara
bc8230ee8e quota: Convert dqio_mutex to rwsem
Convert dqio_mutex to rwsem and call it dqio_sem. No functional changes
yet.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:52:48 +02:00
David Howells
e462ec50cb VFS: Differentiate mount flags (MS_*) from internal superblock flags
Differentiate the MS_* flags passed to mount(2) from the internal flags set
in the super_block's s_flags.  s_flags are now called SB_*, with the names
and the values for the moment mirroring the MS_* flags that they're
equivalent to.

In this patch, just the headers are altered and some kernel code where
blind automated conversion isn't necessarily correct.

Note that this shows up some interesting issues:

 (1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV ->
     MNT_NODEV) without passing this on to the filesystem, but some
     filesystems set such flags anyway.

 (2) The ->remount_fs() methods of some filesystems adjust the *flags
     argument by setting MS_* flags in it, such as MS_NOATIME - but these
     flags are then scrubbed by do_remount_sb() (only the occupants of
     MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK,
     MS_I_VERSION and MS_LAZYTIME)

I'm not sure what's the best way to solve all these cases.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:35 +01:00
David Howells
bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
David Howells
1d278a8790 VFS: Kill off s_options and helpers
Kill off s_options, save/replace_mount_options() and generic_show_options()
as all filesystems now implement ->show_options() for themselves.  This
should make it easier to implement a context-based mount where the mount
options can be passed individually over a file descriptor.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-11 06:09:21 -04:00
David Howells
dd111b31e9 VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Clean up line terminal whitespace in fs/namespace.c and fs/super.c.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06 03:27:09 -04:00
Jan Kara
7c4cc30024 bdi: Drop 'parent' argument from bdi_register[_va]()
Drop 'parent' argument of bdi_register() and bdi_register_va().  It is
always NULL.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Jan Kara
c1844d536d fs: Remove SB_I_DYNBDI flag
Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Jan Kara
13eec2363e fs: Get proper reference for s_bdi
So far we just relied on block device to hold a bdi reference for us
while the filesystem is mounted. While that works perfectly fine, it is
a bit awkward that we have a pointer to a refcounted structure in the
superblock without proper reference. So make s_bdi hold a proper
reference to block device's BDI. No filesystem using mount_bdev()
actually changes s_bdi so this is safe and will make bdev filesystems
work the same way as filesystems needing to set up their private bdi.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Jan Kara
fca39346a5 fs: Provide infrastructure for dynamic BDIs in filesystems
Provide helper functions for setting up dynamically allocated
backing_dev_info structures for filesystems and cleaning them up on
superblock destruction.

CC: linux-mtd@lists.infradead.org
CC: linux-nfs@vger.kernel.org
CC: Petr Vandrovec <petr@vandrovec.name>
CC: linux-nilfs@vger.kernel.org
CC: cluster-devel@redhat.com
CC: osd-dev@open-osd.org
CC: codalist@coda.cs.cmu.edu
CC: linux-afs@lists.infradead.org
CC: ecryptfs@vger.kernel.org
CC: linux-cifs@vger.kernel.org
CC: ceph-devel@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: v9fs-developer@lists.sourceforge.net
CC: lustre-devel@lists.lustre.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Linus Torvalds
f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Jan Kara
dc3b17cc8b block: Use pointer to backing_dev_info from request_queue
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:20:48 -07:00
Eric W. Biederman
93faccbbfa fs: Better permission checking for submounts
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Cc: stable@vger.kernel.org
Fixes: 069d5ac9ae0d ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-02-02 04:36:12 +13:00
Jan Kara
c3b004460d quota: Remove dqonoff_mutex
The only places that were grabbing dqonoff_mutex are functions turning
quotas on and off and these are properly serialized using s_umount
semaphore. Remove dqonoff_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30 08:38:07 +01:00
Jan Kara
ba6379f7e6 fs: Provide function to get superblock with exclusive s_umount
Quota code will need a variant of get_super_thawed() that returns
superblock with s_umount held in exclusive mode to serialize quota on
and quota off operations. Provide this functionality.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-23 12:53:00 +01:00
Oleg Nesterov
f1a9622037 fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
false-positives. Now that xfs was fixed by Dave's commit dbad7c993053
("xfs: stop holding ILOCK over filldir callbacks") we can remove it and
change freeze_super() and thaw_super() to run with s_writers.rw_sem locks
held; we add two trivial helpers for that, lockdep_sb_freeze_release()
and lockdep_sb_freeze_acquire().

xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any
warning from lockdep.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14 20:41:59 -04:00
Oleg Nesterov
89f39af129 fs/super.c: fix race between freeze_super() and thaw_super()
Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than
frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which
drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering.

In this case thaw_super() will wrongly call s_op->unfreeze_fs() before
it was actually frozen, and call sb_freeze_unlock() which leads to the
unbalanced percpu_up_write(). Unfortunately lockdep can't detect this,
so this triggers misc BUG_ON()'s in kernel/rcu/sync.c.

Reported-and-tested-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14 20:00:34 -04:00
Linus Torvalds
a867d7349e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns vfs updates from Eric Biederman:
 "This tree contains some very long awaited work on generalizing the
  user namespace support for mounting filesystems to include filesystems
  with a backing store.  The real world target is fuse but the goal is
  to update the vfs to allow any filesystem to be supported.  This
  patchset is based on a lot of code review and testing to approach that
  goal.

  While looking at what is needed to support the fuse filesystem it
  became clear that there were things like xattrs for security modules
  that needed special treatment.  That the resolution of those concerns
  would not be fuse specific.  That sorting out these general issues
  made most sense at the generic level, where the right people could be
  drawn into the conversation, and the issues could be solved for
  everyone.

  At a high level what this patchset does a couple of simple things:

   - Add a user namespace owner (s_user_ns) to struct super_block.

   - Teach the vfs to handle filesystem uids and gids not mapping into
     to kuids and kgids and being reported as INVALID_UID and
     INVALID_GID in vfs data structures.

  By assigning a user namespace owner filesystems that are mounted with
  only user namespace privilege can be detected.  This allows security
  modules and the like to know which mounts may not be trusted.  This
  also allows the set of uids and gids that are communicated to the
  filesystem to be capped at the set of kuids and kgids that are in the
  owning user namespace of the filesystem.

  One of the crazier corner casees this handles is the case of inodes
  whose i_uid or i_gid are not mapped into the vfs.  Most of the code
  simply doesn't care but it is easy to confuse the inode writeback path
  so no operation that could cause an inode write-back is permitted for
  such inodes (aka only reads are allowed).

  This set of changes starts out by cleaning up the code paths involved
  in user namespace permirted mounts.  Then when things are clean enough
  adds code that cleanly sets s_user_ns.  Then additional restrictions
  are added that are possible now that the filesystem superblock
  contains owner information.

  These changes should not affect anyone in practice, but there are some
  parts of these restrictions that are changes in behavior.

   - Andy's restriction on suid executables that does not honor the
     suid bit when the path is from another mount namespace (think
     /proc/[pid]/fd/) or when the filesystem was mounted by a less
     privileged user.

   - The replacement of the user namespace implicit setting of MNT_NODEV
     with implicitly setting SB_I_NODEV on the filesystem superblock
     instead.

     Using SB_I_NODEV is a stronger form that happens to make this state
     user invisible.  The user visibility can be managed but it caused
     problems when it was introduced from applications reasonably
     expecting mount flags to be what they were set to.

  There is a little bit of work remaining before it is safe to support
  mounting filesystems with backing store in user namespaces, beyond
  what is in this set of changes.

   - Verifying the mounter has permission to read/write the block device
     during mount.

   - Teaching the integrity modules IMA and EVM to handle filesystems
     mounted with only user namespace root and to reduce trust in their
     security xattrs accordingly.

   - Capturing the mounters credentials and using that for permission
     checks in d_automount and the like.  (Given that overlayfs already
     does this, and we need the work in d_automount it make sense to
     generalize this case).

  Furthermore there are a few changes that are on the wishlist:

   - Get all filesystems supporting posix acls using the generic posix
     acls so that posix_acl_fix_xattr_from_user and
     posix_acl_fix_xattr_to_user may be removed.  [Maintainability]

   - Reducing the permission checks in places such as remount to allow
     the superblock owner to perform them.

   - Allowing the superblock owner to chown files with unmapped uids and
     gids to something that is mapped so the files may be treated
     normally.

  I am not considering even obvious relaxations of permission checks
  until it is clear there are no more corner cases that need to be
  locked down and handled generically.

  Many thanks to Seth Forshee who kept this code alive, and putting up
  with me rewriting substantial portions of what he did to handle more
  corner cases, and for his diligent testing and reviewing of my
  changes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
  fs: Call d_automount with the filesystems creds
  fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
  evm: Translate user/group ids relative to s_user_ns when computing HMAC
  dquot: For now explicitly don't support filesystems outside of init_user_ns
  quota: Handle quota data stored in s_user_ns in quota_setxquota
  quota: Ensure qids map to the filesystem
  vfs: Don't create inodes with a uid or gid unknown to the vfs
  vfs: Don't modify inodes with a uid or gid unknown to the vfs
  cred: Reject inodes with invalid ids in set_create_file_as()
  fs: Check for invalid i_uid in may_follow_link()
  vfs: Verify acls are valid within superblock's s_user_ns.
  userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
  fs: Refuse uid/gid changes which don't map into s_user_ns
  selinux: Add support for unprivileged mounts from user namespaces
  Smack: Handle labels consistently in untrusted mounts
  Smack: Add support for unprivileged mounts from user namespaces
  fs: Treat foreign mounts as nosuid
  fs: Limit file caps to the user namespace of the super block
  userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
  userns: Remove implicit MNT_NODEV fragility.
  ...
2016-07-29 15:54:19 -07:00
Dave Chinner
6c60d2b574 fs/fs-writeback.c: add a new writeback list for sync
wait_sb_inodes() currently does a walk of all inodes in the filesystem
to find dirty one to wait on during sync.  This is highly inefficient
and wastes a lot of CPU when there are lots of clean cached inodes that
we don't need to wait on.

To avoid this "all inode" walk, we need to track inodes that are
currently under writeback that we need to wait for.  We do this by
adding inodes to a writeback list on the sb when the mapping is first
tagged as having pages under writeback.  wait_sb_inodes() can then walk
this list of "inodes under IO" and wait specifically just for the inodes
that the current sync(2) needs to wait for.

Define a couple helpers to add/remove an inode from the writeback list
and call them when the overall mapping is tagged for or cleared from
writeback.  Update wait_sb_inodes() to walk only the inodes under
writeback due to the sync.

With this change, filesystem sync times are significantly reduced for
fs' with largely populated inode caches and otherwise no other work to
do.  For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
than 0.1s when the filesystem is fully clean.

Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Eric W. Biederman
cc50a07a24 userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
Now that SB_I_NODEV controls the nodev behavior devpts can just clear
this flag during mount.  Simplifying the code and making it easier
to audit how the code works.  While still preserving the invariant
that s_iflags is only modified during mount.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:47:31 -05:00
Eric W. Biederman
67690f937c userns: Remove implicit MNT_NODEV fragility.
Replace the implict setting of MNT_NODEV on mounts that happen with
just user namespace permissions with an implicit setting of SB_I_NODEV
in s_iflags.  The visibility of the implicit MNT_NODEV has caused
problems in the past.

With this change the fragile case where an implicit MNT_NODEV needs to
be preserved in do_remount is removed.  Using SB_I_NODEV is much less
fragile as s_iflags are set during the original mount and never
changed.

In do_new_mount with the implicit setting of MNT_NODEV gone, the only
code that can affect mnt_flags is fs_fully_visible so simplify the if
statement and reduce the indentation of the code to make that clear.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:47:23 -05:00
Eric W. Biederman
a001e74cef mnt: Move the FS_USERNS_MOUNT check into sget_userns
Allowing a filesystem to be mounted by other than root in the initial
user namespace is a filesystem property not a mount namespace property
and as such should be checked in filesystem specific code.  Move the
FS_USERNS_MOUNT test into super.c:sget_userns().

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:55 -05:00
Eric W. Biederman
6e4eab577a fs: Add user namespace member to struct super_block
Start marking filesystems with a user namespace owner, s_user_ns.  In
this change this is only used for permission checks of who may mount a
filesystem.  Ultimately s_user_ns will be used for translating ids and
checking capabilities for filesystems mounted from user namespaces.

The default policy for setting s_user_ns is implemented in sget(),
which arranges for s_user_ns to be set to current_user_ns() and to
ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
user_ns.

The guts of sget are split out into another function sget_userns().
The function sget_userns calls alloc_super with the specified user
namespace or it verifies the existing superblock that was found
has the expected user namespace, and fails with EBUSY when it is not.
This failing prevents users with the wrong privileges mounting a
filesystem.

The reason for the split of sget_userns from sget is that in some
cases such as mount_ns and kernfs_mount_ns a different policy for
permission checking of mounts and setting s_user_ns is necessary, and
the existence of sget_userns() allows those policies to be
implemented.

The helper mount_ns is expected to be used for filesystems such as
proc and mqueuefs which present per namespace information.  The
function mount_ns is modified to call sget_userns instead of sget to
ensure the user namespace owner of the namespace whose information is
presented by the filesystem is used on the superblock.

For sysfs and cgroup the appropriate permission checks are already in
place, and kernfs_mount_ns is modified to call sget_userns so that
the init_user_ns is the only user namespace used.

For the cgroup filesystem cgroup namespace mounts are bind mounts of a
subset of the full cgroup filesystem and as such s_user_ns must be the
same for all of them as there is only a single superblock.

Mounts of sysfs that vary based on the network namespace could in principle
change s_user_ns but it keeps the analysis and implementation of kernfs
simpler if that is not supported, and at present there appear to be no
benefits from supporting a different s_user_ns on any sysfs mount.

Getting the details of setting s_user_ns correct has been
a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
in sget_userns.  Thanks to Seth Forshee who has kept the work alive.

Thanks-to: Seth Forshee <seth.forshee@canonical.com>
Thanks-to: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-06-23 15:41:55 -05:00
Eric W. Biederman
d91ee87d8d vfs: Pass data, ns, and ns->userns to mount_ns
Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.

Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.

Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks.  The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts.  Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.

Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:53 -05:00
Jiri Kosina
9938b04472 Merge branch 'master' into for-next
Sync with Linus' tree so that patches against newer codebase can be applied.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18 11:18:55 +02:00
Masanari Iida
bd7ced9881 Doc: treewide : Fix typos in DocBook/filesystem.xml
This patch fix spelling typos found in DocBook/filesystem.xml.
It is because the file was generated from comments in code,
I have to fix the comments in codes, instead of xml file.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18 11:13:05 +02:00
Tejun Heo
a1a0e23e49 writeback: flush inode cgroup wb switches instead of pinning super_block
If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching.  Before 5ff8eaac1636 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching.  5ff8eaac1636 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.

This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches.  wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.

v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
    check in inode_switch_wbs() as Jan an Al suggested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches")
Cc: stable@vger.kernel.org #v4.5
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Linus Torvalds
7d1fc01afc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  floppy: make local variable non-static
  exynos: fixes an incorrect header guard
  dt-bindings: fixes some incorrect header guards
  cpufreq-dt: correct dead link in documentation
  cpufreq: ARM big LITTLE: correct dead link in documentation
  treewide: Fix typos in printk
  Documentation: filesystem: Fix typo in fs/eventfd.c
  fs/super.c: use && instead of & for warn_on condition
  Documentation: fix sysfs-ptp
  lib: scatterlist: fix Kconfig description
2016-01-14 17:04:19 -08:00
Dmitry Monakhov
a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Vincent Stehlé
22224a1758 fs/super.c: use && instead of & for warn_on condition
This fixes the following sparse warning:

  fs/super.c:1202:9: warning: dubious: x & !y

Bitwise and logical and are equivalent here, but logical was intended.
The generated code is identical, with and without CONFIG_LOCKDEP.

Signed-off-by: Vincent Stehlé <vincent.stehle@freescale.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-08 14:50:57 +01:00
Al Viro
061f98e959 Merge branch 'superblock-scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next
Conflicts:
	include/linux/fs.h
2015-08-21 02:31:20 -04:00
Dave Chinner
e97fedb9ef sync: serialise per-superblock sync operations
When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on the
the inode_sb_list_lock and the contention causes system wide
slowdowns. In effect, concurrent sync(2) calls can take longer and
burn more CPU than if they were serialised.

Stop the worst of the contention by adding a per-sb mutex to wrap
around wait_sb_inodes() so that we only execute one sync(2) IO
completion walk per superblock superblock at a time and hence avoid
contention being triggered by concurrent sync(2) calls.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:47 -04:00
Dave Chinner
74278da9f7 inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:46 -04:00
Oleg Nesterov
8129ed2964 change sb_writers to use percpu_rw_semaphore
We can remove everything from struct sb_writers except frozen
and add the array of percpu_rw_semaphore's instead.

This patch doesn't remove sb_writers->wait_unfrozen yet, we keep
it for get_super_thawed(). We will probably remove it later.

This change tries to address the following problems:

	- Firstly, __sb_start_write() looks simply buggy. It does
	  __sb_end_write() if it sees ->frozen, but if it migrates
	  to another CPU before percpu_counter_dec(), sb_wait_write()
	  can wrongly succeed if there is another task which holds
	  the same "semaphore": sb_wait_write() can miss the result
	  of the previous percpu_counter_inc() but see the result
	  of this percpu_counter_dec().

	- As Dave Hansen reports, it is suboptimal. The trivial
	  microbenchmark that writes to a tmpfs file in a loop runs
	  12% faster if we change this code to rely on RCU and kill
	  the memory barriers.

	- This code doesn't look simple. It would be better to rely
	  on the generic locking code.

	  According to Dave, this change adds the same performance
	  improvement.

Note: with this change both freeze_super() and thaw_super() will do
synchronize_sched_expedited() 3 times. This is just ugly. But:

	- This will be "fixed" by the rcu_sync changes we are going
	  to merge. After that freeze_super()->percpu_down_write()
	  will use synchronize_sched(), and thaw_super() won't use
	  synchronize() at all.

	  This doesn't need any changes in fs/super.c.

	- Once we merge rcu_sync changes, we can also change super.c
	  so that all wb_write->rw_sem's will share the single ->rss
	  in struct sb_writes, then freeze_super() will need only one
	  synchronize_sched().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
2015-08-15 13:52:13 +02:00
Oleg Nesterov
853b39a7c8 shift percpu_counter_destroy() into destroy_super_work()
Of course, this patch is ugly as hell. It will be (partially)
reverted later. We add it to ensure that other WIP changes in
percpu_rw_semaphore won't break fs/super.c.

We do not even need this change right now, percpu_free_rwsem()
is fine in atomic context. But we are going to change this, it
will be might_sleep() after we merge the rcu_sync() patches.

And even after that we do not really need destroy_super_work(),
we will kill it in any case. Instead, destroy_super_rcu() should
just check that rss->cb_state == CB_IDLE and do call_rcu() again
in the (very unlikely) case this is not true.

So this is just the temporary kludge which helps us to avoid the
conflicts with the changes which will be (hopefully) routed via
rcu tree.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
2015-08-15 13:52:11 +02:00
Oleg Nesterov
0e28e01f1e document rwsem_release() in sb_wait_write()
Not only we need to avoid the warning from lockdep_sys_exit(), the
caller of freeze_super() can never release this lock. Another thread
can do this, so there is another reason for rwsem_release().

Plus the comment should explain why we have to fool lockdep.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
2015-08-15 13:52:09 +02:00
Oleg Nesterov
f4b554af99 fix the broken lockdep logic in __sb_start_write()
1. wait_event(frozen < level) without rwsem_acquire_read() is just
   wrong from lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep can't detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   use wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
2015-08-15 13:52:09 +02:00