This reverts commit b73e822d12ecbea7cad3742c46fd1be17aa141c8.
The commit is reverted in order to integrate the new file encryption
framework support changes, so that all fixes needed to use the new
encryption policies are present.
Change-Id: I455ec66664064069ac34e6fe410bd28dc3a53d07
Signed-off-by: Neeraj Soni <neersoni@codeaurora.org>
c57952b UPSTREAM: ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE
379237b UPSTREAM: f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE
10e5acf UPSTREAM: ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE
63bf273 ANDROID: scsi: ufs: add ->map_sg_crypto() variant op
10d4512 FROMLIST: f2fs: Handle casefolding with Encryption
4efb7e2 ANDROID: fscrypt: fall back to filesystem-layer crypto when needed
a14fa7b ANDROID: block: require drivers to declare supported crypto key type(s)
5578bea ANDROID: block: make blk_crypto_start_using_mode() properly check for support
e9c80bd UPSTREAM: fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
9e469e7 UPSTREAM: fscrypt: don't evict dirty inodes after removing key
53f2446 fscrypt: don't evict dirty inodes after removing key
207be96 FROMLIST: fscrypt: Have filesystems handle their d_ops
06ab740 ANDROID: dm: Add wrapped key support in dm-default-key
23e670a ANDROID: dm: add support for passing through derive_raw_secret
166fda7 ANDROID: block: Prevent crypto fallback for wrapped keys
fe6e855 fscrypt: improve format of no-key names
216d8ca fscrypt: clarify what is meant by a per-file key
7e25032 fscrypt: derive dirhash key for casefolded directories
e16d849 fscrypt: don't allow v1 policies with casefolding
0bc68c1 fscrypt: add "fscrypt_" prefix to fname_encrypt()
85b9c3e fscrypt: don't print name of busy file when removing key
9c5c8c5 fscrypt: document gfp_flags for bounce page allocation
bee5bd5 fscrypt: optimize fscrypt_zeroout_range()
1c88eea fscrypt: remove redundant bi_status check
04f5184 fscrypt: Allow modular crypto algorithms
737ae90 fscrypt: include <linux/ioctl.h> in UAPI header
8842133 fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
b21b79d fscrypt: remove fscrypt_is_direct_key_policy()
19b132b fscrypt: move fscrypt_valid_enc_modes() to policy.c
add6ac4 fscrypt: check for appropriate use of DIRECT_KEY flag earlier
2454b5b fscrypt: split up fscrypt_supported_policy() by policy version
bfa4ca6 fscrypt: introduce fscrypt_needs_contents_encryption()
3871977 fscrypt: move fscrypt_d_revalidate() to fname.c
39a0acc fscrypt: constify inode parameter to filename encryption functions
3942229 fscrypt: constify struct fscrypt_hkdf parameter to fscrypt_hkdf_expand()
a7b6398 fscrypt: verify that the crypto_skcipher has the correct ivsize
9c1b3af fscrypt: use crypto_skcipher_driver_name()
3529026 fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY
Change-Id: Ib1abe832e16d5f40bfcc9e34bdccbb063b37dbbc
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Merge 4.14.175 into android-4.14
Changes in 4.14.175
spi: qup: call spi_qup_pm_resume_runtime before suspending
powerpc: Include .BTF section
ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
spi: pxa2xx: Add CS control clock quirk
spi/zynqmp: remove entry that causes a cs glitch
drm/exynos: dsi: propagate error value and silence meaningless warning
drm/exynos: dsi: fix workaround for the legacy clock name
drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
altera-stapl: altera_get_note: prevent write beyond end of 'key'
dm bio record: save/restore bi_end_io and bi_integrity
xenbus: req->body should be updated before req->state
xenbus: req->err should be updated before req->state
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
parse-maintainers: Mark as executable
USB: Disable LPM on WD19's Realtek Hub
usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
USB: serial: option: add ME910G1 ECM composition 0x110b
usb: host: xhci-plat: add a shutdown
USB: serial: pl2303: add device-id for HP LD381
usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
ALSA: line6: Fix endless MIDI read loop
ALSA: seq: virmidi: Fix running status after receiving sysex
ALSA: seq: oss: Fix running status after receiving sysex
ALSA: pcm: oss: Avoid plugin buffer overflow
ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
iio: trigger: stm32-timer: disable master mode when stopping
iio: magnetometer: ak8974: Fix negative raw values in sysfs
mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
staging/speakup: fix get_word non-space look-ahead
intel_th: Fix user-visible error codes
intel_th: pci: Add Elkhart Lake CPU support
rtc: max8907: add missing select REGMAP_IRQ
xhci: Do not open code __print_symbolic() in xhci trace events
memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
mm: slub: be more careful about the double cmpxchg of freelist
mm, slub: prevent kmalloc_node crashes and memory leaks
page-flags: fix a crash at SetPageError(THP_SWAP)
x86/mm: split vmalloc_sync_all()
USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
USB: cdc-acm: fix rounding error in TIOCSSERIAL
iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
kbuild: Disable -Wpointer-to-enum-cast
futex: Fix inode life-time issue
futex: Unbreak futex hashing
Revert "vrf: mark skb for multicast or link-local as enslaved to VRF"
Revert "ipv6: Fix handling of LLA with VRF and sockets bound to VRF"
ALSA: hda/realtek: Fix pop noise on ALC225
arm64: smp: fix smp_send_stop() behaviour
arm64: smp: fix crash_smp_send_stop() behaviour
drm/bridge: dw-hdmi: fix AVI frame colorimetry
staging: greybus: loopback_test: fix potential path truncation
staging: greybus: loopback_test: fix potential path truncations
Revert "drm/dp_mst: Skip validating ports during destruction, just ref"
hsr: fix general protection fault in hsr_addr_is_self()
macsec: restrict to ethernet devices
net: dsa: Fix duplicate frames flooded by learning
net: mvneta: Fix the case where the last poll did not process all rx
net/packet: tpacket_rcv: avoid a producer race condition
net: qmi_wwan: add support for ASKEY WWHC050
net_sched: cls_route: remove the right filter from hashtable
net_sched: keep alloc_hash updated after hash allocation
net: stmmac: dwmac-rk: fix error path in rk_gmac_probe
NFC: fdp: Fix a signedness bug in fdp_nci_send_patch()
slcan: not call free_netdev before rtnl_unlock in slcan_open
bnxt_en: fix memory leaks in bnxt_dcbnl_ieee_getets()
net: dsa: mt7530: Change the LINK bit to reflect the link status
vxlan: check return value of gro_cells_init()
hsr: use rcu_read_lock() in hsr_get_node_{list/status}()
hsr: add restart routine into hsr_get_node_list()
hsr: set .netnsok flag
net: ipv4: don't let PMTU updates increase route MTU
cgroup-v1: cgroup_pidlist_next should update position index
cpupower: avoid multiple definition with gcc -fno-common
drivers/of/of_mdio.c:fix of_mdiobus_register()
cgroup1: don't call release_agent when it is ""
dt-bindings: net: FMan erratum A050385
arm64: dts: ls1043a: FMan erratum A050385
fsl/fman: detect FMan erratum A050385
scsi: ipr: Fix softlockup when rescanning devices in petitboot
mac80211: Do not send mesh HWMP PREQ if HWMP is disabled
dpaa_eth: Remove unnecessary boolean expression in dpaa_get_headroom
sxgbe: Fix off by one in samsung driver strncpy size arg
arm64: ptrace: map SPSR_ELx<->PSR for compat tasks
arm64: compat: map SPSR_ELx<->PSR for signals
ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
i2c: hix5hd2: add missed clk_disable_unprepare in remove
Input: synaptics - enable RMI on HP Envy 13-ad105ng
Input: avoid BIT() macro usage in the serio.h UAPI header
ARM: dts: dra7: Add bus_dma_limit for L3 bus
ARM: dts: omap5: Add bus_dma_limit for L3 bus
perf probe: Do not depend on dwfl_module_addrsym()
tools: Let O= makes handle a relative path with -C option
scripts/dtc: Remove redundant YYLOC global declaration
scsi: sd: Fix optimal I/O size for devices that change reported values
mac80211: mark station unauthorized before key removal
gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
RDMA/core: Ensure security pkey modify is not lost
genirq: Fix reference leaks on irq affinity notifiers
xfrm: handle NETDEV_UNREGISTER for xfrm device
vti[6]: fix packet tx through bpf_redirect() in XinY cases
RDMA/mlx5: Block delay drop to unprivileged users
xfrm: fix uctx len check in verify_sec_ctx_len
xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
xfrm: policy: Fix doulbe free in xfrm_policy_timer
netfilter: nft_fwd_netdev: validate family and chain type
vti6: Fix memory leak of skb if input policy check fails
Input: raydium_i2c_ts - use true and false for boolean values
Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
afs: Fix some tracing details
USB: serial: option: add support for ASKEY WWHC050
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add Wistron Neweb D19Q1
USB: cdc-acm: restore capability check order
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
usb: musb: fix crash with highmen PIO and usbmon
media: flexcop-usb: fix endpoint sanity check
media: usbtv: fix control-message timeouts
staging: rtl8188eu: Add ASUS USB-N10 Nano B1 to device table
staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
libfs: fix infoleak in simple_attr_read()
media: ov519: add missing endpoint sanity checks
media: dib0700: fix rc endpoint lookup
media: stv06xx: add missing descriptor sanity checks
media: xirlink_cit: add missing descriptor sanity checks
mac80211: Check port authorization in the ieee80211_tx_dequeue() case
mac80211: fix authentication with iwlwifi/mvm
vt: selection, introduce vc_is_sel
vt: ioctl, switch VT_IS_IN_USE and VT_BUSY to inlines
vt: switch vt_dont_switch to bool
vt: vt_ioctl: remove unnecessary console allocation checks
vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
vt: vt_ioctl: fix use-after-free in vt_in_use()
platform/x86: pmc_atom: Add Lex 2I385SW to critclk_systems DMI table
bpf: Explicitly memset the bpf_attr structure
bpf: Explicitly memset some bpf info structures declared on the stack
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
net: ks8851-ml: Fix IO operations, again
arm64: alternative: fix build with clang integrated assembler
perf map: Fix off by one in strncpy() size argument
ARM: dts: oxnas: Fix clear-mask property
ARM: bcm2835-rpi-zero-w: Add missing pinctrl name
arm64: dts: ls1043a-rdb: correct RGMII delay mode to rgmii-id
arm64: dts: ls1046ardb: set RGMII interfaces to RGMII_ID mode
Linux 4.14.175
Change-Id: If2c2cb5b3745ed6fbc5cb77737cfb1758fea4cb9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream.
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.
This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
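The idea of replacing inode pointers with a lazily assigned, globally unique
sequence number can be sketched in user space roughly as follows. This is a
minimal illustration using C11 atomics, not the kernel code: `struct
demo_inode`, `demo_next_seq` and `demo_inode_sequence()` are hypothetical
names, and the value 0 is assumed to mean "no identifier assigned yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; 0 means "no identifier yet". */
struct demo_inode {
    _Atomic uint64_t i_sequence;
};

/* One counter shared by all inodes; it restarts from zero on every run. */
static _Atomic uint64_t demo_next_seq;

/* Return the inode's identifier, assigning a fresh one on first use. */
uint64_t demo_inode_sequence(struct demo_inode *inode)
{
    assert(inode != NULL);

    /* Fast path: an identifier was already assigned. */
    uint64_t old = atomic_load(&inode->i_sequence);
    if (old != 0)
        return old;

    for (;;) {
        uint64_t fresh = atomic_fetch_add(&demo_next_seq, 1) + 1;
        if (fresh == 0)
            continue; /* 0 is reserved, skip it if the counter wraps */

        uint64_t expected = 0;
        if (atomic_compare_exchange_strong(&inode->i_sequence,
                                           &expected, fresh))
            return fresh;    /* we installed the identifier */
        return expected;     /* another thread installed one first */
    }
}
```

Because the identifier is a plain number rather than a pointer, a key built
from it stays meaningful even after the object it was taken from has gone
away, which is the property the commit message is after.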
* aosp/upstream-f2fs-stable-linux-4.14.y:
fs-verity: use u64_to_user_ptr()
fs-verity: use mempool for hash requests
fs-verity: implement readahead of Merkle tree pages
ext4: readpages() should submit IO as read-ahead
fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
fscrypt: improve format of no-key names
ubifs: allow both hash and disk name to be provided in no-key names
ubifs: don't trigger assertion on invalid no-key filename
fscrypt: clarify what is meant by a per-file key
fscrypt: derive dirhash key for casefolded directories
fscrypt: don't allow v1 policies with casefolding
fscrypt: add "fscrypt_" prefix to fname_encrypt()
fscrypt: don't print name of busy file when removing key
fscrypt: document gfp_flags for bounce page allocation
fscrypt: optimize fscrypt_zeroout_range()
fscrypt: remove redundant bi_status check
fscrypt: Allow modular crypto algorithms
fscrypt: include <linux/ioctl.h> in UAPI header
fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()
fscrypt: remove fscrypt_is_direct_key_policy()
fscrypt: move fscrypt_valid_enc_modes() to policy.c
fscrypt: check for appropriate use of DIRECT_KEY flag earlier
fscrypt: split up fscrypt_supported_policy() by policy version
fscrypt: introduce fscrypt_needs_contents_encryption()
fscrypt: move fscrypt_d_revalidate() to fname.c
fscrypt: constify inode parameter to filename encryption functions
fscrypt: constify struct fscrypt_hkdf parameter to fscrypt_hkdf_expand()
fscrypt: verify that the crypto_skcipher has the correct ivsize
fscrypt: use crypto_skcipher_driver_name()
fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY
keys: Export lookup_user_key to external users
f2fs: fix build error on PAGE_KERNEL_RO
Conflicts:
fs/crypto/Kconfig
fs/crypto/bio.c
fs/crypto/fname.c
fs/crypto/fscrypt_private.h
fs/crypto/keyring.c
fs/crypto/keysetup.c
fs/ubifs/dir.c
include/uapi/linux/fscrypt.h
Resolved the conflicts as per the corresponding android-mainline change,
Ib1e6b9eda8fb5dcfc6bdc8fa89d93f72b088c5f6.
Bug: 148667616
Change-Id: I5f8b846f0cd4d5403d8c61b9e12acb4581fac6f7
Signed-off-by: Eric Biggers <ebiggers@google.com>
Casefolded encrypted directories will use a new dirhash method that
requires a secret key. If the directory uses a v2 encryption policy,
it's easy to derive this key from the master key using HKDF. However,
v1 encryption policies don't provide a way to derive additional keys.
Therefore, don't allow casefolding on directories that use a v1 policy.
Specifically, make it so that trying to enable casefolding on a
directory that has a v1 policy fails, trying to set a v1 policy on a
casefolded directory fails, and trying to open a casefolded directory
that has a v1 policy (if one somehow exists on-disk) fails.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved commit message, updated fscrypt.rst, and other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
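The three failure cases described above all reduce to one rule: a directory
may not be both casefolded and protected by a v1 policy. A minimal user-space
sketch of that rule, assuming hypothetical types and helper names
(`struct demo_dir`, `demo_enable_casefold()`, `demo_set_policy()`,
`demo_open_dir()`) and assuming `-EINVAL` as the error code, could look like
this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy versions; not the kernel's constants. */
enum demo_policy_version {
    DEMO_POLICY_NONE = 0,
    DEMO_POLICY_V1 = 1,
    DEMO_POLICY_V2 = 2
};

/* Hypothetical directory state. */
struct demo_dir {
    enum demo_policy_version policy;
    bool casefolded;
};

/* The single rule: casefolding and a v1 policy must never be combined. */
bool demo_combination_allowed(enum demo_policy_version policy, bool casefolded)
{
    return !(casefolded && policy == DEMO_POLICY_V1);
}

/* Case 1: enabling casefolding on a directory that has a v1 policy fails. */
int demo_enable_casefold(struct demo_dir *dir)
{
    assert(dir != NULL);
    if (!demo_combination_allowed(dir->policy, true))
        return -EINVAL; /* error code is an assumption */
    dir->casefolded = true;
    return 0;
}

/* Case 2: setting a v1 policy on a casefolded directory fails. */
int demo_set_policy(struct demo_dir *dir, enum demo_policy_version policy)
{
    assert(dir != NULL);
    if (!demo_combination_allowed(policy, dir->casefolded))
        return -EINVAL;
    dir->policy = policy;
    return 0;
}

/* Case 3: opening an on-disk directory that already violates the rule fails. */
int demo_open_dir(const struct demo_dir *dir)
{
    assert(dir != NULL);
    return demo_combination_allowed(dir->policy, dir->casefolded) ? 0 : -EINVAL;
}
```

Routing all three entry points through one predicate keeps the checks from
drifting apart; the real change may structure this differently.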
Merge 4.14.164 into android-4.14
Changes in 4.14.164
USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein
USB: dummy-hcd: increase max number of devices to 32
locking/spinlock/debug: Fix various data races
netfilter: ctnetlink: netns exit must wait for callbacks
mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()
libtraceevent: Fix lib installation with O=
x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
efi/gop: Return EFI_SUCCESS if a usable GOP was found
efi/gop: Fix memory leak in __gop_query32/64()
ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END
ARM: dts: Cygnus: Fix MDIO node address/size cells
spi: spi-cavium-thunderx: Add missing pci_release_regions()
ASoC: topology: Check return value for soc_tplg_pcm_create()
ARM: dts: bcm283x: Fix critical trip point
bpf, mips: Limit to 33 tail calls
ARM: dts: am437x-gp/epos-evm: fix panel compatible
samples: bpf: Replace symbol compare of trace_event
samples: bpf: fix syscall_tp due to unused syscall
powerpc: Ensure that swiotlb buffer is allocated from low memory
bnx2x: Do not handle requests from VFs after parity
bnx2x: Fix logic to get total no. of PFs per engine
net: usb: lan78xx: Fix error message format specifier
rfkill: Fix incorrect check to avoid NULL pointer dereference
ASoC: wm8962: fix lambda value
regulator: rn5t618: fix module aliases
kconfig: don't crash on NULL expressions in expr_eq()
perf/x86/intel: Fix PT PMI handling
fs: avoid softlockups in s_inodes iterators
net: stmmac: Do not accept invalid MTU values
net: stmmac: RX buffer size must be 16 byte aligned
s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
s390/dasd: fix memleak in path handling error case
block: fix memleak when __blk_rq_map_user_iov() is failed
parisc: Fix compiler warnings in debug_core.c
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
hv_netvsc: Fix unwanted rx_table reset
bpf: reject passing modified ctx to helper functions
bpf: Fix passing modified ctx to ld/abs/ind instruction
PCI/switchtec: Read all 64 bits of part_event_bitmap
mmc: block: Convert RPMB to a character device
mmc: block: Delete mmc_access_rpmb()
mmc: block: Fix bug when removing RPMB chardev
mmc: core: Prevent bus reference leak in mmc_blk_init()
mmc: block: propagate correct returned value in mmc_rpmb_ioctl
gtp: fix bad unlock balance in gtp_encap_enable_socket
macvlan: do not assume mac_header is set in macvlan_broadcast()
net: dsa: mv88e6xxx: Preserve priority when setting CPU port.
net: stmmac: dwmac-sun8i: Allow all RGMII modes
net: stmmac: dwmac-sunxi: Allow all RGMII modes
net: usb: lan78xx: fix possible skb leak
pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
USB: core: fix check for duplicate endpoints
USB: serial: option: add Telit ME910G1 0x110a composition
sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
vxlan: fix tos value before xmit
vlan: vlan_changelink() should propagate errors
net: sch_prio: When ungrafting, replace with FIFO
vlan: fix memory leak in vlan_dev_set_egress_priority
Linux 4.14.164
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifbce6635b5a3df896c29e23dd15098e80ecddeba
[ Upstream commit 04646aebd30b99f2cfa0182435a2ec252fcb16d0 ]
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
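The pattern of yielding at the bottom of each iteration of a long list walk
can be illustrated with a user-space analogue. This is only a sketch:
`struct demo_node` and `demo_walk_all()` are hypothetical, and POSIX
`sched_yield()` stands in for `cond_resched()`. Unlike `cond_resched()`, it
yields unconditionally on every call.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical list node standing in for an inode on sb->s_inodes. */
struct demo_node {
    struct demo_node *next;
    int payload;
};

/*
 * Visit every node, accumulate the payloads into *sum, and return the number
 * of nodes visited. The yield is the last statement of the loop body, so the
 * walker gives up the CPU once per node however long the list is.
 */
size_t demo_walk_all(const struct demo_node *head, long *sum)
{
    size_t visited = 0;

    assert(sum != NULL);
    *sum = 0;
    for (const struct demo_node *n = head; n != NULL; n = n->next) {
        *sum += n->payload; /* per-node work */
        visited++;
        sched_yield();      /* user-space analogue of cond_resched() */
    }
    return visited;
}
```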
* origin/upstream-f2fs-stable-linux-4.14.y:
f2fs: use EINVAL for superblock with invalid magic
f2fs: fix to read source block before invalidating it
f2fs: remove redundant check from f2fs_setflags_common()
f2fs: use generic checking function for FS_IOC_FSSETXATTR
f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
ubifs, fscrypt: cache decrypted symlink target in ->i_link
vfs: use READ_ONCE() to access ->i_link
fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
fscrypt: cache decrypted symlink target in ->i_link
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
fscrypt: only set dentry_operations on ciphertext dentries
fscrypt: fix race allowing rename() and link() of ciphertext dentries
fscrypt: clean up and improve dentry revalidation
fscrypt: use READ_ONCE() to access ->i_crypt_info
fscrypt: remove WARN_ON_ONCE() when decryption fails
fscrypt: drop inode argument from fscrypt_get_ctx()
f2fs: improve print log in f2fs_sanity_check_ckpt()
f2fs: avoid out-of-range memory access
f2fs: fix to avoid long latency during umount
f2fs: allow all the users to pin a file
f2fs: support swap file w/ DIO
f2fs: allocate blocks for pinned file
f2fs: fix is_idle() check for discard type
f2fs: add a rw_sem to cover quota flag changes
f2fs: set SBI_NEED_FSCK for xattr corruption case
f2fs: use generic EFSBADCRC/EFSCORRUPTED
f2fs: Use DIV_ROUND_UP() instead of open-coding
f2fs: print kernel message if filesystem is inconsistent
f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()
f2fs: avoid get_valid_blocks() for cleanup
f2fs: ioctl for removing a range from F2FS
f2fs: only set project inherit bit for directory
f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags
f2fs: Add option to limit required GC for checkpoint=disable
f2fs: Fix accounting for unusable blocks
f2fs: Fix root reserved on remount
f2fs: Lower threshold for disable_cp_again
f2fs: fix sparse warning
f2fs: fix f2fs_show_options to show nodiscard mount option
f2fs: add error prints for debugging mount failure
f2fs: fix to do sanity check on segment bitmap of LFS curseg
f2fs: add missing sysfs entries in documentation
f2fs: fix to avoid deadloop if data_flush is on
f2fs: always assume that the device is idle under gc_urgent
f2fs: add bio cache for IPU
f2fs: allow ssr block allocation during checkpoint=disable period
f2fs: fix to check layout on last valid checkpoint park
Change-Id: I765f6ed215533097c63d1207a7d60ce7fc4a7269
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Make the f2fs implementation of FS_IOC_FSSETXATTR use the new VFS helper
function vfs_ioc_fssetxattr_check(), and remove the project quota check
since it's now done by the helper function.
This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").
Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make the f2fs implementation of FS_IOC_SETFLAGS use the new VFS helper
function vfs_ioc_setflags_prepare().
This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").
Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Merge 4.14.129 into android-4.14
Changes in 4.14.129
perf machine: Guard against NULL in machine__exit()
ax25: fix inconsistent lock state in ax25_destroy_timer
be2net: Fix number of Rx queues used for flow hashing
ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
lapb: fixed leak of control-blocks.
neigh: fix use-after-free read in pneigh_get_next
net: openvswitch: do not free vport if register_netdevice() is failed.
sctp: Free cookie before we memdup a new one
sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
Staging: vc04_services: Fix a couple error codes
perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
netfilter: nf_queue: fix reinject verdict handling
ipvs: Fix use-after-free in ip_vs_in
selftests: netfilter: missing error check when setting up veth interface
clk: ti: clkctrl: Fix clkdm_clk handling
powerpc/powernv: Return for invalid IMC domain
mISDN: make sure device name is NUL terminated
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
perf/ring_buffer: Fix exposing a temporarily decreased data_head
perf/ring_buffer: Add ordering to rb->nest increment
perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
gpio: fix gpio-adp5588 build errors
net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
net: aquantia: fix LRO with FCS error
i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
ALSA: hda - Force polling mode on CNL for fixing codec communication
configfs: Fix use-after-free when accessing sd->s_dentry
perf data: Fix 'strncat may truncate' build failure with recent gcc
perf record: Fix s390 missing module symbol and warning for non-root users
ia64: fix build errors by exporting paddr_to_nid()
KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
net: phy: dp83867: Set up RGMII TX delay
scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
scsi: scsi_dh_alua: Fix possible null-ptr-deref
scsi: libsas: delete sas port if expander discover failed
mlxsw: spectrum: Prevent force of 56G
HID: wacom: Don't set tool type until we're in range
HID: wacom: Don't report anything prior to the tool entering range
HID: wacom: Send BTN_TOUCH in response to INTUOSP2_BT eraser contact
coredump: fix race condition between collapse_huge_page() and core dumping
infiniband: fix race condition between infiniband mlx4, mlx5 driver and core dumping
Abort file_remove_privs() for non-reg. files
Linux 4.14.129
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f69e749a49353d96af1a293f56b5b56de59c668a upstream.
file_remove_privs() might be called for non-regular files, e.g.
blkdev inode. There is no reason to do its job on things
like blkdev inodes, pipes, or cdevs. Hence, abort if
file does not refer to a regular inode.
AV: more to the point, for devices there might be any number of
inodes referring to a given device. Which one to strip the permissions
from, even if that made any sense in the first place? All of them
will be observed with contents modified, after all.
Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf
Spinczyk)
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
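The early abort for non-regular files can be sketched in user space on a bare
`mode_t` value. `demo_strip_privs()` is a hypothetical helper, not the kernel
function. The choice to drop setgid only when group-execute is also set is a
simplifying assumption of this sketch.

```c
/* Expose the S_IF* constants under strict ISO C modes as well. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return the mode a file would have after a write strips its privilege bits.
 * Anything that is not a regular file (block or character device, pipe,
 * directory, ...) is returned unchanged: the function aborts early.
 */
mode_t demo_strip_privs(mode_t mode)
{
    if (!S_ISREG(mode))
        return mode; /* abort: nothing to do for non-regular files */

    /* setuid is always dropped on a regular file. */
    mode &= ~(mode_t)S_ISUID;

    /* setgid is dropped only when group-execute is also set (assumption). */
    if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
        mode &= ~(mode_t)S_ISGID;

    return mode;
}
```

For example, a setuid regular file with mode 04755 comes back as 0755, while a
block device node carrying the same bit is returned untouched.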
Merge 4.14.56 into android-4.14
Changes in 4.14.56
media: rc: mce_kbd decoder: fix stuck keys
ASoC: mediatek: preallocate pages use platform device
MIPS: Call dump_stack() from show_regs()
MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
MIPS: Fix ioremap() RAM check
mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
mmc: dw_mmc: fix card threshold control configuration
ibmasm: don't write out of bounds in read handler
staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
staging: r8822be: Fix RTL8822be can't find any wireless AP
ata: Fix ZBC_OUT command block check
ata: Fix ZBC_OUT all bit handling
vmw_balloon: fix inflation with batching
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
USB: serial: ch341: fix type promotion bug in ch341_control_in()
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
USB: serial: keyspan_pda: fix modem-status error handling
USB: yurex: fix out-of-bounds uaccess in read handler
USB: serial: mos7840: fix status-register error handling
usb: quirks: add delay quirks for Corsair Strafe
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
devpts: hoist out check for DEVPTS_SUPER_MAGIC
devpts: resolve devpts bind-mounts
Fix up non-directory creation in SGID directories
genirq/affinity: assign vectors to all possible CPUs
scsi: megaraid_sas: use adapter_type for all gen controllers
scsi: megaraid_sas: replace instance->ctrl_context checks with instance->adapter_type
scsi: megaraid_sas: replace is_ventura with adapter_type checks
scsi: megaraid_sas: Create separate functions to allocate ctrl memory
scsi: megaraid_sas: fix selection of reply queue
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
ALSA: hda - Handle pm failure during hotplug
mm: do not drop unused pages when userfaultd is running
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
fs, elf: make sure to page align bss in load_elf_library
mm: do not bug_on on incorrect length in __mm_populate()
tracing: Reorder display of TGID to be after PID
kbuild: delete INSTALL_FW_PATH from kbuild documentation
arm64: neon: Fix function may_use_simd() return error status
tools build: fix # escaping in .cmd files for future Make
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
i2c: tegra: Fix NACK error handling
iw_cxgb4: correctly enforce the max reg_mr depth
xen: setup pv irq ops vector earlier
nvme-pci: Remap CMB SQ entries on every controller reset
crypto: x86/salsa20 - remove x86 salsa20 implementations
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
netfilter: nf_queue: augment nfqa_cfg_policy
netfilter: x_tables: initialise match/target check parameter struct
loop: add recursion validation to LOOP_CHANGE_FD
PM / hibernate: Fix oops at snapshot_write()
RDMA/ucm: Mark UCM interface as BROKEN
loop: remember whether sysfs_create_group() was done
f2fs: give message and set need_fsck given broken node id
Linux 4.14.56
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream.
sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid. This is historically used for
group-shared directories.
But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).
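The rule described above can be sketched as a small user-space model. This is only an illustration of the decision, not the kernel's implementation: the function name `sgid_child_mode()` and its boolean parameters are hypothetical stand-ins for the group-membership and capability checks.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical user-space model of the mode a new inode ends up with
 * when it is created inside a directory.
 *
 * dir_mode:       mode of the parent directory
 * mode:           mode requested for the new inode
 * in_group:       caller is a member of the directory's group
 * has_cap_fsetid: caller holds a CAP_FSETID-like override
 */
static mode_t sgid_child_mode(mode_t dir_mode, mode_t mode,
			      bool in_group, bool has_cap_fsetid)
{
	if (!(dir_mode & S_ISGID))
		return mode;		/* no sgid semantics apply */

	if (S_ISDIR(mode))
		return mode | S_ISGID;	/* subdirectories always inherit sgid */

	/* sgid without group execute means "mandatory locking": keep it */
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
	    !in_group && !has_cap_fsetid)
		mode &= ~S_ISGID;	/* a non-member must not gain the group */

	return mode;
}
```

In this model a non-member creating an sgid, group-executable regular file gets the sgid bit stripped, while group members and privileged callers keep it.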
Reported-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.14.54 into android-4.14
Changes in 4.14.54
usb: cdc_acm: Add quirk for Uniden UBC125 scanner
USB: serial: cp210x: add CESINEL device ids
USB: serial: cp210x: add Silicon Labs IDs for Windows Update
usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
acpi: Add helper for deactivating memory region
usb: typec: ucsi: acpi: Workaround for cache mode issue
usb: typec: ucsi: Fix for incorrect status data issue
xhci: Fix kernel oops in trace_xhci_free_virt_device
n_tty: Fix stall at n_tty_receive_char_special().
n_tty: Access echo_* variables carefully.
staging: android: ion: Return an ERR_PTR in ion_map_kernel
serial: 8250_pci: Remove stalled entries in blacklist
serdev: fix memleak on module unload
vt: prevent leaking uninitialized data to userspace via /dev/vcs*
drm/amdgpu: Add APU support in vi_set_uvd_clocks
drm/amdgpu: Add APU support in vi_set_vce_clocks
drm/amdgpu: fix the missed vcn fw version report
drm/qxl: Call qxl_bo_unref outside atomic context
drm/atmel-hlcdc: check stride values in the first plane
drm/amdgpu: Use kvmalloc_array for allocating VRAM manager nodes array
drm/amdgpu: Refactor amdgpu_vram_mgr_bo_invisible_size helper
drm/i915: Enable provoking vertex fix on Gen9 systems.
netfilter: nf_tables: nft_compat: fix refcount leak on xt module
netfilter: nft_compat: prepare for indirect info storage
netfilter: nft_compat: fix handling of large matchinfo size
netfilter: nf_tables: don't assume chain stats are set when jumplabel is set
netfilter: nf_tables: bogus EBUSY in chain deletions
netfilter: nft_meta: fix wrong value dereference in nft_meta_set_eval
netfilter: nf_tables: disable preemption in nft_update_chain_stats()
netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace()
netfilter: nf_tables: fix memory leak on error exit return
netfilter: nf_tables: add missing netlink attrs to policies
netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj()
md: always hold reconfig_mutex when calling mddev_suspend()
md: don't call bitmap_create() while array is quiesced.
md: move suspend_hi/lo handling into core md code
md: use mddev_suspend/resume instead of ->quiesce()
md: allow metadata update while suspending.
md: remove special meaning of ->quiesce(.., 2)
netfilter: don't set F_IFACE on ipv6 fib lookups
netfilter: ip6t_rpfilter: provide input interface for route lookup
netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
ARM: dts: imx6q: Use correct SDMA script for SPI5 core
mtd: rawnand: fix return value check for bad block status
xfrm6: avoid potential infinite loop in _decode_session6()
afs: Fix directory permissions check
netfilter: ebtables: handle string from userspace with care
s390/dasd: use blk_mq_rq_from_pdu for per request data
netfilter: nft_limit: fix packet ratelimiting
ipvs: fix buffer overflow with sync daemon and service
iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
atm: zatm: fix memcmp casting
net: qmi_wwan: Add Netgear Aircard 779S
perf test: "Session topology" dumps core on s390
perf bpf: Fix NULL return handling in bpf__prepare_load()
fs: clear writeback errors in inode_init_always
sched/core: Fix rules for running on online && !active CPUs
sched/core: Require cpu_active() in select_task_rq(), for user tasks
platform/x86: asus-wmi: Fix NULL pointer dereference
net/sonic: Use dma_mapping_error()
net: dsa: b53: Add BCM5389 support
Linux 4.14.54
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 829bc787c1a0403e4d886296dd4d90c5f9c1744a ]
In inode_init_always(), we clear the inode mapping flags, which clears
any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
also clear wb_err, which means that old mapping errors can leak through
to new inodes.
This is crucial for the XFS inode allocation path because we recycle old
in-core inodes and we do not want error state from an old file to leak
into the new file. This bug was discovered by running generic/036 and
generic/047 in a loop and noticing that the EIOs generated by the
collision of direct and buffered writes in generic/036 would survive the
remount between 036 and 047, and get reported to the fsyncs (on
different files!) in generic/047.
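The leak described above can be sketched with a simplified model. The structure and function names below are hypothetical stand-ins, not the kernel's definitions; the only point is that both the flag bits and the writeback error state have to be reset when an in-core inode is recycled.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define MODEL_AS_EIO	(1u << 0)
#define MODEL_AS_ENOSPC	(1u << 1)

struct model_mapping {
	unsigned int flags;	/* retained error bits, like AS_EIO / AS_ENOSPC */
	uint32_t wb_err;	/* writeback error state later reported to fsync() */
};

/* Reinitialise the mapping of a recycled in-core inode for a new file. */
static void model_inode_init_always(struct model_mapping *mapping)
{
	mapping->flags = 0;	/* clears AS_EIO / AS_ENOSPC, as before */
	mapping->wb_err = 0;	/* also drop the old file's writeback error */
}
```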
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows filesystems to use their mount private data to
influence the permissions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.
Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
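As a rough illustration of what the two rules in the script do to a call site, the sketch below uses simplified user-space approximations of the accessors. The `MODEL_*` macros are assumptions made for this example (they rely on the GCC/Clang `__typeof__` extension and only handle scalar types); they are not the kernel's definitions.

```c
/*
 * Simplified user-space approximations, NOT the kernel's definitions.
 * A volatile access forces the compiler to emit exactly one load or store.
 */
#define MODEL_READ_ONCE(x)	 (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Before: ACCESS_ONCE(shared_flag) = v;  After: WRITE_ONCE(shared_flag, v); */
static void publish(int v)
{
	MODEL_WRITE_ONCE(shared_flag, v);
}

/* Before: return ACCESS_ONCE(shared_flag);  After: return READ_ONCE(shared_flag); */
static int observe(void)
{
	return MODEL_READ_ONCE(shared_flag);
}
```

The first rule matches an assignment through ACCESS_ONCE() and turns it into a two-argument write; every remaining use is a read.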
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull overlayfs updates from Miklos Szeredi:
"This fixes d_ino correctness in readdir, which brings overlayfs on par
with normal filesystems regarding inode number semantics, as long as
all layers are on the same filesystem.
There are also some bug fixes, one in particular (random ioctl's
shouldn't be able to modify lower layers) that touches some vfs code,
but of course no-op for non-overlay fs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix false positive ESTALE on lookup
ovl: don't allow writing ioctl on lower layer
ovl: fix relatime for directories
vfs: add flags to d_real()
ovl: cleanup d_real for negative
ovl: constant d_ino for non-merge dirs
ovl: constant d_ino across copy up
ovl: fix readdir error value
ovl: check snprintf return
Allow interval trees to quickly check for overlaps to avoid unnecessary
tree lookups in interval_tree_iter_first().
As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().
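A minimal sketch of the kind of quick overlap check that a cached leftmost node makes possible. The `it_summary` structure and `it_may_overlap()` are hypothetical names for this example; the sketch assumes the tree's smallest start and largest last can both be read without walking the tree.

```c
#include <stdbool.h>

/* Hypothetical summary of what a cached interval tree can read in O(1). */
struct it_summary {
	unsigned long leftmost_start;	/* start of the cached leftmost node */
	unsigned long max_last;		/* largest 'last' stored in the tree */
	bool empty;
};

/*
 * Closed intervals [a0, a1] and [b0, b1] overlap iff a0 <= b1 && b0 <= a1.
 * Returns false when [start, last] cannot intersect anything in the tree,
 * so the caller can skip the lookup entirely.
 */
static bool it_may_overlap(const struct it_summary *s,
			   unsigned long start, unsigned long last)
{
	if (s->empty)
		return false;
	return start <= s->max_last && s->leftmost_start <= last;
}
```

A true result only means that the range falls within the span covered by the tree; the regular tree walk is still needed to find an actual intersecting node.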
[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Need to treat non-regular overlayfs files the same as regular files when
checking for an atime update.
Add a d_real() flag to make it return the upper dentry for all file types.
Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery. This also had the
effect of putting linked inodes on the lru instead of evicting them.
Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.
Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.
Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: viro@ZenIV.linux.org.uk
Reviewed-by: Brian Foster <bfoster@redhat.com>
Pull misc filesystem updates from Al Viro:
"Assorted normal VFS / filesystems stuff..."
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dentry name snapshots
Make statfs properly return read-only state after emergency remount
fs/dcache: init in_lookup_hashtable
minix: Deinline get_block, save 2691 bytes
fs: Reorder inode_owner_or_capable() to avoid needless
fs: warn in case userspace lied about modprobe return
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Add the SYSTEM_SCHEDULING bootup state to move various scheduler
debug checks earlier into the bootup. This turns silent and
sporadically deadly bugs into nice, deterministic splats. Fix some
of the splats that triggered. (Thomas Gleixner)
- A round of restructuring and refactoring of the load-balancing and
topology code (Peter Zijlstra)
- Another round of consolidating ~20 years of incremental scheduler code
history: this time in terms of wait-queue nomenclature. (I didn't
get much feedback on these renaming patches, and we can still
easily change any names I might have misplaced, so if anyone hates
a new name, please holler and I'll fix it.) (Ingo Molnar)
- sched/numa improvements, fixes and updates (Rik van Riel)
- Another round of x86/tsc scheduler clock code improvements, in hope
of making it more robust (Peter Zijlstra)
- Improve NOHZ behavior (Frederic Weisbecker)
- Deadline scheduler improvements and fixes (Luca Abeni, Daniel
Bristot de Oliveira)
- Simplify and optimize the topology setup code (Lauro Ramos
Venancio)
- Debloat and decouple scheduler code some more (Nicolas Pitre)
- Simplify code by making better use of llist primitives (Byungchul
Park)
- ... plus other fixes and improvements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
sched/cputime: Refactor the cputime_adjust() code
sched/debug: Expose the number of RT/DL tasks that can migrate
sched/numa: Hide numa_wake_affine() from UP build
sched/fair: Remove effective_load()
sched/numa: Implement NUMA node level wake_affine()
sched/fair: Simplify wake_affine() for the single socket case
sched/numa: Override part of migrate_degrades_locality() when idle balancing
sched/rt: Move RT related code from sched/core.c to sched/rt.c
sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
sched/fair: Spare idle load balancing on nohz_full CPUs
nohz: Move idle balancer registration to the idle path
sched/loadavg: Generalize "_idle" naming to "_nohz"
sched/core: Drop the unused try_get_task_struct() helper function
sched/fair: WARN() and refuse to set buddy when !se->on_rq
sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
...
Checking for capabilities should be the last operation when performing
access control tests so that PF_SUPERPRIV is set only when it was required
for success (implying that the capability was needed for the operation).
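The ordering argument can be sketched with a small model in which a successful capability check has a visible side effect. All names below (`model_task`, `model_capable()`, `model_owner_or_capable()`) are hypothetical; the `superpriv` field merely stands in for PF_SUPERPRIV.

```c
#include <stdbool.h>

/* Hypothetical model of a task; 'superpriv' mirrors PF_SUPERPRIV. */
struct model_task {
	unsigned int fsuid;
	bool has_cap;	/* task holds the relevant capability */
	bool superpriv;	/* set once a capability check succeeds */
};

/* Like a capability check: succeeding marks the task as a side effect. */
static bool model_capable(struct model_task *t)
{
	if (t->has_cap)
		t->superpriv = true;
	return t->has_cap;
}

/* Side-effect-free tests first; the capability check comes last. */
static bool model_owner_or_capable(struct model_task *t, unsigned int owner,
				   bool owner_is_mapped)
{
	if (t->fsuid == owner)
		return true;
	return owner_is_mapped && model_capable(t);
}
```

With this ordering the flag is only set when the capability actually decided the outcome; a check that fails on the mapping test never reaches `model_capable()`.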
Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Define a set of write life time hints:
RWH_WRITE_LIFE_NOT_SET	No hint information set
RWH_WRITE_LIFE_NONE	No hints about write life time
RWH_WRITE_LIFE_SHORT	Data written has a short life time
RWH_WRITE_LIFE_MEDIUM	Data written has a medium life time
RWH_WRITE_LIFE_LONG	Data written has a long life time
RWH_WRITE_LIFE_EXTREME	Data written has an extremely long life time
The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.
Add an fcntl interface for querying these flags, and for
setting them as well:
F_GET_RW_HINT		Returns the read/write hint set on the
			underlying inode.
F_SET_RW_HINT		Set one of the above write hints on the
			underlying inode.
F_GET_FILE_RW_HINT	Returns the read/write hint set on the
			file descriptor.
F_SET_FILE_RW_HINT	Set one of the above write hints on the
			file descriptor.
The user passes in a 64-bit pointer to get/set these values, and
the interface returns 0/-1 on success/error.
Sample program testing/implementing basic setting/getting of write
hints is below.
Add support for storing the write life time hint in the inode flags
and in struct file as well, and pass them to the kiocb flags. If
both a file and its corresponding inode have a write hint, then we
use the one in the file, if available. The file hint can be used
for sync/direct IO, for buffered writeback only the inode hint
is available.
This is in preparation for utilizing these hints in the block layer,
to guide on-media data placement.
/*
 * writehint.c: get or set an inode write hint
 */
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <inttypes.h>
#ifndef F_GET_RW_HINT
#define F_LINUX_SPECIFIC_BASE	1024
#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif
/* Indexed by hint value, in the order the hints are listed above. */
static char *str[] = { "RWH_WRITE_LIFE_NOT_SET", "RWH_WRITE_LIFE_NONE",
			"RWH_WRITE_LIFE_SHORT", "RWH_WRITE_LIFE_MEDIUM",
			"RWH_WRITE_LIFE_LONG", "RWH_WRITE_LIFE_EXTREME" };
int main(int argc, char *argv[])
{
	uint64_t hint;
	int fd, ret;
	if (argc < 2) {
		fprintf(stderr, "%s: file <hint>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 2;
	}
	/* Optional second argument: set the hint before reading it back. */
	if (argc > 2) {
		hint = atoi(argv[2]);
		ret = fcntl(fd, F_SET_RW_HINT, &hint);
		if (ret < 0) {
			perror("fcntl: F_SET_RW_HINT");
			return 4;
		}
	}
	ret = fcntl(fd, F_GET_RW_HINT, &hint);
	if (ret < 0) {
		perror("fcntl: F_GET_RW_HINT");
		return 3;
	}
	printf("%s: hint %s\n", argv[1], str[hint]);
	close(fd);
	return 0;
}
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly
name it as a wait-queue entry.
Propagate it to a couple of usage sites where the wait-bit-queue internals
are exposed.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
Fix typos and add the following to the scripts/spelling.txt:
intialisation||initialisation
intialised||initialised
intialise||initialise
This commit does not intend to change the British spelling itself.
Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
inode we create. This is problematic as this means that it takes two
trips through the LRU for any of these objects to be reclaimed,
regardless of their actual lifetime. With enough pressure from these
caches we can easily evict our working set from page cache with single
use objects. So instead only set *REFERENCED if we've already been
added to the LRU list. This means that we've been touched since the
first time we were accessed, and so more likely to need to hang out in
cache.
To illustrate this issue I wrote the following scripts
https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure
on my test box. It is a single socket 4 core CPU with 16 GiB of RAM and
I tested on an Intel 2 TiB NVMe drive. The cache-pressure.sh script
creates a new file system and creates two 6.5 GiB files in order to take up
13 GiB of the 16 GiB of RAM with pagecache. Then it runs a test program
that reads these 2 files in a loop, and keeps track of how often it has
to read bytes for each loop. On an ideal system with no pressure we
should have to read 0 bytes indefinitely. The second thing this script
does is start a fs_mark job that creates a ton of 0 length files,
putting pressure on the system with slab only allocations. On exit the
script prints out how many bytes were read by the read-file program.
The results are as follows
Without patch:
/mnt/btrfs-test/reads/file1: total read during loops 27262988288
/mnt/btrfs-test/reads/file2: total read during loops 27262976000
With patch:
/mnt/btrfs-test/reads/file2: total read during loops 18640457728
/mnt/btrfs-test/reads/file1: total read during loops 9565376512
This patch results in a 50% reduction of the amount of pages evicted
from our working set.
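The change in reclaim behaviour can be sketched with a toy model of a cached object. The names (`model_obj`, `model_release()`, `model_shrink()`) are hypothetical and the model ignores locking and the actual list handling; it only shows why a single-use object is now reclaimed on its first trip through the LRU.

```c
#include <stdbool.h>

/* Hypothetical model of a cached object (dentry or inode). */
struct model_obj {
	bool on_lru;		/* already sits on the LRU list */
	bool referenced;	/* models DCACHE_REFERENCED / I_REFERENCED */
};

/* Called when the last user drops the object. */
static void model_release(struct model_obj *o)
{
	if (!o->on_lru)
		o->on_lru = true;	/* first use: join the LRU, no REFERENCED */
	else
		o->referenced = true;	/* touched again: worth a second chance */
}

/* One shrinker pass over the object; true means it was reclaimed. */
static bool model_shrink(struct model_obj *o)
{
	if (o->referenced) {
		o->referenced = false;	/* survive this pass only */
		return false;
	}
	o->on_lru = false;
	return true;
}
```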
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently we free fsnotify_mark_connector structure only when inode /
vfsmount is getting freed. This can however impose noticeable memory
overhead when marks get attached to inodes only temporarily. So free the
connector structure once the last mark is detached from the object.
Since notification infrastructure can be working with the connector
under the protection of fsnotify_mark_srcu, we have to be careful and
free the fsnotify_mark_connector only after SRCU period passes.
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently notification marks are attached to object (inode or vfsmnt) by
a hlist_head in the object. The list is also protected by a spinlock in
the object. So while there is any mark attached to the list of marks,
the object must be pinned in memory (and thus e.g. last iput() deleting
inode cannot happen). Also for list iteration in fsnotify() to work, we
must hold fsnotify_mark_srcu lock so that mark itself and
mark->obj_list.next cannot get freed. Thus we are required to wait for
response to fanotify events from userspace process with
fsnotify_mark_srcu lock held. That causes issues when userspace process
is buggy and does not reply to some event - basically the whole
notification subsystem gets eventually stuck.
So to be able to drop fsnotify_mark_srcu lock while waiting for
response, we have to pin the mark in memory and make sure it stays in
the object list (as removing the mark waiting for response could lead to
lost notification events for groups later in the list). However we don't
want inode reclaim to block on such mark as that would lead to system
just locking up elsewhere.
This commit is the first in the series that paves way towards solving
these conflicting lifetime needs. Instead of anchoring the list of marks
directly in the object, we anchor it in a dedicated structure
(fsnotify_mark_connector) and just point to that structure from the
object. The following commits will also add spinlock protecting the list
and object pointer to the structure.
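The indirection introduced here can be sketched as follows. The `model_*` types and `model_attach()` are hypothetical user-space stand-ins; locking and the SRCU-protected iteration mentioned above are deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a notification mark. */
struct model_mark {
	struct model_mark *next;
};

/* Dedicated structure that anchors the list of marks. */
struct model_connector {
	struct model_mark *marks;
};

/* The object (inode or vfsmount) only points at the connector. */
struct model_object {
	struct model_connector *conn;
};

/* Attach a mark, allocating the connector on first use. Returns 0 or -1. */
static int model_attach(struct model_object *obj, struct model_mark *m)
{
	if (!obj->conn) {
		obj->conn = calloc(1, sizeof(*obj->conn));
		if (!obj->conn)
			return -1;
	}
	m->next = obj->conn->marks;
	obj->conn->marks = m;
	return 0;
}
```

Because the list head no longer lives in the object itself, the connector's lifetime can later be managed separately from the lifetime of the inode or vfsmount.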
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull more vfs updates from Al Viro:
"->rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas
This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check
The IOP_XATTR inode operations flag in inode->i_opflags indicates that
the inode has xattr support. The flag is automatically set by
new_inode() on filesystems with xattr support (where sb->s_xattr is
defined), and cleared otherwise. Filesystems can explicitly clear it
for inodes that should not have xattr support.
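A minimal sketch of the flag handling described above, using hypothetical stand-in types. The value chosen for `MODEL_IOP_XATTR` is illustrative only and is not taken from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IOP_XATTR	0x0008u	/* illustrative value only */

struct model_sb {
	const void *s_xattr;	/* xattr handler table, NULL if none */
};

struct model_inode {
	const struct model_sb *i_sb;
	unsigned int i_opflags;
};

/* Set or clear the flag depending on whether the filesystem has handlers. */
static void model_init_xattr_flag(struct model_inode *inode)
{
	if (inode->i_sb->s_xattr)
		inode->i_opflags |= MODEL_IOP_XATTR;
	else
		inode->i_opflags &= ~MODEL_IOP_XATTR;
}

/* A filesystem may opt a particular inode out explicitly. */
static void model_clear_xattr_flag(struct model_inode *inode)
{
	inode->i_opflags &= ~MODEL_IOP_XATTR;
}

/* Callers test the flag instead of looking for per-inode xattr methods. */
static bool model_has_xattr(const struct model_inode *inode)
{
	return (inode->i_opflags & MODEL_IOP_XATTR) != 0;
}
```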
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.
Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
current_fs_time() is used for inode timestamps.
Change the signature of the function to take inode pointer
instead of superblock as per Linus's suggestion.
Also, move the api under vfs as per the discussion on the
thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's
suggestion on the thread, change the function name as well.
current_fs_time() will be deleted after all the references
to it are replaced by current_time().
There was a bug reported by kbuild test bot with the change
as some of the calls to current_time() were made before the
super_block was initialized. Catch these accidental assignments
as timespec_trunc() does for wrong granularities. This allows
the function to work correctly even in these circumstances, but
it also adds a warning to make the user aware of the bug.
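The new calling convention and the fallback for an uninitialized super_block can be sketched in user space as below. The `model_*` names are hypothetical, `timespec_get()` stands in for the kernel's time source, and the truncation helper is a simplified approximation of what timespec_trunc() is described as doing.

```c
#include <stdio.h>
#include <time.h>

struct model_sb {
	long s_time_gran;	/* timestamp granularity in nanoseconds */
};

struct model_inode {
	const struct model_sb *i_sb;
};

/* Round tv_nsec down to a multiple of gran; gran of 1 keeps full precision. */
static struct timespec model_trunc(struct timespec t, long gran)
{
	if (gran == 1000000000L)
		t.tv_nsec = 0;
	else if (gran > 1 && gran < 1000000000L)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}

/* Takes the inode rather than the super_block. */
static struct timespec model_current_time(const struct model_inode *inode)
{
	struct timespec now;

	timespec_get(&now, TIME_UTC);
	if (!inode->i_sb) {
		/* Caller bug: warn, but still return a usable timestamp. */
		fprintf(stderr, "model_current_time: super_block not set\n");
		return now;
	}
	return model_trunc(now, inode->i_sb->s_time_gran);
}
```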
A coccinelle script was used to identify all the current
.alloc_inode super_block callbacks that updated inode timestamps.
proc filesystem was the only one that was modifying inode times
as part of this callback. The series includes a patch to fix that.
Note that timespec_trunc() will also be moved to fs/inode.c
in a separate patch when this will need to be revamped for
bounds checking purposes.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
On overlayfs relatime_need_update() needs inode times to be correct on
overlay inode. But i_mtime and i_ctime are updated by filesystem code on
underlying inode only, so they will be out-of-date on the overlay inode.
This patch copies the times from the underlying inode if needed. This
can't be done if called from RCU lookup (link following) but link m/ctime
are not updated by fs, so this is all right.
This patch doesn't change functionality for anything but overlayfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull more vfs updates from Al Viro:
"Assorted cleanups and fixes.
In the "trivial API change" department - ->d_compare() losing 'parent'
argument"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cachefiles: Fix race between inactivating and culling a cache object
9p: use clone_fid()
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
vfs: make dentry_needs_remove_privs() internal
vfs: remove file_needs_remove_privs()
vfs: fix deadlock in file_remove_privs() on overlayfs
get rid of 'parent' argument of ->d_compare()
cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
affs ->d_compare(): don't bother with ->d_inode
fold _d_rehash() and __d_rehash() together
fold dentry_rcuwalk_invalidate() into its only remaining caller
file_remove_privs() is called with inode lock on file_inode(), which
proceeds to calling notify_change() on file->f_path.dentry. Which triggers
the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.
Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Cc: <stable@vger.kernel.org>
Radix trees may be used not only for storing page cache pages, so
unconditionally accounting radix tree nodes to the current memory cgroup
is bad: if a radix tree node is used for storing data shared among
different cgroups we risk pinning dead memory cgroups forever.
So let's only account radix tree nodes if it was explicitly requested by
passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to
account page cache entries, so mark mapping->page_tree so.
Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull userns vfs updates from Eric Biederman:
"This tree contains some very long awaited work on generalizing the
user namespace support for mounting filesystems to include filesystems
with a backing store. The real world target is fuse but the goal is
to update the vfs to allow any filesystem to be supported. This
patchset is based on a lot of code review and testing to approach that
goal.
While looking at what is needed to support the fuse filesystem it
became clear that there were things like xattrs for security modules
that needed special treatment, that the resolution of those concerns
would not be fuse specific, and that sorting out these general issues
made most sense at the generic level, where the right people could be
drawn into the conversation, and the issues could be solved for
everyone.
At a high level this patchset does a couple of simple things:
- Add a user namespace owner (s_user_ns) to struct super_block.
- Teach the vfs to handle filesystem uids and gids not mapping into
kuids and kgids and being reported as INVALID_UID and
INVALID_GID in vfs data structures.
By assigning a user namespace owner filesystems that are mounted with
only user namespace privilege can be detected. This allows security
modules and the like to know which mounts may not be trusted. This
also allows the set of uids and gids that are communicated to the
filesystem to be capped at the set of kuids and kgids that are in the
owning user namespace of the filesystem.
One of the crazier corner cases this handles is the case of inodes
whose i_uid or i_gid are not mapped into the vfs. Most of the code
simply doesn't care but it is easy to confuse the inode writeback path
so no operation that could cause an inode write-back is permitted for
such inodes (aka only reads are allowed).
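As a rough illustration of the unmapped-id handling described above,
the following userspace C sketch maps on-disk ids through a filesystem's
owning namespace and refuses modification of inodes whose ids do not
map. The sketch_* names, the single-extent map and the boolean
permission helper are assumptions made for this example; the kernel
itself works with kuid_t/kgid_t values and more general id maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for INVALID_UID / INVALID_GID in this sketch. */
#define SKETCH_INVALID_ID ((uint32_t)-1)

/* Hypothetical single-extent id map for one user namespace. */
struct sketch_user_ns {
	uint32_t first;		/* first id as seen by the filesystem */
	uint32_t lower_first;	/* matching first kernel-global id */
	uint32_t count;		/* number of ids in the extent */
};

/* Translate a filesystem id; ids outside the extent become invalid. */
static uint32_t sketch_make_kid(const struct sketch_user_ns *ns, uint32_t id)
{
	assert(ns != NULL);
	if (id >= ns->first && id - ns->first < ns->count)
		return ns->lower_first + (id - ns->first);
	return SKETCH_INVALID_ID;
}

struct sketch_inode {
	uint32_t i_uid;
	uint32_t i_gid;
};

/* Fill in an inode's ids relative to the superblock's owning namespace. */
static void sketch_inode_load(struct sketch_inode *inode,
			      const struct sketch_user_ns *s_user_ns,
			      uint32_t disk_uid, uint32_t disk_gid)
{
	inode->i_uid = sketch_make_kid(s_user_ns, disk_uid);
	inode->i_gid = sketch_make_kid(s_user_ns, disk_gid);
}

static bool sketch_has_unmapped_id(const struct sketch_inode *inode)
{
	return inode->i_uid == SKETCH_INVALID_ID ||
	       inode->i_gid == SKETCH_INVALID_ID;
}

/* Anything that could dirty the inode is refused; reads are unaffected. */
static bool sketch_may_write(const struct sketch_inode *inode)
{
	return !sketch_has_unmapped_id(inode);
}
```

With an extent of 65536 ids starting at 0, an on-disk gid of 70000 does
not map, so sketch_may_write() returns false for that inode even though
its uid is valid.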
This set of changes starts out by cleaning up the code paths involved
in user namespace permitted mounts. Then when things are clean enough
adds code that cleanly sets s_user_ns. Then additional restrictions
are added that are possible now that the filesystem superblock
contains owner information.
These changes should not affect anyone in practice, but there are some
parts of these restrictions that are changes in behavior.
- Andy's restriction on suid executables that does not honor the
suid bit when the path is from another mount namespace (think
/proc/[pid]/fd/) or when the filesystem was mounted by a less
privileged user.
- The replacement of the user namespace implicit setting of MNT_NODEV
with implicitly setting SB_I_NODEV on the filesystem superblock
instead.
Using SB_I_NODEV is a stronger form that happens to make this state
user invisible. The user visibility can be managed but it caused
problems when it was introduced from applications reasonably
expecting mount flags to be what they were set to.
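The suid restriction in the first item above can be sketched as a single
predicate. This is a simplified userspace model under stated
assumptions: the sketch_* types are hypothetical, namespaces are reduced
to pointers with a parent link, and "less privileged" is modelled as
"the task's user namespace is neither the filesystem's owning namespace
nor one of its descendants".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_user_ns {
	const struct sketch_user_ns *parent;	/* NULL for the initial ns */
};

struct sketch_mnt_ns {
	int id;
};

struct sketch_mount {
	bool nosuid;				/* explicit nosuid flag */
	const struct sketch_mnt_ns *mnt_ns;	/* mount ns holding the mount */
	const struct sketch_user_ns *s_user_ns;	/* owner of the superblock */
};

struct sketch_task {
	const struct sketch_mnt_ns *mnt_ns;
	const struct sketch_user_ns *user_ns;
};

/* True if ns is target itself or a descendant of target. */
static bool sketch_in_userns(const struct sketch_user_ns *ns,
			     const struct sketch_user_ns *target)
{
	for (; ns; ns = ns->parent)
		if (ns == target)
			return true;
	return false;
}

/* Honor the suid bit only for local mounts owned by a trusted enough ns. */
static bool sketch_may_suid(const struct sketch_task *task,
			    const struct sketch_mount *mnt)
{
	assert(task != NULL && mnt != NULL);
	return !mnt->nosuid &&
	       mnt->mnt_ns == task->mnt_ns &&
	       sketch_in_userns(task->user_ns, mnt->s_user_ns);
}
```

Under this model a task in the initial user namespace does not get suid
semantics from a filesystem mounted inside a child user namespace, and a
path reached through another mount namespace is treated as nosuid.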
There is a little bit of work remaining before it is safe to support
mounting filesystems with backing store in user namespaces, beyond
what is in this set of changes.
- Verifying the mounter has permission to read/write the block device
during mount.
- Teaching the integrity modules IMA and EVM to handle filesystems
mounted with only user namespace root and to reduce trust in their
security xattrs accordingly.
- Capturing the mounter's credentials and using that for permission
checks in d_automount and the like. (Given that overlayfs already
does this, and we need the work in d_automount it makes sense to
generalize this case).
Furthermore there are a few changes that are on the wishlist:
- Get all filesystems supporting posix acls using the generic posix
acls so that posix_acl_fix_xattr_from_user and
posix_acl_fix_xattr_to_user may be removed. [Maintainability]
- Reducing the permission checks in places such as remount to allow
the superblock owner to perform them.
- Allowing the superblock owner to chown files with unmapped uids and
gids to something that is mapped so the files may be treated
normally.
I am not considering even obvious relaxations of permission checks
until it is clear there are no more corner cases that need to be
locked down and handled generically.
Many thanks to Seth Forshee who kept this code alive, and for putting up
with me rewriting substantial portions of what he did to handle more
corner cases, and for his diligent testing and reviewing of my
changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
fs: Call d_automount with the filesystems creds
fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
evm: Translate user/group ids relative to s_user_ns when computing HMAC
dquot: For now explicitly don't support filesystems outside of init_user_ns
quota: Handle quota data stored in s_user_ns in quota_setxquota
quota: Ensure qids map to the filesystem
vfs: Don't create inodes with a uid or gid unknown to the vfs
vfs: Don't modify inodes with a uid or gid unknown to the vfs
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
vfs: Verify acls are valid within superblock's s_user_ns.
userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
fs: Refuse uid/gid changes which don't map into s_user_ns
selinux: Add support for unprivileged mounts from user namespaces
Smack: Handle labels consistently in untrusted mounts
Smack: Add support for unprivileged mounts from user namespaces
fs: Treat foreign mounts as nosuid
fs: Limit file caps to the user namespace of the super block
userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
userns: Remove implicit MNT_NODEV fragility.
...
wait_sb_inodes() currently does a walk of all inodes in the filesystem
to find dirty ones to wait on during sync. This is highly inefficient
and wastes a lot of CPU when there are lots of clean cached inodes that
we don't need to wait on.
To avoid this "all inode" walk, we need to track inodes that are
currently under writeback that we need to wait for. We do this by
adding inodes to a writeback list on the sb when the mapping is first
tagged as having pages under writeback. wait_sb_inodes() can then walk
this list of "inodes under IO" and wait specifically just for the inodes
that the current sync(2) needs to wait for.
Define a couple helpers to add/remove an inode from the writeback list
and call them when the overall mapping is tagged for or cleared from
writeback. Update wait_sb_inodes() to walk only the inodes under
writeback due to the sync.
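The list-based tracking described above can be sketched in userspace C
as follows. This is a minimal single-threaded model: the sketch_* names
are hypothetical, locking is omitted, a per-inode page counter stands in
for the mapping's writeback tag, and "waiting" is simulated by
completing the outstanding pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_inode {
	int pages_under_writeback;
	bool on_wb_list;
	struct sketch_inode *wb_prev, *wb_next;
};

struct sketch_sb {
	struct sketch_inode wb_head;	/* sentinel of a circular list */
};

static void sketch_sb_init(struct sketch_sb *sb)
{
	sb->wb_head.pages_under_writeback = 0;
	sb->wb_head.on_wb_list = false;
	sb->wb_head.wb_prev = sb->wb_head.wb_next = &sb->wb_head;
}

/* Helper: add the inode to the sb writeback list (idempotent). */
static void sketch_mark_inode_writeback(struct sketch_sb *sb,
					struct sketch_inode *inode)
{
	if (inode->on_wb_list)
		return;
	inode->wb_prev = sb->wb_head.wb_prev;
	inode->wb_next = &sb->wb_head;
	sb->wb_head.wb_prev->wb_next = inode;
	sb->wb_head.wb_prev = inode;
	inode->on_wb_list = true;
}

/* Helper: remove the inode from the sb writeback list (idempotent). */
static void sketch_clear_inode_writeback(struct sketch_inode *inode)
{
	if (!inode->on_wb_list)
		return;
	inode->wb_prev->wb_next = inode->wb_next;
	inode->wb_next->wb_prev = inode->wb_prev;
	inode->wb_prev = inode->wb_next = NULL;
	inode->on_wb_list = false;
}

/* The first page under writeback puts the inode on the list. */
static void sketch_start_page_writeback(struct sketch_sb *sb,
					struct sketch_inode *inode)
{
	if (inode->pages_under_writeback++ == 0)
		sketch_mark_inode_writeback(sb, inode);
}

/* The last page to finish takes the inode off the list again. */
static void sketch_end_page_writeback(struct sketch_inode *inode)
{
	assert(inode->pages_under_writeback > 0);
	if (--inode->pages_under_writeback == 0)
		sketch_clear_inode_writeback(inode);
}

/* Walk only the inodes under writeback; returns how many were waited on. */
static int sketch_wait_sb_inodes(struct sketch_sb *sb)
{
	int waited = 0;

	while (sb->wb_head.wb_next != &sb->wb_head) {
		struct sketch_inode *inode = sb->wb_head.wb_next;

		/* Simulated wait: let the outstanding I/O complete. */
		while (inode->pages_under_writeback > 0)
			sketch_end_page_writeback(inode);
		sketch_clear_inode_writeback(inode);
		waited++;
	}
	return waited;
}
```

The cost of sketch_wait_sb_inodes() depends on the number of inodes on
the list, not on the number of cached inodes, which is the property the
change relies on.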
With this change, filesystem sync times are significantly reduced for
filesystems with largely populated inode caches and otherwise no other work to
do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
than 0.1s when the filesystem is fully clean.
Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>