349 Commits

Author SHA1 Message Date
Blagovest Kolenichev
9f5be70fa4 Merge android-4.14-q.154 (b7f5267) into msm-4.14
* refs/heads/tmp-b7f5267:
  usb: gadget: configfs: Fix missing spin_lock_init()
  Linux 4.14.154
  kvm: x86: mmu: Recovery of shattered NX large pages
  kvm: Add helper function for creating VM worker threads
  kvm: mmu: ITLB_MULTIHIT mitigation
  KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
  KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
  KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
  KVM: x86: remove now unneeded hugepage gfn adjustment
  KVM: x86: make FNAME(fetch) and __direct_map more similar
  kvm: mmu: Do not release the page inside mmu_set_spte()
  kvm: Convert kvm_lock to a mutex
  kvm: x86, powerpc: do not allow clearing largepages debugfs entry
  Documentation: Add ITLB_MULTIHIT documentation
  cpu/speculation: Uninline and export CPU mitigations helpers
  x86/cpu: Add Tremont to the cpu vulnerability whitelist
  x86/bugs: Add ITLB_MULTIHIT bug infrastructure
  x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
  x86/tsx: Add config options to set tsx=on|off|auto
  x86/speculation/taa: Add documentation for TSX Async Abort
  x86/tsx: Add "auto" option to the tsx= cmdline parameter
  kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
  x86/speculation/taa: Add sysfs reporting for TSX Async Abort
  x86/speculation/taa: Add mitigation for TSX Async Abort
  x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
  x86/cpu: Add a helper function x86_read_arch_cap_msr()
  x86/msr: Add the IA32_TSX_CTRL MSR
  KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
  drm/i915/cmdparser: Fix jump whitelist clearing
  drm/i915/gen8+: Add RC6 CTX corruption WA
  drm/i915: Lower RM timeout to avoid DSI hard hangs
  drm/i915/cmdparser: Ignore Length operands during command matching
  drm/i915/cmdparser: Add support for backward jumps
  drm/i915/cmdparser: Use explicit goto for error paths
  drm/i915: Add gen9 BCS cmdparsing
  drm/i915: Allow parsing of unsized batches
  drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
  drm/i915: Add support for mandatory cmdparsing
  drm/i915: Remove Master tables from cmdparser
  drm/i915: Disable Secure Batches for gen6+
  drm/i915: Rename gen7 cmdparser tables
  drm/i915: Move engine->needs_cmd_parser to engine->flags
  drm/i915: Don't use GPU relocations prior to cmdparser stalls
  drm/i915: Silence smatch for cmdparser
  drm/i915/cmdparser: Do not check past the cmd length.
  drm/i915/cmdparser: Check reg_table_count before derefencing.
  drm/i915: Prevent writing into a read-only object via a GGTT mmap
  drm/i915/gtt: Disable read-only support under GVT
  drm/i915/gtt: Read-only pages for insert_entries on bdw+
  drm/i915/gtt: Add read only pages to gen8_pte_encode
  net: prevent load/store tearing on sk->sk_stamp
  usbip: Fix free of unallocated memory in vhci tx
  cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  mm/filemap.c: don't initiate writeback if mapping has no dirty pages
  can: flexcan: disable completely the ECC mechanism
  x86/apic/32: Avoid bogus LDR warnings
  x86/apic: Drop logical_smp_processor_id() inline
  x86/apic: Move pending interrupt check code into it's own function
  e1000: fix memory leaks
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  NFSv4: Don't allow a cached open with a revoked delegation
  hv_netvsc: Fix error handling in netvsc_attach()
  net: hisilicon: Fix "Trying to free already-free IRQ"
  fjes: Handle workqueue allocation failure
  scsi: qla2xxx: stop timer in shutdown path
  RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
  USB: ldusb: use unsigned size format specifiers
  USB: Skip endpoints with 0 maxpacket length
  perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
  perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
  usb: dwc3: remove the call trace of USBx_GFLADJ
  usb: gadget: configfs: fix concurrent issue between composite APIs
  usb: gadget: composite: Fix possible double free memory bug
  usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode.
  usb: fsl: Check memory resource before releasing it
  macsec: fix refcnt leak in module exit routine
  bonding: fix unexpected IFF_BONDING bit unset
  ipvs: move old_secure_tcp into struct netns_ipvs
  ipvs: don't ignore errors in case refcounting ip_vs module fails
  scsi: qla2xxx: Initialized mailbox to prevent driver load failure
  scsi: lpfc: Honor module parameter lpfc_use_adisc
  net: openvswitch: free vport unless register_netdevice() succeeds
  RDMA/uverbs: Prevent potential underflow
  scsi: qla2xxx: fixup incorrect usage of host_byte
  net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq
  RDMA/qedr: Fix reported firmware version
  HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring()
  dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
  PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30
  usbip: Implement SG support to vhci-hcd and stub driver
  usbip: stub_rx: fix static checker warning on unnecessary checks
  usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path
  lib/scatterlist: Introduce sgl_alloc() and sgl_free()
  sched/fair: Fix -Wunused-but-set-variable warnings
  sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
  ARM: dts: dra7: Disable USB metastability workaround for USB2
  cpufreq: ti-cpufreq: add missing of_node_put()
  i2c: omap: Trigger bus recovery in lockup case
  ASoC: davinci-mcasp: Fix an error handling path in 'davinci_mcasp_probe()'
  ASoC: davinci: Kill BUG_ON() usage
  ASoC: davinci-mcasp: Handle return value of devm_kasprintf
  ASoC: tlv320dac31xx: mark expected switch fall-through
  mailbox: reset txdone_method TXDONE_BY_POLL if client knows_txdone
  misc: pci_endpoint_test: Fix BUG_ON error during pci_disable_msi()
  PCI: dra7xx: Add shutdown handler to cleanly turn off clocks
  misc: pci_endpoint_test: Prevent some integer overflows
  mtd: spi-nor: cadence-quadspi: add a delay in write sequence
  mtd: spi-nor: enable 4B opcodes for mx66l51235l
  ASoC: tlv320aic31xx: Handle inverted BCLK in non-DSP modes
  mfd: palmas: Assign the right powerhold mask for tps65917
  usb: dwc3: Allow disabling of metastability workaround
  configfs: fix a deadlock in configfs_symlink()
  configfs: provide exclusion between IO and removals
  configfs: new object reprsenting tree fragments
  configfs_register_group() shouldn't be (and isn't) called in rmdirable parts
  configfs: stash the data we need into configfs_buffer at open time
  configfs: Fix bool initialization/comparison
  can: peak_usb: fix slab info leak
  can: mcba_usb: fix use-after-free on disconnect
  can: gs_usb: gs_can_open(): prevent memory leak
  can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak
  can: peak_usb: fix a potential out-of-sync while decoding packets
  can: c_can: c_can_poll(): only read status register after status IRQ
  can: usb_8dev: fix use-after-free on disconnect
  intel_th: pci: Add Jasper Lake PCH support
  intel_th: pci: Add Comet Lake PCH support
  netfilter: ipset: Fix an error code in ip_set_sockfn_get()
  netfilter: nf_tables: Align nft_expr private data to 64-bit
  iio: srf04: fix wrong limitation in distance measuring
  iio: imu: adis16480: make sure provided frequency is positive
  iio: adc: stm32-adc: fix stopping dma
  ceph: add missing check in d_revalidate snapdir handling
  ceph: fix use-after-free in __ceph_remove_cap()
  arm64: Do not mask out PTE_RDONLY in pte_same()
  HID: wacom: generic: Treat serial number and related fields as unsigned
  drm/radeon: fix si_enable_smc_cac() failed issue
  perf tools: Fix time sorting
  tools: gpio: Use !building_out_of_srctree to determine srctree
  dump_stack: avoid the livelock of the dump_lock
  mm, vmstat: hide /proc/pagetypeinfo from normal users
  mm: thp: handle page cache THP correctly in PageTransCompoundMap
  ALSA: hda/ca0132 - Fix possible workqueue stall
  ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series
  ALSA: timer: Fix incorrectly assigned timer instance
  qede: fix NULL pointer deref in __qede_remove()
  NFC: st21nfca: fix double free
  nfc: netlink: fix double device reference drop
  NFC: fdp: fix incorrect free object
  net: usb: qmi_wwan: add support for DW5821e with eSIM support
  net: qualcomm: rmnet: Fix potential UAF when unregistering
  net: fix data-race in neigh_event_send()
  net: ethernet: octeon_mgmt: Account for second possible VLAN header
  ipv4: Fix table id reference in fib_sync_down_addr
  CDC-NCM: handle incomplete transfer of MTU
  bonding: fix state transition issue in link monitoring

Conflicts:
	Documentation/devicetree/bindings/usb/dwc3.txt
	drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
	drivers/usb/dwc3/core.h
	kernel/cpu.c

Change-Id: I81b1613324c238a6e612cf1c6cf2c67ca17b4adc
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2020-02-12 04:35:33 -08:00
Greg Kroah-Hartman
b7f526714d This is the 4.14.154 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3K950ACgkQONu9yGCS
 aT7RzBAAujUlFkoT646z8Zi6najTF9/Bcpkt48h0QbGAlp44OvwzqTv0Vk9cp6hy
 WdCLIvH5SE2MzokMzIoanr4p2JKTBNk1FxCUYdXgGfETNbraplNRQ9yxyWo2G7Ic
 koWhEwQO2u7bbalD+kxbfG6w3yQN1mpTzUFzTS5lsK2IMD4WrNk0V2CNEZAPNBf/
 QBig/z8ixbEHRvp7uFI/v8W/QyimQ1uVxs9nsVsAlgsQA+hUnywRXaYgeYRjEw+s
 x7g3FCxuvjzNEpboijSTXiUqbY33HCnSoRB+GC4IzdDQ6MBJ8blC96+jBpQM2Vhf
 1nHqMB6G4BRVuC4dI/XQ3KxGxsVGEwgzuQags4mMHgXeudVoYQgntjpJlJyu5h5Y
 yQ4NZSRiKDY27AO1tnst2jjQwNDGF8dS4FPB/KpCNGjOlzFb351q34Lqj8fEoCt6
 O3822FFjA5OhnrlqgRwIuf2qN8po+iJWR2Dg3jZXizs/eO2PBZ1h5tUNDB03MQEr
 Y43DeRogWCFoaYTET198qBw6pJGNYbmKR2H0LAlGpyJsGkfsSHQVks+KT5EaiCCL
 96hRGZ5V1vs/hdqAB1rDMG7AVXQC51zOK5aMjvJ0jzEjRCO2fulOHH0nfaQAGhaD
 JhZc8eYouLj1p1+3K7yHzYZSWdzDH11182a7aHA5wd8ZTBonr0A=
 =qwqY
 -----END PGP SIGNATURE-----

Merge 4.14.154 into android-4.14-q

Changes in 4.14.154
	bonding: fix state transition issue in link monitoring
	CDC-NCM: handle incomplete transfer of MTU
	ipv4: Fix table id reference in fib_sync_down_addr
	net: ethernet: octeon_mgmt: Account for second possible VLAN header
	net: fix data-race in neigh_event_send()
	net: qualcomm: rmnet: Fix potential UAF when unregistering
	net: usb: qmi_wwan: add support for DW5821e with eSIM support
	NFC: fdp: fix incorrect free object
	nfc: netlink: fix double device reference drop
	NFC: st21nfca: fix double free
	qede: fix NULL pointer deref in __qede_remove()
	ALSA: timer: Fix incorrectly assigned timer instance
	ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series
	ALSA: hda/ca0132 - Fix possible workqueue stall
	mm: thp: handle page cache THP correctly in PageTransCompoundMap
	mm, vmstat: hide /proc/pagetypeinfo from normal users
	dump_stack: avoid the livelock of the dump_lock
	tools: gpio: Use !building_out_of_srctree to determine srctree
	perf tools: Fix time sorting
	drm/radeon: fix si_enable_smc_cac() failed issue
	HID: wacom: generic: Treat serial number and related fields as unsigned
	arm64: Do not mask out PTE_RDONLY in pte_same()
	ceph: fix use-after-free in __ceph_remove_cap()
	ceph: add missing check in d_revalidate snapdir handling
	iio: adc: stm32-adc: fix stopping dma
	iio: imu: adis16480: make sure provided frequency is positive
	iio: srf04: fix wrong limitation in distance measuring
	netfilter: nf_tables: Align nft_expr private data to 64-bit
	netfilter: ipset: Fix an error code in ip_set_sockfn_get()
	intel_th: pci: Add Comet Lake PCH support
	intel_th: pci: Add Jasper Lake PCH support
	can: usb_8dev: fix use-after-free on disconnect
	can: c_can: c_can_poll(): only read status register after status IRQ
	can: peak_usb: fix a potential out-of-sync while decoding packets
	can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak
	can: gs_usb: gs_can_open(): prevent memory leak
	can: mcba_usb: fix use-after-free on disconnect
	can: peak_usb: fix slab info leak
	configfs: Fix bool initialization/comparison
	configfs: stash the data we need into configfs_buffer at open time
	configfs_register_group() shouldn't be (and isn't) called in rmdirable parts
	configfs: new object reprsenting tree fragments
	configfs: provide exclusion between IO and removals
	configfs: fix a deadlock in configfs_symlink()
	usb: dwc3: Allow disabling of metastability workaround
	mfd: palmas: Assign the right powerhold mask for tps65917
	ASoC: tlv320aic31xx: Handle inverted BCLK in non-DSP modes
	mtd: spi-nor: enable 4B opcodes for mx66l51235l
	mtd: spi-nor: cadence-quadspi: add a delay in write sequence
	misc: pci_endpoint_test: Prevent some integer overflows
	PCI: dra7xx: Add shutdown handler to cleanly turn off clocks
	misc: pci_endpoint_test: Fix BUG_ON error during pci_disable_msi()
	mailbox: reset txdone_method TXDONE_BY_POLL if client knows_txdone
	ASoC: tlv320dac31xx: mark expected switch fall-through
	ASoC: davinci-mcasp: Handle return value of devm_kasprintf
	ASoC: davinci: Kill BUG_ON() usage
	ASoC: davinci-mcasp: Fix an error handling path in 'davinci_mcasp_probe()'
	i2c: omap: Trigger bus recovery in lockup case
	cpufreq: ti-cpufreq: add missing of_node_put()
	ARM: dts: dra7: Disable USB metastability workaround for USB2
	sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
	sched/fair: Fix -Wunused-but-set-variable warnings
	lib/scatterlist: Introduce sgl_alloc() and sgl_free()
	usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path
	usbip: stub_rx: fix static checker warning on unnecessary checks
	usbip: Implement SG support to vhci-hcd and stub driver
	PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30
	dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
	HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring()
	RDMA/qedr: Fix reported firmware version
	net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq
	scsi: qla2xxx: fixup incorrect usage of host_byte
	RDMA/uverbs: Prevent potential underflow
	net: openvswitch: free vport unless register_netdevice() succeeds
	scsi: lpfc: Honor module parameter lpfc_use_adisc
	scsi: qla2xxx: Initialized mailbox to prevent driver load failure
	ipvs: don't ignore errors in case refcounting ip_vs module fails
	ipvs: move old_secure_tcp into struct netns_ipvs
	bonding: fix unexpected IFF_BONDING bit unset
	macsec: fix refcnt leak in module exit routine
	usb: fsl: Check memory resource before releasing it
	usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode.
	usb: gadget: composite: Fix possible double free memory bug
	usb: gadget: configfs: fix concurrent issue between composite APIs
	usb: dwc3: remove the call trace of USBx_GFLADJ
	perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
	perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
	USB: Skip endpoints with 0 maxpacket length
	USB: ldusb: use unsigned size format specifiers
	RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
	scsi: qla2xxx: stop timer in shutdown path
	fjes: Handle workqueue allocation failure
	net: hisilicon: Fix "Trying to free already-free IRQ"
	hv_netvsc: Fix error handling in netvsc_attach()
	NFSv4: Don't allow a cached open with a revoked delegation
	net: ethernet: arc: add the missed clk_disable_unprepare
	igb: Fix constant media auto sense switching when no cable is connected
	e1000: fix memory leaks
	x86/apic: Move pending interrupt check code into it's own function
	x86/apic: Drop logical_smp_processor_id() inline
	x86/apic/32: Avoid bogus LDR warnings
	can: flexcan: disable completely the ECC mechanism
	mm/filemap.c: don't initiate writeback if mapping has no dirty pages
	cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
	usbip: Fix free of unallocated memory in vhci tx
	net: prevent load/store tearing on sk->sk_stamp
	drm/i915/gtt: Add read only pages to gen8_pte_encode
	drm/i915/gtt: Read-only pages for insert_entries on bdw+
	drm/i915/gtt: Disable read-only support under GVT
	drm/i915: Prevent writing into a read-only object via a GGTT mmap
	drm/i915/cmdparser: Check reg_table_count before derefencing.
	drm/i915/cmdparser: Do not check past the cmd length.
	drm/i915: Silence smatch for cmdparser
	drm/i915: Don't use GPU relocations prior to cmdparser stalls
	drm/i915: Move engine->needs_cmd_parser to engine->flags
	drm/i915: Rename gen7 cmdparser tables
	drm/i915: Disable Secure Batches for gen6+
	drm/i915: Remove Master tables from cmdparser
	drm/i915: Add support for mandatory cmdparsing
	drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
	drm/i915: Allow parsing of unsized batches
	drm/i915: Add gen9 BCS cmdparsing
	drm/i915/cmdparser: Use explicit goto for error paths
	drm/i915/cmdparser: Add support for backward jumps
	drm/i915/cmdparser: Ignore Length operands during command matching
	drm/i915: Lower RM timeout to avoid DSI hard hangs
	drm/i915/gen8+: Add RC6 CTX corruption WA
	drm/i915/cmdparser: Fix jump whitelist clearing
	KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
	x86/msr: Add the IA32_TSX_CTRL MSR
	x86/cpu: Add a helper function x86_read_arch_cap_msr()
	x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
	x86/speculation/taa: Add mitigation for TSX Async Abort
	x86/speculation/taa: Add sysfs reporting for TSX Async Abort
	kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
	x86/tsx: Add "auto" option to the tsx= cmdline parameter
	x86/speculation/taa: Add documentation for TSX Async Abort
	x86/tsx: Add config options to set tsx=on|off|auto
	x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
	x86/bugs: Add ITLB_MULTIHIT bug infrastructure
	x86/cpu: Add Tremont to the cpu vulnerability whitelist
	cpu/speculation: Uninline and export CPU mitigations helpers
	Documentation: Add ITLB_MULTIHIT documentation
	kvm: x86, powerpc: do not allow clearing largepages debugfs entry
	kvm: Convert kvm_lock to a mutex
	kvm: mmu: Do not release the page inside mmu_set_spte()
	KVM: x86: make FNAME(fetch) and __direct_map more similar
	KVM: x86: remove now unneeded hugepage gfn adjustment
	KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
	KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
	KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
	kvm: mmu: ITLB_MULTIHIT mitigation
	kvm: Add helper function for creating VM worker threads
	kvm: x86: mmu: Recovery of shattered NX large pages
	Linux 4.14.154

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-11-14 14:25:57 +08:00
Tejun Heo
d527ab46d1 cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
commit 65de03e251382306a4575b1779c57c87889eee49 upstream.

cgroup writeback tries to refresh the associated wb immediately if the
current wb is dead.  This is to avoid keeping issuing IOs on the stale
wb after memcg - blkcg association has changed (ie. when blkcg got
disabled / enabled higher up in the hierarchy).

Unfortunately, the logic gets triggered spuriously on inodes which are
associated with dead cgroups.  When the logic is triggered on dead
cgroups, the attempt fails only after doing quite a bit of work
allocating and initializing a new wb.

While c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping
has no dirty pages") alleviated the issue significantly as it now only
triggers when the inode has dirty pages.  However, the condition can
still be triggered before the inode is switched to a different cgroup
and the logic simply doesn't make sense.

Skip the immediate switching if the associated memcg is dying.

This is a simplified version of the following two patches:

 * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
 * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz

Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
Acked-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12 19:18:47 +01:00
Sahitya Tummala
0f5052ff8d writeback: Fix NULL pointer dereference issue with inode->i_wb
There is a potential race between evict() and __mark_inode_dirty()
observed with EXT4 FS. The ext4/jbd2 context is marking the inode as
dirty while at the same time evict() is happening on another core.
Due to this race, __mark_inode_dirty() is adding the inode to
bdi_writeback IO dirty list, after evict() path has removed this
inode from the writeback list. This results NULL pointer dereference
issue when accessing inode->i_wb from writeback thread as below -

locked_inode_to_wb_and_lock_list+0x2c/0x2a0
inode_to_wb_and_lock_list+0x24/0x30
writeback_sb_inodes+0x2cc/0x444
__writeback_inodes_wb+0x78/0xbc
wb_writeback+0x1c8/0x3d0
wb_workfn+0x1a8/0x4d4
process_one_work+0x1c0/0x3d4
worker_thread+0x224/0x344
kthread+0x120/0x130
ret_from_fork+0x10/0x18

Fix this by checking for the inode->i_state I_WILL_FREE or I_FREEING
before and after adding to the writeback list. If it is observed to be
in freeing path before adding, then simply return. If it is after adding
to the writeback list, then remove this inode from the writeback list
in __mark_inode_dirty() context itself.

Change-Id: I1e5f15d81dae01f0dd7a108b1861adc83105ed7b
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
2019-10-29 13:34:02 +05:30
Greg Kroah-Hartman
1391d3b9b2 This is the 4.14.135 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1BJxsACgkQONu9yGCS
 aT4wBxAAymuWVXtmeWFQSFNji/RAJcHBAOvydIRMr7vwCXpojuRerNolo7WibM/B
 Mgx2OISn0d8rg98Cc3wiM6WUN9AeHr3lSWXORg3iBr0zP+ZO5Vs0Y2w9gueEJS+i
 egMvi2KZyS3Esrfmxv62pJ9DIVqyPVlvzN/Y79BARcwIeZOt+puycR5XV3WROzX9
 Wy2JBz5f56m9qzPGKXGRLlvq7LghZ5EbyFoIb/fj9K6pFdVBrpSEOeocCQos9IEz
 0+1TiWAkqOGLGZWJ3CFW/6Nbn1JO3hZpIgqxVczZXR+4UVhR+yniHUzZ20g89DzE
 mmprjKGv/8/7pXyXtGhjXuaZN5r1ldUje5SZf1X7SzxLuABSKIHykYJjKUQY2O3b
 8tpPULGA77V7Ww4TtyRLeOVPqaVslWFgLP6snyileSdoxfISebo2KptQn0pmuFX2
 Y0ePPot/aHHXmhrn5mAY9UZO9etqko8LjvVHDOsQQ99GJJ1BAz73w+wkKDtHXGuo
 iqUlSSW2YpThnAkufUlyhk10y6itGmy0P7GSrw8PCd9As2/LAz6c9+8+NPp/2P2Z
 Ffl2q7eUCqb0HixAnq5KqcPDSVdyqVtQ7XeN3lAEWVGmwpiu2xyuZgpQyT5FRqOZ
 mLYHZJF7FEZOZo+hkbH4O6j3umJ0QFJakVwrEiQ/ha0yLZpS3OM=
 =u0hP
 -----END PGP SIGNATURE-----

Merge 4.14.135 into android-4.14-q

Changes in 4.14.135
	MIPS: ath79: fix ar933x uart parity mode
	MIPS: fix build on non-linux hosts
	arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
	scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported
	dmaengine: imx-sdma: fix use-after-free on probe error path
	wil6210: fix potential out-of-bounds read
	ath10k: Do not send probe response template for mesh
	ath9k: Check for errors when reading SREV register
	ath6kl: add some bounds checking
	ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
	batman-adv: fix for leaked TVLV handler.
	media: dvb: usb: fix use after free in dvb_usb_device_exit
	media: spi: IR LED: add missing of table registration
	crypto: talitos - fix skcipher failure due to wrong output IV
	media: marvell-ccic: fix DMA s/g desc number calculation
	media: vpss: fix a potential NULL pointer dereference
	media: media_device_enum_links32: clean a reserved field
	net: stmmac: dwmac1000: Clear unused address entries
	net: stmmac: dwmac4/5: Clear unused address entries
	qed: Set the doorbell address correctly
	signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
	af_key: fix leaks in key_pol_get_resp and dump_sp.
	xfrm: Fix xfrm sel prefix length validation
	fscrypt: clean up some BUG_ON()s in block encryption/decryption
	media: mc-device.c: don't memset __user pointer contents
	media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails.
	net: phy: Check against net_device being NULL
	crypto: talitos - properly handle split ICV.
	crypto: talitos - Align SEC1 accesses to 32 bits boundaries.
	tua6100: Avoid build warnings.
	locking/lockdep: Fix merging of hlocks with non-zero references
	media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
	cpupower : frequency-set -r option misses the last cpu in related cpu list
	net: stmmac: dwmac4: fix flow control issue
	net: fec: Do not use netdev messages too early
	net: axienet: Fix race condition causing TX hang
	s390/qdio: handle PENDING state for QEBSM devices
	RAS/CEC: Fix pfn insertion
	net: sfp: add mutex to prevent concurrent state checks
	ipset: Fix memory accounting for hash types on resize
	perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode
	perf test 6: Fix missing kvm module load for s390
	media: fdp1: Support M3N and E3 platforms
	iommu: Fix a leak in iommu_insert_resv_region
	gpio: omap: fix lack of irqstatus_raw0 for OMAP4
	gpio: omap: ensure irq is enabled before wakeup
	regmap: fix bulk writes on paged registers
	bpf: silence warning messages in core
	rcu: Force inlining of rcu_read_lock()
	x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS
	blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
	xfrm: fix sa selector validation
	sched/core: Add __sched tag for io_schedule()
	x86/atomic: Fix smp_mb__{before,after}_atomic()
	perf evsel: Make perf_evsel__name() accept a NULL argument
	vhost_net: disable zerocopy by default
	ipoib: correcly show a VF hardware address
	EDAC/sysfs: Fix memory leak when creating a csrow object
	ipsec: select crypto ciphers for xfrm_algo
	ipvs: defer hook registration to avoid leaks
	media: s5p-mfc: Make additional clocks optional
	media: i2c: fix warning same module names
	ntp: Limit TAI-UTC offset
	timer_list: Guard procfs specific code
	acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
	media: coda: fix mpeg2 sequence number handling
	media: coda: fix last buffer handling in V4L2_ENC_CMD_STOP
	media: coda: increment sequence offset for the last returned frame
	media: vimc: cap: check v4l2_fill_pixfmt return value
	media: hdpvr: fix locking and a missing msleep
	rtlwifi: rtl8192cu: fix error handle when usb probe failed
	mt7601u: do not schedule rx_tasklet when the device has been disconnected
	x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
	mt7601u: fix possible memory leak when the device is disconnected
	ipvs: fix tinfo memory leak in start_sync_thread
	ath10k: add missing error handling
	ath10k: fix PCIE device wake up failed
	perf tools: Increase MAX_NR_CPUS and MAX_CACHES
	libata: don't request sense data on !ZAC ATA devices
	clocksource/drivers/exynos_mct: Increase priority over ARM arch timer
	rslib: Fix decoding of shortened codes
	rslib: Fix handling of of caller provided syndrome
	ixgbe: Check DDM existence in transceiver before access
	crypto: serpent - mark __serpent_setkey_sbox noinline
	crypto: asymmetric_keys - select CRYPTO_HASH where needed
	EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
	bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
	net: hns3: fix a -Wformat-nonliteral compile warning
	net: hns3: add some error checking in hclge_tm module
	ath10k: destroy sdio workqueue while remove sdio module
	iwlwifi: mvm: Drop large non sta frames
	perf stat: Make metric event lookup more robust
	net: usb: asix: init MAC address buffers
	gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants
	Bluetooth: hci_bcsp: Fix memory leak in rx_skb
	Bluetooth: 6lowpan: search for destination address in all peers
	Bluetooth: Check state in l2cap_disconnect_rsp
	gtp: add missing gtp_encap_disable_sock() in gtp_encap_enable()
	Bluetooth: validate BLE connection interval updates
	gtp: fix suspicious RCU usage
	gtp: fix Illegal context switch in RCU read-side critical section.
	gtp: fix use-after-free in gtp_encap_destroy()
	gtp: fix use-after-free in gtp_newlink()
	net: mvmdio: defer probe of orion-mdio if a clock is not ready
	iavf: fix dereference of null rx_buffer pointer
	floppy: fix div-by-zero in setup_format_params
	floppy: fix out-of-bounds read in next_valid_format
	floppy: fix invalid pointer dereference in drive_name
	floppy: fix out-of-bounds read in copy_buffer
	xen: let alloc_xenballooned_pages() fail if not enough memory free
	scsi: NCR5380: Reduce goto statements in NCR5380_select()
	scsi: NCR5380: Always re-enable reselection interrupt
	Revert "scsi: ncr5380: Increase register polling limit"
	scsi: core: Fix race on creating sense cache
	scsi: megaraid_sas: Fix calculation of target ID
	scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
	scsi: mac_scsi: Fix pseudo DMA implementation, take 2
	crypto: ghash - fix unaligned memory access in ghash_setkey()
	crypto: ccp - Validate the the error value used to index error messages
	crypto: arm64/sha1-ce - correct digest for empty data in finup
	crypto: arm64/sha2-ce - correct digest for empty data in finup
	crypto: chacha20poly1305 - fix atomic sleep when using async algorithm
	crypto: ccp - memset structure fields to zero before reuse
	crypto: ccp/gcm - use const time tag comparison.
	crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
	Input: gtco - bounds check collection indent level
	Input: alps - don't handle ALPS cs19 trackpoint-only device
	Input: synaptics - whitelist Lenovo T580 SMBus intertouch
	Input: alps - fix a mismatch between a condition check and its comment
	regulator: s2mps11: Fix buck7 and buck8 wrong voltages
	arm64: tegra: Update Jetson TX1 GPU regulator timings
	iwlwifi: pcie: don't service an interrupt that was masked
	iwlwifi: pcie: fix ALIVE interrupt handling for gen2 devices w/o MSI-X
	NFSv4: Handle the special Linux file open access mode
	pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
	lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
	ASoC: dapm: Adapt for debugfs API change
	ALSA: seq: Break too long mutex context in the write loop
	ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
	media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
	media: coda: Remove unbalanced and unneeded mutex unlock
	KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
	arm64: tegra: Fix AGIC register range
	fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
	drm/nouveau/i2c: Enable i2c pads & busses during preinit
	padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
	dm zoned: fix zone state management race
	xen/events: fix binding user event channels to cpus
	9p/xen: Add cleanup path in p9_trans_xen_init
	9p/virtio: Add cleanup path in p9_virtio_init
	x86/boot: Fix memory leak in default_get_smp_config()
	perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs
	perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs
	intel_th: pci: Add Ice Lake NNPI support
	PCI: Do not poll for PME if the device is in D3cold
	Btrfs: fix data loss after inode eviction, renaming it, and fsync it
	Btrfs: fix fsync not persisting dentry deletions due to inode evictions
	Btrfs: add missing inode version, ctime and mtime updates when punching hole
	HID: wacom: generic: only switch the mode on devices with LEDs
	HID: wacom: correct touch resolution x/y typo
	libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
	coda: pass the host file in vma->vm_file on mmap
	gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM
	PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
	crypto: caam - limit output IV to CBC to work around CTR mode DMA issue
	parisc: Ensure userspace privilege for ptraced processes in regset functions
	parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
	powerpc/32s: fix suspend/resume when IBATs 4-7 are used
	powerpc/watchpoint: Restore NV GPRs while returning from exception
	eCryptfs: fix a couple type promotion bugs
	intel_th: msu: Fix single mode with disabled IOMMU
	Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug
	usb: Handle USB3 remote wakeup for LPM enabled devices correctly
	net: mvmdio: allow up to four clocks to be specified for orion-mdio
	dt-bindings: allow up to four clocks for orion-mdio
	dm bufio: fix deadlock with loop device
	compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
	compiler.h: Add read_word_at_a_time() function.
	lib/strscpy: Shut up KASAN false-positives in strscpy()
	bnx2x: Prevent load reordering in tx completion processing
	bnx2x: Prevent ptp_task to be rescheduled indefinitely
	caif-hsi: fix possible deadlock in cfhsi_exit_module()
	igmp: fix memory leak in igmpv3_del_delrec()
	ipv4: don't set IPv6 only flags to IPv4 addresses
	net: bcmgenet: use promisc for unsupported filters
	net: dsa: mv88e6xxx: wait after reset deactivation
	net: neigh: fix multiple neigh timer scheduling
	net: openvswitch: fix csum updates for MPLS actions
	nfc: fix potential illegal memory access
	rxrpc: Fix send on a connected, but unbound socket
	sky2: Disable MSI on ASUS P6T
	vrf: make sure skb->data contains ip header to make routing
	macsec: fix use-after-free of skb during RX
	macsec: fix checksumming after decryption
	netrom: fix a memory leak in nr_rx_frame()
	netrom: hold sock when setting skb->destructor
	bonding: validate ip header before check IPPROTO_IGMP
	net: make skb_dst_force return true when dst is refcounted
	tcp: fix tcp_set_congestion_control() use from bpf hook
	tcp: Reset bytes_acked and bytes_received when disconnecting
	net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
	net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
	net: bridge: stp: don't cache eth dest pointer before skb pull
	dma-buf: balance refcount inbalance
	dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
	MIPS: lb60: Fix pin mappings
	ext4: don't allow any modifications to an immutable file
	ext4: enforce the immutable flag on open files
	mm: add filemap_fdatawait_range_keep_errors()
	jbd2: introduce jbd2_inode dirty range scoping
	ext4: use jbd2_inode dirty range scoping
	ext4: allow directory holes
	mm: vmscan: scan anonymous pages on file refaults
	perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
	NFSv4: Fix open create exclusive when the server reboots
	nfsd: increase DRC cache limit
	nfsd: give out fewer session slots as limit approaches
	nfsd: fix performance-limiting session calculation
	nfsd: Fix overflow causing non-working mounts on 1 TB machines
	hvsock: fix epollout hang from race condition
	drm/panel: simple: Fix panel_simple_dsi_probe
	usb: core: hub: Disable hub-initiated U1/U2
	tty: max310x: Fix invalid baudrate divisors calculator
	pinctrl: rockchip: fix leaked of_node references
	tty: serial: cpm_uart - fix init when SMC is relocated
	drm/edid: Fix a missing-check bug in drm_load_edid_firmware()
	PCI: Return error if cannot probe VF
	drm/bridge: tc358767: read display_props in get_modes()
	drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz
	drm/crc-debugfs: User irqsafe spinlock in drm_crtc_add_crc_entry
	memstick: Fix error cleanup path of memstick_init
	tty/serial: digicolor: Fix digicolor-usart already registered warning
	tty: serial: msm_serial: avoid system lockup condition
	serial: 8250: Fix TX interrupt handling condition
	drm/virtio: Add memory barriers for capset cache.
	phy: renesas: rcar-gen2: Fix memory leak at error paths
	powerpc/pseries/mobility: prevent cpu hotplug during DT update
	drm/rockchip: Properly adjust to a true clock in adjusted_mode
	tty: serial_core: Set port active bit in uart_port_activate
	usb: gadget: Zero ffs_io_data
	powerpc/pci/of: Fix OF flags parsing for 64bit BARs
	drm/msm: Depopulate platform on probe failure
	serial: mctrl_gpio: Check if GPIO property exisits before requesting it
	PCI: sysfs: Ignore lockdep for remove attribute
	kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS
	PCI: xilinx-nwl: Fix Multi MSI data programming
	iio: iio-utils: Fix possible incorrect mask calculation
	powerpc/xmon: Fix disabling tracing while in xmon
	recordmcount: Fix spurious mcount entries on powerpc
	mfd: core: Set fwnode for created devices
	mfd: arizona: Fix undefined behavior
	mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
	um: Silence lockdep complaint about mmap_sem
	powerpc/4xx/uic: clear pending interrupt after irq type/pol change
	RDMA/i40iw: Set queue pair state when being queried
	serial: sh-sci: Terminate TX DMA during buffer flushing
	serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
	kallsyms: exclude kasan local symbols on s390
	perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning
	perf session: Fix potential NULL pointer dereference found by the smatch tool
	perf annotate: Fix dereferencing freed memory found by the smatch tool
	RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
	PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
	powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
	f2fs: avoid out-of-range memory access
	mailbox: handle failed named mailbox channel request
	powerpc/eeh: Handle hugepages in ioremap space
	block/bio-integrity: fix a memory leak bug
	sh: prevent warnings when using iounmap
	mm/kmemleak.c: fix check for softirq context
	9p: pass the correct prototype to read_cache_page
	mm/gup.c: mark undo_dev_pagemap as __maybe_unused
	mm/gup.c: remove some BUG_ONs from get_gate_page()
	mm/mmu_notifier: use hlist_add_head_rcu()
	locking/lockdep: Fix lock used or unused stats error
	locking/lockdep: Hide unused 'class' variable
	drm/crc: Only report a single overflow when a CRC fd is opened
	drm/crc-debugfs: Also sprinkle irqrestore over early exits
	usb: wusbcore: fix unbalanced get/put cluster_id
	usb: pci-quirks: Correct AMD PLL quirk detection
	KVM: nVMX: do not use dangling shadow VMCS after guest reset
	btrfs: inode: Don't compress if NODATASUM or NODATACOW set
	x86/sysfb_efi: Add quirks for some devices with swapped width and height
	x86/speculation/mds: Apply more accurate check on hypervisor platform
	binder: prevent transactions to context manager from its own process.
	fpga-manager: altera-ps-spi: Fix build error
	hpet: Fix division by zero in hpet_time_div()
	ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
	ALSA: hda - Add a conexant codec entry to let mute led work
	powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
	powerpc/tm: Fix oops on sigreturn on systems without TM
	access: avoid the RCU grace period for the temporary subjective credentials
	Linux 4.14.135

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-31 08:11:10 +02:00
Tejun Heo
fa9900ef8f blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
[ Upstream commit 6631142229005e1b1c311a09efe9fb3cfdac8559 ]

wbc_account_io() collects information on cgroup ownership of writeback
pages to determine which cgroup should own the inode.  Pages can stay
associated with dead memcgs but we want to avoid attributing IOs to
dead blkcgs as much as possible as the association is likely to be
stale.  However, currently, pages associated with dead memcgs
contribute to the accounting delaying and/or confusing the
arbitration.

Fix it by ignoring pages associated with dead memcgs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:28:25 +02:00
Greg Kroah-Hartman
817de622e8 This is the 4.14.121 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzkLE0ACgkQONu9yGCS
 aT4ZkhAAoiWU5PcwwGtVonrlrtUlwFF14ZyDOMh1hkK7EWHWE+8r3In4BnUz6nW/
 wA5IZvIClDPn0go44uhGdZG078q7EqBnY2nhTEseG7XYTWSkIOaMsbcW8f5NFkCC
 Z9x7KKIIDRGt/uVNKXgIk5nqmFP7ycNgxUfq3bxMkproFLgeHmFihG43YC0O62b2
 nAF/q8OVpONqU9zPGwdVoBY+LQIfhsJi04Raoexr4+UFkvoUZF5zDKl6QZVPCXXT
 ETi7CXqntfFt92S6Y4rQfZe883oYFfWzi7GFhNL/oU4TMYDG+J8/PBS4rG3nosSp
 Lk81SCmTkAaOhG0rBvdkZFthHibGk3+kKuGWehvAhb5qFEJx+znsbwTVWIPTchAc
 axxfHOpW1X2rfrPnH/hkHb5unuJTfolquBmmy2D1Glv46LvI19rn1xgHtyGlb5dt
 84Gh8Bew372LkUeG7+CCsCKOuMu/8YuvAZ3DMntwGPo7GAnC052MqcpdyV+pj78z
 2y7mO8g9BVizaf5NkoZrf58KuSZDTLf1TfTRKHQVvTuxhzrnt/UIUF/BQmY216kd
 pEFp1Qq3zAwTaQgCV6s1ZWGHVidFIPQo7xtFND7MIZQYaZfFZivYS3AVdlox1KGd
 k2Rsb/Ub2R/KRrfMdjgIkNbEzauOS9miQTjwMr7zt2AsZJFXDwQ=
 =CsUM
 -----END PGP SIGNATURE-----

Merge 4.14.121 into android-4.14-q

Changes in 4.14.121
	net: core: another layer of lists, around PF_MEMALLOC skb handling
	locking/rwsem: Prevent decrement of reader count before increment
	PCI: hv: Fix a memory leak in hv_eject_device_work()
	PCI: hv: Add hv_pci_remove_slots() when we unload the driver
	PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
	x86/speculation/mds: Revert CPU buffer clear on double fault exit
	x86/speculation/mds: Improve CPU buffer clear documentation
	objtool: Fix function fallthrough detection
	ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
	ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
	ARM: exynos: Fix a leaked reference by adding missing of_node_put
	power: supply: axp288_charger: Fix unchecked return value
	arm64: compat: Reduce address limit
	arm64: Clear OSDLR_EL1 on CPU boot
	arm64: Save and restore OSDLR_EL1 across suspend/resume
	sched/x86: Save [ER]FLAGS on context switch
	crypto: chacha20poly1305 - set cra_name correctly
	crypto: vmx - fix copy-paste error in CTR mode
	crypto: skcipher - don't WARN on unprocessed data after slow walk step
	crypto: crct10dif-generic - fix use via crypto_shash_digest()
	crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
	crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
	crypto: rockchip - update IV buffer to contain the next IV
	crypto: arm/aes-neonbs - don't access already-freed walk.iv
	ALSA: usb-audio: Fix a memory leak bug
	ALSA: hda/hdmi - Read the pin sense from register when repolling
	ALSA: hda/hdmi - Consider eld_valid when reporting jack event
	ALSA: hda/realtek - EAPD turn on later
	ASoC: max98090: Fix restore of DAPM Muxes
	ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
	bpf, arm64: remove prefetch insn in xadd mapping
	mm/mincore.c: make mincore() more conservative
	ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
	userfaultfd: use RCU to free the task struct when fork fails
	mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
	mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
	mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
	tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
	tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
	jbd2: check superblock mapped prior to committing
	ext4: make sanity check in mballoc more strict
	ext4: ignore e_value_offs for xattrs with value-in-ea-inode
	ext4: avoid drop reference to iloc.bh twice
	Btrfs: do not start a transaction during fiemap
	Btrfs: do not start a transaction at iterate_extent_inodes()
	bcache: fix a race between cache register and cacheset unregister
	bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
	ext4: fix use-after-free race with debug_want_extra_isize
	ext4: actually request zeroing of inode table after grow
	ext4: fix ext4_show_options for file systems w/o journal
	ipmi:ssif: compare block number correctly for multi-part return messages
	crypto: arm64/aes-neonbs - don't access already-freed walk.iv
	crypto: salsa20 - don't access already-freed walk.iv
	crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
	fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")
	fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
	ext4: zero out the unused memory region in the extent tree block
	ext4: fix data corruption caused by overlapping unaligned and aligned IO
	ext4: fix use-after-free in dx_release()
	ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
	KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
	iov_iter: optimize page_copy_sane()
	ext4: fix compile error when using BUFFER_TRACE
	Linux 4.14.121

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-21 19:05:48 +02:00
Jiufei Xue
f658822cd3 fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream.

synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu().  Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.

  VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
    evict+0xb3/0x180
    iput+0x1b0/0x230
    inode_switch_wbs_work_fn+0x3c0/0x6a0
    worker_thread+0x4e/0x490
    ? process_one_work+0x410/0x410
    kthread+0xe6/0x100
    ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish.  And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.

Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 18:50:20 +02:00
Greg Kroah-Hartman
9ba09a2171 This is the 4.14.105 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx+qpsACgkQONu9yGCS
 aT6+Aw//WiHXgsyk+jnwuOzJ0rvx38ky6l7RRm6uZ5r2M+aYQJ12FEc+AVomyu6T
 lA7ciLeIWNQFO0xQJ5Tg8LMPumL4JbA/EVElPu5h5yHMXySkQuC6WLXnH083d+dZ
 QbXEk6O9xuIxbk2sy58BmQnmqdXipTnpNIXpT/137Rlz1/jzxyv8PpsihitFI9/C
 3HrMHgSodKEOnZbWdruYv30Ac6nMrWolGmGaQiRszE7FfoFLyGl3KEX7kp6sMAKU
 P4L+9qUDswRVs4Zfa07XiNlXAmZUYDzZPkQVstsgKOmXNHtBrSLE94pYllsHqW8C
 GDbtW5+ZpUm89LObhlZWPcxLXQkOUJUOb2SJChtaPUmOT0rEoExTB0bH1SADQM5d
 Mb3jLGD9CVrCjgxB4J6871xeCZPIL1XIXEKs0lDz7YDOSF7WTYXTrQOAiA+PZLlR
 VcD7WO7Z25XYP15WcU4ypzswsLPGro3QQi/nzk9qyh/NQt65LlpW/USZ3hiyp/Sh
 u4Wh2rrvAWyrOi01ShRmn87S9X/IuzB9cuzthxM7rWrF+/uj+vHOQp+/BAEFkkub
 njis6oFbkB2xfykRA99w1tLe6to3qNcWFtJuOBxjVFdbB24pqIWYmj4dwyE0TQCD
 7QLfDCXx3vBDAg30Qbxx4Bh2w0ruggtHWa+D24Qz4YehJv1ppNY=
 =Ofi4
 -----END PGP SIGNATURE-----

Merge 4.14.105 into android-4.14

Changes in 4.14.105
	Revert "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"
	Revert "loop: Get rid of loop_index_mutex"
	Revert "loop: Fold __loop_release into loop_release"
	net: stmmac: Fix reception of Broadcom switches tags
	net: stmmac: Disable ACS Feature for GMAC >= 4
	scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
	drm/msm: Unblock writer if reader closes file
	ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
	ALSA: compress: prevent potential divide by zero bugs
	ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
	clk: vc5: Abort clock configuration without upstream clock
	thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
	usb: dwc3: gadget: synchronize_irq dwc irq in suspend
	usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
	usb: gadget: Potential NULL dereference on allocation error
	genirq: Make sure the initial affinity is not empty
	ASoC: dapm: change snprintf to scnprintf for possible overflow
	ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
	selftests: seccomp: use LDLIBS instead of LDFLAGS
	selftests: gpio-mockup-chardev: Check asprintf() for error
	ARC: fix __ffs return value to avoid build warnings
	drivers: thermal: int340x_thermal: Fix sysfs race condition
	staging: rtl8723bs: Fix build error with Clang when inlining is disabled
	mac80211: fix miscounting of ttl-dropped frames
	sched/wait: Fix rcuwait_wake_up() ordering
	futex: Fix (possible) missed wakeup
	locking/rwsem: Fix (possible) missed wakeup
	drm/amd/powerplay: OD setting fix on Vega10
	serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
	staging: android: ion: Support cpu access during dma_buf_detach
	direct-io: allow direct writes to empty inodes
	writeback: synchronize sync(2) against cgroup writeback membership switches
	scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
	net: altera_tse: fix connect_local_phy error path
	hv_netvsc: Fix ethtool change hash key error
	net: usb: asix: ax88772_bind return error when hw_reset fail
	net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
	ibmveth: Do not process frames after calling napi_reschedule
	mac80211: don't initiate TDLS connection if station is not associated to AP
	mac80211: Add attribute aligned(2) to struct 'action'
	cfg80211: extend range deviation for DMG
	svm: Fix AVIC incomplete IPI emulation
	KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
	powerpc: Always initialize input array when calling epapr_hypercall()
	mmc: spi: Fix card detection during probe
	mmc: tmio_mmc_core: don't claim spurious interrupts
	mmc: tmio: fix access width of Block Count Register
	mmc: sdhci-esdhc-imx: correct the fix of ERR004536
	mm: enforce min addr even if capable() in expand_downwards()
	MIPS: fix truncation in __cmpxchg_small for short values
	MIPS: eBPF: Fix icache flush end address
	x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
	Linux 4.14.105

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-05 18:04:21 +01:00
Tejun Heo
494c4399ef writeback: synchronize sync(2) against cgroup writeback membership switches
[ Upstream commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 ]

sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-05 17:58:01 +01:00
Greg Kroah-Hartman
04f740d4da This is the 4.14.41 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlr753gACgkQONu9yGCS
 aT7p/Q//TIC9EKe21E2Lb1Kh4lL5SDjmwe/rkA3PxiqxbkXfUDBehMCfDk4YVNVG
 TlH1TXOubzpS/8cZJPRFHEkrYXPKIA3+hKlAvJukUJCBQqmW1ILEAX5m7jrSmf+B
 tLe/r0ijOtlfB1xQdUs5RxXGIndw0gMGhpo/QTXPAC0hGh0Ykd8v2s4YAjxOvdKw
 z4DaUKtZGEPBWFVK/Bx1Fv3iAmJMt2yerERUqz8MVegYXJt+2RUGoJtsxHuvOk1p
 9q0lzHBWYihQVt1tJ0es/8cB7WsYt8txnVmeN907sryUhDjvTWIxQJb5jEV0gxxK
 AL89PHy4Hfki6l6r+tqYi92frFda8aLfsaSseOhlmqsv0MlwngW2dx3UbjaYd4If
 IQA6n0hWHuxUvjrjsPpsMAa4lvTW+/kFilb0mD6Vixy3ru+/RelKnuawJm6kbMNu
 Cb8QSVSJrhvC/UZLvwO7a3viJdKoI5B9pTh5FTKcY5wUPI1k01pg3WlWNxmnv4ZJ
 LPImR06aoJYhvbutf94AvxbCOt/au8sY4s/yk9oHgvGUEIccrGYf3BwX6ciWRt4b
 r4ZN92C9ZuD+u/ATFgi/akngtjjixw5YrZ20aX86dYcBZ25hYOiIMoc482tYQ12Z
 1vqyvKg9o1oMypG9orF09PWstbNRu3ihGATKdXL9lfAhDklOTKc=
 =zWTK
 -----END PGP SIGNATURE-----

Merge 4.14.41 into android-4.14

Changes in 4.14.41
	ipvs: fix rtnl_lock lockups caused by start_sync_thread
	netfilter: ebtables: don't attempt to allocate 0-sized compat array
	kcm: Call strp_stop before strp_done in kcm_attach
	crypto: af_alg - fix possible uninit-value in alg_bind()
	netlink: fix uninit-value in netlink_sendmsg
	net: fix rtnh_ok()
	net: initialize skb->peeked when cloning
	net: fix uninit-value in __hw_addr_add_ex()
	dccp: initialize ireq->ir_mark
	ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
	soreuseport: initialise timewait reuseport field
	inetpeer: fix uninit-value in inet_getpeer
	memcg: fix per_node_info cleanup
	perf: Remove superfluous allocation error check
	tcp: fix TCP_REPAIR_QUEUE bound checking
	bdi: wake up concurrent wb_shutdown() callers.
	bdi: Fix oops in wb_workfn()
	KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
	KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
	arm64: Add work around for Arm Cortex-A55 Erratum 1024718
	compat: fix 4-byte infoleak via uninitialized struct field
	gpioib: do not free unrequested descriptors
	gpio: fix aspeed_gpio unmask irq
	gpio: fix error path in lineevent_create
	rfkill: gpio: fix memory leak in probe error path
	libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs
	dm integrity: use kvfree for kvmalloc'd memory
	tracing: Fix regex_match_front() to not over compare the test string
	z3fold: fix reclaim lock-ups
	mm: sections are not offlined during memory hotremove
	mm, oom: fix concurrent munlock and oom reaper unmap, v3
	ceph: fix rsize/wsize capping in ceph_direct_read_write()
	can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg()
	can: hi311x: Acquire SPI lock on ->do_get_berr_counter
	can: hi311x: Work around TX complete interrupt erratum
	drm/vc4: Fix scaling of uni-planar formats
	drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log
	drm/nouveau: Fix deadlock in nv50_mstm_register_connector()
	drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear()
	drm/atomic: Clean private obj old_state/new_state in drm_atomic_state_default_clear()
	net: atm: Fix potential Spectre v1
	atm: zatm: Fix potential Spectre v1
	PCI / PM: Always check PME wakeup capability for runtime wakeup support
	PCI / PM: Check device_may_wakeup() in pci_enable_wake()
	cpufreq: schedutil: Avoid using invalid next_freq
	Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
	Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table
	Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
	thermal: exynos: Reading temperature makes sense only when TMU is turned on
	thermal: exynos: Propagate error value from tmu_read()
	nvme: add quirk to force medium priority for SQ creation
	smb3: directory sync should not return an error
	sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
	tracing/uprobe_event: Fix strncpy corner case
	perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
	perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
	perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
	perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
	perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
	KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
	KVM: x86: remove APIC Timer periodic/oneshot spikes
	Linux 4.14.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-16 11:40:03 +02:00
Jan Kara
683b4520d0 bdi: Fix oops in wb_workfn()
commit b8b784958eccbf8f51ebeee65282ca3fd59ea391 upstream.

Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

	mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

	flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Tejun Heo <tj@kernel.org>
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:25 +02:00
Greg Kroah-Hartman
e9a2c5dd1a This is the 4.14.36 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlre3ogACgkQONu9yGCS
 aT4a8Q//aR1U1nYUiiMwMTgxvWXR5Hic3jtnAxdOkpr6UNa5dDa1tijI5U9poKJW
 65EPPQNW29PxIv0UGhLRzdjpL/ac2QhMyW8gmS8ikXMbFPF2JrgvOLSZpWF70cE1
 hyFkzbnvavJe0QfWsii7Z+RdrSZMgfZheZMmLh1exv1tEmuYcfAiletdC8f5kTPU
 /aS5X9rmJM/Fyw4iQF7NEpYPY4vESsgMd7ZfHifcV07ze6f+lkW+gcKZuVi//eJ1
 NJEvSBjSvqbQoHugHvHbV/UM2RwzFFfihm6y94WOurSbToksJ141P/MEBxc9vDae
 rCA8Qwq3YZ8vPu5rb8L1UHlpR+CIuanSJnijBhC2Lh6W4CmVA70+lvveqMbZGi/X
 Tm9+QlV4F32ogOy+rNvFARoNx7KkWvjZ8kF2a/qgbkqQgPCwSku4anW3abXLQad+
 4hYbqAwunq0V1Zi4XoIAjcQWlAokau4jDxfKbpoO7CBUYoia+1vDoK4U1FHsFy77
 E4w7LktCecfoqieoBzsD5mZfTG5qrzwNhoxnnZmRGZY81TW9swVZYPkfqamG/Cbk
 7HkgOLvtQiwtY5dxsLHvMwbtXzQqxO10KuLBAao6OY9xLEAqamV1v9gGO1WyOzRd
 avVUShDL6FQHTRalzcm8K9OLUhOZWDcZLR9XgNwfgxZYjAlfqCA=
 =Gbe3
 -----END PGP SIGNATURE-----

Merge 4.14.36 into android-4.14

Changes in 4.14.36
	tty: make n_tty_read() always abort if hangup is in progress
	cpufreq: CPPC: Use transition_delay_us depending transition_latency
	ubifs: Check ubifs_wbuf_sync() return code
	ubi: fastmap: Don't flush fastmap work on detach
	ubi: Fix error for write access
	ubi: Reject MLC NAND
	mm/ksm.c: fix inconsistent accounting of zero pages
	mm/hmm: fix header file if/else/endif maze
	mm/hmm: hmm_pfns_bad() was accessing wrong struct
	task_struct: only use anon struct under randstruct plugin
	fs/reiserfs/journal.c: add missing resierfs_warning() arg
	resource: fix integer overflow at reallocation
	ipc/shm: fix use-after-free of shm file via remap_file_pages()
	mm, slab: reschedule cache_reap() on the same CPU
	usb: musb: gadget: misplaced out of bounds check
	phy: allwinner: sun4i-usb: poll vbus changes on A23/A33 when driving VBUS
	usb: gadget: udc: core: update usb_ep_queue() documentation
	ARM64: dts: meson: reduce odroid-c2 eMMC maximum rate
	KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list
	ARM: dts: da850-lego-ev3: Fix battery voltage gpio
	ARM: EXYNOS: Fix coupled CPU idle freeze on Exynos4210
	arm: dts: mt7623: fix USB initialization fails on bananapi-r2
	ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
	ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250
	ARM: dts: at91: sama5d4: fix pinctrl compatible string
	spi: atmel: init FIFOs before spi enable
	spi: Fix scatterlist elements size in spi_map_buf
	spi: Fix unregistration of controller with fixed SPI bus number
	media: atomisp_fops.c: disable atomisp_compat_ioctl32
	media: vivid: check if the cec_adapter is valid
	media: vsp1: Fix BRx conditional path in WPF
	x86/xen: Delay get_cpu_cap until stack canary is established
	xen-netfront: Fix hang on device removal
	regmap: Fix reversed bounds check in regmap_raw_write()
	ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
	ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
	USB: gadget: f_midi: fixing a possible double-free in f_midi
	USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
	usb: dwc3: prevent setting PRTCAP to OTG from debugfs
	usb: dwc3: pci: Properly cleanup resource
	usb: dwc3: gadget: never call ->complete() from ->ep_queue()
	cifs: fix memory leak in SMB2_open()
	fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
	smb3: Fix root directory when server returns inode number of zero
	HID: i2c-hid: fix size check and type usage
	i2c: i801: Save register SMBSLVCMD value only once
	i2c: i801: Restore configuration at shutdown
	CIFS: refactor crypto shash/sdesc allocation&free
	CIFS: add sha512 secmech
	CIFS: fix sha512 check in cifs_crypto_secmech_release
	powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
	powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
	powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9
	powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
	powerpc/kprobes: Fix call trace due to incorrect preempt count
	powerpc/kexec_file: Fix error code when trying to load kdump kernel
	powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
	powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
	HID: Fix hid_report_len usage
	HID: core: Fix size as type u32
	soc: mediatek: fix the mistaken pointer accessed when subdomains are added
	ASoC: ssm2602: Replace reg_default_raw with reg_default
	ASoC: topology: Fix kcontrol name string handling
	thunderbolt: Wait a bit longer for ICM to authenticate the active NVM
	thunderbolt: Serialize PCIe tunnel creation with PCI rescan
	thunderbolt: Resume control channel after hibernation image is created
	thunderbolt: Prevent crash when ICM firmware is not running
	irqchip/gic: Take lock when updating irq type
	random: use a tighter cap in credit_entropy_bits_safe()
	extcon: intel-cht-wc: Set direction and drv flags for V5 boost GPIO
	block: use 32-bit blk_status_t on Alpha
	jbd2: if the journal is aborted then don't allow update of the log tail
	ext4: shutdown should not prevent get_write_access
	ext4: eliminate sleep from shutdown ioctl
	ext4: pass -ESHUTDOWN code to jbd2 layer
	ext4: don't update checksum of new initialized bitmaps
	ext4: protect i_disksize update by i_data_sem in direct write path
	ext4: limit xattr size to INT_MAX
	ext4: fail ext4_iget for root directory if unallocated
	ext4: always initialize the crc32c checksum driver
	ext4: don't allow r/w mounts if metadata blocks overlap the superblock
	ext4: move call to ext4_error() into ext4_xattr_check_block()
	ext4: add bounds checking to ext4_xattr_find_entry()
	ext4: add extra checks to ext4_xattr_block_get()
	dm crypt: limit the number of allocated pages
	RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
	RDMA/mlx5: Protect from NULL pointer derefence
	RDMA/rxe: Fix an out-of-bounds read
	ALSA: pcm: Fix UAF at PCM release via PCM timer access
	IB/srp: Fix srp_abort()
	IB/srp: Fix completion vector assignment algorithm
	dmaengine: at_xdmac: fix rare residue corruption
	cxl: Fix possible deadlock when processing page faults from cxllib
	tpm: self test failure should not cause suspend to fail
	libnvdimm, dimm: fix dpa reservation vs uninitialized label area
	libnvdimm, namespace: use a safe lookup for dimm device name
	nfit, address-range-scrub: fix scrub in-progress reporting
	nfit: skip region registration for incomplete control regions
	ring-buffer: Check if memory is available before allocation
	um: Compile with modern headers
	um: Use POSIX ucontext_t instead of struct ucontext
	iommu/vt-d: Fix a potential memory leak
	mmc: jz4740: Fix race condition in IRQ mask update
	mmc: tmio: Fix error handling when issuing CMD23
	PCI: Mark Broadcom HT1100 and HT2000 Root Port Extended Tags as broken
	clk: mvebu: armada-38x: add support for missing clocks
	clk: fix false-positive Wmaybe-uninitialized warning
	clk: mediatek: fix PWM clock source by adding a fixed-factor clock
	clk: bcm2835: De-assert/assert PLL reset signal when appropriate
	pwm: rcar: Fix a condition to prevent mismatch value setting to duty
	thermal: imx: Fix race condition in imx_thermal_probe()
	dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4
	watchdog: f71808e_wdt: Fix WD_EN register read
	vfio/pci: Virtualize Maximum Read Request Size
	ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
	ALSA: pcm: Avoid potential races between OSS ioctls and read/write
	ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
	ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
	ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
	drm/amdgpu: Add an ATPX quirk for hybrid laptop
	drm/amdgpu: Fix always_valid bos multiple LRU insertions.
	drm/amdgpu/sdma: fix mask in emit_pipeline_sync
	drm/amdgpu: Fix PCIe lane width calculation
	drm/amdgpu/si: implement get/set pcie_lanes asic callback
	drm/rockchip: Clear all interrupts before requesting the IRQ
	drm/radeon: add PX quirk for Asus K73TK
	drm/radeon: Fix PCIe lane width calculation
	ALSA: line6: Use correct endpoint type for midi output
	ALSA: rawmidi: Fix missing input substream checks in compat ioctls
	ALSA: hda - New VIA controller suppor no-snoop path
	ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags
	ALSA: hda/realtek - adjust the location of one mic
	random: fix crng_ready() test
	random: use a different mixing algorithm for add_device_randomness()
	random: crng_reseed() should lock the crng instance that it is modifying
	random: add new ioctl RNDRESEEDCRNG
	HID: input: fix battery level reporting on BT mice
	HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
	HID: wacom: bluetooth: send exit report for recent Bluetooth devices
	MIPS: uaccess: Add micromips clobbers to bzero invocation
	MIPS: memset.S: EVA & fault support for small_memset
	MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
	MIPS: memset.S: Fix clobber of v1 in last_fixup
	powerpc/eeh: Fix enabling bridge MMIO windows
	powerpc/xive: Fix trying to "push" an already active pool VP
	powerpc/lib: Fix off-by-one in alternate feature patching
	udf: Fix leak of UTF-16 surrogates into encoded strings
	fanotify: fix logic of events on child
	mmc: sdhci-pci: Only do AMD tuning for HS200
	drm/i915: Correctly handle limited range YCbCr data on VLV/CHV
	jffs2_kill_sb(): deal with failed allocations
	hypfs_kill_super(): deal with failed allocations
	orangefs_kill_sb(): deal with allocation failures
	rpc_pipefs: fix double-dput()
	Don't leak MNT_INTERNAL away from internal mounts
	autofs: mount point create should honour passed in mode
	mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
	net: dsa: Discard frames from unused ports
	iwlwifi: add shared clock PHY config flag for some devices
	iwlwifi: add a bunch of new 9000 PCI IDs
	Revert "media: lirc_zilog: driver only sends LIRCCODE"
	media: staging: lirc_zilog: incorrect reference counting
	writeback: safer lock nesting
	Linux 4.14.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-04-24 11:28:33 +02:00
Greg Thelen
7c9b87a78a writeback: safer lock nesting
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    <interrupts mistakenly enabled>
                                    <interrupt>
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    <deadlock>
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>	[v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[natechancellor: Adjust context due to lack of b93b016313b3b]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:36:39 +02:00
San Mehat
b2da450efe ANDROID: fs: block_dump: Don't display inode changes if block_dump < 2
Signed-off-by: San Mehat <san@android.com>
2017-12-18 21:11:22 +05:30
Nikolay Borisov
3e8f399da4 writeback: rework wb_[dec|inc]_stat family of functions
Currently the writeback statistics code uses a percpu counters to hold
various statistics.  Furthermore we have 2 families of functions - those
which disable local irq and those which doesn't and whose names begin
with double underscore.  However, they both end up calling
__add_wb_stats which in turn calls percpu_counter_add_batch which is
already irq-safe.

Exploiting this fact allows to eliminated the __wb_* functions since
they don't add any further protection than we already have.
Furthermore, refactor the wb_* function to call __add_wb_stat directly
without the irq-disabling dance.  This will likely result in better
runtime of code which deals with modifying the stat counters.

While at it also document why percpu_counter_add_batch is in fact
preempt and irq-safe since at least 3 people got confused.

Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:05 -07:00
Mauro Carvalho Chehab
0117d4272b fs: add a blank lines on some kernel-doc comments
Sphinx gets confused when it finds identation without a
good reason for it and without a preceding blank line:

	./fs/mpage.c:347: ERROR: Unexpected indentation.
	./fs/namei.c:4303: ERROR: Unexpected indentation.
	./fs/fs-writeback.c:2060: ERROR: Unexpected indentation.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16 08:44:10 -03:00
Tahsin Erdogan
4a3a485b1e writeback: fix memory leak in wb_queue_work()
When WB_registered flag is not set, wb_queue_work() skips queuing the
work, but does not perform the necessary clean up. In particular, if
work->auto_free is true, it should free the memory.

The leak condition can be reprouced by following these steps:

   mount /dev/sdb /mnt/sdb
   /* In qemu console: device_del sdb */
   umount /dev/sdb

Above will result in a wb_queue_work() call on an unregistered wb and
thus leak memory.

Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-13 08:27:34 -06:00
Tahsin Erdogan
bace924818 fs/fs-writeback.c: remove redundant if check
b_more_io non-empty check is already preceded by an opposite check.

Link: http://lkml.kernel.org/r/1478591249-30641-1-git-send-email-tahsin@google.com
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Konstantin Khlebnikov
51350ea0d7 mm, writeback: flush plugged IO in wakeup_flusher_threads()
I've found funny live-lock between raid10 barriers during resync and
memory controller hard limits. Inside mpage_readpages() task holds on to
its plug bio which blocks the barrier in raid10. Its memory cgroup have
no free memory thus the task goes into reclaimer but all reclaimable
pages are dirty and cannot be written because raid10 is rebuilding and
stuck on the barrier.

Common flush of such IO in schedule() never happens, because the caller
doesn't go to sleep.

Lock is 'live' because changing memory limit or killing tasks which
holds that stuck bio unblock whole progress.

That was what happened in 3.18.x but I see no difference in upstream
logic.  Theoretically this might happen even without memory cgroup.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-09 19:58:06 -06:00
Jan Kara
dc5ff2b1d6 writeback: Write dirty times for WB_SYNC_ALL writeback
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Mel Gorman
11fb998986 mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Brian Foster
9a46b04f16 fs/fs-writeback.c: inode writeback list tracking tracepoints
The per-sb inode writeback list tracks inodes currently under writeback
to facilitate efficient sync processing.  In particular, it ensures that
sync only needs to walk through a list of inodes that were cleaned by
the sync.

Add a couple tracepoints to help identify when inodes are added/removed
to and from the writeback lists.  Piggyback off of the writeback
lazytime tracepoint template as it already tracks the relevant inode
information.

Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
cc: Josef Bacik <jbacik@fb.com>
Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Dave Chinner
6c60d2b574 fs/fs-writeback.c: add a new writeback list for sync
wait_sb_inodes() currently does a walk of all inodes in the filesystem
to find dirty one to wait on during sync.  This is highly inefficient
and wastes a lot of CPU when there are lots of clean cached inodes that
we don't need to wait on.

To avoid this "all inode" walk, we need to track inodes that are
currently under writeback that we need to wait for.  We do this by
adding inodes to a writeback list on the sb when the mapping is first
tagged as having pages under writeback.  wait_sb_inodes() can then walk
this list of "inodes under IO" and wait specifically just for the inodes
that the current sync(2) needs to wait for.

Define a couple helpers to add/remove an inode from the writeback list
and call them when the overall mapping is tagged for or cleared from
writeback.  Update wait_sb_inodes() to walk only the inodes under
writeback due to the sync.

With this change, filesystem sync times are significantly reduced for
fs' with largely populated inode caches and otherwise no other work to
do.  For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
than 0.1s when the filesystem is fully clean.

Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Tahsin Erdogan
7452495555 writeback: inode cgroup wb switch should not call ihold()
Asynchronous wb switching of inodes takes an additional ref count on an
inode to make sure inode remains valid until switchover is completed.

However, anyone calling ihold() must already have a ref count on inode,
but in this case inode->i_count may already be zero:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: writeback wb_workfn (flush-8:16)
 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
Call Trace:
 [<ffffffff805990af>] dump_stack+0x4d/0x6e
 [<ffffffff80268702>] __warn+0xc2/0xe0
 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
 [<ffffffff8035b4ab>] ihold+0x2b/0x30
 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
 [<ffffffff8036a147>] wb_workfn+0xd7/0x280
 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
 [<ffffffff8027bb09>] process_one_work+0x129/0x300
 [<ffffffff8027be06>] worker_thread+0x126/0x480
 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
 [<ffffffff80280ff4>] kthread+0xc4/0xe0
 [<ffffffff80335578>] ? kfree+0xc8/0x100
 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
---[ end trace aaefd2fd9f306bc4 ]---

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-30 13:58:41 -06:00
Tetsuo Handa
78ebc2f714 mm,writeback: don't use memory reserves for wb_start_writeback
When writeback operation cannot make forward progress because memory
allocation requests needed for doing I/O cannot be satisfied (e.g.
under OOM-livelock situation), we can observe flood of order-0 page
allocation failure messages caused by complete depletion of memory
reserves.

This is caused by unconditionally allocating "struct wb_writeback_work"
objects using GFP_ATOMIC from PF_MEMALLOC context.

__alloc_pages_nodemask() {
  __alloc_pages_slowpath() {
    __alloc_pages_direct_reclaim() {
      __perform_reclaim() {
        current->flags |= PF_MEMALLOC;
        try_to_free_pages() {
          do_try_to_free_pages() {
            wakeup_flusher_threads() {
              wb_start_writeback() {
                kzalloc(sizeof(*work), GFP_ATOMIC) {
                  /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */
                }
              }
            }
          }
        }
        current->flags &= ~PF_MEMALLOC;
      }
    }
  }
}

Since I/O is stalling, allocating writeback requests forever shall
deplete memory reserves.  Fortunately, since wb_start_writeback() can
fall back to wb_wakeup() when allocating "struct wb_writeback_work"
failed, we don't need to allow wb_start_writeback() to use memory
reserves.

  Mem-Info:
  active_anon:289393 inactive_anon:2093 isolated_anon:29
   active_file:10838 inactive_file:113013 isolated_file:859
   unevictable:0 dirty:108531 writeback:5308 unstable:0
   slab_reclaimable:5526 slab_unreclaimable:7077
   mapped:9970 shmem:2159 pagetables:2387 bounce:0
   free:3042 free_pcp:0 free_cma:0
  Node 0 DMA free:6968kB min:44kB low:52kB high:64kB active_anon:6056kB inactive_anon:176kB active_file:712kB inactive_file:744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:208kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9708 all_unreclaimable? yes
  lowmem_reserve[]: 0 1732 1732 1732
  Node 0 DMA32 free:5200kB min:5200kB low:6500kB high:7800kB active_anon:1151516kB inactive_anon:8196kB active_file:42640kB inactive_file:451076kB unevictable:0kB isolated(anon):116kB isolated(file):3564kB present:2080640kB managed:1775332kB mlocked:0kB dirty:433368kB writeback:21232kB mapped:39144kB shmem:8452kB slab_reclaimable:22056kB slab_unreclaimable:28100kB kernel_stack:20976kB pagetables:9404kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2701604 all_unreclaimable? no
  lowmem_reserve[]: 0 0 0 0
  Node 0 DMA: 25*4kB (UME) 16*8kB (UME) 3*16kB (UE) 5*32kB (UME) 2*64kB (UM) 2*128kB (ME) 2*256kB (ME) 1*512kB (E) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 6964kB
  Node 0 DMA32: 925*4kB (UME) 140*8kB (UME) 5*16kB (ME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5060kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  126847 total pagecache pages
  0 pages in swap cache
  Swap cache stats: add 0, delete 0, find 0/0
  Free swap  = 0kB
  Total swap = 0kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  76348 pages reserved
  0 pages hwpoisoned
  Out of memory: Kill process 4450 (file_io.00) score 998 or sacrifice child
  Killed process 4450 (file_io.00) total-vm:4308kB, anon-rss:100kB, file-rss:1184kB, shmem-rss:0kB
  kthreadd: page allocation failure: order:0, mode:0x2200020
  file_io.00: page allocation failure: order:0, mode:0x2200020
  CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
  Call Trace:
    warn_alloc_failed+0xf7/0x150
    __alloc_pages_nodemask+0x23f/0xa60
    alloc_pages_current+0x87/0x110
    new_slab+0x3a1/0x440
    ___slab_alloc+0x3cf/0x590
    __slab_alloc.isra.64+0x18/0x1d
    kmem_cache_alloc+0x11c/0x150
    wb_start_writeback+0x39/0x90
    wakeup_flusher_threads+0x7f/0xf0
    do_try_to_free_pages+0x1f9/0x410
    try_to_free_pages+0x94/0xc0
    __alloc_pages_nodemask+0x566/0xa60
    alloc_pages_current+0x87/0x110
    __page_cache_alloc+0xaf/0xc0
    pagecache_get_page+0x88/0x260
    grab_cache_page_write_begin+0x21/0x40
    xfs_vm_write_begin+0x2f/0xf0
    generic_perform_write+0xca/0x1c0
    xfs_file_buffered_aio_write+0xcc/0x1f0
    xfs_file_write_iter+0x84/0x140
    __vfs_write+0xc7/0x100
    vfs_write+0x9d/0x190
    SyS_write+0x50/0xc0
    entry_SYSCALL_64_fastpath+0x12/0x6a
  Mem-Info:
  active_anon:293335 inactive_anon:2093 isolated_anon:0
   active_file:10829 inactive_file:110045 isolated_file:32
   unevictable:0 dirty:109275 writeback:822 unstable:0
   slab_reclaimable:5489 slab_unreclaimable:10070
   mapped:9999 shmem:2159 pagetables:2420 bounce:0
   free:3 free_pcp:0 free_cma:0
  Node 0 DMA free:12kB min:44kB low:52kB high:64kB active_anon:6060kB inactive_anon:176kB active_file:708kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:7160kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9844 all_unreclaimable? yes
  lowmem_reserve[]: 0 1732 1732 1732
  Node 0 DMA32 free:0kB min:5200kB low:6500kB high:7800kB active_anon:1167280kB inactive_anon:8196kB active_file:42608kB inactive_file:439424kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1775332kB mlocked:0kB dirty:436344kB writeback:3288kB mapped:39260kB shmem:8452kB slab_reclaimable:21908kB slab_unreclaimable:33120kB kernel_stack:20976kB pagetables:9536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11073180 all_unreclaimable? yes
  lowmem_reserve[]: 0 0 0 0
  Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
  Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  123086 total pagecache pages
  0 pages in swap cache
  Swap cache stats: add 0, delete 0, find 0/0
  Free swap  = 0kB
  Total swap = 0kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  76348 pages reserved
  0 pages hwpoisoned
  SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
    cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0
    node 0: slabs: 3218, objs: 205952, free: 0
  file_io.00: page allocation failure: order:0, mode:0x2200020
  CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45

Assuming that somebody will find a better solution, let's apply this
patch for now to stop bleeding, for this problem frequently prevents me
from testing OOM livelock condition.

Link: http://lkml.kernel.org/r/20160318131136.GE7152@quack.suse.cz
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Tejun Heo
aaf2559332 writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
When cgroup writeback is in use, there can be multiple wb's
(bdi_writeback's) per bdi and an inode may switch among them
dynamically.  In a couple places, the wrong wb was used leading to
performing operations on the wrong list under the wrong lock
corrupting the io lists.

* writeback_single_inode() was taking @wb parameter and used it to
  remove the inode from io lists if it becomes clean after writeback.
  The callers of this function were always passing in the root wb
  regardless of the actual wb that the inode was associated with,
  which could also change while writeback is in progress.

  Fix it by dropping the @wb parameter and using
  inode_to_wb_and_lock_list() to determine and lock the associated wb.

* After writeback_sb_inodes() writes out an inode, it re-locks @wb and
  inode to remove it from or move it to the right io list.  It assumes
  that the inode is still associated with @wb; however, the inode may
  have switched to another wb while writeback was in progress.

  Fix it by using inode_to_wb_and_lock_list() to determine and lock
  the associated wb after writeback is complete.  As the function
  requires the original @wb->list_lock locked for the next iteration,
  in the unlikely case where the inode has changed association, switch
  the locks.

Kudos to Tahsin for pinpointing these subtle breakages.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com
Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-20 09:44:20 -06:00
Tejun Heo
614a4e3773 writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with
the target inode, unlocks inode, locks the wb's list_lock and verifies
that the inode is still associated with the wb.  To prevent the wb
going away between dropping inode lock and acquiring list_lock, the wb
is pinned while inode lock is held.  The wb reference is put right
after acquiring list_lock citing that the wb won't be dereferenced
anymore.

This isn't true.  If the inode is still associated with the wb, the
inode has reference and it's safe to return the wb; however, if inode
has been switched, the wb still needs to be unlocked which is a
dereference and can lead to use-after-free if it it races with wb
destruction.

Fix it by putting the reference after releasing list_lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
Cc: stable@vger.kernel.org # v4.2+
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-20 09:44:18 -06:00
Tejun Heo
a1a0e23e49 writeback: flush inode cgroup wb switches instead of pinning super_block
If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching.  Before 5ff8eaac1636 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching.  5ff8eaac1636 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.

This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches.  wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.

v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
    check in inode_switch_wbs() as Jan an Al suggested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches")
Cc: stable@vger.kernel.org #v4.5
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Tejun Heo
5ff8eaac16 writeback: keep superblock pinned during cgroup writeback association switches
If cgroup writeback is in use, an inode is associated with a cgroup
for writeback.  If the inode's main dirtier changes to another cgroup,
the association gets updated asynchronously.  Nothing was pinning the
superblock while such switches are in progress and superblock could go
away while async switching is pending or in progress leading to
crashes like the following.

 kernel BUG at fs/jbd2/transaction.c:319!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
 CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
 Hardware name: Google Google, BIOS Google 01/01/2011
 Workqueue: events inode_switch_wbs_work_fn
 task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
 RIP: 0010:[<ffffffff803e6922>]  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
 RSP: 0018:ffff880209267c30  EFLAGS: 00010202
 ...
 Call Trace:
  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
  [<ffffffff8035338b>] evict+0xbb/0x190
  [<ffffffff80354190>] iput+0x130/0x190
  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
  [<ffffffff80279819>] process_one_work+0x129/0x300
  [<ffffffff80279b16>] worker_thread+0x126/0x480
  [<ffffffff8027ed14>] kthread+0xc4/0xe0
  [<ffffffff809771df>] ret_from_fork+0x3f/0x70

Fix it by bumping s_active while cgroup association switching is in
flight.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Cc: stable@vger.kernel.org #v4.5+
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-16 11:34:07 -07:00
Tejun Heo
654a0dd095 cgroup, memcg, writeback: drop spurious rcu locking around mem_cgroup_css_from_page()
In earlier versions, mem_cgroup_css_from_page() could return non-root
css on a legacy hierarchy which can go away and required rcu locking;
however, the eventual version simply returns the root cgroup if memcg is
on a legacy hierarchy and thus doesn't need rcu locking around or in it.
Remove spurious rcu lockings.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Randy Dunlap
dbce03b9e3 fs/writeback.c: fix kernel-doc warnings
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to
after the function's opening brace.  Also #undef this macro at the end of
the function.

  ../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
  ../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Junichi Nomura
aa750fd71c mm/filemap.c: make global sync not clear error status of individual inodes
filemap_fdatawait() is a function to wait for on-going writeback to
complete but also consume and clear error status of the mapping set during
writeback.

The latter functionality is critical for applications to detect writeback
error with system calls like fsync(2)/fdatasync(2).

However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl,
which don't check error status of individual mappings.

As a result, fsync() may not be able to detect writeback error if events
happen in the following order:

   Application                    System admin
   ----------------------------------------------------------
   write data on page cache
                                  Run sync command
                                  writeback completes with error
                                  filemap_fdatawait() clears error
   fsync returns success
   (but the data is not on disk)

This patch adds filemap_fdatawait_keep_errors() for call sites where
writeback error is not handled so that they don't clear error status.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Tejun Heo
b33e18f61b fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue()
to walk @bdi->wb_list.  To set up the initial iteration
condition, it uses list_entry_rcu() to calculate the entry
pointer corresponding to the list head; however, this isn't an
actual RCU dereference and using list_entry_rcu() for it ended
up breaking a proposed list_entry_rcu() change because it was
feeding an non-lvalue pointer into the macro.

Don't use the RCU variant for simple pointer offsetting.  Use
list_entry() instead.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Patrick Marlier <patrick.marlier@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pranith kumar <bobby.prani@gmail.com>
Link: http://lkml.kernel.org/r/20151027051939.GA19355@mtj.duckdns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-28 13:17:30 +01:00
Tejun Heo
b817525a4a writeback: bdi_writeback iteration must not skip dying ones
bdi_for_each_wb() is used in several places to wake up or issue
writeback work items to all wb's (bdi_writeback's) on a given bdi.
The iteration is performed by walking bdi->cgwb_tree; however, the
tree only indexes wb's which are currently active.

For example, when a memcg gets associated with a different blkcg, the
old wb is removed from the tree so that the new one can be indexed.
The old wb starts dying from then on but will linger till all its
inodes are drained.  As these dying wb's may still host dirty inodes,
writeback operations which affect all wb's must include them.
bdi_for_each_wb() skipping dying wb's led to sync(2) missing and
failing to sync the inodes belonging to those wb's.

This patch adds a RCU protected @bdi->wb_list which lists all wb's
beloinging to that bdi.  wb's are added on creation and removed on
release rather than on the start of destruction.  bdi_for_each_wb()
usages are replaced with list_for_each[_continue]_rcu() iterations
over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed.

v2: Updated as per Jan.  last_wb ref leak in bdi_split_work_to_wbs()
    fixed and unnecessary list head severing in cgwb_bdi_destroy()
    removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Artem Bityutskiy <dedekind1@gmail.com>
Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()")
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-12 10:31:12 -06:00
Tejun Heo
6fdf860f15 writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
wakeup_dirtytime_writeback() walks and wakes up all wb's of all bdi's;
unfortunately, it was always waking up bdi->wb instead of the wb being
walked.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 001fe6f617b1 ("writeback: make wakeup_dirtytime_writeback() handle multiple bdi_writeback's")
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-12 10:31:11 -06:00
Chris Mason
590dca3a71 fs-writeback: unplug before cond_resched in writeback_sb_inodes
Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
which increases the merge rate when relatively contiguous small files
are written by the filesystem.  It helps both on flash and spindles.

For an fs_mark workload creating 4K files in parallel across 8 drives,
this commit improves performance ~9% more by unplugging before calling
cond_resched().  cond_resched() doesn't trigger an implicit unplug, so
explicitly getting the IO down to the device before scheduling reduces
latencies for anyone waiting on clean pages.

It also cuts down on how often we use kblockd to unplug, which means
less work bouncing from one workqueue to another.

Many more details about how we got here:

  https://lkml.org/lkml/2015/9/11/570

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-19 18:50:19 -07:00
Linus Torvalds
505a666ee3 writeback: plug writeback in wb_writeback() and writeback_inodes_wb()
We had to revert the pluggin in writeback_sb_inodes() because the
wb->list_lock is held, but we could easily plug at a higher level before
taking that lock, and unplug after releasing it.  This does that.

Chris will run performance numbers, just to verify that this approach is
comparable to the alternative (we could just drop and re-take the lock
around the blk_finish_plug() rather than these two commits.

I'd have preferred waiting for actual performance numbers before picking
one approach over the other, but I don't want to release rc1 with the
known "sleeping function called from invalid context" issue, so I'll
pick this cleanup version for now.  But if the numbers show that we
really want to plug just at the writeback_sb_inodes() level, and we
should just play ugly games with the spinlock, we'll switch to that.

Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-12 11:13:07 -07:00
Linus Torvalds
0ba13fd19d Revert "writeback: plug writeback at a high level"
This reverts commit d353d7587d02116b9732d5c06615aed75a4d3a47.

Doing the block layer plug/unplug inside writeback_sb_inodes() is
broken, because that function is actually called with a spinlock held:
wb->list_lock, as pointed out by Chris Mason.

Chris suggested just dropping and re-taking the spinlock around the
blk_finish_plug() call (the plgging itself can happen under the
spinlock), and that would technically work, but is just disgusting.

We do something fairly similar - but not quite as disgusting because we
at least have a better reason for it - in writeback_single_inode(), so
it's not like the caller can depend on the lock being held over the
call, but in this case there just isn't any good reason for that
"release and re-take the lock" pattern.

[ In general, we should really strive to avoid the "release and retake"
  pattern for locks, because in the general case it can easily cause
  subtle bugs when the caller caches any state around the call that
  might be invalidated by dropping the lock even just temporarily. ]

But in this case, the plugging should be easy to just move up to the
callers before the spinlock is taken, which should even improve the
effectiveness of the plug.  So there is really no good reason to play
games with locking here.

I'll send off a test-patch so that Dave Chinner can verify that that
plug movement works.  In the meantime this just reverts the problematic
commit and adds a comment to the function so that we hopefully don't
make this mistake again.

Reported-by: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11 13:26:39 -07:00
Linus Torvalds
b0a1ea51bd Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block
Pull blk-cg updates from Jens Axboe:
 "A bit later in the cycle, but this has been in the block tree for a a
  while.  This is basically four patchsets from Tejun, that improve our
  buffered cgroup writeback.  It was dependent on the other cgroup
  changes, but they went in earlier in this cycle.

  Series 1 is set of 5 patches that has cgroup writeback updates:

   - bdi_writeback iteration fix which could lead to some wb's being
     skipped or repeated during e.g. sync under memory pressure.

   - Simplification of wb work wait mechanism.

   - Writeback tracepoints updated to report cgroup.

  Series 2 is is a set of updates for the CFQ cgroup writeback handling:

     cfq has always charged all async IOs to the root cgroup.  It didn't
     have much choice as writeback didn't know about cgroups and there
     was no way to tell who to blame for a given writeback IO.
     writeback finally grew support for cgroups and now tags each
     writeback IO with the appropriate cgroup to charge it against.

     This patchset updates cfq so that it follows the blkcg each bio is
     tagged with.  Async cfq_queues are now shared across cfq_group,
     which is per-cgroup, instead of per-request_queue cfq_data.  This
     makes all IOs follow the weight based IO resource distribution
     implemented by cfq.

     - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.

     - Other misc review points addressed, acks added and rebased.

  Series 3 is the blkcg policy cleanup patches:

     This patchset contains assorted cleanups for blkcg_policy methods
     and blk[c]g_policy_data handling.

     - alloc/free added for blkg_policy_data.  exit dropped.

     - alloc/free added for blkcg_policy_data.

     - blk-throttle's async percpu allocation is replaced with direct
       allocation.

     - all methods now take blk[c]g_policy_data instead of blkcg_gq or
       blkcg.

  And finally, series 4 is a set of patches cleaning up the blkcg stats
  handling:

    blkcg's stats have always been somwhat of a mess.  This patchset
    tries to improve the situation a bit.

     - The following patches added to consolidate blkcg entry point and
       blkg creation.  This is in itself is an improvement and helps
       colllecting common stats on bio issue.

     - per-blkg stats now accounted on bio issue rather than request
       completion so that bio based and request based drivers can behave
       the same way.  The issue was spotted by Vivek.

     - cfq-iosched implements custom recursive stats and blk-throttle
       implements custom per-cpu stats.  This patchset make blkcg core
       support both by default.

     - cfq-iosched and blk-throttle keep track of the same stats
       multiple times.  Unify them"

* 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
  blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
  blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
  blkcg: implement interface for the unified hierarchy
  blkcg: misc preparations for unified hierarchy interface
  blkcg: separate out tg_conf_updated() from tg_set_conf()
  blkcg: move body parsing from blkg_conf_prep() to its callers
  blkcg: mark existing cftypes as legacy
  blkcg: rename subsystem name from blkio to io
  blkcg: refine error codes returned during blkcg configuration
  blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
  blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
  blkcg: remove cfqg_stats->sectors
  blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
  blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
  blkcg: make blkcg_[rw]stat per-cpu
  blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
  blkcg: consolidate blkg creation in blkcg_bio_issue_check()
  blk-throttle: improve queue bypass handling
  blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
  blkcg: inline [__]blkg_lookup()
  ...
2015-09-10 18:56:14 -07:00
Linus Torvalds
7d9071a095 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "In this one:

   - d_move fixes (Eric Biederman)

   - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
     handling ought to be fixed)

   - switch of sb_writers to percpu rwsem (Oleg Nesterov)

   - superblock scalability (Josef Bacik and Dave Chinner)

   - swapon(2) race fix (Hugh Dickins)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
  vfs: Test for and handle paths that are unreachable from their mnt_root
  dcache: Reduce the scope of i_lock in d_splice_alias
  dcache: Handle escaped paths in prepend_path
  mm: fix potential data race in SyS_swapon
  inode: don't softlockup when evicting inodes
  inode: rename i_wb_list to i_io_list
  sync: serialise per-superblock sync operations
  inode: convert inode_sb_list_lock to per-sb
  inode: add hlist_fake to avoid the inode hash lock in evict
  writeback: plug writeback at a high level
  change sb_writers to use percpu_rw_semaphore
  shift percpu_counter_destroy() into destroy_super_work()
  percpu-rwsem: kill CONFIG_PERCPU_RWSEM
  percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
  percpu-rwsem: introduce percpu_down_read_trylock()
  document rwsem_release() in sb_wait_write()
  fix the broken lockdep logic in __sb_start_write()
  introduce __sb_writers_{acquired,release}() helpers
  ufs_inode_get{frag,block}(): get rid of 'phys' argument
  ufs_getfrag_block(): tidy up a bit
  ...
2015-09-05 20:34:28 -07:00
Tejun Heo
006a0973ed writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()
e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
updated writeback path to avoid kicking writeback work items if there
are no inodes to be written out; unfortunately, the avoidance logic
was too aggressive and broke sync_inodes_sb().

* sync_inodes_sb() must write out I_DIRTY_TIME inodes but I_DIRTY_TIME
  inodes dont't contribute to bdi/wb_has_dirty_io() tests and were
  being skipped over.

* inodes are taken off wb->b_dirty/io/more_io lists after writeback
  starts on them.  sync_inodes_sb() skipping wait_sb_inodes() when
  bdi_has_dirty_io() breaks it by making it return while writebacks
  are in-flight.

This patch fixes the breakages by

* Removing bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs().
  The callers are already testing the condition.

* Removing bdi_has_dirty_io() shortcut from sync_inodes_sb() so that
  it always calls into bdi_split_work_to_wbs() and wait_sb_inodes().

* Making bdi_split_work_to_wbs() consider the b_dirty_time list for
  WB_SYNC_ALL writebacks.

Kudos to Eryu, Dave and Jan for tracking down the issue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com
Reported-and-bisected-by: Eryu Guan <eguan@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-25 14:35:09 -06:00
Tejun Heo
5634cc2aa9 writeback: update writeback tracepoints to report cgroup
The following tracepoints are updated to report the cgroup used during
cgroup writeback.

* writeback_write_inode[_start]
* writeback_queue
* writeback_exec
* writeback_start
* writeback_written
* writeback_wait
* writeback_nowork
* writeback_wake_background
* wbc_writepage
* writeback_queue_io
* bdi_dirty_ratelimit
* balance_dirty_pages
* writeback_sb_inodes_requeue
* writeback_single_inode[_start]

Note that writeback_bdi_register is separated out from writeback_class
as reporting cgroup doesn't make sense to it.  Tracepoints which take
bdi are updated to take bdi_writeback instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Tejun Heo
60292bcc1b writeback: explain why @inode is allowed to be NULL for inode_congested()
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Tejun Heo
8a1270cda7 writeback: remove wb_writeback_work->single_wait/done
wb_writeback_work->single_wait/done are used for the wait mechanism
for synchronous wb_work (wb_writeback_work) items which are issued
when bdi_split_work_to_wbs() fails to allocate memory for asynchronous
wb_work items; however, there's no reason to use a separate wait
mechanism for this.  bdi_split_work_to_wbs() can simply use on-stack
fallback wb_work item and separate wb_completion to wait for it.

This patch removes wb_work->single_wait/done and the related code and
make bdi_split_work_to_wbs() use on-stack fallback wb_work and
wb_completion instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Tejun Heo
1ed8d48c57 writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg
wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an
earlier implementation, wb's were keyed by blkcg ID.
bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and
allows iterations to start from an arbitrary ID which is used to
interrupt and resume iterations.

Unfortunately, while changing wb to be keyed by memcg ID instead of
blkcg, bdi_for_each_wb() was missed and is still assuming that wb's
are keyed by blkcg ID.  This doesn't affect iterations which don't get
interrupted but bdi_split_work_to_wbs() makes use of iteration
resuming on allocation failures and thus may incorrectly skip or
repeat wb's.

Fix it by changing bdi_for_each_wb() to take memcg IDs instead of
blkcg IDs and updating bdi_split_work_to_wbs() accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Dave Chinner
c7f5408493 inode: rename i_wb_list to i_io_list
There's a small consistency problem between the inode and writeback
naming. Writeback calls the "for IO" inode queues b_io and
b_more_io, but the inode calls these the "writeback list" or
i_wb_list. This makes it hard to an new "under writeback" list to
the inode, or call it an "under IO" list on the bdi because either
way we'll have writeback on IO and IO on writeback and it'll just be
confusing. I'm getting confused just writing this!

So, rename the inode "for IO" list variable to i_io_list so we can
add a new "writeback list" in a subsequent patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 23:38:10 -04:00
Dave Chinner
e97fedb9ef sync: serialise per-superblock sync operations
When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on the
the inode_sb_list_lock and the contention causes system wide
slowdowns. In effect, concurrent sync(2) calls can take longer and
burn more CPU than if they were serialised.

Stop the worst of the contention by adding a per-sb mutex to wrap
around wait_sb_inodes() so that we only execute one sync(2) IO
completion walk per superblock superblock at a time and hence avoid
contention being triggered by concurrent sync(2) calls.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:47 -04:00
Dave Chinner
74278da9f7 inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:46 -04:00