mirror of
https://github.com/rd-stuffs/msm-4.14.git
synced 2025-02-20 11:45:48 +08:00
349 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
9f5be70fa4 |
Merge android-4.14-q.154 (b7f5267) into msm-4.14
* refs/heads/tmp-b7f5267: usb: gadget: configfs: Fix missing spin_lock_init() Linux 4.14.154 kvm: x86: mmu: Recovery of shattered NX large pages kvm: Add helper function for creating VM worker threads kvm: mmu: ITLB_MULTIHIT mitigation KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active KVM: x86: add tracepoints around __direct_map and FNAME(fetch) KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON KVM: x86: remove now unneeded hugepage gfn adjustment KVM: x86: make FNAME(fetch) and __direct_map more similar kvm: mmu: Do not release the page inside mmu_set_spte() kvm: Convert kvm_lock to a mutex kvm: x86, powerpc: do not allow clearing largepages debugfs entry Documentation: Add ITLB_MULTIHIT documentation cpu/speculation: Uninline and export CPU mitigations helpers x86/cpu: Add Tremont to the cpu vulnerability whitelist x86/bugs: Add ITLB_MULTIHIT bug infrastructure x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs x86/tsx: Add config options to set tsx=on|off|auto x86/speculation/taa: Add documentation for TSX Async Abort x86/tsx: Add "auto" option to the tsx= cmdline parameter kvm/x86: Export MDS_NO=0 to guests when TSX is enabled x86/speculation/taa: Add sysfs reporting for TSX Async Abort x86/speculation/taa: Add mitigation for TSX Async Abort x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default x86/cpu: Add a helper function x86_read_arch_cap_msr() x86/msr: Add the IA32_TSX_CTRL MSR KVM: x86: use Intel speculation bugs and features as derived in generic x86 code drm/i915/cmdparser: Fix jump whitelist clearing drm/i915/gen8+: Add RC6 CTX corruption WA drm/i915: Lower RM timeout to avoid DSI hard hangs drm/i915/cmdparser: Ignore Length operands during command matching drm/i915/cmdparser: Add support for backward jumps drm/i915/cmdparser: Use explicit goto for error paths drm/i915: Add gen9 BCS cmdparsing drm/i915: Allow parsing of unsized batches drm/i915: Support ro ppgtt mapped cmdparser shadow buffers drm/i915: Add support for mandatory cmdparsing drm/i915: Remove Master tables from cmdparser drm/i915: Disable Secure Batches for gen6+ drm/i915: Rename gen7 cmdparser tables drm/i915: Move engine->needs_cmd_parser to engine->flags drm/i915: Don't use GPU relocations prior to cmdparser stalls drm/i915: Silence smatch for cmdparser drm/i915/cmdparser: Do not check past the cmd length. drm/i915/cmdparser: Check reg_table_count before derefencing. drm/i915: Prevent writing into a read-only object via a GGTT mmap drm/i915/gtt: Disable read-only support under GVT drm/i915/gtt: Read-only pages for insert_entries on bdw+ drm/i915/gtt: Add read only pages to gen8_pte_encode net: prevent load/store tearing on sk->sk_stamp usbip: Fix free of unallocated memory in vhci tx cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead mm/filemap.c: don't initiate writeback if mapping has no dirty pages can: flexcan: disable completely the ECC mechanism x86/apic/32: Avoid bogus LDR warnings x86/apic: Drop logical_smp_processor_id() inline x86/apic: Move pending interrupt check code into it's own function e1000: fix memory leaks igb: Fix constant media auto sense switching when no cable is connected net: ethernet: arc: add the missed clk_disable_unprepare NFSv4: Don't allow a cached open with a revoked delegation hv_netvsc: Fix error handling in netvsc_attach() net: hisilicon: Fix "Trying to free already-free IRQ" fjes: Handle workqueue allocation failure scsi: qla2xxx: stop timer in shutdown path RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case USB: ldusb: use unsigned size format specifiers USB: Skip endpoints with 0 maxpacket length perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity usb: dwc3: remove the call trace of USBx_GFLADJ usb: gadget: configfs: fix concurrent issue between composite APIs usb: gadget: composite: Fix possible double free memory bug usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode. usb: fsl: Check memory resource before releasing it macsec: fix refcnt leak in module exit routine bonding: fix unexpected IFF_BONDING bit unset ipvs: move old_secure_tcp into struct netns_ipvs ipvs: don't ignore errors in case refcounting ip_vs module fails scsi: qla2xxx: Initialized mailbox to prevent driver load failure scsi: lpfc: Honor module parameter lpfc_use_adisc net: openvswitch: free vport unless register_netdevice() succeeds RDMA/uverbs: Prevent potential underflow scsi: qla2xxx: fixup incorrect usage of host_byte net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq RDMA/qedr: Fix reported firmware version HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 usbip: Implement SG support to vhci-hcd and stub driver usbip: stub_rx: fix static checker warning on unnecessary checks usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path lib/scatterlist: Introduce sgl_alloc() and sgl_free() sched/fair: Fix -Wunused-but-set-variable warnings sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices ARM: dts: dra7: Disable USB metastability workaround for USB2 cpufreq: ti-cpufreq: add missing of_node_put() i2c: omap: Trigger bus recovery in lockup case ASoC: davinci-mcasp: Fix an error handling path in 'davinci_mcasp_probe()' ASoC: davinci: Kill BUG_ON() usage ASoC: davinci-mcasp: Handle return value of devm_kasprintf ASoC: tlv320dac31xx: mark expected switch fall-through mailbox: reset txdone_method TXDONE_BY_POLL if client knows_txdone misc: pci_endpoint_test: Fix BUG_ON error during pci_disable_msi() PCI: dra7xx: Add shutdown handler to cleanly turn off clocks misc: pci_endpoint_test: Prevent some integer overflows mtd: spi-nor: cadence-quadspi: add a delay in write sequence mtd: spi-nor: enable 4B opcodes for mx66l51235l ASoC: tlv320aic31xx: Handle inverted BCLK in non-DSP modes mfd: palmas: Assign the right powerhold mask for tps65917 usb: dwc3: Allow disabling of metastability workaround configfs: fix a deadlock in configfs_symlink() configfs: provide exclusion between IO and removals configfs: new object reprsenting tree fragments configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: stash the data we need into configfs_buffer at open time configfs: Fix bool initialization/comparison can: peak_usb: fix slab info leak can: mcba_usb: fix use-after-free on disconnect can: gs_usb: gs_can_open(): prevent memory leak can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak can: peak_usb: fix a potential out-of-sync while decoding packets can: c_can: c_can_poll(): only read status register after status IRQ can: usb_8dev: fix use-after-free on disconnect intel_th: pci: Add Jasper Lake PCH support intel_th: pci: Add Comet Lake PCH support netfilter: ipset: Fix an error code in ip_set_sockfn_get() netfilter: nf_tables: Align nft_expr private data to 64-bit iio: srf04: fix wrong limitation in distance measuring iio: imu: adis16480: make sure provided frequency is positive iio: adc: stm32-adc: fix stopping dma ceph: add missing check in d_revalidate snapdir handling ceph: fix use-after-free in __ceph_remove_cap() arm64: Do not mask out PTE_RDONLY in pte_same() HID: wacom: generic: Treat serial number and related fields as unsigned drm/radeon: fix si_enable_smc_cac() failed issue perf tools: Fix time sorting tools: gpio: Use !building_out_of_srctree to determine srctree dump_stack: avoid the livelock of the dump_lock mm, vmstat: hide /proc/pagetypeinfo from normal users mm: thp: handle page cache THP correctly in PageTransCompoundMap ALSA: hda/ca0132 - Fix possible workqueue stall ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series ALSA: timer: Fix incorrectly assigned timer instance qede: fix NULL pointer deref in __qede_remove() NFC: st21nfca: fix double free nfc: netlink: fix double device reference drop NFC: fdp: fix incorrect free object net: usb: qmi_wwan: add support for DW5821e with eSIM support net: qualcomm: rmnet: Fix potential UAF when unregistering net: fix data-race in neigh_event_send() net: ethernet: octeon_mgmt: Account for second possible VLAN header ipv4: Fix table id reference in fib_sync_down_addr CDC-NCM: handle incomplete transfer of MTU bonding: fix state transition issue in link monitoring Conflicts: Documentation/devicetree/bindings/usb/dwc3.txt drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c drivers/usb/dwc3/core.h kernel/cpu.c Change-Id: I81b1613324c238a6e612cf1c6cf2c67ca17b4adc Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org> |
||
|
b7f526714d |
This is the 4.14.154 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3K950ACgkQONu9yGCS aT7RzBAAujUlFkoT646z8Zi6najTF9/Bcpkt48h0QbGAlp44OvwzqTv0Vk9cp6hy WdCLIvH5SE2MzokMzIoanr4p2JKTBNk1FxCUYdXgGfETNbraplNRQ9yxyWo2G7Ic koWhEwQO2u7bbalD+kxbfG6w3yQN1mpTzUFzTS5lsK2IMD4WrNk0V2CNEZAPNBf/ QBig/z8ixbEHRvp7uFI/v8W/QyimQ1uVxs9nsVsAlgsQA+hUnywRXaYgeYRjEw+s x7g3FCxuvjzNEpboijSTXiUqbY33HCnSoRB+GC4IzdDQ6MBJ8blC96+jBpQM2Vhf 1nHqMB6G4BRVuC4dI/XQ3KxGxsVGEwgzuQags4mMHgXeudVoYQgntjpJlJyu5h5Y yQ4NZSRiKDY27AO1tnst2jjQwNDGF8dS4FPB/KpCNGjOlzFb351q34Lqj8fEoCt6 O3822FFjA5OhnrlqgRwIuf2qN8po+iJWR2Dg3jZXizs/eO2PBZ1h5tUNDB03MQEr Y43DeRogWCFoaYTET198qBw6pJGNYbmKR2H0LAlGpyJsGkfsSHQVks+KT5EaiCCL 96hRGZ5V1vs/hdqAB1rDMG7AVXQC51zOK5aMjvJ0jzEjRCO2fulOHH0nfaQAGhaD JhZc8eYouLj1p1+3K7yHzYZSWdzDH11182a7aHA5wd8ZTBonr0A= =qwqY -----END PGP SIGNATURE----- Merge 4.14.154 into android-4.14-q Changes in 4.14.154 bonding: fix state transition issue in link monitoring CDC-NCM: handle incomplete transfer of MTU ipv4: Fix table id reference in fib_sync_down_addr net: ethernet: octeon_mgmt: Account for second possible VLAN header net: fix data-race in neigh_event_send() net: qualcomm: rmnet: Fix potential UAF when unregistering net: usb: qmi_wwan: add support for DW5821e with eSIM support NFC: fdp: fix incorrect free object nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free qede: fix NULL pointer deref in __qede_remove() ALSA: timer: Fix incorrectly assigned timer instance ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series ALSA: hda/ca0132 - Fix possible workqueue stall mm: thp: handle page cache THP correctly in PageTransCompoundMap mm, vmstat: hide /proc/pagetypeinfo from normal users dump_stack: avoid the livelock of the dump_lock tools: gpio: Use !building_out_of_srctree to determine srctree perf tools: Fix time sorting drm/radeon: fix si_enable_smc_cac() failed issue HID: wacom: generic: Treat serial number and related fields as unsigned arm64: Do not mask out PTE_RDONLY in pte_same() ceph: fix use-after-free in __ceph_remove_cap() ceph: add missing check in d_revalidate snapdir handling iio: adc: stm32-adc: fix stopping dma iio: imu: adis16480: make sure provided frequency is positive iio: srf04: fix wrong limitation in distance measuring netfilter: nf_tables: Align nft_expr private data to 64-bit netfilter: ipset: Fix an error code in ip_set_sockfn_get() intel_th: pci: Add Comet Lake PCH support intel_th: pci: Add Jasper Lake PCH support can: usb_8dev: fix use-after-free on disconnect can: c_can: c_can_poll(): only read status register after status IRQ can: peak_usb: fix a potential out-of-sync while decoding packets can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak can: gs_usb: gs_can_open(): prevent memory leak can: mcba_usb: fix use-after-free on disconnect can: peak_usb: fix slab info leak configfs: Fix bool initialization/comparison configfs: stash the data we need into configfs_buffer at open time configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: new object reprsenting tree fragments configfs: provide exclusion between IO and removals configfs: fix a deadlock in configfs_symlink() usb: dwc3: Allow disabling of metastability workaround mfd: palmas: Assign the right powerhold mask for tps65917 ASoC: tlv320aic31xx: Handle inverted BCLK in non-DSP modes mtd: spi-nor: enable 4B opcodes for mx66l51235l mtd: spi-nor: cadence-quadspi: add a delay in write sequence misc: pci_endpoint_test: Prevent some integer overflows PCI: dra7xx: Add shutdown handler to cleanly turn off clocks misc: pci_endpoint_test: Fix BUG_ON error during pci_disable_msi() mailbox: reset txdone_method TXDONE_BY_POLL if client knows_txdone ASoC: tlv320dac31xx: mark expected switch fall-through ASoC: davinci-mcasp: Handle return value of devm_kasprintf ASoC: davinci: Kill BUG_ON() usage ASoC: davinci-mcasp: Fix an error handling path in 'davinci_mcasp_probe()' i2c: omap: Trigger bus recovery in lockup case cpufreq: ti-cpufreq: add missing of_node_put() ARM: dts: dra7: Disable USB metastability workaround for USB2 sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices sched/fair: Fix -Wunused-but-set-variable warnings lib/scatterlist: Introduce sgl_alloc() and sgl_free() usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path usbip: stub_rx: fix static checker warning on unnecessary checks usbip: Implement SG support to vhci-hcd and stub driver PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() RDMA/qedr: Fix reported firmware version net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq scsi: qla2xxx: fixup incorrect usage of host_byte RDMA/uverbs: Prevent potential underflow net: openvswitch: free vport unless register_netdevice() succeeds scsi: lpfc: Honor module parameter lpfc_use_adisc scsi: qla2xxx: Initialized mailbox to prevent driver load failure ipvs: don't ignore errors in case refcounting ip_vs module fails ipvs: move old_secure_tcp into struct netns_ipvs bonding: fix unexpected IFF_BONDING bit unset macsec: fix refcnt leak in module exit routine usb: fsl: Check memory resource before releasing it usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode. usb: gadget: composite: Fix possible double free memory bug usb: gadget: configfs: fix concurrent issue between composite APIs usb: dwc3: remove the call trace of USBx_GFLADJ perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) USB: Skip endpoints with 0 maxpacket length USB: ldusb: use unsigned size format specifiers RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case scsi: qla2xxx: stop timer in shutdown path fjes: Handle workqueue allocation failure net: hisilicon: Fix "Trying to free already-free IRQ" hv_netvsc: Fix error handling in netvsc_attach() NFSv4: Don't allow a cached open with a revoked delegation net: ethernet: arc: add the missed clk_disable_unprepare igb: Fix constant media auto sense switching when no cable is connected e1000: fix memory leaks x86/apic: Move pending interrupt check code into it's own function x86/apic: Drop logical_smp_processor_id() inline x86/apic/32: Avoid bogus LDR warnings can: flexcan: disable completely the ECC mechanism mm/filemap.c: don't initiate writeback if mapping has no dirty pages cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead usbip: Fix free of unallocated memory in vhci tx net: prevent load/store tearing on sk->sk_stamp drm/i915/gtt: Add read only pages to gen8_pte_encode drm/i915/gtt: Read-only pages for insert_entries on bdw+ drm/i915/gtt: Disable read-only support under GVT drm/i915: Prevent writing into a read-only object via a GGTT mmap drm/i915/cmdparser: Check reg_table_count before derefencing. drm/i915/cmdparser: Do not check past the cmd length. drm/i915: Silence smatch for cmdparser drm/i915: Don't use GPU relocations prior to cmdparser stalls drm/i915: Move engine->needs_cmd_parser to engine->flags drm/i915: Rename gen7 cmdparser tables drm/i915: Disable Secure Batches for gen6+ drm/i915: Remove Master tables from cmdparser drm/i915: Add support for mandatory cmdparsing drm/i915: Support ro ppgtt mapped cmdparser shadow buffers drm/i915: Allow parsing of unsized batches drm/i915: Add gen9 BCS cmdparsing drm/i915/cmdparser: Use explicit goto for error paths drm/i915/cmdparser: Add support for backward jumps drm/i915/cmdparser: Ignore Length operands during command matching drm/i915: Lower RM timeout to avoid DSI hard hangs drm/i915/gen8+: Add RC6 CTX corruption WA drm/i915/cmdparser: Fix jump whitelist clearing KVM: x86: use Intel speculation bugs and features as derived in generic x86 code x86/msr: Add the IA32_TSX_CTRL MSR x86/cpu: Add a helper function x86_read_arch_cap_msr() x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default x86/speculation/taa: Add mitigation for TSX Async Abort x86/speculation/taa: Add sysfs reporting for TSX Async Abort kvm/x86: Export MDS_NO=0 to guests when TSX is enabled x86/tsx: Add "auto" option to the tsx= cmdline parameter x86/speculation/taa: Add documentation for TSX Async Abort x86/tsx: Add config options to set tsx=on|off|auto x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs x86/bugs: Add ITLB_MULTIHIT bug infrastructure x86/cpu: Add Tremont to the cpu vulnerability whitelist cpu/speculation: Uninline and export CPU mitigations helpers Documentation: Add ITLB_MULTIHIT documentation kvm: x86, powerpc: do not allow clearing largepages debugfs entry kvm: Convert kvm_lock to a mutex kvm: mmu: Do not release the page inside mmu_set_spte() KVM: x86: make FNAME(fetch) and __direct_map more similar KVM: x86: remove now unneeded hugepage gfn adjustment KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON KVM: x86: add tracepoints around __direct_map and FNAME(fetch) KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active kvm: mmu: ITLB_MULTIHIT mitigation kvm: Add helper function for creating VM worker threads kvm: x86: mmu: Recovery of shattered NX large pages Linux 4.14.154 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
d527ab46d1 |
cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
commit 65de03e251382306a4575b1779c57c87889eee49 upstream. cgroup writeback tries to refresh the associated wb immediately if the current wb is dead. This is to avoid keeping issuing IOs on the stale wb after memcg - blkcg association has changed (ie. when blkcg got disabled / enabled higher up in the hierarchy). Unfortunately, the logic gets triggered spuriously on inodes which are associated with dead cgroups. When the logic is triggered on dead cgroups, the attempt fails only after doing quite a bit of work allocating and initializing a new wb. While c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages") alleviated the issue significantly as it now only triggers when the inode has dirty pages. However, the condition can still be triggered before the inode is switched to a different cgroup and the logic simply doesn't make sense. Skip the immediate switching if the associated memcg is dying. This is a simplified version of the following two patches: * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/ * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks") Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
0f5052ff8d |
writeback: Fix NULL pointer dereference issue with inode->i_wb
There is a potential race between evict() and __mark_inode_dirty() observed with EXT4 FS. The ext4/jbd2 context is marking the inode as dirty while at the same time evict() is happening on another core. Due to this race, __mark_inode_dirty() is adding the inode to bdi_writeback IO dirty list, after evict() path has removed this inode from the writeback list. This results NULL pointer dereference issue when accessing inode->i_wb from writeback thread as below - locked_inode_to_wb_and_lock_list+0x2c/0x2a0 inode_to_wb_and_lock_list+0x24/0x30 writeback_sb_inodes+0x2cc/0x444 __writeback_inodes_wb+0x78/0xbc wb_writeback+0x1c8/0x3d0 wb_workfn+0x1a8/0x4d4 process_one_work+0x1c0/0x3d4 worker_thread+0x224/0x344 kthread+0x120/0x130 ret_from_fork+0x10/0x18 Fix this by checking for the inode->i_state I_WILL_FREE or I_FREEING before and after adding to the writeback list. If it is observed to be in freeing path before adding, then simply return. If it is after adding to the writeback list, then remove this inode from the writeback list in __mark_inode_dirty() context itself. Change-Id: I1e5f15d81dae01f0dd7a108b1861adc83105ed7b Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> |
||
|
1391d3b9b2 |
This is the 4.14.135 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1BJxsACgkQONu9yGCS aT4wBxAAymuWVXtmeWFQSFNji/RAJcHBAOvydIRMr7vwCXpojuRerNolo7WibM/B Mgx2OISn0d8rg98Cc3wiM6WUN9AeHr3lSWXORg3iBr0zP+ZO5Vs0Y2w9gueEJS+i egMvi2KZyS3Esrfmxv62pJ9DIVqyPVlvzN/Y79BARcwIeZOt+puycR5XV3WROzX9 Wy2JBz5f56m9qzPGKXGRLlvq7LghZ5EbyFoIb/fj9K6pFdVBrpSEOeocCQos9IEz 0+1TiWAkqOGLGZWJ3CFW/6Nbn1JO3hZpIgqxVczZXR+4UVhR+yniHUzZ20g89DzE mmprjKGv/8/7pXyXtGhjXuaZN5r1ldUje5SZf1X7SzxLuABSKIHykYJjKUQY2O3b 8tpPULGA77V7Ww4TtyRLeOVPqaVslWFgLP6snyileSdoxfISebo2KptQn0pmuFX2 Y0ePPot/aHHXmhrn5mAY9UZO9etqko8LjvVHDOsQQ99GJJ1BAz73w+wkKDtHXGuo iqUlSSW2YpThnAkufUlyhk10y6itGmy0P7GSrw8PCd9As2/LAz6c9+8+NPp/2P2Z Ffl2q7eUCqb0HixAnq5KqcPDSVdyqVtQ7XeN3lAEWVGmwpiu2xyuZgpQyT5FRqOZ mLYHZJF7FEZOZo+hkbH4O6j3umJ0QFJakVwrEiQ/ha0yLZpS3OM= =u0hP -----END PGP SIGNATURE----- Merge 4.14.135 into android-4.14-q Changes in 4.14.135 MIPS: ath79: fix ar933x uart parity mode MIPS: fix build on non-linux hosts arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported dmaengine: imx-sdma: fix use-after-free on probe error path wil6210: fix potential out-of-bounds read ath10k: Do not send probe response template for mesh ath9k: Check for errors when reading SREV register ath6kl: add some bounds checking ath: DFS JP domain W56 fixed pulse type 3 RADAR detection batman-adv: fix for leaked TVLV handler. media: dvb: usb: fix use after free in dvb_usb_device_exit media: spi: IR LED: add missing of table registration crypto: talitos - fix skcipher failure due to wrong output IV media: marvell-ccic: fix DMA s/g desc number calculation media: vpss: fix a potential NULL pointer dereference media: media_device_enum_links32: clean a reserved field net: stmmac: dwmac1000: Clear unused address entries net: stmmac: dwmac4/5: Clear unused address entries qed: Set the doorbell address correctly signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig af_key: fix leaks in key_pol_get_resp and dump_sp. xfrm: Fix xfrm sel prefix length validation fscrypt: clean up some BUG_ON()s in block encryption/decryption media: mc-device.c: don't memset __user pointer contents media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails. net: phy: Check against net_device being NULL crypto: talitos - properly handle split ICV. crypto: talitos - Align SEC1 accesses to 32 bits boundaries. tua6100: Avoid build warnings. locking/lockdep: Fix merging of hlocks with non-zero references media: wl128x: Fix some error handling in fm_v4l2_init_video_device() cpupower : frequency-set -r option misses the last cpu in related cpu list net: stmmac: dwmac4: fix flow control issue net: fec: Do not use netdev messages too early net: axienet: Fix race condition causing TX hang s390/qdio: handle PENDING state for QEBSM devices RAS/CEC: Fix pfn insertion net: sfp: add mutex to prevent concurrent state checks ipset: Fix memory accounting for hash types on resize perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode perf test 6: Fix missing kvm module load for s390 media: fdp1: Support M3N and E3 platforms iommu: Fix a leak in iommu_insert_resv_region gpio: omap: fix lack of irqstatus_raw0 for OMAP4 gpio: omap: ensure irq is enabled before wakeup regmap: fix bulk writes on paged registers bpf: silence warning messages in core rcu: Force inlining of rcu_read_lock() x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration xfrm: fix sa selector validation sched/core: Add __sched tag for io_schedule() x86/atomic: Fix smp_mb__{before,after}_atomic() perf evsel: Make perf_evsel__name() accept a NULL argument vhost_net: disable zerocopy by default ipoib: correcly show a VF hardware address EDAC/sysfs: Fix memory leak when creating a csrow object ipsec: select crypto ciphers for xfrm_algo ipvs: defer hook registration to avoid leaks media: s5p-mfc: Make additional clocks optional media: i2c: fix warning same module names ntp: Limit TAI-UTC offset timer_list: Guard procfs specific code acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 media: coda: fix mpeg2 sequence number handling media: coda: fix last buffer handling in V4L2_ENC_CMD_STOP media: coda: increment sequence offset for the last returned frame media: vimc: cap: check v4l2_fill_pixfmt return value media: hdpvr: fix locking and a missing msleep rtlwifi: rtl8192cu: fix error handle when usb probe failed mt7601u: do not schedule rx_tasklet when the device has been disconnected x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c mt7601u: fix possible memory leak when the device is disconnected ipvs: fix tinfo memory leak in start_sync_thread ath10k: add missing error handling ath10k: fix PCIE device wake up failed perf tools: Increase MAX_NR_CPUS and MAX_CACHES libata: don't request sense data on !ZAC ATA devices clocksource/drivers/exynos_mct: Increase priority over ARM arch timer rslib: Fix decoding of shortened codes rslib: Fix handling of of caller provided syndrome ixgbe: Check DDM existence in transceiver before access crypto: serpent - mark __serpent_setkey_sbox noinline crypto: asymmetric_keys - select CRYPTO_HASH where needed EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() net: hns3: fix a -Wformat-nonliteral compile warning net: hns3: add some error checking in hclge_tm module ath10k: destroy sdio workqueue while remove sdio module iwlwifi: mvm: Drop large non sta frames perf stat: Make metric event lookup more robust net: usb: asix: init MAC address buffers gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants Bluetooth: hci_bcsp: Fix memory leak in rx_skb Bluetooth: 6lowpan: search for destination address in all peers Bluetooth: Check state in l2cap_disconnect_rsp gtp: add missing gtp_encap_disable_sock() in gtp_encap_enable() Bluetooth: validate BLE connection interval updates gtp: fix suspicious RCU usage gtp: fix Illegal context switch in RCU read-side critical section. gtp: fix use-after-free in gtp_encap_destroy() gtp: fix use-after-free in gtp_newlink() net: mvmdio: defer probe of orion-mdio if a clock is not ready iavf: fix dereference of null rx_buffer pointer floppy: fix div-by-zero in setup_format_params floppy: fix out-of-bounds read in next_valid_format floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in copy_buffer xen: let alloc_xenballooned_pages() fail if not enough memory free scsi: NCR5380: Reduce goto statements in NCR5380_select() scsi: NCR5380: Always re-enable reselection interrupt Revert "scsi: ncr5380: Increase register polling limit" scsi: core: Fix race on creating sense cache scsi: megaraid_sas: Fix calculation of target ID scsi: mac_scsi: Increase PIO/PDMA transfer length threshold scsi: mac_scsi: Fix pseudo DMA implementation, take 2 crypto: ghash - fix unaligned memory access in ghash_setkey() crypto: ccp - Validate the the error value used to index error messages crypto: arm64/sha1-ce - correct digest for empty data in finup crypto: arm64/sha2-ce - correct digest for empty data in finup crypto: chacha20poly1305 - fix atomic sleep when using async algorithm crypto: ccp - memset structure fields to zero before reuse crypto: ccp/gcm - use const time tag comparison. crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe Input: gtco - bounds check collection indent level Input: alps - don't handle ALPS cs19 trackpoint-only device Input: synaptics - whitelist Lenovo T580 SMBus intertouch Input: alps - fix a mismatch between a condition check and its comment regulator: s2mps11: Fix buck7 and buck8 wrong voltages arm64: tegra: Update Jetson TX1 GPU regulator timings iwlwifi: pcie: don't service an interrupt that was masked iwlwifi: pcie: fix ALIVE interrupt handling for gen2 devices w/o MSI-X NFSv4: Handle the special Linux file open access mode pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE ASoC: dapm: Adapt for debugfs API change ALSA: seq: Break too long mutex context in the write loop ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom() media: coda: Remove unbalanced and unneeded mutex unlock KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed arm64: tegra: Fix AGIC register range fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes. drm/nouveau/i2c: Enable i2c pads & busses during preinit padata: use smp_mb in padata_reorder to avoid orphaned padata jobs dm zoned: fix zone state management race xen/events: fix binding user event channels to cpus 9p/xen: Add cleanup path in p9_trans_xen_init 9p/virtio: Add cleanup path in p9_virtio_init x86/boot: Fix memory leak in default_get_smp_config() perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs intel_th: pci: Add Ice Lake NNPI support PCI: Do not poll for PME if the device is in D3cold Btrfs: fix data loss after inode eviction, renaming it, and fsync it Btrfs: fix fsync not persisting dentry deletions due to inode evictions Btrfs: add missing inode version, ctime and mtime updates when punching hole HID: wacom: generic: only switch the mode on devices with LEDs HID: wacom: correct touch resolution x/y typo libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields coda: pass the host file in vma->vm_file on mmap gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM PCI: hv: Fix a use-after-free bug in hv_eject_device_work() crypto: caam - limit output IV to CBC to work around CTR mode DMA issue parisc: Ensure userspace privilege for ptraced processes in regset functions parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1 powerpc/32s: fix suspend/resume when IBATs 4-7 are used powerpc/watchpoint: Restore NV GPRs while returning from exception eCryptfs: fix a couple type promotion bugs intel_th: msu: Fix single mode with disabled IOMMU Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug usb: Handle USB3 remote wakeup for LPM enabled devices correctly net: mvmdio: allow up to four clocks to be specified for orion-mdio dt-bindings: allow up to four clocks for orion-mdio dm bufio: fix deadlock with loop device compiler.h, kasan: Avoid duplicating __read_once_size_nocheck() compiler.h: Add read_word_at_a_time() function. lib/strscpy: Shut up KASAN false-positives in strscpy() bnx2x: Prevent load reordering in tx completion processing bnx2x: Prevent ptp_task to be rescheduled indefinitely caif-hsi: fix possible deadlock in cfhsi_exit_module() igmp: fix memory leak in igmpv3_del_delrec() ipv4: don't set IPv6 only flags to IPv4 addresses net: bcmgenet: use promisc for unsupported filters net: dsa: mv88e6xxx: wait after reset deactivation net: neigh: fix multiple neigh timer scheduling net: openvswitch: fix csum updates for MPLS actions nfc: fix potential illegal memory access rxrpc: Fix send on a connected, but unbound socket sky2: Disable MSI on ASUS P6T vrf: make sure skb->data contains ip header to make routing macsec: fix use-after-free of skb during RX macsec: fix checksumming after decryption netrom: fix a memory leak in nr_rx_frame() netrom: hold sock when setting skb->destructor bonding: validate ip header before check IPPROTO_IGMP net: make skb_dst_force return true when dst is refcounted tcp: fix tcp_set_congestion_control() use from bpf hook tcp: Reset bytes_acked and bytes_received when disconnecting net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query net: bridge: stp: don't cache eth dest pointer before skb pull dma-buf: balance refcount inbalance dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc MIPS: lb60: Fix pin mappings ext4: don't allow any modifications to an immutable file ext4: enforce the immutable flag on open files mm: add filemap_fdatawait_range_keep_errors() jbd2: introduce jbd2_inode dirty range scoping ext4: use jbd2_inode dirty range scoping ext4: allow directory holes mm: vmscan: scan anonymous pages on file refaults perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id NFSv4: Fix open create exclusive when the server reboots nfsd: increase DRC cache limit nfsd: give out fewer session slots as limit approaches nfsd: fix performance-limiting session calculation nfsd: Fix overflow causing non-working mounts on 1 TB machines hvsock: fix epollout hang from race condition drm/panel: simple: Fix panel_simple_dsi_probe usb: core: hub: Disable hub-initiated U1/U2 tty: max310x: Fix invalid baudrate divisors calculator pinctrl: rockchip: fix leaked of_node references tty: serial: cpm_uart - fix init when SMC is relocated drm/edid: Fix a missing-check bug in drm_load_edid_firmware() PCI: Return error if cannot probe VF drm/bridge: tc358767: read display_props in get_modes() drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz drm/crc-debugfs: User irqsafe spinlock in drm_crtc_add_crc_entry memstick: Fix error cleanup path of memstick_init tty/serial: digicolor: Fix digicolor-usart already registered warning tty: serial: msm_serial: avoid system lockup condition serial: 8250: Fix TX interrupt handling condition drm/virtio: Add memory barriers for capset cache. phy: renesas: rcar-gen2: Fix memory leak at error paths powerpc/pseries/mobility: prevent cpu hotplug during DT update drm/rockchip: Properly adjust to a true clock in adjusted_mode tty: serial_core: Set port active bit in uart_port_activate usb: gadget: Zero ffs_io_data powerpc/pci/of: Fix OF flags parsing for 64bit BARs drm/msm: Depopulate platform on probe failure serial: mctrl_gpio: Check if GPIO property exisits before requesting it PCI: sysfs: Ignore lockdep for remove attribute kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS PCI: xilinx-nwl: Fix Multi MSI data programming iio: iio-utils: Fix possible incorrect mask calculation powerpc/xmon: Fix disabling tracing while in xmon recordmcount: Fix spurious mcount entries on powerpc mfd: core: Set fwnode for created devices mfd: arizona: Fix undefined behavior mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk um: Silence lockdep complaint about mmap_sem powerpc/4xx/uic: clear pending interrupt after irq type/pol change RDMA/i40iw: Set queue pair state when being queried serial: sh-sci: Terminate TX DMA during buffer flushing serial: sh-sci: Fix TX DMA buffer flushing and workqueue races kallsyms: exclude kasan local symbols on s390 perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning perf session: Fix potential NULL pointer dereference found by the smatch tool perf annotate: Fix dereferencing freed memory found by the smatch tool RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h f2fs: avoid out-of-range memory access mailbox: handle failed named mailbox channel request powerpc/eeh: Handle hugepages in ioremap space block/bio-integrity: fix a memory leak bug sh: prevent warnings when using iounmap mm/kmemleak.c: fix check for softirq context 9p: pass the correct prototype to read_cache_page mm/gup.c: mark undo_dev_pagemap as __maybe_unused mm/gup.c: remove some BUG_ONs from get_gate_page() mm/mmu_notifier: use hlist_add_head_rcu() locking/lockdep: Fix lock used or unused stats error locking/lockdep: Hide unused 'class' variable drm/crc: Only report a single overflow when a CRC fd is opened drm/crc-debugfs: Also sprinkle irqrestore over early exits usb: wusbcore: fix unbalanced get/put cluster_id usb: pci-quirks: Correct AMD PLL quirk detection KVM: nVMX: do not use dangling shadow VMCS after guest reset btrfs: inode: Don't compress if NODATASUM or NODATACOW set x86/sysfb_efi: Add quirks for some devices with swapped width and height x86/speculation/mds: Apply more accurate check on hypervisor platform binder: prevent transactions to context manager from its own process. fpga-manager: altera-ps-spi: Fix build error hpet: Fix division by zero in hpet_time_div() ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1 ALSA: hda - Add a conexant codec entry to let mute led work powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() powerpc/tm: Fix oops on sigreturn on systems without TM access: avoid the RCU grace period for the temporary subjective credentials Linux 4.14.135 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
fa9900ef8f |
blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
[ Upstream commit 6631142229005e1b1c311a09efe9fb3cfdac8559 ] wbc_account_io() collects information on cgroup ownership of writeback pages to determine which cgroup should own the inode. Pages can stay associated with dead memcgs but we want to avoid attributing IOs to dead blkcgs as much as possible as the association is likely to be stale. However, currently, pages associated with dead memcgs contribute to the accounting delaying and/or confusing the arbitration. Fix it by ignoring pages associated with dead memcgs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
817de622e8 |
This is the 4.14.121 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzkLE0ACgkQONu9yGCS aT4ZkhAAoiWU5PcwwGtVonrlrtUlwFF14ZyDOMh1hkK7EWHWE+8r3In4BnUz6nW/ wA5IZvIClDPn0go44uhGdZG078q7EqBnY2nhTEseG7XYTWSkIOaMsbcW8f5NFkCC Z9x7KKIIDRGt/uVNKXgIk5nqmFP7ycNgxUfq3bxMkproFLgeHmFihG43YC0O62b2 nAF/q8OVpONqU9zPGwdVoBY+LQIfhsJi04Raoexr4+UFkvoUZF5zDKl6QZVPCXXT ETi7CXqntfFt92S6Y4rQfZe883oYFfWzi7GFhNL/oU4TMYDG+J8/PBS4rG3nosSp Lk81SCmTkAaOhG0rBvdkZFthHibGk3+kKuGWehvAhb5qFEJx+znsbwTVWIPTchAc axxfHOpW1X2rfrPnH/hkHb5unuJTfolquBmmy2D1Glv46LvI19rn1xgHtyGlb5dt 84Gh8Bew372LkUeG7+CCsCKOuMu/8YuvAZ3DMntwGPo7GAnC052MqcpdyV+pj78z 2y7mO8g9BVizaf5NkoZrf58KuSZDTLf1TfTRKHQVvTuxhzrnt/UIUF/BQmY216kd pEFp1Qq3zAwTaQgCV6s1ZWGHVidFIPQo7xtFND7MIZQYaZfFZivYS3AVdlox1KGd k2Rsb/Ub2R/KRrfMdjgIkNbEzauOS9miQTjwMr7zt2AsZJFXDwQ= =CsUM -----END PGP SIGNATURE----- Merge 4.14.121 into android-4.14-q Changes in 4.14.121 net: core: another layer of lists, around PF_MEMALLOC skb handling locking/rwsem: Prevent decrement of reader count before increment PCI: hv: Fix a memory leak in hv_eject_device_work() PCI: hv: Add hv_pci_remove_slots() when we unload the driver PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary x86/speculation/mds: Revert CPU buffer clear on double fault exit x86/speculation/mds: Improve CPU buffer clear documentation objtool: Fix function fallthrough detection ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260 ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3 ARM: exynos: Fix a leaked reference by adding missing of_node_put power: supply: axp288_charger: Fix unchecked return value arm64: compat: Reduce address limit arm64: Clear OSDLR_EL1 on CPU boot arm64: Save and restore OSDLR_EL1 across suspend/resume sched/x86: Save [ER]FLAGS on context switch crypto: chacha20poly1305 - set cra_name correctly crypto: vmx - fix copy-paste error in CTR mode crypto: skcipher - don't WARN on unprocessed data after slow walk step crypto: crct10dif-generic - fix use via crypto_shash_digest() crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest() crypto: gcm - fix incompatibility between "gcm" and "gcm_base" crypto: rockchip - update IV buffer to contain the next IV crypto: arm/aes-neonbs - don't access already-freed walk.iv ALSA: usb-audio: Fix a memory leak bug ALSA: hda/hdmi - Read the pin sense from register when repolling ALSA: hda/hdmi - Consider eld_valid when reporting jack event ALSA: hda/realtek - EAPD turn on later ASoC: max98090: Fix restore of DAPM Muxes ASoC: RT5677-SPI: Disable 16Bit SPI Transfers bpf, arm64: remove prefetch insn in xadd mapping mm/mincore.c: make mincore() more conservative ocfs2: fix ocfs2 read inode data panic in ocfs2_iget userfaultfd: use RCU to free the task struct when fork fails mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0 tty/vt: fix write/write race in ioctl(KDSKBSENT) handler jbd2: check superblock mapped prior to committing ext4: make sanity check in mballoc more strict ext4: ignore e_value_offs for xattrs with value-in-ea-inode ext4: avoid drop reference to iloc.bh twice Btrfs: do not start a transaction during fiemap Btrfs: do not start a transaction at iterate_extent_inodes() bcache: fix a race between cache register and cacheset unregister bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() ext4: fix use-after-free race with debug_want_extra_isize ext4: actually request zeroing of inode table after grow ext4: fix ext4_show_options for file systems w/o journal ipmi:ssif: compare block number correctly for multi-part return messages crypto: arm64/aes-neonbs - don't access already-freed walk.iv crypto: salsa20 - don't access already-freed walk.iv crypto: ccm - fix incompatibility between "ccm" and "ccm_base" fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...") fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount ext4: zero out the unused memory region in the extent tree block ext4: fix data corruption caused by overlapping unaligned and aligned IO ext4: fix use-after-free in dx_release() ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes iov_iter: optimize page_copy_sane() ext4: fix compile error when using BUFFER_TRACE Linux 4.14.121 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
f658822cd3 |
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream. synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb switch may not go to the workqueue after synchronize_rcu(). Thus previous scheduled switches was not finished even flushing the workqueue, which will cause a NULL pointer dereferenced followed below. VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day... BUG: unable to handle kernel NULL pointer dereference at 0000000000000278 evict+0xb3/0x180 iput+0x1b0/0x230 inode_switch_wbs_work_fn+0x3c0/0x6a0 worker_thread+0x4e/0x490 ? process_one_work+0x410/0x410 kthread+0xe6/0x100 ret_from_fork+0x39/0x50 Replace the synchronize_rcu() call with a rcu_barrier() to wait for all pending callbacks to finish. And inc isw_nr_in_flight after call_rcu() in inode_switch_wbs() to make more sense. Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Suggested-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
9ba09a2171 |
This is the 4.14.105 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx+qpsACgkQONu9yGCS aT6+Aw//WiHXgsyk+jnwuOzJ0rvx38ky6l7RRm6uZ5r2M+aYQJ12FEc+AVomyu6T lA7ciLeIWNQFO0xQJ5Tg8LMPumL4JbA/EVElPu5h5yHMXySkQuC6WLXnH083d+dZ QbXEk6O9xuIxbk2sy58BmQnmqdXipTnpNIXpT/137Rlz1/jzxyv8PpsihitFI9/C 3HrMHgSodKEOnZbWdruYv30Ac6nMrWolGmGaQiRszE7FfoFLyGl3KEX7kp6sMAKU P4L+9qUDswRVs4Zfa07XiNlXAmZUYDzZPkQVstsgKOmXNHtBrSLE94pYllsHqW8C GDbtW5+ZpUm89LObhlZWPcxLXQkOUJUOb2SJChtaPUmOT0rEoExTB0bH1SADQM5d Mb3jLGD9CVrCjgxB4J6871xeCZPIL1XIXEKs0lDz7YDOSF7WTYXTrQOAiA+PZLlR VcD7WO7Z25XYP15WcU4ypzswsLPGro3QQi/nzk9qyh/NQt65LlpW/USZ3hiyp/Sh u4Wh2rrvAWyrOi01ShRmn87S9X/IuzB9cuzthxM7rWrF+/uj+vHOQp+/BAEFkkub njis6oFbkB2xfykRA99w1tLe6to3qNcWFtJuOBxjVFdbB24pqIWYmj4dwyE0TQCD 7QLfDCXx3vBDAg30Qbxx4Bh2w0ruggtHWa+D24Qz4YehJv1ppNY= =Ofi4 -----END PGP SIGNATURE----- Merge 4.14.105 into android-4.14 Changes in 4.14.105 Revert "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()" Revert "loop: Get rid of loop_index_mutex" Revert "loop: Fold __loop_release into loop_release" net: stmmac: Fix reception of Broadcom switches tags net: stmmac: Disable ACS Feature for GMAC >= 4 scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached drm/msm: Unblock writer if reader closes file ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field ALSA: compress: prevent potential divide by zero bugs ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized clk: vc5: Abort clock configuration without upstream clock thermal: int340x_thermal: Fix a NULL vs IS_ERR() check usb: dwc3: gadget: synchronize_irq dwc irq in suspend usb: dwc3: gadget: Fix the uninitialized link_state when udc starts usb: gadget: Potential NULL dereference on allocation error genirq: Make sure the initial affinity is not empty ASoC: dapm: change snprintf to scnprintf for possible overflow ASoC: imx-audmux: change snprintf to scnprintf for possible overflow selftests: seccomp: use LDLIBS instead of LDFLAGS selftests: gpio-mockup-chardev: Check asprintf() for error ARC: fix __ffs return value to avoid build warnings drivers: thermal: int340x_thermal: Fix sysfs race condition staging: rtl8723bs: Fix build error with Clang when inlining is disabled mac80211: fix miscounting of ttl-dropped frames sched/wait: Fix rcuwait_wake_up() ordering futex: Fix (possible) missed wakeup locking/rwsem: Fix (possible) missed wakeup drm/amd/powerplay: OD setting fix on Vega10 serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling staging: android: ion: Support cpu access during dma_buf_detach direct-io: allow direct writes to empty inodes writeback: synchronize sync(2) against cgroup writeback membership switches scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state() net: altera_tse: fix connect_local_phy error path hv_netvsc: Fix ethtool change hash key error net: usb: asix: ax88772_bind return error when hw_reset fail net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP ibmveth: Do not process frames after calling napi_reschedule mac80211: don't initiate TDLS connection if station is not associated to AP mac80211: Add attribute aligned(2) to struct 'action' cfg80211: extend range deviation for DMG svm: Fix AVIC incomplete IPI emulation KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 powerpc: Always initialize input array when calling epapr_hypercall() mmc: spi: Fix card detection during probe mmc: tmio_mmc_core: don't claim spurious interrupts mmc: tmio: fix access width of Block Count Register mmc: sdhci-esdhc-imx: correct the fix of ERR004536 mm: enforce min addr even if capable() in expand_downwards() MIPS: fix truncation in __cmpxchg_small for short values MIPS: eBPF: Fix icache flush end address x86/uaccess: Don't leak the AC flag into __put_user() value evaluation Linux 4.14.105 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
494c4399ef |
writeback: synchronize sync(2) against cgroup writeback membership switches
[ Upstream commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 ] sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
04f740d4da |
This is the 4.14.41 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlr753gACgkQONu9yGCS aT7p/Q//TIC9EKe21E2Lb1Kh4lL5SDjmwe/rkA3PxiqxbkXfUDBehMCfDk4YVNVG TlH1TXOubzpS/8cZJPRFHEkrYXPKIA3+hKlAvJukUJCBQqmW1ILEAX5m7jrSmf+B tLe/r0ijOtlfB1xQdUs5RxXGIndw0gMGhpo/QTXPAC0hGh0Ykd8v2s4YAjxOvdKw z4DaUKtZGEPBWFVK/Bx1Fv3iAmJMt2yerERUqz8MVegYXJt+2RUGoJtsxHuvOk1p 9q0lzHBWYihQVt1tJ0es/8cB7WsYt8txnVmeN907sryUhDjvTWIxQJb5jEV0gxxK AL89PHy4Hfki6l6r+tqYi92frFda8aLfsaSseOhlmqsv0MlwngW2dx3UbjaYd4If IQA6n0hWHuxUvjrjsPpsMAa4lvTW+/kFilb0mD6Vixy3ru+/RelKnuawJm6kbMNu Cb8QSVSJrhvC/UZLvwO7a3viJdKoI5B9pTh5FTKcY5wUPI1k01pg3WlWNxmnv4ZJ LPImR06aoJYhvbutf94AvxbCOt/au8sY4s/yk9oHgvGUEIccrGYf3BwX6ciWRt4b r4ZN92C9ZuD+u/ATFgi/akngtjjixw5YrZ20aX86dYcBZ25hYOiIMoc482tYQ12Z 1vqyvKg9o1oMypG9orF09PWstbNRu3ihGATKdXL9lfAhDklOTKc= =zWTK -----END PGP SIGNATURE----- Merge 4.14.41 into android-4.14 Changes in 4.14.41 ipvs: fix rtnl_lock lockups caused by start_sync_thread netfilter: ebtables: don't attempt to allocate 0-sized compat array kcm: Call strp_stop before strp_done in kcm_attach crypto: af_alg - fix possible uninit-value in alg_bind() netlink: fix uninit-value in netlink_sendmsg net: fix rtnh_ok() net: initialize skb->peeked when cloning net: fix uninit-value in __hw_addr_add_ex() dccp: initialize ireq->ir_mark ipv4: fix uninit-value in ip_route_output_key_hash_rcu() soreuseport: initialise timewait reuseport field inetpeer: fix uninit-value in inet_getpeer memcg: fix per_node_info cleanup perf: Remove superfluous allocation error check tcp: fix TCP_REPAIR_QUEUE bound checking bdi: wake up concurrent wb_shutdown() callers. bdi: Fix oops in wb_workfn() KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing arm64: Add work around for Arm Cortex-A55 Erratum 1024718 compat: fix 4-byte infoleak via uninitialized struct field gpioib: do not free unrequested descriptors gpio: fix aspeed_gpio unmask irq gpio: fix error path in lineevent_create rfkill: gpio: fix memory leak in probe error path libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs dm integrity: use kvfree for kvmalloc'd memory tracing: Fix regex_match_front() to not over compare the test string z3fold: fix reclaim lock-ups mm: sections are not offlined during memory hotremove mm, oom: fix concurrent munlock and oom reaper unmap, v3 ceph: fix rsize/wsize capping in ceph_direct_read_write() can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg() can: hi311x: Acquire SPI lock on ->do_get_berr_counter can: hi311x: Work around TX complete interrupt erratum drm/vc4: Fix scaling of uni-planar formats drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log drm/nouveau: Fix deadlock in nv50_mstm_register_connector() drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear() drm/atomic: Clean private obj old_state/new_state in drm_atomic_state_default_clear() net: atm: Fix potential Spectre v1 atm: zatm: Fix potential Spectre v1 PCI / PM: Always check PME wakeup capability for runtime wakeup support PCI / PM: Check device_may_wakeup() in pci_enable_wake() cpufreq: schedutil: Avoid using invalid next_freq Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174" Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets thermal: exynos: Reading temperature makes sense only when TMU is turned on thermal: exynos: Propagate error value from tmu_read() nvme: add quirk to force medium priority for SQ creation smb3: directory sync should not return an error sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] tracing/uprobe_event: Fix strncpy corner case perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler KVM: x86: remove APIC Timer periodic/oneshot spikes Linux 4.14.41 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
683b4520d0 |
bdi: Fix oops in wb_workfn()
commit b8b784958eccbf8f51ebeee65282ca3fd59ea391 upstream. Syzbot has reported that it can hit a NULL pointer dereference in wb_workfn() due to wb->bdi->dev being NULL. This indicates that wb_workfn() was called for an already unregistered bdi which should not happen as wb_shutdown() called from bdi_unregister() should make sure all pending writeback works are completed before bdi is unregistered. Except that wb_workfn() itself can requeue the work with: mod_delayed_work(bdi_wq, &wb->dwork, 0); and if this happens while wb_shutdown() is waiting in: flush_delayed_work(&wb->dwork); the dwork can get executed after wb_shutdown() has finished and bdi_unregister() has cleared wb->bdi->dev. Make wb_workfn() use wakeup_wb() for requeueing the work which takes all the necessary precautions against racing with bdi unregistration. CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> CC: Tejun Heo <tj@kernel.org> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
e9a2c5dd1a |
This is the 4.14.36 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlre3ogACgkQONu9yGCS aT4a8Q//aR1U1nYUiiMwMTgxvWXR5Hic3jtnAxdOkpr6UNa5dDa1tijI5U9poKJW 65EPPQNW29PxIv0UGhLRzdjpL/ac2QhMyW8gmS8ikXMbFPF2JrgvOLSZpWF70cE1 hyFkzbnvavJe0QfWsii7Z+RdrSZMgfZheZMmLh1exv1tEmuYcfAiletdC8f5kTPU /aS5X9rmJM/Fyw4iQF7NEpYPY4vESsgMd7ZfHifcV07ze6f+lkW+gcKZuVi//eJ1 NJEvSBjSvqbQoHugHvHbV/UM2RwzFFfihm6y94WOurSbToksJ141P/MEBxc9vDae rCA8Qwq3YZ8vPu5rb8L1UHlpR+CIuanSJnijBhC2Lh6W4CmVA70+lvveqMbZGi/X Tm9+QlV4F32ogOy+rNvFARoNx7KkWvjZ8kF2a/qgbkqQgPCwSku4anW3abXLQad+ 4hYbqAwunq0V1Zi4XoIAjcQWlAokau4jDxfKbpoO7CBUYoia+1vDoK4U1FHsFy77 E4w7LktCecfoqieoBzsD5mZfTG5qrzwNhoxnnZmRGZY81TW9swVZYPkfqamG/Cbk 7HkgOLvtQiwtY5dxsLHvMwbtXzQqxO10KuLBAao6OY9xLEAqamV1v9gGO1WyOzRd avVUShDL6FQHTRalzcm8K9OLUhOZWDcZLR9XgNwfgxZYjAlfqCA= =Gbe3 -----END PGP SIGNATURE----- Merge 4.14.36 into android-4.14 Changes in 4.14.36 tty: make n_tty_read() always abort if hangup is in progress cpufreq: CPPC: Use transition_delay_us depending transition_latency ubifs: Check ubifs_wbuf_sync() return code ubi: fastmap: Don't flush fastmap work on detach ubi: Fix error for write access ubi: Reject MLC NAND mm/ksm.c: fix inconsistent accounting of zero pages mm/hmm: fix header file if/else/endif maze mm/hmm: hmm_pfns_bad() was accessing wrong struct task_struct: only use anon struct under randstruct plugin fs/reiserfs/journal.c: add missing resierfs_warning() arg resource: fix integer overflow at reallocation ipc/shm: fix use-after-free of shm file via remap_file_pages() mm, slab: reschedule cache_reap() on the same CPU usb: musb: gadget: misplaced out of bounds check phy: allwinner: sun4i-usb: poll vbus changes on A23/A33 when driving VBUS usb: gadget: udc: core: update usb_ep_queue() documentation ARM64: dts: meson: reduce odroid-c2 eMMC maximum rate KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list ARM: dts: da850-lego-ev3: Fix battery voltage gpio ARM: EXYNOS: Fix coupled CPU idle freeze on Exynos4210 arm: dts: mt7623: fix USB initialization fails on bananapi-r2 ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250 ARM: dts: at91: sama5d4: fix pinctrl compatible string spi: atmel: init FIFOs before spi enable spi: Fix scatterlist elements size in spi_map_buf spi: Fix unregistration of controller with fixed SPI bus number media: atomisp_fops.c: disable atomisp_compat_ioctl32 media: vivid: check if the cec_adapter is valid media: vsp1: Fix BRx conditional path in WPF x86/xen: Delay get_cpu_cap until stack canary is established xen-netfront: Fix hang on device removal regmap: Fix reversed bounds check in regmap_raw_write() ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status() USB: gadget: f_midi: fixing a possible double-free in f_midi USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw usb: dwc3: prevent setting PRTCAP to OTG from debugfs usb: dwc3: pci: Properly cleanup resource usb: dwc3: gadget: never call ->complete() from ->ep_queue() cifs: fix memory leak in SMB2_open() fix smb3-encryption breakage when CONFIG_DEBUG_SG=y smb3: Fix root directory when server returns inode number of zero HID: i2c-hid: fix size check and type usage i2c: i801: Save register SMBSLVCMD value only once i2c: i801: Restore configuration at shutdown CIFS: refactor crypto shash/sdesc allocation&free CIFS: add sha512 secmech CIFS: fix sha512 check in cifs_crypto_secmech_release powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write() powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9 powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently powerpc/kprobes: Fix call trace due to incorrect preempt count powerpc/kexec_file: Fix error code when trying to load kdump kernel powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops HID: Fix hid_report_len usage HID: core: Fix size as type u32 soc: mediatek: fix the mistaken pointer accessed when subdomains are added ASoC: ssm2602: Replace reg_default_raw with reg_default ASoC: topology: Fix kcontrol name string handling thunderbolt: Wait a bit longer for ICM to authenticate the active NVM thunderbolt: Serialize PCIe tunnel creation with PCI rescan thunderbolt: Resume control channel after hibernation image is created thunderbolt: Prevent crash when ICM firmware is not running irqchip/gic: Take lock when updating irq type random: use a tighter cap in credit_entropy_bits_safe() extcon: intel-cht-wc: Set direction and drv flags for V5 boost GPIO block: use 32-bit blk_status_t on Alpha jbd2: if the journal is aborted then don't allow update of the log tail ext4: shutdown should not prevent get_write_access ext4: eliminate sleep from shutdown ioctl ext4: pass -ESHUTDOWN code to jbd2 layer ext4: don't update checksum of new initialized bitmaps ext4: protect i_disksize update by i_data_sem in direct write path ext4: limit xattr size to INT_MAX ext4: fail ext4_iget for root directory if unallocated ext4: always initialize the crc32c checksum driver ext4: don't allow r/w mounts if metadata blocks overlap the superblock ext4: move call to ext4_error() into ext4_xattr_check_block() ext4: add bounds checking to ext4_xattr_find_entry() ext4: add extra checks to ext4_xattr_block_get() dm crypt: limit the number of allocated pages RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device RDMA/mlx5: Protect from NULL pointer derefence RDMA/rxe: Fix an out-of-bounds read ALSA: pcm: Fix UAF at PCM release via PCM timer access IB/srp: Fix srp_abort() IB/srp: Fix completion vector assignment algorithm dmaengine: at_xdmac: fix rare residue corruption cxl: Fix possible deadlock when processing page faults from cxllib tpm: self test failure should not cause suspend to fail libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, namespace: use a safe lookup for dimm device name nfit, address-range-scrub: fix scrub in-progress reporting nfit: skip region registration for incomplete control regions ring-buffer: Check if memory is available before allocation um: Compile with modern headers um: Use POSIX ucontext_t instead of struct ucontext iommu/vt-d: Fix a potential memory leak mmc: jz4740: Fix race condition in IRQ mask update mmc: tmio: Fix error handling when issuing CMD23 PCI: Mark Broadcom HT1100 and HT2000 Root Port Extended Tags as broken clk: mvebu: armada-38x: add support for missing clocks clk: fix false-positive Wmaybe-uninitialized warning clk: mediatek: fix PWM clock source by adding a fixed-factor clock clk: bcm2835: De-assert/assert PLL reset signal when appropriate pwm: rcar: Fix a condition to prevent mismatch value setting to duty thermal: imx: Fix race condition in imx_thermal_probe() dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4 watchdog: f71808e_wdt: Fix WD_EN register read vfio/pci: Virtualize Maximum Read Request Size ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation ALSA: pcm: Avoid potential races between OSS ioctls and read/write ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation drm/amdgpu: Add an ATPX quirk for hybrid laptop drm/amdgpu: Fix always_valid bos multiple LRU insertions. drm/amdgpu/sdma: fix mask in emit_pipeline_sync drm/amdgpu: Fix PCIe lane width calculation drm/amdgpu/si: implement get/set pcie_lanes asic callback drm/rockchip: Clear all interrupts before requesting the IRQ drm/radeon: add PX quirk for Asus K73TK drm/radeon: Fix PCIe lane width calculation ALSA: line6: Use correct endpoint type for midi output ALSA: rawmidi: Fix missing input substream checks in compat ioctls ALSA: hda - New VIA controller suppor no-snoop path ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags ALSA: hda/realtek - adjust the location of one mic random: fix crng_ready() test random: use a different mixing algorithm for add_device_randomness() random: crng_reseed() should lock the crng instance that it is modifying random: add new ioctl RNDRESEEDCRNG HID: input: fix battery level reporting on BT mice HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device HID: wacom: bluetooth: send exit report for recent Bluetooth devices MIPS: uaccess: Add micromips clobbers to bzero invocation MIPS: memset.S: EVA & fault support for small_memset MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup MIPS: memset.S: Fix clobber of v1 in last_fixup powerpc/eeh: Fix enabling bridge MMIO windows powerpc/xive: Fix trying to "push" an already active pool VP powerpc/lib: Fix off-by-one in alternate feature patching udf: Fix leak of UTF-16 surrogates into encoded strings fanotify: fix logic of events on child mmc: sdhci-pci: Only do AMD tuning for HS200 drm/i915: Correctly handle limited range YCbCr data on VLV/CHV jffs2_kill_sb(): deal with failed allocations hypfs_kill_super(): deal with failed allocations orangefs_kill_sb(): deal with allocation failures rpc_pipefs: fix double-dput() Don't leak MNT_INTERNAL away from internal mounts autofs: mount point create should honour passed in mode mm/filemap.c: fix NULL pointer in page_cache_tree_insert() net: dsa: Discard frames from unused ports iwlwifi: add shared clock PHY config flag for some devices iwlwifi: add a bunch of new 9000 PCI IDs Revert "media: lirc_zilog: driver only sends LIRCCODE" media: staging: lirc_zilog: incorrect reference counting writeback: safer lock nesting Linux 4.14.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
7c9b87a78a |
writeback: safer lock nesting
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream. lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set. unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain. This existing pattern is thus suspicious: lock_page_memcg(page); unlocked_inode_to_wb_begin(inode, &locked); ... unlocked_inode_to_wb_end(inode, locked); unlock_page_memcg(page); If both inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg(). truncate __cancel_dirty_page lock_page_memcg unlocked_inode_to_wb_begin unlocked_inode_to_wb_end <interrupts mistakenly enabled> <interrupt> end_page_writeback test_clear_page_writeback lock_page_memcg <deadlock> unlock_page_memcg Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature). If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute: cd /mnt/cgroup/memory mkdir a b echo 1 > a/memory.move_charge_at_immigrate echo 1 > b/memory.move_charge_at_immigrate ( echo $BASHPID > a/cgroup.procs while true; do dd if=/dev/zero of=/mnt/big bs=1M count=256 done ) & while true; do sync done & sleep 1h & SLEEP=$! while true; do echo $SLEEP > a/cgroup.procs echo $SLEEP > b/cgroup.procs done The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 Wang Long said "this deadlock occurs three times in our environment" [gthelen@google.com: v4] Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com [akpm@linux-foundation.org: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Wang Long <wanglong19@meituan.com> Acked-by: Wang Long <wanglong19@meituan.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> [v4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [natechancellor: Adjust context due to lack of b93b016313b3b] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
b2da450efe |
ANDROID: fs: block_dump: Don't display inode changes if block_dump < 2
Signed-off-by: San Mehat <san@android.com> |
||
|
3e8f399da4 |
writeback: rework wb_[dec|inc]_stat family of functions
Currently the writeback statistics code uses a percpu counters to hold various statistics. Furthermore we have 2 families of functions - those which disable local irq and those which doesn't and whose names begin with double underscore. However, they both end up calling __add_wb_stats which in turn calls percpu_counter_add_batch which is already irq-safe. Exploiting this fact allows to eliminated the __wb_* functions since they don't add any further protection than we already have. Furthermore, refactor the wb_* function to call __add_wb_stat directly without the irq-disabling dance. This will likely result in better runtime of code which deals with modifying the stat counters. While at it also document why percpu_counter_add_batch is in fact preempt and irq-safe since at least 3 people got confused. Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Josef Bacik <jbacik@fb.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
0117d4272b |
fs: add a blank lines on some kernel-doc comments
Sphinx gets confused when it finds identation without a good reason for it and without a preceding blank line: ./fs/mpage.c:347: ERROR: Unexpected indentation. ./fs/namei.c:4303: ERROR: Unexpected indentation. ./fs/fs-writeback.c:2060: ERROR: Unexpected indentation. No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> |
||
|
4a3a485b1e |
writeback: fix memory leak in wb_queue_work()
When WB_registered flag is not set, wb_queue_work() skips queuing the work, but does not perform the necessary clean up. In particular, if work->auto_free is true, it should free the memory. The leak condition can be reprouced by following these steps: mount /dev/sdb /mnt/sdb /* In qemu console: device_del sdb */ umount /dev/sdb Above will result in a wb_queue_work() call on an unregistered wb and thus leak memory. Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
bace924818 |
fs/fs-writeback.c: remove redundant if check
b_more_io non-empty check is already preceded by an opposite check. Link: http://lkml.kernel.org/r/1478591249-30641-1-git-send-email-tahsin@google.com Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
51350ea0d7 |
mm, writeback: flush plugged IO in wakeup_flusher_threads()
I've found funny live-lock between raid10 barriers during resync and memory controller hard limits. Inside mpage_readpages() task holds on to its plug bio which blocks the barrier in raid10. Its memory cgroup have no free memory thus the task goes into reclaimer but all reclaimable pages are dirty and cannot be written because raid10 is rebuilding and stuck on the barrier. Common flush of such IO in schedule() never happens, because the caller doesn't go to sleep. Lock is 'live' because changing memory limit or killing tasks which holds that stuck bio unblock whole progress. That was what happened in 3.18.x but I see no difference in upstream logic. Theoretically this might happen even without memory cgroup. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
dc5ff2b1d6 |
writeback: Write dirty times for WB_SYNC_ALL writeback
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and queue_io() so that inodes which have only dirty timestamps are properly written on fsync(2) and sync(2). However there are other call sites - most notably going through write_inode_now() - which expect inode to be clean after WB_SYNC_ALL writeback. This is not currently true as we do not clear I_DIRTY_TIME in __writeback_single_inode() even for WB_SYNC_ALL writeback in all the cases. This then resulted in the following oops because bdev_write_inode() did not clean the inode and writeback code later stumbled over a dirty inode with detached wb. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-11:0) task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000 RIP: 0010:[<ffffffff818884d2>] [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050 RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286 RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000 R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0 R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40 FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0 DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948 ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948 ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e Call Trace: [< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309 [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554 [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600 [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709 [< inline >] wb_do_writeback fs/fs-writeback.c:1844 [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884 [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094 [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228 [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303 [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392 Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e 00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3 RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212 RIP [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 fs/fs-writeback.c:281 RSP <ffff88006cdaf7d0> ---[ end trace 986a4d314dcb2694 ]--- Fix the problem by making sure __writeback_single_inode() writes inode only with dirty times in WB_SYNC_ALL mode. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
11fb998986 |
mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
9a46b04f16 |
fs/fs-writeback.c: inode writeback list tracking tracepoints
The per-sb inode writeback list tracks inodes currently under writeback to facilitate efficient sync processing. In particular, it ensures that sync only needs to walk through a list of inodes that were cleaned by the sync. Add a couple tracepoints to help identify when inodes are added/removed to and from the writeback lists. Piggyback off of the writeback lazytime tracepoint template as it already tracks the relevant inode information. Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.com Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> cc: Josef Bacik <jbacik@fb.com> Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
6c60d2b574 |
fs/fs-writeback.c: add a new writeback list for sync
wait_sb_inodes() currently does a walk of all inodes in the filesystem to find dirty one to wait on during sync. This is highly inefficient and wastes a lot of CPU when there are lots of clean cached inodes that we don't need to wait on. To avoid this "all inode" walk, we need to track inodes that are currently under writeback that we need to wait for. We do this by adding inodes to a writeback list on the sb when the mapping is first tagged as having pages under writeback. wait_sb_inodes() can then walk this list of "inodes under IO" and wait specifically just for the inodes that the current sync(2) needs to wait for. Define a couple helpers to add/remove an inode from the writeback list and call them when the overall mapping is tagged for or cleared from writeback. Update wait_sb_inodes() to walk only the inodes under writeback due to the sync. With this change, filesystem sync times are significantly reduced for fs' with largely populated inode caches and otherwise no other work to do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem with a ~10m entry inode cache, sync times are reduced from ~7.3s to less than 0.1s when the filesystem is fully clean. Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.com Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
7452495555 |
writeback: inode cgroup wb switch should not call ihold()
Asynchronous wb switching of inodes takes an additional ref count on an inode to make sure inode remains valid until switchover is completed. However, anyone calling ihold() must already have a ref count on inode, but in this case inode->i_count may already be zero: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30 CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-8:16) 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003 Call Trace: [<ffffffff805990af>] dump_stack+0x4d/0x6e [<ffffffff80268702>] __warn+0xc2/0xe0 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20 [<ffffffff8035b4ab>] ihold+0x2b/0x30 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0 [<ffffffff8036a147>] wb_workfn+0xd7/0x280 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0 [<ffffffff8027bb09>] process_one_work+0x129/0x300 [<ffffffff8027be06>] worker_thread+0x126/0x480 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300 [<ffffffff80280ff4>] kthread+0xc4/0xe0 [<ffffffff80335578>] ? kfree+0xc8/0x100 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70 ---[ end trace aaefd2fd9f306bc4 ]--- Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
78ebc2f714 |
mm,writeback: don't use memory reserves for wb_start_writeback
When writeback operation cannot make forward progress because memory allocation requests needed for doing I/O cannot be satisfied (e.g. under OOM-livelock situation), we can observe flood of order-0 page allocation failure messages caused by complete depletion of memory reserves. This is caused by unconditionally allocating "struct wb_writeback_work" objects using GFP_ATOMIC from PF_MEMALLOC context. __alloc_pages_nodemask() { __alloc_pages_slowpath() { __alloc_pages_direct_reclaim() { __perform_reclaim() { current->flags |= PF_MEMALLOC; try_to_free_pages() { do_try_to_free_pages() { wakeup_flusher_threads() { wb_start_writeback() { kzalloc(sizeof(*work), GFP_ATOMIC) { /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */ } } } } } current->flags &= ~PF_MEMALLOC; } } } } Since I/O is stalling, allocating writeback requests forever shall deplete memory reserves. Fortunately, since wb_start_writeback() can fall back to wb_wakeup() when allocating "struct wb_writeback_work" failed, we don't need to allow wb_start_writeback() to use memory reserves. Mem-Info: active_anon:289393 inactive_anon:2093 isolated_anon:29 active_file:10838 inactive_file:113013 isolated_file:859 unevictable:0 dirty:108531 writeback:5308 unstable:0 slab_reclaimable:5526 slab_unreclaimable:7077 mapped:9970 shmem:2159 pagetables:2387 bounce:0 free:3042 free_pcp:0 free_cma:0 Node 0 DMA free:6968kB min:44kB low:52kB high:64kB active_anon:6056kB inactive_anon:176kB active_file:712kB inactive_file:744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:208kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9708 all_unreclaimable? yes lowmem_reserve[]: 0 1732 1732 1732 Node 0 DMA32 free:5200kB min:5200kB low:6500kB high:7800kB active_anon:1151516kB inactive_anon:8196kB active_file:42640kB inactive_file:451076kB unevictable:0kB isolated(anon):116kB isolated(file):3564kB present:2080640kB managed:1775332kB mlocked:0kB dirty:433368kB writeback:21232kB mapped:39144kB shmem:8452kB slab_reclaimable:22056kB slab_unreclaimable:28100kB kernel_stack:20976kB pagetables:9404kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2701604 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 25*4kB (UME) 16*8kB (UME) 3*16kB (UE) 5*32kB (UME) 2*64kB (UM) 2*128kB (ME) 2*256kB (ME) 1*512kB (E) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 6964kB Node 0 DMA32: 925*4kB (UME) 140*8kB (UME) 5*16kB (ME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5060kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 126847 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 524157 pages RAM 0 pages HighMem/MovableOnly 76348 pages reserved 0 pages hwpoisoned Out of memory: Kill process 4450 (file_io.00) score 998 or sacrifice child Killed process 4450 (file_io.00) total-vm:4308kB, anon-rss:100kB, file-rss:1184kB, shmem-rss:0kB kthreadd: page allocation failure: order:0, mode:0x2200020 file_io.00: page allocation failure: order:0, mode:0x2200020 CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Call Trace: warn_alloc_failed+0xf7/0x150 __alloc_pages_nodemask+0x23f/0xa60 alloc_pages_current+0x87/0x110 new_slab+0x3a1/0x440 ___slab_alloc+0x3cf/0x590 __slab_alloc.isra.64+0x18/0x1d kmem_cache_alloc+0x11c/0x150 wb_start_writeback+0x39/0x90 wakeup_flusher_threads+0x7f/0xf0 do_try_to_free_pages+0x1f9/0x410 try_to_free_pages+0x94/0xc0 __alloc_pages_nodemask+0x566/0xa60 alloc_pages_current+0x87/0x110 __page_cache_alloc+0xaf/0xc0 pagecache_get_page+0x88/0x260 grab_cache_page_write_begin+0x21/0x40 xfs_vm_write_begin+0x2f/0xf0 generic_perform_write+0xca/0x1c0 xfs_file_buffered_aio_write+0xcc/0x1f0 xfs_file_write_iter+0x84/0x140 __vfs_write+0xc7/0x100 vfs_write+0x9d/0x190 SyS_write+0x50/0xc0 entry_SYSCALL_64_fastpath+0x12/0x6a Mem-Info: active_anon:293335 inactive_anon:2093 isolated_anon:0 active_file:10829 inactive_file:110045 isolated_file:32 unevictable:0 dirty:109275 writeback:822 unstable:0 slab_reclaimable:5489 slab_unreclaimable:10070 mapped:9999 shmem:2159 pagetables:2420 bounce:0 free:3 free_pcp:0 free_cma:0 Node 0 DMA free:12kB min:44kB low:52kB high:64kB active_anon:6060kB inactive_anon:176kB active_file:708kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:7160kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9844 all_unreclaimable? yes lowmem_reserve[]: 0 1732 1732 1732 Node 0 DMA32 free:0kB min:5200kB low:6500kB high:7800kB active_anon:1167280kB inactive_anon:8196kB active_file:42608kB inactive_file:439424kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1775332kB mlocked:0kB dirty:436344kB writeback:3288kB mapped:39260kB shmem:8452kB slab_reclaimable:21908kB slab_unreclaimable:33120kB kernel_stack:20976kB pagetables:9536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11073180 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 123086 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 524157 pages RAM 0 pages HighMem/MovableOnly 76348 pages reserved 0 pages hwpoisoned SLUB: Unable to allocate memory on node -1 (gfp=0x2088020) cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0 node 0: slabs: 3218, objs: 205952, free: 0 file_io.00: page allocation failure: order:0, mode:0x2200020 CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 Assuming that somebody will find a better solution, let's apply this patch for now to stop bleeding, for this problem frequently prevents me from testing OOM livelock condition. Link: http://lkml.kernel.org/r/20160318131136.GE7152@quack.suse.cz Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
09cbfeaf1a |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
aaf2559332 |
writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
When cgroup writeback is in use, there can be multiple wb's (bdi_writeback's) per bdi and an inode may switch among them dynamically. In a couple places, the wrong wb was used leading to performing operations on the wrong list under the wrong lock corrupting the io lists. * writeback_single_inode() was taking @wb parameter and used it to remove the inode from io lists if it becomes clean after writeback. The callers of this function were always passing in the root wb regardless of the actual wb that the inode was associated with, which could also change while writeback is in progress. Fix it by dropping the @wb parameter and using inode_to_wb_and_lock_list() to determine and lock the associated wb. * After writeback_sb_inodes() writes out an inode, it re-locks @wb and inode to remove it from or move it to the right io list. It assumes that the inode is still associated with @wb; however, the inode may have switched to another wb while writeback was in progress. Fix it by using inode_to_wb_and_lock_list() to determine and lock the associated wb after writeback is complete. As the function requires the original @wb->list_lock locked for the next iteration, in the unlikely case where the inode has changed association, switch the locks. Kudos to Tahsin for pinpointing these subtle breakages. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com> Tested-by: Tahsin Erdogan <tahsin@google.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
614a4e3773 |
writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with the target inode, unlocks inode, locks the wb's list_lock and verifies that the inode is still associated with the wb. To prevent the wb going away between dropping inode lock and acquiring list_lock, the wb is pinned while inode lock is held. The wb reference is put right after acquiring list_lock citing that the wb won't be dereferenced anymore. This isn't true. If the inode is still associated with the wb, the inode has reference and it's safe to return the wb; however, if inode has been switched, the wb still needs to be unlocked which is a dereference and can lead to use-after-free if it it races with wb destruction. Fix it by putting the reference after releasing list_lock. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()") Cc: stable@vger.kernel.org # v4.2+ Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
a1a0e23e49 |
writeback: flush inode cgroup wb switches instead of pinning super_block
If cgroup writeback is in use, inodes can be scheduled for asynchronous wb switching. Before 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches"), this could race with umount leading to super_block being destroyed while inodes are pinned for wb switching. 5ff8eaac1636 fixed it by bumping s_active while wb switches are in flight; however, this allowed in-flight wb switches to make umounts asynchronous when the userland expected synchronosity - e.g. fsck immediately following umount may fail because the device is still busy. This patch removes the problematic super_block pinning and instead makes generic_shutdown_super() flush in-flight wb switches. wb switches are now executed on a dedicated isw_wq so that they can be flushed and isw_nr_in_flight keeps track of the number of in-flight wb switches so that flushing can be avoided in most cases. v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE check in inode_switch_wbs() as Jan an Al suggested. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tahsin Erdogan <tahsin@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches") Cc: stable@vger.kernel.org #v4.5 Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
5ff8eaac16 |
writeback: keep superblock pinned during cgroup writeback association switches
If cgroup writeback is in use, an inode is associated with a cgroup for writeback. If the inode's main dirtier changes to another cgroup, the association gets updated asynchronously. Nothing was pinning the superblock while such switches are in progress and superblock could go away while async switching is pending or in progress leading to crashes like the following. kernel BUG at fs/jbd2/transaction.c:319! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51 Hardware name: Google Google, BIOS Google 01/01/2011 Workqueue: events inode_switch_wbs_work_fn task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000 RIP: 0010:[<ffffffff803e6922>] [<ffffffff803e6922>] start_this_handle+0x382/0x3e0 RSP: 0018:ffff880209267c30 EFLAGS: 00010202 ... Call Trace: [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190 [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70 [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0 [<ffffffff8035338b>] evict+0xbb/0x190 [<ffffffff80354190>] iput+0x130/0x190 [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0 [<ffffffff80279819>] process_one_work+0x129/0x300 [<ffffffff80279b16>] worker_thread+0x126/0x480 [<ffffffff8027ed14>] kthread+0xc4/0xe0 [<ffffffff809771df>] ret_from_fork+0x3f/0x70 Fix it by bumping s_active while cgroup association switching is in flight. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com> Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Cc: stable@vger.kernel.org #v4.5+ Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
654a0dd095 |
cgroup, memcg, writeback: drop spurious rcu locking around mem_cgroup_css_from_page()
In earlier versions, mem_cgroup_css_from_page() could return non-root css on a legacy hierarchy which can go away and required rcu locking; however, the eventual version simply returns the root cgroup if memcg is on a legacy hierarchy and thus doesn't need rcu locking around or in it. Remove spurious rcu lockings. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
dbce03b9e3 |
fs/writeback.c: fix kernel-doc warnings
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to after the function's opening brace. Also #undef this macro at the end of the function. ../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE' ../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
aa750fd71c |
mm/filemap.c: make global sync not clear error status of individual inodes
filemap_fdatawait() is a function to wait for on-going writeback to complete but also consume and clear error status of the mapping set during writeback. The latter functionality is critical for applications to detect writeback error with system calls like fsync(2)/fdatasync(2). However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl, which don't check error status of individual mappings. As a result, fsync() may not be able to detect writeback error if events happen in the following order: Application System admin ---------------------------------------------------------- write data on page cache Run sync command writeback completes with error filemap_fdatawait() clears error fsync returns success (but the data is not on disk) This patch adds filemap_fdatawait_keep_errors() for call sites where writeback error is not handled so that they don't clear error status. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@gmail.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
b33e18f61b |
fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to walk @bdi->wb_list. To set up the initial iteration condition, it uses list_entry_rcu() to calculate the entry pointer corresponding to the list head; however, this isn't an actual RCU dereference and using list_entry_rcu() for it ended up breaking a proposed list_entry_rcu() change because it was feeding an non-lvalue pointer into the macro. Don't use the RCU variant for simple pointer offsetting. Use list_entry() instead. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: David Howells <dhowells@redhat.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Patrick Marlier <patrick.marlier@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pranith kumar <bobby.prani@gmail.com> Link: http://lkml.kernel.org/r/20151027051939.GA19355@mtj.duckdns.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
b817525a4a |
writeback: bdi_writeback iteration must not skip dying ones
bdi_for_each_wb() is used in several places to wake up or issue writeback work items to all wb's (bdi_writeback's) on a given bdi. The iteration is performed by walking bdi->cgwb_tree; however, the tree only indexes wb's which are currently active. For example, when a memcg gets associated with a different blkcg, the old wb is removed from the tree so that the new one can be indexed. The old wb starts dying from then on but will linger till all its inodes are drained. As these dying wb's may still host dirty inodes, writeback operations which affect all wb's must include them. bdi_for_each_wb() skipping dying wb's led to sync(2) missing and failing to sync the inodes belonging to those wb's. This patch adds a RCU protected @bdi->wb_list which lists all wb's beloinging to that bdi. wb's are added on creation and removed on release rather than on the start of destruction. bdi_for_each_wb() usages are replaced with list_for_each[_continue]_rcu() iterations over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed. v2: Updated as per Jan. last_wb ref leak in bdi_split_work_to_wbs() fixed and unnecessary list head severing in cgwb_bdi_destroy() removed. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Artem Bityutskiy <dedekind1@gmail.com> Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()") Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
6fdf860f15 |
writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
wakeup_dirtytime_writeback() walks and wakes up all wb's of all bdi's; unfortunately, it was always waking up bdi->wb instead of the wb being walked. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 001fe6f617b1 ("writeback: make wakeup_dirtytime_writeback() handle multiple bdi_writeback's") Reviewed-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
590dca3a71 |
fs-writeback: unplug before cond_resched in writeback_sb_inodes
Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes, which increases the merge rate when relatively contiguous small files are written by the filesystem. It helps both on flash and spindles. For an fs_mark workload creating 4K files in parallel across 8 drives, this commit improves performance ~9% more by unplugging before calling cond_resched(). cond_resched() doesn't trigger an implicit unplug, so explicitly getting the IO down to the device before scheduling reduces latencies for anyone waiting on clean pages. It also cuts down on how often we use kblockd to unplug, which means less work bouncing from one workqueue to another. Many more details about how we got here: https://lkml.org/lkml/2015/9/11/570 Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
505a666ee3 |
writeback: plug writeback in wb_writeback() and writeback_inodes_wb()
We had to revert the pluggin in writeback_sb_inodes() because the wb->list_lock is held, but we could easily plug at a higher level before taking that lock, and unplug after releasing it. This does that. Chris will run performance numbers, just to verify that this approach is comparable to the alternative (we could just drop and re-take the lock around the blk_finish_plug() rather than these two commits. I'd have preferred waiting for actual performance numbers before picking one approach over the other, but I don't want to release rc1 with the known "sleeping function called from invalid context" issue, so I'll pick this cleanup version for now. But if the numbers show that we really want to plug just at the writeback_sb_inodes() level, and we should just play ugly games with the spinlock, we'll switch to that. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
0ba13fd19d |
Revert "writeback: plug writeback at a high level"
This reverts commit d353d7587d02116b9732d5c06615aed75a4d3a47. Doing the block layer plug/unplug inside writeback_sb_inodes() is broken, because that function is actually called with a spinlock held: wb->list_lock, as pointed out by Chris Mason. Chris suggested just dropping and re-taking the spinlock around the blk_finish_plug() call (the plgging itself can happen under the spinlock), and that would technically work, but is just disgusting. We do something fairly similar - but not quite as disgusting because we at least have a better reason for it - in writeback_single_inode(), so it's not like the caller can depend on the lock being held over the call, but in this case there just isn't any good reason for that "release and re-take the lock" pattern. [ In general, we should really strive to avoid the "release and retake" pattern for locks, because in the general case it can easily cause subtle bugs when the caller caches any state around the call that might be invalidated by dropping the lock even just temporarily. ] But in this case, the plugging should be easy to just move up to the callers before the spinlock is taken, which should even improve the effectiveness of the plug. So there is really no good reason to play games with locking here. I'll send off a test-patch so that Dave Chinner can verify that that plug movement works. In the meantime this just reverts the problematic commit and adds a comment to the function so that we hopefully don't make this mistake again. Reported-by: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
b0a1ea51bd |
Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block
Pull blk-cg updates from Jens Axboe: "A bit later in the cycle, but this has been in the block tree for a a while. This is basically four patchsets from Tejun, that improve our buffered cgroup writeback. It was dependent on the other cgroup changes, but they went in earlier in this cycle. Series 1 is set of 5 patches that has cgroup writeback updates: - bdi_writeback iteration fix which could lead to some wb's being skipped or repeated during e.g. sync under memory pressure. - Simplification of wb work wait mechanism. - Writeback tracepoints updated to report cgroup. Series 2 is is a set of updates for the CFQ cgroup writeback handling: cfq has always charged all async IOs to the root cgroup. It didn't have much choice as writeback didn't know about cgroups and there was no way to tell who to blame for a given writeback IO. writeback finally grew support for cgroups and now tags each writeback IO with the appropriate cgroup to charge it against. This patchset updates cfq so that it follows the blkcg each bio is tagged with. Async cfq_queues are now shared across cfq_group, which is per-cgroup, instead of per-request_queue cfq_data. This makes all IOs follow the weight based IO resource distribution implemented by cfq. - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff. - Other misc review points addressed, acks added and rebased. Series 3 is the blkcg policy cleanup patches: This patchset contains assorted cleanups for blkcg_policy methods and blk[c]g_policy_data handling. - alloc/free added for blkg_policy_data. exit dropped. - alloc/free added for blkcg_policy_data. - blk-throttle's async percpu allocation is replaced with direct allocation. - all methods now take blk[c]g_policy_data instead of blkcg_gq or blkcg. And finally, series 4 is a set of patches cleaning up the blkcg stats handling: blkcg's stats have always been somwhat of a mess. This patchset tries to improve the situation a bit. - The following patches added to consolidate blkcg entry point and blkg creation. This is in itself is an improvement and helps colllecting common stats on bio issue. - per-blkg stats now accounted on bio issue rather than request completion so that bio based and request based drivers can behave the same way. The issue was spotted by Vivek. - cfq-iosched implements custom recursive stats and blk-throttle implements custom per-cpu stats. This patchset make blkcg core support both by default. - cfq-iosched and blk-throttle keep track of the same stats multiple times. Unify them" * 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits) blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/ blkcg: implement interface for the unified hierarchy blkcg: misc preparations for unified hierarchy interface blkcg: separate out tg_conf_updated() from tg_set_conf() blkcg: move body parsing from blkg_conf_prep() to its callers blkcg: mark existing cftypes as legacy blkcg: rename subsystem name from blkio to io blkcg: refine error codes returned during blkcg configuration blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device() blkcg: reduce stack usage of blkg_rwstat_recursive_sum() blkcg: remove cfqg_stats->sectors blkcg: move io_service_bytes and io_serviced stats into blkcg_gq blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq blkcg: make blkcg_[rw]stat per-cpu blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it blkcg: consolidate blkg creation in blkcg_bio_issue_check() blk-throttle: improve queue bypass handling blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup() blkcg: inline [__]blkg_lookup() ... |
||
|
7d9071a095 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro: "In this one: - d_move fixes (Eric Biederman) - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error handling ought to be fixed) - switch of sb_writers to percpu rwsem (Oleg Nesterov) - superblock scalability (Josef Bacik and Dave Chinner) - swapon(2) race fix (Hugh Dickins)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits) vfs: Test for and handle paths that are unreachable from their mnt_root dcache: Reduce the scope of i_lock in d_splice_alias dcache: Handle escaped paths in prepend_path mm: fix potential data race in SyS_swapon inode: don't softlockup when evicting inodes inode: rename i_wb_list to i_io_list sync: serialise per-superblock sync operations inode: convert inode_sb_list_lock to per-sb inode: add hlist_fake to avoid the inode hash lock in evict writeback: plug writeback at a high level change sb_writers to use percpu_rw_semaphore shift percpu_counter_destroy() into destroy_super_work() percpu-rwsem: kill CONFIG_PERCPU_RWSEM percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() percpu-rwsem: introduce percpu_down_read_trylock() document rwsem_release() in sb_wait_write() fix the broken lockdep logic in __sb_start_write() introduce __sb_writers_{acquired,release}() helpers ufs_inode_get{frag,block}(): get rid of 'phys' argument ufs_getfrag_block(): tidy up a bit ... |
||
|
006a0973ed |
writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()
e79729123f63 ("writeback: don't issue wb_writeback_work if clean") updated writeback path to avoid kicking writeback work items if there are no inodes to be written out; unfortunately, the avoidance logic was too aggressive and broke sync_inodes_sb(). * sync_inodes_sb() must write out I_DIRTY_TIME inodes but I_DIRTY_TIME inodes dont't contribute to bdi/wb_has_dirty_io() tests and were being skipped over. * inodes are taken off wb->b_dirty/io/more_io lists after writeback starts on them. sync_inodes_sb() skipping wait_sb_inodes() when bdi_has_dirty_io() breaks it by making it return while writebacks are in-flight. This patch fixes the breakages by * Removing bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs(). The callers are already testing the condition. * Removing bdi_has_dirty_io() shortcut from sync_inodes_sb() so that it always calls into bdi_split_work_to_wbs() and wait_sb_inodes(). * Making bdi_split_work_to_wbs() consider the b_dirty_time list for WB_SYNC_ALL writebacks. Kudos to Eryu, Dave and Jan for tracking down the issue. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean") Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com Reported-and-bisected-by: Eryu Guan <eguan@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.com> Cc: Ted Ts'o <tytso@google.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
5634cc2aa9 |
writeback: update writeback tracepoints to report cgroup
The following tracepoints are updated to report the cgroup used during cgroup writeback. * writeback_write_inode[_start] * writeback_queue * writeback_exec * writeback_start * writeback_written * writeback_wait * writeback_nowork * writeback_wake_background * wbc_writepage * writeback_queue_io * bdi_dirty_ratelimit * balance_dirty_pages * writeback_sb_inodes_requeue * writeback_single_inode[_start] Note that writeback_bdi_register is separated out from writeback_class as reporting cgroup doesn't make sense to it. Tracepoints which take bdi are updated to take bdi_writeback instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
60292bcc1b |
writeback: explain why @inode is allowed to be NULL for inode_congested()
Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
8a1270cda7 |
writeback: remove wb_writeback_work->single_wait/done
wb_writeback_work->single_wait/done are used for the wait mechanism for synchronous wb_work (wb_writeback_work) items which are issued when bdi_split_work_to_wbs() fails to allocate memory for asynchronous wb_work items; however, there's no reason to use a separate wait mechanism for this. bdi_split_work_to_wbs() can simply use on-stack fallback wb_work item and separate wb_completion to wait for it. This patch removes wb_work->single_wait/done and the related code and make bdi_split_work_to_wbs() use on-stack fallback wb_work and wb_completion instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
1ed8d48c57 |
writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg
wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an earlier implementation, wb's were keyed by blkcg ID. bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and allows iterations to start from an arbitrary ID which is used to interrupt and resume iterations. Unfortunately, while changing wb to be keyed by memcg ID instead of blkcg, bdi_for_each_wb() was missed and is still assuming that wb's are keyed by blkcg ID. This doesn't affect iterations which don't get interrupted but bdi_split_work_to_wbs() makes use of iteration resuming on allocation failures and thus may incorrectly skip or repeat wb's. Fix it by changing bdi_for_each_wb() to take memcg IDs instead of blkcg IDs and updating bdi_split_work_to_wbs() accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
c7f5408493 |
inode: rename i_wb_list to i_io_list
There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to an new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com> |
||
|
e97fedb9ef |
sync: serialise per-superblock sync operations
When competing sync(2) calls walk the same filesystem, they need to walk the list of inodes on the superblock to find all the inodes that we need to wait for IO completion on. However, when multiple wait_sb_inodes() calls do this at the same time, they contend on the the inode_sb_list_lock and the contention causes system wide slowdowns. In effect, concurrent sync(2) calls can take longer and burn more CPU than if they were serialised. Stop the worst of the contention by adding a per-sb mutex to wrap around wait_sb_inodes() so that we only execute one sync(2) IO completion walk per superblock superblock at a time and hence avoid contention being triggered by concurrent sync(2) calls. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com> |
||
|
74278da9f7 |
inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists starts with moving the locking to match the per-superblock inode list. This takes the global lock out of the picture and reduces the contention problems to within a single filesystem. This doesn't get rid of contention as the locks still have global CPU scope, but it does isolate operations on different superblocks form each other. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com> |