Afer Zram backing dev setuped, zram will be treated as a ASYNC IO
swap device, no longer SYNC IO device, because WB pages do have IO
operation on swap in, so the optimization on skiping swapcache
for fast swap device will no longer be applyed on such case.
Now we refactor this optimization, skip the swapcache by checking
SYNC IO state on each swap page via ioctl. Reuse ioctl because we
should not break the GKI.
Change-Id: If01167b7ca178370833fc6e02385bb34ef84318a
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/1897414
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
* refs/heads/tmp-509b380:
Revert "Revert "ANDROID: security,perf: Allow further restriction of perf_event_open""
Linux 4.14.168
m68k: Call timer_interrupt() with interrupts disabled
serial: stm32: fix clearing interrupt error flags
IB/iser: Fix dma_nents type definition
arm64: dts: juno: Fix UART frequency
drm/radeon: fix bad DMA from INTERRUPT_CNTL2
dmaengine: ti: edma: fix missed failure handling
affs: fix a memory leak in affs_remount
mmc: core: fix wl1251 sdio quirks
mmc: sdio: fix wl1251 vendor id
packet: fix data-race in fanout_flow_is_huge()
net: neigh: use long type to store jiffies delta
hv_netvsc: flag software created hash value
MIPS: Loongson: Fix return value of loongson_hwmon_init
afs: Fix large file support
net: qca_spi: Move reset_count to struct qcaspi
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
dmaengine: imx-sdma: fix size check for sdma script_number
drm/msm/dsi: Implement reset correctly
tcp: annotate lockless access to tcp_memory_pressure
net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head
net: avoid possible false sharing in sk_leave_memory_pressure()
act_mirred: Fix mirred_init_module error handling
net: stmmac: fix length of PTP clock's name string
llc: fix sk_buff refcounting in llc_conn_state_process()
llc: fix another potential sk_buff leak in llc_ui_sendmsg()
mac80211: accept deauth frames in IBSS mode
net: stmmac: gmac4+: Not all Unicast addresses may be available
nvme: retain split access workaround for capability reads
net: ethernet: stmmac: Fix signedness bug in ipq806x_gmac_of_parse()
of: mdio: Fix a signedness bug in of_phy_get_and_connect()
net: axienet: fix a signedness bug in probe
net: stmmac: dwmac-meson8b: Fix signedness bug in probe
net: broadcom/bcmsysport: Fix signedness in bcm_sysport_probe()
net: hisilicon: Fix signedness bug in hix5hd2_dev_probe()
net: aquantia: Fix aq_vec_isr_legacy() return value
iommu/amd: Wait for completion of IOTLB flush in attach_device
net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
RDMA/cma: Fix false error message
ath10k: adjust skb length in ath10k_sdio_mbox_rx_packet
pinctrl: iproc-gpio: Fix incorrect pinconf configurations
net: sonic: replace dev_kfree_skb in sonic_send_packet
hwmon: (shtc1) fix shtc1 and shtw1 id mask
ixgbe: sync the first fragment unconditionally
btrfs: use correct count in btrfs_file_write_iter()
Btrfs: fix inode cache waiters hanging on path allocation failure
Btrfs: fix inode cache waiters hanging on failure to start caching thread
Btrfs: fix hang when loading existing inode cache off disk
scsi: fnic: fix msix interrupt allocation
net: sonic: return NETDEV_TX_OK if failed to map buffer
tty: serial: fsl_lpuart: Use appropriate lpuart32_* I/O funcs
ath9k: dynack: fix possible deadlock in ath_dynack_node_{de}init
iio: dac: ad5380: fix incorrect assignment to val
bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA
irqdomain: Add the missing assignment of domain->fwnode for named fwnode
staging: greybus: light: fix a couple double frees
x86, perf: Fix the dependency of the x86 insn decoder selftest
power: supply: Init device wakeup after device_add()
hwmon: (lm75) Fix write operations for negative temperatures
Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"
ahci: Do not export local variable ahci_em_messages
iommu/mediatek: Fix iova_to_phys PA start for 4GB mode
mips: avoid explicit UB in assignment of mips_io_port_base
rtc: pcf2127: bugfix: read rtc disables watchdog
media: atmel: atmel-isi: fix timeout value for stop streaming
mac80211: minstrel_ht: fix per-group max throughput rate initialization
dmaengine: dw: platform: Switch to acpi_dma_controller_register()
ASoC: sun4i-i2s: RX and TX counter registers are swapped
signal: Allow cifs and drbd to receive their terminating signals
bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
net/rds: Add a few missing rds_stat_names entries
ASoC: wm8737: Fix copy-paste error in wm8737_snd_controls
ASoC: cs4349: Use PM ops 'cs4349_runtime_pm'
ASoC: es8328: Fix copy-paste error in es8328_right_line_controls
ext4: set error return correctly when ext4_htree_store_dirent fails
crypto: caam - free resources in case caam_rng registration failed
cifs: fix rmmod regression in cifs.ko caused by force_sig changes
net/mlx5: Fix mlx5_ifc_query_lag_out_bits
ARM: dts: stm32: add missing vdda-supply to adc on stm32h743i-eval
tipc: reduce risk of wakeup queue starvation
ALSA: aoa: onyx: always initialize register read value
crypto: ccp - Reduce maximum stack usage
x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI
mic: avoid statically declaring a 'struct device'.
usb: host: xhci-hub: fix extra endianness conversion
qed: reduce maximum stack frame size
libertas_tf: Use correct channel range in lbtf_geo_init
PM: sleep: Fix possible overflow in pm_system_cancel_wakeup()
clk: sunxi-ng: v3s: add the missing PLL_DDR1
scsi: libfc: fix null pointer dereference on a null lport
net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
RDMA/hns: Fixs hw access invalid dma memory error
devres: allow const resource arguments
rxrpc: Fix uninitialized error code in rxrpc_send_data_packet()
mfd: intel-lpss: Release IDA resources
iommu/amd: Make iommu_disable safer
bnxt_en: Fix ethtool selftest crash under error conditions.
nvmem: imx-ocotp: Ensure WAIT bits are preserved when setting timing
clk: qcom: Fix -Wunused-const-variable
dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width"
perf/ioctl: Add check for the sample_period value
drm/msm/a3xx: remove TPL1 regs from snapshot
rtc: pcf8563: Clear event flags and disable interrupts before requesting irq
rtc: pcf8563: Fix interrupt trigger method
ASoC: ti: davinci-mcasp: Fix slot mask settings when using multiple AXRs
net/af_iucv: always register net_device notifier
net: netem: fix backlog accounting for corrupted GSO frames
drm/msm/mdp5: Fix mdp5_cfg_init error return
powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
qed: iWARP - Use READ_ONCE and smp_store_release to access ep->state
iommu/vt-d: Duplicate iommu_resv_region objects per device list
mpls: fix warning with multi-label encap
media: vivid: fix incorrect assignment operation when setting video mode
cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency
cpufreq: brcmstb-avs-cpufreq: Fix initial command check
netvsc: unshare skb in VF rx handler
inet: frags: call inet_frags_fini() after unregister_pernet_subsys()
signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig
iommu: Use right function to get group for device
misc: sgi-xp: Properly initialize buf in xpc_get_rsvd_page_pa
serial: stm32: fix wakeup source initialization
serial: stm32: Add support of TC bit status check
serial: stm32: fix transmit_chars when tx is stopped
serial: stm32: fix rx error handling
crypto: ccp - Fix 3DES complaint from ccp-crypto module
crypto: ccp - fix AES CFB error exposed by new test vectors
spi: spi-fsl-spi: call spi_finalize_current_message() at the end
RDMA/qedr: Fix incorrect device rate.
arm64: dts: meson: libretech-cc: set eMMC as removable
dmaengine: tegra210-adma: Fix crash during probe
ARM: dts: sun8i-h3: Fix wifi in Beelink X2 DT
EDAC/mc: Fix edac_mc_find() in case no device is found
thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power
backlight: lm3630a: Return 0 on success in update_status functions
kdb: do a sanity check on the cpu in kdb_per_cpu()
ARM: riscpc: fix lack of keyboard interrupts after irq conversion
pwm: meson: Don't disable PWM when setting duty repeatedly
pwm: meson: Consider 128 a valid pre-divider
netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule
crypto: caam - fix caam_dump_sg that iterates through scatterlist
platform/x86: alienware-wmi: printing the wrong error code
media: davinci/vpbe: array underflow in vpbe_enum_outputs()
media: omap_vout: potential buffer overflow in vidioc_dqbuf()
l2tp: Fix possible NULL pointer dereference
vfio/mdev: Fix aborting mdev child device removal if one fails
vfio/mdev: Avoid release parent reference during error path
afs: Fix the afs.cell and afs.volume xattr handlers
lightnvm: pblk: fix lock order in pblk_rb_tear_down_check
mmc: core: fix possible use after free of host
dmaengine: tegra210-adma: restore channel status
net: ena: fix ena_com_fill_hash_function() implementation
net: ena: fix incorrect test of supported hash function
net: ena: fix: Free napi resources when ena_up() fails
net: ena: fix swapped parameters when calling ena_com_indirect_table_fill_entry
iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
IB/mlx5: Add missing XRC options to QP optional params mask
dwc2: gadget: Fix completed transfer size calculation in DDMA
usb: gadget: fsl: fix link error against usb-gadget module
ASoC: fix valid stream condition
packet: in recvmsg msg_name return at least sizeof sockaddr_ll
scsi: qla2xxx: Avoid that qlt_send_resp_ctio() corrupts memory
scsi: qla2xxx: Fix a format specifier
irqchip/gic-v3-its: fix some definitions of inner cacheability attributes
NFS: Don't interrupt file writeout due to fatal errors
ALSA: usb-audio: Handle the error from snd_usb_mixer_apply_create_quirk()
dmaengine: axi-dmac: Don't check the number of frames for alignment
6lowpan: Off by one handling ->nexthdr
media: ov2659: fix unbalanced mutex_lock/unlock
ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect
powerpc: vdso: Make vdso32 installation conditional in vdso_install
selftests/ipc: Fix msgque compiler warnings
tipc: set sysctl_tipc_rmem and named_timeout right range
platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer
hwmon: (w83627hf) Use request_muxed_region for Super-IO accesses
net: hns3: fix for vport->bw_limit overflow problem
ARM: pxa: ssp: Fix "WARNING: invalid free of devm_ allocated data"
scsi: target/core: Fix a race condition in the LUN lookup code
scsi: qla2xxx: Unregister chrdev if module initialization fails
ehea: Fix a copy-paste err in ehea_init_port_res
spi: bcm2835aux: fix driver to not allow 65535 (=-1) cs-gpios
soc/fsl/qe: Fix an error code in qe_pin_request()
spi: tegra114: configure dma burst size to fifo trig level
spi: tegra114: flush fifos
spi: tegra114: terminate dma and reset on transfer timeout
spi: tegra114: fix for unpacked mode transfers
spi: tegra114: clear packed bit for unpacked mode
media: tw5864: Fix possible NULL pointer dereference in tw5864_handle_frame
media: davinci-isif: avoid uninitialized variable use
ARM: OMAP2+: Fix potentially uninitialized return value for _setup_reset()
arm64: dts: allwinner: a64: Add missing PIO clocks
m68k: mac: Fix VIA timer counter accesses
tipc: tipc clang warning
jfs: fix bogus variable self-initialization
regulator: tps65086: Fix tps65086_ldoa1_ranges for selector 0xB
media: cx23885: check allocation return
media: wl128x: Fix an error code in fm_download_firmware()
media: cx18: update *pos correctly in cx18_read_pos()
media: ivtv: update *pos correctly in ivtv_read_pos()
regulator: lp87565: Fix missing register for LP87565_BUCK_0
net: sh_eth: fix a missing check of of_get_phy_mode
xen, cpu_hotplug: Prevent an out of bounds access
drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen()
scsi: megaraid_sas: reduce module load time
x86/mm: Remove unused variable 'cpu'
nios2: ksyms: Add missing symbol exports
powerpc/mm: Check secondary hash page table
net: aquantia: fixed instack structure overflow
NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()
netfilter: nft_set_hash: fix lookups with fixed size hash on big endian
regulator: wm831x-dcdc: Fix list of wm831x_dcdc_ilim from mA to uA
ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
mmc: sdhci-brcmstb: handle mmc_of_parse() errors during probe
NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
platform/x86: wmi: fix potential null pointer dereference
clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
NFS: Fix a soft lockup in the delegation recovery code
powerpc/64s: Fix logic when handling unknown CPU features
staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx
fs/nfs: Fix nfs_parse_devname to not modify it's argument
ASoC: qcom: Fix of-node refcount unbalance in apq8016_sbc_parse_of()
drm/nouveau/pmu: don't print reply values if exec is false
drm/nouveau/bios/ramcfg: fix missing parentheses when calculating RON
net: dsa: qca8k: Enable delay for RGMII_ID mode
regulator: pv88090: Fix array out-of-bounds access
regulator: pv88080: Fix array out-of-bounds access
regulator: pv88060: Fix array out-of-bounds access
cdc-wdm: pass return value of recover_from_urb_loss
dmaengine: mv_xor: Use correct device for DMA API
staging: r8822be: check kzalloc return or bail
KVM: PPC: Release all hardware TCE tables attached to a group
hwmon: (pmbus/tps53679) Fix driver info initialization in probe routine
vfio_pci: Enable memory accesses before calling pci_map_rom
keys: Timestamp new keys
block: don't use bio->bi_vcnt to figure out segment number
usb: phy: twl6030-usb: fix possible use-after-free on remove
PCI: endpoint: functions: Use memcpy_fromio()/memcpy_toio()
pinctrl: sh-pfc: sh73a0: Fix fsic_spdif pin groups
pinctrl: sh-pfc: r8a7792: Fix vin1_data18_b pin group
pinctrl: sh-pfc: r8a7791: Fix scifb2_data_c pin group
pinctrl: sh-pfc: emev2: Add missing pinmux functions
drm/etnaviv: potential NULL dereference
iw_cxgb4: use tos when finding ipv6 routes
iw_cxgb4: use tos when importing the endpoint
fbdev: chipsfb: remove set but not used variable 'size'
rtc: pm8xxx: fix unintended sign extension
rtc: 88pm80x: fix unintended sign extension
rtc: 88pm860x: fix unintended sign extension
rtc: ds1307: rx8130: Fix alarm handling
net: phy: fixed_phy: Fix fixed_phy not checking GPIO
thermal: mediatek: fix register index error
rtc: ds1672: fix unintended sign extension
staging: most: cdev: add missing check for cdev_add failure
iwlwifi: mvm: fix RSS config command
ARM: dts: lpc32xx: phy3250: fix SD card regulator voltage
ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller clocks property
ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller variant
ARM: dts: lpc32xx: reparent keypad controller to SIC1
ARM: dts: lpc32xx: add required clocks property to keypad device node
driver core: Do not resume suppliers under device_links_write_lock()
crypto: crypto4xx - Fix wrong ppc4xx_trng_probe()/ppc4xx_trng_remove() arguments
driver: uio: fix possible use-after-free in __uio_register_device
driver: uio: fix possible memory leak in __uio_register_device
tty: ipwireless: Fix potential NULL pointer dereference
iwlwifi: mvm: fix A-MPDU reference assignment
net/mlx5: Take lock with IRQs disabled to avoid deadlock
iwlwifi: mvm: avoid possible access out of array.
clk: sunxi-ng: sun8i-a23: Enable PLL-MIPI LDOs when ungating it
spi/topcliff_pch: Fix potential NULL dereference on allocation error
rtc: cmos: ignore bogus century byte
IB/iser: Pass the correct number of entries for dma mapped SGL
ASoC: imx-sgtl5000: put of nodes if finding codec fails
crypto: tgr192 - fix unaligned memory access
crypto: brcm - Fix some set-but-not-used warning
kbuild: mark prepare0 as PHONY to fix external module build
media: s5p-jpeg: Correct step and max values for V4L2_CID_JPEG_RESTART_INTERVAL
drm/etnaviv: NULL vs IS_ERR() buf in etnaviv_core_dump()
RDMA/iw_cxgb4: Fix the unchecked ep dereference
spi: cadence: Correct initialisation of runtime PM
arm64: dts: apq8016-sbc: Increase load on l11 for SDCARD
drm/shmob: Fix return value check in shmob_drm_probe
RDMA/qedr: Fix out of bounds index check in query pkey
RDMA/ocrdma: Fix out of bounds index check in query pkey
IB/usnic: Fix out of bounds index check in query pkey
MIPS: BCM63XX: drop unused and broken DSP platform device
clk: dove: fix refcount leak in dove_clk_init()
clk: mv98dx3236: fix refcount leak in mv98dx3236_clk_init()
clk: armada-xp: fix refcount leak in axp_clk_init()
clk: kirkwood: fix refcount leak in kirkwood_clk_init()
clk: armada-370: fix refcount leak in a370_clk_init()
clk: vf610: fix refcount leak in vf610_clocks_init()
clk: imx7d: fix refcount leak in imx7d_clocks_init()
clk: imx6sx: fix refcount leak in imx6sx_clocks_init()
clk: imx6q: fix refcount leak in imx6q_clocks_init()
clk: samsung: exynos4: fix refcount leak in exynos4_get_xom()
clk: socfpga: fix refcount leak
clk: qoriq: fix refcount leak in clockgen_init()
clk: highbank: fix refcount leak in hb_clk_init()
Input: nomadik-ske-keypad - fix a loop timeout test
vxlan: changelink: Fix handling of default remotes
pinctrl: sh-pfc: sh7734: Remove bogus IPSR10 value
pinctrl: sh-pfc: sh7269: Add missing PCIOR0 field
pinctrl: sh-pfc: r8a77995: Remove bogus SEL_PWM[0-3]_3 configurations
pinctrl: sh-pfc: sh7734: Add missing IPSR11 field
pinctrl: sh-pfc: r8a7794: Remove bogus IPSR9 field
pinctrl: sh-pfc: sh73a0: Add missing TO pin to tpu4_to3 group
pinctrl: sh-pfc: r8a7791: Remove bogus marks from vin1_b_data18 group
pinctrl: sh-pfc: r8a7791: Remove bogus ctrl marks from qspi_data4_b group
pinctrl: sh-pfc: r8a7740: Add missing LCD0 marks to lcd0_data24_1 group
pinctrl: sh-pfc: r8a7740: Add missing REF125CK pin to gether_gmii group
switchtec: Remove immediate status check after submitting MRPC command
staging: bcm2835-camera: Abort probe if there is no camera
IB/rxe: Fix incorrect cache cleanup in error flow
net: phy: Fix not to call phy_resume() if PHY is not attached
drm/dp_mst: Skip validating ports during destruction, just ref
exportfs: fix 'passing zero to ERR_PTR()' warning
pcrypt: use format specifier in kobject_add
NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
mlxsw: reg: QEEC: Add minimum shaper fields
drm/sun4i: hdmi: Fix double flag assignation
pwm: lpss: Release runtime-pm reference from the driver's remove callback
staging: comedi: ni_mio_common: protect register write overflow
ALSA: usb-audio: update quirk for B&W PX to remove microphone
IB/hfi1: Add mtu check for operational data VLs
IB/rxe: replace kvfree with vfree
drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
PCI: iproc: Remove PAXC slot check to allow VF support
apparmor: don't try to replace stale label in ptrace access check
ALSA: hda: fix unused variable warning
drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset()
drm/sti: do not remove the drm_bridge that was never added
crypto: sun4i-ss - fix big endian issues
mt7601u: fix bbp version check in mt7601u_wait_bbp_ready
tipc: fix wrong timeout input for tipc_wait_for_cond()
powerpc/archrandom: fix arch_get_random_seed_int()
mfd: intel-lpss: Add default I2C device properties for Gemini Lake
xfs: Sanity check flags of Q_XQUOTARM call
FROMGIT: ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl.
ANDROID: cuttlefish_defconfig: enable CONFIG_IKHEADERS as m
ANDROID: cuttlefish_defconfig: enable NVDIMM/PMEM options
UPSTREAM: virtio-pmem: Add virtio pmem driver
BACKPORT: libnvdimm: nd_region flush callback support
UPSTREAM: libnvdimm/of_pmem: Provide a unique name for bus provider
UPSTREAM: libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warnings
UPSTREAM: libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()
UPSTREAM: libnvdimm: Add device-tree based driver
UPSTREAM: libnvdimm: Add of_node to region and bus descriptors
FROMLIST: security: selinux: allow per-file labelling for binderfs
UPSTREAM: mm/page_io.c: annotate refault stalls from swap_readpage
Revert "ANDROID: security,perf: Allow further restriction of perf_event_open"
ANDROID: selinux: modify RTM_GETLINK permission
UPSTREAM: lib/test_meminit.c: add bulk alloc/free tests
UPSTREAM: lib/test_meminit: add a kmem_cache_alloc_bulk() test
UPSTREAM: mm: slub: really fix slab walking for init_on_free
UPSTREAM: mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
Conflicts:
drivers/mmc/core/quirks.h
include/uapi/linux/virtio_ids.h
New header file entries are added to .bp files.
Change-Id: I515cb78684f524e239850625b163ba023b517e10
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
The following race is observed due to which a processes faulting on a
swap entry, finds the page neither in swapcache nor swap. This causes
zram to give a zero filled page that gets mapped to the process,
resulting in a user space crash later.
Consider parent and child processes Pa and Pb sharing the same swap slot
with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
Virtual address 'VA' of Pa and Pb points to the shared swap entry.
Pa Pb
fault on VA fault on VA
do_swap_page do_swap_page
lookup_swap_cache fails lookup_swap_cache fails
Pb scheduled out
swapin_readahead (deletes zram entry)
swap_free (makes swap_count 1)
Pb scheduled in
swap_readpage (swap_count == 1)
Takes SWP_SYNCHRONOUS_IO path
zram enrty absent
zram gives a zero filled page
Fix this by making sure that swap slot is freed only when swap count
drops down to one.
Change-Id: I85350d53ee1b9426cc960bd7ab44872a63759987
Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
Fixes: aa8d22a11da9 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Suggested-by: Minchan Kim <minchan@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 5df373e95689b9519b8557da7c5bd0db0856d776
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[charante@codeaurora.org: Fixed trivial build error]
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
If a block device supports rw_page operation, it doesn't submit bios so
the annotation in submit_bio() for refault stall doesn't work. It
happens with zram in android, especially swap read path which could
consume CPU cycle for decompress. It is also a problem for zswap which
uses frontswap.
Annotate swap_readpage() to account the synchronous IO overhead to
prevent underreport memory pressure.
[akpm@linux-foundation.org: add comment, per Johannes]
Link: http://lkml.kernel.org/r/20191010152134.38545-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 937790699be9c8100e5358625e7dfa8b32bd33f2)
Bug: 142418748
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a63030888996b6c0a3a9abe3f2d0eca0a0d765b
With fast swap storage, the platforms want to use swap more aggressively
and swap-in is crucial to application latency.
The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%. Maybe, it would be
bigger in nvdimm.
This patch aims to reduce swap-in latency by skipping swapcache if the
swap device is synchronous device like rw_page based device. It
enhances 45% my swapin test(5G sequential swapin, no readahead, from
2.41sec to 1.64sec).
Change-Id: I3f8d0c3b4487331e6e0c02a09bea3077223534b8
Link: http://lkml.kernel.org/r/1505886205-9671-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 0bcac06f27d7528591c27ac2b093ccd71c5d0168
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Ratelimit the swap write errors to avoid logbuf getting
filled up by these messages. These pages remove the useful
page allocation failure messages (when zram is used as swap)
from logbuf, thus making the debugging difficult.
Change-Id: I9d4229a76b2551d7baca603112d53012d8722415
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
To support delay splitting THP (Transparent Huge Page) after swapped
out, we need to enhance swap writing code to support to write a THP as a
whole. This will improve swap write IO performance.
As Ming Lei <ming.lei@redhat.com> pointed out, this should be based on
multipage bvec support, which hasn't been merged yet. So this patch is
only for testing the functionality of the other patches in the series.
And will be reimplemented after multipage bvec support is merged.
Link: http://lkml.kernel.org/r/20170724051840.2309-7-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Shaohua Li <shli@kernel.org>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For fast flash disk, async IO could introduce overhead because of
context switch. block-mq now supports IO poll, which improves
performance and latency a lot. swapin is a good place to use this
technique, because the task is waiting for the swapin page to continue
execution.
In my virtual machine, directly read 4k data from a NVMe with iopoll is
about 60% better than that without poll. With iopoll support in swapin
patch, my microbenchmark (a task does random memory write) is about
10%~25% faster. CPU utilization increases a lot though, 2x and even 3x
CPU utilization. This will depend on disk speed.
While iopoll in swapin isn't intended for all usage cases, it's a win
for latency sensistive workloads with high speed swap disk. block layer
has knob to control poll in runtime. If poll isn't enabled in block
layer, there should be no noticeable change in swapin.
I got a chance to run the same test in a NVMe with DRAM as the media.
In simple fio IO test, blkpoll boosts 50% performance in single thread
test and ~20% in 8 threads test. So this is the base line. In above
swap test, blkpoll boosts ~27% performance in single thread test.
blkpoll uses 2x CPU time though.
If we enable hybid polling, the performance gain has very slight drop
but CPU time is only 50% worse than that without blkpoll. Also we can
adjust parameter of hybid poll, with it, the CPU time penality is
reduced further. In 8 threads test, blkpoll doesn't help though. The
performance is similar to that without blkpoll, but cpu utilization is
similar too. There is lock contention in swap path. The cpu time
spending on blkpoll isn't high. So overall, blkpoll swapin isn't worse
than that without it.
The swapin readahead might read several pages in in the same time and
form a big IO request. Since the IO will take longer time, it doesn't
make sense to do poll, so the patch only does iopoll for single page
swapin.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add wbc_to_write_flags(), which returns the write modifier flags to use,
based on a struct writeback_control. No functional changes in this
patch, but it prepares us for factoring other wbc fields for write type.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
So they are CONFIG_DEBUG_VM-only and more informative.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages") replaced the
swap_aops dirty hook from __set_page_dirty_no_writeback() with
swap_set_page_dirty().
For normal cases without these special SWP flags code path falls back to
__set_page_dirty_no_writeback() so the behaviour is expected to be the
same as before.
But swap_set_page_dirty() makes use of the page_swap_info() helper to
get the swap_info_struct to check for the flags like SWP_FILE,
SWP_BLKDEV etc as desired for those features. This helper has
BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
set_page_dirty_lock() path.
For the set_page_dirty() path which is often needed for cases to be
called from irq context, kswapd() can toggle the flag behind the back
while the call is getting executed when system is low on memory and
heavy swapping is ongoing.
This ends up with undesired kernel panic.
This patch just moves the check outside the helper to its users
appropriately to fix kernel panic for the described path. Couple of
users of helpers already take care of SwapCache condition so I skipped
them.
Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joe Perches <joe@perches.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jens Axboe <axboe@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [4.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
generic_swapfile_activate() can take quite long time, it iterates over
all blocks of a file, so add cond_resched to it. I observed about 1
second stalls when activating a swapfile that was almost unfragmented -
this patch fixes it.
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221710580.4818@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to set/get the bio operation using
bio_set_op_attrs/bio_op
These should be simple one or two liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases in a module per patch.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"
* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superflous argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info. The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache. Otherwise, the page can be removed from swapcache.
Kernel BUG at c00f9040 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73
task: c3b73200 ti: dd192000 task.ti: dd192000
PC is at page_swap_info+0x10/0x2c
LR is at swap_slot_free_notify+0x18/0x6c
pc : [<c00f9040>] lr : [<c00f5560>] psr: 400f0113
sp : dd193d78 ip : c2deb1e4 fp : da015180
r10: 00000000 r9 : 000200da r8 : c120fe08
r7 : 00000000 r6 : 00000000 r5 : c249a6c0 r4 : = c249a6c0
r3 : 00000000 r2 : 40080009 r1 : 200f0113 r0 : = c249a6c0
..<snip> ..
Call Trace:
page_swap_info+0x10/0x2c
swap_slot_free_notify+0x18/0x6c
swap_readpage+0x90/0x11c
read_swap_cache_async+0x134/0x1ac
swapin_readahead+0x70/0xb0
handle_pte_fault+0x320/0x6fc
handle_mm_fault+0xc0/0xf0
do_page_fault+0x11c/0x36c
do_DataAbort+0x34/0x118
Fixes: 3f2b1a04f44933f2 ("zram: revive swap_slot_free_notify")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b430e9d1c6d4 ("remove compressed copy from zram in-memory")
applied swap_slot_free_notify call in *end_swap_bio_read* to remove
duplicated memory between zram and memory.
However, with the introduction of rw_page in zram: 8c7f01025f7b ("zram:
implement rw_page operation of zram"), it became void because rw_page
doesn't need bio.
Memory footprint is really important in embedded platforms which have
small memory, for example, 512M) recently because it could start to kill
processes if memory footprint exceeds some threshold by LMK or some
similar memory management modules.
This patch restores the function for rw_page, thereby eliminating this
duplication.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: karam.lee <karam.lee@lge.com>
Cc: <sangseok.lee@lge.com>
Cc: Chan Jeong <chan.jeong@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most of the mm subsystem uses pr_<level> so make it consistent.
Miscellanea:
- Realign arguments
- Add missing newline to format
- kmemleak-test.c has a "kmemleak: " prefix added to the
"Kmemleak testing" logging message via pr_fmt
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org> [percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Call pre-defined helper bio_add_page() instead of open coding for
iterating through bi_io_vec[]. Doing that, it's possible to make some
parts in filesystems and mm/page_io.c simpler than before.
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Stop abusing struct page functionality and the swap end_io handler, and
instead add a modified version of the blk-lib.c bio_batch helpers.
Also move the block I/O code into swap.c as they are directly tied into
each other.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Ming Lin <mlin@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is no need to pass the total request length in the kiocb, as
we already get passed in through the iov_iter argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tetsuo Handa wrote:
"Commit 62a8067a7f35 ("bio_vec-backed iov_iter") introduced an unnamed
union inside a struct which gcc-4.4.7 cannot handle. Name the unnamed
union as u in order to fix build failure"
Let's do this instead: there is only one place in the entire tree that
steps into this breakage. Anon structs and unions work in older gcc
versions; as the matter of fact, we have those in the tree - see e.g.
struct ieee80211_tx_info in include/net/mac80211.h
What doesn't work is handling their initializers:
struct {
int a;
union {
int b;
char c;
};
} x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
is the obvious syntax for initializer, perfectly fine for C11 and
handled correctly by gcc-4.7 or later.
Earlier versions, though, break on it - declaration is fine and so's
access to fields (i.e. x[0].c = 'a'; would produce the right code), but
members of the anon structs and unions are not inserted into the right
namespace. Tellingly, those older versions will not barf on struct {int
a; struct {int a;};}; - looks like they just have it hacked up somewhere
around the handling of . and -> instead of doing the right thing.
The easiest way to deal with that crap is to turn initialization of
those fields (in the only place where we have such initializer of
iov_iter) into plain assignment.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
By calling the device driver to write the page directly, we avoid
allocating a BIO, which allows us to free memory without allocating
memory.
[akpm@linux-foundation.org: fix used-uninitialized bug]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New variant of iov_iter - ITER_BVEC in iter->type, backed with
bio_vec array instead of iovec one. Primitives taught to deal
with such beasts, __swap_write() switched to using that kind
of iov_iter.
Note that bio_vec is just a <page, offset, length> triple - there's
nothing block-specific about it. I've left the definition where it
was, but took it from under ifdef CONFIG_BLOCK.
Next target: ->splice_write()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull core block IO changes from Jens Axboe:
"The major piece in here is the immutable bio_ve series from Kent, the
rest is fairly minor. It was supposed to go in last round, but
various issues pushed it to this release instead. The pull request
contains:
- Various smaller blk-mq fixes from different folks. Nothing major
here, just minor fixes and cleanups.
- Fix for a memory leak in the error path in the block ioctl code
from Christian Engelmayer.
- Header export fix from CaiZhiyong.
- Finally the immutable biovec changes from Kent Overstreet. This
enables some nice future work on making arbitrarily sized bios
possible, and splitting more efficient. Related fixes to immutable
bio_vecs:
- dm-cache immutable fixup from Mike Snitzer.
- btrfs immutable fixup from Muthu Kumar.
- bio-integrity fix from Nic Bellinger, which is also going to stable"
* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
xtensa: fixup simdisk driver to work with immutable bio_vecs
block/blk-mq-cpu.c: use hotcpu_notifier()
blk-mq: for_each_* macro correctness
block: Fix memory leak in rw_copy_check_uvector() handling
bio-integrity: Fix bio_integrity_verify segment start bug
block: remove unrelated header files and export symbol
blk-mq: uses page->list incorrectly
blk-mq: use __smp_call_function_single directly
btrfs: fix missing increment of bi_remaining
Revert "block: Warn and free bio if bi_end_io is not set"
block: Warn and free bio if bi_end_io is not set
blk-mq: fix initializing request's start time
block: blk-mq: don't export blk_mq_free_queue()
block: blk-mq: make blk_sync_queue support mq
block: blk-mq: support draining mq queue
dm cache: increment bi_remaining when bi_end_io is restored
block: fixup for generic bio chaining
block: Really silence spurious compiler warnings
block: Silence spurious compiler warnings
block: Kill bio_pair_split()
...
Most of the VM_BUG_ON assertions are performed on a page. Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.
I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.
This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This code doesn't serve any purpose anymore, since the aio retry
infrastructure has been removed.
This change should be safe because aio_read/write are also used for
synchronous IO, and called from do_sync_read()/do_sync_write() - and
there's no looping done in the sync case (the read and write syscalls).
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Swap subsystem does lazy swap slot free with expecting the page would be
swapped out again so we can avoid unnecessary write.
But the problem in in-memory swap(ex, zram) is that it consumes memory
space until vm_swap_full(ie, used half of all of swap device) condition
meet. It could be bad if we use multiple swap device, small in-memory
swap and big storage swap or in-memory swap alone.
This patch makes swap subsystem free swap slot as soon as swap-read is
completed and make the swapcache page dirty so the page should be
written out the swap device to reclaim it. It means we never lose it.
I tested this patch with kernel compile workload.
1. before
compile time : 9882.42
zram max wasted space by fragmentation: 13471881 byte
memory space consumed by zram: 174227456 byte
the number of slot free notify: 206684
2. after
compile time : 9653.90
zram max wasted space by fragmentation: 11805932 byte
memory space consumed by zram: 154001408 byte
the number of slot free notify: 426972
[akpm@linux-foundation.org: tweak comment text]
[artem.savkov@gmail.com: fix BUG due to non-swapcache pages in end_swap_bio_read()]
[akpm@linux-foundation.org: invert unlikely() test, augment comment, 80-col cleanup]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block core updates from Jens Axboe:
- Major bit is Kents prep work for immutable bio vecs.
- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.
- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.
- Tejuns changes to convert the writeback thread pool to the generic
workqueue mechanism.
- Runtime PM framework, SCSI patches exists on top of these in James'
tree.
- A few random fixes.
* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...
As pointed out by Andrew Morton, the swap-over-NFS writeback is not
setting PageWriteback before it is queued for direct IO. While swap
pages do not participate in BDI or process dirty accounting and the IO
is synchronous, the writeback bit is still required and not setting it
in this case was an oversight. swapoff depends on the page writeback to
synchronoise all pending writes on a swap page before it is reused.
Swapcache freeing and reuse depend on checking the PageWriteback under
lock to ensure the page is safe to reuse.
Direct IO handlers and the direct IO handler for NFS do not deal with
PageWriteback as they are synchronous writes. In the case of NFS, it
schedules pages (or a page in the case of swap) for IO and then waits
synchronously for IO to complete in nfs_direct_write(). It is
recognised that this is a slowdown from normal swap handling which is
asynchronous and uses a completion handler. Shoving PageWriteback
handling down into direct IO handlers looks like a bad fit to handle the
swap case although it may have to be dealt with some day if swap is
converted to use direct IO in general and bmap is finally done away
with. At that point it will be necessary to refit asynchronous direct
IO with completion handlers onto the swap subsystem.
As swapcache currently depends on PageWriteback to protect against
races, this patch sets PageWriteback under the page lock before queueing
it for direct IO. It is cleared when the direct IO handler returns. IO
errors are treated similarly to the direct-to-bio case except PageError
is not set as in the case of swap-over-NFS, it is likely to be a
transient error.
It was asked what prevents such a page being reclaimed in parallel.
With this patch applied, such a page will now be skipped (most of the
time) or blocked until the writeback completes. Reclaim checks
PageWriteback under the page lock before calling try_to_free_swap and
the page lock should prevent the page being requeued for IO before it is
freed.
This and Jerome's related patch should considered for -stable as far
back as 3.6 when swap-over-NFS was introduced.
[akpm@linux-foundation.org: use pr_err_ratelimited()]
[akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages"), swap_writepage()
calls direct_IO on swap files. However, in that case the page isn't
redirtied if I/O fails, and is therefore handled afterwards as if it has
been successfully written to the swap file, leading to memory corruption
when the page is eventually swapped back in.
This patch sets the page dirty when direct_IO() fails. It fixes a
memory corruption that happened while using swap-over-NFS.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To prevent flooding the swap device with writebacks, frontswap backends
need to count and limit the number of outstanding writebacks. The
incrementing of the counter can be done before the call to
__swap_writepage(). However, the caller must receive a notification
when the writeback completes in order to decrement the counter.
To achieve this functionality, this patch modifies __swap_writepage() to
take the bio completion callback function as an argument.
end_swap_bio_write(), the normal bio completion function, is also made
non-static so that code doing the accounting can call it after the
accounting is done.
There should be no behavioural change to existing code.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap_writepage() is currently where frontswap hooks into the swap write
path to capture pages with the frontswap_store() function. However, if
a frontswap backend wants to "resume" the writeback of a page to the
swap device, it can't call swap_writepage() as the page will simply
reenter the backend.
This patch separates swap_writepage() into a top and bottom half, the
bottom half named __swap_writepage() to allow a frontswap backend, like
zswap, to resume writeback beyond the frontswap_store() hook.
__add_to_swap_cache() is also made non-static so that the page for which
writeback is to be resumed can be added to the swap cache.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>