227 Commits

Author SHA1 Message Date
Richard Raya
d97f51cb57 mm: Revert some hacks
This reverts commits:
- 0fc9fbd21297173aa822f97fe33a481053cb96ec [mm + sysctl: tune swappiness and make some values read only]
- 94181990a4ea1a20bb8bf443f3fbe500d05901c3 [mm: Import oplus memory management hacks]
- 97bdd381c8292d43e68ff55bd08767db17e62810 [mm: Set swappiness for CONFIG_INCREASE_MAXIMUM_SWAPPINESS=y case]
- fa8d2aa0e20da6b943157f6ab58068bd80d68920 [mm: move variable under a proper #ifdef]
- f9daeaa423b745b2c2c34a6fb5ac6b69daf746c4 [mm: merge Samsung mm hacks]
- 1a460a832c9c6550f5cbe32dca4c15cf89806b57 [mm: Make watermark_scale_factor read-only]
- 963a3bfe3352b45ea21c58d53055689e46d81eeb [mm: Tune parameters for Android]

Change-Id: I70495ca93a05384a2d7bc2498fd2d56bd9928390
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-08 21:17:59 -03:00
Richard Raya
9cdc78c354 Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common: (2966 commits)
  Linux 4.14.331
  net: sched: fix race condition in qdisc_graft()
  scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
  ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
  ext4: correct return value of ext4_convert_meta_bg
  ext4: correct offset of gdb backup in non meta_bg group to update_backups
  ext4: apply umask if ACL support is disabled
  media: venus: hfi: fix the check to handle session buffer requirement
  media: sharp: fix sharp encoding
  i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
  net: dsa: lan9303: consequently nested-lock physical MDIO
  ALSA: info: Fix potential deadlock at disconnection
  parisc/pgtable: Do not drop upper 5 address bits of physical address
  parisc: Prevent booting 64-bit kernels on PA1.x machines
  mcb: fix error handling for different scenarios when parsing
  jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
  genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
  mmc: meson-gx: Remove setting of CMD_CFG_ERROR
  PM: hibernate: Clean up sync_read handling in snapshot_write_next()
  PM: hibernate: Use __get_safe_page() rather than touching the list
  ...

Change-Id: I755d2aa7c525ace28adc4aee433572b3110ea39b
2023-12-07 20:15:44 -03:00
Greg Kroah-Hartman
fce78edbb4 This is the 4.14.322 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmTWAT4ACgkQONu9yGCS
 aT6kKxAA00HDcoEbS4CpQxK1ggeeW6xMFqPHHwUz62ScZPR1zcrR4ag5UrKOQALF
 cCQwt2nVBMUXciiQd3gY+MciAYPRVIXLMK9QqQEJSBZ+2p8zY3nb/HbM6o8iKQeV
 xIhUneiyHtbOyTo3oQcyET7ngwxtDp9uEnd+8I+sSbGi8Wyh8Z8L2daVQTrke1Js
 QIe3wDQsUj0pEDhRfYx29JKeQ8fBOfZlxtFEsdHvGgP/4j2EXGwyMVnt3/DVuwM8
 5/b/SML0skSh8YM9JfMQwpYpR+MAFGyyYKoF2pGu1trvyoh2Jd3TYuYcNqjwIywg
 W+ODGmULcYUYPBzUMdvrefwpn4l/2qpPCJ8FHB80h+4Jmy6PMN7lm1YnMBeQK4GP
 ACLr2BzJ4Tp5LavWZpTpqdRlC039aSZqY+7K+H/eoNstwZMU3hKc3Kn2KrPss0pp
 K0M7+8oukTnSiFNgIXVJOsr+kN1nNvtQmqCVRWlrn2cQckdDf8pVkPl/QtC3ZtWf
 aI8xYr6UpAr0z1elK5p9lO6N0R8FLwVmDG7B4b/6nLbWtRSt53ay/nMAzebodpn1
 8r+6ZoXO5LedNJsUOMJqE58X0ywbUgcx8mfkuRS8PLXEk7yI4+PR7DCeWyZ/YdVX
 dUqaYIK0yYx9yXAkMaSdrnMs+OSqa6lK9c9juPDvFox+ngLAjNk=
 =67ef
 -----END PGP SIGNATURE-----

Merge 4.14.322 into android-4.14-stable

Changes in 4.14.322
	gfs2: Don't deref jdesc in evict
	x86/microcode/AMD: Load late on both threads too
	x86/smp: Use dedicated cache-line for mwait_play_dead()
	fbdev: imsttfb: Fix use after free bug in imsttfb_probe
	drm/edid: Fix uninitialized variable in drm_cvt_modes()
	scripts/tags.sh: Resolve gtags empty index generation
	drm/amdgpu: Validate VM ioctl flags.
	treewide: Remove uninitialized_var() usage
	md/raid10: fix overflow of md/safe_mode_delay
	md/raid10: fix wrong setting of max_corr_read_errors
	md/raid10: fix io loss while replacement replace rdev
	PM: domains: fix integer overflow issues in genpd_parse_state()
	evm: Complete description of evm_inode_setattr()
	wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
	wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
	wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
	wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
	wifi: atmel: Fix an error handling path in atmel_probe()
	wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
	wifi: ray_cs: Fix an error handling path in ray_probe()
	wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
	watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
	watchdog/perf: more properly prevent false positives with turbo modes
	kexec: fix a memory leak in crash_shrink_memory()
	memstick r592: make memstick_debug_get_tpc_name() static
	wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
	wifi: ath9k: convert msecs to jiffies where needed
	netlink: fix potential deadlock in netlink_set_err()
	netlink: do not hard code device address lenth in fdb dumps
	gtp: Fix use-after-free in __gtp_encap_destroy().
	lib/ts_bm: reset initial match offset for every block of text
	netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
	netlink: Add __sock_i_ino() for __netlink_diag_dump().
	radeon: avoid double free in ci_dpm_init()
	Input: drv260x - sleep between polling GO bit
	ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
	Input: adxl34x - do not hardcode interrupt trigger type
	drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
	ARM: ep93xx: fix missing-prototype warnings
	ASoC: es8316: Increment max value for ALC Capture Target Volume control
	soc/fsl/qe: fix usb.c build errors
	fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
	drm/radeon: fix possible division-by-zero errors
	ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
	scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
	PCI: Add pci_clear_master() stub for non-CONFIG_PCI
	pinctrl: cherryview: Return correct value if pin in push-pull mode
	perf dwarf-aux: Fix off-by-one in die_get_varname()
	pinctrl: at91-pio4: check return value of devm_kasprintf()
	crypto: nx - fix build warnings when DEBUG_FS is not enabled
	modpost: fix section mismatch message for R_ARM_ABS32
	modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
	modpost: fix off by one in is_executable_section()
	USB: serial: option: add LARA-R6 01B PIDs
	block: change all __u32 annotations to __be32 in affs_hardblocks.h
	w1: fix loop in w1_fini()
	sh: j2: Use ioremap() to translate device tree address into kernel memory
	media: usb: Check az6007_read() return value
	media: videodev2.h: Fix struct v4l2_input tuner index comment
	media: usb: siano: Fix warning due to null work_func_t function pointer
	extcon: Fix kernel doc of property fields to avoid warnings
	extcon: Fix kernel doc of property capability fields to avoid warnings
	usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
	mfd: rt5033: Drop rt5033-battery sub-device
	mfd: intel-lpss: Add missing check for platform_get_resource
	mfd: stmpe: Only disable the regulators if they are enabled
	rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
	sctp: fix potential deadlock on &net->sctp.addr_wq_lock
	Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
	spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
	mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
	powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
	tcp: annotate data races in __tcp_oow_rate_limited()
	net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
	sh: dma: Fix DMA channel offset calculation
	NFSD: add encoding of op_recall flag for write delegation
	mmc: core: disable TRIM on Kingston EMMC04G-M627
	mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
	integrity: Fix possible multiple allocation in integrity_inode_get()
	jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
	btrfs: fix race when deleting quota root from the dirty cow roots list
	ARM: orion5x: fix d2net gpio initialization
	spi: spi-fsl-spi: remove always-true conditional in fsl_spi_do_one_msg
	spi: spi-fsl-spi: relax message sanity checking a little
	spi: spi-fsl-spi: allow changing bits_per_word while CS is still active
	netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
	netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
	netfilter: nf_tables: unbind non-anonymous set if rule construction fails
	netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
	netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
	workqueue: clean up WORK_* constant types, clarify masking
	net: mvneta: fix txq_map in case of txq_number==1
	udp6: fix udp6_ehashfn() typo
	ntb: idt: Fix error handling in idt_pci_driver_init()
	NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
	ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
	NTB: ntb_transport: fix possible memory leak while device_register() fails
	ipv6/addrconf: fix a potential refcount underflow for idev
	wifi: airo: avoid uninitialized warning in airo_get_rate()
	net/sched: make psched_mtu() RTNL-less safe
	tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
	SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
	perf intel-pt: Fix CYC timestamps after standalone CBR
	ext4: fix wrong unit use in ext4_mb_clear_bb
	ext4: only update i_reserved_data_blocks on successful block allocation
	jfs: jfs_dmap: Validate db_l2nbperpage while mounting
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
	misc: pci_endpoint_test: Re-init completion for every test
	md/raid0: add discard support for the 'original' layout
	fs: dlm: return positive pid value for F_GETLK
	hwrng: imx-rngc - fix the timeout for init and self check
	meson saradc: fix clock divider mask length
	Revert "8250: add support for ASIX devices with a FIFO bug"
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
	ring-buffer: Fix deadloop issue on reading trace_pipe
	xtensa: ISS: fix call to split_if_spec
	scsi: qla2xxx: Wait for io return on terminate rport
	scsi: qla2xxx: Fix potential NULL pointer dereference
	scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
	scsi: qla2xxx: Pointer may be dereferenced
	serial: atmel: don't enable IRQs prematurely
	perf probe: Add test for regression introduced by switch to die_get_decl_file()
	fuse: revalidate: don't invalidate if interrupted
	can: bcm: Fix UAF in bcm_proc_show()
	ext4: correct inline offset when handling xattrs in inode body
	debugobjects: Recheck debug_objects_enabled before reporting
	nbd: Add the maximum limit of allocated index in nbd_dev_add
	md: fix data corruption for raid456 when reshape restart while grow up
	md/raid10: prevent soft lockup while flush writes
	posix-timers: Ensure timer ID search-loop limit is valid
	sched/fair: Don't balance task to its current running CPU
	bpf: Address KCSAN report on bpf_lru_list
	wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
	igb: Fix igb_down hung on surprise removal
	spi: bcm63xx: fix max prepend length
	fbdev: imxfb: warn about invalid left/right margin
	pinctrl: amd: Use amd_pinconf_set() for all config options
	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
	fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
	llc: Don't drop packet from non-root netns.
	netfilter: nf_tables: fix spurious set element insertion failure
	tcp: annotate data-races around rskq_defer_accept
	tcp: annotate data-races around tp->notsent_lowat
	tcp: annotate data-races around fastopenq.max_qlen
	gpio: tps68470: Make tps68470_gpio_output() always set the initial value
	i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
	ethernet: atheros: fix return value check in atl1e_tso_csum()
	ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address
	tcp: Reduce chance of collisions in inet6_hashfn().
	bonding: reset bond's flags when down link is P2P device
	team: reset team's flags when down link is P2P device
	platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
	benet: fix return value check in be_lancer_xmit_workarounds()
	ASoC: fsl_spdif: Silence output on stop
	block: Fix a source code comment in include/uapi/linux/blkzoned.h
	dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
	ata: pata_ns87415: mark ns87560_tf_read static
	ring-buffer: Fix wrong stat of cpu_buffer->read
	tracing: Fix warning in trace_buffered_event_disable()
	USB: serial: option: support Quectel EM060K_128
	USB: serial: option: add Quectel EC200A module support
	USB: serial: simple: add Kaufmann RKS+CAN VCP
	USB: serial: simple: sort driver entries
	can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
	usb: ohci-at91: Fix the unhandle interrupt when resume
	usb: xhci-mtk: set the dma max_seg_size
	Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
	staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
	hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
	tpm_tis: Explicitly check for error code
	irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
	s390/dasd: fix hanging device after quiesce/resume
	ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
	dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
	drm/client: Fix memory leak in drm_client_target_cloned
	net/sched: cls_fw: Fix improper refcount update leads to use-after-free
	net/sched: sch_qfq: account for stab overhead in qfq_enqueue
	net/sched: cls_u32: Fix reference counter leak leading to overflow
	perf: Fix function pointer case
	word-at-a-time: use the same return type for has_zero regardless of endianness
	net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
	perf test uprobe_from_different_cu: Skip if there is no gcc
	net: add missing data-race annotations around sk->sk_peek_off
	net: add missing data-race annotation for sk_ll_usec
	net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
	net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
	ip6mr: Fix skb_under_panic in ip6mr_cache_report()
	tcp_metrics: fix addr_same() helper
	tcp_metrics: annotate data-races around tm->tcpm_stamp
	tcp_metrics: annotate data-races around tm->tcpm_lock
	tcp_metrics: annotate data-races around tm->tcpm_vals[]
	tcp_metrics: annotate data-races around tm->tcpm_net
	tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
	loop: Select I/O scheduler 'none' from inside add_disk()
	libceph: fix potential hang in ceph_osdc_notify()
	USB: zaurus: Add ID for A-300/B-500/C-700
	fs/sysv: Null check to prevent null-ptr-deref bug
	Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
	net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
	ext2: Drop fragment support
	test_firmware: fix a memory leak with reqs buffer
	mtd: rawnand: omap_elm: Fix incorrect type in assignment
	drm/edid: fix objtool warning in drm_cvt_modes()
	Linux 4.14.322

Change-Id: Ia25c00bd23a112b634b83577ec7d54569e8b7c70
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-23 14:54:21 +00:00
Kees Cook
d68627697d treewide: Remove uninitialized_var() usage
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 11:33:32 +02:00
John Galt
3a03bd4b85
mm/swap.c: complete 5a9bd6bb520
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-31 10:08:34 +00:00
Minchan Kim
e13c88efb8
mm: introduce deactivate_page
perprocess reclaims needs to deactivate file pages from active LRU
when echo file > /proc/<pid>/reclaim.
Add deactivate_file pages.

Bug: 131016077
Bug: 153444106
(cherry picked from b07eab27085611203ad359b7f4eecd138d7d771a)
Change-Id: I06fed20103671e4ca6fb8663d5029736442162a5
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-31 09:52:32 +00:00
Yu Zhao
663500cfc8
mglru: fixes
Signed-off-by: Yu Zhao <yuzhao@google.com>
Change-Id: I09999793fd6481bcf2acefb29c57d5d40c0b1427
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-31 09:27:36 +00:00
Dark-Matter7232
94181990a4
mm: Import oplus memory management hacks
- Add a separate swappiness (default 60) value to use if task isn't handled by kswapd.
- Don't throttle tasks with OOM value < 0.
- Increase zsmalloc's maximum page order.
[cyberknight777]:
- Reword Kconfig option.
- Import with clean indents.
- Be more descriptive in Kconfig option's description.

Signed-off-by: Dark-Matter7232 <kerneldeveloper7232@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-26 08:09:42 +00:00
Yu Zhao
3afd29a4ec
FROMLIST: mm: multi-gen LRU: minimal implementation
To avoid confusion, the terms "promotion" and "demotion" will be
applied to the multi-gen LRU, as a new convention; the terms
"activation" and "deactivation" will be applied to the active/inactive
LRU, as usual.

The aging produces young generations. Given an lruvec, it increments
max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
promotes hot pages to the youngest generation when it finds them
accessed through page tables; the demotion of cold pages happens
consequently when it increments max_seq. The aging has the complexity
O(nr_hot_pages), since it is only interested in hot pages. Promotion
in the aging path does not require any LRU list operations, only the
updates of the gen counter and lrugen->nr_pages[]; demotion, unless as
the result of the increment of max_seq, requires LRU list operations,
e.g., lru_deactivate_fn().

The eviction consumes old generations. Given an lruvec, it increments
min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A
feedback loop modeled after the PID controller monitors refaults over
anon and file types and decides which type to evict when both types
are available from the same generation.

Each generation is divided into multiple tiers. Tiers represent
different ranges of numbers of accesses through file descriptors. A
page accessed N times through file descriptors is in tier
order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
bits in page->flags. In contrast to moving across generations, which
requires the LRU lock, moving across tiers only involves operations on
page->flags. The feedback loop also monitors refaults over all tiers
and decides when to protect pages in which tiers (N>1), using the
first tier (N=0,1) as a baseline. The first tier contains single-use
unmapped clean pages, which are most likely the best choices. The
eviction moves a page to the next generation, i.e., min_seq+1, if the
feedback loop decides so. This approach has the following advantages:
1. It removes the cost of activation in the buffered access path by
   inferring whether pages accessed multiple times through file
   descriptors are statistically hot and thus worth protecting in the
   eviction path.
2. It takes pages accessed through page tables into account and avoids
   overprotecting pages accessed multiple times through file
   descriptors. (Pages accessed through page tables are in the first
   tier, since N=0.)
3. More tiers provide better protection for pages accessed more than
   twice through file descriptors, when under heavy buffered I/O
   workloads.

Server benchmark results:
  Single workload:
    fio (buffered I/O): +[38, 40]%
                         IOPS         BW
      5.18-ed4643521e6a: 2547k        9989MiB/s
      patch1-6:          3540k        13.5GiB/s

  Single workload:
    memcached (anon): +[103, 107]%
                         Ops/sec      KB/sec
      5.18-ed4643521e6a: 469048.66    18243.91
      patch1-6:          964656.80    37520.88

  Configurations:
    CPU: two Xeon 6154
    Mem: total 256G

    Node 1 was only used as a ram disk to reduce the variance in the
    results.

    patch drivers/block/brd.c <<EOF
    99,100c99,100
    < 	gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
    < 	page = alloc_page(gfp_flags);
    ---
    > 	gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE;
    > 	page = alloc_pages_node(1, gfp_flags, 0);
    EOF

    cat >>/etc/systemd/system.conf <<EOF
    CPUAffinity=numa
    NUMAPolicy=bind
    NUMAMask=0
    EOF

    cat >>/etc/memcached.conf <<EOF
    -m 184320
    -s /var/run/memcached/memcached.sock
    -a 0766
    -t 36
    -B binary
    EOF

    cat fio.sh
    modprobe brd rd_nr=1 rd_size=113246208
    swapoff -a
    mkfs.ext4 /dev/ram0
    mount -t ext4 /dev/ram0 /mnt

    mkdir /sys/fs/cgroup/user.slice/test
    echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max
    echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs
    fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=5m --group_reporting

    cat memcached.sh
    modprobe brd rd_nr=1 rd_size=113246208
    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

Client benchmark results:
  kswapd profiles:
    5.18-ed4643521e6a
      39.56%  page_vma_mapped_walk
      19.32%  lzo1x_1_do_compress (real work)
       7.18%  do_raw_spin_lock
       4.23%  _raw_spin_unlock_irq
       2.26%  vma_interval_tree_subtree_search
       2.12%  vma_interval_tree_iter_next
       2.11%  folio_referenced_one
       1.90%  anon_vma_interval_tree_iter_first
       1.47%  ptep_clear_flush
       0.97%  __anon_vma_interval_tree_subtree_search

    patch1-6
      36.13%  lzo1x_1_do_compress (real work)
      19.16%  page_vma_mapped_walk
       6.55%  _raw_spin_unlock_irq
       4.02%  do_raw_spin_lock
       2.32%  anon_vma_interval_tree_iter_first
       2.11%  ptep_clear_flush
       1.76%  __zram_bvec_write
       1.64%  folio_referenced_one
       1.40%  memmove
       1.35%  obj_malloc

  Configurations:
    CPU: single Snapdragon 7c
    Mem: total 4G

    Chrome OS MemoryPressure [1]

[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/

Link: https://lore.kernel.org/r/20220309021230.721028-7-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 228114874
Change-Id: I3fe4850006d7984cd9f4fd46134b826609dc2f86
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:16:44 +00:00
Yu Zhao
a17d122b82
FROMLIST: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec.
The youngest generation number is stored in lrugen->max_seq for both
anon and file types as they are aged on an equal footing. The oldest
generation numbers are stored in lrugen->min_seq[] separately for anon
and file types as clean file pages can be evicted regardless of swap
constraints. These three variables are monotonically increasing.

Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
in order to fit into the gen counter in page->flags. Each truncated
generation number is an index to lrugen->lists[]. The sliding window
technique is used to track at least MIN_NR_GENS and at most
MAX_NR_GENS generations. The gen counter stores a value within [1,
MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
stores 0.

There are two conceptually independent procedures: "the aging", which
produces young generations, and "the eviction", which consumes old
generations. They form a closed-loop system, i.e., "the page reclaim".
Both procedures can be invoked from userspace for the purposes of
working set estimation and proactive reclaim. These features are
required to optimize job scheduling (bin packing) in data centers. The
variable size of the sliding window is designed for such use cases
[1][2].

To avoid confusion, the terms "hot" and "cold" will be applied to the
multi-gen LRU, as a new convention; the terms "active" and "inactive"
will be applied to the active/inactive LRU, as usual.

The protection of hot pages and the selection of cold pages are based
on page access channels and patterns. There are two access channels:
one through page tables and the other through file descriptors. The
protection of the former channel is by design stronger because:
1. The uncertainty in determining the access patterns of the former
   channel is higher due to the approximation of the accessed bit.
2. The cost of evicting the former channel is higher due to the TLB
   flushes required and the likelihood of encountering the dirty bit.
3. The penalty of underprotecting the former channel is higher because
   applications usually do not prepare themselves for major page
   faults like they do for blocked I/O. E.g., GUI applications
   commonly use dedicated I/O threads to avoid blocking the rendering
   threads.
There are also two access patterns: one with temporal locality and the
other without. For the reasons listed above, the former channel is
assumed to follow the former pattern unless VM_SEQ_READ or
VM_RAND_READ is present; the latter channel is assumed to follow the
latter pattern unless outlying refaults have been observed [3][4].

The next patch will address the "outlying refaults". Three macros,
i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are
added in this patch to make the entire patchset less diffy.

A page is added to the youngest generation on faulting. The aging
needs to check the accessed bit at least twice before handing this
page over to the eviction. The first check takes care of the accessed
bit set on the initial fault; the second check makes sure this page
has not been used since then. This protocol, AKA second chance,
requires a minimum of two generations, hence MIN_NR_GENS.

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
[3] https://lwn.net/Articles/495543/
[4] https://lwn.net/Articles/815342/

Link: https://lore.kernel.org/r/20220309021230.721028-6-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 228114874
Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:16:44 +00:00
Yu Zhao
120db00c5c
UPSTREAM: mm: VM_BUG_ON lru page flags
Move scattered VM_BUG_ONs to two essential places that cover all
lru list additions and deletions.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-8-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bc7112719e1e80e4208eef3fc9bd8d2b6c263e7d)
Bug: 228114874
Change-Id: I950ad4171f973c740d9fc3778d44efc020d0e12c
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:15:23 +00:00
Yu Zhao
b2f66a571e
BACKPORT: mm: add __clear_page_lru_flags() to replace page_off_lru()
Similar to page_off_lru(), the new function does non-atomic clearing
of PageLRU() in addition to PageActive() and PageUnevictable(), on a
page that has no references left.

If PageActive() and PageUnevictable() are both set, refuse to clear
either and leave them to bad_page(). This is a behavior change that
is meant to help debug.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-7-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-7-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 875601796267214f286d3581fe74f2805d060fe8)
Bug: 228114874
Change-Id: I0290916fa08277c50e228a8d3f39af67d62ff9d0
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:15:23 +00:00
Yu Zhao
bb4a01c36e
BACKPORT: mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()
The parameter is redundant in the sense that it can be potentially
extracted from the "struct page" parameter by page_lru(). We need to
make sure that existing PageActive() or PageUnevictable() remains
until the function returns. A few places don't conform, and simple
reordering fixes them.

This patch may have left page_off_lru() seemingly odd, and we'll take
care of it in the next patch.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-6-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-6-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 46ae6b2cc2a47904a368d238425531ea91f3a2a5)
Bug: 228114874
Change-Id: I1e14dcbf4111b39cf155ed3512423448865eb324
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:15:23 +00:00
Yu Zhao
a359abc52d
UPSTREAM: mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()
The parameter is redundant in the sense that it can be extracted
from the "struct page" parameter by page_lru() correctly.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-5-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 861404536a3af3c39f1b10959a40def3d8efa2dd)
Bug: 228114874
Change-Id: Ia02c0c65dd427a98ffa39e9dc3e2ae701e85fad8
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:15:23 +00:00
Yu Zhao
074586d6fb
BACKPORT: mm: don't pass "enum lru_list" to lru list addition functions
The "enum lru_list" parameter to add_page_to_lru_list() and
add_page_to_lru_list_tail() is redundant in the sense that it can
be extracted from the "struct page" parameter by page_lru().

A caveat is that we need to make sure PageActive() or
PageUnevictable() is correctly set or cleared before calling
these two functions. And they are indeed.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-4-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3a9c9788a3149d9745b7eb2eae811e57ef3b127c)
Bug: 228114874
Change-Id: I0d92b845d18e6ab3bcb5645f22e3cedb04257d98
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:15:22 +00:00
Yu Zhao
83d218260d
BACKPORT: mm: remove superfluous __ClearPageActive()
To activate a page, mark_page_accessed() always holds a reference on it.
It either gets a new reference when adding a page to
lru_pvecs.activate_page or reuses an existing one it previously got when
it added a page to lru_pvecs.lru_add.  So it doesn't call SetPageActive()
on a page that doesn't have any reference left.  Therefore, the race is
impossible these days (I didn't brother to dig into its history).

For other paths, namely reclaim and migration, a reference count is always
held while calling SetPageActive() on a page.

SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to
LRU pages.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6f4dd8de4835563de9bae797ce1d7a13465a7a7d)
Bug: 228114874
Change-Id: Ie432a8580024926fe9d0fa18eeb5a31b23593b21
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:14:41 +00:00
Yu Zhao
1a86bed8d9
UPSTREAM: mm: replace list_move_tail() with add_page_to_lru_list_tail()
This is a cleanup patch that replaces two historical uses of
list_move_tail() with relatively recent add_page_to_lru_list_tail().

Link: http://lkml.kernel.org/r/20190716212436.7137-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e7a1aaf28770c1f7a06c50cbd02ca0f27ce61ec5)
Bug: 228114874
Change-Id: I47d474271dfb5fdcc28e062b1db86ecbccbf85d0
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-07-01 09:14:41 +00:00
azrim
c5bdec1129
Revert "mm: only drain per-cpu pagevecs once per pagevec usage"
This reverts commit 8fbf24ee0ee3486e8985c4b963c2bc83c9e74d1c.
2022-07-01 09:13:43 +00:00
azrim
2c5929ae4f
Revert "mm, pagevec: remove cold parameter for pagevecs"
This reverts commit 9443cf7bcca4bd5a56a070fd422710e3438db00c.
2022-07-01 09:13:34 +00:00
azrim
aa4e942d07
Revert "mm, pagevec: rename pagevec drained field"
This reverts commit b698bab3fb23bbc51f113883f9c62cd3b8838342.
2022-07-01 09:13:21 +00:00
azrim
b1a539daf7
Revert "mm: introduce deactivate_page"
This reverts commit be7a87c7d612a71e12267def1761091d2817d476.
2022-07-01 09:07:54 +00:00
azrim
8aa10e2cc5
Revert "UPSTREAM: mm: replace list_move_tail() with add_page_to_lru_list_tail()"
This reverts commit 0488665840f7851a937dedef051cc5bc1a6cf3bd.
2022-07-01 09:06:22 +00:00
azrim
6bcfe1bf37
Revert "BACKPORT: mm: remove superfluous __ClearPageActive()"
This reverts commit 4c40b566cb5c48f4b9b17f29aba5044638a5f1d6.
2022-07-01 09:06:04 +00:00
azrim
f4d0b65ee5
Revert "BACKPORT: mm: don't pass "enum lru_list" to lru list addition functions"
This reverts commit a982a833bcfd9fabb38906b045d848aab2625be8.
2022-07-01 09:05:02 +00:00
azrim
3d4078ef6e
Revert "BACKPORT: mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()"
This reverts commit 78733943f9a599c0319b780e1f7aeace700b58f8.
2022-07-01 09:04:50 +00:00
azrim
acf071c96b
Revert "BACKPORT: mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()"
This reverts commit ab4b57fe4682a6246055e00f8dca5b90ea37dce0.
2022-07-01 09:04:09 +00:00
azrim
7f3803b3bd
Revert "BACKPORT: mm: add __clear_page_lru_flags() to replace page_off_lru()"
This reverts commit a68a203906589bee95afbf10c1b386bf76d612e2.
2022-07-01 09:03:57 +00:00
azrim
cba9062e83
Revert "UPSTREAM: mm: VM_BUG_ON lru page flags"
This reverts commit aae8c5a11011f8c54b0a58b536ad1361fc65d3ad.
2022-07-01 09:03:44 +00:00
azrim
10addd7e22
Revert "BACKPORT: FROMLIST: mm: multigenerational lru: activation"
This reverts commit a1790dfaabf7be9da0ef1017c5346063603b95df.
2022-07-01 08:28:50 +00:00
azrim
7be76f8359
Revert "mm/swap.c: fix compilation"
This reverts commit 314860a83b82e83aaa3d72d2bd28c524ea7efaf5.
2022-07-01 08:24:45 +00:00
azrim
314860a83b
mm/swap.c: fix compilation
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:21 +07:00
Yu Zhao
a1790dfaab
BACKPORT: FROMLIST: mm: multigenerational lru: activation
For pages mapped upon page faults, the accessed bit is set during the
initial faults. We add them to the per-zone lists index by max_seq,
i.e., the youngest generation, so that eviction will not consider them
before the aging has scanned them. Readahead pages allocated in the
page fault path will also be added to the youngest generation, since
it is assumed that they may be needed soon.

For pages accessed multiple times via file descriptors, instead of
activating them upon the second access, we activate them based on the
refault rates of their tiers. Each generation contains at most
MAX_NR_TIERS tiers, and they require additional MAX_NR_TIERS-2 bits in
page->flags. Pages accessed N times via file descriptors belong to
tier order_base_2(N). Tier 0 is the base tier and it contains pages
read ahead, accessed once via file descriptors and accessed only via
page tables. Pages from the base tier are evicted regardless of the
refault rate. Pages from upper tiers that have higher refault rates
than the base tier will be moved to the next generation. A feedback
loop modeled after the PID controller monitors refault rates across
all tiers and decides when to activate pages from which upper tiers
in the reclaim path. The advantages of this model are:
  1) It has a negligible cost in the buffered IO access path because
  activations are done optionally in the reclaim path.
  2) It takes mapped pages into account and avoids overprotecting
  pages accessed multiple times via file descriptors.
  3) More tiers offer better protection to pages accessed more than
  twice when workloads doing intensive buffered IO are under memory
  pressure.

Finally, we need to make sure deactivation works when the
multigenerational lru is enabled. We cannot use PageActive() because
it is not set on pages from active generations, in order to spare the
aging the trouble of clearing it when active generations become
inactive. So we deactivate pages unconditionally since deactivation is
not a hot code path worth additional optimizations.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
(am from https://lore.kernel.org/patchwork/patch/1432183/)

BUG=b:123039911
TEST=Built

Change-Id: Ibc9c90757fd095cdcc0a49823ada6b55f17ffc06
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2928252
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:19 +07:00
Yu Zhao
aae8c5a110
UPSTREAM: mm: VM_BUG_ON lru page flags
Move scattered VM_BUG_ONs to two essential places that cover all
lru list additions and deletions.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-8-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bc7112719e1e80e4208eef3fc9bd8d2b6c263e7d)

BUG=b:123039911
TEST=Built

Change-Id: I46712058a18b740251a7c1c80b9dcbcc42dac457
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913977
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:18 +07:00
Yu Zhao
a68a203906
BACKPORT: mm: add __clear_page_lru_flags() to replace page_off_lru()
Similar to page_off_lru(), the new function does non-atomic clearing
of PageLRU() in addition to PageActive() and PageUnevictable(), on a
page that has no references left.

If PageActive() and PageUnevictable() are both set, refuse to clear
either and leave them to bad_page(). This is a behavior change that
is meant to help debug.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-7-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-7-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 875601796267214f286d3581fe74f2805d060fe8)

BUG=b:123039911
TEST=Built

Change-Id: I86b973cd52a0ddb0fb1453c5fcd787aa885297e6
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913976
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:18 +07:00
Yu Zhao
ab4b57fe46
BACKPORT: mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()
The parameter is redundant in the sense that it can be potentially
extracted from the "struct page" parameter by page_lru(). We need to
make sure that existing PageActive() or PageUnevictable() remains
until the function returns. A few places don't conform, and simple
reordering fixes them.

This patch may have left page_off_lru() seemingly odd, and we'll take
care of it in the next patch.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-6-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-6-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 46ae6b2cc2a47904a368d238425531ea91f3a2a5)

BUG=b:123039911
TEST=Built

Change-Id: Iaf7a9c8c71da7d41c40f566ef9be8ac33c4e012d
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913975
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:17 +07:00
Yu Zhao
78733943f9
BACKPORT: mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()
The parameter is redundant in the sense that it can be extracted
from the "struct page" parameter by page_lru() correctly.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-5-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 861404536a3af3c39f1b10959a40def3d8efa2dd)

BUG=b:123039911
TEST=Built

Change-Id: I06661696d32705a0753b45d5886ae87a59953ee7
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913974
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:17 +07:00
Yu Zhao
a982a833bc
BACKPORT: mm: don't pass "enum lru_list" to lru list addition functions
The "enum lru_list" parameter to add_page_to_lru_list() and
add_page_to_lru_list_tail() is redundant in the sense that it can
be extracted from the "struct page" parameter by page_lru().

A caveat is that we need to make sure PageActive() or
PageUnevictable() is correctly set or cleared before calling
these two functions. And they are indeed.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-4-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3a9c9788a3149d9745b7eb2eae811e57ef3b127c)

BUG=b:123039911
TEST=Built

Change-Id: Ib58324f3641a83a43d752af5177c40f47a42d8e1
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913973
Reviewed-by: Yu Zhao <yuzhao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:17 +07:00
Yu Zhao
4c40b566cb
BACKPORT: mm: remove superfluous __ClearPageActive()
To activate a page, mark_page_accessed() always holds a reference on it.
It either gets a new reference when adding a page to
lru_pvecs.activate_page or reuses an existing one it previously got when
it added a page to lru_pvecs.lru_add.  So it doesn't call SetPageActive()
on a page that doesn't have any reference left.  Therefore, the race is
impossible these days (I didn't brother to dig into its history).

For other paths, namely reclaim and migration, a reference count is always
held while calling SetPageActive() on a page.

SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to
LRU pages.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6f4dd8de4835563de9bae797ce1d7a13465a7a7d)

BUG=b:123039911
TEST=Built

Change-Id: I3e50ae28408b2936b1eb72210b3046eac8485701
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913969
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:17 +07:00
Yu Zhao
0488665840
UPSTREAM: mm: replace list_move_tail() with add_page_to_lru_list_tail()
This is a cleanup patch that replaces two historical uses of
list_move_tail() with relatively recent add_page_to_lru_list_tail().

Link: http://lkml.kernel.org/r/20190716212436.7137-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e7a1aaf28770c1f7a06c50cbd02ca0f27ce61ec5)

BUG=b:123039911
TEST=Built

Change-Id: I69db770430d1d282927c31ee8a352532901c774d
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2913967
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Commit-Queue: Yu Zhao <yuzhao@chromium.org>
Tested-by: Yu Zhao <yuzhao@chromium.org>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-06-10 15:54:17 +07:00
Minchan Kim
be7a87c7d6
mm: introduce deactivate_page
perprocess reclaims needs to deactivate file pages from active LRU
when echo file > /proc/<pid>/reclaim.
Add deactivate_file pages.

Bug: 131016077
Bug: 153444106
(cherry picked from b07eab27085611203ad359b7f4eecd138d7d771a)
Change-Id: I06fed20103671e4ca6fb8663d5029736442162a5
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:23:44 +07:00
Mel Gorman
b698bab3fb
mm, pagevec: rename pagevec drained field
According to Vlastimil Babka, the drained field in pagevec is
potentially misleading because it might be interpreted as draining this
pagevec instead of the percpu lru pagevecs.  Rename the field for
clarity.

Link: http://lkml.kernel.org/r/20171019093346.ylahzdpzmoriyf4v@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:57 +07:00
Mel Gorman
9443cf7bcc
mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:56 +07:00
Mel Gorman
8fbf24ee0e
mm: only drain per-cpu pagevecs once per pagevec usage
When a pagevec is initialised on the stack, it is generally used
multiple times over a range of pages, looking up entries and then
releasing them.  On each pagevec_release, the per-cpu deferred LRU
pagevecs are drained on the grounds the page being released may be on
those queues and the pages may be cache hot.  In many cases only the
first drain is necessary as it's unlikely that the range of pages being
walked is racing against LRU addition.  Even if there is such a race,
the impact is marginal where as constantly redraining the lru pagevecs
costs.

This patch ensures that pagevec is only drained once in a given
lifecycle without increasing the cache footprint of the pagevec
structure.  Only sparsetruncate tiny is shown here as large files have
many exceptional entries and calls pagecache_release less frequently.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                        batchshadow-v1r1          onedrain-v1r1
Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)

The impact on bonnie is marginal and within the noise because a
significant percentage of the file being truncated has been reclaimed
and consists of shadow entries which reduce the hotness of the
pagevec_release path.

Link: http://lkml.kernel.org/r/20171018075952.10627-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-05-09 11:22:55 +07:00
Andrzej Perczak
963a3bfe33
mm: Tune parameters for Android
Since Infinity Kernel is released as standalone package other ROMs can
lack mm tuning done in rootdir. For this reason move those tweaks
directly to kernel.

Note that dirty_writeback_centisecs and drity_background_ratio have to be
restricted from writing because Android < 12 modifies these values badly
on boot.

Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-06 13:18:53 +07:00
Blagovest Kolenichev
5c033b4894 Merge android-4.14 (73421a4) into msm-4.14
* refs/heads/tmp-73421a4:
  Revert "f2fs: let discard thread wait a little longer if dev is busy"
  Revert "f2fs: clean up with is_valid_blkaddr()"
  Revert "f2fs: fix race in between GC and atomic open"
  treewide: Use array_size in f2fs_kvzalloc()
  treewide: Use array_size() in f2fs_kzalloc()
  treewide: Use array_size() in f2fs_kmalloc()
  overflow.h: Add allocation size calculation helpers
  f2fs: fix to clear FI_VOLATILE_FILE correctly
  f2fs: let sync node IO interrupt async one
  f2fs: don't change wbc->sync_mode
  f2fs: fix to update mtime correctly
  fs: f2fs: insert space around that ':' and ', '
  fs: f2fs: add missing blank lines after declarations
  fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
  f2fs: clean up symbol namespace
  f2fs: make set_de_type() static
  f2fs: make __f2fs_write_data_pages() static
  f2fs: fix to avoid accessing cross the boundary
  f2fs: fix to let caller retry allocating block address
  disable loading f2fs module on PAGE_SIZE > 4KB
  f2fs: fix error path of move_data_page
  f2fs: don't drop dentry pages after fs shutdown
  f2fs: fix to avoid race during access gc_thread pointer
  f2fs: clean up with clear_radix_tree_dirty_tag
  f2fs: fix to don't trigger writeback during recovery
  f2fs: clear discard_wake earlier
  f2fs: let discard thread wait a little longer if dev is busy
  f2fs: avoid stucking GC due to atomic write
  f2fs: introduce sbi->gc_mode to determine the policy
  f2fs: keep migration IO order in LFS mode
  f2fs: fix to wait page writeback during revoking atomic write
  f2fs: Fix deadlock in shutdown ioctl
  f2fs: detect synchronous writeback more earlier
  mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
  ceph: use pagevec_lookup_range_nr_tag()
  mm: add variant of pagevec_lookup_range_tag() taking number of pages
  mm: use pagevec_lookup_range_tag() in write_cache_pages()
  mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
  nilfs2: use pagevec_lookup_range_tag()
  gfs2: use pagevec_lookup_range_tag()
  f2fs: use find_get_pages_tag() for looking up single page
  f2fs: simplify page iteration loops
  f2fs: use pagevec_lookup_range_tag()
  ext4: use pagevec_lookup_range_tag()
  ceph: use pagevec_lookup_range_tag()
  btrfs: use pagevec_lookup_range_tag()
  mm: implement find_get_pages_range_tag()
  f2fs: clean up with is_valid_blkaddr()
  f2fs: fix to initialize min_mtime with ULLONG_MAX
  f2fs: fix to let checkpoint guarantee atomic page persistence
  f2fs: fix to initialize i_current_depth according to inode type
  Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
  f2fs: don't drop any page on f2fs_cp_error() case
  f2fs: fix spelling mistake: "extenstion" -> "extension"
  f2fs: enhance sanity_check_raw_super() to avoid potential overflows
  f2fs: treat volatile file's data as hot one
  f2fs: introduce release_discard_addr() for cleanup
  f2fs: fix potential overflow
  f2fs: rename dio_rwsem to i_gc_rwsem
  f2fs: move mnt_want_write_file after range check
  f2fs: fix missing clear FI_NO_PREALLOC in some error case
  f2fs: enforce fsync_mode=strict for renamed directory
  f2fs: sanity check for total valid node blocks
  f2fs: sanity check on sit entry
  f2fs: avoid bug_on on corrupted inode
  f2fs: give message and set need_fsck given broken node id
  f2fs: clean up commit_inmem_pages()
  f2fs: do not check F2FS_INLINE_DOTS in recover
  f2fs: remove duplicated dquot_initialize and fix error handling
  f2fs: fix to detect failure of dquot_initialize
  f2fs: stop issue discard if something wrong with f2fs
  f2fs: fix return value in f2fs_ioc_commit_atomic_write
  f2fs: allocate hot_data for atomic write more strictly
  f2fs: check if inmem_pages list is empty correctly
  f2fs: fix race in between GC and atomic open
  f2fs: change le32 to le16 of f2fs_inode->i_extra_size
  f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
  f2fs: correct return value of f2fs_trim_fs
  f2fs: fix to show missing bits in FS_IOC_GETFLAGS
  f2fs: remove unneeded F2FS_PROJINHERIT_FL
  f2fs: don't use GFP_ZERO for page caches
  f2fs: issue all big range discards in umount process
  f2fs: remove redundant block plug
  f2fs: remove unmatched zero_user_segment when convert inline dentry
  f2fs: introduce private inode status mapping
  fscrypt: log the crypto algorithm implementations
  fscrypt: add Speck128/256 support
  fscrypt: only derive the needed portion of the key
  fscrypt: separate key lookup from key derivation
  fscrypt: use a common logging function
  fscrypt: remove internal key size constants
  fscrypt: remove unnecessary check for non-logon key type
  fscrypt: make fscrypt_operations.max_namelen an integer
  fscrypt: drop empty name check from fname_decrypt()
  fscrypt: drop max_namelen check from fname_decrypt()
  fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
  fscrypt: don't clear flags on crypto transform
  fscrypt: remove stale comment from fscrypt_d_revalidate()
  fscrypt: remove error messages for skcipher_request_alloc() failure
  fscrypt: remove unnecessary NULL check when allocating skcipher
  fscrypt: clean up after fscrypt_prepare_lookup() conversions
  ubifs: switch to fscrypt_prepare_lookup()
  ext4: switch to fscrypt_prepare_lookup()
  fscrypt: use unbound workqueue for decryption

Conflicts:
	fs/crypto/keyinfo.c
	fs/ext4/super.c
	fs/f2fs/checkpoint.c
	fs/f2fs/data.c
	fs/f2fs/f2fs.h
	fs/f2fs/file.c
	fs/f2fs/gc.c
	fs/f2fs/inline.c
	fs/f2fs/inode.c
	fs/f2fs/node.c
	fs/f2fs/recovery.c
	fs/f2fs/segment.c
	fs/f2fs/super.c
	include/linux/fscrypt_supp.h
	include/linux/overflow.h

Extra changes were required in below files due to hard
conflicts against HW based FBE:

	fs/crypto/fscrypt_ice.c
	fs/crypto/fscrypt_ice.h
	fs/crypto/fscrypt_private.h
	fs/crypto/keyinfo.c

Change-Id: I01d788e7cd36539f6228742c99820a94a2cc5e46
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2019-04-03 06:00:09 -07:00
Jan Kara
c806c4187a mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 15:49:24 -07:00
Jan Kara
b44cc9e860 mm: add variant of pagevec_lookup_range_tag() taking number of pages
Currently pagevec_lookup_range_tag() takes number of pages to look up
but most users don't need this.  Create a new function
pagevec_lookup_range_nr_tag() that takes maximum number of pages to
lookup for Ceph which wants this functionality so that we can drop
nr_pages argument from pagevec_lookup_range_tag().

Link: http://lkml.kernel.org/r/20171009151359.31984-13-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 15:49:24 -07:00
Jan Kara
8e8455a68c mm: implement find_get_pages_range_tag()
Patch series "Ranged pagevec tagged lookup", v3.

In this series I provide a ranged variant of pagevec_lookup_tag() and
use it in places where it makes sense.  This series removes some common
code and it also has a potential for speeding up some operations
similarly as for pagevec_lookup_range() (but for now I can think of only
artificial cases where this happens).

This patch (of 16):

Implement a variant of find_get_pages_tag() that stops iterating at
given index.  Lots of users of this function (through pagevec_lookup())
actually want a range lookup and all of them are currently open-coding
this.

Also create corresponding pagevec_lookup_range_tag() function.

Link: http://lkml.kernel.org/r/20171009151359.31984-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steve French <sfrench@samba.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 15:49:24 -07:00
Laurent Dufour
34e57df243 mm: introduce __lru_cache_add_active_or_unevictable
The speculative page fault handler which is run without holding the
mmap_sem is calling lru_cache_add_active_or_unevictable() but the vm_flags
is not guaranteed to remain constant.
Introducing __lru_cache_add_active_or_unevictable() which has the vma flags
value parameter instead of the vma pointer.

Change-Id: I68decbe0f80847403127c45c97565e47512532e9
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:20
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2018-06-04 15:51:43 +05:30
Shaohua Li
24c92eb7dc mm: avoid marking swap cached page as lazyfree
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked).  There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim.  Page reclaim could add
the page to swap cache and unmap the page.  After page reclaim, the page
is added back to lru.  At that time, we probably start draining per-cpu
pagevec and mark the page lazyfree.  So the page could be in a state
with SwapBacked cleared and PG_swapcache set.  Next time there is a
refault in the virtual address, do_swap_page can find the page from swap
cache but the page has PageSwapCache false because SwapBacked isn't set,
so do_swap_page will bail out and do nothing.  The task will keep
running into fault handler.

Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Reported-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:24 -07:00