13459 Commits

Author SHA1 Message Date
Yu Zhao
24cc942015 mm/mglru: Don't sync disk for each aging cycle
wakeup_flusher_threads() was added under the assumption that if a system
runs out of clean cold pages, it might want to write back dirty pages more
aggressively so that they can become clean and be dropped.

However, doing so can breach the rate limit a system wants to impose on
writeback, resulting in early SSD wearout.

Link: https://lkml.kernel.org/r/YzSiWq9UEER5LKup@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Reported-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Change-Id: Ib4def4286264de926b11ec5247185edc3a780619
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-08 21:18:18 -03:00
Richard Raya
d97f51cb57 mm: Revert some hacks
This reverts commits:
- 0fc9fbd21297173aa822f97fe33a481053cb96ec [mm + sysctl: tune swappiness and make some values read only]
- 94181990a4ea1a20bb8bf443f3fbe500d05901c3 [mm: Import oplus memory management hacks]
- 97bdd381c8292d43e68ff55bd08767db17e62810 [mm: Set swappiness for CONFIG_INCREASE_MAXIMUM_SWAPPINESS=y case]
- fa8d2aa0e20da6b943157f6ab58068bd80d68920 [mm: move variable under a proper #ifdef]
- f9daeaa423b745b2c2c34a6fb5ac6b69daf746c4 [mm: merge Samsung mm hacks]
- 1a460a832c9c6550f5cbe32dca4c15cf89806b57 [mm: Make watermark_scale_factor read-only]
- 963a3bfe3352b45ea21c58d53055689e46d81eeb [mm: Tune parameters for Android]

Change-Id: I70495ca93a05384a2d7bc2498fd2d56bd9928390
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-02-08 21:17:59 -03:00
Tashfin Shakeer Rhythm
a9566ccc56 msm-4.14: Make macros no-op using ((void)0)
Do not solely rely on compiler optimizations to get the workaround
of having macros do nothing using an empty do-while loop. It's
inefficient.

Use ((void)0) to which the standard assert macro expands when NDEBUG
is defined.

No functional change intended.

[mcdofrenchfreis]:
Implement this patch to tree using the command:
git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g"

Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-17 00:51:10 -03:00
John Galt
12290e8a8b Revert "mm: process reclaim: vmpressure based process reclaim"
- This reverts commit 7964b3ce47f0d87fbbb1cfdd1fb4aadb620133dd as QCOM vmpressure driven process reclaim is redundant compared to Linux PPR which meets userspace dependencies.

Change-Id: I46782f69c57febed99002681ee268fa4a3111d59
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-04 19:30:58 -03:00
Cyber Knight
06425a87ef Revert "lowmemorykiller: Introduce sysfs node for ALMK and PPR adj threshold"
- This reverts commit f326985b26c272b4a9bcc250e7cf6af28b7c3398 as it does not meet userspace dependencies.

Change-Id: I8aaefeea7cc3dcab1d4a8c94723be238616c9474
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2025-01-04 19:30:55 -03:00
Danny Lin
46c8c57dc0 mm: reclaim: Bump maximum pages per reclaim attempt
4 MiB isn't a meaningful amount to reclaim. Bump the limit to 32 MiB
instead.

Change-Id: I92fc9b35d121e6b39bced13d549e59d9e8e668e8
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-04 20:59:55 -03:00
Patrick Daly
6c1efde3d9 mm/oom-kill: Adjust oom_badness per Android expectations
Allow selecting only tasks with oom_score_adj >= 0.

Change-Id: Iebbb487c711da98b8fcb367ba838b5fe0b260d4f
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-11-02 02:27:24 -03:00
Richard Raya
2cd059fb56 This is the 4.14.354 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcgko0ZHHZlZ2FyZC5u
 b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WL/GD/0em+uP/O8QiPYqeGrEECpW
 bgRsBiN3XnyEsghAjplWX12G/zjxA0PY0u2zh9K9sdPw60n8nVZ1OxvPHINwuSC9
 kE9N60SCpJ88ju9OtU+4xz/nxtEmlel8fWy5elagB5wqbWbvsjT52ceZXqSxqhy7
 pQdIDHSiUUwx9JL6vDuJSL+Z/Y216qvBETZLnDSo90raFp/MDa5JmQsh81lLeUt8
 wGKwC/Olnbd21QTStNK34aQGyX5b+3YeACFVPud66Zs9airz9EE6Yq78gwL29L2k
 4jxzihXxSkkfa66eR63ap53+/mEqOZX72m2qEMVOvAcAwU0XsNDTdkXN7z8YQ5T3
 E1rJwr4Ox0hmM+hHBA20w9xRDXZoZmdrcjsU1aNKuK2zTJ0h9DBIvMM2XY5n5sWK
 I4F8E15KyKmu4nXBETreXZixqVLZMgjNFncRLf8XBIL1kxXm65LYCHypp3AgdVgo
 Ccdq5PbC6LAyNPrIOaftIaS9VlU15cqcalu7A+gSoWq55LGWAa3G9vX0ZtYQB9QX
 0R18fbzyjqG6Wa5J5KRDJ+HyS4IvdnEWS8hMR3jfosjMNgJhfDlDeev8NARBiDpX
 d26xogNA7xOOvtdpuwEbnxD5kR0zUdnC73pC4wxdMptYSK6ULKNPmTkA0dKE9qvl
 TDgw4DML8vXQqJ4P+w3Njw==
 =gX2R
 -----END PGP SIGNATURE-----

Merge tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts

This is the 4.14.354 OpenELA-Extended LTS stable release

* tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts: (90 commits)
  LTS: Update to 4.14.354
  drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var
  ipc: remove memcg accounting for sops objects in do_semtimedop()
  scsi: aacraid: Fix double-free on probe failure
  usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
  usb: dwc3: st: fix probed platform device ref count on probe error path
  usb: dwc3: core: Prevent USB core invalid event buffer address access
  usb: dwc3: omap: add missing depopulate in probe error path
  USB: serial: option: add MeiG Smart SRM825L
  cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  gtp: fix a potential NULL pointer dereference
  net: prevent mss overflow in skb_segment()
  ida: Fix crash in ida_free when the bitmap is empty
  net:rds: Fix possible deadlock in rds_message_put
  fbmem: Check virtual screen sizes in fb_set_var()
  fbcon: Prevent that screen size is smaller than font size
  printk: Export is_console_locked
  memcg: enable accounting of ipc resources
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  ...

Change-Id: I7da4d8d188dec9d2833216e5d6580dbd72b99240
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-10-29 20:17:04 -03:00
Al Viro
6f2b82ee5c memcg_write_event_control(): fix a user-triggerable oops
commit 046667c4d3196938e992fba0dfcde570aa85cd0e upstream.

we are *not* guaranteed that anything past the terminating NUL
is mapped (let alone initialized with anything sane).

Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fa5bfdf6cb5846a00e712d630a43e3cf55ccb411)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-24 10:07:36 +00:00
Richard Raya
1b9f971175 This is the 4.14.353 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcHrG8ZHHZlZ2FyZC5u
 b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WIQuEACCYf9xCGBALlKFb0pXX3eF
 oiRkceNyy5NWSndD7t9p/3d2g4YrVptGxtTZN12IltfG4wfCQ+qC/0g2Mu4ho0Yp
 2ExKVaIli1t2csIjXCUUyjh3jU0JOkDwJap9n5QemACsX8zrDfKVwdlj9hw+e7vi
 fBWwdfl1duK5cfVbbyvL74It4WeMnjuAYrBnMTxhYBTq56xFLrbBILl8BLxAV5NN
 5wGoNCeUtj8LxUrL2qs5QoT3Bf7uoDlLnu1Ly7jDMMX34/oNh5huOjZdDFbQYxS3
 DsEe6ljOYOyB/awdUhScERfxVPimumN3nHWnRJbsQhX36uXT6U7HNJah4zauchRk
 UlKUSfG3YyOqKIwFH+8oGmkuCm6wZbVjVsNNkYhT804BCCHrasJ1SHXsSB9R0MpU
 x3IQOoiuc33bUYrSqWAO7utvt+PwG++3GHz0XQwPfZn4DHY18/e+VNsGtQTPqzRG
 tsywZVTN0DC0nO7L772nkQDb7z2mhmJGgN8q3FPbMTfp/I1phIh9C17pckfpHKAl
 ippTmTMaIYDU3Rlc1g/cu363GOaXWRN4t03VSEu/BLV0IElRktUnmuBU3B/rMb+F
 ItaBmhnZGXHUrulMTxDtzItrYMwx00USw6IrG3iYjob0MhhxhLVxEh0vKc7Te2w5
 2FZEjj2BxinK66mJgAolZw==
 =BQd/
 -----END PGP SIGNATURE-----

Merge tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts

This is the 4.14.353 OpenELA-Extended LTS stable release

* tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts: (173 commits)
  LTS: Update to 4.14.353
  net: fix __dst_negative_advice() race
  selftests: make order checking verbose in msg_zerocopy selftest
  selftests: fix OOM in msg_zerocopy selftest
  Revert "selftests/net: reap zerocopy completions passed up as ancillary data."
  Revert "selftests: fix OOM in msg_zerocopy selftest"
  Revert "selftests: make order checking verbose in msg_zerocopy selftest"
  nvme/pci: Add APST quirk for Lenovo N60z laptop
  exec: Fix ToCToU between perm check and set-uid/gid usage
  drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
  drm/i915: Try GGTT mmapping whole object as partial
  netfilter: nf_tables: set element extended ACK reporting support
  kbuild: Fix '-S -c' in x86 stack protector scripts
  drm/mgag200: Set DDC timeout in milliseconds
  drm/bridge: analogix_dp: properly handle zero sized AUX transactions
  drm/bridge: analogix_dp: Properly log AUX CH errors
  drm/bridge: analogix_dp: Reset aux channel if an error occurred
  drm/bridge: analogix_dp: Check AUX_EN status when doing AUX transfer
  x86/mtrr: Check if fixed MTRRs exist before saving them
  tracing: Fix overflow in get_free_elt()
  ...

Change-Id: I0e92a979e31d4fa6c526c6b70a1b61711d9747bb
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-10-11 00:50:20 -03:00
Jan Kara
1967ea8b28 mm: avoid overflows in dirty throttling logic
[ Upstream commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2 ]

The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits).  If limits end up being larger, we will hit overflows,
possible divisions by 0 etc.  Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway.  For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits.  For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc.  So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.

This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.

Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2b2d2b8766db028bd827af34075f221ae9e9efff)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Richard Raya
fd082a8cad Revert "mm: shuffle initial free memory to improve memory-side-cache utilization"
This reverts commit 6e8d8f0cd2dfd44c9cf01ba432f516577c6f99a7.

Change-Id: I08c5f066bfe4ff8c34f06f3b6ad50c216e9d076d
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-09-30 01:46:45 -03:00
Richard Raya
b6eaf5c762 Revert "mm: move buddy list manipulations into helpers"
This reverts commit be1968b3980cb973d3034605735ab12e1fa4672a.

Change-Id: I203cdaca4f6fa23584a7b4c1ea15c73ff2137227
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-09-30 01:46:45 -03:00
Richard Raya
555c41b976 Revert "mm: maintain randomization of page free lists"
This reverts commit d8277c8ef105edcfb0c442e73db73669c7ca27b8.

Change-Id: Id1616a87c6755b55d108e3e529a8df825851154a
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-09-30 01:46:44 -03:00
Richard Raya
be1ff8e638 msm-4.14: Revert some unsafe optimizations
Change-Id: I2c268f87ab8d9154758384c7a7639046c3784eb8
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-09-29 17:32:34 -03:00
Michal Hocko
f88112e714 mm: Distinguish blockable mode for mmu notifiers
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: Ibf089b0ebbbfa7182eeca314b757caf456969758
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-27 11:40:02 -03:00
Richard Raya
59448bba68 This is the 4.14.351 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmbJhFAZHHZlZ2FyZC5u
 b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WM9AD/9T4mE7CXds1QYHF3wzFinF
 t4oHyXvOiY4Mrsdy20A4FIvYfrYi5PyZ39E7G38e2FH2jG7qwHTyHXOjh94cL9gV
 5zlU7+jxWQenDKTl6LV3veYP/QNp9Yh9iQn0sgwC3HTUeq+zNd8rxvBjcAfDNiIM
 taC98s63QjtjZtQPzAaS461LH/U14dKFChuPEC36dei/M4T2UDTHZqvRdFBWB8h2
 fC/dJgtuohXTFexpGgk8p6GKNpFjyE62hBI3Xc+/k24j88r0cFqLLp6NhgF6JIpc
 6L6zGUKeyLXaIR/xoshK3MdgJ/XbocqKlRexJOFxCYmAEreAnQelS8v7QG3j6j33
 8AiUasZfpDPFNEH1CNJC0BiNs76NByFCJny+QUYlq0O9ZjfYQt+PvZZXSCx8jIn6
 A75ryAXLERNlXvh5XuEXlNJsOrN3enWnhgeJXMJOfKxtOfn7CRLmfSvpiS2/SfT3
 sxU4aNQNenbYoWwPQRPLXfNO4UvkmLfk6I6+AqRiHdykYQswhZRnpWxsPRSUwrhI
 6mErDGIXmryid/p+P/eMuviH3AO+KEpjoDzLFMFJWMpLQouTDl5qCwGu3QwVjybS
 /MOlfhi5z1so1e5qBIUmY498jZfVbZ5VMC76bOdhtC2USmvotcBSu611x5JtPaZo
 Cv3jKYl+/S0DVIZdEMPA8g==
 =wFNA
 -----END PGP SIGNATURE-----

Merge tag 'v4.14.351-openela' of https://github.com/openela/kernel-lts

This is the 4.14.351 OpenELA-Extended LTS stable release

* tag 'v4.14.351-openela' of https://github.com/openela/kernel-lts: (58 commits)
  LTS: Update to 4.14.351
  i2c: rcar: bring hardware to known state when probing
  nilfs2: fix kernel bug on rename operation of broken directory
  tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
  libceph: fix race between delayed_work() and ceph_monc_stop()
  hpet: Support 32-bit userspace
  USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
  usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
  USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
  USB: serial: option: add Rolling RW350-GL variants
  USB: serial: option: add Netprisma LCUK54 series modules
  USB: serial: option: add support for Foxconn T99W651
  USB: serial: option: add Fibocom FM350-GL
  USB: serial: option: add Telit FN912 rmnet compositions
  USB: serial: option: add Telit generic core-dump composition
  ARM: davinci: Convert comma to semicolon
  ppp: reject claimed-as-LCP but actually malformed packets
  net: ethernet: lantiq_etop: fix double free in detach
  net: lantiq_etop: add blank line after declaration
  tcp: fix incorrect undo caused by DSACK of TLP retransmit
  ...

Change-Id: I8bb6496007a068b83dd95a991e2f3afb0e18da82
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-08-25 23:41:30 -03:00
Jan Kara
6949c52837 Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
commit 30139c702048f1097342a31302cbd3d478f50c63 upstream.

Patch series "mm: Avoid possible overflows in dirty throttling".

Dirty throttling logic assumes dirty limits in page units fit into
32-bits.  This patch series makes sure this is true (see patch 2/2 for
more details).

This patch (of 2):

This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

The commit is broken in several ways.  Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this).  Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs.  We have
div64_ul() in case we want to be safe & cheap.  Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.

Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 253f9ea7e8e53a5176bd80ceb174907b10724c1a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-08-23 16:50:05 +00:00
idkwhoiam322
b470582782 devfreq_boost: Update and expand to handle CPUBW/LLCCBW boosting
This will enable us to more accurately replicate Pixel 4 userspace
DDR boosting by adding support for both CPUBW and LLCCBW devices.

https://android.googlesource.com/device/google/coral/+/refs/heads/master/init.power.rc
https://android.googlesource.com/device/google/coral/+/refs/heads/master/powerhint.json

Change-Id: Iae78fa01f96de8338725fa9309e5eee6d8c82313
Signed-off-by: idkwhoiam322 <idkwhoiam322@raphielgang.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-07-02 11:42:58 -03:00
Richard Raya
3f6b79925f mm: Boost CPU when memory pressure becomes high
Change-Id: Id1b9978d0d68612af02aee88e53af9645a5951ca
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-07-02 11:42:58 -03:00
Richard Raya
c0cc305165 msm-4.14: Add CPU LLCC bus boost triggers
Change-Id: I8ed4fcc32e643e871dda2e40fb86b3fc3f4326a4
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-07-02 11:42:54 -03:00
Richard Raya
9fb5e9427f Revert "mm: Reduce swappiness to 60"
This reverts commit 59e33ada5add42ab758928f2c2a50cad13eb3db6.

Change-Id: I758823369921deae4d72fc9756d4a430a140c3a9
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-15 23:47:14 -03:00
Richard Raya
59e33ada5a mm: Reduce swappiness to 60
Change-Id: I4bc1bb23b05d3cdbcaedef287bf34743a913c7e6
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-06 20:06:56 -03:00
Richard Raya
3143685e95 Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (278 commits)
  LTS: Update to 4.14.348
  docs: kernel_include.py: Cope with docutils 0.21
  serial: kgdboc: Fix NMI-safety problems from keyboard reset code
  btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
  dm: limit the number of targets and parameter size area
  Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
  LTS: Update to 4.14.347
  rds: Fix build regression.
  RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
  af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
  net: fix out-of-bounds access in ops_init
  drm/vmwgfx: Fix invalid reads in fence signaled events
  dyndbg: fix old BUG_ON in >control parser
  tipc: fix UAF in error path
  usb: gadget: f_fs: Fix a race condition when processing setup packets.
  usb: gadget: composite: fix OS descriptors w_value logic
  firewire: nosy: ensure user_length is taken into account when fetching packet contents
  af_unix: Fix garbage collector racing against connect()
  af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
  ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
  ...

Change-Id: If329d39dd4e95e14045bb7c58494c197d1352d60
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-06-04 16:33:29 -03:00
David Hildenbrand
e917dc0ff3 x86/mm/pat: fix VM_PAT handling in COW mappings
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.

PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios.  Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.

Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().

In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.

To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.

We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.

For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.

Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():

<--- C reproducer --->
 #include <stdio.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #include <liburing.h>

 int main(void)
 {
         struct io_uring_params p = {};
         int ring_fd;
         size_t size;
         char *map;

         ring_fd = io_uring_setup(1, &p);
         if (ring_fd < 0) {
                 perror("io_uring_setup");
                 return 1;
         }
         size = p.sq_off.array + p.sq_entries * sizeof(unsigned);

         /* Map the submission queue ring MAP_PRIVATE */
         map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                    ring_fd, IORING_OFF_SQ_RING);
         if (map == MAP_FAILED) {
                 perror("mmap");
                 return 1;
         }

         /* We have at least one page. Let's COW it. */
         *map = 0;
         pause();
         return 0;
 }
<--- C reproducer --->

On a system with 16 GiB RAM and swap configured:
 # ./iouring &
 # memhog 16G
 # killall iouring
[  301.552930] ------------[ cut here ]------------
[  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[  301.565725] PKRU: 55555554
[  301.565944] Call Trace:
[  301.566148]  <TASK>
[  301.566325]  ? untrack_pfn+0xf4/0x100
[  301.566618]  ? __warn+0x81/0x130
[  301.566876]  ? untrack_pfn+0xf4/0x100
[  301.567163]  ? report_bug+0x171/0x1a0
[  301.567466]  ? handle_bug+0x3c/0x80
[  301.567743]  ? exc_invalid_op+0x17/0x70
[  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
[  301.568363]  ? untrack_pfn+0xf4/0x100
[  301.568660]  ? untrack_pfn+0x65/0x100
[  301.568947]  unmap_single_vma+0xa6/0xe0
[  301.569247]  unmap_vmas+0xb5/0x190
[  301.569532]  exit_mmap+0xec/0x340
[  301.569801]  __mmput+0x3e/0x130
[  301.570051]  do_exit+0x305/0xaf0
...

Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f18681daaec9665a15c5e7e0f591aad5d0ac622b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-30 09:00:45 +00:00
Vlastimil Babka
ce55bbe342 mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c82a659cc8bb7a7f8a8348fc7f203c412ae3636f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-30 09:00:42 +00:00
Zi Yan
66bf901692 mm/migrate: set swap entry values of THP tail pages properly.
The tail pages in a THP can have swap entry information stored in their
private field. When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.

This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer requires the maintenance.

Adding THPs to the swapcache was introduced in commit
38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"),
where each subpage of a THP added to the swapcache had its own swapcache
entry and required the ->private field to point to the correct swapcache
entry. Later, when THP migration functionality was implemented in commit
616b8371539a6 ("mm: thp: enable thp migration in generic path"),
it initially did not handle the subpages of swapcached THPs, failing to
update their ->private fields or replace the subpage pointers in the
swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration
for shmem thp") addressed the swapcache update aspect. This patch fixes
the update of subpage ->private fields.

Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_charante@quicinc.com/
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9e92cefdaa7537515dc0ff6cc73d46fa31b062fc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-30 09:00:40 +00:00
Liu Shixin
fd783c9a20 mm/memory-failure: fix an incorrect use of tail pages
When backport commit c79c5a0a00a9 to 4.19-stable, there is a mistake change.
The head page instead of tail page should be passed to try_to_unmap(),
otherwise unmap will failed as follows.

 Memory failure: 0x121c10: failed to unmap page (mapcount=1)
 Memory failure: 0x121c10: recovery action for unmapping failed page: Ignored

Fixes: c6f50413f2aa ("mm/memory-failure: check the mapcount of the precise page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 27f83f1cacba82afa4c9697e3ec3abb15e92ec82)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-30 09:00:40 +00:00
Qiang Zhang
cead81caaf memtest: use {READ,WRITE}_ONCE in memory scanning
[ Upstream commit 82634d7e24271698e50a3ec811e5f50de790a65f ]

memtest failed to find bad memory when compiled with clang.  So use
{WRITE,READ}_ONCE to access memory to avoid compiler over optimization.

Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6e7044f155f7756e4489d8ad928f3061eab4595b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-05-30 09:00:39 +00:00
Mark-PK Tsai
814150c479 zsmalloc: Use copy_page for full page copy
Some architectures have implemented optimized copy_page for full page
copying, such as arm.

On my arm platform, use the copy_page helper for single page copying is
about 10 percent faster than memcpy.

Link: https://lkml.kernel.org/r/20231006060245.7411-1-mark-pk.tsai@mediatek.com
Change-Id: Ie26f83ffaeeff7415304cab94ecb847606b35953
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-05-10 15:48:33 -03:00
Richard Raya
fec6892d1c mm: Revert PID optimizations
Change-Id: I61da569fb17d5a5ac5e814bcbeed8013c6e5a7a2
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-04-18 20:06:32 -03:00
Richard Raya
a9e2d194be Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (350 commits)
  LTS: Update to 4.14.340
  fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
  KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
  PCI/MSI: Prevent MSI hardware interrupt number truncation
  s390: use the correct count for __iowrite64_copy()
  packet: move from strlcpy with unused retval to strscpy
  ipv6: sr: fix possible use-after-free and null-ptr-deref
  nouveau: fix function cast warnings
  scsi: jazz_esp: Only build if SCSI core is builtin
  RDMA/srpt: fix function pointer cast warnings
  RDMA/srpt: Support specifying the srpt_service_guid parameter
  IB/hfi1: Fix a memleak in init_credit_return
  usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
  l2tp: pass correct message length to ip6_append_data
  gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
  dm-crypt: don't modify the data when using authenticated encryption
  mm: memcontrol: switch to rcu protection in drain_all_stock()
  s390/qeth: Fix potential loss of L3-IP@ in case of network issues
  virtio-blk: Ensure no requests in virtqueues before deleting vqs.
  firewire: core: send bus reset promptly on gap count error
  ...

Change-Id: Ieafdd459ee41343bf15ed781b3e45adc2be29cc1
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
2024-03-26 00:15:05 -03:00
Roman Gushchin
5cf1aceb57 mm: memcontrol: switch to rcu protection in drain_all_stock()
commit e1a366be5cb4f849ec4de170d50eebc08bb0af20 upstream.

Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
introduced css_tryget()/css_put() calls in drain_all_stock(), which are
supposed to protect the target memory cgroup from being released during
the mem_cgroup_is_descendant() call.

However, it's not completely safe.  In theory, memcg can go away between
reading stock->cached pointer and calling css_tryget().

This can happen if drain_all_stock() races with drain_local_stock()
performed on the remote cpu as a result of a work, scheduled by the
previous invocation of drain_all_stock().

The race is a bit theoretical and there are few chances to trigger it, but
the current code looks a bit confusing, so it makes sense to fix it
anyway.  The code looks like as if css_tryget() and css_put() are used to
protect stocks drainage.  It's not necessary because stocked pages are
holding references to the cached cgroup.  And it obviously won't work for
works, scheduled on other cpus.

So, let's read the stock->cached pointer and evaluate the memory cgroup
inside a rcu read section, and get rid of css_tryget()/css_put() calls.

Link: http://lkml.kernel.org/r/20190802192241.3253165-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9b78faee4829e8d4bc88f59aa125e219ad834003)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2024-03-21 13:32:24 +00:00
GONG, Ruiqi
45dea6f77d memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
commit 1a3e1f40962c445b997151a542314f3c6097f8c3 upstream.

NOTE: This is a partial backport since we only need the refcnt between
memcg and stock to fix the problem stated below, and in this way
multiple versions use the same code and align with each other.

There was a kernel panic happened on an in-house environment running
3.10, and the same problem was reproduced on 4.19:

general protection fault: 0000 [#1] SMP PTI
CPU: 1 PID: 2085 Comm: bash Kdump: loaded Tainted: G             L    4.19.90+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010 drain_all_stock+0xad/0x140
Code: 00 00 4d 85 ff 74 2c 45 85 c9 74 27 4d 39 fc 74 42 41 80 bc 24 28 04 00 00 00 74 17 49 8b 04 24 49 8b 17 48 8b 88 90 02 00 00 <48> 39 8a 90 02 00 00 74 02 eb 86 48 63 88 3c 01 00 00 39 8a 3c 01
RSP: 0018:ffffa7efc5813d70 EFLAGS: 00010202
RAX: ffff8cb185548800 RBX: ffff8cb89f420160 RCX: ffff8cb1867b6000
RDX: babababababababa RSI: 0000000000000001 RDI: 0000000000231876
RBP: 0000000000000000 R08: 0000000000000415 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8cb186f89040
R13: 0000000000020160 R14: 0000000000000001 R15: ffff8cb186b27040
FS:  00007f4a308d3740(0000) GS:ffff8cb89f440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe4d634a68 CR3: 000000010b022000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mem_cgroup_force_empty_write+0x31/0xb0
 cgroup_file_write+0x60/0x140
 ? __check_object_size+0x136/0x147
 kernfs_fop_write+0x10e/0x190
 __vfs_write+0x37/0x1b0
 ? selinux_file_permission+0xe8/0x130
 ? security_file_permission+0x2e/0xb0
 vfs_write+0xb6/0x1a0
 ksys_write+0x57/0xd0
 do_syscall_64+0x63/0x250
 ? async_page_fault+0x8/0x30
 entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Modules linked in: ...

It is found that in case of stock->nr_pages == 0, the memcg on
stock->cached could be freed due to its refcnt decreased to 0, which
made stock->cached become a dangling pointer. It could cause a UAF
problem in drain_all_stock() in the following concurrent scenario. Note
that drain_all_stock() doesn't disable irq but only preemption.

CPU1                             CPU2
==============================================================================
stock->cached = memcgA (freed)
                                 drain_all_stock(memcgB)
                                  rcu_read_lock()
                                  memcg = CPU1's stock->cached (memcgA)
                                  (interrupted)
refill_stock(memcgC)
 drain_stock(memcgA)
 stock->cached = memcgC
 stock->nr_pages += xxx (> 0)
                                  stock->nr_pages > 0
                                  mem_cgroup_is_descendant(memcgA, memcgB) [UAF]
                                  rcu_read_unlock()

This problem is, unintentionally, fixed at 5.9, where commit
1a3e1f40962c ("mm: memcontrol: decouple reference counting from page
accounting") adds memcg refcnt for stock. Therefore affected LTS
versions include 4.19 and 5.4.

For 4.19, memcg's css offline process doesn't call drain_all_stock(). so
it's easier for the released memcg to be left on the stock. For 5.4,
although mem_cgroup_css_offline() does call drain_all_stock(), but the
flushing could be skipped when stock->nr_pages happens to be 0, and
besides the async draining could be delayed and take place after the UAF
problem has happened.

Fix this problem by adding (and decreasing) memcg's refcnt when memcg is
put onto (and removed from) stock, just like how commit 1a3e1f40962c
("mm: memcontrol: decouple reference counting from page accounting")
does. After all, "being on the stock" is a kind of reference with
regards to memcg. As such, it's guaranteed that a css on stock would not
be freed.

It's good to mention that refill_stock() is executed in an irq-disabled
context, so the drain_stock() patched with css_put() would not actually
free memcgA until the end of refill_stock(), since css_put() is an RCU
free and it's still in grace period. For CPU2, the access to CPU1's
stock->cached is protected by rcu_read_lock(), so in this case it gets
either NULL from stock->cached or a memcgA that is still good.

Cc: stable@vger.kernel.org      # 4.19 5.4
Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9e46a20397f443d02d6c6f1a72077370e8cbc8da)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2024-03-21 13:32:16 +00:00
Zach O'Keefe
deb218b841 mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.

(struct dirty_throttle_control *)->thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64().  On architectures where
unsigned long is 64 bytes, the argument will be implicitly truncated.

Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.

Also, remove redundant cast of the numerator to u64, as that should happen
implicitly.

This would be difficult to exploit in memcg domain, given the ratio-based
arithmetic domain_drity_limits() uses, but is much easier in global
writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g.
vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)

Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Maxim Patlasov <MPatlasov@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c593d26fb5d577ef31b6e49a31e08ae3ebc1bc1e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-03-08 08:21:38 +00:00
Jiajun Xie
e5e8870a91 mm: fix unmap_mapping_range high bits shift bug
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream.

The bug happens when highest bit of holebegin is 1, suppose holebegin is
0x8000000111111000, after shift, hba would be 0xfff8000000111111, then
vma_interval_tree_foreach would look it up fail or leads to the wrong
result.

error call seq e.g.:
- mmap(..., offset=0x8000000111111000)
  |- syscall(mmap, ... unsigned long, off):
     |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);

  here pgoff is correctly shifted to 0x8000000111111,
  but pass 0x8000000111111000 as holebegin to unmap
  would then cause terrible result, as shown below:

- unmap_mapping_range(..., loff_t const holebegin)
  |- pgoff_t hba = holebegin >> PAGE_SHIFT;
          /* hba = 0xfff8000000111111 unexpectedly */

The issue happens in Heterogeneous computing, where the device(e.g.
gpu) and host share the same virtual address space.

A simple workflow pattern which hit the issue is:
        /* host */
    1. userspace first mmap a file backed VA range with specified offset.
                        e.g. (offset=0x800..., mmap return: va_a)
    2. write some data to the corresponding sys page
                         e.g. (va_a = 0xAABB)
        /* device */
    3. gpu workload touches VA, triggers gpu fault and notify the host.
        /* host */
    4. reviced gpu fault notification, then it will:
            4.1 unmap host pages and also takes care of cpu tlb
                  (use unmap_mapping_range with offset=0x800...)
            4.2 migrate sys page to device
            4.3 setup device page table and resolve device fault.
        /* device */
    5. gpu workload continued, it accessed va_a and got 0xAABB.
    6. gpu workload continued, it wrote 0xBBCC to va_a.
        /* host */
    7. userspace access va_a, as expected, it will:
            7.1 trigger cpu vm fault.
            7.2 driver handling fault to migrate gpu local page to host.
    8. userspace then could correctly get 0xBBCC from va_a
    9. done

But in step 4.1, if we hit the bug this patch mentioned, then userspace
would never trigger cpu fault, and still get the old value: 0xAABB.

Making holebegin unsigned first fixes the bug.

Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com
Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2db1c46c3913b8bc92fed235a344de2671fe9d8d)
[conflict: cleanup commit 977fbdcd5986 ("mm: add unmap_mapping_pages()")
is not in this branch]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-01-18 11:22:33 +00:00
Matthew Wilcox (Oracle)
ff510bc907 mm/memory-failure: check the mapcount of the precise page
[ Upstream commit c79c5a0a00a9457718056b588f312baadf44e471 ]

A process may map only some of the pages in a folio, and might be missed
if it maps the poisoned page but not the head page.  Or it might be
unnecessarily hit if it maps the head page, but not the poisoned page.

Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841a2 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c6f50413f2aacc919b5de443aa080b94f5ebb21d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-01-18 11:22:33 +00:00
Richard Raya
9cdc78c354 Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common: (2966 commits)
  Linux 4.14.331
  net: sched: fix race condition in qdisc_graft()
  scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
  ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
  ext4: correct return value of ext4_convert_meta_bg
  ext4: correct offset of gdb backup in non meta_bg group to update_backups
  ext4: apply umask if ACL support is disabled
  media: venus: hfi: fix the check to handle session buffer requirement
  media: sharp: fix sharp encoding
  i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
  net: dsa: lan9303: consequently nested-lock physical MDIO
  ALSA: info: Fix potential deadlock at disconnection
  parisc/pgtable: Do not drop upper 5 address bits of physical address
  parisc: Prevent booting 64-bit kernels on PA1.x machines
  mcb: fix error handling for different scenarios when parsing
  jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
  genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
  mmc: meson-gx: Remove setting of CMD_CFG_ERROR
  PM: hibernate: Clean up sync_read handling in snapshot_write_next()
  PM: hibernate: Use __get_safe_page() rather than touching the list
  ...

Change-Id: I755d2aa7c525ace28adc4aee433572b3110ea39b
2023-12-07 20:15:44 -03:00
Richard Raya
79840c9d70 Merge tag 'LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0' of https://git.codelinaro.org/clo/la/kernel/msm-4.14
"LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0"

* tag 'LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0' of https://git.codelinaro.org/clo/la/kernel/msm-4.14: (103 commits)
  msm: npu: Fix use after free issue
  iommu: Fix missing return check of arm_lpae_init_pte
  msm: kgsl: Prevent wrap around during user address mapping
  iommu: Fix missing return check of arm_lpae_init_pte
  UPSTREAM: security: selinux: allow per-file labeling for bpffs
  UPSTREAM: security: selinux: allow per-file labeling for bpffs
  arm: configs: Enable QCOM_SHOW_RESUME_IRQ module for mdm9607
  Revert "irqchip/gic-v2: implement suspend and resume"
  exec: Force single empty string when argv is empty
  bus: mhi: misc: Add check for dev_rp if it is iommu range or not
  BACKPORT: FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  bus: mhi: misc: Add check for dev_rp if it is iommu range or not
  mdm: dataipa: increase the size of prefetch buffer
  msm: ais: core: validation of session/device/link handle
  soc: qcom: minidump: check the size parameter passed to qcom_smem_get()
  msm: camera: core: validation of session/device/link handle
  qcedev: vote for crypto clocks during module close
  msm: ais: smmu: Use get_file to increase ref count
  pinctrl: qcom: Using readl_relaxed/writel_relaxed APIs
  net: qrtr: Add bounds check in rx path
  ...

Change-Id: Ia2603d18afb240a1fcdce609944dd4038c988dbf
2023-12-07 20:11:49 -03:00
Sultan Alsawaf
111fe1c861 mm: Always indicate OOM kill progress when Simple LMK is enabled
When Simple LMK is enabled, the page allocator slowpath always thinks that
no OOM kill progress is made because out_of_memory() returns false. As a
result, spurious page allocation failures are observed when memory is low
and Simple LMK is killing tasks, simply because the page allocator slowpath
doesn't think that any OOM killing is taking place.

Fix this by simply making out_of_memory() always return true when Simple
LMK is enabled.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: Ib91e593e67b8d155bb8c1a1de807b524f9348d61
2023-11-14 20:23:14 -03:00
Greg Kroah-Hartman
fce78edbb4 This is the 4.14.322 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmTWAT4ACgkQONu9yGCS
 aT6kKxAA00HDcoEbS4CpQxK1ggeeW6xMFqPHHwUz62ScZPR1zcrR4ag5UrKOQALF
 cCQwt2nVBMUXciiQd3gY+MciAYPRVIXLMK9QqQEJSBZ+2p8zY3nb/HbM6o8iKQeV
 xIhUneiyHtbOyTo3oQcyET7ngwxtDp9uEnd+8I+sSbGi8Wyh8Z8L2daVQTrke1Js
 QIe3wDQsUj0pEDhRfYx29JKeQ8fBOfZlxtFEsdHvGgP/4j2EXGwyMVnt3/DVuwM8
 5/b/SML0skSh8YM9JfMQwpYpR+MAFGyyYKoF2pGu1trvyoh2Jd3TYuYcNqjwIywg
 W+ODGmULcYUYPBzUMdvrefwpn4l/2qpPCJ8FHB80h+4Jmy6PMN7lm1YnMBeQK4GP
 ACLr2BzJ4Tp5LavWZpTpqdRlC039aSZqY+7K+H/eoNstwZMU3hKc3Kn2KrPss0pp
 K0M7+8oukTnSiFNgIXVJOsr+kN1nNvtQmqCVRWlrn2cQckdDf8pVkPl/QtC3ZtWf
 aI8xYr6UpAr0z1elK5p9lO6N0R8FLwVmDG7B4b/6nLbWtRSt53ay/nMAzebodpn1
 8r+6ZoXO5LedNJsUOMJqE58X0ywbUgcx8mfkuRS8PLXEk7yI4+PR7DCeWyZ/YdVX
 dUqaYIK0yYx9yXAkMaSdrnMs+OSqa6lK9c9juPDvFox+ngLAjNk=
 =67ef
 -----END PGP SIGNATURE-----

Merge 4.14.322 into android-4.14-stable

Changes in 4.14.322
	gfs2: Don't deref jdesc in evict
	x86/microcode/AMD: Load late on both threads too
	x86/smp: Use dedicated cache-line for mwait_play_dead()
	fbdev: imsttfb: Fix use after free bug in imsttfb_probe
	drm/edid: Fix uninitialized variable in drm_cvt_modes()
	scripts/tags.sh: Resolve gtags empty index generation
	drm/amdgpu: Validate VM ioctl flags.
	treewide: Remove uninitialized_var() usage
	md/raid10: fix overflow of md/safe_mode_delay
	md/raid10: fix wrong setting of max_corr_read_errors
	md/raid10: fix io loss while replacement replace rdev
	PM: domains: fix integer overflow issues in genpd_parse_state()
	evm: Complete description of evm_inode_setattr()
	wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
	wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
	wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
	wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
	wifi: atmel: Fix an error handling path in atmel_probe()
	wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
	wifi: ray_cs: Fix an error handling path in ray_probe()
	wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
	watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
	watchdog/perf: more properly prevent false positives with turbo modes
	kexec: fix a memory leak in crash_shrink_memory()
	memstick r592: make memstick_debug_get_tpc_name() static
	wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
	wifi: ath9k: convert msecs to jiffies where needed
	netlink: fix potential deadlock in netlink_set_err()
	netlink: do not hard code device address lenth in fdb dumps
	gtp: Fix use-after-free in __gtp_encap_destroy().
	lib/ts_bm: reset initial match offset for every block of text
	netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
	netlink: Add __sock_i_ino() for __netlink_diag_dump().
	radeon: avoid double free in ci_dpm_init()
	Input: drv260x - sleep between polling GO bit
	ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
	Input: adxl34x - do not hardcode interrupt trigger type
	drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
	ARM: ep93xx: fix missing-prototype warnings
	ASoC: es8316: Increment max value for ALC Capture Target Volume control
	soc/fsl/qe: fix usb.c build errors
	fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
	drm/radeon: fix possible division-by-zero errors
	ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
	scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
	PCI: Add pci_clear_master() stub for non-CONFIG_PCI
	pinctrl: cherryview: Return correct value if pin in push-pull mode
	perf dwarf-aux: Fix off-by-one in die_get_varname()
	pinctrl: at91-pio4: check return value of devm_kasprintf()
	crypto: nx - fix build warnings when DEBUG_FS is not enabled
	modpost: fix section mismatch message for R_ARM_ABS32
	modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
	modpost: fix off by one in is_executable_section()
	USB: serial: option: add LARA-R6 01B PIDs
	block: change all __u32 annotations to __be32 in affs_hardblocks.h
	w1: fix loop in w1_fini()
	sh: j2: Use ioremap() to translate device tree address into kernel memory
	media: usb: Check az6007_read() return value
	media: videodev2.h: Fix struct v4l2_input tuner index comment
	media: usb: siano: Fix warning due to null work_func_t function pointer
	extcon: Fix kernel doc of property fields to avoid warnings
	extcon: Fix kernel doc of property capability fields to avoid warnings
	usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
	mfd: rt5033: Drop rt5033-battery sub-device
	mfd: intel-lpss: Add missing check for platform_get_resource
	mfd: stmpe: Only disable the regulators if they are enabled
	rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
	sctp: fix potential deadlock on &net->sctp.addr_wq_lock
	Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
	spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
	mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
	powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
	tcp: annotate data races in __tcp_oow_rate_limited()
	net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
	sh: dma: Fix DMA channel offset calculation
	NFSD: add encoding of op_recall flag for write delegation
	mmc: core: disable TRIM on Kingston EMMC04G-M627
	mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
	integrity: Fix possible multiple allocation in integrity_inode_get()
	jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
	btrfs: fix race when deleting quota root from the dirty cow roots list
	ARM: orion5x: fix d2net gpio initialization
	spi: spi-fsl-spi: remove always-true conditional in fsl_spi_do_one_msg
	spi: spi-fsl-spi: relax message sanity checking a little
	spi: spi-fsl-spi: allow changing bits_per_word while CS is still active
	netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
	netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
	netfilter: nf_tables: unbind non-anonymous set if rule construction fails
	netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
	netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
	workqueue: clean up WORK_* constant types, clarify masking
	net: mvneta: fix txq_map in case of txq_number==1
	udp6: fix udp6_ehashfn() typo
	ntb: idt: Fix error handling in idt_pci_driver_init()
	NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
	ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
	NTB: ntb_transport: fix possible memory leak while device_register() fails
	ipv6/addrconf: fix a potential refcount underflow for idev
	wifi: airo: avoid uninitialized warning in airo_get_rate()
	net/sched: make psched_mtu() RTNL-less safe
	tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
	SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
	perf intel-pt: Fix CYC timestamps after standalone CBR
	ext4: fix wrong unit use in ext4_mb_clear_bb
	ext4: only update i_reserved_data_blocks on successful block allocation
	jfs: jfs_dmap: Validate db_l2nbperpage while mounting
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
	misc: pci_endpoint_test: Re-init completion for every test
	md/raid0: add discard support for the 'original' layout
	fs: dlm: return positive pid value for F_GETLK
	hwrng: imx-rngc - fix the timeout for init and self check
	meson saradc: fix clock divider mask length
	Revert "8250: add support for ASIX devices with a FIFO bug"
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
	ring-buffer: Fix deadloop issue on reading trace_pipe
	xtensa: ISS: fix call to split_if_spec
	scsi: qla2xxx: Wait for io return on terminate rport
	scsi: qla2xxx: Fix potential NULL pointer dereference
	scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
	scsi: qla2xxx: Pointer may be dereferenced
	serial: atmel: don't enable IRQs prematurely
	perf probe: Add test for regression introduced by switch to die_get_decl_file()
	fuse: revalidate: don't invalidate if interrupted
	can: bcm: Fix UAF in bcm_proc_show()
	ext4: correct inline offset when handling xattrs in inode body
	debugobjects: Recheck debug_objects_enabled before reporting
	nbd: Add the maximum limit of allocated index in nbd_dev_add
	md: fix data corruption for raid456 when reshape restart while grow up
	md/raid10: prevent soft lockup while flush writes
	posix-timers: Ensure timer ID search-loop limit is valid
	sched/fair: Don't balance task to its current running CPU
	bpf: Address KCSAN report on bpf_lru_list
	wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
	igb: Fix igb_down hung on surprise removal
	spi: bcm63xx: fix max prepend length
	fbdev: imxfb: warn about invalid left/right margin
	pinctrl: amd: Use amd_pinconf_set() for all config options
	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
	fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
	llc: Don't drop packet from non-root netns.
	netfilter: nf_tables: fix spurious set element insertion failure
	tcp: annotate data-races around rskq_defer_accept
	tcp: annotate data-races around tp->notsent_lowat
	tcp: annotate data-races around fastopenq.max_qlen
	gpio: tps68470: Make tps68470_gpio_output() always set the initial value
	i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
	ethernet: atheros: fix return value check in atl1e_tso_csum()
	ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address
	tcp: Reduce chance of collisions in inet6_hashfn().
	bonding: reset bond's flags when down link is P2P device
	team: reset team's flags when down link is P2P device
	platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
	benet: fix return value check in be_lancer_xmit_workarounds()
	ASoC: fsl_spdif: Silence output on stop
	block: Fix a source code comment in include/uapi/linux/blkzoned.h
	dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
	ata: pata_ns87415: mark ns87560_tf_read static
	ring-buffer: Fix wrong stat of cpu_buffer->read
	tracing: Fix warning in trace_buffered_event_disable()
	USB: serial: option: support Quectel EM060K_128
	USB: serial: option: add Quectel EC200A module support
	USB: serial: simple: add Kaufmann RKS+CAN VCP
	USB: serial: simple: sort driver entries
	can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
	usb: ohci-at91: Fix the unhandle interrupt when resume
	usb: xhci-mtk: set the dma max_seg_size
	Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
	staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
	hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
	tpm_tis: Explicitly check for error code
	irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
	s390/dasd: fix hanging device after quiesce/resume
	ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
	dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
	drm/client: Fix memory leak in drm_client_target_cloned
	net/sched: cls_fw: Fix improper refcount update leads to use-after-free
	net/sched: sch_qfq: account for stab overhead in qfq_enqueue
	net/sched: cls_u32: Fix reference counter leak leading to overflow
	perf: Fix function pointer case
	word-at-a-time: use the same return type for has_zero regardless of endianness
	net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
	perf test uprobe_from_different_cu: Skip if there is no gcc
	net: add missing data-race annotations around sk->sk_peek_off
	net: add missing data-race annotation for sk_ll_usec
	net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
	net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
	ip6mr: Fix skb_under_panic in ip6mr_cache_report()
	tcp_metrics: fix addr_same() helper
	tcp_metrics: annotate data-races around tm->tcpm_stamp
	tcp_metrics: annotate data-races around tm->tcpm_lock
	tcp_metrics: annotate data-races around tm->tcpm_vals[]
	tcp_metrics: annotate data-races around tm->tcpm_net
	tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
	loop: Select I/O scheduler 'none' from inside add_disk()
	libceph: fix potential hang in ceph_osdc_notify()
	USB: zaurus: Add ID for A-300/B-500/C-700
	fs/sysv: Null check to prevent null-ptr-deref bug
	Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
	net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
	ext2: Drop fragment support
	test_firmware: fix a memory leak with reqs buffer
	mtd: rawnand: omap_elm: Fix incorrect type in assignment
	drm/edid: fix objtool warning in drm_cvt_modes()
	Linux 4.14.322

Change-Id: Ia25c00bd23a112b634b83577ec7d54569e8b7c70
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-23 14:54:21 +00:00
Kees Cook
d68627697d treewide: Remove uninitialized_var() usage
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 11:33:32 +02:00
Suren Baghdasaryan
5d0de97b15 BACKPORT: FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap
oom-reaper and process_mrelease system call should protect against
races with exit_mmap which can destroy page tables while they
walk the VMA tree. oom-reaper protects from that race by setting
MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP
before taking and releasing mmap_write_lock. process_mrelease has
to elevate mm->mm_users to prevent such race. Both oom-reaper and
process_mrelease hold mmap_read_lock when walking the VMA tree.
The locking rules and mechanisms could be simpler if exit_mmap takes
mmap_write_lock while executing destructive operations such as
free_pgtables.
Change exit_mmap to hold the mmap_write_lock when calling
free_pgtables. Operations like unmap_vmas() and unlock_range() are not
destructive and could run under mmap_read_lock but for simplicity we
take one mmap_write_lock during almost the entire operation. Note
also that because oom-reaper checks VM_LOCKED flag, unlock_range()
should not be allowed to race with it.
In most cases this lock should be uncontended. Previously, Kirill
reported ~4% regression caused by a similar change [1]. We reran the
same test and although the individual results are quite noisy, the
percentiles show lower regression with 1.6% being the worst case [2].
The change allows oom-reaper and process_mrelease to execute safely
under mmap_read_lock without worries that exit_mmap might destroy page
tables from under them.

[1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
[2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Link: https://lore.kernel.org/all/20211124235906.14437-1-surenb@google.com/

Bug: 130172058
Bug: 189803002
Change-Id: Ic87272d09a0b68a1b0e968e8f1a1510fd6fc776a
Git-commit: 28358ebf2adb31117893813992fefcfd359a6a16
Git-repo: https://android.googlesource.com/kernel/common/
[quic_gkohli@quicinc.com: Resolved cherry-pick conflict in mm/mmap.c due
 to mmap lock was implemented differently in older kernel, and
 Although process_mrelease is not applicable in older kernel, but this
 patch is required to take exclusive lock in exit_mmap path so that
 SPF knows an isolated vma was freed from this path]
Signed-off-by: Gaurav Kohli <quic_gkohli@quicinc.com>
2023-08-11 01:28:19 -07:00
Greg Kroah-Hartman
0efbe093b6 This is the 4.14.315 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRkmmMACgkQONu9yGCS
 aT5S/g/+LHkUcwpnnPu5llymtK0jd/0WvwWUJfJAOlGpa3l9CkoPtjHzNwtagoFR
 2+woN7zhC7UteTz20/RXMFtNv7zFOMA91nsVSmYp4Cc997XpILeTkzpQMzoCm8Qt
 YFMpKEX0op6sAR+NUJ5Vaj/HaFBvO9J2ZMGGrxeUKVPAAgRk3AdvTGfHFwzXlmfb
 AKVo9jhG7NszYeLYIHRONMDJRyiBLJXrLSLfn+u+uKKRjNnBqJJEDQu3zYt6kavy
 M/8CE6QgOoCAcbyTIgVw9ZU51ydWfbKiEnMpEwPAEHy6C4xrYfMnWqF8LDjkSNCL
 xsNYbAyaPh/MdJoLGdTcuRSp58xP5dNT366xShN78RLqbeKPfg0nZCHMDWnC4BZP
 ET+zAwiueaf64Hu3NWHq8IC74EhgM8ZCzLiVb9CqCyllcVCT2xjdRE8eJtXz5Vgq
 ahsuJmvzGdSIkX6HFh8QKpWdoeRSPbOol+/xD/0fPFf97EiAvMZX5kLgfI+o0rGj
 6fZuENIECp/WHiIqHJ2bsGb69M/OeJfoISxUUVFrCnGduXA59Gnj9zKftNHyNMQZ
 GCu2yHYkkM50RRw9xSO/286Z3mbz84fFRc8PKwWzu7veghuPXYOOKaA4Eleaw/Oy
 Sx92e2OTKjQVGKadHT4HfTd1xabks/9qLGBpx20GuRsfhHt/yJo=
 =ef7P
 -----END PGP SIGNATURE-----

Merge 4.14.315 into android-4.14-stable

Changes in 4.14.315
	wifi: brcmfmac: slab-out-of-bounds read in brcmf_get_assoc_ies()
	bluetooth: Perform careful capability checks in hci_sock_ioctl()
	USB: serial: option: add UNISOC vendor and TOZED LT70C product
	iio: adc: palmas_gpadc: fix NULL dereference on rmmod
	USB: dwc3: fix runtime pm imbalance on unbind
	perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE)
	staging: iio: resolver: ads1210: fix config mode
	MIPS: fw: Allow firmware to pass a empty env
	ring-buffer: Sync IRQ works before buffer destruction
	reiserfs: Add security prefix to xattr name in reiserfs_security_write()
	i2c: omap: Fix standard mode false ACK readings
	Revert "ubifs: dirty_cow_znode: Fix memleak in error handling path"
	ubi: Fix return value overwrite issue in try_write_vid_and_data()
	ubifs: Free memory for tmpfile name
	selinux: fix Makefile dependencies of flask.h
	selinux: ensure av_permissions.h is built when needed
	drm/rockchip: Drop unbalanced obj unref
	drm/vgem: add missing mutex_destroy
	drm/probe-helper: Cancel previous job before starting new one
	media: bdisp: Add missing check for create_workqueue
	media: av7110: prevent underflow in write_ts_to_decoder()
	x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
	media: dm1105: Fix use after free bug in dm1105_remove due to race condition
	x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
	arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
	wifi: ath6kl: minor fix for allocation size
	wifi: ath5k: fix an off by one check in ath5k_eeprom_read_freq_list()
	wifi: ath6kl: reduce WARN to dev_dbg() in callback
	scm: fix MSG_CTRUNC setting condition for SO_PASSSEC
	vlan: partially enable SIOCSHWTSTAMP in container
	net/packet: convert po->origdev to an atomic flag
	net/packet: convert po->auxdata to an atomic flag
	scsi: target: iscsit: Fix TAS handling during conn cleanup
	scsi: megaraid: Fix mega_cmd_done() CMDID_INT_CMDS
	md/raid10: fix leak of 'r10bio->remaining' for recovery
	wifi: iwlwifi: make the loop for card preparation effective
	wifi: iwlwifi: mvm: check firmware response size
	ixgbe: Allow flow hash to be set via ethtool
	ixgbe: Enable setting RSS table to default values
	ipv4: Fix potential uninit variable access bug in __ip_make_skb()
	Revert "Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work"
	net: amd: Fix link leak when verifying config failed
	tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
	pstore: Revert pmsg_lock back to a normal mutex
	linux/vt_buffer.h: allow either builtin or modular for macros
	spi: fsl-spi: Fix CPM/QE mode Litte Endian
	of: Fix modalias string generation
	ia64: mm/contig: fix section mismatch warning/error
	uapi/linux/const.h: prefer ISO-friendly __typeof__
	sh: sq: Fix incorrect element size for allocating bitmap buffer
	usb: chipidea: fix missing goto in `ci_hdrc_probe`
	tty: serial: fsl_lpuart: adjust buffer length to the intended size
	serial: 8250: Add missing wakeup event reporting
	staging: rtl8192e: Fix W_DISABLE# does not work after stop/start
	spmi: Add a check for remove callback when removing a SPMI driver
	macintosh/windfarm_smu_sat: Add missing of_node_put()
	powerpc/mpc512x: fix resource printk format warning
	powerpc/wii: fix resource printk format warnings
	powerpc/sysdev/tsi108: fix resource printk format warnings
	macintosh: via-pmu-led: requires ATA to be set
	powerpc/rtas: use memmove for potentially overlapping buffer copy
	perf/core: Fix hardlockup failure caused by perf throttle
	RDMA/rdmavt: Delete unnecessary NULL check
	power: supply: generic-adc-battery: fix unit scaling
	clk: add missing of_node_put() in "assigned-clocks" property parsing
	IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order
	NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease
	SUNRPC: remove the maximum number of retries in call_bind_status
	phy: tegra: xusb: Add missing tegra_xusb_port_unregister for usb2_port and ulpi_port
	dmaengine: at_xdmac: do not enable all cyclic channels
	parisc: Fix argument pointer in real64_call_asm()
	nilfs2: do not write dirty data after degenerating to read-only
	nilfs2: fix infinite loop in nilfs_mdt_get_block()
	wifi: rtl8xxxu: RTL8192EU always needs full init
	clk: rockchip: rk3399: allow clk_cifout to force clk_cifout_src to reparent
	btrfs: scrub: reject unsupported scrub flags
	s390/dasd: fix hanging blockdevice after request requeue
	dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path
	dm flakey: fix a crash with invalid table line
	dm ioctl: fix nested locking in table_clear() to remove deadlock concern
	perf auxtrace: Fix address filter entire kernel size
	netfilter: nf_tables: split set destruction in deactivate and destroy phase
	netfilter: nf_tables: unbind set in rule from commit path
	netfilter: nft_hash: fix nft_hash_deactivate
	netfilter: nf_tables: use-after-free in failing rule with bound set
	netfilter: nf_tables: bogus EBUSY when deleting set after flush
	netfilter: nf_tables: deactivate anonymous set from preparation phase
	sit: update dev->needed_headroom in ipip6_tunnel_bind_dev()
	writeback: fix call of incorrect macro
	net/sched: act_mirred: Add carrier check
	af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
	ALSA: caiaq: input: Add error handling for unsupported input methods in `snd_usb_caiaq_input_init`
	perf vendor events power9: Remove UTF-8 characters from JSON files
	perf map: Delete two variable initialisations before null pointer checks in sort__sym_from_cmp()
	perf symbols: Fix return incorrect build_id size in elf_read_build_id()
	btrfs: fix btrfs_prev_leaf() to not return the same key twice
	btrfs: print-tree: parent bytenr must be aligned to sector size
	cifs: fix pcchunk length type in smb2_copychunk_range
	sh: math-emu: fix macro redefined warning
	sh: nmi_debug: fix return value of __setup handler
	ARM: dts: exynos: fix WM8960 clock name in Itop Elite
	ARM: dts: s5pv210: correct MIPI CSIS clock name
	HID: wacom: Set a default resolution for older tablets
	ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
	ext4: improve error recovery code paths in __ext4_remount()
	ext4: add bounds checking in get_max_inline_xattr_value_size()
	ext4: bail out of ext4_xattr_ibody_get() fails for any reason
	ext4: remove a BUG_ON in ext4_mb_release_group_pa()
	ext4: fix invalid free tracking in ext4_xattr_move_to_block()
	perf bench: Share some global variables to fix build with gcc 10
	tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSH
	serial: 8250: Fix serial8250_tx_empty() race with DMA Tx
	drbd: correctly submit flush bio on barrier
	printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h
	mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
	Linux 4.14.315

Change-Id: I7e3fda05118b08edc995f33280f9eec1f563b951
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-05-18 15:49:55 +00:00
Tetsuo Handa
63c79247fe mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.

syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests.  But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by

  preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

  preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

  disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

  disabling synchronous printk() in order to avoid console_flush_all()

.

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17 11:11:51 +02:00
Greg Kroah-Hartman
7b854fbace This is the 4.14.313 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRBDasACgkQONu9yGCS
 aT6b+RAA10Y7oyJ3XTY4Iezj9155aG+8pQdraHCUeQ2mQSf5vQXszDZY466dsaam
 7ONyW4cjZBBcQHAfiN2LYIPBmEq27ooDBoUZt8r9xX2I/xXSrYKJ64sI7QObpXz/
 fJ5H94lLaxkldYmXl/o6fVstRcn5dPJ0FXaKvdWLwD/G/3y6Z/odFEmmbeZiHEtm
 G4owwbKMDxJ82sDBi9jTOVFy3ciINDbixydGF1g8VkV3aL2mk8lPd5nPsSxf1b3N
 GE+gKHIlW44/TuObYPewd6c9uQerIk7RG/pgo3z2vda0i2X3WYxF1bYmCjeHuoKE
 zmv3/mtltymRQf2nszyWcK3mEuGiQVOb4ikx0sDoo02+9YVF2kC/hs/vFJE8MR8J
 3IkgMy675EEwQcoK21W8PqYhXwyJNaf53PWsxa5J6FdGby/9BJnQ94K3Ri06SlAi
 6fB1xXvc+qRm0+ARssxO4e/d3zTZlhFgKwvrCyt2vQEvAZc4+NksrPeGpzMkIKLj
 44fBwo+tDZ4Xg7rfYS+/lsN0ZxvkMdz06AF54MRGPSxjDIGqU94/jrZ1oqb3uvtl
 ta5LZsZvTXXUIFhrfi65/yBoEhAvGpkYbVcCeqqA+U97mtQ2yd24fV8oHwYVGu/g
 zoYfPIlWxrRx9TN1W6wwQvJxfdPbK67W5akfikqvB8fHeX7/xMw=
 =/dv7
 -----END PGP SIGNATURE-----

Merge 4.14.313 into android-4.14-stable

Changes in 4.14.313
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	icmp: guard against too small mtu
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	gpio: davinci: Add irq chip flag to skip set wake
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	ALSA: emu10k1: fix capture interrupt handler unlinking
	ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
	ALSA: i2c/cs8427: fix iec958 mixer control deactivation
	ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
	Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
	Bluetooth: Fix race condition in hidp_session_thread
	mtdblock: tolerate corrected bit-flips
	9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
	niu: Fix missing unwind goto in niu_alloc_channels()
	qlcnic: check pci_reset_function result
	net: macb: fix a memory corruption in extended buffer descriptor mode
	i2c: imx-lpi2c: clean rx/tx buffers upon new message
	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
	verify_pefile: relax wrapper length check
	ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
	cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
	watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	KVM: arm64: Factor out core register ID enumeration
	KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
	arm64: KVM: Fix system register enumeration
	Linux 4.14.313

Change-Id: I9dcef9855d47e02e4ccbfcc7dd59e976c6ab9fb1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-21 13:29:20 +00:00
Rongwei Wang
111a79d9b9 mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
commit 6fe7d6b992113719e96744d974212df3fcddc76c upstream.

The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.

It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g.  stress-ng-swap).

However, in the worst case, panic can be caused by the above scene.  In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap.  This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:02:11 +02:00
Patrick Daly
b4e4425181 ANDROID: mm/filemap: Fix missing put_page() for speculative page fault
find_get_page() returns a page with increased refcount, assuming a page
exists at the given index. Ensure this refcount is dropped on error.

Bug: 271079833
Fixes: 59d4d125 ("BACKPORT: FROMLIST: mm: implement speculative handling in filemap_fault()")
Change-Id: Idc7b9e3f11f32a02bed4c6f4e11cec9200a5c790
Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com>
(cherry picked from commit 6232eecfa7ca0d8d0ca088da6d0edb2c3a879ff9)
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Git-commit: 1d05213028b6dbdb8801e20f29b6a6f91c216033
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
2023-04-05 14:30:47 +05:30
Kalesh Singh
765b588f3b ANDROID: Re-enable fast mremap and fix UAF with SPF
SPF attempts page faults without taking the mmap lock, but takes the
PTL. If there is a concurrent fast mremap (at PMD/PUD level), this
can lead to a UAF as fast mremap will only take the PTL locks at the
PMD/PUD level. SPF cannot take the PTL locks at the larger subtree
granularity since this introduces much contention in the page fault
paths.

To address the race:
  1) Only try fast mremaps if there are no users of the VMA. Android
     is concerned with this optimization in the context of
     GC stop-the-world pause. So there are no other threads active
     and this should almost always succeed.
  2) Speculative faults detect ongoing fast mremaps and fallback
     to conventional fault handling (taking mmap read lock).

Bug: 263177905
Change-Id: I23917e493ddc8576de19883cac053dfde9982b7f
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Git-commit: 529351c4c8202aa7f5bc4a8a100e583a70ab6110
Git-repo: https://android.googlesource.com/kernel/common/
[quic_c_spathi@quicinc.com: resolve merge conflicts. Not applying
 mremap changes due to absence of applicable configs for race.]
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
2023-04-05 14:30:42 +05:30
Suren Baghdasaryan
21eb97e426 ANDROID: mm: fix invalid backport in speculative page fault path
Invalid condition was introduced when porting the original SPF patch
which would affect NUMA mode.

Fixes: 736ae8bde8da3 ("FROMLIST: mm: adding speculative page fault failure trace events")
Bug: 257443051
Change-Id: Ib20c625615b279dc467588933a1f598dc179861b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Git-commit: 1900436df5d947c2ee74bd78cde1366556c93b51
Git-repo: https://android.googlesource.com/kernel/common/
[quic_c_spathi@quicinc.com: resolve trivial merge conflicts]
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
2023-04-05 14:30:39 +05:30