mirror of
https://github.com/rd-stuffs/msm-4.14.git
synced 2025-02-20 11:45:48 +08:00
13459 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
24cc942015 |
mm/mglru: Don't sync disk for each aging cycle
wakeup_flusher_threads() was added under the assumption that if a system runs out of clean cold pages, it might want to write back dirty pages more aggressively so that they can become clean and be dropped. However, doing so can breach the rate limit a system wants to impose on writeback, resulting in early SSD wearout. Link: https://lkml.kernel.org/r/YzSiWq9UEER5LKup@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Reported-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Change-Id: Ib4def4286264de926b11ec5247185edc3a780619 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
d97f51cb57 |
mm: Revert some hacks
This reverts commits: - 0fc9fbd21297173aa822f97fe33a481053cb96ec [mm + sysctl: tune swappiness and make some values read only] - 94181990a4ea1a20bb8bf443f3fbe500d05901c3 [mm: Import oplus memory management hacks] - 97bdd381c8292d43e68ff55bd08767db17e62810 [mm: Set swappiness for CONFIG_INCREASE_MAXIMUM_SWAPPINESS=y case] - fa8d2aa0e20da6b943157f6ab58068bd80d68920 [mm: move variable under a proper #ifdef] - f9daeaa423b745b2c2c34a6fb5ac6b69daf746c4 [mm: merge Samsung mm hacks] - 1a460a832c9c6550f5cbe32dca4c15cf89806b57 [mm: Make watermark_scale_factor read-only] - 963a3bfe3352b45ea21c58d53055689e46d81eeb [mm: Tune parameters for Android] Change-Id: I70495ca93a05384a2d7bc2498fd2d56bd9928390 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
a9566ccc56 |
msm-4.14: Make macros no-op using ((void)0)
Do not solely rely on compiler optimizations to get the workaround of having macros do nothing using an empty do-while loop. It's inefficient. Use ((void)0) to which the standard assert macro expands when NDEBUG is defined. No functional change intended. [mcdofrenchfreis]: Implement this patch to tree using the command: git grep -l "do {} while (0)" | xargs sed -i "s/do {} while (0)/((void)0)/g" Change-Id: I9615c62c46670e31ed8d0d89d195144541baa3e6 Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com> Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
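The difference between the two no-op forms named in the commit above can be sketched in plain C. The macro `demo_dbg` and the function `demo_clamp` are hypothetical and not taken from the tree. The sketch only illustrates that `((void)0)` is an expression, so it also fits where a statement such as `do {} while (0)` would not compile, for example as the left operand of a comma operator.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical debug macro (not from the msm-4.14 tree).
 * With DEMO_DEBUG undefined it expands to ((void)0), the same form the
 * standard assert macro uses when NDEBUG is defined.
 */
#ifdef DEMO_DEBUG
#define demo_dbg(...) printf(__VA_ARGS__)
#else
#define demo_dbg(...) ((void)0)
#endif

/* Clamp v into [lo, hi]; the debug calls vanish in non-debug builds. */
static int demo_clamp(int v, int lo, int hi)
{
	assert(lo <= hi);
	if (v < lo) {
		demo_dbg("clamped up to %d\n", lo);
		return lo;
	}
	/* Expression context: a statement-style no-op would not compile here. */
	return v > hi ? (demo_dbg("clamped down to %d\n", hi), hi) : v;
}
```

Building with `-DDEMO_DEBUG` turns the same call sites into real `printf` calls without touching `demo_clamp`.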
|
12290e8a8b |
Revert "mm: process reclaim: vmpressure based process reclaim"
- This reverts commit 7964b3ce47f0d87fbbb1cfdd1fb4aadb620133dd as QCOM vmpressure driven process reclaim is redundant compared to Linux PPR which meets userspace dependencies. Change-Id: I46782f69c57febed99002681ee268fa4a3111d59 Signed-off-by: Cyber Knight <cyberknight755@gmail.com> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
06425a87ef |
Revert "lowmemorykiller: Introduce sysfs node for ALMK and PPR adj threshold"
- This reverts commit f326985b26c272b4a9bcc250e7cf6af28b7c3398 as it does not meet userspace dependencies. Change-Id: I8aaefeea7cc3dcab1d4a8c94723be238616c9474 Signed-off-by: Cyber Knight <cyberknight755@gmail.com> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
46c8c57dc0 |
mm: reclaim: Bump maximum pages per reclaim attempt
4 MiB isn't a meaningful amount to reclaim. Bump the limit to 32 MiB instead. Change-Id: I92fc9b35d121e6b39bced13d549e59d9e8e668e8 Signed-off-by: Danny Lin <danny@kdrag0n.dev> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
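To put the two limits from the commit above into page counts, here is a small arithmetic sketch. The 4 KiB page size and the helper name `demo_mib_to_pages` are assumptions for illustration; the commit message states only the MiB figures.

```c
#include <assert.h>

/* Assumed page size; the commit message does not state one. */
#define DEMO_PAGE_SIZE 4096UL

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Convert a reclaim budget given in MiB into a number of pages. */
static unsigned long demo_mib_to_pages(unsigned long mib)
{
	return mib * 1024UL * 1024UL / DEMO_PAGE_SIZE;
}
```

Under that assumption, 4 MiB corresponds to 1024 pages and 32 MiB to 8192 pages per reclaim attempt.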
|
6c1efde3d9 |
mm/oom-kill: Adjust oom_badness per Android expectations
Allow selecting only tasks with oom_score_adj >= 0. Change-Id: Iebbb487c711da98b8fcb367ba838b5fe0b260d4f Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
2cd059fb56 |
This is the 4.14.354 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcgko0ZHHZlZ2FyZC5u b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WL/GD/0em+uP/O8QiPYqeGrEECpW bgRsBiN3XnyEsghAjplWX12G/zjxA0PY0u2zh9K9sdPw60n8nVZ1OxvPHINwuSC9 kE9N60SCpJ88ju9OtU+4xz/nxtEmlel8fWy5elagB5wqbWbvsjT52ceZXqSxqhy7 pQdIDHSiUUwx9JL6vDuJSL+Z/Y216qvBETZLnDSo90raFp/MDa5JmQsh81lLeUt8 wGKwC/Olnbd21QTStNK34aQGyX5b+3YeACFVPud66Zs9airz9EE6Yq78gwL29L2k 4jxzihXxSkkfa66eR63ap53+/mEqOZX72m2qEMVOvAcAwU0XsNDTdkXN7z8YQ5T3 E1rJwr4Ox0hmM+hHBA20w9xRDXZoZmdrcjsU1aNKuK2zTJ0h9DBIvMM2XY5n5sWK I4F8E15KyKmu4nXBETreXZixqVLZMgjNFncRLf8XBIL1kxXm65LYCHypp3AgdVgo Ccdq5PbC6LAyNPrIOaftIaS9VlU15cqcalu7A+gSoWq55LGWAa3G9vX0ZtYQB9QX 0R18fbzyjqG6Wa5J5KRDJ+HyS4IvdnEWS8hMR3jfosjMNgJhfDlDeev8NARBiDpX d26xogNA7xOOvtdpuwEbnxD5kR0zUdnC73pC4wxdMptYSK6ULKNPmTkA0dKE9qvl TDgw4DML8vXQqJ4P+w3Njw== =gX2R -----END PGP SIGNATURE----- Merge tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts This is the 4.14.354 OpenELA-Extended LTS stable release * tag 'v4.14.354-openela' of https://github.com/openela/kernel-lts: (90 commits) LTS: Update to 4.14.354 drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var ipc: remove memcg accounting for sops objects in do_semtimedop() scsi: aacraid: Fix double-free on probe failure usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes() usb: dwc3: st: fix probed platform device ref count on probe error path usb: dwc3: core: Prevent USB core invalid event buffer address access usb: dwc3: omap: add missing depopulate in probe error path USB: serial: option: add MeiG Smart SRM825L cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller net: busy-poll: use ktime_get_ns() instead of local_clock() gtp: fix a potential NULL pointer dereference net: prevent mss overflow in skb_segment() ida: Fix crash in ida_free when the bitmap is empty net:rds: Fix possible deadlock in rds_message_put fbmem: Check virtual screen sizes in fb_set_var() fbcon: Prevent that screen 
size is smaller than font size printk: Export is_console_locked memcg: enable accounting of ipc resources cgroup/cpuset: Prevent UAF in proc_cpuset_show() ... Change-Id: I7da4d8d188dec9d2833216e5d6580dbd72b99240 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
6f2b82ee5c |
memcg_write_event_control(): fix a user-triggerable oops
commit 046667c4d3196938e992fba0dfcde570aa85cd0e upstream. we are *not* guaranteed that anything past the terminating NUL is mapped (let alone initialized with anything sane). Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications") Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit fa5bfdf6cb5846a00e712d630a43e3cf55ccb411) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
1b9f971175 |
This is the 4.14.353 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmcHrG8ZHHZlZ2FyZC5u b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WIQuEACCYf9xCGBALlKFb0pXX3eF oiRkceNyy5NWSndD7t9p/3d2g4YrVptGxtTZN12IltfG4wfCQ+qC/0g2Mu4ho0Yp 2ExKVaIli1t2csIjXCUUyjh3jU0JOkDwJap9n5QemACsX8zrDfKVwdlj9hw+e7vi fBWwdfl1duK5cfVbbyvL74It4WeMnjuAYrBnMTxhYBTq56xFLrbBILl8BLxAV5NN 5wGoNCeUtj8LxUrL2qs5QoT3Bf7uoDlLnu1Ly7jDMMX34/oNh5huOjZdDFbQYxS3 DsEe6ljOYOyB/awdUhScERfxVPimumN3nHWnRJbsQhX36uXT6U7HNJah4zauchRk UlKUSfG3YyOqKIwFH+8oGmkuCm6wZbVjVsNNkYhT804BCCHrasJ1SHXsSB9R0MpU x3IQOoiuc33bUYrSqWAO7utvt+PwG++3GHz0XQwPfZn4DHY18/e+VNsGtQTPqzRG tsywZVTN0DC0nO7L772nkQDb7z2mhmJGgN8q3FPbMTfp/I1phIh9C17pckfpHKAl ippTmTMaIYDU3Rlc1g/cu363GOaXWRN4t03VSEu/BLV0IElRktUnmuBU3B/rMb+F ItaBmhnZGXHUrulMTxDtzItrYMwx00USw6IrG3iYjob0MhhxhLVxEh0vKc7Te2w5 2FZEjj2BxinK66mJgAolZw== =BQd/ -----END PGP SIGNATURE----- Merge tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts This is the 4.14.353 OpenELA-Extended LTS stable release * tag 'v4.14.353-openela' of https://github.com/openela/kernel-lts: (173 commits) LTS: Update to 4.14.353 net: fix __dst_negative_advice() race selftests: make order checking verbose in msg_zerocopy selftest selftests: fix OOM in msg_zerocopy selftest Revert "selftests/net: reap zerocopy completions passed up as ancillary data." 
Revert "selftests: fix OOM in msg_zerocopy selftest" Revert "selftests: make order checking verbose in msg_zerocopy selftest" nvme/pci: Add APST quirk for Lenovo N60z laptop exec: Fix ToCToU between perm check and set-uid/gid usage drm/i915/gem: Fix Virtual Memory mapping boundaries calculation drm/i915: Try GGTT mmapping whole object as partial netfilter: nf_tables: set element extended ACK reporting support kbuild: Fix '-S -c' in x86 stack protector scripts drm/mgag200: Set DDC timeout in milliseconds drm/bridge: analogix_dp: properly handle zero sized AUX transactions drm/bridge: analogix_dp: Properly log AUX CH errors drm/bridge: analogix_dp: Reset aux channel if an error occurred drm/bridge: analogix_dp: Check AUX_EN status when doing AUX transfer x86/mtrr: Check if fixed MTRRs exist before saving them tracing: Fix overflow in get_free_elt() ... Change-Id: I0e92a979e31d4fa6c526c6b70a1b61711d9747bb Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
1967ea8b28 |
mm: avoid overflows in dirty throttling logic
[ Upstream commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2 ] The dirty throttling logic is interspersed with assumptions that dirty limits in PAGE_SIZE units fit into 32 bits (so that various multiplications fit into 64 bits). If limits end up being larger, we will hit overflows, possible divisions by 0 etc. Fix these problems by never allowing such large dirty limits, as they have dubious practical value anyway. For the dirty_bytes / dirty_background_bytes interfaces we can just refuse to set such large limits. For dirty_ratio / dirty_background_ratio it isn't so simple, as the dirty limit is computed from the amount of available memory, which can change due to memory hotplug etc. So when converting dirty limits from ratios to numbers of pages, we just don't allow the result to exceed UINT_MAX. This is a root-only triggerable problem which occurs when the operator sets dirty limits to >16 TB. Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Zach O'Keefe <zokeefe@google.com> Reviewed-By: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 2b2d2b8766db028bd827af34075f221ae9e9efff) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
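The two strategies described in the commit above (refuse oversized byte limits, clamp ratio-derived limits) can be sketched in user-space C. This is not the kernel's code: the names `demo_set_dirty_bytes` and `demo_ratio_to_pages`, the 4 KiB page size, and the `-1` error return are assumptions made for illustration, and the sketch assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define DEMO_PAGE_SIZE 4096ULL

/*
 * Byte-based interface: refuse a limit whose page count would not fit
 * into an unsigned int. Returns 0 on success, -1 on refusal.
 */
static int demo_set_dirty_bytes(uint64_t bytes, uint64_t *pages_out)
{
	uint64_t pages;

	assert(pages_out != NULL);
	/* Round up to whole pages without overflowing the addition. */
	pages = bytes / DEMO_PAGE_SIZE + (bytes % DEMO_PAGE_SIZE != 0);
	if (pages > UINT_MAX)
		return -1;
	*pages_out = pages;
	return 0;
}

/*
 * Ratio-based interface: available memory can change at runtime, so
 * the result is clamped instead of refused.
 */
static uint64_t demo_ratio_to_pages(uint64_t available_pages,
				    unsigned int ratio_percent)
{
	uint64_t pages = available_pages / 100 * ratio_percent;

	return pages > UINT_MAX ? UINT_MAX : pages;
}
```

With 4 KiB pages, UINT_MAX pages is just under 16 TiB, which matches the ">16 TB" threshold mentioned in the message.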
|
fd082a8cad |
Revert "mm: shuffle initial free memory to improve memory-side-cache utilization"
This reverts commit 6e8d8f0cd2dfd44c9cf01ba432f516577c6f99a7. Change-Id: I08c5f066bfe4ff8c34f06f3b6ad50c216e9d076d Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
b6eaf5c762 |
Revert "mm: move buddy list manipulations into helpers"
This reverts commit be1968b3980cb973d3034605735ab12e1fa4672a. Change-Id: I203cdaca4f6fa23584a7b4c1ea15c73ff2137227 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
555c41b976 |
Revert "mm: maintain randomization of page free lists"
This reverts commit d8277c8ef105edcfb0c442e73db73669c7ca27b8. Change-Id: Id1616a87c6755b55d108e3e529a8df825851154a Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
be1ff8e638 |
msm-4.14: Revert some unsafe optimizations
Change-Id: I2c268f87ab8d9154758384c7a7639046c3784eb8 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
f88112e714 |
mm: Distinguish blockable mode for mmu notifiers
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. 
[akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Change-Id: Ibf089b0ebbbfa7182eeca314b757caf456969758 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
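The blockable/non-blockable split described in the commit above can be sketched in user-space C11. Everything here is hypothetical: `struct demo_notifier`, `demo_notifier_init` and `demo_invalidate_range_start` are illustration names, an `atomic_flag` stands in for a driver's sleepable lock, and busy-waiting stands in for sleeping. The sketch shows only the control flow: try the lock, and bail out with `-EAGAIN` instead of waiting when the caller must not block.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notifier state; not a kernel structure. */
struct demo_notifier {
	atomic_flag lock;	/* stands in for the driver's sleepable lock */
	int invalidations;	/* number of completed invalidations */
};

static void demo_notifier_init(struct demo_notifier *n)
{
	atomic_flag_clear(&n->lock);
	n->invalidations = 0;
}

/*
 * Returns 0 after invalidating, or -EAGAIN if the lock is contended
 * and the caller asked not to block.
 */
static int demo_invalidate_range_start(struct demo_notifier *n, bool blockable)
{
	assert(n != NULL);
	if (atomic_flag_test_and_set(&n->lock)) {
		if (!blockable)
			return -EAGAIN;	/* caller retries later */
		while (atomic_flag_test_and_set(&n->lock))
			;	/* busy-wait stands in for sleeping */
	}
	n->invalidations++;
	atomic_flag_clear(&n->lock);
	return 0;
}
```

A caller in the position of the oom_reaper would pass `blockable = false` and retry the whole pass if any notifier returned `-EAGAIN`.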
|
59448bba68 |
This is the 4.14.351 OpenELA-Extended LTS stable release
-----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmbJhFAZHHZlZ2FyZC5u b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WM9AD/9T4mE7CXds1QYHF3wzFinF t4oHyXvOiY4Mrsdy20A4FIvYfrYi5PyZ39E7G38e2FH2jG7qwHTyHXOjh94cL9gV 5zlU7+jxWQenDKTl6LV3veYP/QNp9Yh9iQn0sgwC3HTUeq+zNd8rxvBjcAfDNiIM taC98s63QjtjZtQPzAaS461LH/U14dKFChuPEC36dei/M4T2UDTHZqvRdFBWB8h2 fC/dJgtuohXTFexpGgk8p6GKNpFjyE62hBI3Xc+/k24j88r0cFqLLp6NhgF6JIpc 6L6zGUKeyLXaIR/xoshK3MdgJ/XbocqKlRexJOFxCYmAEreAnQelS8v7QG3j6j33 8AiUasZfpDPFNEH1CNJC0BiNs76NByFCJny+QUYlq0O9ZjfYQt+PvZZXSCx8jIn6 A75ryAXLERNlXvh5XuEXlNJsOrN3enWnhgeJXMJOfKxtOfn7CRLmfSvpiS2/SfT3 sxU4aNQNenbYoWwPQRPLXfNO4UvkmLfk6I6+AqRiHdykYQswhZRnpWxsPRSUwrhI 6mErDGIXmryid/p+P/eMuviH3AO+KEpjoDzLFMFJWMpLQouTDl5qCwGu3QwVjybS /MOlfhi5z1so1e5qBIUmY498jZfVbZ5VMC76bOdhtC2USmvotcBSu611x5JtPaZo Cv3jKYl+/S0DVIZdEMPA8g== =wFNA -----END PGP SIGNATURE----- Merge tag 'v4.14.351-openela' of https://github.com/openela/kernel-lts This is the 4.14.351 OpenELA-Extended LTS stable release * tag 'v4.14.351-openela' of https://github.com/openela/kernel-lts: (58 commits) LTS: Update to 4.14.351 i2c: rcar: bring hardware to known state when probing nilfs2: fix kernel bug on rename operation of broken directory tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() libceph: fix race between delayed_work() and ceph_monc_stop() hpet: Support 32-bit userspace USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor usb: gadget: configfs: Prevent OOB read/write in usb_string_copy() USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k USB: serial: option: add Rolling RW350-GL variants USB: serial: option: add Netprisma LCUK54 series modules USB: serial: option: add support for Foxconn T99W651 USB: serial: option: add Fibocom FM350-GL USB: serial: option: add Telit FN912 rmnet compositions USB: serial: option: add Telit generic core-dump composition ARM: davinci: Convert comma to semicolon ppp: reject claimed-as-LCP but actually malformed packets net: 
ethernet: lantiq_etop: fix double free in detach net: lantiq_etop: add blank line after declaration tcp: fix incorrect undo caused by DSACK of TLP retransmit ... Change-Id: I8bb6496007a068b83dd95a991e2f3afb0e18da82 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
6949c52837 |
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
commit 30139c702048f1097342a31302cbd3d478f50c63 upstream. Patch series "mm: Avoid possible overflows in dirty throttling". Dirty throttling logic assumes dirty limits in page units fit into 32-bits. This patch series makes sure this is true (see patch 2/2 for more details). This patch (of 2): This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78. The commit is broken in several ways. Firstly, the removed (u64) cast from the multiplication will introduce a multiplication overflow on 32-bit archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the default settings with 4GB of RAM will trigger this). Secondly, the div64_u64() is unnecessarily expensive on 32-bit archs. We have div64_ul() in case we want to be safe & cheap. Thirdly, if dirty thresholds are larger than 1<<32 pages, then dirty balancing is going to blow up in many other spectacular ways anyway so trying to fix one possible overflow is just moot. Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-By: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 253f9ea7e8e53a5176bd80ceb174907b10724c1a) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
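The first problem named in the commit above, a 32-bit multiplication that wraps once the `(u64)` cast is removed, can be reproduced in user space. The function names are hypothetical and `uint32_t` stands in for a 32-bit `unsigned long`; this is a sketch of the arithmetic, not the code in `wb_dirty_limits()`.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits: wraps modulo 2^32. */
static uint32_t demo_mul_narrow(uint32_t a, uint32_t b)
{
	return a * b;
}

/* Product computed in 64 bits: the cast widens before multiplying. */
static uint64_t demo_mul_widened(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/* Usage example: 100000 * 100000 exceeds 2^32 and wraps without the cast. */
static void demo_check_mul(void)
{
	assert(demo_mul_widened(100000u, 100000u) == 10000000000ULL);
	assert(demo_mul_narrow(100000u, 100000u) == 1410065408u);
}
```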
|
b470582782 |
devfreq_boost: Update and expand to handle CPUBW/LLCCBW boosting
This will enable us to more accurately replicate Pixel 4 userspace DDR boosting by adding support for both CPUBW and LLCCBW devices. https://android.googlesource.com/device/google/coral/+/refs/heads/master/init.power.rc https://android.googlesource.com/device/google/coral/+/refs/heads/master/powerhint.json Change-Id: Iae78fa01f96de8338725fa9309e5eee6d8c82313 Signed-off-by: idkwhoiam322 <idkwhoiam322@raphielgang.org> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
3f6b79925f |
mm: Boost CPU when memory pressure becomes high
Change-Id: Id1b9978d0d68612af02aee88e53af9645a5951ca Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
c0cc305165 |
msm-4.14: Add CPU LLCC bus boost triggers
Change-Id: I8ed4fcc32e643e871dda2e40fb86b3fc3f4326a4 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
9fb5e9427f |
Revert "mm: Reduce swappiness to 60"
This reverts commit 59e33ada5add42ab758928f2c2a50cad13eb3db6. Change-Id: I758823369921deae4d72fc9756d4a430a140c3a9 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
59e33ada5a |
mm: Reduce swappiness to 60
Change-Id: I4bc1bb23b05d3cdbcaedef287bf34743a913c7e6 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
3143685e95 |
Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (278 commits) LTS: Update to 4.14.348 docs: kernel_include.py: Cope with docutils 0.21 serial: kgdboc: Fix NMI-safety problems from keyboard reset code btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks() dm: limit the number of targets and parameter size area Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems" LTS: Update to 4.14.347 rds: Fix build regression. RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). net: fix out-of-bounds access in ops_init drm/vmwgfx: Fix invalid reads in fence signaled events dyndbg: fix old BUG_ON in >control parser tipc: fix UAF in error path usb: gadget: f_fs: Fix a race condition when processing setup packets. usb: gadget: composite: fix OS descriptors w_value logic firewire: nosy: ensure user_length is taken into account when fetching packet contents af_unix: Fix garbage collector racing against connect() af_unix: Do not use atomic ops for unix_sk(sk)->inflight. ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action() ... Change-Id: If329d39dd4e95e14045bb7c58494c197d1352d60 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
e917dc0ff3 |
x86/mm/pat: fix VM_PAT handling in COW mappings
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream. PAT handling won't do the right thing in COW mappings: the first PTE (or, in fact, all PTEs) can be replaced during write faults to point at anon folios. Reliably recovering the correct PFN and cachemode using follow_phys() from PTEs will not work in COW mappings. Using follow_phys(), we might just get the address+protection of the anon folio (which is very wrong), or fail on swap/nonswap entries, failing follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and track_pfn_copy(), not properly calling free_pfn_range(). In free_pfn_range(), we either wouldn't call memtype_free() or would call it with the wrong range, possibly leaking memory. To fix that, let's update follow_phys() to refuse returning anon folios, and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings if we run into that. We will now properly handle untrack_pfn() with COW mappings, where we don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if the first page was replaced by an anon folio, though: we'd have to store the cachemode in the VMA to make this work, likely growing the VMA size. For now, lets keep it simple and let track_pfn_copy() just fail in that case: it would have failed in the past with swap/nonswap entries already, and it would have done the wrong thing with anon folios. Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn(): <--- C reproducer ---> #include <stdio.h> #include <sys/mman.h> #include <unistd.h> #include <liburing.h> int main(void) { struct io_uring_params p = {}; int ring_fd; size_t size; char *map; ring_fd = io_uring_setup(1, &p); if (ring_fd < 0) { perror("io_uring_setup"); return 1; } size = p.sq_off.array + p.sq_entries * sizeof(unsigned); /* Map the submission queue ring MAP_PRIVATE */ map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, ring_fd, IORING_OFF_SQ_RING); if (map == MAP_FAILED) { perror("mmap"); return 1; } /* We have at least one page. 
Let's COW it. */ *map = 0; pause(); return 0; } <--- C reproducer ---> On a system with 16 GiB RAM and swap configured: # ./iouring & # memhog 16G # killall iouring [ 301.552930] ------------[ cut here ]------------ [ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100 [ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g [ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1 [ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4 [ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100 [ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000 [ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282 [ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047 [ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200 [ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000 [ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000 [ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000 [ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000 [ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0 [ 301.565725] PKRU: 55555554 [ 301.565944] Call Trace: [ 301.566148] <TASK> [ 301.566325] ? untrack_pfn+0xf4/0x100 [ 301.566618] ? __warn+0x81/0x130 [ 301.566876] ? untrack_pfn+0xf4/0x100 [ 301.567163] ? report_bug+0x171/0x1a0 [ 301.567466] ? handle_bug+0x3c/0x80 [ 301.567743] ? exc_invalid_op+0x17/0x70 [ 301.568038] ? asm_exc_invalid_op+0x1a/0x20 [ 301.568363] ? untrack_pfn+0xf4/0x100 [ 301.568660] ? 
untrack_pfn+0x65/0x100 [ 301.568947] unmap_single_vma+0xa6/0xe0 [ 301.569247] unmap_vmas+0xb5/0x190 [ 301.569532] exit_mmap+0xec/0x340 [ 301.569801] __mmput+0x3e/0x130 [ 301.570051] do_exit+0x305/0xaf0 ... Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Wupeng Ma <mawupeng1@huawei.com> Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines") Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3") Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit f18681daaec9665a15c5e7e0f591aad5d0ac622b) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
ce55bbe342 |
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream. Sven reports an infinite loop in __alloc_pages_slowpath() for costly order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination can happen in a suspend/resume context where a GFP_KERNEL allocation can have __GFP_IO masked out via gfp_allowed_mask. Quoting Sven: 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) with __GFP_RETRY_MAYFAIL set. 2. page alloc's __alloc_pages_slowpath tries to get a page from the freelist. This fails because there is nothing free of that costly order. 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, which bails out because a zone is ready to be compacted; it pretends to have made a single page of progress. 4. page alloc tries to compact, but this always bails out early because __GFP_IO is not set (it's not passed by the snd allocator, and even if it were, we are suspending so the __GFP_IO flag would be cleared anyway). 5. page alloc believes reclaim progress was made (because of the pretense in item 3) and so it checks whether it should retry compaction. The compaction retry logic thinks it should try again, because: a) reclaim is needed because of the early bail-out in item 4 b) a zonelist is suitable for compaction 6. goto 2. indefinite stall. (end quote) The immediate root cause is confusing the COMPACT_SKIPPED returned from __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be indicating a lack of order-0 pages, and in step 5 evaluating that in should_compact_retry() as a reason to retry, before incrementing and limiting the number of retries. There are however other places that wrongly assume that compaction can happen while we lack __GFP_IO. To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO evaluation and switch the open-coded test in try_to_compact_pages() to use it. 
Also use the new helper in: - compaction_ready(), which will make reclaim not bail out in step 3, so there's at least one attempt to actually reclaim, even if chances are small for a costly order - in_reclaim_compaction() which will make should_continue_reclaim() return false and we don't over-reclaim unnecessarily - in __alloc_pages_slowpath() to set a local variable can_compact, which is then used to avoid retrying reclaim/compaction for costly allocations (step 5) if we can't compact and also to skip the early compaction attempt that we do in some cases Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sven van Ashbrook <svenva@chromium.org> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/ Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Curtis Malainey <cujomalainey@chromium.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit c82a659cc8bb7a7f8a8348fc7f203c412ae3636f) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
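The idea behind the helper introduced in the commit above can be sketched in user-space C. The flag values, the `demo_` names and the retry function are assumptions for illustration; in particular `demo_should_retry_compaction` is not the kernel's `should_compact_retry()`. It shows only how a single predicate on the I/O flag breaks the loop from steps 2 to 6.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int demo_gfp_t;

/* Hypothetical bit values; the real ones live in the kernel's gfp headers. */
#define DEMO_GFP_IO      0x1u
#define DEMO_GFP_FS      0x2u
#define DEMO_GFP_RECLAIM 0x4u
#define DEMO_GFP_KERNEL  (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOIO    (DEMO_GFP_RECLAIM)

/* Compaction may only run when the allocation is allowed to do I/O. */
static bool demo_gfp_compaction_allowed(demo_gfp_t gfp_mask)
{
	return (gfp_mask & DEMO_GFP_IO) != 0;
}

/*
 * Retry compaction only if it can run at all and the retry budget is
 * not exhausted, so a no-I/O allocation never loops on compaction.
 */
static bool demo_should_retry_compaction(demo_gfp_t gfp_mask,
					 int retries, int max_retries)
{
	assert(max_retries >= 0);
	if (!demo_gfp_compaction_allowed(gfp_mask))
		return false;
	return retries < max_retries;
}
```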
|
66bf901692 |
mm/migrate: set swap entry values of THP tail pages properly.
The tail pages in a THP can have swap entry information stored in their private field. When migrating to a new page, all tail pages of the new page need to update ->private to avoid future data corruption. This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory: work on folio->swap instead of page->private when splitting folio"), subpages of a swapcached THP no longer require this maintenance. Adding THPs to the swapcache was introduced in commit 38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"), where each subpage of a THP added to the swapcache had its own swapcache entry and required the ->private field to point to the correct swapcache entry. Later, when THP migration functionality was implemented in commit 616b8371539a6 ("mm: thp: enable thp migration in generic path"), it initially did not handle the subpages of swapcached THPs, failing to update their ->private fields or replace the subpage pointers in the swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration for shmem thp") addressed the swapcache update aspect. This patch fixes the update of subpage ->private fields. Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_charante@quicinc.com/ Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 9e92cefdaa7537515dc0ff6cc73d46fa31b062fc) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
fd783c9a20 |
mm/memory-failure: fix an incorrect use of tail pages
When commit c79c5a0a00a9 was backported to 4.19-stable, a mistake was introduced. The head page, not the tail page, should be passed to try_to_unmap(); otherwise the unmap fails as follows: Memory failure: 0x121c10: failed to unmap page (mapcount=1) Memory failure: 0x121c10: recovery action for unmapping failed page: Ignored Fixes: c6f50413f2aa ("mm/memory-failure: check the mapcount of the precise page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 27f83f1cacba82afa4c9697e3ec3abb15e92ec82) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
cead81caaf |
memtest: use {READ,WRITE}_ONCE in memory scanning
[ Upstream commit 82634d7e24271698e50a3ec811e5f50de790a65f ] memtest failed to find bad memory when compiled with clang. Use {WRITE,READ}_ONCE to access memory so that the compiler cannot optimize the accesses away. Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 6e7044f155f7756e4489d8ad928f3061eab4595b) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
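The access pattern described in the memtest commit can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: the `READ_ONCE`/`WRITE_ONCE` macros here are simplified stand-ins built on a volatile cast (using the GCC/Clang `__typeof__` extension), and `scan_for_bad_words()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption: the volatile
 * cast is enough to illustrate the idea; the kernel versions do more). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical helper: write a pattern to every word, then read each
 * word back and count mismatches. The volatile accesses force the
 * compiler to emit every store and load instead of folding the
 * write-then-compare sequence into "always equal". */
static size_t scan_for_bad_words(uint64_t *start, size_t nwords,
                                 uint64_t pattern)
{
    size_t i, bad = 0;

    assert(start != NULL);

    for (i = 0; i < nwords; i++)
        WRITE_ONCE(start[i], pattern);

    for (i = 0; i < nwords; i++)
        if (READ_ONCE(start[i]) != pattern)
            bad++;

    return bad;
}
```

On healthy memory the function returns 0. Without the volatile accesses, an optimizing compiler is allowed to assume the read returns what was just written and drop the comparison entirely, which is consistent with the failure the commit describes.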
||
|
814150c479 |
zsmalloc: Use copy_page for full page copy
Some architectures have implemented optimized copy_page for full page copying, such as arm. On my arm platform, using the copy_page helper for single page copying is about 10 percent faster than memcpy. Link: https://lkml.kernel.org/r/20231006060245.7411-1-mark-pk.tsai@mediatek.com Change-Id: Ie26f83ffaeeff7415304cab94ecb847606b35953 Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
fec6892d1c |
mm: Revert PID optimizations
Change-Id: I61da569fb17d5a5ac5e814bcbeed8013c6e5a7a2 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
a9e2d194be |
Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts
* 'linux-4.14.y' of https://github.com/openela/kernel-lts: (350 commits) LTS: Update to 4.14.340 fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() PCI/MSI: Prevent MSI hardware interrupt number truncation s390: use the correct count for __iowrite64_copy() packet: move from strlcpy with unused retval to strscpy ipv6: sr: fix possible use-after-free and null-ptr-deref nouveau: fix function cast warnings scsi: jazz_esp: Only build if SCSI core is builtin RDMA/srpt: fix function pointer cast warnings RDMA/srpt: Support specifying the srpt_service_guid parameter IB/hfi1: Fix a memleak in init_credit_return usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs l2tp: pass correct message length to ip6_append_data gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp() dm-crypt: don't modify the data when using authenticated encryption mm: memcontrol: switch to rcu protection in drain_all_stock() s390/qeth: Fix potential loss of L3-IP@ in case of network issues virtio-blk: Ensure no requests in virtqueues before deleting vqs. firewire: core: send bus reset promptly on gap count error ... Change-Id: Ieafdd459ee41343bf15ed781b3e45adc2be29cc1 Signed-off-by: Richard Raya <rdxzv.dev@gmail.com> |
||
|
5cf1aceb57 |
mm: memcontrol: switch to rcu protection in drain_all_stock()
commit e1a366be5cb4f849ec4de170d50eebc08bb0af20 upstream. Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") introduced css_tryget()/css_put() calls in drain_all_stock(), which are supposed to protect the target memory cgroup from being released during the mem_cgroup_is_descendant() call. However, it's not completely safe. In theory, memcg can go away between reading stock->cached pointer and calling css_tryget(). This can happen if drain_all_stock() races with drain_local_stock() performed on the remote cpu as a result of a work item scheduled by the previous invocation of drain_all_stock(). The race is a bit theoretical and there are few chances to trigger it, but the current code looks a bit confusing, so it makes sense to fix it anyway. The code looks as if css_tryget() and css_put() are used to protect stock draining. That is not necessary, because stocked pages are holding references to the cached cgroup. And it obviously won't work for work items scheduled on other cpus. So, let's read the stock->cached pointer and evaluate the memory cgroup inside a rcu read section, and get rid of css_tryget()/css_put() calls. Link: http://lkml.kernel.org/r/20190802192241.3253165-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage") Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 9b78faee4829e8d4bc88f59aa125e219ad834003) Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> |
||
|
45dea6f77d |
memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
commit 1a3e1f40962c445b997151a542314f3c6097f8c3 upstream. NOTE: This is a partial backport since we only need the refcnt between memcg and stock to fix the problem stated below, and in this way multiple versions use the same code and align with each other. There was a kernel panic happened on an in-house environment running 3.10, and the same problem was reproduced on 4.19: general protection fault: 0000 [#1] SMP PTI CPU: 1 PID: 2085 Comm: bash Kdump: loaded Tainted: G L 4.19.90+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010 drain_all_stock+0xad/0x140 Code: 00 00 4d 85 ff 74 2c 45 85 c9 74 27 4d 39 fc 74 42 41 80 bc 24 28 04 00 00 00 74 17 49 8b 04 24 49 8b 17 48 8b 88 90 02 00 00 <48> 39 8a 90 02 00 00 74 02 eb 86 48 63 88 3c 01 00 00 39 8a 3c 01 RSP: 0018:ffffa7efc5813d70 EFLAGS: 00010202 RAX: ffff8cb185548800 RBX: ffff8cb89f420160 RCX: ffff8cb1867b6000 RDX: babababababababa RSI: 0000000000000001 RDI: 0000000000231876 RBP: 0000000000000000 R08: 0000000000000415 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8cb186f89040 R13: 0000000000020160 R14: 0000000000000001 R15: ffff8cb186b27040 FS: 00007f4a308d3740(0000) GS:ffff8cb89f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe4d634a68 CR3: 000000010b022000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mem_cgroup_force_empty_write+0x31/0xb0 cgroup_file_write+0x60/0x140 ? __check_object_size+0x136/0x147 kernfs_fop_write+0x10e/0x190 __vfs_write+0x37/0x1b0 ? selinux_file_permission+0xe8/0x130 ? security_file_permission+0x2e/0xb0 vfs_write+0xb6/0x1a0 ksys_write+0x57/0xd0 do_syscall_64+0x63/0x250 ? async_page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x5c/0xc1 Modules linked in: ... 
It is found that in case of stock->nr_pages == 0, the memcg on stock->cached could be freed because its refcnt dropped to 0, which made stock->cached become a dangling pointer. It could cause a UAF problem in drain_all_stock() in the following concurrent scenario. Note that drain_all_stock() doesn't disable irq but only preemption.

CPU1                              CPU2
==============================================================================
stock->cached = memcgA (freed)
                                  drain_all_stock(memcgB)
                                   rcu_read_lock()
                                   memcg = CPU1's stock->cached (memcgA)
                                   (interrupted)
refill_stock(memcgC)
 drain_stock(memcgA)
 stock->cached = memcgC
 stock->nr_pages += xxx (> 0)
                                   stock->nr_pages > 0
                                   mem_cgroup_is_descendant(memcgA, memcgB) [UAF]
                                   rcu_read_unlock()

This problem is, unintentionally, fixed at 5.9, where commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") adds memcg refcnt for stock. Therefore affected LTS versions include 4.19 and 5.4. For 4.19, memcg's css offline process doesn't call drain_all_stock(), so it's easier for the released memcg to be left on the stock. For 5.4, although mem_cgroup_css_offline() does call drain_all_stock(), the flushing could be skipped when stock->nr_pages happens to be 0, and besides the async draining could be delayed and take place after the UAF problem has happened. Fix this problem by adding (and decreasing) memcg's refcnt when memcg is put onto (and removed from) stock, just like how commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") does. After all, "being on the stock" is a kind of reference with regards to memcg. As such, it's guaranteed that a css on stock would not be freed. It's good to mention that refill_stock() is executed in an irq-disabled context, so the drain_stock() patched with css_put() would not actually free memcgA until the end of refill_stock(), since css_put() is an RCU free and it's still in grace period.
For CPU2, the access to CPU1's stock->cached is protected by rcu_read_lock(), so in this case it gets either NULL from stock->cached or a memcgA that is still good. Cc: stable@vger.kernel.org # 4.19 5.4 Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage") Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 9e46a20397f443d02d6c6f1a72077370e8cbc8da) Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> |
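The idea that "being on the stock" counts as a reference can be modelled in a few lines of userspace C. This is a sketch under loud assumptions: the structs, the non-atomic counter, and the `freed` flag are all hypothetical simplifications, and the real code uses css_get()/css_put() with RCU-deferred freeing rather than anything shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a memcg and a per-cpu stock. */
struct memcg {
    int refcnt;   /* stands in for the css reference count */
    bool freed;   /* set instead of actually freeing, so it can be inspected */
};

struct stock {
    struct memcg *cached;
    unsigned int nr_pages;
};

static void memcg_get(struct memcg *m)
{
    m->refcnt++;
}

static void memcg_put(struct memcg *m)
{
    assert(m->refcnt > 0);
    if (--m->refcnt == 0)
        m->freed = true;
}

/* Removing a memcg from the stock drops the stock's reference. */
static void drain_stock(struct stock *s)
{
    if (s->cached)
        memcg_put(s->cached);
    s->cached = NULL;
    s->nr_pages = 0;
}

/* Putting a memcg onto the stock takes a reference, so the cached
 * pointer stays valid even while nr_pages is 0. */
static void refill_stock(struct stock *s, struct memcg *m, unsigned int nr)
{
    if (s->cached != m) {
        drain_stock(s);
        memcg_get(m);
        s->cached = m;
    }
    s->nr_pages += nr;
}
```

In this model, a memcg that is cached with zero stocked pages survives its owner dropping the last external reference, and is only released once the stock is drained. That is the property the commit message relies on to make the later mem_cgroup_is_descendant() call safe.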
||
|
deb218b841 |
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream. (struct dirty_throttle_control *)->thresh is an unsigned long, but is passed as the u32 divisor argument to div_u64(). On architectures where unsigned long is 64 bits, the argument will be implicitly truncated. Use div64_u64() instead of div_u64() so that the value used in the "is this a safe division" check is the same as the divisor. Also, remove redundant cast of the numerator to u64, as that should happen implicitly. This would be difficult to exploit in memcg domain, given the ratio-based arithmetic domain_dirty_limits() uses, but is much easier in global writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32) Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()") Signed-off-by: Zach O'Keefe <zokeefe@google.com> Cc: Maxim Patlasov <MPatlasov@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit c593d26fb5d577ef31b6e49a31e08ae3ebc1bc1e) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
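The truncation described above can be reproduced in userspace. The sketch below is illustrative only: `truncated_divisor()` and `scaled_limit()` are hypothetical names, and the plain `/` operator stands in for the kernel's div64_u64(). It deliberately never performs the faulty division, since dividing by zero would trap.

```c
#include <assert.h>
#include <stdint.h>

/* What a u32 divisor parameter actually receives when a 64-bit
 * threshold is passed: only the low 32 bits survive. */
static uint32_t truncated_divisor(uint64_t thresh)
{
    return (uint32_t)thresh;
}

/* Hypothetical model of the fixed path: the value tested in the
 * "is this a safe division" check is the same 64-bit value that is
 * used as the divisor. */
static uint64_t scaled_limit(uint64_t numerator, uint64_t thresh)
{
    if (!thresh)
        return 0;
    return numerator / thresh;
}
```

With `thresh == 1ULL << 32`, a check on the 64-bit value passes, but the truncated divisor is 0, which is how the guarded division could still fault. Using a 64-bit divisor throughout removes the mismatch between the check and the division.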
||
|
e5e8870a91 |
mm: fix unmap_mapping_range high bits shift bug
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream. The bug happens when the highest bit of holebegin is 1. Suppose holebegin is 0x8000000111111000: after the shift, hba would be 0xfff8000000111111, and vma_interval_tree_foreach would then fail the lookup or return the wrong result.

Example of an erroneous call sequence:
- mmap(..., offset=0x8000000111111000)
  |- syscall(mmap, ... unsigned long, off):
     |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);

Here pgoff is correctly shifted to 0x8000000111111, but passing 0x8000000111111000 as holebegin to unmap then produces a bad result, as shown below:
- unmap_mapping_range(..., loff_t const holebegin)
  |- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly */

The issue happens in heterogeneous computing, where the device (e.g. gpu) and host share the same virtual address space. A simple workflow pattern which hits the issue is:
/* host */
1. userspace first mmaps a file-backed VA range with a specified offset, e.g. (offset=0x800..., mmap return: va_a)
2. write some data to the corresponding sys page, e.g. (va_a = 0xAABB)
/* device */
3. gpu workload touches VA, triggers a gpu fault and notifies the host.
/* host */
4. received the gpu fault notification, then it will:
4.1 unmap host pages and also take care of the cpu tlb (use unmap_mapping_range with offset=0x800...)
4.2 migrate the sys page to the device
4.3 set up the device page table and resolve the device fault.
/* device */
5. gpu workload continues, accesses va_a and gets 0xAABB.
6. gpu workload continues, writes 0xBBCC to va_a.
/* host */
7. userspace accesses va_a; as expected, it will:
7.1 trigger a cpu vm fault.
7.2 have the driver handle the fault and migrate the gpu local page to the host.
8. userspace can then correctly get 0xBBCC from va_a
9. done

But in step 4.1, if we hit the bug described in this patch, userspace would never trigger a cpu fault and would still get the old value: 0xAABB. Making holebegin unsigned first fixes the bug.
Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 2db1c46c3913b8bc92fed235a344de2671fe9d8d) [conflict: cleanup commit 977fbdcd5986 ("mm: add unmap_mapping_pages()") is not in this branch] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
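The sign-extension effect in the unmap_mapping_range commit can be checked with a small userspace sketch. Assumptions are labelled in the comments: `PAGE_SHIFT` is fixed at 12 here, the two function names are hypothetical, and right-shifting a negative signed value is implementation-defined in C (GCC and Clang perform an arithmetic shift, which is the behaviour the commit describes).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Buggy form: holebegin is a signed 64-bit offset (like loff_t), so
 * on GCC/Clang the shift replicates the sign bit into the top bits. */
static uint64_t pgoff_buggy(int64_t holebegin)
{
    return (uint64_t)(holebegin >> PAGE_SHIFT);
}

/* Fixed form: convert to unsigned first, so the shift fills the
 * vacated top bits with zeros. */
static uint64_t pgoff_fixed(int64_t holebegin)
{
    return (uint64_t)holebegin >> PAGE_SHIFT;
}
```

For the offset used in the commit message, 0x8000000111111000, the buggy form yields 0xfff8000000111111 while the fixed form yields 0x0008000000111111, matching the page offset that mmap computed from the unsigned value.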
||
|
ff510bc907 |
mm/memory-failure: check the mapcount of the precise page
[ Upstream commit c79c5a0a00a9457718056b588f312baadf44e471 ] A process may map only some of the pages in a folio, and might be missed if it maps the poisoned page but not the head page. Or it might be unnecessarily hit if it maps the head page, but not the poisoned page. Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org Fixes: 7af446a841a2 ("HWPOISON, hugetlb: enable error handling path for hugepage") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit c6f50413f2aacc919b5de443aa080b94f5ebb21d) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
9cdc78c354 |
Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common: (2966 commits) Linux 4.14.331 net: sched: fix race condition in qdisc_graft() scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks ext4: correct return value of ext4_convert_meta_bg ext4: correct offset of gdb backup in non meta_bg group to update_backups ext4: apply umask if ACL support is disabled media: venus: hfi: fix the check to handle session buffer requirement media: sharp: fix sharp encoding i2c: i801: fix potential race in i801_block_transaction_byte_by_byte net: dsa: lan9303: consequently nested-lock physical MDIO ALSA: info: Fix potential deadlock at disconnection parisc/pgtable: Do not drop upper 5 address bits of physical address parisc: Prevent booting 64-bit kernels on PA1.x machines mcb: fix error handling for different scenarios when parsing jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware mmc: meson-gx: Remove setting of CMD_CFG_ERROR PM: hibernate: Clean up sync_read handling in snapshot_write_next() PM: hibernate: Use __get_safe_page() rather than touching the list ... Change-Id: I755d2aa7c525ace28adc4aee433572b3110ea39b |
||
|
79840c9d70 |
Merge tag 'LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0' of https://git.codelinaro.org/clo/la/kernel/msm-4.14
"LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0" * tag 'LA.UM.9.1.r1-14600-SMxxx0.QSSI14.0' of https://git.codelinaro.org/clo/la/kernel/msm-4.14: (103 commits) msm: npu: Fix use after free issue iommu: Fix missing return check of arm_lpae_init_pte msm: kgsl: Prevent wrap around during user address mapping iommu: Fix missing return check of arm_lpae_init_pte UPSTREAM: security: selinux: allow per-file labeling for bpffs UPSTREAM: security: selinux: allow per-file labeling for bpffs arm: configs: Enable QCOM_SHOW_RESUME_IRQ module for mdm9607 Revert "irqchip/gic-v2: implement suspend and resume" exec: Force single empty string when argv is empty bus: mhi: misc: Add check for dev_rp if it is iommu range or not BACKPORT: FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap bus: mhi: misc: Add check for dev_rp if it is iommu range or not mdm: dataipa: increase the size of prefetch buffer msm: ais: core: validation of session/device/link handle soc: qcom: minidump: check the size parameter passed to qcom_smem_get() msm: camera: core: validation of session/device/link handle qcedev: vote for crypto clocks during module close msm: ais: smmu: Use get_file to increase ref count pinctrl: qcom: Using readl_relaxed/writel_relaxed APIs net: qrtr: Add bounds check in rx path ... Change-Id: Ia2603d18afb240a1fcdce609944dd4038c988dbf |
||
|
111fe1c861 |
mm: Always indicate OOM kill progress when Simple LMK is enabled
When Simple LMK is enabled, the page allocator slowpath always thinks that no OOM kill progress is made because out_of_memory() returns false. As a result, spurious page allocation failures are observed when memory is low and Simple LMK is killing tasks, simply because the page allocator slowpath doesn't think that any OOM killing is taking place. Fix this by simply making out_of_memory() always return true when Simple LMK is enabled. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Change-Id: Ib91e593e67b8d155bb8c1a1de807b524f9348d61 |
||
|
fce78edbb4 |
This is the 4.14.322 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmTWAT4ACgkQONu9yGCS aT6kKxAA00HDcoEbS4CpQxK1ggeeW6xMFqPHHwUz62ScZPR1zcrR4ag5UrKOQALF cCQwt2nVBMUXciiQd3gY+MciAYPRVIXLMK9QqQEJSBZ+2p8zY3nb/HbM6o8iKQeV xIhUneiyHtbOyTo3oQcyET7ngwxtDp9uEnd+8I+sSbGi8Wyh8Z8L2daVQTrke1Js QIe3wDQsUj0pEDhRfYx29JKeQ8fBOfZlxtFEsdHvGgP/4j2EXGwyMVnt3/DVuwM8 5/b/SML0skSh8YM9JfMQwpYpR+MAFGyyYKoF2pGu1trvyoh2Jd3TYuYcNqjwIywg W+ODGmULcYUYPBzUMdvrefwpn4l/2qpPCJ8FHB80h+4Jmy6PMN7lm1YnMBeQK4GP ACLr2BzJ4Tp5LavWZpTpqdRlC039aSZqY+7K+H/eoNstwZMU3hKc3Kn2KrPss0pp K0M7+8oukTnSiFNgIXVJOsr+kN1nNvtQmqCVRWlrn2cQckdDf8pVkPl/QtC3ZtWf aI8xYr6UpAr0z1elK5p9lO6N0R8FLwVmDG7B4b/6nLbWtRSt53ay/nMAzebodpn1 8r+6ZoXO5LedNJsUOMJqE58X0ywbUgcx8mfkuRS8PLXEk7yI4+PR7DCeWyZ/YdVX dUqaYIK0yYx9yXAkMaSdrnMs+OSqa6lK9c9juPDvFox+ngLAjNk= =67ef -----END PGP SIGNATURE----- Merge 4.14.322 into android-4.14-stable Changes in 4.14.322 gfs2: Don't deref jdesc in evict x86/microcode/AMD: Load late on both threads too x86/smp: Use dedicated cache-line for mwait_play_dead() fbdev: imsttfb: Fix use after free bug in imsttfb_probe drm/edid: Fix uninitialized variable in drm_cvt_modes() scripts/tags.sh: Resolve gtags empty index generation drm/amdgpu: Validate VM ioctl flags. 
treewide: Remove uninitialized_var() usage md/raid10: fix overflow of md/safe_mode_delay md/raid10: fix wrong setting of max_corr_read_errors md/raid10: fix io loss while replacement replace rdev PM: domains: fix integer overflow issues in genpd_parse_state() evm: Complete description of evm_inode_setattr() wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx wifi: orinoco: Fix an error handling path in spectrum_cs_probe() wifi: orinoco: Fix an error handling path in orinoco_cs_probe() wifi: atmel: Fix an error handling path in atmel_probe() wifi: wl3501_cs: Fix an error handling path in wl3501_probe() wifi: ray_cs: Fix an error handling path in ray_probe() wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config watchdog/perf: more properly prevent false positives with turbo modes kexec: fix a memory leak in crash_shrink_memory() memstick r592: make memstick_debug_get_tpc_name() static wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key() wifi: ath9k: convert msecs to jiffies where needed netlink: fix potential deadlock in netlink_set_err() netlink: do not hard code device address lenth in fdb dumps gtp: Fix use-after-free in __gtp_encap_destroy(). lib/ts_bm: reset initial match offset for every block of text netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value. netlink: Add __sock_i_ino() for __netlink_diag_dump(). 
radeon: avoid double free in ci_dpm_init() Input: drv260x - sleep between polling GO bit ARM: dts: BCM5301X: Drop "clock-names" from the SPI node Input: adxl34x - do not hardcode interrupt trigger type drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H ARM: ep93xx: fix missing-prototype warnings ASoC: es8316: Increment max value for ALC Capture Target Volume control soc/fsl/qe: fix usb.c build errors fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe() drm/radeon: fix possible division-by-zero errors ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe() PCI: Add pci_clear_master() stub for non-CONFIG_PCI pinctrl: cherryview: Return correct value if pin in push-pull mode perf dwarf-aux: Fix off-by-one in die_get_varname() pinctrl: at91-pio4: check return value of devm_kasprintf() crypto: nx - fix build warnings when DEBUG_FS is not enabled modpost: fix section mismatch message for R_ARM_ABS32 modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24} modpost: fix off by one in is_executable_section() USB: serial: option: add LARA-R6 01B PIDs block: change all __u32 annotations to __be32 in affs_hardblocks.h w1: fix loop in w1_fini() sh: j2: Use ioremap() to translate device tree address into kernel memory media: usb: Check az6007_read() return value media: videodev2.h: Fix struct v4l2_input tuner index comment media: usb: siano: Fix warning due to null work_func_t function pointer extcon: Fix kernel doc of property fields to avoid warnings extcon: Fix kernel doc of property capability fields to avoid warnings usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe() mfd: rt5033: Drop rt5033-battery sub-device mfd: intel-lpss: Add missing check for platform_get_resource mfd: stmpe: Only disable the regulators if they are enabled rtc: st-lpc: Release some resources in st_rtc_probe() in case of error sctp: fix potential deadlock on 
&net->sctp.addr_wq_lock Add MODULE_FIRMWARE() for FIRMWARE_TG357766. spi: bcm-qspi: return error if neither hif_mspi nor mspi is available mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0 powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode tcp: annotate data races in __tcp_oow_rate_limited() net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX sh: dma: Fix DMA channel offset calculation NFSD: add encoding of op_recall flag for write delegation mmc: core: disable TRIM on Kingston EMMC04G-M627 mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M integrity: Fix possible multiple allocation in integrity_inode_get() jffs2: reduce stack usage in jffs2_build_xattr_subsystem() btrfs: fix race when deleting quota root from the dirty cow roots list ARM: orion5x: fix d2net gpio initialization spi: spi-fsl-spi: remove always-true conditional in fsl_spi_do_one_msg spi: spi-fsl-spi: relax message sanity checking a little spi: spi-fsl-spi: allow changing bits_per_word while CS is still active netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain netfilter: nf_tables: unbind non-anonymous set if rule construction fails netfilter: conntrack: Avoid nf_ct_helper_hash uses after free netfilter: nf_tables: prevent OOB access in nft_byteorder_eval workqueue: clean up WORK_* constant types, clarify masking net: mvneta: fix txq_map in case of txq_number==1 udp6: fix udp6_ehashfn() typo ntb: idt: Fix error handling in idt_pci_driver_init() NTB: amd: Fix error handling in amd_ntb_pci_driver_init() ntb: intel: Fix error handling in intel_ntb_pci_driver_init() NTB: ntb_transport: fix possible memory leak while device_register() fails ipv6/addrconf: fix a potential refcount underflow for idev wifi: airo: avoid uninitialized warning in airo_get_rate() net/sched: make psched_mtu() RTNL-less safe tpm: 
tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation SUNRPC: Fix UAF in svc_tcp_listen_data_ready() perf intel-pt: Fix CYC timestamps after standalone CBR ext4: fix wrong unit use in ext4_mb_clear_bb ext4: only update i_reserved_data_blocks on successful block allocation jfs: jfs_dmap: Validate db_l2nbperpage while mounting PCI: Add function 1 DMA alias quirk for Marvell 88SE9235 misc: pci_endpoint_test: Re-init completion for every test md/raid0: add discard support for the 'original' layout fs: dlm: return positive pid value for F_GETLK hwrng: imx-rngc - fix the timeout for init and self check meson saradc: fix clock divider mask length Revert "8250: add support for ASIX devices with a FIFO bug" tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk ring-buffer: Fix deadloop issue on reading trace_pipe xtensa: ISS: fix call to split_if_spec scsi: qla2xxx: Wait for io return on terminate rport scsi: qla2xxx: Fix potential NULL pointer dereference scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport() scsi: qla2xxx: Pointer may be dereferenced serial: atmel: don't enable IRQs prematurely perf probe: Add test for regression introduced by switch to die_get_decl_file() fuse: revalidate: don't invalidate if interrupted can: bcm: Fix UAF in bcm_proc_show() ext4: correct inline offset when handling xattrs in inode body debugobjects: Recheck debug_objects_enabled before reporting nbd: Add the maximum limit of allocated index in nbd_dev_add md: fix data corruption for raid456 when reshape restart while grow up md/raid10: prevent soft lockup while flush writes posix-timers: Ensure timer ID search-loop limit is valid sched/fair: Don't balance task to its current running CPU bpf: Address KCSAN report on bpf_lru_list wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point() igb: Fix igb_down hung on surprise removal spi: 
bcm63xx: fix max prepend length fbdev: imxfb: warn about invalid left/right margin pinctrl: amd: Use amd_pinconf_set() for all config options net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe llc: Don't drop packet from non-root netns. netfilter: nf_tables: fix spurious set element insertion failure tcp: annotate data-races around rskq_defer_accept tcp: annotate data-races around tp->notsent_lowat tcp: annotate data-races around fastopenq.max_qlen gpio: tps68470: Make tps68470_gpio_output() always set the initial value i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir() ethernet: atheros: fix return value check in atl1e_tso_csum() ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address tcp: Reduce chance of collisions in inet6_hashfn(). bonding: reset bond's flags when down link is P2P device team: reset team's flags when down link is P2P device platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100 benet: fix return value check in be_lancer_xmit_workarounds() ASoC: fsl_spdif: Silence output on stop block: Fix a source code comment in include/uapi/linux/blkzoned.h dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths ata: pata_ns87415: mark ns87560_tf_read static ring-buffer: Fix wrong stat of cpu_buffer->read tracing: Fix warning in trace_buffered_event_disable() USB: serial: option: support Quectel EM060K_128 USB: serial: option: add Quectel EC200A module support USB: serial: simple: add Kaufmann RKS+CAN VCP USB: serial: simple: sort driver entries can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED usb: ohci-at91: Fix the unhandle interrupt when resume usb: xhci-mtk: set the dma max_seg_size Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext() hwmon: (nct7802) Fix for temp6 
(PECI1) processed even if PECI1 disabled tpm_tis: Explicitly check for error code irq-bcm6345-l1: Do not assume a fixed block to cpu mapping s390/dasd: fix hanging device after quiesce/resume ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register dm cache policy smq: ensure IO doesn't prevent cleaner policy progress drm/client: Fix memory leak in drm_client_target_cloned net/sched: cls_fw: Fix improper refcount update leads to use-after-free net/sched: sch_qfq: account for stab overhead in qfq_enqueue net/sched: cls_u32: Fix reference counter leak leading to overflow perf: Fix function pointer case word-at-a-time: use the same return type for has_zero regardless of endianness net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer() perf test uprobe_from_different_cu: Skip if there is no gcc net: add missing data-race annotations around sk->sk_peek_off net: add missing data-race annotation for sk_ll_usec net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free ip6mr: Fix skb_under_panic in ip6mr_cache_report() tcp_metrics: fix addr_same() helper tcp_metrics: annotate data-races around tm->tcpm_stamp tcp_metrics: annotate data-races around tm->tcpm_lock tcp_metrics: annotate data-races around tm->tcpm_vals[] tcp_metrics: annotate data-races around tm->tcpm_net tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen loop: Select I/O scheduler 'none' from inside add_disk() libceph: fix potential hang in ceph_osdc_notify() USB: zaurus: Add ID for A-300/B-500/C-700 fs/sysv: Null check to prevent null-ptr-deref bug Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb ext2: Drop fragment support test_firmware: fix a memory leak with reqs buffer mtd: rawnand: omap_elm: Fix incorrect type in assignment drm/edid: fix objtool warning in drm_cvt_modes() Linux 4.14.322 Change-Id: 
Ia25c00bd23a112b634b83577ec7d54569e8b7c70 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
d68627697d |
treewide: Remove uninitialized_var() usage
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
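As a rough illustration of what the cleanup above changes at a call site, here is a minimal userspace sketch. The helper names (`find_value`, `lookup_or_default`) are hypothetical and not taken from the kernel tree; the macro definition quoted in the comment is the historical GCC variant as commonly cited, so treat it as an assumption rather than a quote from this tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Historically (assumption: GCC variant) the macro was defined as
 *     #define uninitialized_var(x) x = x
 * so "int uninitialized_var(val);" expanded to "int val = val;", which
 * silences the compiler without actually initializing anything.
 */

/* Hypothetical helper: writes *out only when the key is found. */
static int find_value(const int *keys, const int *vals, size_t n,
                      int key, int *out)
{
    for (size_t i = 0; i < n; i++) {
        if (keys[i] == key) {
            *out = vals[i];
            return 0;
        }
    }
    return -1;
}

int lookup_or_default(const int *keys, const int *vals, size_t n,
                      int key, int fallback)
{
    /*
     * Before the script: int uninitialized_var(val);
     * After the script:  a plain declaration. val is only read on the
     * path where find_value() succeeded and therefore wrote it.
     */
    int val;

    if (find_value(keys, vals, n, key, &val) != 0)
        return fallback;
    return val;
}
```

If a compiler still warned about `val` here, the commit message's advice would apply: initialize the variable explicitly instead of hiding the warning.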
|
5d0de97b15 |
BACKPORT: FROMLIST: mm: protect free_pgtables with mmap_lock write lock in exit_mmap
oom-reaper and process_mrelease system call should protect against races with exit_mmap which can destroy page tables while they walk the VMA tree. oom-reaper protects from that race by setting MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP before taking and releasing mmap_write_lock. process_mrelease has to elevate mm->mm_users to prevent such race. Both oom-reaper and process_mrelease hold mmap_read_lock when walking the VMA tree. The locking rules and mechanisms could be simpler if exit_mmap takes mmap_write_lock while executing destructive operations such as free_pgtables.

Change exit_mmap to hold the mmap_write_lock when calling free_pgtables. Operations like unmap_vmas() and unlock_range() are not destructive and could run under mmap_read_lock but for simplicity we take one mmap_write_lock during almost the entire operation. Note also that because oom-reaper checks VM_LOCKED flag, unlock_range() should not be allowed to race with it.

In most cases this lock should be uncontended. Previously, Kirill reported ~4% regression caused by a similar change [1]. We reran the same test and although the individual results are quite noisy, the percentiles show lower regression with 1.6% being the worst case [2].

The change allows oom-reaper and process_mrelease to execute safely under mmap_read_lock without worries that exit_mmap might destroy page tables from under them.

[1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
[2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/all/20211124235906.14437-1-surenb@google.com/
Bug: 130172058
Bug: 189803002
Change-Id: Ic87272d09a0b68a1b0e968e8f1a1510fd6fc776a
Git-commit: 28358ebf2adb31117893813992fefcfd359a6a16
Git-repo: https://android.googlesource.com/kernel/common/
[quic_gkohli@quicinc.com: Resolved cherry-pick conflict in mm/mmap.c because the mmap lock was implemented differently in the older kernel. Although process_mrelease is not applicable in the older kernel, this patch is still required to take the exclusive lock in the exit_mmap path so that SPF knows an isolated vma was freed from this path]
Signed-off-by: Gaurav Kohli <quic_gkohli@quicinc.com> |
||
|
0efbe093b6 |
This is the 4.14.315 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRkmmMACgkQONu9yGCS aT5S/g/+LHkUcwpnnPu5llymtK0jd/0WvwWUJfJAOlGpa3l9CkoPtjHzNwtagoFR 2+woN7zhC7UteTz20/RXMFtNv7zFOMA91nsVSmYp4Cc997XpILeTkzpQMzoCm8Qt YFMpKEX0op6sAR+NUJ5Vaj/HaFBvO9J2ZMGGrxeUKVPAAgRk3AdvTGfHFwzXlmfb AKVo9jhG7NszYeLYIHRONMDJRyiBLJXrLSLfn+u+uKKRjNnBqJJEDQu3zYt6kavy M/8CE6QgOoCAcbyTIgVw9ZU51ydWfbKiEnMpEwPAEHy6C4xrYfMnWqF8LDjkSNCL xsNYbAyaPh/MdJoLGdTcuRSp58xP5dNT366xShN78RLqbeKPfg0nZCHMDWnC4BZP ET+zAwiueaf64Hu3NWHq8IC74EhgM8ZCzLiVb9CqCyllcVCT2xjdRE8eJtXz5Vgq ahsuJmvzGdSIkX6HFh8QKpWdoeRSPbOol+/xD/0fPFf97EiAvMZX5kLgfI+o0rGj 6fZuENIECp/WHiIqHJ2bsGb69M/OeJfoISxUUVFrCnGduXA59Gnj9zKftNHyNMQZ GCu2yHYkkM50RRw9xSO/286Z3mbz84fFRc8PKwWzu7veghuPXYOOKaA4Eleaw/Oy Sx92e2OTKjQVGKadHT4HfTd1xabks/9qLGBpx20GuRsfhHt/yJo= =ef7P -----END PGP SIGNATURE----- Merge 4.14.315 into android-4.14-stable Changes in 4.14.315 wifi: brcmfmac: slab-out-of-bounds read in brcmf_get_assoc_ies() bluetooth: Perform careful capability checks in hci_sock_ioctl() USB: serial: option: add UNISOC vendor and TOZED LT70C product iio: adc: palmas_gpadc: fix NULL dereference on rmmod USB: dwc3: fix runtime pm imbalance on unbind perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) staging: iio: resolver: ads1210: fix config mode MIPS: fw: Allow firmware to pass a empty env ring-buffer: Sync IRQ works before buffer destruction reiserfs: Add security prefix to xattr name in reiserfs_security_write() i2c: omap: Fix standard mode false ACK readings Revert "ubifs: dirty_cow_znode: Fix memleak in error handling path" ubi: Fix return value overwrite issue in try_write_vid_and_data() ubifs: Free memory for tmpfile name selinux: fix Makefile dependencies of flask.h selinux: ensure av_permissions.h is built when needed drm/rockchip: Drop unbalanced obj unref drm/vgem: add missing mutex_destroy drm/probe-helper: Cancel previous job before starting new one media: bdisp: Add missing check for 
create_workqueue media: av7110: prevent underflow in write_ts_to_decoder() x86/apic: Fix atomic update of offset in reserve_eilvt_offset() media: dm1105: Fix use after free bug in dm1105_remove due to race condition x86/ioapic: Don't return 0 from arch_dynirq_lower_bound() arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step wifi: ath6kl: minor fix for allocation size wifi: ath5k: fix an off by one check in ath5k_eeprom_read_freq_list() wifi: ath6kl: reduce WARN to dev_dbg() in callback scm: fix MSG_CTRUNC setting condition for SO_PASSSEC vlan: partially enable SIOCSHWTSTAMP in container net/packet: convert po->origdev to an atomic flag net/packet: convert po->auxdata to an atomic flag scsi: target: iscsit: Fix TAS handling during conn cleanup scsi: megaraid: Fix mega_cmd_done() CMDID_INT_CMDS md/raid10: fix leak of 'r10bio->remaining' for recovery wifi: iwlwifi: make the loop for card preparation effective wifi: iwlwifi: mvm: check firmware response size ixgbe: Allow flow hash to be set via ethtool ixgbe: Enable setting RSS table to default values ipv4: Fix potential uninit variable access bug in __ip_make_skb() Revert "Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work" net: amd: Fix link leak when verifying config failed tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. 
pstore: Revert pmsg_lock back to a normal mutex linux/vt_buffer.h: allow either builtin or modular for macros spi: fsl-spi: Fix CPM/QE mode Litte Endian of: Fix modalias string generation ia64: mm/contig: fix section mismatch warning/error uapi/linux/const.h: prefer ISO-friendly __typeof__ sh: sq: Fix incorrect element size for allocating bitmap buffer usb: chipidea: fix missing goto in `ci_hdrc_probe` tty: serial: fsl_lpuart: adjust buffer length to the intended size serial: 8250: Add missing wakeup event reporting staging: rtl8192e: Fix W_DISABLE# does not work after stop/start spmi: Add a check for remove callback when removing a SPMI driver macintosh/windfarm_smu_sat: Add missing of_node_put() powerpc/mpc512x: fix resource printk format warning powerpc/wii: fix resource printk format warnings powerpc/sysdev/tsi108: fix resource printk format warnings macintosh: via-pmu-led: requires ATA to be set powerpc/rtas: use memmove for potentially overlapping buffer copy perf/core: Fix hardlockup failure caused by perf throttle RDMA/rdmavt: Delete unnecessary NULL check power: supply: generic-adc-battery: fix unit scaling clk: add missing of_node_put() in "assigned-clocks" property parsing IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease SUNRPC: remove the maximum number of retries in call_bind_status phy: tegra: xusb: Add missing tegra_xusb_port_unregister for usb2_port and ulpi_port dmaengine: at_xdmac: do not enable all cyclic channels parisc: Fix argument pointer in real64_call_asm() nilfs2: do not write dirty data after degenerating to read-only nilfs2: fix infinite loop in nilfs_mdt_get_block() wifi: rtl8xxxu: RTL8192EU always needs full init clk: rockchip: rk3399: allow clk_cifout to force clk_cifout_src to reparent btrfs: scrub: reject unsupported scrub flags s390/dasd: fix hanging blockdevice after request requeue dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path 
dm flakey: fix a crash with invalid table line dm ioctl: fix nested locking in table_clear() to remove deadlock concern perf auxtrace: Fix address filter entire kernel size netfilter: nf_tables: split set destruction in deactivate and destroy phase netfilter: nf_tables: unbind set in rule from commit path netfilter: nft_hash: fix nft_hash_deactivate netfilter: nf_tables: use-after-free in failing rule with bound set netfilter: nf_tables: bogus EBUSY when deleting set after flush netfilter: nf_tables: deactivate anonymous set from preparation phase sit: update dev->needed_headroom in ipip6_tunnel_bind_dev() writeback: fix call of incorrect macro net/sched: act_mirred: Add carrier check af_packet: Don't send zero-byte data in packet_sendmsg_spkt(). ALSA: caiaq: input: Add error handling for unsupported input methods in `snd_usb_caiaq_input_init` perf vendor events power9: Remove UTF-8 characters from JSON files perf map: Delete two variable initialisations before null pointer checks in sort__sym_from_cmp() perf symbols: Fix return incorrect build_id size in elf_read_build_id() btrfs: fix btrfs_prev_leaf() to not return the same key twice btrfs: print-tree: parent bytenr must be aligned to sector size cifs: fix pcchunk length type in smb2_copychunk_range sh: math-emu: fix macro redefined warning sh: nmi_debug: fix return value of __setup handler ARM: dts: exynos: fix WM8960 clock name in Itop Elite ARM: dts: s5pv210: correct MIPI CSIS clock name HID: wacom: Set a default resolution for older tablets ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum ext4: improve error recovery code paths in __ext4_remount() ext4: add bounds checking in get_max_inline_xattr_value_size() ext4: bail out of ext4_xattr_ibody_get() fails for any reason ext4: remove a BUG_ON in ext4_mb_release_group_pa() ext4: fix invalid free tracking in ext4_xattr_move_to_block() perf bench: Share some global variables to fix build with gcc 10 tty: Prevent writing chars during tcsetattr 
TCSADRAIN/FLUSH serial: 8250: Fix serial8250_tx_empty() race with DMA Tx drbd: correctly submit flush bio on barrier printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock Linux 4.14.315 Change-Id: I7e3fda05118b08edc995f33280f9eec1f563b951 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
63c79247fe |
mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.

syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach.

Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all() .

As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
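The first deadlock scenario in the commit above (a reader running on the same CPU while the sequence count is odd) can be sketched in userspace with a toy sequence counter. This is only an illustration under stated assumptions: `toy_seqcount` and all `toy_*` helpers are hypothetical names, not kernel APIs, and the reader gives up after a bounded number of spins so the example terminates instead of hanging the way the real `read_seqbegin()` would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy sequence counter: an odd value means a writer is mid-update. */
struct toy_seqcount {
    atomic_uint sequence;
};

static void toy_write_begin(struct toy_seqcount *s)
{
    atomic_fetch_add(&s->sequence, 1);  /* even -> odd */
}

static void toy_write_end(struct toy_seqcount *s)
{
    atomic_fetch_add(&s->sequence, 1);  /* odd -> even */
}

/*
 * Bounded stand-in for a seqlock read-begin: spin until the count is
 * even, but give up after max_spins so the demo cannot hang.
 */
static bool toy_read_begin_bounded(struct toy_seqcount *s,
                                   unsigned max_spins, unsigned *snapshot)
{
    for (unsigned i = 0; i < max_spins; i++) {
        unsigned v = atomic_load(&s->sequence);
        if ((v & 1u) == 0) {
            *snapshot = v;
            return true;
        }
    }
    return false;
}

/*
 * Models an interrupt handler that runs on the writer's own CPU between
 * write begin and write end: the writer cannot progress, so the count
 * stays odd for as long as the reader spins.
 */
bool reader_inside_write_section_succeeds(void)
{
    struct toy_seqcount s;
    unsigned snapshot = 0;
    bool ok;

    atomic_init(&s.sequence, 0);
    toy_write_begin(&s);
    ok = toy_read_begin_bounded(&s, 1000, &snapshot);
    toy_write_end(&s);
    return ok;
}

/* The same reader once the write section has completed. */
bool reader_after_write_section_succeeds(void)
{
    struct toy_seqcount s;
    unsigned snapshot = 0;

    atomic_init(&s.sequence, 0);
    toy_write_begin(&s);
    toy_write_end(&s);
    return toy_read_begin_bounded(&s, 1000, &snapshot) && snapshot == 2;
}
```

In this model, disabling local interrupts around the write section corresponds to never letting the reader run between `toy_write_begin()` and `toy_write_end()` on the same CPU.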
|
7b854fbace |
This is the 4.14.313 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRBDasACgkQONu9yGCS aT6b+RAA10Y7oyJ3XTY4Iezj9155aG+8pQdraHCUeQ2mQSf5vQXszDZY466dsaam 7ONyW4cjZBBcQHAfiN2LYIPBmEq27ooDBoUZt8r9xX2I/xXSrYKJ64sI7QObpXz/ fJ5H94lLaxkldYmXl/o6fVstRcn5dPJ0FXaKvdWLwD/G/3y6Z/odFEmmbeZiHEtm G4owwbKMDxJ82sDBi9jTOVFy3ciINDbixydGF1g8VkV3aL2mk8lPd5nPsSxf1b3N GE+gKHIlW44/TuObYPewd6c9uQerIk7RG/pgo3z2vda0i2X3WYxF1bYmCjeHuoKE zmv3/mtltymRQf2nszyWcK3mEuGiQVOb4ikx0sDoo02+9YVF2kC/hs/vFJE8MR8J 3IkgMy675EEwQcoK21W8PqYhXwyJNaf53PWsxa5J6FdGby/9BJnQ94K3Ri06SlAi 6fB1xXvc+qRm0+ARssxO4e/d3zTZlhFgKwvrCyt2vQEvAZc4+NksrPeGpzMkIKLj 44fBwo+tDZ4Xg7rfYS+/lsN0ZxvkMdz06AF54MRGPSxjDIGqU94/jrZ1oqb3uvtl ta5LZsZvTXXUIFhrfi65/yBoEhAvGpkYbVcCeqqA+U97mtQ2yd24fV8oHwYVGu/g zoYfPIlWxrRx9TN1W6wwQvJxfdPbK67W5akfikqvB8fHeX7/xMw= =/dv7 -----END PGP SIGNATURE----- Merge 4.14.313 into android-4.14-stable Changes in 4.14.313 pwm: cros-ec: Explicitly set .polarity in .get_state() wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta icmp: guard against too small mtu ipv6: Fix an uninit variable access bug in __ip6_make_skb() gpio: davinci: Add irq chip flag to skip set wake USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs USB: serial: option: add Telit FE990 compositions USB: serial: option: add Quectel RM500U-CN modem iio: dac: cio-dac: Fix max DAC write value check for 12-bit tty: serial: sh-sci: Fix Rx on RZ/G2L SCI nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() nilfs2: fix sysfs interface lifetime perf/core: Fix the same task check in perf_event_set_output ftrace: Mark get_lock_parent_ip() __always_inline ring-buffer: Fix race while reader and writer are on the same page mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() ALSA: emu10k1: fix capture interrupt handler unlinking ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard ALSA: i2c/cs8427: fix iec958 mixer control deactivation ALSA: hda/sigmatel: fix 
S/PDIF out on Intel D*45* motherboards Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp} Bluetooth: Fix race condition in hidp_session_thread mtdblock: tolerate corrected bit-flips 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition niu: Fix missing unwind goto in niu_alloc_channels() qlcnic: check pci_reset_function result net: macb: fix a memory corruption in extended buffer descriptor mode i2c: imx-lpi2c: clean rx/tx buffers upon new message efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L verify_pefile: relax wrapper length check ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach() watchdog: sbsa_wdog: Make sure the timeout programming is within the limits coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug KVM: arm64: Factor out core register ID enumeration KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST arm64: KVM: Fix system register enumeration Linux 4.14.313 Change-Id: I9dcef9855d47e02e4ccbfcc7dd59e976c6ab9fb1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
111a79d9b9 |
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
commit 6fe7d6b992113719e96744d974212df3fcddc76c upstream.

The si->lock must be held when deleting the si from the available list. Otherwise, another thread can re-add the si to the available list, which can lead to memory corruption. The only place we have found where this happens is in the swapoff path. This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing to clear SWP_WRITEOK, etc.

A large number of warning messages can easily be triggered inside get_swap_pages() in some special cases, for example when madvise(MADV_PAGEOUT) is called concurrently on blocks of touched memory while many swapon/swapoff operations run (e.g. stress-ng-swap).

In the worst case, the above scenario can cause a panic. In swapoff(), the memory used by si could be kept in swap_info[] after turning off a swap. This means memory corruption will not be caused immediately until allocated and reset for a new swap in the swapon path. A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now si->lock is taken before calling 'del_from_avail_list()' so that other threads see the si being deleted and SWP_WRITEOK being cleared together, and do not reinsert it.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
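The ordering problem described in the swapoff commit above can be replayed as a single-threaded toy model. Everything here is a hypothetical sketch: `toy_si` and the functions below are not kernel code, the interleaving is hard-coded rather than produced by real threads, and the assumption that the re-add path checks the writable flag before re-listing the device is mine, made so the model stays small.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two pieces of swap device state that matter here. */
struct toy_si {
    bool writeok;        /* models the SWP_WRITEOK flag */
    bool on_avail_list;  /* models membership in the available list */
};

/* Models core 1: re-add the device if it still looks writable. */
static void toy_readd_if_writable(struct toy_si *si)
{
    if (si->writeok)
        si->on_avail_list = true;
}

/*
 * Old ordering: the device is unlisted before the lock is held, so core 1
 * can slip in between the delete and the flag clear.
 */
bool old_order_leaves_si_listed(void)
{
    struct toy_si si = { .writeok = true, .on_avail_list = true };

    si.on_avail_list = false;     /* delete without holding the lock */
    toy_readd_if_writable(&si);   /* core 1 runs in the gap */
    si.writeok = false;           /* lock acquired too late */
    return si.on_avail_list;
}

/*
 * New ordering: delete and flag clear happen in one locked section, so
 * core 1 can only observe the state before or after both steps.
 */
bool new_order_leaves_si_listed(bool core1_runs_first)
{
    struct toy_si si = { .writeok = true, .on_avail_list = true };

    if (core1_runs_first)
        toy_readd_if_writable(&si);

    /* begin locked section */
    si.on_avail_list = false;
    si.writeok = false;
    /* end locked section */

    if (!core1_runs_first)
        toy_readd_if_writable(&si);
    return si.on_avail_list;
}
```

With the old ordering the device ends up back on the list while marked non-writable, which is the inconsistent state the commit message associates with the plist warnings.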
|
b4e4425181 |
ANDROID: mm/filemap: Fix missing put_page() for speculative page fault
find_get_page() returns a page with increased refcount, assuming a page exists at the given index. Ensure this refcount is dropped on error.

Bug: 271079833
Fixes: 59d4d125 ("BACKPORT: FROMLIST: mm: implement speculative handling in filemap_fault()")
Change-Id: Idc7b9e3f11f32a02bed4c6f4e11cec9200a5c790
Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com>
(cherry picked from commit 6232eecfa7ca0d8d0ca088da6d0edb2c3a879ff9)
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Git-commit: 1d05213028b6dbdb8801e20f29b6a6f91c216033
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |
||
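The refcount rule stated in the commit above (a successful lookup takes a reference, so every error path must drop it) can be shown with a small userspace model. The `toy_*` names are hypothetical stand-ins and do not reproduce the kernel's `find_get_page()` or `put_page()` signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page with a plain integer reference count. */
struct toy_page {
    int refcount;
    bool uptodate;
};

/* Models a lookup that returns the page with its refcount raised. */
static struct toy_page *toy_find_get_page(struct toy_page *slot)
{
    if (slot == NULL)
        return NULL;
    slot->refcount++;
    return slot;
}

/* Models dropping one reference. */
static void toy_put_page(struct toy_page *page)
{
    page->refcount--;
}

/*
 * Models a fault handler: returns 0 and keeps the reference on success,
 * returns -1 on failure. The failure path after a successful lookup must
 * release the reference, otherwise the page leaks.
 */
int toy_fault(struct toy_page *slot)
{
    struct toy_page *page = toy_find_get_page(slot);

    if (page == NULL)
        return -1;            /* nothing was taken, nothing to drop */

    if (!page->uptodate) {
        toy_put_page(page);   /* the put that an error path must not miss */
        return -1;
    }
    return 0;                 /* caller now owns the extra reference */
}
```

A quick way to check such paths is to compare the count before and after a failing call: it should be unchanged.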
|
765b588f3b |
ANDROID: Re-enable fast mremap and fix UAF with SPF
SPF attempts page faults without taking the mmap lock, but takes the PTL. If there is a concurrent fast mremap (at PMD/PUD level), this can lead to a UAF as fast mremap will only take the PTL locks at the PMD/PUD level. SPF cannot take the PTL locks at the larger subtree granularity since this introduces much contention in the page fault paths.

To address the race:
1) Only try fast mremaps if there are no users of the VMA. Android is concerned with this optimization in the context of GC stop-the-world pause. So there are no other threads active and this should almost always succeed.
2) Speculative faults detect ongoing fast mremaps and fall back to conventional fault handling (taking mmap read lock).

Bug: 263177905
Change-Id: I23917e493ddc8576de19883cac053dfde9982b7f
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Git-commit: 529351c4c8202aa7f5bc4a8a100e583a70ab6110
Git-repo: https://android.googlesource.com/kernel/common/
[quic_c_spathi@quicinc.com: resolve merge conflicts. Not applying mremap changes due to absence of applicable configs for race.]
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |
||
|
21eb97e426 |
ANDROID: mm: fix invalid backport in speculative page fault path
An invalid condition was introduced when porting the original SPF patch, which would affect NUMA mode.

Fixes: 736ae8bde8da3 ("FROMLIST: mm: adding speculative page fault failure trace events")
Bug: 257443051
Change-Id: Ib20c625615b279dc467588933a1f598dc179861b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Git-commit: 1900436df5d947c2ee74bd78cde1366556c93b51
Git-repo: https://android.googlesource.com/kernel/common/
[quic_c_spathi@quicinc.com: resolve trivial merge conflicts]
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |