When there's only one device, use an on-stack buffer to store the device
pointers in order to avoid dynamic memory allocation from this hot path.
Change-Id: Ib54e0a5de7f0168b6f246a85306a146d1384b39a
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
fscrypt_get_encryption_info() has never actually been safe to call in a
context that needs GFP_NOFS, since it calls crypto_alloc_skcipher().
crypto_alloc_skcipher() isn't GFP_NOFS-safe, even if called under
memalloc_nofs_save(). This is because it may load kernel modules, and
also because it internally takes crypto_alg_sem. Other tasks can do
GFP_KERNEL allocations while holding crypto_alg_sem for write.
The use of fscrypt_init_mutex isn't GFP_NOFS-safe either.
So, stop pretending that fscrypt_get_encryption_info() is nofs-safe.
I.e., when it allocates memory, just use GFP_KERNEL instead of GFP_NOFS.
Link: https://lore.kernel.org/r/20200917041136.178600-10-ebiggers@kernel.org
Change-Id: I0530b5580741e77ad50e607225cf2d7d894afd27
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: Iff137b9643843d99367d5b97606c7cd687dc8709
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I2d2f5a6ef9c3f8929c6342d290f1e9010f97898b
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Those function are frequently used in various places and declaring them inline
can reduce overheads.
Change-Id: I4c0845686f758eddeae0bd1a89ea09d551fb332f
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
__d_lookup_rcu() is one of the hottest functions in the kernel on
certain loads, and it is complicated by filesystems that might want to
have their own name compare function.
We can improve code generation by moving the test of DCACHE_OP_COMPARE
outside the loop, which makes the loop itself much simpler, at the cost
of some code duplication. But both cases end up being simpler, and the
"native" direct case-sensitive compare particularly so.
Change-Id: Ib45cdc9d56c950f472cf2bea34330c18926ba925
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This reverts commit 2a888d52929ec2345b38b12872b4b14cb7a8afe8.
Change-Id: Id477e99d72436940e812c613f10c9fe7492a596b
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
commit 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de upstream.
Currently EROFS can map another compressed buffer for inplace
decompression, that was used to handle the cases that some pages of
compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed
data is arranged at the end of the decompressed buffer and it
explicitly uses memmove() to handle overlapping:
__________________________________________________________
|_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two
individual virtual buffers so the relative order is uncertain.
Previously, it was hardly observed since LZ4 only uses memmove() for
short overlapped literals and x86/arm64 memmove implementations seem to
completely cover it up and they don't have this issue. Juhyung reported
that EROFS data corruption can be found on a new Intel x86 processor.
After some analysis, it seems that recent x86 processors with the new
FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace
decompression for now. Later, as an useful improvement, we could try
to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park <qkrwngud825@gmail.com>
Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2Q@mail.gmail.com
Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace")
Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend")
Cc: stable <stable@vger.kernel.org> # 5.4+
Tested-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn>
Change-Id: Ib7a578283e33f0329ae2133223878ddf0738aba4
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[mkbestas: Adapt for android <=5.4 kernel which contains backports that
caused various conflicts]
Signed-off-by: Michael Bestas <mkbestas@lineageos.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I00388d639f7b62355550197c9033dab7a996b916
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
Change-Id: I7c910321b66c6877cbc5656b3b3e426557dc3314
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
MIUI-1428085
The discard thread can only process 8 requests at a time by default.
So fstrim need to handle the remaining discard requests while using
discard option.
Change-Id: I5eac38c34182607e8dceeb13273522b10ce02af8
Signed-off-by: liuchao12 <liuchao12@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Some architectures have implemented optimized copy_page for full page
copying, such as arm.
On my arm platform, use the copy_page helper for single page copying is
about 10 percent faster than memcpy.
Change-Id: Ie28de9ef5954d0c232b418f382471bc7c125563f
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The logcat spam from perfd when the kernel doesn't have everything it
wants destroys logcats and creates excessive logd overhead from the sheer
number of logs it prints.
Block perfd from writing to the logd socket so that it cannot write to
logcat at all. This way, all perfd attempts to write to logcat are
killed early on before they can incur overhead, which solves the excessive
logd overhead problem.
Change-Id: I9489c5eb8c041b998affd154d1beca82459b5fcc
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
These are all handled by the random driver, so instead of listing
each ioctl, we can use the generic compat_ptr_ioctl() helper.
Change-Id: I3271ba0b0b3d6ae904cc87f983d7d3f99935e767
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Many drivers have ioctl() handlers that are completely compatible between
32-bit and 64-bit architectures, except for the argument that is passed
down from user space and may have to be passed through compat_ptr()
in order to become a valid 64-bit pointer.
Using ".compat_ptr = compat_ptr_ioctl" in file operations should let
us simplify a lot of those drivers to avoid #ifdef checks, and convert
additional drivers that don't have proper compat handling yet.
On most architectures, the compat_ptr_ioctl() just passes all arguments
to the corresponding ->ioctl handler. The exception is arch/s390, where
compat_ptr() clears the top bit of a 32-bit pointer value, so user space
pointers to the second 2GB alias the first 2GB, as is the case for native
32-bit s390 user space.
The compat_ptr_ioctl() function must therefore be used only with
ioctl functions that either ignore the argument or pass a pointer to a
compatible data type.
If any ioctl command handled by fops->unlocked_ioctl passes a plain
integer instead of a pointer, or any of the passed data types is
incompatible between 32-bit and 64-bit architectures, a proper handler
is required instead of compat_ptr_ioctl.
Change-Id: Ic216898976f9369d6f8a316bf75eb9d8a00386fe
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The function close_fd_get_file is explicitly a variant of
__close_fd[1]. Now that __close_fd has been renamed close_fd, rename
close_fd_get_file to be consistent with close_fd.
When __alloc_fd, __close_fd and __fd_install were introduced the
double underscore indicated that the function took a struct
files_struct parameter. The function __close_fd_get_file never has so
the naming has always been inconsistent. This just cleans things up
so there are not any lingering mentions or references __close_fd left
in the code.
[1] 80cd795630d6 ("binder: fix use-after-free due to ksys_close() during fdget()")
Link: https://lkml.kernel.org/r/20201120231441.29911-23-ebiederm@xmission.com
Change-Id: I191f5fa6c99112d22e90566727433a9314d2bb00
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Change-Id: I9fe7c84d1b81fd8e2ef0ad89361c615c62693593
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Avoid passing tail pages to isolate_lru_page. In the case
process reclaim, since only lru pages are considered, this
is just to avoid a warning from isolate_lru_page.
Change-Id: I1f54dcec15f8c2d5ba16738657e79d2793d36c77
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
BSPSYS-10620
This change includes:
- fix reclaim count calculation issue
- only reclaim un-shared pages
- avoid burning CPU when nr_to_reclaim pages
- export reclaim_pte_range() API for rtmm
Change-Id: I04e77a2811f5ac4cae6d261c3eaaf88e26eee2ce
Signed-off-by: fangqinwu <fangqinwu@xiaomi.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Other userspace apps like AppCOmpaction would like to use this node,
so update permission.
Change-Id: Ied22bd6ad489bef4028cde943ac185d1354ab971
Signed-off-by: <laoyi@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
App compaction can not be aborted due to current design,
it causes app get stuck for a while when it resume to foreground
if app is compacting by kernel. So now we add check in kernel
while loop to abort the compaction if adj is zero.
Change-Id: I16f36a08d1a5efcb36827e84f5058abfa0a7dae0
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/2014822
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Allow selecting only tasks with oom_score_adj >= 0.
Change-Id: Iebbb487c711da98b8fcb367ba838b5fe0b260d4f
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
To turn the KMSG_DUMP_* reasons into a more ordered list, collapse
the redundant KMSG_DUMP_(RESTART|HALT|POWEROFF) reasons into
KMSG_DUMP_SHUTDOWN. The current users already don't meaningfully
distinguish between them, so there's no need to, as discussed here:
https://lore.kernel.org/lkml/CA+CK2bAPv5u1ih5y9t5FUnTyximtFCtDYXJCpuyjOyHNOkRdqw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/20200515184434.8470-2-keescook@chromium.org/
Change-Id: I23f5d8ed4b16c2b506b1727abfe2b40bb9dc6db0
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Now that pstore_register() can correctly pass max_reason to the kmesg
dump facility, introduce a new "max_reason" module parameter and
"max-reason" Device Tree field.
The "dump_oops" module parameter and "dump-oops" Device
Tree field are now considered deprecated, but are now automatically
converted to their corresponding max_reason values when present, though
the new max_reason setting has precedence.
For struct ramoops_platform_data, the "dump_oops" member is entirely
replaced by a new "max_reason" member, with the only existing user
updated in place.
Additionally remove the "reason" filter logic from ramoops_pstore_write(),
as that is not specifically needed anymore, though technically
this is a change in behavior for any ramoops users also setting the
printk.always_kmsg_dump boot param, which will cause ramoops to behave as
if max_reason was set to KMSG_DUMP_MAX.
Link: https://lore.kernel.org/lkml/20200515184434.8470-6-keescook@chromium.org/
Change-Id: I563849afcc4975f36b616c2976202df92cfe4d14
Co-developed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Add a new member to struct pstore_info for passing information about
kmesg dump maximum reason. This allows a finer control of what kmesg
dumps are sent to pstore storage backends.
Those backends that do not explicitly set this field (keeping it equal to
0), get the default behavior: store only Oopses and Panics, or everything
if the printk.always_kmsg_dump boot param is set.
Link: https://lore.kernel.org/lkml/20200515184434.8470-5-keescook@chromium.org/
Change-Id: Iedf5f428a0bdb38f1fe5effa6226292b7144bd95
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Refactor device tree size parsing routines to be able to pass a non-zero
default value for providing a configurable default for the coming
"max_reason" field. Also rename the helpers, since we're not always
parsing a size -- we're parsing a u32 and making sure it's not greater
than INT_MAX.
Link: https://lore.kernel.org/lkml/20200506211523.15077-4-keescook@chromium.org/
Link: https://lore.kernel.org/lkml/20200521205223.175957-1-tyhicks@linux.microsoft.com
Change-Id: If44ba60c12ecb5595d6c124ce9b44f886cd27018
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
A couple module parameters had 0600 permissions, but changing them would
have no impact on ramoops, so switch these to 0400 to reflect reality.
Link: https://lore.kernel.org/lkml/20200506211523.15077-7-keescook@chromium.org/
Change-Id: Iab22b003fa4f7c9cb9b5fd26d3d176af37c7c454
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Since the header is a fixed small maximum size, just use a stack variable
to avoid memory allocation in the write path.
Change-Id: I9f8a09f0f95058a40647e1cd55238c80f95d5e4b
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
If zero-length header happened in ramoops_write_kmsg_hdr(), that means
we will not be able to read back dmesg record later, since it will be
treated as invalid header in ramoops_pstore_read(). So we should not
execute the following code but return the error.
Change-Id: I86519a8fb83bacde9879b8149d80a34c37ace8a8
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Since only one single ramoops area allowed at a time, other probes
(like device tree) are meaningless, as it will waste CPU resources.
So let's check for being already initialized first.
Change-Id: I37823492ff578d8b4485d2b762a1bcbbba2ddc49
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Sometimes pstore_console_write() will write records with zero size
to persistent ram zone, which is unnecessary. It will only increase
resource consumption. Also adjust ramoops_write_kmsg_hdr() to have
same logic if memory allocation fails.
Change-Id: I66e5aa533774ce429feec697056e1eda5ef56b03
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEERFwmR4yFob14UDOYC8702P6YulgFAmciUfMZHHZlZ2FyZC5u
b3NzdW1Ab3JhY2xlLmNvbQAKCRALzvTY/pi6WLuWD/9Ygfcw1geLS8AUG78WElHy
l9L6JBbhAMCd7UTIn6ulSxNalJSMUg9P5xbtBVwz64qLCeJYEpJZ23UOj1fNK9Sm
+zBI3R3LURhRr/qwRVN8xJvKfGNODx8LsI61H8bcowKOjC1zdtd11vtZfu+KmumM
G26MeAabNszHOaFzoAoD/0VR6xxbHbvWv9oiyaCvyyB1iPU2wpGfO26dQ6YOO87K
914tsdY+I70tpAgCks3DyZaN+h0kXGww9k9YCG8awxHHgAvxsVQ5+cZtC0QO04bN
uEwOHapieFtoFGG/c6cUq9ARiVkWdgXV0+xeGzefDbygcfZjfC4b6a+iSf9j42oq
X1hM7mGxJMvzoiweXOF9XhfrmBKpnXcLgZqpFTk3Iy/EALstd9AnvY7SJCJLa+u8
Dp2NOZ/9tKfvrfiI/AA+uwjlFLeA1dbfvf5HcUtfKuJ7Y1Qq4q9QxRr9svk+7+ZG
nbrYxNfpxmrgffrm5W9gt/02M8v+ymC9fIFK82V6EEPbnikPE8SGJ2JAuirjiG1i
u1bvIzTCBtu6074tuqCjHKAyTUFNZMKS8xxT+prEolXguBiuEmlJodK7kkJKFurc
uAiLlYpMZNvFJox5E0vCLD8HInTHNR3yd8D/kNg733ukg6o2mKvLkR83lHpvhe67
0bxx5K76GV7p4I/wvIJ6zw==
=JpII
-----END PGP SIGNATURE-----
Merge tag 'v4.14.355-openela' of https://github.com/openela/kernel-lts
This is the 4.14.355 OpenELA-Extended LTS stable release
* tag 'v4.14.355-openela' of https://github.com/openela/kernel-lts: (79 commits)
LTS: Update to 4.14.355
Revert "parisc: Use irq_enter_rcu() to fix warning at kernel/context_tracking.c:367"
netns: restore ops before calling ops_exit_list
cx82310_eth: fix error return code in cx82310_bind()
rtmutex: Drop rt_mutex::wait_lock before scheduling
locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter()
drm/i915/fence: Mark debug_fence_free() with __maybe_unused
ACPI: processor: Fix memory leaks in error paths of processor_add()
ACPI: processor: Return an error if acpi_processor_get_info() fails in processor_add()
netns: add pre_exit method to struct pernet_operations
net: Add comment about pernet_operations methods and synchronization
nilfs2: protect references to superblock parameters exposed in sysfs
nilfs2: replace snprintf in show functions with sysfs_emit
nilfs2: use time64_t internally
tracing: Avoid possible softlockup in tracing_iter_reset()
ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
uprobes: Use kzalloc to allocate xol area
clocksource/drivers/imx-tpm: Fix next event not taking effect sometime
clocksource/drivers/imx-tpm: Fix return -ETIME when delta exceeds INT_MAX
VMCI: Fix use-after-free when removing resource in vmci_resource_remove()
...
Change-Id: I237799395c31c147d9e602b34bff999c65fe9ef0
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
[ Upstream commit 683408258917541bdb294cd717c210a04381931e ]
The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues. So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".
Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.
Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b90beafac05931cbfcb6b1bd4f67c1923f47040e)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 3bcd6c5bd483287f4a09d3d59a012d47677b6edc ]
Patch series "nilfs2 updates".
This patch (of 2):
coccicheck complains about the use of snprintf() in sysfs show functions.
Fix the coccicheck warning:
WARNING: use scnprintf or sprintf.
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Link: https://lkml.kernel.org/r/1635151862-11547-1-git-send-email-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/1634095759-4625-1-git-send-email-wangqing@vivo.com
Link: https://lkml.kernel.org/r/1635151862-11547-2-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Qing Wang <wangqing@vivo.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: 683408258917 ("nilfs2: protect references to superblock parameters exposed in sysfs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 51af9b589ab68a94485ee55811d1243c6bf665b4)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
The superblock and segment timestamps are used only internally in nilfs2
and can be read out using sysfs.
Since we are using the old 'get_seconds()' interface and store the data
as timestamps, the behavior differs slightly between 64-bit and 32-bit
kernels, the latter will show incorrect timestamps after 2038 in sysfs,
and presumably fail completely in 2106 as comparisons go wrong.
This changes nilfs2 to use time64_t with ktime_get_real_seconds() to
handle timestamps, making the behavior consistent and correct on both
32-bit and 64-bit machines.
The on-disk format already uses 64-bit timestamps, so nothing changes
there.
Link: http://lkml.kernel.org/r/20180122211050.1286441-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb04b91bc2c3a83e9e2ba9c5ce0f0124dd3ffef0)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 810ee43d9cd245d138a2733d87a24858a23f577d ]
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value
3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which being a
signed int, overflows producing a negative number.
3. The following loop that fills in the page contents checks that
the copied bytes is less than length, which being negative means
the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.
--
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Link: https://lore.kernel.org/r/20240811232821.13903-1-phillip@squashfs.org.uk
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Reported-by: syzbot+24ac24ff58dc5b0d26b9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/
V2: fix spelling mistake.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f82cb7f24032ed023fc67d26ea9bf322d8431a90)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit b8e947e9f64cac9df85a07672b658df5b2bcff07 ]
Some arch + compiler combinations report a potentially unused variable
location in btrfs_lookup_dentry(). This is a false alert as the variable
is passed by value and always valid or there's an error. The compilers
cannot probably reason about that although btrfs_inode_by_name() is in
the same file.
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5
m68k-gcc8/m68k-allmodconfig
mips-gcc8/mips-allmodconfig
powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc64_defconfig
Initialize it to zero, this should fix the warnings and won't change the
behaviour as btrfs_inode_by_name() accepts only a root or inode item
types, otherwise returns an error.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d09f1bf3d7f029558704f077097dbcd4dc5db8da)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit b8ccef048354074a548f108e51d0557d6adfd3a3 ]
In reada we BUG_ON(refs == 0), which could be unkind since we aren't
holding a lock on the extent leaf and thus could get a transient
incorrect answer. In walk_down_proc we also BUG_ON(refs == 0), which
could happen if we have extent tree corruption. Change that to return
-EUCLEAN. In do_walk_down() we catch this case and handle it correctly,
however we return -EIO, which -EUCLEAN is a more appropriate error code.
Finally in walk_up_proc we have the same BUG_ON(refs == 0), so convert
that to proper error handling. Also adjust the error message so we can
actually do something with the information.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c847b28a799733b04574060ab9d00f215970627d)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 1f9d44c0a12730a24f8bb75c5e1102207413cc9b ]
We have a couple of areas where we check to make sure the tree block is
locked before looking up or messing with references. This is old code
so it has this as BUG_ON(). Convert this to ASSERT() for developers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2df0e48615f438cdf93f1469ed9f289f71440d2d)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 6576dd6695f2afca3f4954029ac4a64f82ba60ab upstream.
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.
First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.
Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.
Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.
Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 40a2757de2c376ef8a08d9ee9c81e77f3c750adf)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit 5787fcaab9eb5930f5378d6a1dd03d916d146622 upstream.
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.
It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.
Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.
Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 35a9a7a7d94662146396199b0cfd95f9517cdd14)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit b18915248a15eae7d901262f108d6ff0ffb4ffc1 upstream.
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when
parsing the FUSE daemon's response to a zero-length getxattr/listxattr
request.
On 32-bit kernels, where ssize_t and outarg.size are the same size, this is
wrong: The min_t() will pass through any size values that are negative when
interpreted as signed.
fuse_listxattr() will then return this userspace-supplied negative value,
which callers will treat as an error value.
This kind of bug pattern can lead to fairly bad security bugs because of
how error codes are used in the Linux kernel. If a caller were to convert
the numeric error into an error pointer, like so:
struct foo *func(...) {
int len = fuse_getxattr(..., NULL, 0);
if (len < 0)
return ERR_PTR(len);
...
}
then it would end up returning this userspace-supplied negative value cast
to a pointer - but the caller of this function wouldn't recognize it as an
error pointer (IS_ERR_VALUE() only detects values in the narrow range in
which legitimate errno values are), and so it would just be treated as a
kernel pointer.
I think there is at least one theoretical codepath where this could happen,
but that path would involve virtio-fs with submounts plus some weird
SELinux configuration, so I think it's probably not a concern in practice.
Cc: stable@vger.kernel.org # v4.9
Fixes: 63401ccdb2ca ("fuse: limit xattr returned size")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 13d787bb4f21b6dbc8d8291bf179d36568893c25)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
commit c2efd13a2ed4f29bf9ef14ac2fbb7474084655f8 upstream.
UDF disk format supports in principle file sizes up to 1<<64-1. However
the file space (including holes) is described by a linked list of
extents, each of which can have at most 1GB. Thus the creation and
handling of extents gets unusably slow beyond certain point. Limit the
file size to 4TB to avoid locking up the kernel too easily.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a6211d4d3df3a5f90d8bcd11acd91baf7a3c2b5d)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
The locks_remove_posix() function in fcntl_setlk/fcntl_setlk64 is designed
to reliably remove locks when an fcntl/close race is detected. However, it
was passing in the wrong filelock owner, it looks like a mistake and
resulting in a failure to remove locks. More critically, if the lock
removal fails, it could lead to a uaf issue while traversing the locks.
This problem occurs only in the 4.19/5.4 stable version.
Fixes: a561145f3ae9 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: d30ff3304083 ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable@vger.kernel.org
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a1177ea83da87a87cc352aa41f24d61c08c80b5d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 261341a932d9244cbcd372a3659428c8723e5a49 ]
The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
type uint, and the s_extent_max_zeroout_kb can be freely modified via
the sysfs interface. When the block size is 1024, max_zeroout may
overflow, so declare it as unsigned int to avoid overflow.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240319113325.3110393-9-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2f64ae32831e5a2bfd0e404c6e63b399eb180a0a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>