msm-4.14

mirror of https://github.com/rd-stuffs/msm-4.14.git synced 2025-02-20 11:45:48 +08:00

Author	SHA1	Message	Date
Alessio Balsini	a9490d32bc	FROMLIST: fuse: Introduce passthrough for mmap Enabling FUSE passthrough for mmap-ed operations not only affects performance, but has also been shown as mandatory for the correct functioning of FUSE passthrough. yanwu noticed [1] that a FUSE file with passthrough enabled may suffer data inconsistencies if the same file is also accessed with mmap. What happens is that read/write operations are directly applied to the lower file system (and its cache), while mmap-ed operations are affecting the FUSE cache. Extend the FUSE passthrough implementation to also handle memory-mapped FUSE file, to both fix the cache inconsistencies and extend the passthrough performance benefits to mmap-ed operations. [1] https://lore.kernel.org/lkml/20210119110654.11817-1-wu-yan@tcl.com/ Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-9-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: Ifad4698b0380f6e004c487940ac6907b9a9f2964 Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit bf5cb932f0e0dd028dcebf3a6c2fcfedb4fd8265) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:56 +05:30
Alessio Balsini	658dc211a1	FROMLIST: fuse: Use daemon creds in passthrough mode When using FUSE passthrough, read/write operations are directly forwarded to the lower file system file through VFS, but there is no guarantee that the process that is triggering the request has the right permissions to access the lower file system. This would cause the read/write access to fail. In passthrough file systems, where the FUSE daemon is responsible for the enforcement of the lower file system access policies, it often happens that the process dealing with the FUSE file system doesn't have access to the lower file system. Since the FUSE daemon is in charge of implementing the FUSE file operations, which in the case of read/write operations usually simply results in copying memory buffers from/to the lower file system, these operations are executed with the FUSE daemon privileges. This patch adds a reference to the FUSE daemon credentials, referenced at FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used to temporarily raise the user credentials when accessing lower file system files in passthrough. The process accessing the FUSE file with passthrough enabled temporarily receives the privileges of the FUSE daemon while performing read/write operations. Similar behavior is implemented in overlayfs. These privileges will be reverted as soon as the IO operation completes. This feature does not provide any higher security privileges to those processes accessing the FUSE file system with passthrough enabled. This is because the FUSE daemon is still responsible for enabling or not enabling the passthrough feature at file open time, and should enable the feature only after appropriate access policy checks. 
Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-8-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: Idb4f03a2ce7c536691e5eaf8fadadfcf002e1677 Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit 5f3d78268b21d381310574af1c16882c7680ceb1) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:56 +05:30
Alessio Balsini	5a0f00bb01	FROMLIST: fuse: Handle asynchronous read and write in passthrough Extend the passthrough feature by handling asynchronous IO both for read and write operations. When an AIO request is received, if the request targets a FUSE file with the passthrough functionality enabled, a new identical AIO request is created. The new request targets the lower file system file and gets assigned a special FUSE passthrough AIO completion callback. When the lower file system AIO request is completed, the FUSE passthrough AIO completion callback is executed and propagates the completion signal to the FUSE AIO request by triggering its completion callback as well. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-7-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I47671ef36211102da6dd3ee8b2f226d1e6cd9d5c Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit ea2b7a36847b14dee60d1f5dbf2aa26cf101c426) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:56 +05:30
Alessio Balsini	0fe77264fe	FROMLIST: fuse: Introduce synchronous read and write for passthrough All the read and write operations performed on fuse_files which have the passthrough feature enabled are forwarded to the associated lower file system file via VFS. Sending the request directly to the lower file system avoids the userspace round-trip that, because of possible context switches and additional operations, might reduce the overall performance, especially in those cases where caching doesn't help, for example in reads at random offsets. Verifying whether a fuse_file has a lower file system file associated with it can be done by checking the validity of its passthrough_filp pointer. This pointer is not NULL only if passthrough has been successfully enabled via the appropriate ioctl(). When a read/write operation is requested for a FUSE file with passthrough enabled, a new equivalent VFS request is generated, which instead targets the lower file system file. The VFS layer performs additional checks that allow for safer operations but may cause the operation to fail if the process accessing the FUSE file system does not have access to the lower file system. This change only implements synchronous requests in passthrough, returning an error in the case of asynchronous operations, yet covering the majority of the use cases. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-6-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: Ifbe6a247fe7338f87d078fde923f0252eeaeb668 Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit ea9685a7f9cb16b30e25386386274fdd30627c3a) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:56 +05:30
Alessio Balsini	58aebf8c21	FROMLIST: fuse: Passthrough initialization and release Implement the FUSE passthrough ioctl that associates the lower (passthrough) file system file with the fuse_file. The file descriptor passed to the ioctl by the FUSE daemon is used to access the relative file pointer, which will be copied to the fuse_file data structure to consolidate the link between the FUSE and lower file system. To enable the passthrough mode, user space triggers the FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives back an identifier that will be used at open/create response time in the fuse_open_out field to associate the FUSE file to the lower file system file. The value returned by the ioctl to user space can be: - > 0: success, the identifier can be used as part of an open/create reply. - <= 0: an error occurred. The value 0 represents an error to preserve backward compatibility: the fuse_open_out field that is used to pass the passthrough_fh back to the kernel uses the same bits that were previously used as struct padding, and is commonly zero-initialized (e.g., in the libfuse implementation). Removing 0 from the correct values fixes the ambiguity between the case in which 0 corresponds to a real passthrough_fh, a missing implementation of FUSE passthrough or a request for a normal FUSE file, simplifying the user space implementation. For the passthrough mode to be successfully activated, the lower file system file must implement both read_iter and write_iter file operations. This extra check prevents special pseudo files from being targeted for this feature. Passthrough comes with another limitation: no further file system stacking is allowed for those FUSE file systems using passthrough. 
Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-5-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I4d8290012302fb4547bce9bb261a03cc4f66b5aa Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit 28e86146c501a0f943fe9dc0ec0252df066a2b3d) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:55 +05:30
Alessio Balsini	1f04880bf8	FROMLIST: fuse: Definitions and ioctl for passthrough Expose the FUSE_PASSTHROUGH interface to user space and declare all the basic data structures and functions as the skeleton on top of which the FUSE passthrough functionality will be built. As part of this, introduce the new FUSE passthrough ioctl, which allows the FUSE daemon to specify a direct connection between a FUSE file and a lower file system file. Such ioctl requires user space to pass the file descriptor of one of its opened files through the fuse_passthrough_out data structure introduced in this patch. This structure includes extra fields for possible future extensions. Also, add the passthrough functions for the set-up and tear-down of the data structures and locks that will be used both when fuse_conns and fuse_files are created/deleted. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I732532581348adadda5b5048a9346c2b0868d539 Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit d02368d67989781a3484cd8dd71e0079d0d1bda2) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:55 +05:30
Alessio Balsini	248bfcfd3d	FROMLIST: fuse: 32-bit user space ioctl compat for fuse device With a 64-bit kernel build the FUSE device cannot handle ioctl requests coming from 32-bit user space. This is due to the ioctl command translation that generates different command identifiers that thus cannot be used for direct comparisons without proper manipulation. Explicitly extract type and number from the ioctl command to enable 32-bit user space compatibility on 64-bit kernel builds. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-3-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I595517c54d551be70e83c7fcb4b62397a3615004 Signed-off-by: Alessio Balsini <balsini@google.com> (cherry picked from commit af4048924e191bda0bb85b4bf127f22cf3c70fba) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:55 +05:30
Miklos Szeredi	f40391d56f	fuse: dir: Honor AT_STATX_DONT_SYNC The description of this flag says "Don't sync attributes with the server". In other words: always use the attributes cached in the kernel and don't send network or local messages to refresh the attributes. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit ff1b89f389a8e64d0a583ce0b0308696f4ab5860) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:54 +05:30
Seth Forshee	cb2cc6e61b	fuse: Restrict allow_other to the superblock's namespace or a descendant Unprivileged users are normally restricted from mounting with the allow_other option by system policy, but this could be bypassed for a mount done with user namespace root permissions. In such cases allow_other should not allow users outside the userns to access the mount as doing so would give the unprivileged user the ability to manipulate processes it would otherwise be unable to manipulate. Restrict allow_other to apply to users in the same userns used at mount or a descendant of that namespace. Also export current_in_userns() for use by fuse when built as a module. Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Dongsu Park <dongsu@kinvolk.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit 73f03c2b4b527346778c711c2734dbff3442b139) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:54 +05:30
Eric W. Biederman	96a80d0ebb	fuse: Support fuse filesystems outside of init_user_ns In order to support mounts from namespaces other than init_user_ns, fuse must translate uids and gids to/from the userns of the process servicing requests on /dev/fuse. This patch does that, with a couple of restrictions on the namespace: - The userns for the fuse connection is fixed to the namespace from which /dev/fuse is opened. - The namespace must be the same as s_user_ns. These restrictions simplify the implementation by avoiding the need to pass around userns references and by allowing fuse to rely on the checks in setattr_prepare for ownership changes. Either restriction could be relaxed in the future if needed. For cuse the userns used is the opener of /dev/cuse. Semantically the cuse support does not appear safe for unprivileged users. Practically the permissions on /dev/cuse only make it accessible to the global root user. If something slips through the cracks in a user namespace the only users who will be able to use the cuse device are those users mapped into the user namespace. Translation in the posix acl is updated to use the user namespace of the filesystem. Avoiding cases which might bypass this translation is handled in a following change. This change is strongly based on a similar change from Seth Forshee and Dongsu Park. Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Dongsu Park <dongsu@kinvolk.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit 8cb08329b0809453722bc12aa912be34355bcb66) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:54 +05:30
Eric W. Biederman	ae7c1521b5	fuse: Fail all requests with invalid uids or gids Upon a cursory examination the uid and gid of a fuse request are necessary for correct operation. Failing a fuse request where those values are not reliable seems a straightforward and reliable means of ensuring that fuse requests with bad data are not sent or processed. In most cases the vfs will avoid actions it suspects will cause an inode write back of an inode with an invalid uid or gid. But that does not map precisely to what fuse is doing, so test for this and solve this at the fuse level as well. Performing this work in fuse_req_init_context is cheap as the code is already performing the translation here and only needs to check the result of the translation to see if things are not representable in a form the fuse server can handle. [SzM] Don't zero the context for the nofail case, just keep using the munging version (makes sense for debugging and doesn't hurt). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit c9582eb0ff7d2b560be60eafab29183882cdc82b) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:54 +05:30
Eric W. Biederman	f28dd40557	fuse: Remove the buggy retranslation of pids in fuse_dev_do_read At the point of fuse_dev_do_read the user space process that initiated the action on the fuse filesystem may no longer exist. The process may have been killed or may have fired an asynchronous request and exited. If the initial process has exited, the code "pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns))" will either return a pid of 0, or in the unlikely event that the pid has been reallocated it can return practically any pid. Any pid is possible as the pid allocator allocates pid numbers in different pid namespaces independently. The only way to make translation in fuse_dev_do_read reliable is to call get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in fuse_dev_do_read. That reference counting in other contexts has been shown to bounce cache lines between processors and in general be slow. So that is not desirable. The only known user of running the fuse server in a different pid namespace from the filesystem does not care what the pids are in the fuse messages so removing this code should not matter. Getting the translation to a server running outside of the pid namespace of a container can still be achieved by playing setns games at mount time. It is also possible to add an option to pass a pid namespace into the fuse filesystem at mount time. Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit dbf107b2a7f36fa635b40e0b554514f599c75b33) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:54 +05:30
Szymon Lukasz	c1236a421d	fuse: Return -ECONNABORTED on /dev/fuse read after abort Currently the userspace has no way of knowing whether the fuse connection ended because of umount or abort via sysfs. It makes it hard for filesystems to free the mountpoint after abort without worrying about removing some new mount. The patch fixes it by returning different errors when userspace reads from /dev/fuse (-ENODEV for umount and -ECONNABORTED for abort). Add a new capability flag FUSE_ABORT_ERROR. If set and the connection is gone because of sysfs abort, reading from the device will return -ECONNABORTED. Signed-off-by: Szymon Lukasz <noh4hss@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit 3b7008b226f3de811d4ac34238e9cf670f7c9fe7) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:53 +05:30
Adam Manzanares	ee0c19576b	fs: Add aio iopriority support This is the per-I/O equivalent of the ioprio_set system call. When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field. This patch depends on block: add ioprio_check_cap function. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit d9a08a9e616beeccdbd0e7262b7225ffdfa49e92) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:53 +05:30
Adam Manzanares	e9bab1a88e	fs: Convert kiocb rw_hint from enum to u16 In order to avoid kiocb bloat for per command iopriority support, rw_hint is converted from enum to a u16. Added a guard around ki_hint assignment. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit fc28724d67c90ff48b976e0687caf79993160bed) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:53 +05:30
Christoph Hellwig	27275d0df5	fs: aio: Refactor read/write iocb setup Don't reference the kiocb structure from the common aio code, and move any use of it into helper specific to the read/write path. This is in preparation for aio_poll support that wants to use the space for different fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> (cherry picked from commit 54843f875f7a9f802bbb0d9895c3266b4a0b2f37) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:52 +05:30
Christoph Hellwig	976d474581	aio: Remove the extra get_file/fput pair in io_submit_one If we release the lockdep write protection token before calling into ->write_iter and thus never access the file pointer after an -EIOCBQUEUED return from ->write_iter or ->read_iter we don't need this extra reference. Signed-off-by: Christoph Hellwig <hch@lst.de> (cherry picked from commit 92ce4728563ad1fc42466f9bbecc1ac31d675894) Signed-off-by: alk3pInjection <webmaster@raspii.tech> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:39:52 +05:30
Adithya R	0a57010f20	Revert "proc: cmdline: Patch SafetyNet flags" * most roms do this in system/core and magisk hide does it as well, while in some roms breaks boot due to avb being enforced This reverts commit fb6704a8d07cf7ad9a46e4ecc4a9e94fbba8ca32. Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:34:33 +05:30
Yaroslav Furman	c8fe0232d8	xattr: Reduce the size of on-stack allocations 4kb on stack allocations are pretty unsafe. Signed-off-by: Yaroslav Furman <yaro330@gmail.com> Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:33:28 +05:30
celtare21	90b59717cb	fs: f2fs: Set DEF_CP_INTERVAL to 200secs Signed-off-by: celtare21 <celtare21@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:31:33 +05:30
Jebaitedneko	2724d35e11	fs: pstore: Always execute ramoops_pstore_write() Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:31:31 +05:30
Tyler Nijmeh	a10d8080fb	fs: Reduce cache pressure This lets us make better use of our available RAM. Signed-off-by: Tyler Nijmeh <tylernij@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:31:28 +05:30
Khazhismel Kumykov	371b06c93f	fs: ext4: cond_resched in work-heavy group loops Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:31:27 +05:30
Kees Cook	68766e5802	pstore/ram: Do not use stack VLA for parity workspace Instead of using a stack VLA for the parity workspace, preallocate a memory region. The preallocation is done to keep from needing to perform allocations during crash dump writing, etc. This also fixes a missed release of librs on free. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:21:37 +05:30
Kees Cook	81348fff97	ntfs: decompress: remove VLA usage In the quest to remove all stack VLA usage from the kernel[1], this moves the stack buffer used during decompression to be allocated externally. The existing "dest_max_index" used in the VLA is bounded by cb_max_page. cb_max_page is bounded by max_page, and max_page is bounded by nr_pages. Since nr_pages is used for the "pages" allocation, it can similarly be used for the "completed_pages" allocation and passed into the decompression function. The error paths are updated to free the new allocation. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/20180626172909.41453-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:21:35 +05:30
Dmitry Safonov	acf3c274d3	mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio As kernel expect to see only one of such mappings, any further operations on the VMA-copy may be unexpected by the kernel. Maybe it's being on the safe side, but there doesn't seem to be any expected use-case for this, so restrict it now. Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Alex Winkowski <dereference23@outlook.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:21:30 +05:30
Peter Collingbourne	6bcfe3cb84	mm: remove unnecessary wrapper function do_mmap_pgoff() The current split between do_mmap() and do_mmap_pgoff() was introduced in commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()") to support MPX. The wrapper function do_mmap_pgoff() always passed 0 as the value of the vm_flags argument to do_mmap(). However, MPX support has subsequently been removed from the kernel and there were no more direct callers of do_mmap(); all calls were going via do_mmap_pgoff(). Simplify the code by removing do_mmap_pgoff() and changing all callers to directly call do_mmap(), which now no longer takes a vm_flags argument. Signed-off-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Alex Winkowski <dereference23@outlook.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:21:13 +05:30
Linus Torvalds	b36a01d3c1	mm: use helper functions for allocating and freeing vm_area structs The vm_area_struct is one of the most fundamental memory management objects, but the management of it is entirely open-coded everywhere, ranging from allocation and freeing (using kmem_cache_[z]alloc and kmem_cache_free) to initializing all the fields. We want to unify this in order to end up having some unified initialization of the vmas, and the first step to this is to at least have basic allocation functions. Right now those functions are literally just wrappers around the kmem_cache_*() calls. This is a purely mechanical conversion: # new vma: kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc() # copy old vma kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old) # free vma kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma) to the point where the old vma passed in to the vm_area_dup() function isn't even used yet (because I've left all the old manual initialization alone). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Alex Winkowski <dereference23@outlook.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:21:02 +05:30
Arun KS	b0e4766681	mm: convert totalram_pages and totalhigh_pages variables to atomic totalram_pages and totalhigh_pages are made static inline functions. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed at length here: https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems better to remove the lock and convert variables to atomic, with preventing potential store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Alex Winkowski <dereference23@outlook.com> Change-Id: Iad12311402dcdebc7804fc3e4866b67c25eb4d00 Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:20:57 +05:30
trautamaki	05df20b719	pstore: Dump ramoops even when kernel doesn't crash Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:20:48 +05:30
Sultan Alsawaf	8a9268e00b	mm: Eliminate d_path_outlen() and further speed up show_map_vma() d_path_outlen() isn't needed because we know that d_path() always populates the given buffer backwards starting from the last byte; with this, we can easily calculate the length of the generated string by using the returned pointer from d_path() and the size of the buffer given to d_path(). This eliminates the need for d_path_outlen() and removes the bizarre strlen() usage, which makes things simpler and faster. We also now avoid a memmove() when d_path() completely uses up its provided buffer. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:19:08 +05:30
Jens Axboe	4e5a138e49	buffer: eliminate the need to call free_more_memory() in __getblk_slow() Since the previous commit removed any case where grow_buffers() would return failure due to memory allocations, we can safely remove the case where we have to call free_more_memory() in this function. Since this is also the last user of free_more_memory(), kill it off completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:19:08 +05:30
Jens Axboe	efb8d821e6	buffer: grow_dev_page() should use __GFP_NOFAIL for all cases We currently use it for find_or_create_page(), which means that it cannot fail. Ensure we also pass in 'retry == true' to alloc_page_buffers(), which also ensure that it cannot fail. After this, there are no failure cases in grow_dev_page() that occur because of a failed memory allocation. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:19:07 +05:30
Jens Axboe	268070cc02	buffer: have alloc_page_buffers() use __GFP_NOFAIL Instead of adding weird retry logic in that function, utilize __GFP_NOFAIL to ensure that the vm takes care of handling any potential retries appropriately. This means we don't have to call free_more_memory() from here. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:19:07 +05:30
Sultan Alsawaf	27f9509f00	mm: Micro-optimize PID map reads for arm64 while retaining output format Android and various applications in Android need to read PID map data in order to work. Some processes can contain over 10,000 mappings, which results in lots of time wasted on simply generating strings. This wasted time adds up, especially in the case of Unity-based games, which utilize the Boehm garbage collector. A game's main process typically has well over 10,000 mappings due to the loaded textures, and the Boehm GC reads PID maps several times a second. This results in over 100,000 map entries being printed out per second, so micro-optimization here is important. Before this commit, show_vma_header_prefix() would typically take around 1000 ns to run on a Snapdragon 855; now it only takes about 50 ns to run, which is a 20x improvement. The primary micro-optimizations here assume that there are no more than 40 bits in the virtual address space, hence the CONFIG_ARM64_VA_BITS check. Arm64 uses a virtual address size of 39 bits, so this perfectly covers it. This also removes padding used to beautify PID map output to further speed up reads and reduce the amount of bytes printed, and optimizes the dentry path retrieval for file-backed mappings. Note, however, that the trailing space at the end of the line for non-file-backed mappings cannot be omitted, as it breaks some PID map parsers. This still retains insignificant leading zeros from printed hex values to maintain the current output format. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:18:46 +05:30
Al Viro	77d312d3e3	fs: eventpoll: Clean the failure exits up a bit commit 52c479697c9b73f628140dcdfcd39ea302d05482 upstream. Bug: 147802478 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: If479181d881c59c6d136299ba97e2cc850aa325c Signed-off-by: Forenche <prahul2003@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:17:28 +05:30
Marc Zyngier	dd521b94e4	epoll: Keep a reference on files added to the check list commit a9ed4a6560b8562b7e2e2bed9527e88001f7b682 upstream. When adding a new fd to an epoll, and this new fd is itself an epoll fd, we recursively scan the fds attached to it to detect cycles, and add non-epoll files to a "check list" that gets subsequently parsed. However, this check list isn't completely safe when deletions can happen concurrently. To sidestep the issue, make sure that a struct file placed on the check list sees its f_count increased, ensuring that a concurrent deletion won't result in the file disappearing from under our feet. Bug: 147802478 Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Iee8a2e6770ccdf96898058a0a7d953ace080dae7 (cherry picked from commit c5dda0b69cf92399ce410cbb8cfdaf382e51dd6b) Signed-off-by: Forenche <prahul2003@gmail.com> commit 3dbf6600a559834da65e17c8cac075493bfd9fe7 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Wed Sep 2 11:30:48 2020 -0400 fix regression in "epoll: Keep a reference on files added to the check list" [ Upstream commit 77f4689de17c0887775bb77896f4cc11a39bf848 ] epoll_loop_check_proc() can run into a file already committed to destruction; we can't grab a reference on those and don't need to add them to the set for reverse path check anyway. Bug: 147802478 Tested-by: Marc Zyngier <maz@kernel.org> Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> Change-Id: I541299a6325a6e9765add9e920cfa0203de9f4a0 Signed-off-by: Forenche <prahul2003@gmail.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:17:27 +05:30
laoyi	b4107647bb	fs: proc: Update perms of process_reclaim node Other userspace apps like AppCompaction would like to use this node, so update its permissions. Change-Id: Ied22bd6ad489bef4028cde943ac185d1354ab971 Signed-off-by: <laoyi@codeaurora.org> Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:14:02 +05:30
John Dias	9b1cd81198	fs: Improve eventpoll logging to stop indicting timerfd timerfd doesn't create any wakelocks; eventpoll can, and is creating the wakelocks we see called "[timerfd]". eventpoll creates two kinds of wakelocks: a single top-level lock associated with the eventpoll fd itself, and one additional lock for each fd it is polling that needs such a lock (e.g. those using EPOLLWAKEUP). Current code names the per-fd locks using the undecorated names of the fds' associated files (hence "[timerfd]"), and is naming the top-level lock after the PID of the caller and the name of the file behind the first fd for which a per-fd lock is created. To make things clearer, the top-level lock is now named using the caller PID and an "epollfd" designation, while the per-fd locks are also named with the caller's PID (to associate them with the top-level lock) and their respective fds' file names. Port of fix already applied to previous 2 generations. Note that this set of changes does not fully solve the problem of eventpoll/timerfd wakelock attribution to the original process, since most activity is relayed through system_server, but it does at least ensure that different eventpoll wakelocks - and their stats - are properly disambiguated. Test: Ran on device and observed new wakelock naming in /d/wakeup_sources and (file naming in) lsof output. Bug: 116363986 Change-Id: I34bada5ddab04cf3830762c745f46bfcd1549cb8 Signed-off-by: John Dias <joaodias@google.com> Signed-off-by: Kelly Rossmoyer <krossmo@google.com> Signed-off-by: Miguel de Dios <migueldedios@google.com> Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:14:02 +05:30
Adhitya Mohan	f591fea7a8	fs/fuse: shortcircuit: Make it compile Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:12:14 +05:30
LibXZR	5bdd63eaa2	fs/fuse: shortcircuit: Disable logging Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:12:14 +05:30
LibXZR	0b49172223	fs: fuse: Implement fuse short circuit * This significantly improves i/o performance under /sdcard * From OnePlus 8T Oxygen OS 11.0.8.11.KB05AA and OnePlus 8 Oxygen OS 11.0.5.5.IN21AA and OnePlus 8 Pro Oxygen OS 11.0.5.5.IN11AA RealJohnGalt: make proper Kconfig, add back dependencies from OnePlus source onto our CAF tree. Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:12:14 +05:30
Theodore Ts'o	d88efacc59	ext4: Improve smp scalability for inode generation ->s_next_generation is protected by s_next_gen_lock but its usage pattern is very primitive. We don't actually need sequentially increasing new generation numbers, so let's use prandom_u32() instead. Reported-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: kdrag0n <dragon@khronodragon.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:12:13 +05:30
Dave Kleikamp	715f4c034d	AIO: Don't plug the I/O queue in do_io_submit() Asynchronous I/O latency to a solid-state disk greatly increased between the 2.6.32 and 3.0 kernels. By removing the plug from do_io_submit(), we observed a 34% improvement in the I/O latency. Unfortunately, at this level, we don't know if the request is to a rotating disk or not. Change-Id: I7101df956473ed9fd5dcff18e473dd93b688a5c1 Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: linux-aio@kvack.org Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:12:04 +05:30
Park Ju Hyung	2d59edb069	kernfs: Use kmem_cache pool for struct kernfs_open_node/file These get allocated and freed millions of times on this kernel tree. Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:11:53 +05:30
Park Ju Hyung	be788d1d3d	sdcardfs: Use kmem_cache pool for struct sdcardfs_file_info These get allocated and freed millions of times on this kernel tree. Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:11:53 +05:30
Sayali Lokhande	b7d8d03984	f2fs: Avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Change-Id: I048bbf42c0b11108e2444f4d9df5d58e7a779c3c Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Git-commit: 34c061ad85a2f5d5e9e3b045d72f3b211db6e282 Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/ Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Signed-off-by: celtare21 <celtare21@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:11:16 +05:30
Adithya R	13520f3f70	fs/ext4: inode: Remove an unused variable Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:10:29 +05:30
Sultan Alsawaf	c8ce6f4d68	mm: Perform PID map reads on the little CPU cluster PID map reads for processes with thousands of mappings can be done extensively by certain Android apps, burning through CPU time on higher-performance CPUs even though reading PID maps is never a performance-critical task. We can relieve the load on the important CPUs by moving PID map reads to little CPUs via sched_migrate_to_cpumask_*(). Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:10:10 +05:30
Alexey Dobriyan	f14405e529	proc: reject "." and ".." as filenames Various subsystems can create files and directories in /proc with names directly controlled by userspace. Which means "/", "." and ".." are no-no. "/" split is already taken care of, do the other 2 prohibited names. Link: http://lkml.kernel.org/r/20180310001223.GB12443@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Panchajanya1999 <rsk52959@gmail.com> Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> (cherry picked from commit 175721be2e505649ee481a9cd2bf14b228d12f2b) Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com>	2022-04-02 13:09:15 +05:30
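
The name check described in commit f14405e529 ("proc: reject "." and ".." as filenames") can be illustrated with a small userspace sketch. This is not the kernel's code: `proc_name_allowed()` is a hypothetical helper, and rejecting any name that contains "/" is a simplifying assumption here (the commit message says the "/" split is already handled elsewhere).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): returns true if `name`
 * would be acceptable as a single /proc entry name under the rules the
 * commit message lists.
 */
static bool proc_name_allowed(const char *name)
{
    /* Assumption: treat an embedded '/' as an outright rejection. */
    if (strchr(name, '/') != NULL)
        return false;

    /* "." and ".." would alias the directory itself or its parent. */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;

    return true;
}
```

Only the exact strings "." and ".." are refused; names that merely begin with a dot, such as ".hidden" or "...", still pass this check.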

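Commit 8a9268e00b above relies on d_path() filling its buffer backwards from the last byte, so the string length follows from pointer arithmetic alone. A minimal userspace sketch of that idea, assuming a stand-in function: `mock_d_path()` and `path_len_from_ptr()` are hypothetical names for illustration, not kernel APIs, and the mock only imitates the backwards-fill behaviour the commit message describes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for d_path(): copies `path` and its NUL
 * terminator to the END of buf and returns a pointer to the first
 * character, or NULL if the string does not fit.
 */
static char *mock_d_path(const char *path, char *buf, size_t buflen)
{
    size_t need = strlen(path) + 1;   /* characters plus terminator */
    char *start;

    if (need > buflen)
        return NULL;

    start = buf + buflen - need;
    memcpy(start, path, need);
    return start;
}

/*
 * Because the terminator always sits in the last byte of buf, the
 * length is the distance from `start` to that byte. No strlen() pass
 * over the generated string is needed.
 */
static size_t path_len_from_ptr(const char *start, const char *buf,
                                size_t buflen)
{
    return (size_t)((buf + buflen - 1) - start);
}
```

When the string fills the buffer completely, the returned pointer equals `buf` itself, which corresponds to the case where the commit message says a memmove() can be avoided.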
54462 Commits