mirror of
https://github.com/rd-stuffs/msm-4.14.git
synced 2025-02-20 11:45:48 +08:00
717 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
19dea83889 |
nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
commit fee5eaecca86afa544355569b831c1f90f334b85 upstream. Syzbot reports that in its stress test for resize ioctl, the log writing function nilfs_segctor_do_construct hits a WARN_ON in nilfs_segctor_truncate_segments(). It turned out that there is a problem with the current implementation of the resize ioctl, which changes the writable range on the device (the range of allocatable segments) at the end of the resize process. This order is necessary for file system expansion to avoid corrupting the superblock at trailing edge. However, in the case of a file system shrink, if log writes occur after truncating out-of-bounds trailing segments and before the resize is complete, segments may be allocated from the truncated space. The userspace resize tool was fine as it limits the range of allocatable segments before performing the resize, but it can run into this issue if the resize ioctl is called alone. Fix this issue by changing nilfs_sufile_resize() to update the range of allocatable segments immediately after successful truncation of segment space in case of file system shrink. Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
ca91ab4bc3 |
nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
commit 2f012f2baca140c488e43d27a374029c1e59098d upstream. A syzbot fault injection test reported that nilfs_btnode_create_block, a helper function that allocates a new node block for b-trees, causes a kernel BUG for disk images where the file system block size is smaller than the page size. This was due to unexpected flags on the newly allocated buffer head, and it turned out to be because the buffer flags were not cleared by nilfs_btnode_abort_change_key() after an error occurred during a b-tree update operation and the buffer was later reused in that state. Fix this issue by using nilfs_btnode_delete() to abandon the unused preallocated buffer in nilfs_btnode_abort_change_key(). Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b0a35a5c1f7e846d3b09@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
f31e18131e |
nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
commit 9b5a04ac3ad9898c4745cba46ea26de74ba56a8e upstream. During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). However, since nilfs_evict_inode() uses nilfs_root for some cleanup operations, it may cause use-after-free read if inodes are left in "garbage_list" and released by nilfs_dispose_list() at the end of nilfs_detach_log_writer(). Fix this issue by modifying nilfs_evict_inode() to only clear inode without additional metadata changes that use nilfs_root if the file system is degraded to read-only or the writer is detached. Link: https://lkml.kernel.org/r/20230509152956.8313-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+78d4495558999f55d1da@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000099e5ac05fb1c3b85@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
cfb0bb4fbd |
nilfs2: fix infinite loop in nilfs_mdt_get_block()
commit a6a491c048882e7e424d407d32cba0b52d9ef2bf upstream. If the disk image that nilfs2 mounts is corrupted and a virtual block address obtained by block lookup for a metadata file is invalid, nilfs_bmap_lookup_at_level() may return the same internal return code as -ENOENT, meaning the block does not exist in the metadata file. This duplication of return codes confuses nilfs_mdt_get_block(), causing it to read and create a metadata block indefinitely. In particular, if this happens to the inode metadata file, ifile, semaphore i_rwsem can be left held, causing task hangs in lock_mount. Fix this issue by making nilfs_bmap_lookup_at_level() treat virtual block address translation failures with -ENOENT as metadata corruption instead of returning the error code. Link: https://lkml.kernel.org/r/20230430193046.6769-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+221d75710bde87fa0e97@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=221d75710bde87fa0e97 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
bd89073fc7 |
nilfs2: do not write dirty data after degenerating to read-only
commit 28a65b49eb53e172d23567005465019658bfdb4d upstream. According to syzbot's report, mark_buffer_dirty() called from nilfs_segctor_do_construct() outputs a warning with some patterns after nilfs2 detects metadata corruption and degrades to read-only mode. After such read-only degeneration, page cache data may be cleared through nilfs_clear_dirty_page() which may also clear the uptodate flag for their buffer heads. However, even after the degeneration, log writes are still performed by unmount processing etc., which causes mark_buffer_dirty() to be called for buffer heads without the "uptodate" flag and causes the warning. Since any writes should not be done to a read-only file system in the first place, this fixes the warning in mark_buffer_dirty() by letting nilfs_segctor_do_construct() abort early if in read-only mode. This also changes the retry check of nilfs_segctor_write_out() to avoid unnecessary log write retries if it detects -EROFS that nilfs_segctor_do_construct() returned. Link: https://lkml.kernel.org/r/20230427011526.13457-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2af3bc9585be7f23f290@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=2af3bc9585be7f23f290 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
46d328363e |
nilfs2: initialize unused bytes in segment summary blocks
commit ef832747a82dfbc22a3702219cc716f449b24e4a upstream. Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field"). This is because the unused bytes at the end of each block in segment summaries are not initialized. So this fixes the issue by padding the unused bytes with null bytes. Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
d20dcec8f3 |
nilfs2: fix sysfs interface lifetime
commit 42560f9c92cc43dce75dbf06cc0d840dced39b12 upstream. The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings. Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes. Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free. In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()". Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem". This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above. Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
034cce77d5 |
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
commit 6be49d100c22ffea3287a4b19d7639d259888e33 upstream. The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected. At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info. However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification. This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race. Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
a94932381e |
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
commit 003587000276f81d0114b5ce773d80c119d8cb30 upstream. The ioctl helper function nilfs_ioctl_wrap_copy(), which exchanges a metadata array to/from user space, may copy uninitialized buffer regions to user space memory for read-only ioctl commands NILFS_IOCTL_GET_SUINFO and NILFS_IOCTL_GET_CPINFO. This can occur when the element size of the user space metadata given by the v_size member of the argument nilfs_argv structure is larger than the size of the metadata element (nilfs_suinfo structure or nilfs_cpinfo structure) on the file system side. KMSAN-enabled kernels detect this issue as follows: BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xc0/0x100 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0xc0/0x100 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:169 [inline] nilfs_ioctl_wrap_copy+0x6fa/0xc10 fs/nilfs2/ioctl.c:99 nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline] nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290 nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343 __do_compat_sys_ioctl fs/ioctl.c:968 [inline] __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910 __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: __alloc_pages+0x9f6/0xe90 mm/page_alloc.c:5572 alloc_pages+0xab0/0xd80 mm/mempolicy.c:2287 __get_free_pages+0x34/0xc0 mm/page_alloc.c:5599 nilfs_ioctl_wrap_copy+0x223/0xc10 fs/nilfs2/ioctl.c:74 nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline] nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290 nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343 __do_compat_sys_ioctl fs/ioctl.c:968 [inline] __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910 __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Bytes 16-127 of 3968 are uninitialized ... This eliminates the leak issue by initializing the page allocated as buffer using get_zeroed_page(). Link: https://lkml.kernel.org/r/20230307085548.6290-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+132fdd2f1e1805fdc591@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/000000000000a5bd2d05f63f04ae@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
2f7a1135b2 |
nilfs2: fix underflow in second superblock position calculations
commit 99b9402a36f0799f25feee4465bfa4b8dfa74b4d upstream. Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second superblock, underflows when the argument device size is less than 4096 bytes. Therefore, when using this macro, it is necessary to check in advance that the device size is not less than a lower limit, or at least that underflow does not occur. The current nilfs2 implementation lacks this check, causing out-of-bound block access when mounting devices smaller than 4096 bytes: I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 NILFS (loop0): unable to read secondary superblock (blocksize = 1024) In addition, when trying to resize the filesystem to a size below 4096 bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number of segments to nilfs_sufile_resize(), corrupting parameters such as the number of segments in superblocks. This causes excessive loop iterations in nilfs_sufile_resize() during a subsequent resize ioctl, causing semaphore ns_segctor_sem to block for a long time and hang the writer thread: INFO: task segctord:5067 blocked for more than 143 seconds. Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:segctord state:D stack:23456 pid:5067 ppid:2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x1409/0x43f0 kernel/sched/core.c:6606 schedule+0xc3/0x190 kernel/sched/core.c:6682 rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190 nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline] nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570 kthread+0x270/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> ... Call Trace: <TASK> folio_mark_accessed+0x51c/0xf00 mm/swap.c:515 __nilfs_get_page_block fs/nilfs2/page.c:42 [inline] nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61 nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121 nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176 nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251 nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline] nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline] nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777 nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422 nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline] nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301 ... This fixes these issues by inserting appropriate minimum device size checks or anti-underflow checks, depending on where the macro is used. Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+f0c4082ce5ebebdac63b@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
3c2a2ff67d |
nilfs2: fix general protection fault in nilfs_btree_insert()
commit 7633355e5c7f29c049a9048e461427d1d8ed3051 upstream. If nilfs2 reads a corrupted disk image and tries to reads a b-tree node block by calling __nilfs_btree_get_block() against an invalid virtual block address, it returns -ENOENT because conversion of the virtual block address to a disk block address fails. However, this return value is the same as the internal code that b-tree lookup routines return to indicate that the block being searched does not exist, so functions that operate on that b-tree may misbehave. When nilfs_btree_insert() receives this spurious 'not found' code from nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was successful and continues the insert operation using incomplete lookup path data, causing the following crash: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] ... RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline] RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline] RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238 Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89 ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02 ... Call Trace: <TASK> nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline] nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147 nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101 __block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991 __block_write_begin fs/buffer.c:2041 [inline] block_write_begin+0x93/0x1e0 fs/buffer.c:2102 nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261 generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772 __generic_file_write_iter+0x176/0x400 mm/filemap.c:3900 generic_file_write_iter+0xab/0x310 mm/filemap.c:3932 call_write_iter include/linux/fs.h:2186 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> This patch fixes the root cause of this problem by replacing the error code that __nilfs_btree_get_block() returns on block address conversion failure from -ENOENT to another internal code -EINVAL which means that the b-tree metadata is corrupted. By returning -EINVAL, it propagates without glitches, and for all relevant b-tree operations, functions in the upper bmap layer output an error message indicating corrupted b-tree metadata via nilfs_bmap_convert_error(), and code -EIO will be eventually returned as it should be. Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ede796cecd5296353515@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
a6f89b1004 |
nilfs2: fix shift-out-of-bounds/overflow in nilfs_sb2_bad_offset()
[ Upstream commit 610a2a3d7d8be3537458a378ec69396a76c385b6 ] Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount time". The first patch fixes a bug reported by syzbot, and the second one fixes the remaining bug of the same kind. Although they are triggered by the same super block data anomaly, I divided it into the above two because the details of the issues and how to fix it are different. Both are required to eliminate the shift-out-of-bounds issues at mount time. This patch (of 2): If the block size exponent information written in an on-disk superblock is corrupted, nilfs_sb2_bad_offset helper function can trigger shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn is set): shift exponent 38983 is too large for 64-bit type 'unsigned long long' Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322 nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline] nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523 init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577 nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047 nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317 ... In addition, since nilfs_sb2_bad_offset() performs multiplication without considering the upper bound, the computation may overflow if the disk layout parameters are not normal. This fixes these issues by inserting preliminary sanity checks for those parameters and by converting the comparison from one involving multiplication and left bit-shifting to one using division and right bit-shifting. Link: https://lkml.kernel.org/r/20221027044306.42774-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20221027044306.42774-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+e91619dd4c11c4960706@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
165c7a3b27 |
nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
commit f0a0ccda18d6fd826d7c7e7ad48a6ed61c20f8b4 upstream. Syzbot reported a null-ptr-deref bug: NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 1 PID: 3603 Comm: segctord Not tainted 6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0 fs/nilfs2/alloc.c:608 Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7 RSP: 0018:ffffc90003dff830 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010 RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158 R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0 Call Trace: <TASK> nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline] nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193 nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236 nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940 nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline] nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline] nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018 nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline] nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045 nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline] nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> ... If DAT metadata file is corrupted on disk, there is a case where req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during a b-tree operation that cascadingly updates ancestor nodes of the b-tree, because nilfs_dat_commit_alloc() for a lower level block can initialize the blocknr on the same DAT entry between nilfs_dat_prepare_end() and nilfs_dat_commit_end(). If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free() without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and causes the NULL pointer dereference above in nilfs_palloc_commit_free_entry() function, which leads to a crash. Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free(). This also calls nilfs_error() in that case to notify that there is a fatal flaw in the filesystem metadata and prevent further operations. Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
9a59a38cd3 |
nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
commit 512c5ca01a3610ab14ff6309db363de51f1c13a6 upstream. When extending segments, nilfs_sufile_alloc() is called to get an unassigned segment, then mark it as dirty to avoid accidentally allocating the same segment in the future. But for some special cases such as a corrupted image it can be unreliable. If such corruption of the dirty state of the segment occurs, nilfs2 may reallocate a segment that is in use and pick the same segment for writing twice at the same time. This will cause the problem reported by syzkaller: https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd24 This case started with segbuf1.segnum = 3, nextnum = 4 when constructed. It supposed segment 4 has already been allocated and marked as dirty. However the dirty state was corrupted and segment 4 usage was not dirty. For the first time nilfs_segctor_extend_segments() segment 4 was allocated again, which made segbuf2 and next segbuf3 had same segment 4. sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is added to both buffer lists of two segbuf. It makes the lists broken which causes NULL pointer dereference. Fix the problem by setting usage as dirty every time in nilfs_sufile_mark_dirty(), which is called during constructing current segment to be written out and before allocating next segment. [chenzhongjin@huawei.com: add lock protection per Ryusuke] Link: https://lkml.kernel.org/r/20221121091141.214703-1-chenzhongjin@huawei.com Link: https://lkml.kernel.org/r/20221118063304.140187-1-chenzhongjin@huawei.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: <syzbot+77e4f0...@syzkaller.appspotmail.com> Reported-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
39a3ed6827 |
nilfs2: fix use-after-free bug of ns_writer on remount
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream. If a nilfs2 filesystem is downgraded to read-only due to metadata corruption on disk and is remounted read/write, or if emergency read-only remount is performed, detaching a log writer and synchronizing the filesystem can be done at the same time. In these cases, use-after-free of the log writer (hereinafter nilfs->ns_writer) can happen as shown in the scenario below: Task1 Task2 -------------------------------- ------------------------------ nilfs_construct_segment nilfs_segctor_sync init_wait init_waitqueue_entry add_wait_queue schedule nilfs_remount (R/W remount case) nilfs_attach_log_writer nilfs_detach_log_writer nilfs_segctor_destroy kfree finish_wait _raw_spin_lock_irqsave __raw_spin_lock_irqsave do_raw_spin_lock debug_spin_lock_before <-- use-after-free While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1 waked up, Task1 accesses nilfs->ns_writer which is already freed. This scenario diagram is based on the Shigeru Yoshida's post [1]. This patch fixes the issue by not detaching nilfs->ns_writer on remount so that this UAF race doesn't happen. Along with this change, this patch also inserts a few necessary read-only checks with superblock instance where only the ns_writer pointer was used to check if the filesystem is read-only. Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1] Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com Reported-by: Shigeru Yoshida <syoshida@redhat.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
8b4506cff6 |
nilfs2: fix deadlock in nilfs_count_free_blocks()
commit 8ac932a4921a96ca52f61935dbba64ea87bbd5dc upstream. A semaphore deadlock can occur if nilfs_get_block() detects metadata corruption while locating data blocks and a superblock writeback occurs at the same time: task 1 task 2 ------ ------ * A file operation * nilfs_truncate() nilfs_get_block() down_read(rwsem A) <-- nilfs_bmap_lookup_contig() ... generic_shutdown_super() nilfs_put_super() * Prepare to write superblock * down_write(rwsem B) <-- nilfs_cleanup_super() * Detect b-tree corruption * nilfs_set_log_cursor() nilfs_bmap_convert_error() nilfs_count_free_blocks() __nilfs_error() down_read(rwsem A) <-- nilfs_set_error() down_write(rwsem B) <-- *** DEADLOCK *** Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem) and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata corruption, __nilfs_error() is called from nilfs_bmap_convert_error() inside the lock section. Since __nilfs_error() calls nilfs_set_error() unless the filesystem is read-only and nilfs_set_error() attempts to writelock rwsem B (= nilfs->ns_sem) to write back superblock exclusively, hierarchical lock acquisition occurs in the order rwsem A -> rwsem B. Now, if another task starts updating the superblock, it may writelock rwsem B during the lock sequence above, and can deadlock trying to readlock rwsem A in nilfs_count_free_blocks(). However, there is actually no need to take rwsem A in nilfs_count_free_blocks() because it, within the lock section, only reads a single integer data on a shared struct with nilfs_sufile_get_ncleansegs(). This has been the case after commit aa474a220180 ("nilfs2: add local variable to cache the number of clean segments"), that is, even before this bug was introduced. So, this resolves the deadlock problem by just not taking the semaphore in nilfs_count_free_blocks(). Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [2.6.38+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
4799a8c35e |
nilfs2: fix lockdep warnings during disk space reclamation
commit 6e211930f79aa45d422009a5f2e5467d2369ffe5 upstream. During disk space reclamation, nilfs2 still emits the following lockdep warning due to page/folio operations on shadowed page caches that nilfs2 uses to get a snapshot of DAT file in memory: WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670 ... RIP: 0010:__folio_mark_dirty+0x645/0x670 ... Call Trace: filemap_dirty_folio+0x74/0xd0 __set_page_dirty_nobuffers+0x85/0xb0 nilfs_copy_dirty_pages+0x288/0x510 [nilfs2] nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2] nilfs_clean_segments+0xee/0x5d0 [nilfs2] nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2] nilfs_ioctl+0xc52/0xfb0 [nilfs2] __x64_sys_ioctl+0x11d/0x170 This fixes the remaining warning by using inode objects to hold those page caches. Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
6c3da8c0a3 |
nilfs2: fix lockdep warnings in page operations for btree nodes
commit e897be17a441fa637cd166fc3de1445131e57692 upstream. Patch series "nilfs2 lockdep warning fixes". The first two are to resolve the lockdep warning issue, and the last one is the accompanying cleanup and low priority. Based on your comment, this series solves the issue by separating inode object as needed. Since I was worried about the impact of the object composition changes, I tested the series carefully not to cause regressions especially for delicate functions such like disk space reclamation and snapshots. This patch (of 3): If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at inode_to_wb() during page/folio operations for btree nodes: WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 Modules linked in: ... RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline] RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline] RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 ... Call Trace: __set_page_dirty include/linux/pagemap.h:834 [inline] mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145 nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline] nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009 nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline] nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036 nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline] nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 This is because nilfs2 uses two page caches for each inode and inode->i_mapping never points to one of them, the btree node cache. This causes inode_to_wb(inode) to refer to a different page cache than the caller page/folio operations such like __folio_start_writeback(), __folio_end_writeback(), or __folio_mark_dirty() acquired the lock. This patch resolves the issue by allocating and using an additional inode to hold the page cache of btree nodes. The inode is attached one-to-one to the traditional nilfs2 inode if it requires a block mapping with b-tree. This setup change is in memory only and does not affect the disk format. Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: David Hildenbrand <david@redhat.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
26b9b66610 |
nilfs2: fix use-after-free bug of struct nilfs_root
commit d325dc6eb763c10f591c239550b8c7e5466a5d09 upstream. If the beginning of the inode bitmap area is corrupted on disk, an inode with the same inode number as the root inode can be allocated and fail soon after. In this case, the subsequent call to nilfs_clear_inode() on that bogus root inode will wrongly decrement the reference counter of struct nilfs_root, and this will erroneously free struct nilfs_root, causing kernel oopses. This fixes the problem by changing nilfs_new_inode() to skip reserved inode numbers while repairing the inode bitmap. Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b8c672b0e22615c80fe0@syzkaller.appspotmail.com Reported-by: Khalid Masum <khalid.masum.92@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
ae16440c44 |
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
commit 723ac751208f6d6540191689cfbf6c77135a7a1b upstream. If creation or finalization of a checkpoint fails due to anomalies in the checkpoint metadata on disk, a kernel warning is generated. This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted with panic_on_warn, does not panic. A nilfs_error is appropriate here to handle the abnormal filesystem condition. This also replaces the detected error codes with an I/O error so that neither of the internal error codes is returned to callers. Link: https://lkml.kernel.org/r/20220929123330.19658-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+fbb3e0b24e8dae5a16ee@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
a832de79d8 |
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
commit d0d51a97063db4704a5ef6bc978dddab1636a306 upstream. If nilfs_attach_log_writer() failed to create a log writer thread, it frees a data structure of the log writer without any cleanup. After commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes a leak of struct nilfs_root, which started to leak an ifile metadata inode and a kobject on that struct. In addition, if the kernel is booted with panic_on_warn, the above ifile metadata inode leak will cause the following panic when the nilfs2 kernel module is removed: kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when called from nilfs_destroy_cachep+0x16/0x3a [nilfs2] WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140 ... RIP: 0010:kmem_cache_destroy+0x138/0x140 Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48 ... Call Trace: <TASK> ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2] nilfs_destroy_cachep+0x16/0x3a [nilfs2] exit_nilfs_fs+0xa/0x1b [nilfs2] __x64_sys_delete_module+0x1d9/0x3a0 ? __sanitizer_cov_trace_pc+0x1a/0x50 ? syscall_trace_enter.isra.19+0x119/0x190 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> Kernel panic - not syncing: panic_on_warn set ... This patch fixes these issues by calling nilfs_detach_log_writer() cleanup function if spawning the log writer thread fails. Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com Fixes: e912a5b66837 ("nilfs2: use root object to get ifile") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7381dc4ad60658ca4c05@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
1ce68de30b |
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
commit 21a87d88c2253350e115029f14fe2a10a7e6c856 upstream. If the i_mode field in inode of metadata files is corrupted on disk, it can cause the initialization of bmap structure, which should have been called from nilfs_read_inode_common(), not to be called. This causes a lockdep warning followed by a NULL pointer dereference at nilfs_bmap_lookup_at_level(). This patch fixes these issues by adding a missing sanitiy check for the i_mode field of metadata file's inode. Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2b32eb36c1a825b7a74c@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
e1692c9f82 |
nilfs2: fix incorrect masking of permission flags for symlinks
commit 5924e6ec1585445f251ea92713eb15beb732622a upstream. The permission flags of newly created symlinks are wrongly dropped on nilfs2 with the current umask value even though symlinks should have 777 (rwxrwxrwx) permissions: $ umask 0022 $ touch file && ln -s file symlink; ls -l file symlink -rw-r--r--. 1 root root 0 Jun 23 16:29 file lrwxr-xr-x. 1 root root 4 Jun 23 16:29 symlink -> file This fixes the bug by inserting a missing check that excludes symlinks. Link: https://lkml.kernel.org/r/1655974441-5612-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: Tommy Pettersson <ptp@lysator.liu.se> Reported-by: Ciprian Craciun <ciprian.craciun@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
f6686b79fe |
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
[ Upstream commit 17243e1c3072b8417a5ebfc53065d0a87af7ca77 ] kobject_put() should be used to cleanup the memory associated with the kobject instead of kobject_del(). See the section "Kobject removal" of "Documentation/core-api/kobject.rst". Link: https://lkml.kernel.org/r/20210629022556.3985106-7-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-7-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
c97b3f5b48 |
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
[ Upstream commit b2fe39c248f3fa4bbb2a20759b4fdd83504190f7 ] If kobject_init_and_add returns with error, kobject_put() is needed here to avoid memory leak, because kobject_init_and_add may return error without freeing the memory associated with the kobject it allocated. Link: https://lkml.kernel.org/r/20210629022556.3985106-6-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-6-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
ba2a1f0dd1 |
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
[ Upstream commit a3e181259ddd61fd378390977a1e4e2316853afa ] The kobject_put() should be used to cleanup the memory associated with the kobject instead of kobject_del. See the section "Kobject removal" of "Documentation/core-api/kobject.rst". Link: https://lkml.kernel.org/r/20210629022556.3985106-5-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-5-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
796cfc16ff |
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
[ Upstream commit 24f8cb1ed057c840728167dab33b32e44147c86f ] If kobject_init_and_add return with error, kobject_put() is needed here to avoid memory leak, because kobject_init_and_add may return error without freeing the memory associated with the kobject it allocated. Link: https://lkml.kernel.org/r/20210629022556.3985106-4-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-4-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
b5b374ef1f |
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
[ Upstream commit dbc6e7d44a514f231a64d9d5676e001b660b6448 ] In nilfs_##name##_attr_release, kobj->parent should not be referenced because it is a NULL pointer. The release() method of kobject is always called in kobject_put(kobj), in the implementation of kobject_put(), the kobj->parent will be assigned as NULL before call the release() method. So just use kobj to get the subgroups, which is more efficient and can fix a NULL pointer reference problem. Link: https://lkml.kernel.org/r/20210629022556.3985106-3-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
f6c813ceed |
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
[ Upstream commit 5f5dec07aca7067216ed4c1342e464e7307a9197 ] Patch series "nilfs2: fix incorrect usage of kobject". This patchset from Nanyong Sun fixes memory leak issues and a NULL pointer dereference issue caused by incorrect usage of kboject in nilfs2 sysfs implementation. This patch (of 6): Reported by syzkaller: BUG: memory leak unreferenced object 0xffff888100ca8988 (size 8): comm "syz-executor.1", pid 1930, jiffies 4294745569 (age 18.052s) hex dump (first 8 bytes): 6c 6f 6f 70 31 00 ff ff loop1... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x35/0x60 mm/util.c:83 kvasprintf_const+0xf1/0x180 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x150 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x7d0 fs/nilfs2/sysfs.c:986 init_nilfs+0xa21/0xea0 fs/nilfs2/the_nilfs.c:637 nilfs_fill_super fs/nilfs2/super.c:1046 [inline] nilfs_mount+0x7b4/0xe80 fs/nilfs2/super.c:1316 legacy_get_tree+0x105/0x210 fs/fs_context.c:592 vfs_get_tree+0x8e/0x2d0 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0xf9b/0x1990 fs/namespace.c:3235 do_mount+0xea/0x100 fs/namespace.c:3248 __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount fs/namespace.c:3433 [inline] __x64_sys_mount+0x14b/0x1f0 fs/namespace.c:3433 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae If kobject_init_and_add return with error, then the cleanup of kobject is needed because memory may be allocated in kobject_init_and_add without freeing. And the place of cleanup_dev_kobject should use kobject_put to free the memory associated with the kobject. As the section "Kobject removal" of "Documentation/core-api/kobject.rst" says, kobject_del() just makes the kobject "invisible", but it is not cleaned up. And no more cleanup will do after cleanup_dev_kobject, so kobject_put is needed here. Link: https://lkml.kernel.org/r/1625651306-10829-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1625651306-10829-2-git-send-email-konishi.ryusuke@gmail.com Reported-by: Hulk Robot <hulkci@huawei.com> Link: https://lkml.kernel.org/r/20210629022556.3985106-2-sunnanyong@huawei.com Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
49e3def271 |
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
[ Upstream commit 8fd0c1b0647a6bda4067ee0cd61e8395954b6f28 ] My local syzbot instance hit memory leak in nilfs2. The problem was in missing kobject_put() in nilfs_sysfs_delete_device_group(). kobject_del() does not call kobject_cleanup() for passed kobject and it leads to leaking duped kobject name if kobject_put() was not called. Fail log: BUG: memory leak unreferenced object 0xffff8880596171e0 (size 8): comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s) hex dump (first 8 bytes): 6c 6f 6f 70 30 00 00 00 loop0... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x53/0x80 mm/util.c:83 kvasprintf_const+0x108/0x190 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x160 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999 init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637 Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
5402e2d021 |
nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
commit 8301c719a2bd131436438e49130ee381d30933f5 upstream. After commit c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages"), the following null pointer dereference has been reported on nilfs2: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... RIP: 0010:percpu_counter_add_batch+0xa/0x60 ... Call Trace: __test_set_page_writeback+0x2d3/0x330 nilfs_segctor_do_construct+0x10d3/0x2110 [nilfs2] nilfs_segctor_construct+0x168/0x260 [nilfs2] nilfs_segctor_thread+0x127/0x3b0 [nilfs2] kthread+0xf8/0x130 ... This crash turned out to be caused by set_page_writeback() call for segment summary buffers at nilfs_segctor_prepare_write(). set_page_writeback() can call inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK) where inode_to_wb(inode) is NULL if the inode of underlying block device does not have an associated wb. This fixes the issue by calling inode_attach_wb() in advance to ensure to associate the bdev inode with its wb. Fixes: c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages") Reported-by: Walton Hoops <me@waltonhoops.com> Reported-by: Tomas Hlavaty <tom@logand.com> Reported-by: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Reported-by: Hideki EIRAKU <hdk1983@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/20200608.011819.1399059588922299158.konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
f440ea85d4 |
do d_instantiate/unlock_new_inode combinations safely
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream. For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
f94782668b |
nilfs2: fix race condition that causes file system corruption
commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream. There is a race condition between nilfs_dirty_inode() and nilfs_set_file_dirty(). When a file is opened, nilfs_dirty_inode() is called to update the access timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a separate transaction. __nilfs_mark_inode_dirty() caches the ifile buffer_head in the i_bh field of the inode info structure and marks it as dirty. After some data was written to the file in another transaction, the function nilfs_set_file_dirty() is called, which adds the inode to the ns_dirty_files list. Then the segment construction calls nilfs_segctor_collect_dirty_files(), which goes through the ns_dirty_files list and checks the i_bh field. If there is a cached buffer_head in i_bh it is not marked as dirty again. Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate transactions, it is possible that a segment construction that writes out the ifile occurs in-between the two. If this happens the inode is not on the ns_dirty_files list, but its ifile block is still marked as dirty and written out. In the next segment construction, the data for the file is written out and nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root is written into the i_bh block, which is not dirty, because it was written out in another segment construction. As a result the bmap update can be lost, which leads to file system corruption. Either the virtual block address points to an unallocated DAT block, or the DAT entry will be reused for something different. The error can remain undetected for a long time. A typical error message would be one of the "bad btree" errors or a warning that a DAT entry could not be found. This bug can be reproduced reliably by a simple benchmark that creates and overwrites millions of 4k files. Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
0f0d12728e |
Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags |
||
|
a0725ab0c7 |
Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ... |
||
|
397162ffa2 |
mm: remove nr_pages argument from pagevec_lookup{,_range}()
All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
d72dc8a25a |
mm: make pagevec_lookup() update index
Make pagevec_lookup() (and underlying find_get_pages()) update index to the next page where iteration should continue. Most callers want this and also pagevec_lookup_tag() already does this. Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
74d46992e0 |
block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
bc98a42c1f |
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com> |
||
|
9bd42183b9 |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ... |
||
|
2055da9738 |
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
So I've noticed a number of instances where it was not obvious from the code whether ->task_list was for a wait-queue head or a wait-queue entry. Furthermore, there's a number of wait-queue users where the lists are not for 'tasks' but other entities (poll tables, etc.), in which case the 'task_list' name is actively confusing. To clear this all up, name the wait-queue head and entry list structure fields unambiguously: struct wait_queue_head::task_list => ::head struct wait_queue_entry::task_list => ::entry For example, this code: rqw->wait.task_list.next != &wait->task_list ... is was pretty unclear (to me) what it's doing, while now it's written this way: rqw->wait.head.next != &wait->entry ... which makes it pretty clear that we are iterating a list until we see the head. Other examples are: list_for_each_entry_safe(pos, next, &x->task_list, task_list) { list_for_each_entry(wq, &fence->wait.task_list, task_list) { ... where it's unclear (to me) what we are iterating, and during review it's hard to tell whether it's trying to walk a wait-queue entry (which would be a bug), while now it's written as: list_for_each_entry_safe(pos, next, &x->head, entry) { list_for_each_entry(wq, &fence->wait.head, entry) { Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
ac6424b981 |
sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
4e4cbee93d |
block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
c1844d536d |
fs: Remove SB_I_DYNBDI flag
Now that all bdi structures filesystems use are properly refcounted, we can remove the SB_I_DYNBDI flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
0546c537b1 |
nilfs2: Convert to properly refcounting bdi
Similarly to set_bdev_super() NILFS2 just used block device reference to bdi. Convert it to properly getting bdi reference. The reference will get automatically dropped on superblock destruction. CC: linux-nilfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
174cd4b1e5 |
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
f3048d17d1 |
nilfs2: use i_blocksize()
Since i_blocksize() helper has been defined in fs.h, use it instead of open-coding. Link: http://lkml.kernel.org/r/1485184655-3895-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
1508ddc39a |
nilfs2: use nilfs_btree_node_size()
Use nilfs_btree_node_size() instead of open-coding. Link: http://lkml.kernel.org/r/1485184655-3895-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
93407472a2 |
fs: add i_blocksize()
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |