Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With the ->set_acl inode operation we can implement the Posix ACL
xattr handlers in generic code instead of duplicating them all
over the tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Factor out the code to get an ACL either from the inode or disk from
check_acl, so that it can be used elsewhere later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
Also fix befs_iget return value if iget_locked fails.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The slow path in __fget_light() can use __fget() to avoid the
code duplication. Saves 232 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Apart from FMODE_PATH check fget_light() and fget_raw_light() are
identical, shift the code into the new helper, __fget_light(fd, mask).
Saves 208 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Apart from FMODE_PATH check fget() and fget_raw() are identical,
shift the code into the new simple helper, __fget(fd, mask). Saves
160 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
put_files_struct() and close_files() do rcu_read_lock() to make
rcu_dereference_check_fdtable() happy.
This looks a bit ugly, files_fdtable() just reads the pointer,
we can simply use rcu_dereference_raw() to avoid the warning.
The patch also changes close_files() to return fdt, this avoids
another rcu_read_lock()/files_fdtable() in put_files_struct().
I think close_files() needs more cleanups:
- we do not need xchg() exactly because we are the last
user of this files_struct
- "if (file)" should be turned into WARN_ON(!file)
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
rcu_dereference_check_fdtable() looks very wrong,
1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix
RCU-lockdep false positive due to /proc" but it doesn't really
fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
hit the same race with get_files_struct().
And otoh rcu_my_thread_group_empty() can suppress the correct
warning if the caller is the CLONE_FILES (without CLONE_THREAD)
task.
2. files->count == 1 check is not really right too. Even if this
files_struct is not shared it is not safe to access it lockless
unless the caller is the owner.
Otoh, this check is sub-optimal. files->count == 0 always means
it is safe to use it lockless even if files != current->files,
but put_files_struct() has to take rcu_read_lock(). See the next
patch.
This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.
fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kill pointless method instances and don't bother with ->owner - it's
ignored for procfs files anyway, make use of remove_proc_subtree() for
removal, get rid of cell->proc_dir.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
pass owner explicitly to __register_nls(), make register_nls() a macro passing
THIS_MODULE as the owner argument to __register_nls().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't assume that ->dest_count won't change between copy_from_user()
and memdup_user()
* use fdget instead of fget
* don't bother comparing superblocks when we'd already compared vfsmounts
* get rid of excessive goto
* use file_inode() instead of open-coding the sucker
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pass on-disk superblock to qnx4_chkroot() explicitly
* don't leave stale (and unused) pointers in qnx4_super_block
* free stuff in ->kill_sb(); ->put_super() becomes empty and dies
* simplify failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If ecryptfs_readlink_lower() fails, buf remains an uninitialized
pointer and passing it nd_set_link() won't do anything good.
Fixed by switching ecryptfs_readlink_lower() to saner API - make it
return buf or ERR_PTR(...) and update callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A struct svc_fh is 320 bytes on x86_64, it'd be better not to have these
on the stack.
kmalloc'ing them probably isn't ideal either, but this is the simplest
thing to do. If it turns out to be a problem in the readdir case then
we could add a svc_fh to nfsd4_readdir and pass that in.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Some time ago, mkfs.xfs started picking the storage physical
sector size as the default filesystem "sector size" in order
to avoid RMW costs incurred by doing IOs at logical sector
size alignments.
However, this means that for a filesystem made with i.e.
a 4k sector size on an "advanced format" 4k/512 disk,
512-byte direct IOs are no longer allowed. This means
that XFS has essentially turned this AF drive into a hard
4K device, from the filesystem on up.
XFS's mkfs-specified "sector size" is really just controlling
the minimum size & alignment of filesystem metadata.
There is no real need to tightly couple XFS's minimal
metadata size to the minimum allowed direct IO size;
XFS can continue doing metadata in optimal sizes, but
still allow smaller DIOs for apps which issue them,
for whatever reason.
This patch adds a new field to the xfs_buftarg, so that
we now track 2 sizes:
1) The metadata sector size, which is the minimum unit and
alignment of IO which will be performed by metadata operations.
2) The device logical sector size
The first is used internally by the file system for metadata
alignment and IOs.
The second is used for the minimum allowed direct IO alignment.
This has passed xfstests on filesystems made with 4k sectors,
including when run under the patch I sent to ignore
XFS_IOC_DIOINFO, and issue 512 DIOs anyway. I also directly
tested end of block behavior on preallocated, sparse, and
existing files when we do a 512 IO into a 4k file on a
4k-sector filesystem, to be sure there were no unexpected
behaviors.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
In preparation for adding new members to the structure,
give these old ones more descriptive names:
bt_ssize -> bt_meta_sectorsize
bt_smask -> bt_meta_sectormask
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Clean up the xfs_buftarg structure a bit:
- remove bt_bsize which is never used
- replace bt_sshift with bt_ssize; we only ever shift it back
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Getting an inode by romfs_iget may lead to an err in fill_super, and the
err value should be return.
And it should return -ENOMEM instead while d_make_root fails, fix it too.
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So far, POSIX ACLs are using a canonical representation that keeps all ACL
entries in a strict order; the ACL_USER and ACL_GROUP entries for specific
users and groups are ordered by user and group identifier, respectively.
The user-space code provides ACL entries in this order; the kernel
verifies that the ACL entry order is correct in posix_acl_valid().
User namespaces allow to arbitrary map user and group identifiers which
can cause the ACL_USER and ACL_GROUP entry order to differ between user
space and the kernel; posix_acl_valid() would then fail.
Work around this by allowing ACL_USER and ACL_GROUP entries to be in any
order in the kernel. The effect is only minor: file permission checks
will pick the first matching ACL_USER entry, and check all matching
ACL_GROUP entries.
(The libacl user-space library and getfacl / setfacl tools will not create
ACLs with duplicate user or group idenfifiers; they will handle ACLs with
entries in an arbitrary order correctly.)
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
use do{}while - more efficient and it squishes a coccinelle warning
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michel Lespinasse <walken@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently both setup_new_exec() and flush_old_exec() issue a call to
arch_pick_mmap_layout(). As setup_new_exec() and flush_old_exec() are
always called pairwise arch_pick_mmap_layout() is called twice.
This patch removes one call from setup_new_exec() to have it only called
once.
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Pat Erley <pat-lkml@erley.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userspace process doesn't want the PF_NO_SETAFFINITY, but its parent may be
a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread
can do kernel_thread() to create the child.
Clearing this flag in usersapce child to enable its migrating capability.
Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the remaining next_thread (ab)users to use while_each_thread().
The last user which should be changed is next_tid(), but we can't do this
now.
__exit_signal() and complete_signal() are fine, they actually need
next_thread() logic.
This patch (of 3):
do_task_stat() can use while_each_thread(), no changes in
the compiled code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can kill either task->did_exec or PF_FORKNOEXEC, they are mutually
exclusive. The patch kills ->did_exec because it has a single user.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both success/failure paths cleanup bprm->file, we can move this
code into free_bprm() to simlify and cleanup this logic.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs_struct->in_exec == T means that this ->fs is used by a single process
(thread group), and one of the treads does do_execve().
To avoid the mt-exec races this code has the following complications:
1. check_unsafe_exec() returns -EBUSY if ->in_exec was
already set by another thread.
2. do_execve_common() records "clear_in_exec" to ensure
that the error path can only clear ->in_exec if it was
set by current.
However, after 9b1bf12d5d51 "signals: move cred_guard_mutex from
task_struct to signal_struct" we do not need these complications:
1. We can't race with our sub-thread, this is called under
per-process ->cred_guard_mutex. And we can't race with
another CLONE_FS task, we already checked that this fs
is not shared.
We can remove the dead -EAGAIN logic.
2. "out_unmark:" in do_execve_common() is either called
under ->cred_guard_mutex, or after de_thread() which
kills other threads, so we can't race with sub-thread
which could set ->in_exec. And if ->fs is shared with
another process ->in_exec should be false anyway.
We can clear in_exec unconditionally.
This also means that check_unsafe_exec() can be void.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
next_thread() should be avoided, change check_unsafe_exec() to use
while_each_thread().
Nobody except signal->curr_target actually needs next_thread-like code,
and we need to change (fix) this interface. This particular code is fine,
p == current. But in general the code like this can loop forever if p
exits and next_thread(t) can't reach the unhashed thread.
This also saves 32 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PROC_FS is a bool, so this code is either present or absent. It will
never be modular, so using module_init as an alias for __initcall is
rather misleading.
Fix this up now, so that we can relocate module_init from init.h into
module.h in the future. If we don't do this, we'd have to add module.h to
obviously non-modular code, and that would be ugly at best.
Note that direct use of __initcall is discouraged, vs. one of the
priority categorized subgroups. As __initcall gets mapped onto
device_initcall, our use of fs_initcall (which makes sense for fs code)
will thus change these registrations from level 6-device to level 5-fs
(i.e. slightly earlier). However no observable impact of that small
difference has been observed during testing, or is expected.
Also note that this change uncovers a missing semicolon bug in the
registration of vmcore_init as an initcall.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trivial cleanup to eliminate a goto.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Distribution kernels might want to build in support for /proc/device-tree
for kernels that might end up running on hardware that doesn't support
openfirmware. This results in an empty /proc/device-tree existing.
Remove it if the OFW root node doesn't exist.
This situation actually confuses grub2, resulting in install failures.
grub2 sees the /proc/device-tree and picks the wrong install target cf.
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/annotate/4300/util/grub-install.in#L311
grub should be more robust, but still, leaving an empty proc dir seems
pointless.
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=818378.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use existing accessors proc_set_user() and proc_set_size() to set
attributes. Just a cleanup.
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. proc_task_readdir()->first_tid() path truncates f_pos to int, this
is wrong even on 64bit.
We could check that f_pos < PID_MAX or even INT_MAX in
proc_task_readdir(), but this patch simply checks the potential
overflow in first_tid(), this check is nop on 64bit. We do not care if
it was negative and the new unsigned value is huge, all we need to
ensure is that we never wrongly return !NULL.
2. Remove the 2nd "nr != 0" check before get_nr_threads(),
nr_threads == 0 is not distinguishable from !pid_task() above.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_task_readdir() does not really need "leader", first_tid() has to
revalidate it anyway. Just pass proc_pid(inode) to first_tid() instead,
it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if
necessary.
The patch also extracts the "inode is dead" code from
pid_delete_dentry(dentry) into the new trivial helper,
proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT
if this dir was removed.
This is a bit racy, but the race is very inlikely and the getdents() after
openndir() can see the empty "." + ".." dir only once.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rerwrite the main loop to use while_each_thread() instead of
next_thread(). We are going to fix or replace while_each_thread(),
next_thread() should be avoided whenever possible.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>