35687 Commits

Author SHA1 Message Date
Miklos Szeredi
520c8b1650 vfs: add renameat2 syscall
Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

Pass flags to vfs_rename() and to i_op->rename() as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
2014-04-01 17:08:42 +02:00
Miklos Szeredi
bc27027a73 vfs: rename: use common code for dir and non-dir
There's actually very little difference between vfs_rename_dir() and
vfs_rename_other() so move both inline into vfs_rename() which still stays
reasonably readable.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
2014-04-01 17:08:42 +02:00
Miklos Szeredi
de22a4c372 vfs: rename: move d_move() up
Move the d_move() in vfs_rename_dir() up, similarly to how it's done in
vfs_rename_other().  The next patch will consolidate these two functions
and this is the only structural difference between them.

I'm not sure if doing the d_move() after the dput is even valid.  But there
may be a logical explanation for that.  But moving the d_move() before the
dput() (and the mutex_unlock()) should definitely not hurt.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
2014-04-01 17:08:42 +02:00
Miklos Szeredi
44b1d53043 vfs: add d_is_dir()
Add d_is_dir(dentry) helper which is analogous to S_ISDIR().

To avoid confusion, rename d_is_directory() to d_can_lookup().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
2014-04-01 17:08:41 +02:00
Chao Yu
6e452d69d4 f2fs: avoid unneeded lookup when xattr name length is too long
In f2fs_setxattr we have limit this attribute name length, so we should also
check it in f2fs_getxattr to avoid useless lookup caused by invalid name length.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 18:54:24 +09:00
Chao Yu
df0f8dc0e1 f2fs: avoid unnecessary bio submit when wait page writeback
This patch introduce is_merged_page() to check whether current page is merged
in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache,
resulting in having more chance to merge pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 18:53:41 +09:00
Jaegeuk Kim
3bb5e2c8fe f2fs: return -EIO when node id is not matched
During the cleaing of node segments, F2FS can get errored node blocks due to
data race between node page lock and its valid bitmap operations.
In that case, it needs to return an error to skip such the obsolete block copy.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 17:38:26 +09:00
Lukas Czerner
e5b30416f3 ext4: remove unneeded test of ret variable
Currently in ext4_fallocate() and ext4_zero_range() we're testing ret
variable along with new_size. However in ext4_fallocate() we just tested
ret before and in ext4_zero_range() if will always be zero when we get
there so there is no need to test it in both cases.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-01 00:59:21 -04:00
Linus Torvalds
9d919e8d5b Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "PREPARE_[DELAYED_]WORK() were used to change the work function of work
  items without fully reinitializing it; however, this makes workqueue
  consider the work item as a different one from before and allows the
  work item to start executing before the previous instance is finished
  which can lead to extremely subtle issues which are painful to debug.

  The interface has never been popular.  This pull request contains
  patches to remove existing usages and kill the interface.  As one of
  the changes was routed during the last devel cycle and another
  depended on a pending change in nvme, for-3.15 contains a couple merge
  commits.

  In addition, interfaces which were deprecated quite a while ago -
  __cancel_delayed_work() and WQ_NON_REENTRANT - are removed too"

* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove deprecated WQ_NON_REENTRANT
  workqueue: Spelling s/instensive/intensive/
  workqueue: remove PREPARE_[DELAYED_]WORK()
  staging/fwserial: don't use PREPARE_WORK
  afs: don't use PREPARE_WORK
  nvme: don't use PREPARE_WORK
  usb: don't use PREPARE_DELAYED_WORK
  floppy: don't use PREPARE_[DELAYED_]WORK
  ps3-vuart: don't use PREPARE_WORK
  wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c
  workqueue: Remove deprecated __cancel_delayed_work()
2014-03-31 15:08:51 -07:00
Linus Torvalds
1ce235faa8 - KGDB support for arm64
- PCI I/O space extended to 16M (in preparation of PCIe support patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOaqsAAoJEGvWsS0AyF7xYNUP/3/IPySIB+/6pyUG6q7kvIpF
 Di93M+VdmnLEOKhhx/tjkiEmEQMp0hFPeOlQRWf/Ugg4ksulP6gRejdDEjIfkmsk
 LrRXLjvH79NDJbN0pTUXqGDvLLZ9Qnib+HEOuKABIYUrwhNKySBk+5omGfXFtwLR
 Mb5JxPX0kbBXOqbOX4RgANQoRlE8GxJR3V245zlGxA4klcN4IiaDy/99kj+kaeaa
 Cl8X9K2I550IZ2YUAWPOut2aee2qRFQtAhIDgVthTYlGRx7Y/rDLM16B8fFY/T0H
 7azIpSO5hk5lp8J3giJHYajlJlXNla5FeHQb8XAVnlyqFBmCUn0vvd2VbPvWREJp
 UD8t1vZZt/s2he6CVAQIfQghwLyzrpPa19KbnyI+3HtsZ+NS/puBJmcVKZ2PBY/L
 28BsRzB7BKAPEVhNmyPwFHNdZTvjaqYUCLhQ0uTp1sSHMcLeSs7+vyMR99f/0u9E
 doSYAeF41ZkxHXL5xEevdj4sFkCEY1XFxER1Y8VM1rqHTeGEoeYbdS/u9tEeBgit
 jBelvHAlNTBgbur2nW4E9fQpAF2CsvWnRq6lSmDRTkyjzcLUQqA8bsQJ3aUyJtZt
 j17kUIzSH1q7x3zAaWQcvMVeawdkv2+HanjuTOdeO2ehvyG71vvxA3RkCv8o5Jhh
 da+jAMhkpYQxk8mSKkWm
 =8+cB
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull ARM64 updates from Catalin Marinas:
 - KGDB support for arm64
 - PCI I/O space extended to 16M (in preparation of PCIe support
   patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: Remove pgprot_dmacoherent()
  arm64: Support DMA_ATTR_WRITE_COMBINE
  arm64: Implement custom mmap functions for dma mapping
  arm64: Fix __range_ok macro
  arm64: Fix duplicated Kconfig entries
  arm64: mm: Route pmd thp functions through pte equivalents
  arm64: rwsem: use asm-generic rwsem implementation
  asm-generic: rwsem: de-PPCify rwsem.h
  arm64: enable generic CPU feature modalias matching for this architecture
  arm64: smp: make local symbol static
  arm64: debug: make local symbols static
  ARM64: perf: support dwarf unwinding in compat mode
  ARM64: perf: add support for frame pointer unwinding in compat mode
  ARM64: perf: add support for perf registers API
  arm64: Add boot time configuration of Intermediate Physical Address size
  arm64: Do not synchronise I and D caches for special ptes
  arm64: Make DMA coherent and strongly ordered mappings not executable
  arm64: barriers: add dmb barrier
  arm64: topology: Implement basic CPU topology support
  arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries
  ...
2014-03-31 15:01:45 -07:00
Linus Torvalds
190f918660 Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 compat wrapper rework from Heiko Carstens:
 "S390 compat system call wrapper simplification work.

  The intention of this work is to get rid of all hand written assembly
  compat system call wrappers on s390, which perform proper sign or zero
  extension, or pointer conversion of compat system call parameters.
  Instead all of this should be done with C code eg by using Al's
  COMPAT_SYSCALL_DEFINEx() macro.

  Therefore all common code and s390 specific compat system calls have
  been converted to the COMPAT_SYSCALL_DEFINEx() macro.

  In order to generate correct code all compat system calls may only
  have eg compat_ulong_t parameters, but no unsigned long parameters.
  Those patches which change parameter types from unsigned long to
  compat_ulong_t parameters are separate in this series, but shouldn't
  cause any harm.

  The only compat system calls which intentionally have 64 bit
  parameters (preadv64 and pwritev64) in support of the x86/32 ABI
  haven't been changed, but are now only available if an architecture
  defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

  System calls which do not have a compat variant but still need proper
  zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
  get a proper wrapper function with the new s390 specific
  COMPAT_SYSCALL_WRAPx() macro:

     COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

  which generates the following code (simplified):

     asmlinkage long sys_brk(unsigned long brk);
     asmlinkage long compat_sys_brk(long brk)
     {
         return sys_brk((u32)brk);
     }

  Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
  includes both linux/syscall.h and linux/compat.h, it will generate
  build errors, if the declaration of sys_brk() doesn't match, or if
  there exists a non-matching compat_sys_brk() declaration.

  In addition this will intentionally result in a link error if
  somewhere else a compat_sys_brk() function exists, which probably
  should have been used instead.  Two more BUILD_BUG_ONs make sure the
  size and type of each compat syscall parameter can be handled
  correctly with the s390 specific macros.

  I converted the compat system calls step by step to verify the
  generated code is correct and matches the previous code.  In fact it
  did not always match, however that was always a bug in the hand
  written asm code.

  In result we get less code, less bugs, and much more sanity checking"

* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/compat: add copyright statement
  compat: include linux/unistd.h within linux/compat.h
  s390/compat: get rid of compat wrapper assembly code
  s390/compat: build error for large compat syscall args
  mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: convert to COMPAT_SYSCALL_DEFINE
  security/compat: convert to COMPAT_SYSCALL_DEFINE
  mm/compat: convert to COMPAT_SYSCALL_DEFINE
  net/compat: convert to COMPAT_SYSCALL_DEFINE
  kernel/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: optional preadv64/pwrite64 compat system calls
  ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
  s390/compat: partial parameter conversion within syscall wrappers
  s390/compat: automatic zero, sign and pointer conversion of syscalls
  s390/compat: add sync_file_range and fallocate compat syscalls
  ...
2014-03-31 14:32:17 -07:00
Linus Torvalds
7cc3afdf43 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "The main changes:

  - Add debug code to the dump EFI pagetable - Borislav Petkov

  - Make 1:1 runtime mapping robust when booting on machines with lots
    of memory - Borislav Petkov

  - Move the EFI facilities bits out of 'x86_efi_facility' and into
    efi.flags which is the standard architecture independent place to
    keep EFI state, by Matt Fleming.

  - Add 'EFI mixed mode' support: this allows 64-bit kernels to be
    booted from 32-bit firmware.  This needs a bootloader that supports
    the 'EFI handover protocol'.  By Matt Fleming"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86, efi: Abstract x86 efi_early calls
  x86/efi: Restore 'attr' argument to query_variable_info()
  x86/efi: Rip out phys_efi_get_time()
  x86/efi: Preserve segment registers in mixed mode
  x86/boot: Fix non-EFI build
  x86, tools: Fix up compiler warnings
  x86/efi: Re-disable interrupts after calling firmware services
  x86/boot: Don't overwrite cr4 when enabling PAE
  x86/efi: Wire up CONFIG_EFI_MIXED
  x86/efi: Add mixed runtime services support
  x86/efi: Firmware agnostic handover entry points
  x86/efi: Split the boot stub into 32/64 code paths
  x86/efi: Add early thunk code to go from 64-bit to 32-bit
  x86/efi: Build our own EFI services pointer table
  efi: Add separate 32-bit/64-bit definitions
  x86/efi: Delete dead code when checking for non-native
  x86/mm/pageattr: Always dump the right page table in an oops
  x86, tools: Consolidate #ifdef code
  x86/boot: Cleanup header.S by removing some #ifdefs
  efi: Use NULL instead of 0 for pointer
  ...
2014-03-31 12:26:05 -07:00
Linus Torvalds
b3fd4ea9df Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main changes:

   - Torture-test changes, including refactoring of rcutorture and
     introduction of a vestigial locktorture.

   - Real-time latency fixes.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  rcu: Provide grace-period piggybacking API
  rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone
  rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
  notifier: Substitute rcu_access_pointer() for rcu_dereference_raw()
  Documentation/memory-barriers.txt: Clarify release/acquire ordering
  rcutorture: Save kvm.sh output to log
  rcutorture: Add a lock_busted to test the test
  rcutorture: Place kvm-test-1-run.sh output into res directory
  rcutorture: Rename TREE_RCU-Kconfig.txt
  locktorture: Add kvm-recheck.sh plug-in for locktorture
  rcutorture: Gracefully handle NULL cleanup hooks
  locktorture: Add vestigial locktorture configuration
  rcutorture: Introduce "rcu" directory level underneath configs
  rcutorture: Rename kvm-test-1-rcu.sh
  rcutorture: Remove RCU dependencies from ver_functions.sh API
  rcutorture: Create CFcommon file for common Kconfig parameters
  rcutorture: Create config files for scripted test-the-test testing
  rcutorture: Add an rcu_busted to test the test
  locktorture: Add a lock-torture kernel module
  rcutorture: Abstract kvm-recheck.sh
  ...
2014-03-31 11:05:24 -07:00
Steven Whitehouse
1b2ad41214 GFS2: Fix address space from page function
Now that rgrps use the address space which is part of the super
block, we need to update gfs2_mapping2sbd() to take account of
that. The only way to do that easily is to use a different set
of address_space_operations for rgrps.

Reported-by: Abhi Das <adas@redhat.com>
Tested-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-31 17:48:27 +01:00
Abhi Das
059788039f GFS2: Fix uninitialized VFS inode in gfs2_create_inode
When gfs2_create_inode() fails due to quota violation, the VFS
inode is not completely uninitialized. This can cause a list
corruption error.

This patch correctly uninitializes the VFS inode when a quota
violation occurs in the gfs2_create_inode codepath.

Resolves: rhbz#1059808
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-31 16:41:39 +01:00
Jeff Layton
29723adee1 locks: make locks_mandatory_area check for file-private locks
Allow locks_mandatory_area() to handle file-private locks correctly.
If there is a file-private lock set on an open file and we're doing I/O
via the same, then that should not cause anything to block.

Handle this by first doing a non-blocking FL_ACCESS check for a
file-private lock, and then fall back to checking for a classic POSIX
lock (and possibly blocking).

Note that this approach is subject to the same races that have always
plagued mandatory locking on Linux.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
d7a06983a0 locks: fix locks_mandatory_locked to respect file-private locks
As Trond pointed out, you can currently deadlock yourself by setting a
file-private lock on a file that requires mandatory locking and then
trying to do I/O on it.

Avoid this problem by plumbing some knowledge of file-private locks into
the mandatory locking code. In order to do this, we must pass down
information about the struct file that's being used to
locks_verify_locked.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
90478939dc locks: require that flock->l_pid be set to 0 for file-private locks
Neil Brown suggested potentially overloading the l_pid value as a "lock
context" field for file-private locks. While I don't think we will
probably want to do that here, it's probably a good idea to ensure that
in the future we could extend this API without breaking existing
callers.

Typically the l_pid value is ignored for incoming struct flock
arguments, serving mainly as a place to return the pid of the owner if
there is a conflicting lock. For file-private locks, require that it
currently be set to 0 and return EINVAL if it isn't. If we eventually
want to make a non-zero l_pid mean something, then this will help ensure
that we don't break legacy programs that are using file-private locks.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
5d50ffd7c3 locks: add new fcntl cmd values for handling file private locks
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.

This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.

Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.

This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.

This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.

These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.

The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.

Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
57b65325fe locks: skip deadlock detection on FL_FILE_PVT locks
It's not really feasible to do deadlock detection with FL_FILE_PVT
locks since they aren't owned by a single task, per-se. Deadlock
detection also tends to be rather expensive so just skip it for
these sorts of locks.

Also, add a FIXME comment about adding more limited deadlock detection
that just applies to ro -> rw upgrades, per Andy's request.

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
c1e62b8fc3 locks: pass the cmd value to fcntl_getlk/getlk64
Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:43 -04:00
Jeff Layton
3fd80cddc6 locks: report l_pid as -1 for FL_FILE_PVT locks
FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report a l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.

This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
c918d42a27 locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.

Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
78ed8a1338 locks: rename locks_remove_flock to locks_remove_file
This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
bce7560d49 locks: consolidate checks for compatible filp->f_mode values in setlk handlers
Move this check into flock64_to_posix_lock instead of duplicating it in
two places. This also fixes a minor wart in the code where we continue
referring to the struct flock after converting it to struct file_lock.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
J. Bruce Fields
ef12e72a01 locks: fix posix lock range overflow handling
In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit
off_t.

The existing range checks also seem to depend on signed arithmetic
wrapping when it overflows.  In practice maybe that works, but we can be
more careful.  That also allows us to make a more reliable distinction
between -EINVAL and -EOVERFLOW.

Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller
to set a lock with starting point no longer representable as a 32-bit
value.  We could return -EOVERFLOW in such cases, but the locks code is
capable of handling such ranges, so we choose to be lenient here.  The
only problem is that subsequent GETLK calls on such a lock will fail
with EOVERFLOW.

While we're here, do some cleanup including consolidating code for the
flock and flock64 cases.

Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
8c3cac5e6a locks: eliminate BUG() call when there's an unexpected lock on file close
A leftover lock on the list is surely a sign of a problem of some sort,
but it's not necessarily a reason to panic the box. Instead, just log a
warning with some info about the lock, and then delete it like we would
any other lock.

In the event that the filesystem declares a ->lock f_op, we may end up
leaking something, but that's generally preferable to an immediate
panic.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
b03dfdec03 locks: add __acquires and __releases annotations to locks_start and locks_stop
...to make sparse happy.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
6ca10ed8ed locks: remove "inline" qualifier from fl_link manipulation functions
It's best to let the compiler decide that.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
46dad7603f locks: clean up comment typo
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31 08:24:42 -04:00
Jeff Layton
24cbe7845e locks: close potential race between setlease and open
As Al Viro points out, there is an unlikely, but possible race between
opening a file and setting a lease on it. generic_add_lease is done with
the i_lock held, but the inode->i_flock check in break_lease is
lockless. It's possible for another task doing an open to do the entire
pathwalk and call break_lease between the point where generic_add_lease
checks for a conflicting open and adds the lease to the list. If this
occurs, we can end up with a lease set on the file with a conflicting
open.

To guard against that, check again for a conflicting open after adding
the lease to the i_flock list. If the above race occurs, then we can
simply unwind the lease setting and return -EAGAIN.

Because we take dentry references and acquire write access on the file
before calling break_lease, we know that if the i_flock list is empty
when the open caller goes to check it then the necessary refcounts have
already been incremented. Thus the additional check for a conflicting
open will see that there is one and the setlease call will fail.

Cc: Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
2014-03-31 08:24:42 -04:00
Abhi Das
e9fb7c73a4 GFS2: Fix return value in slot_get()
ENOSPC was being returned in slot_get inspite of successful
execution of the function. This patch fixes this return
code.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-31 10:43:05 +01:00
Grant Likely
d88cf7d7b4 Merge remote-tracking branch 'robh/for-next' into devicetree/next 2014-03-31 08:10:55 +01:00
Linus Torvalds
fedc1ed0f1 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
  hash modifications into false negatives from __lookup_mnt() (instead
  of hangs)"

On the false negatives from __lookup_mnt():
 "The *only* thing we care about is not getting stuck in __lookup_mnt().
  If it misses an entry because something in front of it just got moved
  around, etc, we are fine.  We'll notice that mount_lock mismatch and
  that'll be it"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch mnt_hash to hlist
  don't bother with propagate_mnt() unless the target is shared
  keep shadowed vfsmounts together
  resizable namespace.c hashes
2014-03-30 17:26:08 -07:00
Theodore Ts'o
00a1a053eb ext4: atomically set inode->i_flags in ext4_set_inode_flags()
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:02:06 -07:00
Al Viro
38129a13e6 switch mnt_hash to hlist
fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating.  Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:51 -04:00
Al Viro
0b1b901b5a don't bother with propagate_mnt() unless the target is shared
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:50 -04:00
Al Viro
1d6a32acd7 keep shadowed vfsmounts together
preparation to switching mnt_hash to hlist

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:50 -04:00
Al Viro
0818bf27c0 resizable namespace.c hashes
* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30 19:18:49 -04:00
Trond Myklebust
e911b8158e NFSv4: Fix a use-after-free problem in open()
If we interrupt the nfs4_wait_for_completion_rpc_task() call in
nfs4_run_open_task(), then we don't prevent the RPC call from
completing. So freeing up the opendata->f_attr.mdsthreshold
in the error path in _nfs4_do_open() leads to a use-after-free
when the XDR decoder tries to decode the mdsthreshold information
from the server.

Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN)
Tested-by: Steve Dickson <SteveD@redhat.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-28 20:12:10 -04:00
Sasha Levin
d9060742fb ocfs2: check if cluster name exists before deref
Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: strlcpy (lib/string.c:388 lib/string.c:151)
  CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
  RIP:  strlcpy (lib/string.c:388 lib/string.c:151)
  Call Trace:
   ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
   ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
   user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
   dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
   vfs_mkdir (fs/namei.c:3467)
   SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
   tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature.  A temporary thing to
avoid triviel oopses.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-28 13:56:58 -07:00
Jan Kara
75c5a52da3 vfs: Allocate anon_inode_inode in anon_inode_init()
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().

Signed-off-by: Jan Kara <jack@suse.cz>
[ Further simplified on suggestion from Dave Jones ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-27 09:52:54 -07:00
Greg Kroah-Hartman
72099304ee Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
This reverts commit d1ba277e79889085a2faec3b68b91ce89c63f888.

As reported by Stephen, this patch breaks linux-next as a ppc patch
suddenly (after 2 years) started using this old api call.  So revert it
for now, it will go away in 3.15-rc2 when we can change the PPC call to
the new api.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-25 20:54:57 -07:00
Linus Torvalds
fce7fc79c8 fs: remove now stale label in anon_inode_init()
The previous commit removed the register_filesystem() call and the
associated error handling, but left the label for the error path that no
longer exists.  Remove that too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25 17:43:34 -07:00
Jan Kara
d6f2589ad5 fs: Avoid userspace mounting anon_inodefs filesystem
anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).

This fixes an oops triggered by trinity when it tried mounting
anon_inodefs which overwrote anon_inode_inode pointer while other CPU
has been in anon_inode_getfile() between ihold() and d_instantiate().
Thus effectively creating dentry pointing to an inode without holding a
reference to it.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25 17:42:16 -07:00
Linus Torvalds
632b06aa28 Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix frm Bruce Fields:
 "J R Okajima sent this early and I was just slow to pass it along,
  apologies.  Fortunately it's a simple fix"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix lost nfserrno() call in nfsd_setattr()
2014-03-25 15:24:11 -07:00
Matthew Wilcox
e04027e887 ext4: fix comment typo
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24 15:15:07 -04:00
Matthew Wilcox
94350ab5c3 ext4: make ext4_block_zero_page_range static
It's only called within inode.c, so make it static, remove its prototype
from ext4.h and move it above all of its callers so it doesn't need a
prototype within inode.c.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24 15:09:16 -04:00
Theodore Ts'o
5f16f3225b ext4: atomically set inode->i_flags in ext4_set_inode_flags()
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2014-03-24 14:43:12 -04:00
Theodore Ts'o
ed3654eb98 ext4: optimize Hurd tests when reading/writing inodes
Set a in-memory superblock flag to indicate whether the file system is
designed to support the Hurd.

Also, add a sanity check to make sure the 64-bit feature is not set
for Hurd file systems, since i_file_acl_high conflicts with a
Hurd-specific field.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24 14:09:06 -04:00