* after successfully taking the $RECOVERY lock in EX mode, recheck to make
sure that recovery has not already begun or completed on another node
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1. The tracee can go from ptrace_stop() to do_signal_stop()
after __ptrace_unlink(p).
2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
for tasklist_lock in ptrace_detach().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was a bug in the unstuffing logic which caused a crash
under certain circumstances. This is now fixed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Two internal files which are read through the gfs2_internal_read()
routine were already locked when the routine was called and this
do not need locking at the redapages level.
This patch introduces a struct file which is used as a sentinal
so that readpage will only perform locking in the case that the
struct file passed to it is _not_ equal to this sentinal.
Since the comments in the generic kernel code indicate that the
struct file will never be used for anything other than passing
straight through to readpage(), this should be ok.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes an oops reported by Adrian Bunk in cifs_user_read when a null
read response is returned on a forcedirectio mount.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e. same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.
Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted. Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.
By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
[PATCH] jbd: split checkpoint lists
This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.
And two related commits 8d3c7fce2d20ecc3264c8d8c91ae3beacdeaed1b and
43c3e6f5abdf6acac9b90c86bf03f995bf7d3d92 with the subjects:
[PATCH] jbd: log_do_checkpoint fix
[PATCH] jbd: remove_transaction fix
These seem to be incremental bugfixes on the original patch and as such are
no longer needed.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds back O_DIRECT support with various caveats
attached:
1. Journaled data can be read via O_DIRECT since its now the
same on disk format as normal data files.
2. Journaled data writes with O_DIRECT will be failed sliently
back to normal writes (should we really do this I wonder or
should we return an error instead?)
3. Stuffed files will be failed back to normal buffered I/O
4. All the usual corner cases (write beyond current end of file,
write to an unallocated block) will also revert to normal buffered I/O.
The I/O path is slightly odd as reads arrive at the page cache layer
with the lock for the file already held, but writes arrive unlocked.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a potential oops if there is an error reported by
posix_acl_from_disk(). This is mostly theoretical due to the use of
magics and checksums in xattrs, but is still possible.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There were one or two fields in structures which didn't get changed
last time back to their gfs1 sizes and alignments. One or two constants
have also changed back to their original values which were missed the
first time.
Its possible that indirect pointer blocks might need to change. If
they don't we'll have to rewrite them all on upgrade due to a change
in the amount of padding that they use.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Umount is now working correctly again. The bug was due to
not getting an extra ref count when mounting the fs. We
should have bumped it by two (once for the internal pointer
to the root inode from the super block and once for the
inode hanging off the dcache entry for root).
Also this patch tidys up the code dealing with looking up
and creating inodes. We now pass Linux inodes (with gfs2_inodes
attached) rather than the other way around and this reduces code
duplication in various places.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Unfortunately, the reiserfs_attrs_cleared bit in the superblock flag can
lie. File systems have been observed with the bit set, yet still contain
garbage in the stat data field, causing unpredictable results.
This patch backs out the enable-by-default behavior.
It eliminates the changes from: d50a5cd860ce721dbeac6a4f3c6e42abcde68cd8,
and ef5e5414e7a83eb9b4295bbaba5464410b11e030.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With David Woodhouse <dwmw2@infradead.org>
select() presently has a habit of increasing the value of the user's
`timeout' argument on return.
We were writing back a timeout larger than the original. We _deliberately_
round up, since we know we must wait at _least_ as long as the caller asks
us to.
The patch adds a couple of helper functions for magnitude comparison of
timespecs and of timevals, and uses them to prevent the various poll and
select functions from returning a timeout which is larger than the one which
was passed in.
The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
timeout value to the old one and was returning that. It should just return
the new timeout value.
(We have various handy timespec/timeval-to-from-nsec conversion functions in
time.h. But this code open-codes it all).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The *at patches introduced fstatat and, due to inusfficient research, I
used the newfstat functions generally as the guideline. The result is that
on 32-bit platforms we don't have all the information needed to implement
fstatat64.
This patch modifies the code to pass up 64-bit information if
__ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make
this clear. Other archs will continue to use the existing code. On x86-64
the compat code is implemented using a new sys32_ function. this is what
is done for the other stat syscalls as well.
This patch might break some other archs (those which define
__ARCH_WANT_STAT64 and which already wired up the syscall). Yet others
might need changes to accomodate the compatibility mode. I really don't
want to do that work because all this stat handling is a mess (more so in
glibc, but the kernel is also affected). It should be done by the arch
maintainers. I'll provide some stand-alone test shortly. Those who are
eager could compile glibc and run 'make check' (no installation needed).
The patch below has been tested on x86 and x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a very large patch, with a few still to be resolved issues
so you might want to check out the previous head of the tree since
this is known to be unstable. Fixes for the various bugs will be
forthcoming shortly.
This patch removes the special data format which has been used
up till now for journaled data files. Directories still retain the
old format so that they will remain on disk compatible with earlier
releases. As a result you can now do the following with journaled
data files:
1) mmap them
2) export them over NFS
3) convert to/from normal files whenever you want to (the zero length
restriction is gone)
In addition the level at which GFS' locking is done has changed for all
files (since they all now use the page cache) such that the locking is
done at the page cache level rather than the level of the fs operations.
This should mean that things like loopback mounts and other things which
touch the page cache directly should now work.
Current known issues:
1. There is a lock mode inversion problem related to the resource
group hold function which needs to be resolved.
2. Any significant amount of I/O causes an oops with an offset of hex 320
(NULL pointer dereference) which appears to be related to a journaled data
buffer appearing on a list where it shouldn't be.
3. Direct I/O writes are disabled for the time being (will reappear later)
4. There is probably a deadlock between the page lock and GFS' locks under
certain combinations of mmap and fs operation I/O.
5. Issue relating to ref counting on internally used inodes causes a hang
on umount (discovered before this patch, and not fixed by it)
6. One part of the directory metadata is different from GFS1 and will need
to be resolved before next release.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Direct backport of 2.4 fix that didn't get propagated to 2.6; original
comment follows:
<quote>
When I specify the NFS port for nfsroot (e.g.,
nfsroot=<dir>,port=2049), the
kernel uses the wrong port. In my case it tries to use 264 (0x108)
instead
of 2049 (0x801).
This patch adds the missing htons().
Eric
</quote>
Patch got applied in 2.4.21-pre6. Author: Eric Lammerts (<eric@lammerts.org>,
AFAICS).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the namespace structure is being shared, allocate a new one and copy
information from the current, shared, structure.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a user trigger this message on a box that had a lot of different
mounts, all with different options. It might help narrow down wtf happened
if we print out which device failed.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix one-shot support in inotify. We currently drop the IN_ONESHOT flag
during watch addition. Fix is to not do that.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix do_path_lookup() to avoid accessing invalid dentry or inode when the
link_path_walk() has failed. This should fix Bugme #5897.
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just noticed that my patch "don't create on open that fails due to
ERR_GRACE" (recently commited as fb553c0f17444e090db951b96df4d2d71b4f4b6b)
had an obvious problem that causes a deadlock on reboot recovery. Sending
in this now since it seems like a clear 2.6.16 candidate.--b.
We're returning with a lock held in some error cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the clustering of extra pages in a buffered write.
SGI-PV: 949210
SGI-Modid: xfs-linux-melb:xfs-kern:25130a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Fix trivial type mixup in the debugfs function comments.
Signed-off-by: Vincent Hanquez <vincent@snarc.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
(for instance NFS) in order to determine whether or not it is looking up
the last component of the path. It this is the case, it may have to look
at the intent information in order to perform various tasks such as atomic
open.
A problem currently occurs when link_path_walk() hits a symlink. In this
case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
path passed by __vfs_follow_link() (i.e. the end of the symlink path)
rather than when we hit the end of the path passed by the user.
The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
if that flag was unset when we entered the function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ben points out that:
When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
significant drop in throughput as the disk sits idle. The patch below
results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
IDE test box) when writing out files using O_SYNC.
So optimise the batching code by omitting it entirely if the process which is
doing a sync write is the same as the one which did the most recent sync
write. If that's true, we're unlikely to get any other processes joining the
transaction.
(Has been in -mm for ages - it took me a long time to get on to performance
testing it)
Numbers, on write-cache-disabled IDE:
/usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name
Unpatched:
40 seconds
Patched:
35 seconds
Batching disabled:
35 seconds
This is the problematic single-process-doing-fsync case. With multiple
fsyncing processes the numbers are AFACIT unaltered by the patch.
Aside: performance testing and instrumentation shows that the transaction
batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
dir-name on non-writeback-caching IDE). This is because by the time one
process is running a synchronous commit, a bunch of other processes already
have a transaction handle open, so they're all going to batch into the same
transaction anyway.
The batching seems to offer maybe 5-10% speedup with this workload, but I'm
pretty sure it was more important than that when it was first developed 4-odd
years ago...
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last fix for this function in fact opened up a much more often
triggering race.
It was uncommented tricky code, that was buggy. Add comment, make it less
tricky and fix bug.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The mount path had incorrectly asked the locking code to wait for recovery
completion, which deadlocks things because recovery waits for mount to
complete first.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
configfs always made item and attribute ownership root.root and
permissions based on a umask of 022. Add ->setattr() to allow
chown(2)/chmod(2), and persist the changes for the lifetime of the
items and attributes.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fs/ocfs2/dlm/dlmrecovery.c does now use msleep(), and does therefore
need to #include <linux/delay.h> for getting the prototype of this
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a hang which can occur during shutdown migration
* do not allow nodes to join during recovery
* when restarting lock mastery, do not ignore nodes which come up
* more than one node could become recovery master, fix this
* sleep to allow some time for heartbeat state to catch up to network
* extra debug info for bad recovery state problems
* make DLM_RECO_NODE_DATA_DONE a valid state for non-master recovery nodes
* prune all locks from dead nodes on $RECOVERY lock resources
* do NOT automatically add new nodes to mle nodemaps until they have properly
joined the domain
* make sure dlm_pick_recovery_master only exits when all nodes have synced
* properly handle dlmunlock errors in dlm_pick_recovery_master
* do not propagate network errors in dlm_send_begin_reco_message
* dead nodes were not being put in the recovery map sometimes, fix this
* dlmunlock was failing to clear the unlock actions on DLM_DENIED
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Including <asm/signal.h> results in compilation failure on ia64 due to
not including <linux/compiler.h>
Including <linux/signal.h> corrects the problem.
Please apply.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Functions called by __init funtions mustn't be __exit.
Reported by Jan-Benedict Glaw <jbglaw@lug-owl.de>.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>