A user reported a problem booting into a new kernel with the old format inodes.
He was panicing in cow_file_range while writing out the inode cache. This is
because if the block group is not cached we'll just skip writing out the cache,
however if it gets dirtied again in the same transaction and it finished caching
we'd go ahead and write it out, but since we set cache_generation to the transid
we think we've already truncated it and will just carry on, running into
cow_file_range and blowing up. We need to make sure we only set
cache_generation if we've done the truncate. The user tested this patch and
verified that the panic no longer occured. Thanks,
Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
a loop. This is because we will add an orphan item, do the truncate, the
truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
left with an orphan item still in the fs. Then we come back later to do another
truncate and it blows up because we already have an orphan item. This is ok so
just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
We were occasionaly leaking space when running xfstest 269. This is because if
we failed to start the transaction in the truncate loop we'd just goto out, but
we need to break so that the inode is removed from the orphan list and the space
is properly freed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Running xfstests 269 with some tracing my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole and it turns out the way we deal with reserved_extents is wrong,
we need to only be setting it if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally make it impossible for me to track down the
original issue I was looking for. The other problem is with our error handling
in the reservation code. There are two cases that we need to deal with
1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we dro the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However if we fail, we need the reservation side to do the free at
that point since that space is no longer in use. So as it stands the code was
doing this fine and it worked out, except in case #2
2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.
Because of the case where we can race with free()'s since we have to drop our
spin_lock to do the reservation, I'm going to serialize all reservations with
the i_mutex. We already get this for free in the heavy use paths, truncate and
file write all hold the i_mutex, just needed to add it to page_mkwrite and
various ioctl/balance things. With this patch my space leak scripts no longer
scream bloody murder. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC. This needs to be fixed in a few areas
1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.
2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
setting ->setattr for symlinks, even though we should have been. This catches
one of the cases where we were getting errors in mark_inode_dirty.
3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty. This lets us return errors properly for truncate
and chown/anything related to setattr.
4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one. The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.
With this patch xfstests 83 complains a handful of times instead of hundreds of
times. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff. First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
failed to create a worker. Also check_pending_worker_creates needs to call
__btrfs_start_work in it's work function since it already increments
num_workers_starting.
People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
The Smack LSM hook for security_d_instantiate checks
the inode's i_op->getxattr value to determine if the
containing filesystem supports extended attributes.
The BTRFS filesystem sets the inode's i_op value only
after it has instantiated the inode. This results in
Smack incorrectly giving new BTRFS inodes attributes
from the filesystem defaults on the assumption that
values can't be stored on the filesystem. This patch
moves the assignment of inode operation vectors ahead
of the calls to d_instantiate, letting Smack know that
the filesystem supports extended attributes. There
should be no impact on the performance or behavior of
BTRFS.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads. This
adds a cond_resched() into the loop to avoid them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.
Currently if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and skips
the deletion.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we use raid0 as the data profile, df command may show us a very
inaccurate value of the available space, which may be much less than the
real one. It may make the users puzzled. Fix it by changing the calculation
of the available space, and making it be more similar to a fake chunk
allocation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfsck report errors after the 83th case of xfstests was run, The error
number is 400, it means the used disk space of the file is wrong.
The reason of this bug is that:
The file truncation may fail when the space of the file system is not enough,
and leave some file extents, whose offset are beyond the end of the files.
When we want to expand those files, we will drop those file extents, and
put in dummy file extents, and then we should update the i-node. But btrfs
forgets to do it.
This patch adds the forgotten i-node update.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfsck report error 100 after the 83th case of xfstests was run, it means
the i_size of the file is wrong.
The reason of this bug is that:
Btrfs increased i_size of the file at the beginning, but it failed to expand
the file, and failed to update the i_size to the old size because there is no
enough space in the file system, so we found a wrong i_size.
This patch fixes this bug by updating the i_size just when we pass the file
expanding and get enough space to update i-node.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The patch below removes an extra semicolon.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.
Cc: stable@vger.kernel.org
Cc: Michael Abbott <michael.abbott@diamond.ac.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allow xfs_qm_dqput to work without trylock loops by nesting the freelist lock
inside the dquot qlock. In turn that requires trylocks in the reclaim path
instead, but given it's a classic tradeoff between fast and slow path, and
we follow the model of the inode and dentry caches.
Document our new lock order now that it has settled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJO6PXBAAoJENNvdpvBGATwpuEP/2RCxmdWYZ8/6Z6pmTh3hHN5
fx6HckTdvLQOvbQs72wzVW0JKyc25QmW2mQc5z3MjSymjf/RbEKihPUITRNbHrTD
T2sP/lWu09AKLioEg4ucAKn/A7Do3UDIkXTszvVVP/t2psVPzLeJ1njQKra14Nyz
o0+gSlnwuGx9WaxfR+7MYNs2ikdSkXIeYsiFAOY4YOxwwC99J/lZ0YaNkbI7UBtC
yu2XLIvPboa5JZXANq2G3VhVIETMmOyRTCC76OAXjqkdp9nLFWDG0ydqQh0vVZwL
xQGOmAj+l3BNTE0QmMni1w7A0SBU3N6xBA5HN6Y49RlbsMYG27aN54Fy5K2R41I3
QXVhBL53VD6b0KaITcoz7jIGIy6qk9Wx+2WcCYtQBSIjL2YwlaJq0PL07+vRamex
sqHGDejcNY87i6AV0DP6SNuCFCi9xFYoAoMi9Wu5E9+T+Vck0okFzW/luk/FvsSP
YA5Dh+vISyBeCnWQvcnBmsUQyf8d9MaNnejZ48ath+GiiMfY8USAZ29RAG4VuRtS
9DAyTTIBA73dKpnvEV9u4i8Lwd8hRVMOnPyOO785NwEXk3Ng08pPSSbMklW6UfCY
4nr5UNB13ZPbXx4uoAvATMpCpYxMaLEdxmeMvgXpkekl0hHBzpVDey1Vu9fb/a5n
dQpo6WWG9HIJ23hOGAGR
=n3Lm
-----END PGP SIGNATURE-----
Merge tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: handle EOF correctly in ext4_bio_write_page()
ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()
ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize
ext4: avoid hangs in ext4_da_should_update_i_disksize()
ext4: display the correct mount option in /proc/mounts for [no]init_itable
ext4: Fix crash due to getting bogus eh_depth value on big-endian systems
ext4: fix ext4_end_io_dio() racing against fsync()
.. using the new signed tag merge of git that now verifies the gpg
signature automatically. Yay. The branchname was just 'dev', which is
prettier. I'll tell Ted to use nicer tag names for future cases.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/ncpfs: fix error paths and goto statements in ncp_fill_super()
configfs: register_filesystem() called too early
fuse: register_filesystem() called too early
ubifs: too early register_filesystem()
... and the same kind of leak for mqueue
procfs: fix a vfsmount longterm reference leak
Introduce a new XFS_DQ_FREEING flag that tells lookup and mplist walks
to skip a dquot that is beeing freed, and use this avoid the trylock
on the hash and mplist locks in xfs_qm_dqreclaim_one. Also simplify
xfs_dqpurge by moving the inodes to a dispose list after marking them
XFS_DQ_FREEING and avoid the locker ordering constraints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The label 'out_bdi' should be followed by bdi_destroy() instead of
fput() which should be after the 'out_fput' label.
If bdi_setup_and_register() fails then jump to the 'out_fput' label
instead of the 'out_bdi' one.
If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi'
label to call bdi_destroy() otherwise the bdi object will not be
destroyed.
Compile tested only.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We need to zero out part of a page which beyond EOF before setting uptodate,
otherwise, mapread or write will see non-zero data beyond EOF.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
than ee_block + ee_len.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
If a page has been read into memory and never been written, it has no
buffers, but we should handle the page in truncate or punch hole.
VFS code of writing operations has handled holes correctly, so this
patch removes the code handling holes in writing operations.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
If there is an unwritten but clean buffer in a page and there is a
dirty buffer after the buffer, then mpage_submit_io does not write the
dirty buffer out. As a result, da_writepages loops forever.
This patch fixes the problem by checking dirty flag.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.
gdb> bt
#0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
#2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
#3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
#4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
xffff88001e26be40) at mm/filemap.c:2600
#5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
zed out>, pos=<value optimized out>) at mm/filemap.c:2632
#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
t fs/ext4/file.c:136
#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
ppos=0xffff88001e26bf48) at fs/read_write.c:406
#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
000, pos=0xffff88001e26bf48) at fs/read_write.c:435
#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
4000) at fs/read_write.c:487
#10 <signal handler called>
#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
#12 0x0000000000000000 in ?? ()
gdb> print offset
$22 = 0xffffffffffffffff
gdb> print idx
$23 = 0xffffffff
gdb> print inode->i_blkbits
$24 = 0xc
gdb> up
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
2512 if (ext4_da_should_update_i_disksize(page, end)) {
gdb> print start
$25 = 0x0
gdb> print end
$26 = 0xffffffffffffffff
gdb> print pos
$27 = 0x108000
gdb> print new_i_size
$28 = 0x108000
gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
$29 = 0xd9000
gdb> down
2467 for (i = 0; i < idx; i++)
gdb> print i
$30 = 0xd44acbee
This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: add missing spin_unlock at ceph_mdsc_build_path()
ceph: fix SEEK_CUR, SEEK_SET regression
crush: fix mapping calculation when force argument doesn't exist
ceph: use i_ceph_lock instead of i_lock
rbd: remove buggy rollback functionality
rbd: return an error when an invalid header is read
ceph: fix rasize reporting by ceph_show_options
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: set max_pause to lowest value on zero bdi_dirty
writeback: permit through good bdi even when global dirty exceeded
writeback: comment on the bdi dirty threshold
fs: Make write(2) interruptible by a fatal signal
writeback: Fix issue on make htmldocs
Do not remove dquots from the freelist when we grab a reference to them in
xfs_qm_dqlookup, but leave them on the freelist util scanning notices that
they have a reference. This speeds up the lookup fastpath, and greatly
simplifies the lock ordering constraints. Note that the same scheme is
used by the VFS inode and dentry caches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Free dquots when purging them during umount instead of keeping them around
on the freelist in a degraded state. The out of order locking in
xfs_qm_dqpurge will be removed again later in this series.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Rearrange the code to avoid the conditional locking around the flist_locked
variable. This means we lose a (rather pointless) assert, and hold the
freelist lock a bit longer for one corner case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Commit 06222e491e663dac939f04b125c9dc52126a75c4 got the if wrong so that
it always evaluates as true. This is semantically harmless, but makes
SEEK_CUR and SEEK_SET needlessly query the server.
Rewrite the if to explicitly enumerate the cases we DO need a valid i_size
to make this code less fragile.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Fix race between lseek(fd, 0, SEEK_CUR) and read/write. This was fixed in
generic code by commit 5b6f1eb97d (vfs: lseek(fd, 0, SEEK_CUR) race condition).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
The test in fuse_file_llseek() "not SEEK_CUR or not SEEK_SET" always evaluates
to true.
This was introduced in 3.1 by commit 06222e49 (fs: handle SEEK_HOLE/SEEK_DATA
properly in all fs's that define their own llseek) and changed the behavior of
SEEK_CUR and SEEK_SET to always retrieve the file attributes. This is a
performance regression.
Fix the test so that it makes sense.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
CC: Josef Bacik <josef@redhat.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
Fix two bugs in fuse_retrieve():
- retrieving more than one page would yield repeated instances of the
first page
- if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the
request page array would overflow
fuse_retrieve() was added in 2.6.36 and these bugs had been there since the
beginning.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
/proc/mounts was showing the mount option [no]init_inode_table when
the correct mount option that will be accepted by parse_options() is
[no]init_itable.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Mark the trivial lock wrappers as inline, and make the naming consistent
for all of them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
It always is zero, and removing it will make future changes easier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Now that we can't have any dirty dquots around that aren't in the AIL we
can get rid of the explicit dquot syncing from xfssyncd and xfs_fs_sync_fs
and instead rely on AIL pushing to write out any quota updates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Make sure we do not skip any dquots when flushing them out after a
quotacheck to make sure that we will never have any dirty dquots on a
live filesystem. At this point no dquot should be pinnable, but lets
be pedantic about it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Only skip pinned dquots if SYNC_TRYLOCK is specified, and adjust the callers
to keep the behaviour unchanged.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Commit 1939dd84b3 ("ext4: cleanup ext4_ext_grow_indepth code") added a
reference to ext4_extent_header.eh_depth, but forget to pass the value
read through le16_to_cpu. The result is a crash on big-endian
machines, such as this crash on a POWER7 server:
attempt to access beyond end of device
sda8: rw=0, want=776392648163376, limit=168558560
Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb
Faulting instruction address: 0xc0000000001f5f38
cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0]
pc: c0000000001f5f38: .__brelse+0x18/0x60
lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80
sp: c000001bd1aaef70
msr: 9000000000009032
dar: 6b6b6b6b6b6b6bcb
dsisr: 40000000
current = 0xc000001bd15b8010
paca = 0xc00000000ffe4600
pid = 19911, comm = flush-8:0
enter ? for help
[c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80
[c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0
[c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0
[c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710
[c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310
[c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490
[c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430
[c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670
[c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90
[c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530
[c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300
[c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140
[c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0
[c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300
[c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340
[c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0
[c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70
This is due to getting ext_depth(inode) == 0x101 and therefore running
off the end of the path array in ext4_ext_drop_refs into following
unallocated structures.
This fixes it by adding the necessary le16_to_cpu.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to make sure iocb->private is cleared *before* we put the
io_end structure on i_completed_io_list. Otherwise fsync() could
potentially run on another CPU and free the iocb structure out from
under us.
Reported-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org