Previously we tracked whether the integrity metadata had been remapped
using a request flag. This was fine for low-level retries. However, if
an I/O was redriven by upper layers we would end up remapping again,
causing the retry to fail.
Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY
which enables filesystems to notify lower layers that the bio in
question has already been remapped.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This adds a necessary race breaker to these commits:
drbd: fix for possible deadlock on IO error during resync
drbd: drop wrong debug asserts, fix recently introduced race
What we do is get a refcount, check the state, then depending on the
state and the requested minimum disk state, either hold it (success),
or give it back immediately (failed "try lock").
Some code paths (flushing of drbd metadata) may still grab and hold a
refcount even if we are D_FAILED (application IO won't).
So even if we hit local_cnt == 0 once after being D_FAILED,
we still need to wait for that again after we changed to D_DISKLESS.
Once local_cnt reaches 0 while we are D_DISKLESS, we can be sure that
no one will look at the protected members anymore, so only then is it
safe to free them.
We cannot easily convert to standard locking primitives here, as we want
to be able to use it in atomic context (we always do a "try lock"),
as well as hold references for a "long time" (from IO submission to
completion callback).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
cleanup on failure) caused oprofile_perf_exit to be called
in the cleanup path of oprofile_perf_init. The __exit tag
for oprofile_perf_exit should therefore be dropped.
The same has to be done for exit_driverfs as well, as this
function is called from oprofile_perf_exit. Else, we get
the following two linker errors.
LD .tmp_vmlinux1
`oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1
LD .tmp_vmlinux1
`exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41da ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.
Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.
dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical. And we really
don't need to mess up our include file headers more than they already
are.
Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.
Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...
[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]
And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)
Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Various cleanup paths have been incomplete, for the very unlikely case
that we cannot allocate enough bios from process context when submitting
on behalf of the peer or resync process.
Never observed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Connections through a compressing proxy might have more bits
on the fly. 500MByte instead of 50MByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
There are three ways to get IO suspended:
* Loss of any access to data
* Fence-peer-handler running
* User requested to suspend IO
Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.
Only when the user resumes IO he overrules all three bits.
The fact is hidden from the user, he sees only a single suspend
bit.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.
If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When no data is accessible (no connection to the peer, nor a local disk)
allow the user to select to freeze all IO operations instead of getting
IO errors.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Define dummy __stop_machine() function even when
CONFIG_STOP_MACHINE=n. This getcpu-required version of
stop_machine() will be used from poke_text_smp().
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20101014031030.4100.34156.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Based on suggestion by Rémi Denis-Courmont to implement 'connect'
for Pipe controller logic, this patch implements 'connect' socket
call for the Pipe controller logic.
The patch does following:-
- Removes setsockopts for PNPIPE_CREATE and PNPIPE_DESTROY
- Adds setsockopt for setting the Pipe handle value
- Implements connect socket call
- Updates the Pipe controller logic
User-space should now follow below sequence with Pipe controller:-
-socket
-bind
-setsockopt for PNPIPE_PIPE_HANDLE
-connect
-setsockopt for PNPIPE_ENCAP_IP
-setsockopt for PNPIPE_ENABLE
GPRS/3G data has been tested working fine with this.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wext: fix alignment problem in serializing 'struct iw_point'
This fixes a typo in the definition of the serialized length of struct iw_point:
a) wireless.h is exported to userspace, the typo causes IW_EV_POINT_PK_LEN
to be 12 on 64-bit, and 8 on 32-bit systems (causing misalignment);
b) in compat-64 mode iwe_stream_add_point() memcpys overlap (see below).
The second case in in compat-64 mode looks like (variable names are as in
include/net/iw_handler.h:iwe_stream_add_point()):
point_len = IW_EV_COMPAT_POINT_LEN = 8
lcp_len = IW_EV_COMPAT_LCP_LEN = 4
2nd memcpy: IW_EV_POINT_PK_LEN - IW_EV_LCP_PK_LEN = 12 - 4 = 8
IW_EV_LCP_PK_LEN
<--------------> *---> 'extra' data area
+-------+-------+-------+-------+---------------+------- ...-+
| len | cmd |length | flags | (empty) -> extra ... |
+-------+-------+-------+-------+---------------+------- ...-+
2 2 2 2 4
lcp_len
<--------------> <-!! OVERLAP !!>
<--1st memcpy--><------- 2nd memcpy ----------->
<---- 3rd memcpy ------- ... >
<--------- point_len ---------->
This case could cause overrun whenever iw_point.length < 4.
The other two cases are -
* 32-bit systems: IW_EV_POINT_PK_LEN - IW_EV_LCP_PK_LEN = 8 - 4 = 4,
the second memcpy copies exactly the 4 required bytes;
* 64-bit systems: IW_EV_POINT_PK_LEN - IW_EV_LCP_PK_LEN = 12 - 4 = 8,
the second memcpy copies a superfluous (but non overlapping) 4 bytes.
The patch changes IW_EV_POINT_PK_LEN to be 8, so that in all 3 cases always only
the requested iw_point.{length,flags} (both __u16) are copied, avoiding overrrun
(compat-64) and superfluous copy (64-bit). In addition, the userspace header is
sanitized (in agreement with version 30 of the wireless tools).
Many thanks to Johannes Berg for help and review with this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Physical block size was declared unsigned int to accomodate the maximum
size reported by READ CAPACITY(16). Make sure we use the right type in
the related functions.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Many of the used macros are just there for userspace compatibility.
Substitute the in-kernel code to directly use the terminal macro
and stuff the defines into #ifndef __KERNEL__ sections.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
What is the dev pointer doing inside the platform data anyway.
We have another pointer to the actual device at hand, use that.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds spi->mode support for the AMBA pl022 driver and
allows spidev to correctly alter SPI modes. Unused fields used in
the pl022 header file for the pl022_config_chip have been removed.
The ab8500 client driver selects the data transfer size instead
of the platform data.
For platforms that use the amba pl022 driver, the unused fields
in the controller data structure have been removed and the .mode
field in the SPI board info structure is used instead.
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Tested-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This extends the PL022 SSP/SPI driver with generic DMA engine
support using the PrimeCell DMA engine interface. Also fix up the
test code for the U300 platform.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
We need to round memory regions correctly -- specifically, we need to
round reserved region in the more expansive direction (lower limit
down, upper limit up) whereas usable memory regions need to be rounded
in the more restrictive direction (lower limit up, upper limit down).
This introduces two set of inlines:
memblock_region_memory_base_pfn()
memblock_region_memory_end_pfn()
memblock_region_reserved_base_pfn()
memblock_region_reserved_end_pfn()
Although they are antisymmetric (and therefore are technically
duplicates) the use of the different inlines explicitly documents the
programmer's intention.
The lack of proper rounding caused a bug on ARM, which was then found
to also affect other architectures.
Reported-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB4CDFD.4020105@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.
There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)
We can switch to a percpu refcount implementation, now dynamic per_cpu
infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
per device.
On x86, dev_hold(dev) code :
before
lock incl 0x280(%ebx)
after:
movl 0x260(%ebx),%eax
incl fs:(%eax)
Stress bench :
(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)
Before:
real 1m1.662s
user 0m14.373s
sys 12m55.960s
After:
real 0m51.179s
user 0m15.329s
sys 10m15.942s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The move_irq_desc() function was only used due to the problem that the
allocator did not free the old descriptors. So the descriptors had to
be moved in create_irq_nr(). That's history.
The code would have never been able to move active interrupt
descriptors on affinity settings. That can be done in a completely
different way w/o all this horror.
Remove all of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Use the cleanup functions of the dynamic allocator. No need to have
separate implementations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
This function should have not been there in the first place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
irq_2_iommu is now in the x86 code where it belongs. Remove all
leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check
based on irq_cfg instead of going through a lookup function. That's
especially interesting in the eoi_ioapic_irq() hotpath.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
That interrupt remapping code is x86 specific and tied to the io_apic
code. No need for separate allocator functions in the interrupt
remapping code. This allows to simplify the code and irq_2_iommu is
small (13 bytes on 64bit) so it's not a real problem even if interrupt
remapping is runtime disabled. If it's compile time disabled the
impact is zero.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go
ahead and allocate more.
Use the unused return value of arch_probe_nr_irqs() to let the
architecture return the number of early allocations. Fix up all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Mark a range of interrupts as allocated. In the SPARSE_IRQ=n case we
need this to update the bitmap for the legacy irqs so the enumerator
via irq_get_next_irq() works.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current sparse_irq allocator has several short comings due to
failures in the design or the lack of it:
- Requires iteration over the number of active irqs to find a free slot
(Some architectures have grown their own workarounds for this)
- Removal of entries is not possible
- Racy between create_irq_nr and destroy_irq (plugged by horrible
callbacks)
- Migration of active irq descriptors is not possible
- No bulk allocation of irq ranges
- Sprinkeled irq_desc references all over the place outside of kernel/irq/
(The previous chip functions series is addressing this issue)
Implement a sane allocator which fixes the above short comings (though
migration of active descriptors needs a full tree wide cleanup of the
direct and mostly unlocked access to irq_desc).
The new allocator still uses a radix_tree, but uses a bitmap for
keeping track of allocated irq numbers. That allows:
- Fast lookup of a free slot
- Allows the removal of descriptors
- Prevents the create/destroy race
- Bulk allocation of consecutive irq ranges
- Basic design is ready for migration of life descriptors after
further cleanups
The bitmap is also used in the SPARSE_IRQ=n case for lookup and
raceless (de)allocation of irq numbers. So it removes the requirement
for looping through the descriptor array to find slots.
Right now it uses sparse_irq_lock to protect the bitmap and the radix
tree, but after cleaning up all users we should be able convert that
to a mutex and to switch the radix_tree and decriptor allocations to
GFP_KERNEL.
[ Folded in a bugfix from Yinghai Lu ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>