15017 Commits

Author SHA1 Message Date
Marcelo Tosatti
02626b6af5 KVM: x86: fix kvm_write_tsc() TSC matching thinko
kvm_write_tsc() converts from guest TSC to microseconds, not nanoseconds
as intended. The result is that the window for matching is 1000 seconds,
not 1 second.

Microsecond precision is enough for checking whether the TSC write delta
is within the heuristic values, so use it instead of nanoseconds.

Noted by Avi Kivity.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-20 12:40:36 +02:00
Marcelo Tosatti
b74f05d61b x86: kvmclock: abstract save/restore sched_clock_state
Upon resume from hibernation, CPU 0's hvclock area contains the old
values for system_time and tsc_timestamp. It is necessary for the
hypervisor to update these values with uptodate ones before the CPU uses
them.

Abstract TSC's save/restore sched_clock_state functions and use
restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.

Also move restore_sched_clock_state before __restore_processor_state,
since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
Thanks to Igor Mammedov for tracking it down.

Fixes suspend-to-disk with kvmclock.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-20 12:37:45 +02:00
Linus Torvalds
b0e37d7ac6 Merge branch 'dcache-word-accesses'
* branch 'dcache-word-accesses':
  vfs: use 'unsigned long' accesses for dcache name comparison and hashing

This does the name hashing and lookup using word-sized accesses when
that is efficient, namely on x86 (although any little-endian machine
with good unaligned accesses would do).

It does very much depend on little-endian logic, but it's a very hot
couple of functions under some real loads, and this patch improves the
performance of __d_lookup_rcu() and link_path_walk() by up to about 30%.
Giving a 10% improvement on some very pathname-heavy benchmarks.

Because we do make unaligned accesses past the filename, the
optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
effectively depend on the fact that on x86 we don't really ever have the
last page of usable RAM followed immediately by any IO memory (due to
ACPI tables, BIOS buffer areas etc).

Some of the bit operations we do are a bit "subtle".  It's commented,
but you do need to really think about the code.  Or just consider it
black magic.

Thanks to people on G+ for some of the optimized bit tricks.
2012-03-19 16:37:28 -07:00
Eric Dumazet
dc72d99dab net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
Matt Evans spotted that x86 bpf_jit was incorrectly handling negative
constant offsets in BPF_S_LDX_B_MSH instruction.

We need to abort JIT compilation like we do in common_load so that
filter uses the interpreter code and can call __load_pointer()

Reference: http://lists.openwall.net/netdev/2011/07/19/11

Thanks to Indan Zupancic to bring back this issue.

Reported-by: Matt Evans <matt@ozlabs.org>
Reported-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-19 17:41:44 -04:00
Steffen Persvold
943bc7e110 x86: Fix section warnings
Fix the following section warnings :

WARNING: vmlinux.o(.text+0x49dbc): Section mismatch in reference
from the function acpi_map_cpu2node() to the variable
.cpuinit.data:__apicid_to_node The function acpi_map_cpu2node()
references the variable __cpuinitdata __apicid_to_node. This is
often because acpi_map_cpu2node lacks a __cpuinitdata
annotation or the annotation of __apicid_to_node is wrong.

WARNING: vmlinux.o(.text+0x49dc1): Section mismatch in reference
from the function acpi_map_cpu2node() to the function
.cpuinit.text:numa_set_node() The function acpi_map_cpu2node()
references the function __cpuinit numa_set_node(). This is often
because acpi_map_cpu2node lacks a __cpuinit  annotation or the
annotation of numa_set_node is wrong.

WARNING: vmlinux.o(.text+0x526e77): Section mismatch in
reference from the function prealloc_protection_domains() to the
function .init.text:alloc_passthrough_domain() The function
prealloc_protection_domains() references the function __init
alloc_passthrough_domain(). This is often because
prealloc_protection_domains lacks a __init  annotation or the annotation of alloc_passthrough_domain is wrong.

Signed-off-by: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1331810188-24785-1-git-send-email-sp@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-19 12:01:01 +01:00
Dan Carpenter
c7b738351b x86, efi: Fix pointer math issue in handle_ramdisks()
"filename" is a efi_char16_t string so this check for reaching the end
of the array doesn't work.  We need to cast the pointer to (u8 *) before
doing the math.

This patch changes the "filename" to "filename_16" to avoid confusion in
the future.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20120305180614.GA26880@elgon.mountain
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-03-16 13:03:24 -07:00
Jiri Olsa
641cc93881 perf: Adding sysfs group format attribute for pmu device
Adding sysfs group 'format' attribute for pmu device that
contains a syntax description on how to construct raw events.

The event configuration is described in following
struct pefr_event_attr attributes:

  config
  config1
  config2

Each sysfs attribute within the format attribute group,
describes mapping of name and bitfield definition within
one of above attributes.

eg:
  "/sys/...<dev>/format/event" contains "config:0-7"
  "/sys/...<dev>/format/umask" contains "config:8-15"
  "/sys/...<dev>/format/usr"   contains "config:16"

the attribute value syntax is:

  line:      config ':' bits
  config:    'config' | 'config1' | 'config2"
  bits:      bits ',' bit_term | bit_term
  bit_term:  VALUE '-' VALUE | VALUE

Adding format attribute definitions for x86 cpu pmus.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-16 14:06:06 -03:00
Alok Kataria
57779dc2b3 x86, tsc: Skip refined tsc calibration on systems with reliable TSC
While running the latest Linux as guest under VMware in highly
over-committed situations, we have seen cases when the refined TSC
algorithm fails to get a valid tsc_start value in
tsc_refine_calibration_work from multiple attempts. As a result the
kernel keeps on scheduling the tsc_irqwork task for later. Subsequently
after several attempts when it gets a valid start value it goes through
the refined calibration and either bails out or uses the new results.
Given that the kernel originally read the TSC frequency from the
platform, which is the best it can get, I don't think there is much
value in refining it.

So  for systems which get the TSC frequency from the platform we
should skip the refined tsc algorithm.

We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is
set only on VMware and for Moorestown Penwell both of which have there
own TSC calibration methods.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: stable@kernel.org
[jstultz: Reworked to simply not schedule the refining work,
rather then scheduling the work and bombing out later]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15 18:23:11 -07:00
Thomas Gleixner
2ab516575f x86: vdso: Use seqcount instead of seqlock
The update of the vdso data happens under xtime_lock, so adding a
nested lock is pointless. Just use a seqcount to sync the readers.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15 18:17:58 -07:00
Thomas Gleixner
6c260d5863 x86: vdso: Remove bogus locking in update_vsyscall_tz()
Changing the sequence count in update_vsyscall_tz() is completely
pointless.

The vdso code copies the data unprotected. There is no point to change
this as sys_tz is nowhere protected at all. See sys_gettimeofday().

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15 18:17:57 -07:00
John Stultz
a939e817aa time: x86: Fix race switching from vsyscall to non-vsyscall clock
When switching from a vsyscall capable to a non-vsyscall capable
clocksource, there was a small race, where the last vsyscall
gettimeofday before the switch might return a invalid time value
using the new non-vsyscall enabled clocksource values after the
switch is complete.

This is due to the vsyscall code checking the vclock_mode once
outside of the seqcount protected section. After it reads the
vclock mode, it doesn't re-check that the sampled clock data
that is obtained in the seqcount critical section still matches.

The fix is to sample vclock_mode inside the protected section,
and as long as it isn't VCLOCK_NONE, return the calculated
value. If it has changed and is now VCLOCK_NONE, fall back
to the syscall gettime calculation.

v2:
  * Cleanup checks as suggested by tglx
  * Also fix same issue present in gettimeofday path

CC: Andy Lutomirski <luto@amacapital.net>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15 18:17:53 -07:00
Chris Metcalf
48b25c43e6 [PATCH v3] ipc: provide generic compat versions of IPC syscalls
When using the "compat" APIs, architectures will generally want to
be able to make direct syscalls to msgsnd(), shmctl(), etc., and
in the kernel we would want them to be handled directly by
compat_sys_xxx() functions, as is true for other compat syscalls.

However, for historical reasons, several of the existing compat IPC
syscalls do not do this.  semctl() expects a pointer to the fourth
argument, instead of the fourth argument itself.  msgsnd(), msgrcv()
and shmat() expect arguments in different order.

This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be
set to preserve this behavior for ports that use it (x86, sparc, powerpc,
s390, and mips).  No actual semantics are changed for those architectures,
and there is only a minimal amount of code refactoring in ipc/compat.c.

Newer architectures like tile (and perhaps future architectures such
as arm64 and unicore64) should not select this option, and thus can
avoid having any IPC-specific code at all in their architecture-specific
compat layer.  In the same vein, if this option is not selected, IPC_64
mode is assumed, since that's what the <asm-generic> headers expect.

The workaround code in "tile" for msgsnd() and msgrcv() is removed
with this change; it also fixes the bug that shmat() and semctl() were
not being properly handled.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-03-15 13:13:38 -04:00
Dave Airlie
8229c885fe drm: Merge tag 'v3.3-rc7' into drm-core-next
Merge the fixes so far into core-next, needed to test
intel driver.

Conflicts:
	drivers/gpu/drm/i915/intel_ringbuffer.c
2012-03-15 10:24:32 +00:00
H. Peter Anvin
31796ac4e8 x32: Fix alignment fail in struct compat_siginfo
Adding struct _sigchld_x32 caused a misalignment cascade in struct
siginfo, because union _sifields is located on an 4-byte boundary
(8-byte misaligned.)

Adding new fields that are 8-byte aligned caused the intermediate
structures to also be aligned to 8 bytes, thereby adding padding in
unexpected places.

Thus, change s64 to compat_s64 here, which makes it "misaligned on
paper".  In reality these fields *are* actually aligned (there are 3
preceeding ints outside the union and 3 inside struct _sigchld_x32),
but because of the intervening union and struct it is not possible for
gcc to avoid padding without breaking the ABI.

Reported-and-tested-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com
2012-03-14 14:27:52 -07:00
Jussi Kivilinna
0b95ec56ae crypto: camellia - add assembler implementation for x86_64
Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
functions are provided. First set is regular 'one-block at time' encrypt/decrypt
functions. Second is 'two-block at time' functions that gain performance increase
on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
functions with in-order CPUs.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

AMD Phenom II 1055T (fam:16, model:10):

camellia-asm vs camellia_generic:
128bit key:                                             (lrw:256bit)    (xts:256bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.27x   1.22x   1.30x   1.42x   1.30x   1.34x   1.19x   1.05x   1.23x   1.24x
64B     1.74x   1.79x   1.43x   1.87x   1.81x   1.87x   1.48x   1.38x   1.55x   1.62x
256B    1.90x   1.87x   1.43x   1.94x   1.94x   1.95x   1.63x   1.62x   1.67x   1.70x
1024B   1.96x   1.93x   1.43x   1.95x   1.98x   2.01x   1.67x   1.69x   1.74x   1.80x
8192B   1.96x   1.96x   1.39x   1.93x   2.01x   2.03x   1.72x   1.64x   1.71x   1.76x

256bit key:                                             (lrw:384bit)    (xts:512bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.23x   1.23x   1.33x   1.39x   1.34x   1.38x   1.04x   1.18x   1.21x   1.29x
64B     1.72x   1.69x   1.42x   1.78x   1.81x   1.89x   1.57x   1.52x   1.56x   1.65x
256B    1.85x   1.88x   1.42x   1.86x   1.93x   1.96x   1.69x   1.65x   1.70x   1.75x
1024B   1.88x   1.86x   1.45x   1.95x   1.96x   1.95x   1.77x   1.71x   1.77x   1.78x
8192B   1.91x   1.86x   1.42x   1.91x   2.03x   1.98x   1.73x   1.71x   1.78x   1.76x

camellia-asm vs aes-asm (8kB block):
         128bit  256bit
ecb-enc  1.15x   1.22x
ecb-dec  1.16x   1.16x
cbc-enc  0.85x   0.90x
cbc-dec  1.20x   1.23x
ctr-enc  1.28x   1.30x
ctr-dec  1.27x   1.28x
lrw-enc  1.12x   1.16x
lrw-dec  1.08x   1.10x
xts-enc  1.11x   1.15x
xts-dec  1.14x   1.15x

Intel Core2 T8100 (fam:6, model:23, step:6):

camellia-asm vs camellia_generic:
128bit key:                                             (lrw:256bit)    (xts:256bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.10x   1.12x   1.14x   1.16x   1.16x   1.15x   1.02x   1.02x   1.08x   1.08x
64B     1.61x   1.60x   1.17x   1.68x   1.67x   1.66x   1.43x   1.42x   1.44x   1.42x
256B    1.65x   1.73x   1.17x   1.77x   1.81x   1.80x   1.54x   1.53x   1.58x   1.54x
1024B   1.76x   1.74x   1.18x   1.80x   1.85x   1.85x   1.60x   1.59x   1.65x   1.60x
8192B   1.77x   1.75x   1.19x   1.81x   1.85x   1.86x   1.63x   1.61x   1.66x   1.62x

256bit key:                                             (lrw:384bit)    (xts:512bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.10x   1.07x   1.13x   1.16x   1.11x   1.16x   1.03x   1.02x   1.08x   1.07x
64B     1.61x   1.62x   1.15x   1.66x   1.63x   1.68x   1.47x   1.46x   1.47x   1.44x
256B    1.71x   1.70x   1.16x   1.75x   1.69x   1.79x   1.58x   1.57x   1.59x   1.55x
1024B   1.78x   1.72x   1.17x   1.75x   1.80x   1.80x   1.63x   1.62x   1.65x   1.62x
8192B   1.76x   1.73x   1.17x   1.78x   1.80x   1.81x   1.64x   1.62x   1.68x   1.64x

camellia-asm vs aes-asm (8kB block):
         128bit  256bit
ecb-enc  1.17x   1.21x
ecb-dec  1.17x   1.20x
cbc-enc  0.80x   0.82x
cbc-dec  1.22x   1.24x
ctr-enc  1.25x   1.26x
ctr-dec  1.25x   1.26x
lrw-enc  1.14x   1.18x
lrw-dec  1.13x   1.17x
xts-enc  1.14x   1.18x
xts-dec  1.14x   1.17x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-14 17:25:56 +08:00
Daniel J Blueman
fa63030e9c x86/platform: Move APIC ID validity check into platform APIC code
Move APIC ID validity check into platform APIC code, so it can
be overridden when needed. For NumaChip systems, always trust
MADT, as it's constructed with high APIC IDs.

Behaviour verifies on standard x86 systems and on NumaChip
systems with this, and compile-tested with allyesconfig.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Reviewed-by: Steffen Persvold <sp@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1331709454-27966-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 09:49:48 +01:00
Ingo Molnar
c96a987669 Linux 3.3-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPW8yUAAoJEHm+PkMAQRiGhFIH/RGUPxGmUkJv8EP5I4HDA4dJ
 c6/PrzZCHs8rxzYzvn7ojXqZGXTOAA5ZgS9A6LkJ2sxMFvgMnkpFi6B4CwMzizS3
 vLWo/HNxbiTCNGFfQrhQB8O58uNI8wOBa87lrQfkXkDqN0cFhdjtIxeY1BD9LXIo
 qbWysGxCcZhJWHapsQ3NZaVJQnIK5vA/+mhyYP4HzbcHI3aWnbIEZ8GQKeY28Ch0
 +pct5UQBjZavV5SujaW0Xd65oIiycm8XHAQw6FxQy//DfaabauWgFteR162Q/oew
 xxUBDOHF3nO1bdteHHaYqxig0j1MbIHsqxTnE/neR8UryF04//1SFF7DYuY/1pg=
 =SV5V
 -----END PGP SIGNATURE-----

Merge tag 'v3.3-rc7' into x86/platform

Merge reason: Update to the almost-final v3.3 kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 09:48:16 +01:00
Ingo Molnar
ea281a9eba Two miscellaneous MCE fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJPX7q/AAoJEKurIx+X31iBIHIQAIjXmz+5cCp/sDjtcoiPvcvu
 0OLspJbT4jswGdiFP+Xvn7MEJH2Q+Bj8ejejCrpyXG4R1v/K9zmzDwyAGaJbLiGw
 T5I9L9/GesYhCoIzyT7IfyNCtjsXMRmdmxLbxvyWxXiw/tT355Reb8Pqsq3Xy9dA
 0swO69MC3oJcYG19pLgv+QrtZaK79aiJqe5YmDIIoLYo8jIhnza2bKn3CKvQLV34
 8jIKTyo31MOTVZW9qy+pt5kNrnUgtmMBtdEsfVQdrAxKyEXwBd0rFRLHZ0c8YQit
 QcI0/MyK5la3swtiGnEiJQOvGGP/VIfPklVdw3ahLAthYUC+mof6zDBqpSGheJBQ
 f0Tbm0k3uWzVVij6Au2i3xi+nKNiPtUlOCcE1EPFvOYh0QOSDy4TPdxRDCZHHPeA
 hc5nbLTqwxAc8BE3vnDySztCGoeA3/UEOba1Kc4meKyLma7pJ4eNEL9j40XInn/z
 JRNbfj4BuSgmpV5nZbsb9jU/f15OfOuwZ+UMz//PYuIEI6gK3aYXojc5x1Z++Dot
 G6GIF4zs1zkzOnvaNeB8KZkmtWvRl/7dZuEM5SMDU+V0lj5xDs3nM4AhYPYSAvX2
 knmj//Gr/NE/DpAH0+bZIqzz6eduGc8HV5Y0nWlWvPKO+qs4n9uE0/8VL+5lZcQ8
 kojVLi2WTLvhCsnxUV3z
 =G63l
 -----END PGP SIGNATURE-----

Merge tag 'mce-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce

Apply two miscellaneous MCE fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 07:44:48 +01:00
Ingo Molnar
cd593accdc Linux 3.3-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPW8yUAAoJEHm+PkMAQRiGhFIH/RGUPxGmUkJv8EP5I4HDA4dJ
 c6/PrzZCHs8rxzYzvn7ojXqZGXTOAA5ZgS9A6LkJ2sxMFvgMnkpFi6B4CwMzizS3
 vLWo/HNxbiTCNGFfQrhQB8O58uNI8wOBa87lrQfkXkDqN0cFhdjtIxeY1BD9LXIo
 qbWysGxCcZhJWHapsQ3NZaVJQnIK5vA/+mhyYP4HzbcHI3aWnbIEZ8GQKeY28Ch0
 +pct5UQBjZavV5SujaW0Xd65oIiycm8XHAQw6FxQy//DfaabauWgFteR162Q/oew
 xxUBDOHF3nO1bdteHHaYqxig0j1MbIHsqxTnE/neR8UryF04//1SFF7DYuY/1pg=
 =SV5V
 -----END PGP SIGNATURE-----

Merge tag 'v3.3-rc7' into x86/mce

Merge reason: Update from an ancient -rc1 base to an almost-final stable kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 07:44:11 +01:00
H. Peter Anvin
bb6fa8b275 x32: Fix stupid ia32/x32 inversion in the siginfo format
Fix a stray ! which flipped the sense if we were generating a signal
frame for ia32 vs. x32.

Introduced in:

e7084fd5 x32: Switch to a 64-bit clock_t

Reported-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gregory M. Lueck <gregory.m.lueck@intel.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com
2012-03-13 22:44:41 -07:00
Konrad Rzeszutek Wilk
a1f37788a6 tboot: Add return values for tboot_sleep
.. as appropiately. As tboot_sleep now returns values.
remove tboot_sleep_wrapper.

Suggested-and-Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Joseph Cihula <joseph.cihula@intel.com>
[v1: Return -1/0/+1 instead of ACPI_xx values]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-13 14:06:55 -04:00
Tang Liang
09f98a825a x86, acpi, tboot: Have a ACPI os prepare sleep instead of calling tboot_sleep.
The ACPI suspend path makes a call to tboot_sleep right before
it writes the PM1A, PM1B values. We replace the direct call to
tboot via an registration callback similar to __acpi_register_gsi.

CC: Len Brown <len.brown@intel.com>
Acked-by: Joseph Cihula <joseph.cihula@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
[v1: Added __attribute__ ((unused))]
[v2: Introduced a wrapper instead of changing tboot_sleep return values]
[v3: Added return value AE_CTRL_SKIP for acpi_os_sleep_prepare]
Signed-off-by: Tang Liang <liang.tang@oracle.com>
[v1: Fix compile issues on IA64 and PPC64]
[v2: Fix where __acpi_os_prepare_sleep==NULL and did not go in sleep properly]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-13 14:06:33 -04:00
Thomas Gleixner
df8d291f28 Merge branch 'linus' into irq/core
Reason: Get upstream fixes integrated before further modifications.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-13 16:35:16 +01:00
Srikar Dronamraju
ef334a20d8 x86: Move is_ia32_task to asm/thread_info.h from asm/compat.h
is_ia32_task() is useful even in !CONFIG_COMPAT cases - utrace will
use it for example. Hence move it to a more generic file: asm/thread_info.h

Also now is_ia32_task() returns true if CONFIG_X86_32 is defined.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313140303.17134.1401.sendpatchset@srdronam.in.ibm.com
[ Performed minor cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 16:31:09 +01:00
Salman Qazi
9993bc635d sched/x86: Fix overflow in cyc2ns_offset
When a machine boots up, the TSC generally gets reset.  However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel.  The computation of
cycns_offset in set_cyc2ns_scale is prone to an overflow, if the
machine has been up more than 208 days prior to the kexec.  The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.

We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.

Refactor code to share the calculation with the previous
fix in __cycles_2_ns().

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 16:27:51 +01:00
Ingo Molnar
47258cf3c4 Linux 3.3-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPW8yUAAoJEHm+PkMAQRiGhFIH/RGUPxGmUkJv8EP5I4HDA4dJ
 c6/PrzZCHs8rxzYzvn7ojXqZGXTOAA5ZgS9A6LkJ2sxMFvgMnkpFi6B4CwMzizS3
 vLWo/HNxbiTCNGFfQrhQB8O58uNI8wOBa87lrQfkXkDqN0cFhdjtIxeY1BD9LXIo
 qbWysGxCcZhJWHapsQ3NZaVJQnIK5vA/+mhyYP4HzbcHI3aWnbIEZ8GQKeY28Ch0
 +pct5UQBjZavV5SujaW0Xd65oIiycm8XHAQw6FxQy//DfaabauWgFteR162Q/oew
 xxUBDOHF3nO1bdteHHaYqxig0j1MbIHsqxTnE/neR8UryF04//1SFF7DYuY/1pg=
 =SV5V
 -----END PGP SIGNATURE-----

Merge tag 'v3.3-rc7' into sched/core

Merge reason: merge back final fixes, prepare for the merge window.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 16:26:52 +01:00
Srikar Dronamraju
51e7dc7011 x86: Rename trap_no to trap_nr in thread_struct
There are precedences of trap number being referred to as
trap_nr. However thread struct refers trap number as trap_no.
Change it to trap_nr.

Also use enum instead of left-over literals for trap values.

This is pure cleanup, no functional change intended.

Suggested-by: Ingo Molnar <mingo@eltu.hu>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com
[ Fixed the math-emu build ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 06:24:09 +01:00
Ingo Molnar
e898c67068 Merge branch 'x86/x32' into x86/cleanups
Merge reason: We are going to merge a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 05:54:41 +01:00
Suresh Siddha
73d63d038e x86/ioapic: Add register level checks to detect bogus io-apic entries
With the recent changes to clear_IO_APIC_pin() which tries to
clear remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.

Close look shows that these are related to bogus IO-APIC entries
which return's all 1's for their io-apic registers. And the
above mentioned error messages are benign. But kernel should
have ignored such io-apic's in the first place.

Check if register 0, 1, 2 of the listed io-apic are all 1's and
ignore such io-apic.

Reported-by: Álvaro Castillo <midgoon@gmail.com>
Tested-by: Jon Dufresne <jon@jondufresne.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: kernel-team@fedoraproject.org
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1331577393.31585.94.camel@sbsiddha-desk.sc.intel.com
[ Performed minor cleanup of affected code. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 05:52:02 +01:00
Ingo Molnar
bea95c152d Merge branch 'perf/hw-branch-sampling' into perf/core
Merge reason: The 'perf record -b' hardware branch sampling feature is ready for upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:47:05 +01:00
Peter Zijlstra
f9b4eeb809 perf/x86: Prettify pmu config literals
I got somewhat tired of having to decode hex numbers..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/n/tip-0vsy1sgywc4uar3mu1szm0rg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:54 +01:00
Ingo Molnar
35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
Peter Zijlstra
87e24f4b67 perf/x86: Fix local vs remote memory events for NHM/WSM
Verified using the below proglet.. before:

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
remote write

 Performance counter stats for './numa 0':

         2,101,554 node-stores
         2,096,931 node-store-misses

       5.021546079 seconds time elapsed

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
local write

 Performance counter stats for './numa 1':

           501,137 node-stores
               199 node-store-misses

       5.124451068 seconds time elapsed

After:

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
remote write

 Performance counter stats for './numa 0':

         2,107,516 node-stores
         2,097,187 node-store-misses

       5.012755149 seconds time elapsed

[root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
local write

 Performance counter stats for './numa 1':

         2,063,355 node-stores
               165 node-store-misses

       5.082091494 seconds time elapsed

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <dirent.h>
#include <signal.h>
#include <unistd.h>
#include <numaif.h>
#include <stdlib.h>

#define SIZE (32*1024*1024)

volatile int done;

void sig_done(int sig)
{
	done = 1;
}

int main(int argc, char **argv)
{
	cpu_set_t *mask, *mask2;
	size_t size;
	int i, err, t;
	int nrcpus = 1024;
	char *mem;
	unsigned long nodemask = 0x01; /* node 0 */
	DIR *node;
	struct dirent *de;
	int read = 0;
	int local = 0;

	if (argc < 2) {
		printf("usage: %s [0-3]\n", argv[0]);
		printf("  bit0 - local/remote\n");
		printf("  bit1 - read/write\n");
		exit(0);
	}

	switch (atoi(argv[1])) {
	case 0:
		printf("remote write\n");
		break;
	case 1:
		printf("local write\n");
		local = 1;
		break;
	case 2:
		printf("remote read\n");
		read = 1;
		break;
	case 3:
		printf("local read\n");
		local = 1;
		read = 1;
		break;
	}

	mask = CPU_ALLOC(nrcpus);
	size = CPU_ALLOC_SIZE(nrcpus);
	CPU_ZERO_S(size, mask);

	node = opendir("/sys/devices/system/node/node0/");
	if (!node)
		perror("opendir");
	while ((de = readdir(node))) {
		int cpu;

		if (sscanf(de->d_name, "cpu%d", &cpu) == 1)
			CPU_SET_S(cpu, size, mask);
	}
	closedir(node);

	mask2 = CPU_ALLOC(nrcpus);
	CPU_ZERO_S(size, mask2);
	for (i = 0; i < size; i++)
		CPU_SET_S(i, size, mask2);
	CPU_XOR_S(size, mask2, mask2, mask); // invert

	if (!local)
		mask = mask2;

	err = sched_setaffinity(0, size, mask);
	if (err)
		perror("sched_setaffinity");

	mem = mmap(0, SIZE, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	err = mbind(mem, SIZE, MPOL_BIND, &nodemask, 8*sizeof(nodemask), MPOL_MF_MOVE);
	if (err)
		perror("mbind");

	signal(SIGALRM, sig_done);
	alarm(5);

	if (!read) {
		while (!done) {
			for (i = 0; i < SIZE; i++)
				mem[i] = 0x01;
		}
	} else {
		while (!done) {
			for (i = 0; i < SIZE; i++)
				t += *(volatile char *)(mem + i);
		}
	}

	return 0;
}

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-tq73sxus35xmqpojf7ootxgs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:43:41 +01:00
Peter Zijlstra
5fbd036b55 sched: Cleanup cpu_active madness
Stepan found:

CPU0		CPUn

_cpu_up()
  __cpu_up()

		boostrap()
		  notify_cpu_starting()
		  set_cpu_online()
		  while (!cpu_active())
		    cpu_relax()

<PREEMPT-out>

smp_call_function(.wait=1)
  /* we find cpu_online() is true */
  arch_send_call_function_ipi_mask()

  /* wait-forever-more */

<PREEMPT-in>
		  local_irq_enable()

  cpu_notify(CPU_ONLINE)
    sched_cpu_active()
      set_cpu_active()

Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.

On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.

Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:43:15 +01:00
Grant Likely
e2aa417726 Linux 3.3-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPW8yUAAoJEHm+PkMAQRiGhFIH/RGUPxGmUkJv8EP5I4HDA4dJ
 c6/PrzZCHs8rxzYzvn7ojXqZGXTOAA5ZgS9A6LkJ2sxMFvgMnkpFi6B4CwMzizS3
 vLWo/HNxbiTCNGFfQrhQB8O58uNI8wOBa87lrQfkXkDqN0cFhdjtIxeY1BD9LXIo
 qbWysGxCcZhJWHapsQ3NZaVJQnIK5vA/+mhyYP4HzbcHI3aWnbIEZ8GQKeY28Ch0
 +pct5UQBjZavV5SujaW0Xd65oIiycm8XHAQw6FxQy//DfaabauWgFteR162Q/oew
 xxUBDOHF3nO1bdteHHaYqxig0j1MbIHsqxTnE/neR8UryF04//1SFF7DYuY/1pg=
 =SV5V
 -----END PGP SIGNATURE-----

Merge tag 'v3.3-rc7' into gpio/next

Linux 3.3-rc7.  Merged into the gpio branch to pick up gpio bugfixes already
in mainline before queueing up move v3.4 patches
2012-03-12 09:41:28 -06:00
Konrad Rzeszutek Wilk
73c154c60b xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
For the hypervisor to take advantage of the MWAIT support it needs
to extract from the ACPI _CST the register address. But the
hypervisor does not have the support to parse DSDT so it relies on
the initial domain (dom0) to parse the ACPI Power Management information
and push it up to the hypervisor. The pushing of the data is done
by the processor_harveset_xen module which parses the information that
the ACPI parser has graciously exposed in 'struct acpi_processor'.

For the ACPI parser to also expose the Cx states for MWAIT, we need
to expose the MWAIT capability (leaf 1). Furthermore we also need to
expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
function.

The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
operations, but it can't do it since it needs to be backwards compatible.
Instead we choose to use the native CPUID to figure out if the MWAIT
capability exists and use the XEN_SET_PDC query hypercall to figure out
if the hypervisor wants us to expose the MWAIT_LEAF capability or not.

Note: The XEN_SET_PDC query was implemented in c/s 23783:
"ACPI: add _PDC input override mechanism".

With this in place, instead of
 C3 ACPI IOPORT 415
we get now
 C3:ACPI FFH INTEL MWAIT 0x20

Note: The cpu_idle which would be calling the mwait variants for idling
never gets set b/c we set the default pm_idle to be the hypercall variant.

Acked-by: Jan Beulich <JBeulich@suse.com>
[v2: Fix missing header file include and #ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-10 12:44:44 -05:00
Konrad Rzeszutek Wilk
cc7335b2f6 xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
We needed that call in the past to force the kernel to use
default_idle (which called safe_halt, which called xen_safe_halt).

But set_pm_idle_to_default() does now that, so there is no need
to use this boot option operand.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-10 12:44:09 -05:00
Kirill A. Shutemov
026abc3332 gma500: initial medfield merge
We need to merge this ahead of some of the cleanup because a lot of needed
cleanup spans both new and old chips. If we try and clean up and the merge
we end up fighting ourselves.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[With a load of the cleanup stuff folded in, register stuff reworked sanely]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-10 13:05:48 +00:00
Kees Cook
c94082656d x86: Use enum instead of literals for trap values
The traps are referred to by their numbers and it can be difficult to
understand them while reading the code without context. This patch adds
enumeration of the trap numbers and replaces the numbers with the correct
enum for x86.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20120310000710.GA32667@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-03-09 16:47:54 -08:00
Thomas Gleixner
a7f4255f90 x86: Derandom delay_tsc for 64 bit
Commit f0fbf0abc093 ("x86: integrate delay functions") converted
delay_tsc() into a random delay generator for 64 bit.  The reason is
that it merged the mostly identical versions of delay_32.c and
delay_64.c.  Though the subtle difference of the result was:

 static void delay_tsc(unsigned long loops)
 {
-	unsigned bclock, now;
+	unsigned long bclock, now;

Now the function uses rdtscl() which returns the lower 32bit of the
TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64
bit this fails when the lower 32bit are close to wrap around when
bclock is read, because the following check

       if ((now - bclock) >= loops)
       	  	break;

evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0
because the unsigned long (now - bclock) of these values results in
0xffffffff00000001 which is definitely larger than the loops
value. That explains Tvortkos observation:

"Because I am seeing udelay(500) (_occasionally_) being short, and
 that by delaying for some duration between 0us (yep) and 491us."

Make those variables explicitely u32 again, so this works for both 32
and 64 bit.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # >= 2.6.27
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-09 12:43:27 -08:00
Greg Kroah-Hartman
263a5c8e16 Merge 3.3-rc6 into driver-core-next
This was done to resolve a conflict in the drivers/base/cpu.c file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-09 12:35:53 -08:00
Linus Torvalds
bfcfaa77bd vfs: use 'unsigned long' accesses for dcache name comparison and hashing
Ok, this is hacky, and only works on little-endian machines with goo
unaligned handling.  And even then only with CONFIG_DEBUG_PAGEALLOC
disabled, since it can access up to 7 bytes after the pathname.

But it runs like a bat out of hell.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-08 18:08:44 -08:00
Nadav Har'El
9587190107 KVM: nVMX: Fix erroneous exception bitmap check
The code which checks whether to inject a pagefault to L1 or L2 (in
nested VMX) was wrong, incorrect in how it checked the PF_VECTOR bit.
Thanks to Dan Carpenter for spotting this.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:14:23 +02:00
Nicolae Mogoreanu
a223c313cb KVM: Ignore the writes to MSR_K7_HWCR(3)
When CPUID Fn8000_0001_EAX reports 0x00100f22 Windows 7 x64 guest
tries to set bit 3 in MSRC001_0015 in nt!KiDisableCacheErrataSource
and fails. This patch will ignore this step and allow things to move
on without having to fake CPUID value.

Signed-off-by: Nicolae Mogoreanu <mogoreanu@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:14:14 +02:00
Davidlohr Bueso
4d6931c380 KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask
The reset_rsvds_bits_mask() function can use the guest walker's root level
number instead of using a separate 'level' variable.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:13:54 +02:00
Gleb Natapov
62079d8a43 KVM: PMU: add proper support for fixed counter 2
Currently pmu emulation emulates fixed counter 2 as bus cycles
architectural counter, but since commit 9c1497ea591b25d perf has
pseudo encoding for it. Use it.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:13:34 +02:00
Gleb Natapov
fac3368310 KVM: PMU: Fix raw event check
If eventsel has EDGE, INV or CMASK set we should create raw counter for
it, but the check is done on a wrong variable. Fix it.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:13:26 +02:00
Gleb Natapov
a7b9d2ccc3 KVM: PMU: warn when pin control is set in eventsel msr
Print warning once if pin control bit is set in eventsel msr since
emulation does not support it yet.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:13:18 +02:00
Avi Kivity
9ee73970c0 KVM: VMX: Fix delayed load of shared MSRs
Shared MSRs (MSR_*STAR and related) are stored in both vmx->guest_msrs
and in the CPU registers, but vmx_set_msr() only updated memory. Prior
to 46199f33c2953, this didn't matter, since we called vmx_load_host_state(),
which scheduled a vmx_save_host_state(), which re-synchronized the CPU
state, but now we don't, so the CPU state will not be synchronized until
the next exit to host userspace.  This mostly affects nested vmx workloads,
which play with these MSRs a lot.

Fix by loading the MSR eagerly.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:11:55 +02:00
Jan Kiszka
07700a94b0 KVM: Allow host IRQ sharing for assigned PCI 2.3 devices
PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share legacy IRQs of such devices with other host devices
when passing them to a guest.

The new IRQ sharing feature introduced here is optional, user space has
to request it explicitly. Moreover, user space can inform us about its
view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
interrupt and signaling it if the guest masked it via the virtualized
PCI config space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:11:36 +02:00