sendmmsg is reachable via the socket system call. We don't enable a second
way on s390 to reach the same system call.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use ctl_set_bit instead of the smp_ctl_set_bit (likewise for clear bit)
to prevent the following build error for !CONFIG_SMP:
CC arch/s390/oprofile/hwsampler.o
arch/s390/oprofile/hwsampler.c: In function ‘hwsampler_deallocate’:
arch/s390/oprofile/hwsampler.c:1012: error: implicit declaration of function ‘smp_ctl_clear_bit’
arch/s390/oprofile/hwsampler.c: In function ‘hwsampler_start_all’:
arch/s390/oprofile/hwsampler.c:1201: error: implicit declaration of function ‘smp_ctl_set_bit’
CC kernel/seccomp.o
make[1]: *** [arch/s390/oprofile/hwsampler.o] Error 1
make: *** [arch/s390/oprofile] Error 2
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The counter for requested interrupts should be incremented if the
program-request-alert bit is set and not the invalid-address-entry
bit.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove unsused includes from arch/s390/kernel/process.c.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rework the architecture page table functions to access the bits in the
page table extension array (pgste). There are a number of changes:
1) Fix missing pgste update if the attach_count for the mm is <= 1.
2) For every operation that affects the invalid bit in the pte or the
rcp byte in the pgste the pcl lock needs to be acquired. The function
pgste_get_lock gets the pcl lock and returns the current pgste value
for a pte pointer. The function pgste_set_unlock stores the pgste
and releases the lock. Between these two calls the bits in the pgste
can be shuffled.
3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid
calling SetPageDirty and SetPageReferenced from pgtable.h. If the
host reference backup bit or the host change backup bit has been
set the dirty/referenced state is transfered to the pte. The common
code will pick up the state from the pte.
4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel
pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
6) Rename kvm_s390_test_and_clear_page_dirty to
ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
7) Define mm_exclusive() and mm_has_pgste() helper to improve readability.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The page_clear_dirty primitive always sets the default storage key
which resets the access control bits and the fetch protection bit.
That will surprise a KVM guest that sets non-zero access control
bits or the fetch protection bit. Merge page_test_dirty and
page_clear_dirty back to a single function and only clear the
dirty bit from the storage key.
In addition move the function page_test_and_clear_dirty and
page_test_and_clear_young to page.h where they belong. This
requires to change the parameter from a struct page * to a page
frame number.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On cpu hot remove a PFAULT CANCEL command is sent to the hypervisor
which in turn will cancel all outstanding pfault requests that have
been issued on that cpu (the same happens with a SIGP cpu reset).
The result is that we end up with uninterruptible processes where
the interrupt that would wake up these processes never arrives.
In order to solve this all processes which wait for a pfault
completion interrupt get woken up after a cpu hot remove. The worst
case that could happen is that they fault again and in turn need to
wait again.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add missing __noreturn attribute to cpu_die():
arch/s390/kernel/smp.c:691:6: error: symbol 'cpu_die' redeclared with different type
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement arch specific irqsafe_cpu ops. The arch specific ops do not
disable/enable interrupts since that is an expensive operation. Instead
we disable preemption and perform a compare and swap loop.
Since on server distros (the ones we care about) preemption is disabled
the preempt_disable()/preempt_enable() pair is a nop.
In the end this code should be faster than the generic one.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The concepts of VDSO and gcov-based profiling don't mix: the former
includes kernel-provided code running in userspace, the latter adds
instructions that modify counters in kernel data segments. On s390
this has not been a problem so far due to VDSO code being written in
all-assembler which is exempt from gcov-based profiling. This could
change in the future, so disable profiling excplicitly for VDSO code.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Get rid of these:
arch/s390/mm/extmem.c: In function 'segment_modify_shared':
arch/s390/mm/extmem.c:622:3: warning: 'end_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c:627:18: warning: 'start_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c: In function 'segment_load':
arch/s390/mm/extmem.c:481:11: warning: 'end_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c:480:18: warning: 'start_addr' may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The noexec support on s390 does not rely on a bit in the page table
entry but utilizes the secondary space mode to distinguish between
memory accesses for instructions vs. data. The noexec code relies
on the assumption that the cpu will always use the secondary space
page table for data accesses while it is running in the secondary
space mode. Up to the z9-109 class machines this has been the case.
Unfortunately this is not true anymore with z10 and later machines.
The load-relative-long instructions lrl, lgrl and lgfrl access the
memory operand using the same addressing-space mode that has been
used to fetch the instruction.
This breaks the noexec mode for all user space binaries compiled
with march=z10 or later. The only option is to remove the current
noexec support.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Some modules may end up with R_SH_NONE relocs with the right combination
of compiler/kernel config (specifically dwarf unwinder), so simply trap
and ignore them instead of letting them get down to the error path.
Reported-by: Carmelo AMOROSO <carmelo.amoroso@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Both warning and warning_symbol are nowhere used.
Let's get rid of them.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (28 commits)
sparc32: fix build, fix missing cpu_relax declaration
SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI
sparc32,leon: Remove unnecessary page_address calls in LEON DMA API.
sparc: convert old cpumask API into new one
sparc32, sun4d: Implemented SMP IPIs support for SUN4D machines
sparc32, sun4m: Implemented SMP IPIs support for SUN4M machines
sparc32,leon: Implemented SMP IPIs for LEON CPU
sparc32: implement SMP IPIs using the generic functions
sparc32,leon: SMP power down implementation
sparc32,leon: added some SMP comments
sparc: add {read,write}*_be routines
sparc32,leon: don't rely on bootloader to mask IRQs
sparc32,leon: operate on boot-cpu IRQ controller registers
sparc32: always define boot_cpu_id
sparc32: removed unused code, implemented by generic code
sparc32: avoid build warning at mm/percpu.c:1647
sparc32: always register a PROM based early console
sparc32: probe for cpu info only during startup
sparc: consolidate show_cpuinfo in cpu.c
sparc32,leon: implement genirq CPU affinity
...
The setup_smep function gets calle at resume time too, and is thus not a
pure __init function. When marked as __init, it gets thrown out after
the kernel has initialized, and when the kernel is suspended and
resumed, the code will no longer be around, and we'll get a nice "kernel
tried to execute NX-protected page" oops because the page is no longer
marked executable.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following sparc (32 bit) build error:
CC arch/sparc/kernel/asm-offsets.s
In file included from include/linux/seqlock.h:29:0,
from include/linux/time.h:8,
from include/linux/timex.h:56,
from include/linux/sched.h:57,
from arch/sparc/kernel/asm-offsets.c:13:
include/linux/spinlock.h: In function 'spin_unlock_wait':
include/linux/spinlock.h:360:2: error: implicit declaration of function 'cpu_relax'
Most likely caused by commit e66eed651fd1 ("list: remove
prefetching from regular list iterators") due to include
changes.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use an existing local variable, instead of calculating the pointer
multiple times explicitly.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Simon Horman <horms@verge.net.au>
Cc: Magnus Damm <damm@opensource.se>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
[PARISC] wire up syncfs syscall
[PARISC] wire up the fhandle syscalls
[PARISC] wire up clock_adjtime syscall
[PARISC] wire up fanotify syscalls
[PARISC] prevent speculative re-read on cache flush
[PARISC] only make executable areas executable
[PARISC] fix pacache .size with new binutils
The address of the gpte was already calculated and stored in ptep_user
before entering cmpxchg_gpte().
This patch makes cmpxchg_gpte() to use that to make it clear that we
are using the same address during walk_addr_generic().
Note that the unlikely annotations are used to show that the conditions
are something unusual rather than for performance.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We introduce em_jmp_far().
We also call this from em_grp45() to stop treating modrm_reg == 5 case
separately in the group 5 emulation.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
The prototypes are changed appropriately.
We also replaces "goto grp45;" with simple em_grp45() call.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
The opt of emulate_grp1a() is also removed.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
In addition, one comma at the end of a statement is replaced with a
semicolon.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
This way, we can avoid checking the user space address many times when
we read the guest memory.
Although we can do the same for write if we check which slots are
writable, we do not care write now: reading the guest memory happens
more often than writing.
[avi: change VERIFY_READ to VERIFY_WRITE]
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we optimized walk_addr_generic() by not using the generic guest
memory reader, we replaced copy_from_user() with get_user():
commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
KVM: MMU: Optimize guest page table walk
commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e
KVM: MMU: Fix 64-bit paging breakage on x86_32
But as Andi pointed out later, copy_from_user() does the same as
get_user() as long as we give a constant size to it.
So we use copy_from_user() to clean up the code.
The only, noticeable, regression introduced by this is 64-bit gpte
reading on x86_32 hosts needed for PAE guests.
But this can be mitigated by implementing 8-byte get_user() for x86_32,
if needed.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Linux doesn't use USPRG0 (now renamed VRSAVE in the architecture, even
when Altivec isn't involved), but a guest might.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Convert to microseconds when displaying
(with fix from Bharat Bhushan <Bharat.Bhushan@freescale.com>).
This reduces rounding error with large quantities of short exits.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The exit type setting for mfspr/mtspr is moved from 44x to toplevel SPR
emulation. This enables it on e500, and makes sure that all SPRs
are covered.
Exit accounting for tlbwe and tlbsx is added to e500.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Return the actual host SVR for now, as we already do for PVR. Eventually
we may support Qemu overriding PVR/SVR if the situation is appropriate,
once we implement KVM_SET_SREGS on e500.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since the emulator now checks segment limits and access rights, it
generates a lot more accesses to the vmcs segment fields. Undo some
of the performance hit by cacheing those fields in a read-only cache
(the entire cache is invalidated on any write, or on guest exit).
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of separate accessors for the segment selector and cached descriptor,
use one accessor for both. This simplifies the code somewhat.
Signed-off-by: Avi Kivity <avi@redhat.com>
dump_vmcb isn't used outside this module, make it static.
Shrink text and object by ~1% by standardizing formats.
$ size arch/x86/kvm/svm.o*
text data bss dec hex filename
52910 580 10072 63562 f84a arch/x86/kvm/svm.o.new
53563 580 10072 64215 fad7 arch/x86/kvm/svm.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix regression introduced by
commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
KVM: MMU: Optimize guest page table walk
On x86_32, get_user() does not support 64-bit values and we fail to
build KVM at the point of 64-bit paging.
This patch fixes this by using get_user() twice for that condition.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch fixes some sparse warning about "dubious one-bit signed bitfield."
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Originally-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The CPUIDs for Centaur are added, and then the features of
PadLock hardware engine on VIA CPU, such as "ace", "ace_en"
and so on, can be passed into the kvm guest.
Signed-off-by: Brilly Wu <brillywu@viatech.com.cn>
Signed-off-by: Kary Jin <karyjin@viatech.com.cn>
Signed-off-by: Avi Kivity <avi@redhat.com>
mmio_index should be taken into account when copying data from
userspace.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>