10253 Commits

Author SHA1 Message Date
Eric Paris
11637e4b7d fanotify: fanotify_init syscall declaration
This patch defines a new syscall fanotify_init() of the form:

int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
		      unsigned int priority)

This syscall is used to create and fanotify group.  This is very similar to
the inotify_init() syscall.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:55 -04:00
Andreas Gruenbacher
3556608709 fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call.  Take the lock inside
fsnotify_find_mark_entry() instead.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:54 -04:00
Eric Paris
d07754412f fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark
the _entry portion of fsnotify functions is useless.  Drop it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Eric Paris
e61ce86737 fsnotify: rename fsnotify_mark_entry to just fsnotify_mark
The name is long and it serves no real purpose.  So rename
fsnotify_mark_entry to just fsnotify_mark.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Eric Paris
2823e04de4 fsnotify: put inode specific fields in an fsnotify_mark in a union
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy.  This patch just implements the
inode struct and the union.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris
3a9fb89f4c fsnotify: include vfsmount in should_send_event when appropriate
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases.  We pass a vfsmount to this function so groups can make
their own determinations.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris
0d2e2a1d00 fsnotify: drop mask argument from fsnotify_alloc_group
Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
that argument.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris
220d14df0d Audit: only set group mask when something is being watched
Currently the audit watch group always sets a mask equal to all events it
might care about.  We instead should only set the group mask if we are
actually watching inodes.  This should be a perf win when audit watches are
compiled in.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris
ffab83402f fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
fsnotify_obtain_group was intended to be able to find an already existing
group.  Nothing uses that functionality.  This just renames it to
fsnotify_alloc_group so it is clear what it is doing.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris
74be0cc828 fsnotify: remove group_num altogether
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added.  I no longer think this is a
necessary thing to do and so we remove the group_num.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris
8112e2d6a7 fsnotify: include data in should_send calls
fanotify is going to need to look at file->private_data to know if an event
should be sent or not.  This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris
7b0a04fbfb fsnotify: provide the data type to should_send_event
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener.  Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris
2dfc1cae4c inotify: remove inotify in kernel interface
nothing uses inotify in the kernel, drop it!

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris
1a3aedbce4 Audit: audit watch init should not be before fsnotify init
Audit watch init and fsnotify init both use subsys_initcall() but since the
audit watch code is linked in before the fsnotify code the audit watch code
would be using the fsnotify srcu struct before it was initialized.  This
patch fixes that problem by moving audit watch init to device_initcall() so
it happens after fsnotify is ready.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by : Sachin Sant <sachinp@in.ibm.com>
2010-07-28 09:58:19 -04:00
Eric Paris
939a67fc4c Audit: split audit watch Kconfig
Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select
FSNOTIFY.  This splits the spagetti like mixing of audit_watch and
audit_filter code so they can be configured seperately.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:19 -04:00
Eric Paris
28a3a7eb3b audit: reimplement audit_trees using fsnotify rather than inotify
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris
40554c3dae fsnotify: allow addition of duplicate fsnotify marks
This patch allows a task to add a second fsnotify mark to an inode for the
same group.  This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function.   This
is useful if a user wants to add a new mark before removing the old one.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris
a05fb6cc57 audit: do not get and put just to free a watch
deleting audit watch rules is not currently done under audit_filter_mutex.
It was done this way because we could not hold the mutex during inotify
manipulation.  Since we are using fsnotify we don't need to do the extra
get/put pair nor do we need the private list on which to store the parents
while they are about to be freed.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris
e118e9c563 audit: redo audit watch locking and refcnt in light of fsnotify
fsnotify can handle mutexes to be held across all fsnotify operations since
it deals strickly in spinlocks.  This can simplify and reduce some of the
audit_filter_mutex taking and dropping.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:16 -04:00
Eric Paris
e9fd702a58 audit: convert audit watches to use fsnotify instead of inotify
Audit currently uses inotify to pin inodes in core and to detect when
watched inodes are deleted or unmounted.  This patch uses fsnotify instead
of inotify.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:16 -04:00
Eric Paris
ae7b8f4108 Audit: clean up the audit_watch split
No real changes, just cleanup to the audit_watch split patch which we done
with minimal code changes for easy review.  Now fix interfaces to make
things work better.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:16 -04:00
Sridhar Samudrala
d7926ee38f cgroups: Add an API to attach a task to current task's cgroup
Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2010-07-28 15:45:12 +03:00
Jason Baron
b82bab4bbe dynamic debug: move ddebug_remove_module() down into free_module()
The command

	echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

causes an oops.

Move the call to ddebug_remove_module() down into free_module().  In this
way it should be called from all error paths.  Currently, we are missing
the remove if the module init routine fails.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Cc: <stable@kernel.org>		[2.6.32+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-27 14:32:06 -07:00
John Stultz
852db46d55 clocksource: Add __clocksource_updatefreq_hz/khz methods
To properly handle clocksources that change frequencies
at the clocksource->enable() point, this patch adds
a method that will update the clocksource's mult/shift and
max_idle_ns values.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-12-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
0fb86b0629 timekeeping: Make xtime and wall_to_monotonic static
This patch makes xtime and wall_to_monotonic static, as planned in
Documentation/feature-removal-schedule.txt. This will allow for
further cleanups to the timekeeping core.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-10-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
8ab4351a4c hrtimer: Cleanup direct access to wall_to_monotonic
Provides an accessor function to replace hrtimer.c's
direct access of wall_to_monotonic.

This will allow wall_to_monotonic to be made static as
planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz
7615856ebf timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
update_vsyscall() did not provide the wall_to_monotoinc offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz
592913ecb8 time: Kill off CONFIG_GENERIC_TIME
Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz
ce3bf7ab22 time: Implement timespec_add
After accidentally misusing timespec_add_safe, I wanted to make sure
we don't accidently trip over that issue again, so I created a simple
timespec_add() function which we can use to replace the instances
of timespec_add_safe() that don't want the overflow detection.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:53 +02:00
Steffen Klassert
7424713b83 padata: Check for valid cpumasks
Now that we allow to change the cpumasks from userspace, we have
to check for valid cpumasks in padata_do_parallel. This patch adds
the necessary check. This fixes a division by zero crash if the
parallel cpumask contains no active cpu.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:58 +08:00
Steffen Klassert
b89661dff5 padata: Allocate cpumask dependend recources in any case
The cpumask separation work assumes the cpumask dependend recources
present regardless of valid or invalid cpumasks. With this patch
we allocate the cpumask dependend recources in any case. This fixes
two NULL pointer dereference crashes in padata_replace and in
padata_get_cpumask.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:58 +08:00
Steffen Klassert
fad3a906d3 padata: Fix cpu index counting
The counting of the cpu index got lost with a recent commit.
This patch restores it. This fixes a hang of the parallel worker
threads on cpu hotplug.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26 14:13:57 +08:00
Andrey Vagin
2b08de0073 posix_timer: Move copy_to_user(created_timer_id) down in timer_create()
According to Oleg Nesterov:
We can move copy_to_user(created_timer_id) down after
"if (timer_event_spec)" block too. (but before CLOCK_DISPATCH(),
of course).

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
Patrick Pannuto
22b8f15c2f timer: Added usleep[_range] timer
usleep[_range] are finer precision implementations of msleep
and are designed to be drop-in replacements for udelay where
a precise sleep / busy-wait is unnecessary. They also allow
an easy interface to specify slack when a precise (ish)
wakeup is unnecessary to help minimize wakeups

Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: akinobu.mita@gmail.com
Cc: sboyd@codeaurora.org
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <4C44CDD2.1070708@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
J. Bruce Fields
866e26115c timers: Document meaning of deferrable timer
Steal some text from 6e453a67510 "Add support for deferrable timers".  A
reader shouldn't have to dig through the git logs for the basic
description of a deferrable timer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: johnstul@us.ibm.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
Tejun Heo
181a51f6e0 slow-work: kill it
slow-work doesn't have any user left.  Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-23 13:14:49 +02:00
Ingo Molnar
3a01736e70 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-07-23 09:10:29 +02:00
Tejun Heo
e120153ddf workqueue: fix how cpu number is stored in work->data
Once a work starts execution, its data contains the cpu number it was
on instead of pointing to cwq.  This is added by commit 7a22ad75
(workqueue: carry cpu number in work data once execution starts) to
reliably determine the work was last on even if the workqueue itself
was destroyed inbetween.

Whether data points to a cwq or contains a cpu number was
distinguished by comparing the value against PAGE_OFFSET.  The
assumption was that a cpu number should be below PAGE_OFFSET while a
pointer to cwq should be above it.  However, on architectures which
use separate address spaces for user and kernel spaces, this doesn't
hold as PAGE_OFFSET is zero.

Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the
data field contains.  If the flag is set, it's pointing to a cwq;
otherwise, it contains a cpu number.

Reported on s390 and microblaze during linux-next testing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Michal Simek <michal.simek@petalogix.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Michal Simek <monstr@monstr.eu>
2010-07-22 22:39:22 +02:00
Dan Carpenter
24a461d537 trace: strlen() return doesn't account for the NULL
We need to add one to the strlen() return because of the NULL
character.  The type->name here generally comes from the kernel and I
don't think any of them come close to being MAX_TRACER_SIZE (100)
characters long so this is basically a cleanup.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100710100644.GV19184@bicker>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-22 14:56:41 -04:00
Jason Wessel
edd63cb6b9 sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-07-21 19:27:07 -05:00
Jason Wessel
b0679c63db debug_core,kdb: fix kgdb_connected bit set in the wrong place
Immediately following an exit from the kdb shell the kgdb_connected
variable should be set to zero, unless there are breakpoints planted.
If the kgdb_connected variable is not zeroed out with kdb, it is
impossible to turn off kdb.

This patch is merely a work around for now, the real fix will check
for the breakpoints.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21 19:27:07 -05:00
Jason Wessel
9e8b624fca Fix merge regression from external kdb to upstream kdb
In the process of merging kdb to the mainline, the kdb lsmod command
stopped printing the base load address of kernel modules.  This is
needed for using kdb in conjunction with external tools such as gdb.

Simply restore the functionality by adding a kdb_printf for the base
load address of the kernel modules.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21 19:27:06 -05:00
Jason Wessel
fb82c0ff27 repair gdbstub to match the gdbserial protocol specification
The gdbserial protocol handler should return an empty packet instead
of an error string when ever it responds to a command it does not
implement.

The problem cases come from a debugger client sending
qTBuffer, qTStatus, qSearch, qSupported.

The incorrect response from the gdbstub leads the debugger clients to
not function correctly.  Recent versions of gdb will not detach correctly as a result of this behavior.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
2010-07-21 19:27:05 -05:00
Martin Hicks
1396a21ba0 kdb: break out of kdb_ll() when command is terminated
Without this patch the "ll" linked-list traversal command won't
terminate when you hit q/Q.

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21 19:27:05 -05:00
Josh Hunt
eebef74695 sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
The sched_child_runs_first value in /proc/sched_debug is
currently displayed using a macro meant to split ns time values.
This patch uses the correct macro to display it as a plain
decimal value.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: peterz@infradead.org
Cc: juhlenko@akamai.com
Cc: efault@gmx.de
LKML-Reference: <1279567876-25418-1-git-send-email-johunt@akamai.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:46:12 +02:00
Ingo Molnar
dca45ad8af Merge branch 'linus' into sched/core
Merge reason: Move from the -rc3 to the almost-rc6 base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:45:08 +02:00
Ingo Molnar
23c2875725 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-07-21 21:44:18 +02:00
Ingo Molnar
9dcdbf7a33 Merge branch 'linus' into perf/core
Merge reason: Pick up the latest perf fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:43:06 +02:00
KOSAKI Motohiro
ef710e100c tracing: Shrink max latency ringbuffer if unnecessary
Documentation/trace/ftrace.txt says

  buffer_size_kb:

        This sets or displays the number of kilobytes each CPU
        buffer can hold. The tracer buffers are the same size
        for each CPU. The displayed number is the size of the
        CPU buffer and not total size of all buffers. The
        trace buffers are allocated in pages (blocks of memory
        that the kernel uses for allocation, usually 4 KB in size).
        If the last page allocated has room for more bytes
        than requested, the rest of the page will be used,
        making the actual allocation bigger than requested.
        ( Note, the size may not be a multiple of the page size
          due to buffer management overhead. )

        This can only be updated when the current_tracer
        is set to "nop".

But it's incorrect. currently total memory consumption is
'buffer_size_kb x CPUs x 2'.

Why two times difference is there? because ftrace implicitly allocate
the buffer for max latency too.

That makes sad result when admin want to use large buffer. (If admin
want full logging and makes detail analysis). example, If admin
have 24 CPUs machine and write 200MB to buffer_size_kb, the system
consume ~10GB memory (200MB x 24 x 2). umm.. 5GB memory waste is
usually unacceptable.

Fortunatelly, almost all users don't use max latency feature.
The max latency buffer can be disabled easily.

This patch shrink buffer size of the max latency buffer if
unnecessary.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-21 10:20:17 -04:00
Lai Jiangshan
bc289ae98b tracing: Reduce latency and remove percpu trace_seq
__print_flags() and __print_symbolic() use percpu trace_seq:

1) Its memory is allocated at compile time, it wastes memory if we don't use tracing.
2) It is percpu data and it wastes more memory for multi-cpus system.
3) It disables preemption when it executes its core routine
   "trace_seq_printf(s, "%s: ", #call);" and introduces latency.

So we move this trace_seq to struct trace_iterator.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 22:05:34 -04:00