4517 Commits

Author SHA1 Message Date
Ingo Molnar
5806b81ac1 Merge branch 'auto-ftrace-next' into tracing/for-linus
Conflicts:

	arch/x86/kernel/entry_32.S
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c
	arch/x86/lib/Makefile
	include/asm-x86/irqflags.h
	kernel/Makefile
	kernel/sched.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 16:11:52 +02:00
Ingo Molnar
d14c8a680c Merge branch 'sched/for-linus' into tracing/for-linus 2008-07-14 16:11:02 +02:00
Ingo Molnar
6712e299b7 Merge branch 'tracing/ftrace' into auto-ftrace-next 2008-07-14 15:58:35 +02:00
Ingo Molnar
873a6ed628 Merge commit 'v2.6.26' into sched/devel 2008-07-14 12:19:19 +02:00
Ingo Molnar
361833efac Merge branch 'sched/clock' into sched/devel 2008-07-14 12:19:13 +02:00
Ingo Molnar
d12c1a3792 lockdep: fix kernel/fork.c warning
fix:

[    0.184011] ------------[ cut here ]------------
[    0.188011] WARNING: at kernel/fork.c:918 copy_process+0x1c0/0x1084()
[    0.192011] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00351-g01d4a50-dirty #14521
[    0.196011]  [<c0135d48>] warn_on_slowpath+0x3c/0x60
[    0.200012]  [<c016f805>] ? __alloc_pages_internal+0x92/0x36b
[    0.208012]  [<c033de5e>] ? __spin_lock_init+0x24/0x4a
[    0.212012]  [<c01347e3>] copy_process+0x1c0/0x1084
[    0.216013]  [<c013575f>] do_fork+0xb8/0x1ad
[    0.220013]  [<c034f75e>] ? acpi_os_release_lock+0x8/0xa
[    0.228013]  [<c034ff7a>] ? acpi_os_vprintf+0x20/0x24
[    0.232014]  [<c01129ee>] kernel_thread+0x75/0x7d
[    0.236014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
[    0.240014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
[    0.244014]  [<c01151b0>] ? kernel_thread_helper+0x0/0x10
[    0.252015]  [<c06c6ac0>] rest_init+0x14/0x50
[    0.256015]  [<c0a498ce>] start_kernel+0x2b9/0x2c0
[    0.260015]  [<c0a4904f>] __init_begin+0x4f/0x57
[    0.264016]  =======================
[    0.268016] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.272016] enabled ExtINT on CPU#0

which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
but CONFIG_PROVE_LOCKING is disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 12:09:28 +02:00
Ingo Molnar
d59fdcf2ac Merge commit 'v2.6.26' into x86/core 2008-07-14 11:37:46 +02:00
Ingo Molnar
992860e991 lockdep: fix ftrace irq tracing false positive
fix this false positive:

[    0.020000] ------------[ cut here ]------------
[    0.020000] WARNING: at kernel/lockdep.c:2718 check_flags+0x14a/0x170()
[    0.020000] Modules linked in:
[    0.020000] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00343-gd7e5521-dirty #14486
[    0.020000]  [<c01312e4>] warn_on_slowpath+0x54/0x80
[    0.020000]  [<c067e451>] ? _spin_unlock_irqrestore+0x61/0x70
[    0.020000]  [<c0131bb1>] ? release_console_sem+0x201/0x210
[    0.020000]  [<c0143d65>] ? __kernel_text_address+0x35/0x40
[    0.020000]  [<c010562e>] ? dump_trace+0x5e/0x140
[    0.020000]  [<c01518b5>] ? __lock_acquire+0x245/0x820
[    0.020000]  [<c015063a>] check_flags+0x14a/0x170
[    0.020000]  [<c0151ed8>] ? lock_acquire+0x48/0xc0
[    0.020000]  [<c0151ee1>] lock_acquire+0x51/0xc0
[    0.020000]  [<c014a16c>] ? down+0x2c/0x40
[    0.020000]  [<c010a609>] ? sched_clock+0x9/0x10
[    0.020000]  [<c067e7b2>] _write_lock+0x32/0x60
[    0.020000]  [<c013797f>] ? request_resource+0x1f/0xb0
[    0.020000]  [<c013797f>] request_resource+0x1f/0xb0
[    0.020000]  [<c02f89ad>] vgacon_startup+0x2bd/0x3e0
[    0.020000]  [<c094d62a>] con_init+0x19/0x22f
[    0.020000]  [<c0330c7c>] ? tty_register_ldisc+0x5c/0x70
[    0.020000]  [<c094cf49>] console_init+0x20/0x2e
[    0.020000]  [<c092a969>] start_kernel+0x20c/0x379
[    0.020000]  [<c092a516>] ? unknown_bootoption+0x0/0x1f6
[    0.020000]  [<c092a099>] __init_begin+0x99/0xa1
[    0.020000]  =======================
[    0.020000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.020000] possible reason: unannotated irqs-on.
[    0.020000] irq event stamp: 0

which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
but CONFIG_PROVE_LOCKING is disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 10:32:14 +02:00
Ingo Molnar
b4ba0ba24b Merge commit 'v2.6.26' into core/locking 2008-07-14 10:31:59 +02:00
Stephen Smalley
006ebb40d3 Security: split proc ptrace checking into read vs. attach
Enable security modules to distinguish reading of process state via
proc from full ptrace access by renaming ptrace_may_attach to
ptrace_may_access and adding a mode argument indicating whether only
read access or full attach access is requested.  This allows security
modules to permit access to reading process state without granting
full ptrace access.  The base DAC/capability checking remains unchanged.

Read access to /proc/pid/mem continues to apply a full ptrace attach
check since check_mem_permission() already requires the current task
to already be ptracing the target.  The other ptrace checks within
proc for elements like environ, maps, and fds are changed to pass the
read mode instead of attach.

In the SELinux case, we model such reading of process state as a
reading of a proc file labeled with the target process' label.  This
enables SELinux policy to permit such reading of process state without
permitting control or manipulation of the target process, as there are
a number of cases where programs probe for such information via proc
but do not need to be able to control the target (e.g. procps,
lsof, PolicyKit, ConsoleKit).  At present we have to choose between
allowing full ptrace in policy (more permissive than required/desired)
or breaking functionality (or in some cases just silencing the denials
via dontaudit rules but this can hide genuine attacks).

This version of the patch incorporates comments from Casey Schaufler
(change/replace existing ptrace_may_attach interface, pass access
mode), and Chris Wright (provide greater consistency in the checking).

Note that like their predecessors __ptrace_may_attach and
ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
interfaces use different return value conventions from each other (0
or -errno vs. 1 or 0).  I retained this difference to avoid any
changes to the caller logic but made the difference clearer by
changing the latter interface to return a bool rather than an int and
by adding a comment about it to ptrace.h for any future callers.

Signed-off-by:  Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-07-14 15:01:47 +10:00
Lai Jiangshan
199a952876 rcu classic: update qlen when cpu offline
When callbacks are moved from offline cpu to this cpu,
the qlen field of this rdp should be updated.

[ Paul E. McKenney: ]

The effect of this bug would be for force_quiescent_state() to be invoked
when it should not and vice versa -- wasting cycles in the first case
and letting RCU callbacks remain piled up in the second case.  The bug
is thus "benign" in that it does not result in premature grace-period
termination, but should of course be fixed nonetheless.

Preemption is disabled by the caller's get_cpu_var(), so we are guaranteed
to remain on the same CPU, as required.  The local_irq_disable() is indeed
needed, otherwise, an interrupt might invoke call_rcu() or call_rcu_bh(),
which could cause that interrupt's increment of ->qlen to be lost.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 23:12:17 +02:00
Linus Torvalds
3b5c6b8349 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpusets, hotplug, scheduler: fix scheduler domain breakage
2008-07-13 11:03:59 -07:00
Dmitry Adamushko
3e84050c81 cpusets, hotplug, scheduler: fix scheduler domain breakage
Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -> detach_destroy_domains(&cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 11:37:02 +02:00
Ingo Molnar
54ef76f37b Merge branch 'linus' into sched/devel 2008-07-13 08:50:13 +02:00
Ingo Molnar
ae94b8075a Merge branch 'linus' into x86/core
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:29:02 +02:00
Ingo Molnar
b2613e370d ftrace: build fix for ftraced_suspend
fix:

 kernel/trace/ftrace.c:1615: error: 'ftraced_suspend' undeclared (first use in this function)
 kernel/trace/ftrace.c:1615: error: (Each undeclared identifier is reported only once
 kernel/trace/ftrace.c:1615: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 16:46:50 +02:00
Steven Rostedt
c300ba2528 sched_clock: and multiplier for TSC to gtod drift
The sched_clock code currently tries to keep all CPU clocks of all CPUS
somewhat in sync. At every clock tick it records the gtod clock and
uses that and jiffies and the TSC to calculate a CPU clock that tries to
stay in sync with all the other CPUs.

ftrace depends heavily on this timer and it detects when this timer
"jumps".  One problem is that the TSC and the gtod also drift.
When the TSC is 0.1% faster or slower than the gtod it is very noticeable
in ftrace. To help compensate for this, I've added a multiplier that
tries to keep the CPU clock updating at the same rate as the gtod.

I've tried various ways to get it to be in sync and this ended up being
the most reliable. At every scheduler tick we calculate the new multiplier:

  multi = delta_gtod / delta_TSC

This means we perform a 64 bit divide at the tick (once a HZ). A shift
is used to handle the accuracy.

Other methods that failed due to dynamic HZ are:

(not used)  multi += (gtod - tsc) / delta_gtod
(not used)  multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod

as well as other variants.

This code still allows for a slight drift between TSC and gtod, but
it keeps the damage down to a minimum.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:28 +02:00
Steven Rostedt
a83bc47c33 sched_clock: record TSC after gtod
To read the gtod we need to grab the xtime lock for read. Reading the gtod
before the TSC can cause a bigger gab if the xtime lock is contended.

This patch simply reverses the order to read the TSC after the gtod.
The locking in the reading of the gtod handles any barriers one might
think is needed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:27 +02:00
Steven Rostedt
c0c87734f1 sched_clock: only update deltas with local reads.
Reading the CPU clock should try to stay accurate within the CPU.
By reading the CPU clock from another CPU and updating the deltas can
cause unneeded jumps when reading from the local CPU.

This patch changes the code to update the last read TSC only when read
from the local CPU.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:27 +02:00
Steven Rostedt
2b8a0cf489 sched_clock: fix calculation of other CPU
The algorithm to calculate the 'now' of another CPU is not correct.
At each scheduler tick, each CPU records the last sched_clock and
gtod (tick_raw and tick_gtod respectively). If the TSC is somewhat the
same in speed between two clocks the algorithm would be:

  tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)

To calculate now2 we would have:

  now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1

Currently the algorithm is:

  now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1

This solves most of the rest of the issues I've had with timestamps in
ftace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:26 +02:00
Steven Rostedt
af52a90a14 sched_clock: stop maximum check on NO HZ
Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.

What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.

The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.

The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.

This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.

This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:26 +02:00
Steven Rostedt
f7cce27f56 sched_clock: widen the max and min time
With keeping the max and min sched time within one jiffy of the gtod clock
was too tight. Just before a schedule tick the max could easily be hit, as
well as just after a schedule_tick the min could be hit. This caused the
clock to jump around by a jiffy.

This patch widens the minimum to
   last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS

and the maximum to
    last gtod + (2 + delta_jiffies) * TICK_NSECS

This keeps the minum to gtod or if one jiffy less than delta jiffies
and the maxim 2 jiffies ahead of gtod. This may cause unstable TSCs to be
a bit more sporadic, but it helps keep a clock with a stable TSC working well.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:25 +02:00
Steven Rostedt
62c43dd986 sched_clock: record from last tick
The sched_clock code tries to keep within the gtod time by one tick (jiffy).
The current code mistakenly keeps track of the delta jiffies between
updates of the clock, where the the delta is used to compare with the
number of jiffies that have past since an update of the gtod. The gtod is
updated at each schedule tick not each sched_clock update. After one
jiffy passes the clock is updated fine. But the delta is taken from the
last update so if the next update happens before the next tick the delta
jiffies used will be incorrect.

This patch changes the code to check the delta of jiffies between ticks
and not updates to match the comparison of the updates with the gtod.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:25 +02:00
Steven Rostedt
60bc080090 ftrace: separate out the function enabled variable
Currently the function tracer uses the global tracer_enabled variable that
is used to keep track if the tracer is enabled or not. The function tracing
startup needs to be separated out, otherwise the internal happenings of
the tracer startup is also recorded.

This patch creates a ftrace_function_enabled variable to all the starting
of the function traces to happen after everything has been started.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:22 +02:00
Steven Rostedt
a2bb6a3d85 ftrace: add ftrace_kill_atomic
It has been suggested that I add a way to disable the function tracer
on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
used in normal situations. It will disable the ftrace tracer, but will
not perform the nice shutdown that requires scheduling.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:21 +02:00
Steven Rostedt
26bc83f4cb ftrace: use current CPU for function startup
This is more of a clean up. Currently the function tracer initializes the
tracer with which ever CPU was last used for tracing. This value isn't
realy useful for function tracing, but at least it should be something other
than a random number.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:21 +02:00
Steven Rostedt
ad591240ce ftrace: start wakeup tracing after setting function tracer
Enabling the wakeup tracer before enabling the function tracing causes
some strange results due to the dynamic enabling of the functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:20 +02:00
Steven Rostedt
b5c21b4514 ftrace: check proper config for preempt type
There is no CONFIG_PREEMPT_DESKTOP. Use the proper entry CONFIG_PREEMPT.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:19 +02:00
Steven Rostedt
1e16c0a081 ftrace: trace schedule
After the sched_clock code has been removed from sched.c we can now trace
the scheduler. The scheduler has a lot of functions that would be worth
tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:19 +02:00
Steven Rostedt
001b6767b1 ftrace: define function trace nop
When CONFIG_FTRACE is not enabled, the tracing_start_functon_trace
and tracing_stop_function_trace should be nops.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:18 +02:00
Steven Rostedt
007c05d4d2 ftrace: move sched_switch enable after markers
We have two markers now that are enabled on sched_switch. One that records
the context switching and the other that records task wake ups. Currently
we enable the tracing first and then set the markers. This causes some
confusing traces:

# tracer: sched_switch
#
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
#              | |      |          |         |
       trace-cmd-3973  [00]   115.834817:   3973:120:R   +     3:  0:S
       trace-cmd-3973  [01]   115.834910:   3973:120:R   +     6:  0:S
       trace-cmd-3973  [02]   115.834910:   3973:120:R   +     9:  0:S
       trace-cmd-3973  [03]   115.834910:   3973:120:R   +    12:  0:S
       trace-cmd-3973  [02]   115.834910:   3973:120:R   +     9:  0:S
          <idle>-0     [02]   115.834910:      0:140:R ==>  3973:120:R

Here we see that trace-cmd with PID 3973 wakes up task 9 but the next line
shows the idle task doing a context switch to task 3973.

Enabling the tracing to _after_ the markers are set creates a much saner
output:

# tracer: sched_switch
#
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
#              | |      |          |         |
          <idle>-0     [02]  7922.634225:      0:140:R ==>  4790:120:R
       trace-cmd-4789  [03]  7922.634225:      0:140:R   +  4790:120:R

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:18 +02:00
Heiko Carstens
857f3fd7a4 nohz: don't stop idle tick if softirqs are pending.
In case a cpu goes idle but softirqs are pending only an error message is
printed to the console. It may take a very long time until the pending
softirqs will finally be executed. Worst case would be a hanging system.

With this patch the timer tick just continues and the softirqs will be
executed after the next interrupt. Still a delay but better than a
hanging system.

Currently we have at least two device drivers on s390 which under certain
circumstances schedule a tasklet from process context. This is a reason
why we can end up with pending softirqs when going idle. Fixing these
drivers seems to be non-trivial.
However there is no question that the drivers should be fixed.
This patch shouldn't be considered as a bug fix. It just is intended to
keep a system running even if device drivers are buggy.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Glauber <jan.glauber@de.ibm.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 11:17:04 +02:00
Ingo Molnar
0c81b2a144 Merge branch 'linus' into core/rcu
Conflicts:

	include/linux/rculist.h
	kernel/rcupreempt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:46:50 +02:00
Linus Torvalds
a26449daa2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix cpu hotplug, cleanup
  sched: fix cpu hotplug
2008-07-10 12:34:55 -07:00
Linus Torvalds
b1e387348a sched: fix cpu hotplug, cleanup
Clean up __migrate_task(): to just have separate "done" and "fail"
cases, instead of that "out" case with random error behavior.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 20:39:58 +02:00
Nick Piggin
70ff05554f Fix PREEMPT_RCU without HOTPLUG_CPU
PREEMPT_RCU without HOTPLUG_CPU is broken.  The rcu_online_cpu is called
to initially populate rcu_cpu_online_map with all online CPUs when the
hotplug event handler is installed, and also to populate the map with
CPUs as they come online.  The former case is meant to happen with and
without HOTPLUG_CPU, but without HOTPLUG_CPU, the rcu_offline_cpu
function is no-oped -- while it still gets called, it does not set the
rcu CPU map.

With a blank RCU CPU map, grace periods get to tick by completely
oblivious to active RCU read side critical sections.  This results in
free-before-grace bugs.

Fix is obvious once the problem is known. (Also, change __devinit to
__cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ Nick is my personal hero of the day - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 11:13:44 -07:00
Daniel Guilak
544304b200 kernel/kprobes.c: Made kprobe_blacklist static.
Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10 10:13:51 -07:00
Russell King
f0006314d3 Merge branch 'imx' into devel
Conflicts:

	arch/arm/mm/Kconfig
2008-07-10 16:41:50 +01:00
Russell King
a177ba3b7a Merge branches 'at91', 'dyntick', 'ep93xx', 'iop', 'ixp', 'misc', 'orion', 'omap-reviewed', 'rpc', 'rtc' and 's3c' into devel 2008-07-10 16:38:50 +01:00
Ingo Molnar
ec1bb60bbf Merge branch 'tracing/sysprof' into auto-ftrace-next 2008-07-10 11:43:08 +02:00
Ingo Molnar
5373fdbdc1 Merge branch 'tracing/mmiotrace' into auto-ftrace-next 2008-07-10 11:43:06 +02:00
Ingo Molnar
bac0c9103b Merge branch 'tracing/ftrace' into auto-ftrace-next 2008-07-10 11:43:00 +02:00
Dmitry Adamushko
dc7fab8b3b sched: fix cpu hotplug
I think we may have a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu() when the later one
may end up looping endlessly.

Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is
called so we may get a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu(). The former one may push
a task out of a dead CPU causing the later one to loop endlessly.

Heiko Carstens observed:

| That's exactly what explains a dump I got yesterday. Thanks for fixing! :)

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: miaox@cn.fujitsu.com
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 09:35:34 +02:00
Ingo Molnar
9e4144abf8 Merge branch 'linus' into core/printk
Conflicts:

	kernel/printk.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 08:17:14 +02:00
Thomas Gleixner
48627d8d23 genirq: remove extraneous checks in manage.c
In http://bugzilla.kernel.org/show_bug.cgi?id=9580 it was pointed out
that the desc->chip checks are extraneous. In fact these are left
overs from early development and can be removed safely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-10 07:01:13 +02:00
Daniel Guilak
7683c57c48 kernel/printk.c: Made printk_recursion_bug_msg static.
Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-08 18:10:34 -07:00
Ingo Molnar
a29d1cfe9e printk: export console_drivers
this symbol is needed by drivers/video/xen-fbfront.ko.

[ cherry-picked from tip/core/printk ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 14:11:27 +02:00
Ingo Molnar
2b4fa851b2 Merge branch 'x86/numa' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/e820.c
	arch/x86/kernel/efi_64.c
	arch/x86/kernel/mpparse.c
	arch/x86/kernel/setup.c
	arch/x86/kernel/setup_32.c
	arch/x86/mm/init_64.c
	include/asm-x86/proto.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:59:23 +02:00
Mike Travis
076ac2af86 sched, numa: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
* Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
    where appropriate.  This saves some allocated space as well as many
    wasted cycles going through node entries that are non-existent.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:31:30 +02:00
Ingo Molnar
6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00