Create the new functions we need to support uretprobes, and change
alloc_trace_uprobe() to initialize consumer.ret_handler if the new
"is_ret" argument is true. Curently this argument is always false,
so the new code is never called and is_ret_probe(tu) is false too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>
Extract the output code from uprobe_trace_func() and uprobe_perf_func()
into the new helpers, they will be used by ->ret_handler() too. We also
add the unused "unsigned long func" argument in advance, to simplify the
next changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>
struct uprobe_trace_entry_head has a single member for reporting,
"unsigned long ip". If we want to support uretprobes we need to
create another struct which has "func" and "ret_ip" and duplicate
a lot of functions, like trace_kprobe.c does.
To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
and add couple of trivial helpers to calculate sizeof/data. This
uglifies the code a bit, but this allows us to avoid a lot more
complications later, when we add the support for ret-probes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>
uprobe_trace_func() is never called with irqs or preemption
disabled, no need to ask preempt_count() or local_save_flags().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Anton Arapov <anton@redhat.com>
seq_print_ip_sym(ip) in print_uprobe_event() is pointless,
kallsyms_lookup(ip) can not resolve a user-space address.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>
uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs(),
we already have "struct pt_regs *regs".
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>
Unlike the kretprobes we can't trust userspace, thus must have
protection from user space attacks. User-space have "unlimited"
stack, and this patch limits the return probes nestedness as a
simple remedy for it.
Note that this implementation leaks return_instance on siglongjmp
until exit()/exec().
The intention is to have KISS and bare minimum solution for the
initial implementation in order to not complicate the uretprobes
code.
In the future we may come up with more sophisticated solution that
remove this depth limitation. It is not easy task and lays beyond
this patchset.
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Uretprobe handlers are invoked when the trampoline is hit, on completion
the trampoline is replaced with the saved return address and the uretprobe
instance deleted.
TODO: handle_trampoline() assumes that ->return_instances is always valid.
We should teach it to handle longjmp() which can invalidate the pending
return_instance's. This is nontrivial, we will try to do this in a separate
series.
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
When a uprobe with return probe consumer is hit, prepare_uretprobe()
function is invoked. It creates return_instance, hijacks return address
and replaces it with the trampoline.
* Return instances are kept as stack per uprobed task.
* Return instance is chained, when the original return address is
trampoline's page vaddr (e.g. recursive call of the probed function).
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Allocate trampoline page, as the very first one in uprobed
task xol area, and fill it with breakpoint opcode.
Also introduce get_trampoline_vaddr() helper, to wrap the
trampoline address extraction from area->vaddr. That removes
confusion and eases the debug experience in case ->vaddr
notion will be changed.
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
It seems that function profiler's hash size is fixed at 1024. Add and
use FTRACE_PROFILE_HASH_BITS instead and update hash size macro.
Link: http://lkml.kernel.org/r/1365551750-4504-1-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace_graph_count can be decreased with a "!" pattern, so that
the enabled flag should be updated too.
Link: http://lkml.kernel.org/r/1365663698-2413-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.
Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops. However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.
It can be easily reproduced with following command:
$ cd /sys/kernel/debug/tracing
$ echo 1234 | sudo tee -a set_ftrace_pid
In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().
Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This reverts commit 84cfb6ab484b442d5115eb3baf9db7d74a3ea626. There
are scheduled changes which make use of the removed callback.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:
CPU0 CPU1 CPU2
unpark(T) wake_up_process(T)
clear(SHOULD_PARK) T runs
leave parkme() due to !SHOULD_PARK
bind_to(CPU2) BUG_ON(wrong CPU)
We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.
The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.
Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.
Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.
The migration thread has another related issue.
CPU0 CPU1
Bring up CPU2
create_thread(T)
park(T)
wait_for_completion()
parkme()
complete()
sched_set_stop_task()
schedule(TASK_PARKED)
The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- System reboot/halt fix related to CPU offline ordering
from Huacai Chen.
- intel_pstate driver fix for a delay time computation error
occasionally crashing systems using it from Dirk Brandewie.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRZynOAAoJEKhOf7ml8uNsYWcQAIZIps7Ivn2+r3ENL+jhTohx
ErEz/cu/YIS/TnDzO3GO+Yo9CcXjUebMWqefIC//YK/K+tNepVOLovthTGiA/X36
23RDRrF1hqZlEgiEfFpuXiyq9u33CbUCYt75tsBXhxJkxeG7J7JfiG4AUh8dED4B
nUCbQ4jWM7r9DYJFl2gjDkFt1SjG/UbxcN9Kua9v4zfJil9fKp9093HHYBHH3a2n
zXlAE7CskXrNOepwp9Efzu5uPU3gbkIiQdKxvUs91remAcZ3fMsbz8CerZlgfy1S
+3f4AuU9i2AXeYI5fanhLo6Mwm8jqBvZ8ZE4Fh/EuQs9eHk7VuRsy7n22zaVeU0A
efaldd/pdP7KbSv5Wrs8adQr3GcRHkuHnMGhTlp41tfR8gJfpZUrK3/6h/jnIPRC
1UnBAF4K67v85fBO6gnC8UhEp3MXXXZoPtPByGILxj34KVn+oHzrVgE+8+ugv7HM
ZJ5jobYPWrxI2lZv5kuBdHCVg2TAC3YUz2aev8cEhIo4vdcIC2cofVDyAcN9ArqF
aF6fcNr6Rgu/M6bB2bP/zbhmDApr8H8z952jss51gprJ+IiKNUh9daiFnYw+o391
9VVTolC7k6P4pXTbtgqEFDLTJ0dKD8i/J4RLHwIsX7jzVgLctyqKZsNXskovjH4p
jqIxu/1SPxR2dtBziUEH
=bkFT
-----END PGP SIGNATURE-----
Merge tag 'pm-3.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
- System reboot/halt fix related to CPU offline ordering from Huacai
Chen.
- intel_pstate driver fix for a delay time computation error
occasionally crashing systems using it from Dirk Brandewie.
* tag 'pm-3.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()
cpufreq / intel_pstate: Set timer timeout correctly
If the function profiler fails to allocate memory for everything,
it will do a double free on the same pointer which can cause a panic.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRZdpLAAoJEOdOSU1xswtMyKgH/12ep1nFAYvXQQ04vcV3stCV
7vgk6oDMAGSYgwV2eNUbHNm2zkQBifFxUWLqWyzCd9t4RZUiIv5QHd2a+N2Ta+Xp
Do8zhwod3vzSaZsM3JvQRK5q8U6R72dqroPiv+lJ+jh7cIPdHCm87P+ZPYgAgpfv
6J80Vk34q/HdEGEmNuQgLzgfB+sfld/Ob6Te69f1rmzqCfHCytY1i3R0iPWvaI/v
B8R5cosjDhm0hAljsFlZb2Vl1jb89ByTgX3dL5Ph3O+hnHPCWE+ZQtbLCaOBV9F0
z8glXmAu2XVhv++0d21ul/TddQhVYQYF+ZMawxUlnLVKZ/J66c3l9Omhwf33Wz4=
=+NqN
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-3.9-rc-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Namhyung Kim fixed a long standing bug that can cause a kernel panic.
If the function profiler fails to allocate memory for everything, it
will do a double free on the same pointer which can cause a panic"
* tag 'trace-fixes-3.9-rc-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix double free when function profile init failed
perf_event is one of a couple remaining cgroup controllers with broken
hierarchy support. Converting it to support hierarchy is almost
trivial. The only thing necessary is to consider a task belonging to
a descendant cgroup as a match. IOW, if the cgroup of the currently
executing task (@cpuctx->cgrp) equals or is a descendant of the
event's cgroup (@event->cgrp), then the event should be enabled.
Implement hierarchy support and remove .broken_hierarchy tag along
with the incorrect comment on what needs to be done for hierarchy
support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
A couple controllers want to determine whether two cgroups are in
ancestor/descendant relationship. As it's more likely that the
descendant is the primary subject of interest and there are other
operations focusing on the descendants, let's ask is_descendent rather
than is_ancestor.
Implementation is trivial as the previous patch guarantees that all
ancestors of a cgroup stay accessible as long as the cgroup is
accessible.
tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
rewrote descriptions.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Suppose we rmdir a cgroup and there're still css refs, this cgroup won't
be freed. Then we rmdir the parent cgroup, and the parent is freed
immediately due to css ref draining to 0. Now it would be a disaster if
the still-alive child cgroup tries to access its parent.
Make sure this won't happen.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The bind() method of cgroup_subsys is not used in any of the
controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
devices, perf, hugetlb, cpu and cpuacct)
tj: Removed the entry on ->bind() from
Documentation/cgroups/cgroups.txt. Also updated a couple
paragraphs which were suggesting that dynamic re-binding may be
implemented. It's not gonna.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cpuacct split caused this build failure on UML:
kernel/sched/cpuacct.c:94:2: error: implicit declaration of function 'ERR_PTR'
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only user was cpuacct.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155385A.4040207@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now we're guaranteed when cpuacct_charge() and
cpuacct_account_field() are called, cpuacct has already been
properly initialized, so we no longer need those checks.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155384C.7000508@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initialize cpuacct before the scheduler is functioning, so when
cpuacct_charge() and cpuacct_account_field() are called,
task_ca() won't return NULL.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155383F.8000005@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now we don't need cpuacct_init(), and instead we just initialize
root_cpuacct when it's defined.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553834.9090701@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a preparation, so later we can initialize cpuacct
earlier.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553822.5000403@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now most of the code in cpuacct.h can be moved to cpuacct.c
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536D5.2080401@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a micro optimazation for a hot path.
- We don't need to check if @ca returned from task_ca() is NULL.
- We don't need to check if @ca returned from parent_ca() is NULL.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536B7.6060602@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a micro optimization for the hot path.
- We don't need to check if @ca is NULL in parent_ca().
- We don't need to check if @ca is NULL in the beginning of the for loop.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536A9.5000700@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So we can remove open-coded cpuacct code in cputime.c.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553692.9060008@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So we don't open-coded initialization of cpuacct in core.c.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553687.1060906@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add cpuacct.h and let sched.h include it.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155367B.2060506@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A comment in function rebalance_domains() mentions
arch_init_sched_domains(), but that function does not exist
anymore. The proper function is init_sched_domains().
Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1364814841-49156-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull RCU updates from Paul E. McKenney:
* Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ
take advantage of numbered callbacks, do additional callback
accelerations based on numbered callbacks. Posted to LKML
at https://lkml.org/lkml/2013/3/18/960.
* RCU documentation updates. Posted to LKML at
https://lkml.org/lkml/2013/3/18/570.
* Miscellaneous fixes. Posted to LKML at
https://lkml.org/lkml/2013/3/18/594.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At this point tsk_cache_hot is always true, so no need to check it.
Signed-off-by: Zhang Hang <bob.zhanghang@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51650107.9040606@huawei.com
[ Also remove unnecessary schedstat #ifdefs. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On the failure path, stat->start and stat->pages will refer same page.
So it'll attempt to free the same page again and get kernel panic.
Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>