* pm-cpufreq: (40 commits)
thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412)
cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ
Documentation: cpufreq / boost: Update BOOST documentation
cpufreq: exynos: Extend Exynos cpufreq driver to support boost
cpufreq / boost: Kconfig: Support for software-managed BOOST
acpi-cpufreq: Adjust the code to use the common boost attribute
cpufreq: Add boost frequency support in core
intel_pstate: Add trace point to report internal state.
cpufreq: introduce cpufreq_generic_get() routine
ARM: SA1100: Create dummy clk_get_rate() to avoid build failures
cpufreq: stats: create sysfs entries when cpufreq_stats is a module
cpufreq: stats: free table and remove sysfs entry in a single routine
cpufreq: stats: remove hotplug notifiers
cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
cpufreq: speedstep: remove unused speedstep_get_state
powernow-k6: reorder frequencies
powernow-k6: correctly initialize default parameters
powernow-k6: disable cache when changing frequency
Documentation: add ABI entry for intel_pstate
cpufreq: exynos: Convert exynos-cpufreq to platform driver
...
This patch provides auto disable/enable operation for boost. It uses already
present thermal infrastructure to provide BOOST hysteresis.
The TMU data is modified to work properly with or without BOOST.
Hence, the two first trip points with corresponding clip frequencies are
adjusted.
The first one is reduced from 85 to 70 degrees and the clip frequency is
increased to 1.4 GHz from 800 MHz. This trip point is in fact responsible
for providing BOOST hysteresis. When temperature exceeds 70 deg, the maximal
non BOOST frequency for Exynos4412 is imposed.
Since the first trigger level has been "stolen" for BOOST, the second one
needs to be a compromise for the previously used two for non BOOST
configuration. The 95 deg with modified clip freq (to 400 MHz) should provide
a good balance between cooling down the overheated device and throughput on
an acceptable level.
Two last trigger levels are not modified since, they cause platform shutdown
on emergency overheat to happen.
The third trip point passage results in SW managed shut down of the system.
If the last trip point is crossed, the PMU HW generates the power off
signal.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a special driver data flag (CPUFREQ_BOOST_FREQ) to indicate
a frequency that can be only enabled for BOOST mode.
This frequency will be used only for limited time, since running with
it for too long may cause the target device to overheat.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since the support for software and hardware controlled boosting has
been added, update the corresponding documentation.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq_driver's boost_supported flag is true only when boost
support is explicitly enabled. Boost related attributes are exported
only under the same condition.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add CONFIG_CPU_FREQ_BOOST_SW Kconfig option such that software-managed
boost is enabled only after selecting "EXYNOS Frequency Overclocking -
Software". It also depends on the thermal subsystem to be compiled in,
which is necessary for disabling boost and cooling down the device when
overheating is detected.
Software-managed boost _MUST_ _NOT_ be enabled without thermal subsystem
with properly defined overheating temperature thresholds.
This option doesn't affect the x86's hardware-driven boost support
in the acpi-cpufreq driver.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify acpi-cpufreq's hardware-based boost solution to work with the
common cpufreq boost framework.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit adds boost frequency support in cpufreq core (Hardware &
Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency
above its normal operation limits. Such mode shall be only used for a
short time.
Overclocking (boost) support is essentially provided by platform
dependent cpufreq driver.
This commit unifies support for SW and HW (Intel) overclocking solutions
in the core cpufreq driver. Previously the "boost" sysfs attribute was
defined in the ACPI processor driver code. By default boost is disabled.
One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost.
It only shows up when cpufreq driver supports overclocking.
Under the hood frequencies dedicated for boosting are marked with a
special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table.
It is the user's concern to enable/disable overclocking with a proper call
to sysfs.
The cpufreq_boost_trigger_state() function is defined non static on purpose.
It is used later with thermal subsystem to provide automatic enable/disable
of the BOOST feature.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add perf trace event "power:pstate_sample" to report driver state to
aid in diagnosing issues reported against intel_pstate.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(),
to get CPUs clk rate, have similar sort of code used in most of them.
This patch adds a generic ->get() which will do the same thing for them.
All those drivers are required to now is to set .get to cpufreq_generic_get()
and set their clk pointer in policy->clk during ->init().
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are some parts of common kernel which would be using routines
like clk_get_rate() on some platforms. Currently, they wouldn't be
called for SA1100 boards, but they are needed for successful kernel
compilation.
Create a dummy clk_get_rate() routine for SA1100 which can be called
by the cpufreq core. More dummy routines might be added later if
necessary.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When cpufreq_stats is compiled in as a module, cpufreq driver would
have already been registered. And so the CPUFREQ_CREATE_POLICY
notifiers wouldn't be called for it. Hence no sysfs entries for stats. :(
This patch calls cpufreq_stats_create_table() for each online CPU from
cpufreq_stats_init() and so if policy is already created for CPUx then
we will register sysfs stats for it.
When its not compiled as module, we will return early as policy wouldn't
be found for any of the CPUs.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We don't have code paths now where we need to do these two things
separately, so it is better do them in a single routine. Just as
they are allocated in a single routine.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Either CPUs are hot-unplugged or suspend/resume occurs, cpufreq core
will send notifications to cpufreq-stats and stats structure and sysfs
entries would be correctly handled..
And so we don't actually need hotcpu notifiers in cpufreq-stats anymore.
We were only handling cpu hot-unplug events here and that are already
taken care of by POLICY notifiers.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are several problems with cpufreq stats in the way it handles
cpufreq_unregister_driver() and suspend/resume..
- We must not lose data collected so far when suspend/resume happens
and so stats directories must not be removed/allocated during these
operations, which is done currently.
- cpufreq_stat has registered notifiers with both cpufreq and hotplug.
It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
and removes this directory with a notifier from hotplug core.
In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
stats directories per cpu aren't removed as CPUs are still online. The
only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
all CPUs except the last of each policy. And pointer to stat information
is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
But policy structure would be freed inside cpufreq core and so that will
result in memory leak inside cpufreq stats (as we are never freeing
memory for stats).
Now if we again insert the module cpufreq_register_driver() will be
called and we will again allocate stats data and put it on for first
CPU of every policy. In case we only have a single CPU per policy, we
will return with a error from cpufreq_stats_create_table() due to this
code:
if (per_cpu(cpufreq_stats_table, cpu))
return -EBUSY;
And so probably cpufreq stats directory would not show up anymore (as
it was added inside last policies->kobj which doesn't exist anymore).
I haven't tested it, though. Also the values in stats files wouldn't
be refreshed as we are using the earlier stats structure.
- CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
scenarios where we don't really want cpufreq_stat_notifier_policy() to get
called. For example whenever we are changing anything related to a policy:
min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
stats is notified. Where we don't do any useful stuff other than simply
returning with -EBUSY from cpufreq_stats_create_table(). And so this
isn't the right notifier that cpufreq stats..
Due to all above reasons this patch does following changes:
- Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
which are only called when policy is created/destroyed. They aren't
called for suspend/resume paths..
- Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
shouldn't be a problem for cpufreq_stats.
- Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
so that we don't free stats structure.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The only caller of speedstep_get_state() was removed in commit d4019f0a92ab
("cpufreq: move freq change notifications to cpufreq core"). So building
speedstep-smi.o now triggers a GCC warning:
drivers/cpufreq/speedstep-smi.c:148:12: warning: 'speedstep_get_state' defined but not used [-Wunused-function]
Remove this unused function.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* acpi-modules:
platform: introduce OF style 'modalias' support for platform bus
ACPI: fix module autoloading for ACPI enumerated devices
ACPI: add module autoloading support for ACPI enumerated devices
ACPI: fix create_modalias() return value handling
Fix a problem that, the platform bus supports the OF style modalias
in .uevent() call, but not in its device 'modalias' sysfs attribute.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This tool is designed to assist kernel and OS developers in optimizing
their linux stack's suspend/resume time. Using a kernel image built with a
few extra options enabled, the tool will execute a suspend and will
capture dmesg and ftrace data until resume is complete. This data is
transformed into a device timeline and a callgraph to give a quick and
detailed view of which devices and callbacks are taking the most time in
suspend/resume. The output is a single html file which can be viewed in
firefox or chrome.
References: https://01.org/suspendresume
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI enumerated devices has ACPI style _HID and _CID strings,
all of these strings can be used for both driver loading and matching.
Currently, in Platform, I2C and SPI bus, the ACPI style driver matching
is supported by invoking acpi_driver_match_device() in bus .match() callback.
But, the module autoloading is still broken.
For example, there is any ACPI device with _HID "INTABCD" that is
enumerated to platform bus, and we have a driver that can probe it.
The driver exports its module_alias as "acpi:INTABCD" use the following code
static const struct acpi_device_id xxx_acpi_match[] = {
{ "INTABCD", 0 },
{ }
};
MODULE_DEVICE_TABLE(acpi, xxx_acpi_match);
But, unfortunately, the device' modalias is shown as "platform:INTABCD:00",
please refer to modalias_show() and platform_uevent() in
drivers/base/platform.c.
This results in that the driver will not be loaded automatically when the
device node is created, because their modalias do not match.
This also applies to I2C and SPI bus.
With this patch, the device' modalias will be shown as "acpi:INTABCD" as well.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Mark Brown <broonie@linaro.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
An ACPI enumerated device may have its compatible id strings.
To support the compatible ACPI ids (acpi_device->pnp.ids),
we introduced acpi_driver_match_device() to match
the driver->acpi_match_table and acpi_device->pnp.ids.
For those drivers, MODULE_DEVICE_TABLE(acpi, xxx) is used to
exports the driver module alias in the format of
"acpi:device_compatible_ids".
But in the mean time, the current code does not export the
ACPI compatible strings as part of the module_alias for the
ACPI enumerated devices, which will break the module autoloading.
Take the following piece of code for example,
static const struct acpi_device_id xxx_acpi_match[] = {
{ "INTABCD", 0 },
{ }
};
MODULE_DEVICE_TABLE(acpi, xxx_acpi_match);
If this piece of code is used in a platform driver for
an ACPI enumerated platform device, the platform driver module_alias
is "acpi:INTABCD", but the uevent attribute of its platform device node
is "platform:INTABCD:00" (PREFIX:platform_device->name).
If this piece of code is used in an i2c driver for an ACPI enumerated
i2c device, the i2c driver module_alias is "acpi:INTABCD", but
the uevent of its i2c device node is "i2c:INTABCD:00" (PREFIX:i2c_client->name).
If this piece of code is used in an spi driver for an ACPI enumerated
spi device, the spi driver module_alias is "acpi:INTABCD", but
the uevent of its spi device node is "spi:INTABCD" (PREFIX:spi_device->modalias).
The reason why the module autoloading is not broken for now is that
the uevent file of the ACPI device node is "acpi:INTABCD".
Thus it is the ACPI device node creation that loads the platform/i2c/spi driver.
So this is a problem that will affect us the day when the ACPI bus
is removed from device model.
This patch introduces two new APIs,
one for exporting ACPI ids in uevent MODALIAS field,
and another for exporting ACPI ids in device' modalias sysfs attribute.
For any bus that supports ACPI enumerated devices, it needs to invoke
these two functions for their uevent and modalias attribute.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, create_modalias() handles the output truncated case in
an improper way (return -EINVAL).
Plus, acpi_device_uevent() and acpi_device_modalias_show() do
improper check for the create_modalias() return value as well.
This patch fixes create_modalias() to
return -EINVAL if there is an output error,
return -ENOMEM if the output is truncated,
and also fixes both acpi_device_uevent() and acpi_device_modalias_show()
to do proper return value check.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch updates MAINTAINERS file, adding tools/power/acpi for ACPI and
ACPICA.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch enables ACPI tool build in the tools/Makefile, so that the ACPI
tools can be built/cleaned/installed along with other tools.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch cleans up old tools/power/acpi Makefile for further porting,
make it compiled in a similar way as the other tools. No functional
changes.
The CFLAGS is modified as follows:
1. Previous cc flags:
-Wall -Wstrict-prototypes -Wdeclaration-after-statement -Os -s \
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include
2. Current cc flags:
DEBUG=false:
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include -Wall \
-Wstrict-prototypes -Wdeclaration-after-statement -Os \
-fomit-frame-pointer
Normal:
-D_LINUX -DDEFINE_ALTERNATE_TYPES -I../../../include -Wall \
-Wstrict-prototypes -Wdeclaration-after-statement -O1 -g -DDEBUG
There is only one difference: -fomit-frame-pointer.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is a variant patch from Rafael J. Wysocki's
ACPI / init: Run acpi_early_init() before efi_enter_virtual_mode()
According to Matt Fleming, if acpi_early_init() was executed before
efi_enter_virtual_mode(), the EFI initialization could benefit from
it, so Rafael's patch makes that happen.
And, we want accessing ACPI TAD device to set system clock, so move
acpi_early_init() before timekeeping_init(). This final position is
also before efi_enter_virtual_mode().
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When booting a kexec/kdump kernel on a system that has specific memory
hotplug regions the boot will fail with warnings like:
swapper/0: page allocation failure: order:9, mode:0x84d0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
Call Trace:
[<ffffffff815bcc67>] dump_stack+0x19/0x1b
[<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
[<ffffffff815b87ee>] ? __alloc_pages_direct_compact+0xac/0x196
[<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
[<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
[<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
[<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
[<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
[<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
[<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[<ffffffff81047359>] arch_add_memory+0x59/0xd0
[<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
[<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
[<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
[<ffffffff819e18d0>] ? do_early_param+0x88/0x88
[<ffffffff8159fea0>] ? rest_init+0x80/0x80
[<ffffffff8159feae>] kernel_init+0xe/0x180
[<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
[<ffffffff8159fea0>] ? rest_init+0x80/0x80
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:872 slab_reclaimable:13 slab_unreclaimable:1880
mapped:0 shmem:0 pagetables:0 bounce:0
free_cma:0
because the system has run out of memory at boot time. This occurs
because of the following sequence in the boot:
Main kernel boots and sets E820 map. The second kernel is booted with a
map generated by the kdump service using memmap= and memmap=exactmap.
These parameters are added to the kernel parameters of the kexec/kdump
kernel. The kexec/kdump kernel has limited memory resources so as not
to severely impact the main kernel.
The system then panics and the kdump/kexec kernel boots (which is a
completely new kernel boot). During this boot ACPI is initialized and the
kernel (as can be seen above) traverses the ACPI namespace and finds an
entry for a memory device to be hotadded.
ie)
[<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[<ffffffff81047359>] arch_add_memory+0x59/0xd0
[<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
At this point the kernel adds page table information and the the kexec/kdump
kernel runs out of memory.
This can also be reproduced by using the memmap=exactmap and mem=X
parameters on the main kernel and booting.
This patchset resolves the problem by adding a kernel parameter,
acpi_no_memhotplug, to disable ACPI memory hotplug.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If clk_enable() fails, then print a message so that the user can see
what is happening instead of silently failing to enable the clock.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The clk_enable() call in the pm_clk_resume() call returns an error
that is not being checked. If clk_enable() fails then we should
not set the state of the clock to PCE_STATUS_ENABLED.
Note, the issue of warning the user if this fails has not been
addressed in this patch as this is not the only place the driver
calls clk_enable().
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The drivers/base/power/clock_ops.c file is causing warnings from
the clock driver (as shown below) due to failing to do a clk_prepare()
call before enabling a clock. It also fails to check the balance of
prepare/unprepare as __pm_clk_remove() do clk_disable_unprepare() call.
This bug has probably been in since commit b2476490e ("clk: introduce
the common clock framework") as the warning was part of the original
commit. It is strange that it has not been noticed (although this has
also been coupled with a failure for certain SH builds to not build the
necessary glue to use this method of controlling the clocks).
In summary, this is probably needed in several stable branches but need
advice on which ones.
On the Renesas Lager board, this causes numerous warnings of the following
and even worse the clock system will not enable clocks, causing drivers
that are in development to fail to work:
WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:883 __clk_enable+0x2c/0xa0()
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pnp:
PNPBIOS: check return value of pnp_add_device()
PNP: Mark the function pnp_build_option() as static in resource.c
PNP / card: add missing put_device() call
PNPACPI: check return value of pnp_add_device()
* acpi-gpe:
ACPI / EC: disable GPE before removing GPE handler
ACPI / Button: Fix enabling button GPEs twice
* acpi-video:
ACPI: Blacklist Win8 OSI for some HP laptop 2013 models
ACPI / video: Fix typo in video_detect.c
* acpi-thermal:
ACPI / thermal: remove const from thermal_zone_device_ops declaration
* acpi-processor:
ACPI / scan: bail out early if failed to parse APIC ID for CPU
* acpi-sleep:
ACPI / sleep: remove panic in case hardware has changed after S4
* acpica: (21 commits)
ACPICA: Update version to 20131218.
ACPICA: Utilities: Cleanup declarations of the acpi_gbl_debug_file global.
ACPICA: Linuxize: Cleanup spaces after special macro invocations.
ACPICA: Interpreter: Add additional debug info for an error case.
ACPICA: Update ACPI example code to make it an actual working program.
ACPICA: Add an error message if the Debugger fails initialization.
ACPICA: Conditionally define a local variable that is used for debug only.
ACPICA: Parser: Updates/fixes for debug output.
ACPICA: Enhance ACPI warning for memory/IO address conflicts.
ACPICA: Update several debug statements - no functional change.
ACPICA: Improve exception handling for GPE block installation.
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
ACPICA: Tables: Add full support for the PCCT table, update table definition.
ACPICA: Tables: Add full support for the DBG2 table.
ACPICA: Add option to favor 32-bit FADT addresses.
ACPICA: Cleanup the option of forcing the use of the RSDT.
ACPICA: Back port and refine validation of the XSDT root table.
ACPICA: Linux Header: Remove unused OSL prototypes.
ACPICA: Remove unused ACPI_FREE_BUFFER macro. No functional change.
ACPICA: Disassembler: Improve pathname support for emitted External() statements.
...
* acpi-cleanup: (22 commits)
ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
ACPI / tables: Check if id is NULL in acpi_table_parse()
ACPI / proc: Include appropriate header file in proc.c
ACPI / EC: Remove unused functions and add prototype declaration in internal.h
ACPI / dock: Include appropriate header file in dock.c
ACPI / PCI: Include appropriate header file in pci_link.c
ACPI / PCI: Include appropriate header file in pci_slot.c
ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
ACPI / NVS: Include appropriate header file in nvs.c
ACPI / OSL: Mark the function acpi_table_checksum() as static
ACPI / processor: initialize a variable to silence compiler warning
ACPI / processor: use ACPI_COMPANION() to get ACPI device
ACPI: correct minor typos
ACPI / sleep: Drop redundant acpi_disabled check
ACPI / dock: Drop redundant acpi_disabled check
ACPI / table: Replace '1' with specific error return values
ACPI: remove trailing whitespace
ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
SFI / ACPI: Fix warnings reported during builds with W=1
...
Conflicts:
drivers/acpi/nvs.c
drivers/hwmon/asus_atk0110.c
While running stress tests on adding and deleting ftrace instances I hit
this bug:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_inode_permission+0x85/0x160
PGD 63681067 PUD 7ddbe067 PMD 0
Oops: 0000 [#1] PREEMPT
CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
Call Trace:
security_inode_permission+0x1c/0x30
__inode_permission+0x41/0xa0
inode_permission+0x18/0x50
link_path_walk+0x66/0x920
path_openat+0xa6/0x6c0
do_filp_open+0x43/0xa0
do_sys_open+0x146/0x240
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
RIP selinux_inode_permission+0x85/0x160
CR2: 0000000000000020
Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.
in selinux_inode_permission():
isec = inode->i_security;
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We see General Protection Fault on RSI in copy_page_rep: that RSI is
what you get from a NULL struct page pointer.
RIP: 0010:[<ffffffff81154955>] [<ffffffff81154955>] copy_page_rep+0x5/0x10
RSP: 0000:ffff880136e15c00 EFLAGS: 00010286
RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200
RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000
RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c
R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001
R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0
Call Trace:
copy_user_huge_page+0x93/0xab
do_huge_pmd_wp_page+0x710/0x815
handle_mm_fault+0x15d8/0x1d70
__do_page_fault+0x14d/0x840
do_page_fault+0x2f/0x90
page_fault+0x22/0x30
do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
since shrink_huge_zero_page() can free the huge_zero_page, and we have
no hold of our own on it here (except where the fourth test holds
page_table_lock and has checked pmd_same), it's possible for it to
answer yes the first time, but no to the second or third test. Change
all those last three to tests for NULL page.
(Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
in https://lkml.org/lkml/2013/3/29/103. I believe that one is due
to the source page being split, and a tail page freed, while copy
is in progress; and not a problem without DEBUG_PAGEALLOC, since
the pmd_same check will prevent a miscopy from being made visible.)
Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.
This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a "status" attribute for an ACPI device. This status
attribute shows the value of the _STA object. The _STA object returns
current status of an ACPI device: enabled, disabled, functioning,
present.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some BIOSes change hardware based on the state of
a laptop's lid. If the lid is closed, the touchpad is
disabled and the checksum changes. Windows 8 no longer
aborts resume if the checksum has changed.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
[rjw: Use pr_crit() for the message and don't break the string]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Enhance ACPI CPU hotplug driver to print clear error message and
bail out early if BIOS returns wrong value in ACPI MADT table or
_MAT method. Otherwise it will add the CPU device even if failed
to get APIC ID and fails any operations against sysfs interface:
/sys/devices/system/cpu/cpux/online
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_idle driver sets dev->state_count to drv->state_count so
the default dev->state_count initialization in cpuidle_enable_device()
(called from cpuidle_register_device()) can be used instead.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>