Merge 4.14.201 into android-4.14-stable
Changes in 4.14.201
vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
vsock/virtio: stop workers during the .remove()
vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()
net: virtio_vsock: Enhance connection semantics
USB: gadget: f_ncm: Fix NDP16 datagram validation
gpio: tc35894: fix up tc35894 interrupt configuration
Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
drm/sun4i: mixer: Extend regmap max_register
net: dec: de2104x: Increase receive ring size for Tulip
rndis_host: increase sleep time in the query-response loop
drivers/net/wan/lapbether: Make skb->protocol consistent with the header
drivers/net/wan/hdlc: Set skb->protocol before transmitting
mac80211: do not allow bigger VHT MPDUs than the hardware supports
spi: fsl-espi: Only process interrupts for expected events
nvme-fc: fail new connections to a deleted host or remote port
pinctrl: mvebu: Fix i2c sda definition for 98DX3236
nfs: Fix security label length not being reset
clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
i2c: cpm: Fix i2c_ram structure
Input: trackpoint - enable Synaptics trackpoints
random32: Restore __latent_entropy attribute on net_rand_state
net/packet: fix overflow in tpacket_rcv
epoll: do not insert into poll queues until all sanity checks are done
epoll: replace ->visited/visited_list with generation count
epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
ep_create_wakeup_source(): dentry name can change under you...
netfilter: ctnetlink: add a range check for l3/l4 protonum
drm/syncobj: Fix drm_syncobj_handle_to_fd refcount leak
fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
Revert "ravb: Fixed to be able to unload modules"
fbcon: Fix global-out-of-bounds read in fbcon_get_font()
net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
usermodehelper: reset umask to default before executing user process
platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
driver core: Fix probe_count imbalance in really_probe()
perf top: Fix stdio interface input handling with glibc 2.28+
mtd: rawnand: sunxi: Fix the probe error path
Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
ftrace: Move RCU is watching check after recursion check
macsec: avoid use-after-free in macsec_handle_frame()
mm/khugepaged: fix filemap page_to_pgoff(page) != offset
cifs: Fix incomplete memory allocation on setxattr path
i2c: meson: fix clock setting overwrite
sctp: fix sctp_auth_init_hmacs() error path
team: set dev->needed_headroom in team_setup_by_port()
net: team: fix memory leak in __team_options_register
openvswitch: handle DNAT tuple collision
drm/amdgpu: prevent double kfree ttm->sg
xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate
xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate
xfrm: clone whole liftime_cur structure in xfrm_do_migrate
net: stmmac: removed enabling eee in EEE set callback
platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
xfrm: Use correct address family in xfrm_state_find
bonding: set dev->needed_headroom in bond_setup_by_slave()
mdio: fix mdio-thunder.c dependency & build error
net: usb: ax88179_178a: fix missing stop entry in driver_info
rxrpc: Fix rxkad token xdr encoding
rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
rxrpc: Fix some missing _bh annotations on locking conn->state_lock
rxrpc: Fix server keyring leak
perf: Fix task_function_call() error handling
mmc: core: don't set limits.discard_granularity as 0
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
Linux 4.14.201
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iffb5ee67b94a852de1bd865817587bc27320f28b
commit 3701cb59d892b88d569427586f01491552f377b1 upstream.
or get freed, for that matter, if it's a long (separately stored)
name.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe0a916c1eae8e17e86c3753d13919177d63ed7e upstream.
Checking for the lack of epitems referring to the epoll we want to insert into
is not enough; we might have an insertion of that epoll into another one that
has already collected the set of files to recheck for excessive reverse paths,
but hasn't gotten to creating/inserting the epitem for it.
However, any such insertion in progress can be detected - it will update the
generation count in our epoll when it's done looking through it for files
to check. That gets done under ->mtx of our epoll and that allows us to
detect that safely.
We are *not* holding epmutex here, so the generation count is not stable.
However, since both the update of ep->gen by loop check and (later)
insertion into ->f_ep_link are done with ep->mtx held, we are fine -
the sequence is
    grab epmutex
    bump loop_check_gen
    ...
    grab tep->mtx               // 1
    tep->gen = loop_check_gen
    ...
    drop tep->mtx               // 2
    ...
    grab tep->mtx               // 3
    ...
    insert into ->f_ep_link
    ...
    drop tep->mtx               // 4
    bump loop_check_gen
    drop epmutex
and if the fastpath check in another thread happens for that
eventpoll, it can come
    * before (1) - in that case fastpath is just fine
    * after (4) - we'll see non-empty ->f_ep_link, slow path taken
    * between (2) and (3) - loop_check_gen is stable, with ->mtx
      providing barriers, and we end up taking slow path.
Note that ->f_ep_link emptiness check is slightly racy - we are protected
against insertions into that list, but removals can happen right under us.
Not a problem - in the worst case we'll end up taking a slow path for
no good reason.
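As a rough illustration of the decision described above, here is a minimal user-space sketch. It models only the two conditions this message discusses (a non-empty ->f_ep_link and a matching generation count). The struct, its fields and needs_full_check() are hypothetical stand-ins, not the kernel code, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's eventpoll. */
struct model_eventpoll {
    uint64_t gen;        /* generation recorded by the last loop check */
    bool has_ep_links;   /* models a non-empty ->f_ep_link list */
};

/* Global generation; the real one is bumped under epmutex. */
static uint64_t loop_check_gen = 1;

/*
 * Returns true when an EPOLL_CTL_ADD into ep must take the slow path.
 * Assumed to be called with ep->mtx held, as the message requires.
 */
static bool needs_full_check(const struct model_eventpoll *ep)
{
    assert(ep != NULL);
    return ep->has_ep_links || ep->gen == loop_check_gen;
}
```

An epoll that no loop check has visited in the current generation and that has no links keeps the fast path; either condition alone forces the slow path.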
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18306c404abe18a0972587a6266830583c60c928 upstream.
removes the need to clear it, along with the races.
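The general idea of a generation count replacing a visited flag can be sketched in user space as follows. This is an illustration under assumptions, not the kernel implementation: struct node, walk_gen and walk() are hypothetical names, and the graph is a plain array of child pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical node type; the kernel's struct eventpoll is far richer. */
struct node {
    uint64_t gen;                      /* generation of the last visit */
    struct node *child[MAX_CHILDREN];
    size_t nr_children;
};

static uint64_t walk_gen;  /* generation of the walk in progress */

/* Count nodes reachable from n, visiting each at most once per walk. */
static size_t count_reachable(struct node *n)
{
    size_t total = 1;

    assert(n != NULL);
    n->gen = walk_gen;                 /* mark as visited in this walk */
    for (size_t i = 0; i < n->nr_children; i++) {
        struct node *c = n->child[i];

        if (c->gen != walk_gen)        /* not yet seen in this walk */
            total += count_reachable(c);
    }
    return total;
}

/* Start a fresh walk: one increment invalidates every old mark. */
static size_t walk(struct node *root)
{
    walk_gen++;
    return count_reachable(root);
}
```

Because stale marks simply stop matching once the counter moves on, no pass is needed to reset per-node state after a walk.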
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.14.197 into android-4.14-stable
Changes in 4.14.197
HID: core: Correctly handle ReportSize being zero
HID: core: Sanitize event code and type when mapping input
perf record/stat: Explicitly call out event modifiers in the documentation
drm/msm: add shutdown support for display platform_driver
hwmon: (applesmc) check status earlier.
nvmet: Disable keep-alive timer when kato is cleared to 0h
ceph: don't allow setlease on cephfs
cpuidle: Fixup IRQ state
s390: don't trace preemption in percpu macros
xen/xenbus: Fix granting of vmalloc'd memory
dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
batman-adv: Avoid uninitialized chaddr when handling DHCP
batman-adv: Fix own OGM check in aggregated OGMs
batman-adv: bla: use netif_rx_ni when not in interrupt context
dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
MIPS: mm: BMIPS5000 has inclusive physical caches
MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores
netfilter: nf_tables: add NFTA_SET_USERDATA if not null
netfilter: nf_tables: incorrect enum nft_list_attributes definition
netfilter: nf_tables: fix destination register zeroing
net: hns: Fix memleak in hns_nic_dev_probe
net: systemport: Fix memleak in bcm_sysport_probe
ravb: Fixed to be able to unload modules
net: arc_emac: Fix memleak in arc_mdio_probe
dmaengine: pl330: Fix burst length if burst size is smaller than bus width
gtp: add GTPA_LINK info to msg sent to userspace
bnxt_en: Check for zero dir entries in NVRAM.
bnxt_en: Fix PCI AER error recovery flow
nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
perf tools: Correct SNOOPX field offset
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
fix regression in "epoll: Keep a reference on files added to the check list"
tg3: Fix soft lockup when tg3_reset_task() fails.
iommu/vt-d: Serialize IOMMU GCMD register modifications
thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
btrfs: drop path before adding new uuid tree entry
btrfs: Remove redundant extent_buffer_get in get_old_root
btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind
btrfs: set the lockdep class for log tree extent buffers
uaccess: Add non-pagefault user-space read functions
uaccess: Add non-pagefault user-space write function
btrfs: fix potential deadlock in the search ioctl
net: usb: qmi_wwan: add Telit 0x1050 composition
usb: qmi_wwan: add D-Link DWM-222 A2 device ID
ALSA: ca0106: fix error code handling
ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check
ALSA: hda/hdmi: always check pin power status in i915 pin fixup
ALSA: firewire-digi00x: exclude Avid Adrenaline from detection
affs: fix basic permission bits to actually work
block: allow for_each_bvec to support zero len bvec
block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>
libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
dm cache metadata: Avoid returning cmd->bm wild pointer on error
dm thin metadata: Avoid returning cmd->bm wild pointer on error
mm: slub: fix conversion of freelist_corrupted()
KVM: arm64: Add kvm_extable for vaxorcism code
KVM: arm64: Defer guest entry when an asynchronous exception is pending
KVM: arm64: Survive synchronous exceptions caused by AT instructions
KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
checkpatch: fix the usage of capture group ( ... )
mm/hugetlb: fix a race between hugetlb sysctl handlers
cfg80211: regulatory: reject invalid hints
net: usb: Fix uninit-was-stored issue in asix_read_phy_addr()
Linux 4.14.197
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I59da148f114d7f8fd84fc76c8178081ddaf26a49
[ Upstream commit 77f4689de17c0887775bb77896f4cc11a39bf848 ]
epoll_loop_check_proc() can run into a file already committed to destruction;
we can't grab a reference on those and don't need to add them to the set for
reverse path check anyway.
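The "can't grab a reference" case follows the usual increment-unless-zero pattern. A minimal user-space sketch with C11 atomics is given here for illustration; struct model_file and get_file_unless_zero() are hypothetical names, and the kernel uses its own helpers for this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct model_file {
    atomic_long f_count;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * Returns false for an object that is already committed to destruction.
 */
static bool get_file_unless_zero(struct model_file *f)
{
    long old;

    assert(f != NULL);
    old = atomic_load(&f->f_count);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&f->f_count, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets false simply skips the file, which matches the observation above that such files need not be added to the set.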
Tested-by: Marc Zyngier <maz@kernel.org>
Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Merge 4.14.195 into android-4.14-stable
Changes in 4.14.195
drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
perf probe: Fix memory leakage when the probe point is not found
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
powerpc: Allow 4224 bytes of stack expansion for the signal frame
btrfs: export helpers for subvolume name/id resolution
btrfs: don't show full path of bind mounts in subvol=
btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
btrfs: sysfs: use NOFS for device creation
romfs: fix uninitialized memory leak in romfs_dev_read()
kernel/relay.c: fix memleak on destroy relay channel
mm: include CMA pages in lowmem_reserve at boot
mm, page_alloc: fix core hung in free_pcppages_bulk()
ext4: fix checking of directory entry validity for inline directories
jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock()
spi: Prevent adding devices below an unregistering controller
scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
media: budget-core: Improve exception handling in budget_register()
rtc: goldfish: Enable interrupt in set_alarm() when necessary
media: vpss: clean up resources in init
Input: psmouse - add a newline when printing 'proto' by sysfs
m68knommu: fix overwriting of bits in ColdFire V3 cache control
xfs: fix inode quota reservation checks
jffs2: fix UAF problem
cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
virtio_ring: Avoid loop when vq is broken in virtqueue_poll
xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
alpha: fix annotation of io{read,write}{16,32}be()
ext4: fix potential negative array index in do_split()
i40e: Set RX_ONLY mode for unicast promiscuous on VLAN
i40e: Fix crash during removing i40e driver
net: fec: correct the error path for regulator disable in probe
bonding: show saner speed for broadcast mode
bonding: fix a potential double-unregister
ASoC: msm8916-wcd-analog: fix register Interrupt offset
ASoC: intel: Fix memleak in sst_media_open
vfio/type1: Add proper error unwind for vfio_iommu_replay()
bonding: fix active-backup failover for current ARP slave
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
net: dsa: b53: check for timeout
powerpc/pseries: Do not initiate shutdown when system is running on UPS
epoll: Keep a reference on files added to the check list
do_epoll_ctl(): clean the failure exits up a bit
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
xen: don't reschedule in preemption off sections
clk: Evict unregistered clks from parent caches
KVM: arm/arm64: Don't reschedule in unmap_stage2_range()
Linux 4.14.195
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6c25044ba9166ec01723671d9cfa3fdf08ccc43f
commit a9ed4a6560b8562b7e2e2bed9527e88001f7b682 upstream.
When adding a new fd to an epoll, and that new fd is itself an
epoll fd, we recursively scan the fds attached to it
to detect cycles, and add non-epoll files to a "check list"
that gets subsequently parsed.
However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disappearing from under our feet.
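From user space, the cycle detection mentioned above is visible as an ELOOP error from epoll_ctl(2). The sketch below assumes a Linux system; add_epoll_to_epoll() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Add the epoll instance `inner` to the epoll instance `outer`.
 * Returns 0 on success, or the errno reported by epoll_ctl().
 */
static int add_epoll_to_epoll(int outer, int inner)
{
    struct epoll_event ev = { .events = EPOLLIN };

    assert(outer >= 0 && inner >= 0);
    ev.data.fd = inner;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == 0)
        return 0;
    return errno;
}
```

With two instances a and b from epoll_create1(0), adding b to a is expected to succeed, while then adding a to b would close the loop and is expected to fail with ELOOP.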
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add an ID and a device pointer to 'struct wakeup_source'. Use them to
expose wakeup source statistics in sysfs under
/sys/class/wakeup/wakeup<ID>/*.
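A minimal user-space sketch of reading one of these files follows. Only the directory layout comes from this message; the attribute name passed in by the caller (for example "name") is an assumption, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/class/wakeup/wakeup<ID>/<attr>" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int wakeup_attr_path(char *buf, size_t len, unsigned int id,
                            const char *attr)
{
    int n;

    assert(buf != NULL && attr != NULL);
    n = snprintf(buf, len, "/sys/class/wakeup/wakeup%u/%s", id, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one attribute as a string. Returns 0 on success, -1 if the file
 * is missing (for example on a kernel without this change) or unreadable.
 */
static int read_wakeup_attr(unsigned int id, const char *attr,
                            char *out, size_t len)
{
    char path[128];
    FILE *fp;

    if (len == 0 || wakeup_attr_path(path, sizeof(path), id, attr) != 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(out, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    out[strcspn(out, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

Callers should treat a -1 return as "not available" rather than as an error, since the sysfs class only exists on kernels carrying this change.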
Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Tri Vo <trong@android.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit c8377adfa78103be5380200eb9dab764d7ca890e)
[ Replaced ida_alloc()/ida_free() with ida_simple_get()/ida_simple_remove() as
the former are not present in 4.14. ]
Bug: 129087298
Signed-off-by: Tri Vo <trong@google.com>
Change-Id: Iecd3412423f9d499981f44d3b69507eaa62a2cd9
Merge 4.14.99 into android-4.14
Changes in 4.14.99
drm/bufs: Fix Spectre v1 vulnerability
staging: iio: adc: ad7280a: handle error from __ad7280_read32()
drm/vgem: Fix vgem_init to get drm device available.
pinctrl: bcm2835: Use raw spinlock for RT compatibility
ASoC: Intel: mrfld: fix uninitialized variable access
gpu: ipu-v3: image-convert: Prevent race between run and unprepare
ath9k: dynack: use authentication messages for 'late' ack
scsi: lpfc: Correct LCB RJT handling
scsi: mpt3sas: Call sas_remove_host before removing the target devices
scsi: lpfc: Fix LOGO/PLOGI handling when triggerd by ABTS Timeout event
ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
clk: boston: fix possible memory leak in clk_boston_setup()
dlm: Don't swamp the CPU with callbacks queued during recovery
x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
powerpc/pseries: add of_node_put() in dlpar_detach_node()
crypto: aes_ti - disable interrupts while accessing S-box
drm/vc4: ->x_scaling[1] should never be set to VC4_SCALING_NONE
serial: fsl_lpuart: clear parity enable bit when disable parity
ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
MIPS: Boston: Disable EG20T prefetch
staging:iio:ad2s90: Make probe handle spi_setup failure
fpga: altera-cvp: Fix registration for CvP incapable devices
Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1
platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup
staging: iio: ad7780: update voltage on read
usbnet: smsc95xx: fix rx packet alignment
drm/rockchip: fix for mailbox read size
ARM: OMAP2+: hwmod: Fix some section annotations
net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
modpost: validate symbol names also in find_elf_symbol
perf tools: Add Hygon Dhyana support
soc/tegra: Don't leak device tree node reference
media: mtk-vcodec: Release device nodes in mtk_vcodec_init_enc_pm()
ptp: Fix pass zero to ERR_PTR() in ptp_clock_register
dmaengine: xilinx_dma: Remove __aligned attribute on zynqmp_dma_desc_ll
iio: adc: meson-saradc: check for devm_kasprintf failure
iio: adc: meson-saradc: fix internal clock names
iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID
media: adv*/tc358743/ths8200: fill in min width/height/pixelclock
ACPI: SPCR: Consider baud rate 0 as preconfigured state
staging: pi433: fix potential null dereference
f2fs: move dir data flush to write checkpoint process
f2fs: fix race between write_checkpoint and write_begin
f2fs: fix wrong return value of f2fs_acl_create
i2c: sh_mobile: add support for r8a77990 (R-Car E3)
arm64: io: Ensure calls to delay routines are ordered against prior readX()
sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN
soc: bcm: brcmstb: Don't leak device tree node reference
nfsd4: fix crash on writing v4_end_grace before nfsd startup
drm: Clear state->acquire_ctx before leaving drm_atomic_helper_commit_duplicated_state()
arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
Thermal: do not clear passive state during system sleep
firmware/efi: Add NULL pointer checks in efivars API functions
s390/zcrypt: improve special ap message cmd handling
arm64: ftrace: don't adjust the LR value
ARM: dts: mmp2: fix TWSI2
x86/fpu: Add might_fault() to user_insn()
media: DaVinci-VPBE: fix error handling in vpbe_initialize()
smack: fix access permissions for keyring
usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb()
usb: hub: delay hub autosuspend if USB3 port is still link training
timekeeping: Use proper seqcount initializer
usb: mtu3: fix the issue about SetFeature(U1/U2_Enable)
clk: sunxi-ng: a33: Set CLK_SET_RATE_PARENT for all audio module clocks
driver core: Move async_synchronize_full call
kobject: return error code if writing /sys/.../uevent fails
IB/hfi1: Unreserve a reserved request when it is completed
usb: dwc3: trace: add missing break statement to make compiler happy
pinctrl: sx150x: handle failure case of devm_kstrdup
iommu/amd: Fix amd_iommu=force_isolation
ARM: dts: Fix OMAP4430 SDP Ethernet startup
mips: bpf: fix encoding bug for mm_srlv32_op
media: coda: fix H.264 deblocking filter controls
ARM: dts: Fix up the D-Link DIR-685 MTD partition info
watchdog: renesas_wdt: don't set divider while watchdog is running
usb: dwc3: gadget: Disable CSP for stream OUT ep
iommu/arm-smmu: Add support for qcom,smmu-v2 variant
iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer
sata_rcar: fix deferred probing
clk: imx6sl: ensure MMDC CH0 handshake is bypassed
cpuidle: big.LITTLE: fix refcount leak
OPP: Use opp_table->regulators to verify no regulator case
i2c-axxia: check for error conditions first
phy: sun4i-usb: add support for missing USB PHY index
udf: Fix BUG on corrupted inode
switchtec: Fix SWITCHTEC_IOCTL_EVENT_IDX_ALL flags overwrite
selftests/bpf: use __bpf_constant_htons in test_prog.c
ARM: pxa: avoid section mismatch warning
ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M
KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
mmc: bcm2835: reset host on timeout
memstick: Prevent memstick host from getting runtime suspended during card detection
mmc: sdhci-of-esdhc: Fix timeout checks
mmc: sdhci-xenon: Fix timeout checks
tty: serial: samsung: Properly set flags in autoCTS mode
perf test: Fix perf_event_attr test failure
perf header: Fix unchecked usage of strncpy()
perf probe: Fix unchecked usage of strncpy()
arm64: KVM: Skip MMIO insn after emulation
usb: musb: dsps: fix otg state machine
percpu: convert spin_lock_irq to spin_lock_irqsave.
powerpc/uaccess: fix warning/error with access_ok()
mac80211: fix radiotap vendor presence bitmap handling
xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi
Bluetooth: Fix unnecessary error message for HCI request completion
mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG
scsi: smartpqi: correct host serial num for ssa
scsi: smartpqi: correct volume status
scsi: smartpqi: increase fw status register read timeout
cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
powerpc/perf: Fix thresholding counter data for unknown type
drbd: narrow rcu_read_lock in drbd_sync_handshake
drbd: disconnect, if the wrong UUIDs are attached on a connected peer
drbd: skip spurious timeout (ping-timeo) when failing promote
drbd: Avoid Clang warning about pointless switch statment
video: clps711x-fb: release disp device node in probe()
md: fix raid10 hang issue caused by barrier
fbdev: fbmem: behave better with small rotated displays and many CPUs
i40e: define proper net_device::neigh_priv_len
igb: Fix an issue that PME is not enabled during runtime suspend
ACPI/APEI: Clear GHES block_status before panic()
fbdev: fbcon: Fix unregister crash when more than one framebuffer
powerpc/mm: Fix reporting of kernel execute faults on the 8xx
pinctrl: meson: meson8: fix the GPIO function for the GPIOAO pins
pinctrl: meson: meson8b: fix the GPIO function for the GPIOAO pins
KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported
powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
kvm: Change offset in kvm_write_guest_offset_cached to unsigned
NFS: nfs_compare_mount_options always compare auth flavors.
hwmon: (lm80) fix a missing check of the status of SMBus read
hwmon: (lm80) fix a missing check of bus read in lm80 probe
seq_buf: Make seq_buf_puts() null-terminate the buffer
crypto: ux500 - Use proper enum in cryp_set_dma_transfer
crypto: ux500 - Use proper enum in hash_set_dma_transfer
MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8
cifs: check ntwrk_buf_start for NULL before dereferencing it
um: Avoid marking pages with "changed protection"
niu: fix missing checks of niu_pci_eeprom_read
f2fs: fix sbi->extent_list corruption issue
cgroup: fix parsing empty mount option string
scripts/decode_stacktrace: only strip base path when a prefix of the path
ocfs2: don't clear bh uptodate for block read
ocfs2: improve ocfs2 Makefile
isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw()
gdrom: fix a memory leak bug
fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address()
block/swim3: Fix -EBUSY error when re-opening device after unmount
thermal: bcm2835: enable hwmon explicitly
kdb: Don't back trace on a cpu that didn't round up
thermal: generic-adc: Fix adc to temp interpolation
HID: lenovo: Add checks to fix of_led_classdev_register
kernel/hung_task.c: break RCU locks based on jiffies
proc/sysctl: fix return error for proc_doulongvec_minmax()
kernel/hung_task.c: force console verbose before panic
fs/epoll: drop ovflist branch prediction
exec: load_script: don't blindly truncate shebang string
scripts/gdb: fix lx-version string output
thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set
dccp: fool proof ccid_hc_[rt]x_parse_options()
enic: fix checksum validation for IPv6
net: dp83640: expire old TX-skb
rxrpc: bad unlock balance in rxrpc_recvmsg
skge: potential memory corruption in skge_get_regs()
rds: fix refcount bug in rds_sock_addref
net: systemport: Fix WoL with password after deep sleep
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
net: dsa: slave: Don't propagate flag changes on down slave interfaces
ALSA: compress: Fix stop handling on compressed capture streams
ALSA: hda - Serialize codec registrations
fuse: call pipe_buf_release() under pipe lock
fuse: decrement NR_WRITEBACK_TEMP on the right page
fuse: handle zero sized retrieve correctly
dmaengine: bcm2835: Fix interrupt race on RT
dmaengine: bcm2835: Fix abort of transactions
dmaengine: imx-dma: fix wrong callback invoke
futex: Handle early deadlock return correctly
irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
usb: phy: am335x: fix race condition in _probe
usb: dwc3: gadget: Handle 0 xfer length for OUT EP
usb: gadget: udc: net2272: Fix bitwise and boolean operations
usb: gadget: musb: fix short isoc packets with inventra dma
staging: speakup: fix tty-operation NULL derefs
scsi: cxlflash: Prevent deadlock when adapter probe fails
scsi: aic94xx: fix module loading
KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
perf/x86/intel/uncore: Add Node ID mask
x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
perf/core: Don't WARN() for impossible ring-buffer sizes
perf tests evsel-tp-sched: Fix bitwise operator
serial: fix race between flush_to_ldisc and tty_open
serial: 8250_pci: Make PCI class test non fatal
nfsd4: fix cached replies to solo SEQUENCE compounds
nfsd4: catch some false session retries
IB/hfi1: Add limit test for RC/UC send via loopback
perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
ath9k: dynack: make ewma estimation faster
ath9k: dynack: check da->enabled first in sampling routines
Linux 4.14.99
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 76699a67f3041ff4c7af6d6ee9be2bfbf1ffb671 ]
The ep->ovflist is a secondary ready-list to temporarily store events
that might occur when doing sproc without holding the ep->wq.lock. This
accounts for every time we check for ready events and also send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial time.
As such, the unlikely() check to see if the pointer is being used seems
both misleading and sub-optimal. In fact, we go to an awful lot of
trouble to sync both lists, and populating the ovflist is far from an
uncommon scenario.
For example, profiling a concurrent epoll_wait(2) benchmark with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for two threads a 33%
incorrect rate was seen; and when incrementally increasing the number of
epoll instances (which is used, for example, for multiple queuing load
balancing models), up to a 90% incorrect rate was seen.
Similarly, by deleting the prediction, a 3% throughput boost was seen
across incremental threads.
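For readers unfamiliar with the hint being dropped, here is a small user-space sketch of the same check with and without it. It assumes GCC or Clang for __builtin_expect; struct model_ep and OVFLIST_INACTIVE are hypothetical stand-ins for the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical sentinel meaning "overflow list not in use". */
#define OVFLIST_INACTIVE ((void *)-1L)

struct model_ep {
    void *ovflist;
};

/* Before: the compiler is told the list is almost never in use. */
static int ovflist_active_hinted(const struct model_ep *ep)
{
    assert(ep != NULL);
    return unlikely(ep->ovflist != OVFLIST_INACTIVE);
}

/* After: the same test, with no layout hint for the compiler. */
static int ovflist_active_plain(const struct model_ep *ep)
{
    assert(ep != NULL);
    return ep->ovflist != OVFLIST_INACTIVE;
}
```

Both functions return the same value for any input; the hint only influences how the compiler lays out the two branches, which is why a wrong hint costs performance without affecting correctness.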
Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Avoid waking up every thread sleeping in an epoll_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Change-Id: I848d08d28c89302fd42bbbdfa76489a474ab27bf
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
... such that we can avoid the tree walks to get the node with the
smallest key. Semantically the same, as the previously used rb_first(),
but O(1). The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for epoll.
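As a rough illustration of the idea (not the kernel's rbtree code), the
sketch below keeps a cached pointer to the smallest node next to the tree
root. It uses a plain unbalanced binary search tree, and the names knode,
cached_root, cached_insert(), walk_first() and cached_first() are
hypothetical. Erase is left out; a real implementation would also have to
update the cached pointer when the leftmost node is removed.

```c
#include <stddef.h>

/* Hypothetical node and root types; not the kernel's rb_node/rb_root. */
struct knode {
	long key;
	struct knode *left, *right;
};

struct cached_root {
	struct knode *root;
	struct knode *leftmost;	/* cached node with the smallest key */
};

/* Insert n; remember it as leftmost if the descent never went right. */
void cached_insert(struct cached_root *t, struct knode *n)
{
	struct knode **link = &t->root;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			is_leftmost = 0;
		}
	}
	*link = n;
	if (is_leftmost)
		t->leftmost = n;
}

/* Tree walk to the smallest key: cost grows with the tree height. */
struct knode *walk_first(const struct cached_root *t)
{
	struct knode *n = t->root;

	while (n && n->left)
		n = n->left;
	return n;
}

/* Same answer as walk_first(), but O(1). */
struct knode *cached_first(const struct cached_root *t)
{
	return t->leftmost;
}
```

The only extra cost is the one pointer in cached_root, which matches the
footprint trade-off described above.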
Link: http://lkml.kernel.org/r/20170719014603.19029-15-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The race was introduced by me in commit 971316f0503a ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead"). I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().
Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.
TODO: clean up the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.
Hopefully this explains the use-after-free reported by syzkaller:
BUG: KASAN: use-after-free in debug_spin_lock_before
...
_raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
this is spin_lock(eventpoll->lock),
...
Freed by task 17774:
...
kfree+0xe8/0x2c0 mm/slub.c:3883
ep_free+0x22c/0x2a0 fs/eventpoll.c:865
Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kcmp syscall is built iff CONFIG_CHECKPOINT_RESTORE is selected, so
wrap the appropriate helpers in the epoll code with that config option to
build them conditionally.
Link: http://lkml.kernel.org/r/20170513083456.GG1881@uranus.lan
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Andrew Morton <akpm@linuxfoundation.org>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the current epoll architecture, target files are addressed by
file_struct and file descriptor number, where the latter is not unique.
Moreover, files can be transferred from another process via a unix socket,
added to the queue and then closed, so we won't find this descriptor in
the task fdinfo list.
Thus, to checkpoint and restore such processes, CRIU needs to find out
where exactly the target file is present in order to add it to the epoll
queue. For this, one can use the kcmp call, where some particular target
file from the queue is compared with an arbitrary file passed as an
argument.
Because epoll target files can have the same file descriptor number but
a different file_struct, a caller should explicitly specify the offset
within.
To test whether some particular file matches an entry inside epoll, one
has to
- fill the kcmp_epoll_slot structure with the epoll file descriptor,
  the target file number and the target file offset (if only one
  target is present then it should be 0)
- call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
- the kernel fetches the file pointer matching file descriptor @fd of pid1
- looks up the file struct in the epoll queue of pid2 and returns the
  traditional 0,1,2 result for sorting purposes
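From userspace, the steps above could be sketched roughly as follows. This
assumes a kernel and headers that provide KCMP_EPOLL_TFD and struct
kcmp_epoll_slot in linux/kcmp.h; epoll_target_cmp() and
epoll_target_cmp_demo() are hypothetical helper names, and both pids are
the calling process to keep the example short.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare descriptor fd of this process with target number tfd (at
 * offset toff) registered in the epoll instance efd of this process.
 * Returns the kcmp ordering result (0 means "same file"), or -1 with
 * errno set, for example when kcmp is not built in or not permitted.
 */
int epoll_target_cmp(int efd, int tfd, unsigned int toff, int fd)
{
	struct kcmp_epoll_slot slot = {
		.efd = (uint32_t)efd,
		.tfd = (uint32_t)tfd,
		.toff = toff,
	};
	pid_t self = getpid();

	return (int)syscall(SYS_kcmp, self, self, KCMP_EPOLL_TFD,
			    (unsigned long)fd, (unsigned long)&slot);
}

/* Register the read end of a pipe with epoll and look it up again. */
int epoll_target_cmp_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int p[2], efd, ret = -1;

	if (pipe(p) < 0)
		return -1;
	efd = epoll_create1(0);
	if (efd >= 0) {
		ev.data.fd = p[0];
		if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) == 0)
			ret = epoll_target_cmp(efd, p[0], 0, p[0]);
		close(efd);
	}
	close(p[0]);
	close(p[1]);
	return ret;
}
```

A checkpoint tool would pass two different pids and walk its candidate
descriptors, using the ordering result to sort and match them.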
Link: http://lkml.kernel.org/r/20170424154423.511592110@gmail.com
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since it is possible to have the same number in the tfd field (say a file
is added, closed, then another file is dup'ed to the same number and added
back), it is impossible to distinguish such target files solely by their
numbers.
Strictly speaking, regular applications don't need to recognize these
targets at all, but for checkpoint/restore we need to collect the
targets to be able to push them back at the restore stage in the proper
order.
Thus let's add the file position, inode and device number where this
target lies. These three fields can be used as a primary key for sorting,
and together with the help of kcmp CRIU can find out an exact file target
(from the whole set of processes being checkpointed).
Link: http://lkml.kernel.org/r/20170424154423.436491881@gmail.com
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've encountered zombies that are waiting for a thread to exit that are
looping in ep_poll() almost endlessly although there is a pending
SIGKILL as a result of a group exit.
This happens because we always find ep_events_available() and fetch more
events and never are able to check for signal_pending() that would break
from the loop and return -EINTR.
Special case fatal signals and break immediately to guarantee that we
do not loop to fetch more events and delay making a timely exit.
It would also be possible to simply move the check for signal_pending()
higher than checking for ep_events_available(), but there have been no
reports of delayed signal handling other than SIGKILL preventing zombies
from exiting that would be fixed by this.
It fixes an issue for us where we have witnessed zombies sticking around
for at least O(minutes), but considering the code has been like this
forever and nobody else has complained that I have found, I would simply
queue it up for 4.12.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705031722350.76784@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So I've noticed a number of instances where it was not obvious from the
code whether ->task_list was for a wait-queue head or a wait-queue entry.
Furthermore, there's a number of wait-queue users where the lists are
not for 'tasks' but other entities (poll tables, etc.), in which case
the 'task_list' name is actively confusing.
To clear this all up, name the wait-queue head and entry list structure
fields unambiguously:
  struct wait_queue_head::task_list => ::head
  struct wait_queue_entry::task_list => ::entry
For example, this code:
  rqw->wait.task_list.next != &wait->task_list
... was pretty unclear (to me) about what it's doing, while now it's written this way:
  rqw->wait.head.next != &wait->entry
... which makes it pretty clear that we are iterating a list until we see the head.
Other examples are:
  list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
  list_for_each_entry(wq, &fence->wait.task_list, task_list) {
... where it's unclear (to me) what we are iterating, and during review it's
hard to tell whether it's trying to walk a wait-queue entry (which would be
a bug), while now it's written as:
  list_for_each_entry_safe(pos, next, &x->head, entry) {
  list_for_each_entry(wq, &fence->wait.head, entry) {
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.
Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds busy poll support to epoll. The implementation is meant to
be opportunistic in that it will take the NAPI ID from the last socket
that is added to the ready list that contains a valid NAPI ID and it will
use that for busy polling until the ready list goes empty. Once the ready
list goes empty the NAPI ID is reset and busy polling is disabled until a
new socket is added to the ready list.
In addition when we insert a new socket into the epoll we record the NAPI
ID and assume we are going to receive events on it. If that doesn't occur
it will be evicted as the active NAPI ID and we will resume normal
behavior.
An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket
options to spread the incoming connections to specific worker threads
based on the incoming queue. This enables epoll for each worker thread
to have only sockets that receive packets from a single queue. So when an
application calls epoll_wait() and there are no events available to report,
busy polling is done on the associated queue to pull the packets.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If epoll_ctl is called with operation EPOLL_CTL_DEL then the
@epds.events variable allocated on the stack may contain random bits,
which we then test for EPOLLEXCLUSIVE. Currently the test looks like
  if (epds.events & EPOLLEXCLUSIVE) {
          if (op == EPOLL_CTL_MOD)
                  goto error_tgt_fput;
          if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
                          (epds.events & ~EPOLLEXCLUSIVE_OK_BITS)))
                  goto error_tgt_fput;
  }
Nothing serious will happen even if epds.events has this bit set; still,
it is better to be on the safe side and test this bit only when the
operation actually carries an event.
Link: http://lkml.kernel.org/r/20170214154935.GG1850@uranus.lan
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct timespec is not y2038 safe. Even though timespec might be
sufficient to represent timeouts, use struct timespec64 here as the plan
is to get rid of all timespec references in the kernel.
The patch transitions the common functions: poll_select_set_timeout()
and select_estimate_accuracy() to use timespec64. And, all the syscalls
that use these functions are transitioned in the same patch.
The restart block parameters for poll use monotonic time. Use
timespec64 here as well to assign timeout value. This parameter in the
restart block need not change because this only holds the monotonic
timestamp at which timeout should occur. And, unsigned long data type
should be big enough for this timestamp.
The system call interfaces will be handled in a separate series.
Compat interfaces need not change as timespec64 is an alias to struct
timespec on a 64 bit system.
Link: http://lkml.kernel.org/r/1461947989-21926-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to set the timerslack value on other
processes in order to save power by avoiding wakeups (something
Android currently does via out-of-tree patches).
The first patch tries to fix the internal timer_slack_ns usage which was
defined as an unsigned long, which limits the slack range to ~4 seconds
on 32bit systems. It converts it to a u64, which provides the same
basically unlimited slack (500 years) on both 32bit and 64bit machines.
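The quoted ranges can be checked with a little arithmetic. The sketch
below is only a worked example; max_slack_seconds() and
seconds_to_years() are hypothetical helper names, and a year is taken as
365.25 days.

```c
#include <stdint.h>

/*
 * Largest slack, in seconds, that an unsigned counter of `bits` bits
 * can hold when it counts nanoseconds.
 */
double max_slack_seconds(unsigned int bits)
{
	uint64_t max_ns = (bits >= 64) ? UINT64_MAX
				       : ((UINT64_C(1) << bits) - 1);

	return (double)max_ns / 1e9;
}

/* Convert seconds to years of 365.25 days. */
double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24.0 * 3600.0);
}
```

With 32 bits this gives roughly 4.29 seconds, and with 64 bits it gives a
few hundred years, in line with the rough figures quoted above.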
The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.
With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:
$ time sleep 1
real 0m10.747s
user 0m0.001s
sys 0m0.005s
The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s. Let me
know if it makes sense to break that up more or not.
Other than that things are fairly straightforward.
This patch (of 2):
The timer_slack_ns value in the task struct is currently an unsigned
long. This means that on 32bit applications, the maximum slack is just
over 4 seconds. However, on 64bit machines, it's much much larger (~500
years).
This disparity could make application development a little confusing, so
this patch extends the timer_slack_ns (as well as the default_slack) to a
u64. This means both 32bit and 64bit systems have the same effective
internal slack range.
Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specifies
the interface as an unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to an unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger than what can be stored by an unsigned long.
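The saturating read described above can be modelled as below. This is an
illustrative sketch, not the kernel's prctl code: get_timerslack_32() is a
hypothetical name, and uint32_t stands in for a 32bit unsigned long so
that the behaviour is the same on any build machine.

```c
#include <stdint.h>

/*
 * Model of a 32bit GET_TIMERSLACK: the internal slack is a 64bit
 * nanosecond value, but the caller can only receive 32 bits, so larger
 * values are reported as the maximum instead of being truncated.
 */
uint32_t get_timerslack_32(uint64_t slack_ns)
{
	return slack_ns > UINT32_MAX ? UINT32_MAX : (uint32_t)slack_ns;
}
```

Saturating rather than truncating means that a 10 second slack set
through the 64bit interface reads back as "at least the maximum" instead
of some unrelated small number.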
This patch also modifies hrtimer functions which specified the slack
delta as an unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the current implementation of the EPOLLEXCLUSIVE flag (added for
4.5-rc1), if epoll waiters create different POLL* sets and register them
as exclusive against the same target fd, the current implementation will
stop waking any further waiters once it finds the first idle waiter.
This means that waiters could miss wakeups in certain cases.
For example, when we wake up a pipe for reading we do:
  wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
So if one epoll set or epfd is added to pipe p with POLLIN and a second set
epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
wakeup since the current implementation will stop after it finds any
intersection of events with a waiter that is blocked in epoll_wait().
We could potentially address this by requiring all epoll waiters that
are added to p to pass the same set of POLL* events. I.e., the
first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set of
POLL* flags to be used by any other epfds that are added as
EPOLLEXCLUSIVE. However, I think it might be a somewhat confusing
interface as we would have to reference count the number of users for
that set, and so userspace would have to keep track of that count, or we
would need a more involved interface. It also adds some shared state
that we'd have to store somewhere. I don't think anybody will want to
bloat __wait_queue_head for this.
I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
such that it can only be specified with EPOLLIN and/or EPOLLOUT. So
that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
once we hit the first idle waiter that specifies the EPOLLIN bit, since
any remaining waiters that only have 'POLLOUT' set wouldn't need to be
woken. Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
bit set and not 'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the
wake bit set (there is at least one example of this I saw in fs/pipe.c),
then we just wake the entire exclusive list. Having both 'POLLOUT' and
'POLLIN' set should not be on any performance critical path, so I
think that's ok (in fs/pipe.c it's in pipe_release()). We also continue
to include EPOLLERR and EPOLLHUP by default in any exclusive set. Thus,
the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
so.
Since epoll waiters may be interested in other events as well besides
EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
doing a 'dup' call on the target fd and adding that as one normally
would with EPOLL_CTL_ADD. Since I think that the POLLIN and POLLOUT
events are what we are interested in balancing, I think that the 'dup'
thing could perhaps be added to only one of the waiter threads.
However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
sufficient for the majority of use-cases.
Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
among multiple epfds, where between 1 and n of the epfds may receive an
event, it does not satisfy the semantics of EPOLLONESHOT where only 1
epfd would get an event. Thus, it is not allowed to be specified in
conjunction with EPOLLEXCLUSIVE.
EPOLL_CTL_MOD is also not allowed if the fd was previously added as
EPOLLEXCLUSIVE. With the limited number of flags this does not seem
very interesting, but it could be relaxed at some later point.
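The request-side rules described above can be summarised in a small
userspace model. This is a sketch only: excl_request_ok() and
EXCL_OK_BITS_SKETCH are hypothetical names, the mask lists just the event
bits named in this message, and the kernel's own mask may allow further
control flags. The sketch also does not model the rule that
EPOLL_CTL_MOD is refused for an item that was previously added as
exclusive.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* fallback for older libc headers */
#endif

/* Hypothetical mask: only the event bits named in the text above. */
#define EXCL_OK_BITS_SKETCH \
	(EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP | EPOLLEXCLUSIVE)

/* Return true if an epoll_ctl() request passes the exclusive rules. */
bool excl_request_ok(int op, uint32_t events)
{
	if (op == EPOLL_CTL_DEL)
		return true;	/* DEL carries no event to validate */
	if (!(events & EPOLLEXCLUSIVE))
		return true;	/* not an exclusive request */
	if (op == EPOLL_CTL_MOD)
		return false;	/* the flag may only be given on ADD */
	/* ADD: reject EPOLLONESHOT, EPOLLPRI and any other extra bit */
	return (events & ~(uint32_t)EXCL_OK_BITS_SKETCH) == 0;
}
```

Events outside the mask can still be watched by registering a dup of the
target fd without EPOLLEXCLUSIVE, as suggested above.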
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Madars Vitolins <m@silodev.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner. This means that when we have multiple
epfds attached to a shared fd source they are all woken up. This creates
thundering herd type behavior.
Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation. This new
flag allows for exclusive wakeups when there are multiple epfds attached
to a shared fd event source.
The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait(). The idea is to search for threads which
are idle and ready to process the wakeup events. Thus, we queue an event
to at least 1 epfd, but may still potentially queue an event to all epfds
that are attached to the shared fd source.
Performance testing was done by Madars Vitolins using a modified version
of Enduro/X. The use of the 'EPOLLEXCLUSIVE' flag reduced the run time of
this particular workload from 860s down to 24s.
Sample epoll_ctl text:
EPOLLEXCLUSIVE
Sets an exclusive wakeup mode for the epfd file descriptor that is
being attached to the target file descriptor, fd. Thus, when an event
occurs and multiple epfd file descriptors are attached to the same
target file using EPOLLEXCLUSIVE, one or more epfds will receive an
event with epoll_wait(2). The default in this scenario (when
EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
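A minimal userspace sketch of registering one pipe with two epoll
instances using EPOLLEXCLUSIVE follows. It assumes a kernel that supports
the flag; exclusive_wakeup_demo() is a hypothetical helper name, and the
fallback #define is only there for older C library headers.

```c
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* fallback for older libc headers */
#endif

/*
 * Attach the read end of a pipe to two epfds with EPOLLEXCLUSIVE, make
 * the pipe readable, and count how many epfds report an event.
 * Returns that count, or -1 if the setup failed.
 */
int exclusive_wakeup_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
	struct epoll_event got;
	int p[2], ep[2] = { -1, -1 }, ready = -1, i;

	if (pipe(p) < 0)
		return -1;
	for (i = 0; i < 2; i++) {
		ep[i] = epoll_create1(0);
		if (ep[i] < 0 ||
		    epoll_ctl(ep[i], EPOLL_CTL_ADD, p[0], &ev) < 0)
			goto done;
	}
	if (write(p[1], "x", 1) != 1)
		goto done;
	ready = 0;
	for (i = 0; i < 2; i++)
		if (epoll_wait(ep[i], &got, 1, 0) == 1)
			ready++;
done:
	for (i = 0; i < 2; i++)
		if (ep[i] >= 0)
			close(ep[i]);
	close(p[0]);
	close(p[1]);
	return ready;
}
```

No thread is blocked in epoll_wait() when the write happens here, so the
event may be queued to both epfds; the flag only narrows the wakeup once
idle waiters exist.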
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Madars Vitolins <m@silodev.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After waking up a task waiting for an event, we explicitly mark it as
TASK_RUNNING (which is necessary as we do the checks for wakeups as
TASK_INTERRUPTIBLE). Once running and dealing with actually delivering
the events, we're obviously not planning on calling schedule, thus we can
relax the implied barrier and simply update the state with
__set_current_state().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Callers of the seq_printf functions shouldn't really check the return
value. An occasional seq_has_overflowed() check is used instead.
Update vfs documentation.
Link: http://lkml.kernel.org/p/e37e6e7b76acbdcc3bb4ab2a57c8f8ca1ae11b9a.1412031505.git.joe@perches.com
Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Joe Perches <joe@perches.com>
[ did a few clean ups ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this uninitialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
produces unexpected messages in the audit log, such as (on a system
running SELinux):
type=AVC msg=audit(1408212798.866:410): avc: denied
{ block_suspend } for pid=7754 comm="dbus-daemon" capability=36
scontext=unconfined_u:unconfined_r:unconfined_t
tcontext=unconfined_u:unconfined_r:unconfined_t
tclass=capability2 permissive=1
type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
exe="/usr/bin/dbus-daemon"
subj=unconfined_u:unconfined_r:unconfined_t key=(null)
("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")
Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.
Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes use-after-free of epi->fllink.next inside list loop macro.
This loop actually releases elements in the body. The list is
rcu-protected but here we cannot hold rcu_read_lock because we need to
lock mutex inside.
The obvious solution is to use list_for_each_entry_safe(). RCU-ness
isn't essential because nobody can change this list under us, it's final
fput for this file.
The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL
using rcu")
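The pattern behind list_for_each_entry_safe() can be shown with a plain
singly linked list. This is a generic sketch, not the eventpoll code:
struct item, make_list() and free_all_safe() are hypothetical names. The
point is that the link to the next element is read before the current
element is released.

```c
#include <stddef.h>
#include <stdlib.h>

struct item {
	int id;
	struct item *next;
};

/* Build a list of up to n items; stops early if allocation fails. */
struct item *make_list(int n)
{
	struct item *head = NULL;

	while (n-- > 0) {
		struct item *it = malloc(sizeof(*it));

		if (!it)
			break;
		it->id = n;
		it->next = head;
		head = it;
	}
	return head;
}

/* Free every element and return how many were released. */
size_t free_all_safe(struct item *head)
{
	struct item *pos, *next;
	size_t freed = 0;

	for (pos = head; pos; pos = next) {
		next = pos->next;	/* save the link before freeing */
		free(pos);		/* pos must not be touched again */
		freed++;
	}
	return freed;
}
```

A loop that advanced with pos = pos->next after free(pos) would read
freed memory, which is the kind of bug this patch removes.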
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stable <stable@vger.kernel.org> # 3.13+
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with
commit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").
The acquisition of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)').
However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.
The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
it's operating upon. The reason we don't need to do lock ordering in the
add path is that we are already holding the global 'epmutex'
whenever we do the double lock. Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop EPOLLWAKEUP from epoll events mask if CONFIG_PM_SLEEP is disabled.
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.c: use dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached
directly to a wakeup source, we do not need to take the global 'epmutex',
unless the epoll file descriptor is nested. The purpose of taking the
'epmutex' on add is to prevent complex topologies such as loops and deep
wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD
operations. However, for the simple case of an epoll file descriptor
attached directly to a wakeup source (with no nesting), we do not need to
hold the 'epmutex'.
This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves
scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb
performance:
"On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In
addition the benchmark went from scaling well on 10 sockets to scaling
well on just over 40 sockets.
...
Currently the benchmark stops scaling at around 40-44 sockets but it seems like
I found a second unrelated bottleneck."
[akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc]
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Zimmer found that once we get over 10 cpus, the scalability of
SPECjbb falls over due to the contention on the global 'epmutex', which is
taken on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.
Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
by using rcu to guard against any concurrent traversals.
Patch #2 removes the 'epmutex' lock from EPOLL_CTL_ADD operations for
simple topologies, i.e. when adding a link from an epoll file descriptor to
a wakeup source, where the epoll file descriptor is not nested.
This patch (of 2):
Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
converting the file->f_ep_links list into an rcu one. In this way, we can
traverse the epoll network on the add path in parallel with deletes.
Since deletes can't create loops or worse wakeup paths, this is safe.
This patch in combination with the patch "epoll: Do not take global 'epmutex'
for simple topologies", shows a dramatic performance improvement in
scalability for SPECjbb.
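
For illustration only, the two topologies discussed above can be sketched from
userspace. This is a minimal sketch, not part of the patch: the helper names
simple_topology_add_del() and nested_topology_add() are hypothetical, and the
code only shows which epoll_ctl() calls form a "simple" link (epoll fd to a
plain wakeup source) versus a nested one (epoll fd watching another epoll fd).
It does not measure or demonstrate the locking change itself.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: add a pipe (a plain wakeup source) to a non-nested
 * epoll fd, then remove it again. Returns 0 on success, -1 on failure. */
int simple_topology_add_del(void)
{
    int pipefd[2];
    int epfd;
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.data.fd = pipefd[0];
    /* Simple topology: epoll fd -> wakeup source, no nesting involved. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_ep;
    /* The event argument is ignored for EPOLL_CTL_DEL on current kernels. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) < 0)
        goto out_ep;
    rc = 0;

out_ep:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}

/* Hypothetical helper: add one epoll fd to another. This is the nested case,
 * where the kernel has to check for loops and overly deep wakeup paths. */
int nested_topology_add(void)
{
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    int inner = epoll_create1(0);
    int outer = epoll_create1(0);

    if (inner < 0 || outer < 0)
        goto out;

    ev.data.fd = inner;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == 0)
        rc = 0;

out:
    if (outer >= 0)
        close(outer);
    if (inner >= 0)
        close(inner);
    return rc;
}
```

Both helpers return 0 on a Linux system with epoll support; only the first one
corresponds to the "simple topology" case described in the series.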
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
CC: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 1c441e921201 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.
Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.
References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e921201 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
ep_free() might iterate on a huge set of epitems and hold cpu too long.
Add two cond_resched() in order to yield cpu to other tasks. This is safe
as we only hold mutexes in this function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...
sigprocmask() should die. None of the current callers actually
need this strange interface.
Change fs/eventpoll.c to use set_current_blocked(). This also
means we should not worry about SIGKILL/SIGSTOP.
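
As a rough userspace illustration (not kernel code), the signal-mask handling
touched here is what backs the sigmask argument of epoll_pwait(). The sketch
below assumes a hypothetical helper name, poll_with_sigint_blocked(), and
simply waits on an epoll fd with SIGINT blocked for the duration of the call.
SIGKILL and SIGSTOP cannot be blocked from userspace regardless of the mask
passed in.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>

/* Hypothetical helper: wait for at most one event on epfd with SIGINT
 * blocked for the duration of the call. Returns the number of ready
 * events, 0 on timeout, or -1 on error (see errno). */
int poll_with_sigint_blocked(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    /* The kernel installs 'mask' while waiting and restores the
     * caller's original mask before returning. */
    return epoll_pwait(epfd, &ev, 1, timeout_ms, &mask);
}
```

With an empty epoll set and a timeout of 0, the call returns immediately with
0 ready events.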
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid waking up every thread sleeping in an epoll_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull compat cleanup from Al Viro:
"Mostly about syscall wrappers this time; there will be another pile
with patches in the same general area from various people, but I'd
rather push those after both that and vfs.git pile are in."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
syscalls.h: slightly reduce the jungles of macros
get rid of union semop in sys_semctl(2) arguments
make do_mremap() static
sparc: no need to sign-extend in sync_file_range() wrapper
ppc compat wrappers for add_key(2) and request_key(2) are pointless
x86: trim sys_ia32.h
x86: sys32_kill and sys32_mprotect are pointless
get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
merge compat sys_ipc instances
consolidate compat lookup_dcookie()
convert vmsplice to COMPAT_SYSCALL_DEFINE
switch getrusage() to COMPAT_SYSCALL_DEFINE
switch epoll_pwait to COMPAT_SYSCALL_DEFINE
convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
make HAVE_SYSCALL_WRAPPERS unconditional
consolidate cond_syscall and SYSCALL_ALIAS declarations
teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
get rid of duplicate logics in __SC_....[1-6] definitions
It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results
in slightly smaller/faster code.
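
The reasoning can be sketched with a userspace analogue. This is an
assumption-laden illustration using C11 atomics, not the kernel RCU API: the
names struct config, publish_config(), clear_config() and read_config() are
hypothetical. Publishing a non-NULL pointer needs release ordering so readers
see the initialized object; storing NULL publishes no object, so a relaxed
store is enough, which is roughly the distinction between
rcu_assign_pointer() and RCU_INIT_POINTER().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared object, for illustration only. */
struct config {
    int value;
};

static _Atomic(struct config *) current_config;

/* Publishing a real object: release ordering makes the object's
 * initialization visible to any reader that observes the pointer. */
void publish_config(struct config *cfg)
{
    atomic_store_explicit(&current_config, cfg, memory_order_release);
}

/* Clearing the pointer: there is no pointed-to data to order against,
 * so a relaxed store is sufficient. */
void clear_config(void)
{
    atomic_store_explicit(&current_config, NULL, memory_order_relaxed);
}

/* Reader side: acquire pairs with the release in publish_config(). */
struct config *read_config(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}
```

Note that this sketch says nothing about when the old object may be freed;
in the kernel that is handled separately by RCU grace periods.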
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>