547 Commits

Author SHA1 Message Date
Richard Raya
9cdc78c354 Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common: (2966 commits)
  Linux 4.14.331
  net: sched: fix race condition in qdisc_graft()
  scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
  ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
  ext4: correct return value of ext4_convert_meta_bg
  ext4: correct offset of gdb backup in non meta_bg group to update_backups
  ext4: apply umask if ACL support is disabled
  media: venus: hfi: fix the check to handle session buffer requirement
  media: sharp: fix sharp encoding
  i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
  net: dsa: lan9303: consequently nested-lock physical MDIO
  ALSA: info: Fix potential deadlock at disconnection
  parisc/pgtable: Do not drop upper 5 address bits of physical address
  parisc: Prevent booting 64-bit kernels on PA1.x machines
  mcb: fix error handling for different scenarios when parsing
  jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
  genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
  mmc: meson-gx: Remove setting of CMD_CFG_ERROR
  PM: hibernate: Clean up sync_read handling in snapshot_write_next()
  PM: hibernate: Use __get_safe_page() rather than touching the list
  ...

Change-Id: I755d2aa7c525ace28adc4aee433572b3110ea39b
2023-12-07 20:15:44 -03:00
Georg Ottinger
8646578a52 ext2: fix datatype of block number in ext2_xattr_set2()
[ Upstream commit e88076348425b7d0491c8c98d8732a7df8de7aa3 ]

I run a small server that uses external hard drives for backups. The
backup software I use uses ext2 filesystems with 4KiB block size and
the server is running SELinux and therefore relies on xattr. I recently
upgraded the hard drives from 4TB to 12TB models. I noticed that after
transferring some TBs I got a filesystem error "Freeing blocks not in
datazone - block = 18446744071529317386, count = 1" and the backup
process stopped. Trying to fix the fs with e2fsck resulted in a
completely corrupted fs. The error probably came from ext2_free_blocks(),
and because of the large number 18e19 this problem immediately looked
like some kind of integer overflow. Whereas the 4TB fs was about 1e9
blocks, the new 12TB is about 3e9 blocks. So, searching the ext2 code,
I came across the line in fs/ext2/xattr.c:745 where ext2_new_block()
is called and the resulting block number is stored in the variable block
as an int datatype. If a block with a block number greater than
INT32_MAX is returned, this variable overflows and the call to
sb_getblk() at line fs/ext2/xattr.c:750 fails, then the call to
ext2_free_blocks() produces the error.

Signed-off-by: Georg Ottinger <g.ottinger@gmx.at>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230815100340.22121-1-g.ottinger@gmx.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23 10:47:04 +02:00
Jan Kara
e28208e21e ext2: Drop fragment support
commit 404615d7f1dcd4cca200e9a7a9df3a1dcae1dd62 upstream.

Ext2 has fields in superblock reserved for subblock allocation support.
However that never landed. Drop the many years dead code.

Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 11:33:57 +02:00
Jan Kara
0ebfaf1415 ext2: Check block size validity during mount
[ Upstream commit 62aeb94433fcec80241754b70d0d1836d5926b0a ]

Check that log of block size stored in the superblock has sensible
value. Otherwise the shift computing the block size can overflow leading
to undefined behavior.

Reported-by: syzbot+4fec412f59eba8c01b77@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30 12:38:34 +01:00
Jan Kara
0bcdc31094 ext2: Add more validity checks for inode counts
[ Upstream commit fa78f336937240d1bc598db817d638086060e7e9 ]

Add checks verifying number of inodes stored in the superblock matches
the number computed from number of inodes per group. Also verify we have
at least one block worth of inodes per group. This prevents crashes on
corrupted filesystems.

Reported-by: syzbot+d273f7d7f58afd93be48@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:11:14 +02:00
azrim
2abb319f4b
Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common into sheesh
* 'android-4.14-stable' of https://android.googlesource.com/kernel/common:
  Linux 4.14.277
  Revert "net: micrel: fix KS8851_MLL Kconfig"
  ax25: Fix UAF bugs in ax25 timers
  ax25: Fix NULL pointer dereferences in ax25 timers
  ax25: fix NPD bug in ax25_disconnect
  ax25: fix UAF bug in ax25_send_control()
  ax25: Fix refcount leaks caused by ax25_cb_del()
  ax25: fix UAF bugs of net_device caused by rebinding operation
  ax25: fix reference count leaks of ax25_dev
  ax25: add refcount in ax25_dev to avoid UAF bugs
  block/compat_ioctl: fix range check in BLKGETSIZE
  staging: ion: Prevent incorrect reference counting behavour
  ext4: force overhead calculation if the s_overhead_cluster makes no sense
  ext4: fix overhead calculation to account for the reserved gdt blocks
  ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
  ext4: fix symlink file size not match to file content
  ARC: entry: fix syscall_trace_exit argument
  e1000e: Fix possible overflow in LTR decoding
  ASoC: soc-dapm: fix two incorrect uses of list iterator
  openvswitch: fix OOB access in reserve_sfa_size()
  powerpc/perf: Fix power9 event alternatives
  dma: at_xdmac: fix a missing check on list iterator
  ata: pata_marvell: Check the 'bmdma_addr' beforing reading
  stat: fix inconsistency between struct stat and struct compat_stat
  net: macb: Restart tx only if queue pointer is lagging
  drm/msm/mdp5: check the return of kzalloc()
  brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
  cifs: Check the IOCB_DIRECT flag, not O_DIRECT
  vxlan: fix error return code in vxlan_fdb_append
  ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant
  platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative
  ARM: vexpress/spc: Avoid negative array index when !SMP
  netlink: reset network and mac headers in netlink_dump()
  net/packet: fix packet_sock xmit return value checking
  dmaengine: imx-sdma: Fix error checking in sdma_event_remap
  tcp: Fix potential use-after-free due to double kfree()
  tcp: fix race condition when creating child sockets from syncookies
  ALSA: usb-audio: Clear MIDI port active flag after draining
  gfs2: assign rgrp glock before compute_bitstructs
  can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
  tracing: Dump stacktrace trigger to the corresponding instance
  tracing: Have traceon and traceoff trigger honor the instance
  mm: page_alloc: fix building error on -Werror=array-compare
  etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
  Linux 4.14.276
  i2c: pasemi: Wait for write xfers to finish
  smp: Fix offline cpu check in flush_smp_call_function_queue()
  ARM: davinci: da850-evm: Avoid NULL pointer dereference
  ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
  gcc-plugins: latent_entropy: use /dev/urandom
  mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
  mm, page_alloc: fix build_zonerefs_node()
  drivers: net: slip: fix NPD bug in sl_tx_timeout()
  scsi: mvsas: Add PCI ID of RocketRaid 2640
  gpu: ipu-v3: Fix dev_dbg frequency output
  ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
  net: micrel: fix KS8851_MLL Kconfig
  scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
  scsi: target: tcmu: Fix possible page UAF
  Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
  drm/amdkfd: Check for potential null return of kmalloc_array()
  drm/amd: Add USBC connector ID
  cifs: potential buffer overflow in handling symlinks
  nfc: nci: add flush_workqueue to prevent uaf
  net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
  mlxsw: i2c: Fix initialization error flow
  gpiolib: acpi: use correct format characters
  veth: Ensure eth header is in skb's linear part
  memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
  xfrm: policy: match with both mark and mask on user interfaces
  cgroup: Use open-time cgroup namespace for process migration perm checks
  cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
  cgroup: Use open-time credentials for process migraton perm checks
  mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
  arm64: module: remove (NOLOAD) from linker script
  mm: don't skip swap entry even if zap_details specified
  dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
  tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
  perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
  arm64: patch_text: Fixup last cpu should be master
  btrfs: fix qgroup reserve overflow the qgroup limit
  x86/speculation: Restore speculation related MSRs during S3 resume
  x86/pm: Save the MSR validity status at context setup
  mm/mempolicy: fix mpol_new leak in shared_policy_replace
  mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
  Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
  drbd: Fix five use after free bugs in get_initial_state
  drm/imx: Fix memory leak in imx_pd_connector_get_modes
  net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
  scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
  drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
  mm: fix race between MADV_FREE reclaim and blkdev direct IO read
  net: add missing SOF_TIMESTAMPING_OPT_ID support
  ipv6: add missing tx timestamping on IPPROTO_RAW
  parisc: Fix CPU affinity for Lasi, WAX and Dino chips
  jfs: prevent NULL deref in diFree
  virtio_console: eliminate anonymous module_init & module_exit
  serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
  NFS: swap-out must always use STABLE writes.
  NFS: swap IO handling is slightly different for O_DIRECT IO
  SUNRPC/call_alloc: async tasks mustn't block waiting for memory
  w1: w1_therm: fixes w1_seq for ds28ea00 sensors
  init/main.c: return 1 from handled __setup() functions
  Bluetooth: Fix use after free in hci_send_acl
  xtensa: fix DTC warning unit_address_format
  usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
  scsi: libfc: Fix use after free in fc_exch_abts_resp()
  MIPS: fix fortify panic when copying asm exception handlers
  bnxt_en: Eliminate unintended link toggle during FW reset
  macvtap: advertise link netns via netlink
  net/smc: correct settings of RMB window update limit
  scsi: aha152x: Fix aha152x_setup() __setup handler return value
  scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
  dm ioctl: prevent potential spectre v1 gadget
  iommu/arm-smmu-v3: fix event handling soft lockup
  PCI: aardvark: Fix support for MSI interrupts
  powerpc: Set crashkernel offset to mid of RMA region
  power: supply: axp20x_battery: properly report current when discharging
  scsi: bfa: Replace snprintf() with sysfs_emit()
  scsi: mvsas: Replace snprintf() with sysfs_emit()
  powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
  ptp: replace snprintf with sysfs_emit
  ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  ARM: 9187/1: JIVE: fix return value of __setup handler
  rtc: wm8350: Handle error for wm8350_register_irq
  ubifs: Rectify space amount budget for mkdir/tmpfile operations
  KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
  openvswitch: Fixed nd target mask field in the flow dump.
  ARM: dts: spear13xx: Update SPI dma properties
  ARM: dts: spear1340: Update serial node properties
  ASoC: topology: Allow TLV control to be either read or write
  ubi: fastmap: Return error code if memory allocation fails in add_aeb()
  mm/memcontrol: return 1 from cgroup.memory __setup() handler
  mm/mmap: return 1 from stack_guard_gap __setup() handler
  ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
  ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
  pinctrl: pinconf-generic: Print arguments for bias-pull-*
  gfs2: Make sure FITRIM minlen is rounded up to fs block size
  can: mcba_usb: properly check endpoint type
  can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
  ubifs: rename_whiteout: correct old_dir size computing
  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  KVM: Prevent module exit until all VMs are freed
  scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
  scsi: qla2xxx: Fix warning for missing error code
  powerpc/lib/sstep: Fix build errors with newer binutils
  powerpc/lib/sstep: Fix 'sthcx' instruction
  mmc: host: Return an error when ->enable_sdio_irq() ops is missing
  media: hdpvr: initialize dev->worker at hdpvr_register_videodev
  video: fbdev: sm712fb: Fix crash in smtcfb_write()
  ARM: mmp: Fix failure to remove sram device
  ARM: tegra: tamonten: Fix I2C3 pad setting
  media: cx88-mpeg: clear interrupt status register before streaming video
  ASoC: soc-core: skip zero num_dai component in searching dai name
  video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
  video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
  ARM: dts: bcm2837: Add the missing L1/L2 cache information
  ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
  video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
  video: fbdev: cirrusfb: check pixclock to avoid divide by zero
  video: fbdev: w100fb: Reset global state
  video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
  ntfs: add sanity check on allocation size
  ext4: don't BUG if someone dirty pages without asking ext4 first
  spi: tegra20: Use of_device_get_match_data()
  PM: core: keep irq flags in device_pm_check_callbacks()
  ACPI/APEI: Limit printable size of BERT table data
  ACPICA: Avoid walking the ACPI Namespace if it is not there
  irqchip/nvic: Release nvic_base upon failure
  Fix incorrect type in assignment of ipv6 port for audit
  loop: use sysfs_emit() in the sysfs xxx show()
  selinux: use correct type for context length
  lib/test: use after free in register_test_dev_kmod()
  NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
  net/x25: Fix null-ptr-deref caused by x25_disconnect
  qlcnic: dcb: default to returning -EOPNOTSUPP
  net: phy: broadcom: Fix brcm_fet_config_init()
  xen: fix is_xen_pmu()
  netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
  jfs: fix divide error in dbNextAG
  kgdbts: fix return value of __setup handler
  kgdboc: fix return value of __setup handler
  tty: hvc: fix return value of __setup handler
  pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
  pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
  pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
  NFS: remove unneeded check in decode_devicenotify_args()
  clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
  clk: clps711x: Terminate clk_div_table with sentinel element
  clk: loongson1: Terminate clk_div_table with sentinel element
  remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
  clk: qcom: clk-rcg2: Update the frac table for pixel clock
  iio: adc: Add check for devm_request_threaded_irq
  serial: 8250: Fix race condition in RTS-after-send handling
  serial: 8250_mid: Balance reference count for PCI DMA device
  staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
  pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
  mxser: fix xmit_buf leak in activate when LSR == 0xff
  mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
  tcp: ensure PMTU updates are processed during fastopen
  i2c: mux: demux-pinctrl: do not deactivate a master that is not active
  af_netlink: Fix shift out of bounds in group mask calculation
  USB: storage: ums-realtek: fix error code in rts51x_read_mem()
  mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
  MIPS: RB532: fix return value of __setup handler
  vxcan: enable local echo for sent CAN frames
  mfd: mc13xxx: Add check for mc13xxx_irq_request
  powerpc/sysdev: fix incorrect use to determine if list is empty
  PCI: Reduce warnings on possible RW1C corruption
  power: supply: wm8350-power: Add missing free in free_charger_irq
  power: supply: wm8350-power: Handle error for wm8350_register_irq
  i2c: xiic: Make bus names unique
  KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
  KVM: x86: Fix emulation in writing cr8
  power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
  drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
  ext2: correct max file size computing
  TOMOYO: fix __setup handlers return values
  scsi: pm8001: Fix abort all task initialization
  scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
  scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
  scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
  dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
  iwlwifi: Fix -EIO error code that is never returned
  HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
  power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
  ray_cs: Check ioremap return value
  power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
  ath9k_htc: fix uninit value bugs
  drm/edid: Don't clear formats if using deep color
  mtd: onenand: Check for error irq
  ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
  ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
  ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
  ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
  video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
  ASoC: fsi: Add check for clk_enable
  ASoC: wm8350: Handle error for wm8350_register_irq
  ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
  media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
  ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
  memory: emif: check the pointer temp in get_device_details()
  memory: emif: Add check for setup_interrupts
  ASoC: atmel_ssc_dai: Handle errors for clk_enable
  ASoC: mxs-saif: Handle errors for clk_enable
  printk: fix return value of printk.devkmsg __setup handler
  arm64: dts: broadcom: Fix sata nodename
  arm64: dts: ns2: Fix spi-cpol and spi-cpha property
  ALSA: spi: Add check for clk_enable()
  ASoC: ti: davinci-i2s: Add check for clk_enable()
  media: usb: go7007: s2250-board: fix leak in probe()
  soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
  ARM: dts: qcom: ipq4019: fix sleep clock
  video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
  video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
  media: coda: Fix missing put_device() call in coda_get_vdoa_data
  perf/x86/intel/pt: Fix address filter config for 32-bit kernel
  perf/core: Fix address filter parser for multiple filters
  sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
  clocksource: acpi_pm: fix return value of __setup handler
  hwmon: (pmbus) Add Vin unit off handling
  crypto: ccp - ccp_dmaengine_unregister release dma channels
  ACPI: APEI: fix return value of __setup handlers
  crypto: vmx - add missing dependencies
  hwrng: atmel - disable trng on failure path
  PM: suspend: fix return value of __setup handler
  PM: hibernate: fix __setup handler error handling
  hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
  hwmon: (pmbus) Add mutex to regulator ops
  spi: pxa2xx-pci: Balance reference count for PCI DMA device
  selftests/x86: Add validity check and allow field splitting
  spi: tegra114: Add missing IRQ check in tegra_spi_probe
  crypto: mxs-dcp - Fix scatterlist processing
  crypto: authenc - Fix sleep in atomic context in decrypt_tail
  PCI: pciehp: Clear cmd_busy bit in polling mode
  brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
  brcmfmac: firmware: Allocate space for default boardrev in nvram
  media: davinci: vpif: fix unbalanced runtime PM get
  DEC: Limit PMAX memory probing to R3k systems
  lib/raid6/test: fix multiple definition linking error
  thermal: int340x: Increase bitmap size
  carl9170: fix missing bit-wise or operator for tx_params
  ARM: dts: exynos: add missing HDMI supplies on SMDK5420
  ARM: dts: exynos: add missing HDMI supplies on SMDK5250
  ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
  ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
  video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
  video: fbdev: sm712fb: Fix crash in smtcfb_read()
  drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
  ACPI: properties: Consistently return -ENOENT if there are no more references
  drbd: fix potential silent data corruption
  ALSA: cs4236: fix an incorrect NULL check on list iterator
  Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
  qed: validate and restrict untrusted VFs vlan promisc mode
  qed: display VF trust config
  scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
  mempolicy: mbind_range() set_policy() after vma_merge()
  mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
  jffs2: fix memory leak in jffs2_scan_medium
  jffs2: fix memory leak in jffs2_do_mount_fs
  jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
  can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
  pinctrl: samsung: drop pin banks references on error paths
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  SUNRPC: avoid race between mod_timer() and del_timer_sync()
  Documentation: update stable tree link
  Documentation: add link to stable release candidate tree
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  clk: uniphier: Fix fixed-rate initialization
  iio: inkern: make a best effort on offset calculation
  iio: inkern: apply consumer scale when no channel scale is available
  iio: inkern: apply consumer scale on IIO_VAL_INT cases
  coresight: Fix TRCCONFIGR.QE sysfs interface
  USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
  virtio-blk: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  tpm: fix reference counting for struct tpm_chip
  fuse: fix pipe buffer lifetime for direct_io
  af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
  spi: Fix erroneous sgs value with min_t()
  spi: Fix invalid sgs value
  ethernet: sun: Free the coherent when failing in probing
  virtio_console: break out of buf poll on remove
  netdevice: add the case if dev is NULL
  USB: serial: simple: add Nokia phone driver
  USB: serial: pl2303: add IBM device IDs
  ANDROID: incremental-fs: limit mount stack depth
  UPSTREAM: binderfs: use __u32 for device numbers
  Linux 4.14.275
  arm64: Use the clearbhb instruction in mitigations
  arm64: add ID_AA64ISAR2_EL1 sys register
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  KVM: arm64: Add templates for BHB mitigation sequences
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  arm64: entry.S: Add ventry overflow sanity checks
  arm64: Add Cortex-X2 CPU part definition
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  arm64: Add part number for Arm Cortex-A77
  arm64: Add part number for Neoverse N1
  arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
  arm64: Add silicon-errata.txt entry for ARM erratum 1188873
  arm64: arch_timer: avoid unused function warning
  arm64: arch_timer: Add workaround for ARM erratum 1188873
  Linux 4.14.274
  llc: only change llc->dev when bind() succeeds
  mac80211: fix potential double free on mesh join
  crypto: qat - disable registration of algorithms
  ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
  ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
  ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
  netfilter: nf_tables: initialize registers in nft_do_chain()
  drivers: net: xgene: Fix regression in CRC stripping
  ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
  ALSA: cmipci: Restore aux vol on suspend/resume
  ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
  ALSA: pcm: Add stream lock during PCM reset ioctl operations
  llc: fix netdevice reference leaks in llc_ui_bind()
  thermal: int340x: fix memory leak in int3400_notify()
  staging: fbtft: fb_st7789v: reset display before initialization
  esp: Fix possible buffer overflow in ESP transformation
  net: ipv6: fix skb_over_panic in __ip6_append_data
  nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION
  Linux 4.14.273
  perf symbols: Fix symbol size calculation condition
  Input: aiptek - properly check endpoint type
  usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
  usb: gadget: rndis: prevent integer overflow in rndis_set_response()
  net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
  atm: eni: Add check for dma_map_single
  net/packet: fix slab-out-of-bounds access in packet_recvmsg()
  efi: fix return value of __setup handlers
  fs: sysfs_emit: Remove PAGE_SIZE alignment check
  kselftest/vm: fix tests build with old libc
  sfc: extend the locking on mcdi->seqno
  tcp: make tcp_read_sock() more robust
  nl80211: Update bss channel on channel switch for P2P_CLIENT
  atm: firestream: check the return value of ioremap() in fs_init()
  can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready
  ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE
  MIPS: smp: fill in sibling and core maps earlier
  ARM: dts: rockchip: fix a typo on rk3288 crypto-controller
  arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity
  xfrm: Fix xfrm migrate issues when address family changes
  sctp: fix the processing for INIT_ACK chunk
  sctp: fix the processing for INIT chunk
  Linux 4.14.272
  btrfs: unlock newly allocated extent buffer after error
  ext4: add check to prevent attempting to resize an fs with sparse_super2
  ARM: fix Thumb2 regression with Spectre BHB
  virtio: acknowledge all features before access
  virtio: unexport virtio_finalize_features
  staging: gdm724x: fix use after free in gdm_lte_rx()
  ARM: Spectre-BHB: provide empty stub for non-config
  selftests/memfd: clean up mapping in mfd_fail_write
  tracing: Ensure trace buffer is at least 4096 bytes large
  Revert "xen-netback: Check for hotplug-status existence before watching"
  Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
  net-sysfs: add check for netdevice being present to speed_show
  sctp: fix kernel-infoleak for SCTP sockets
  gpio: ts4900: Do not set DAT and OE together
  NFC: port100: fix use-after-free in port100_send_complete
  net/mlx5: Fix size field in bufferx_reg struct
  ax25: Fix NULL pointer dereference in ax25_kill_by_device
  net: ethernet: lpc_eth: Handle error for clk_enable
  net: ethernet: ti: cpts: Handle error for clk_enable
  ethernet: Fix error handling in xemaclite_of_probe
  qed: return status of qed_iov_get_link
  net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare()
2022-05-03 12:46:23 +07:00
Al Viro
9f3c1a55bd
BACKPORT: [PATCH] reduce boilerplate in fsid handling
Get rid of boilerplate in most of ->statfs()
instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[cyberknight777: backport to 4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: azrim <mirzaspc@gmail.com>
2022-04-26 15:07:01 +07:00
Zhang Yi
94fe9ee8d6 ext2: correct max file size computing
[ Upstream commit 50b3a818991074177a56c87124c7a7bdf5fa4f67 ]

We need to calculate the max file size accurately if the total blocks
that can address by block tree exceed the upper_limit. But this check is
not correct now, it only compute the total data blocks but missing
metadata blocks are needed. So in the case of "data blocks < upper_limit
&& total blocks > upper_limit", we will get wrong result. Fortunately,
this case could not happen in reality, but it's confused and better to
correct the computing.

  bits   data blocks   metadatablocks   upper_limit
  10        16843020            66051    2147483647
  11       134480396           263171    1073741823
  12      1074791436          1050627     536870911 (*)
  13      8594130956          4198403     268435455 (*)
  14     68736258060         16785411     134217727 (*)
  15    549822930956         67125251      67108863 (*)
  16   4398314962956        268468227      33554431 (*)

  [*] Need to calculate in depth.

Fixes: 1c2d14212b15 ("ext2: Fix underflow in ext2_max_size()")
Link: https://lore.kernel.org/r/20220212050532.179055-1-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20 09:08:17 +02:00
Dan Carpenter
c92eb06603 ext2: fix sleeping in atomic bugs on error
[ Upstream commit 372d1f3e1bfede719864d0d1fbf3146b1e638c88 ]

The ext2_error() function syncs the filesystem so it sleeps.  The caller
is holding a spinlock so it's not allowed to sleep.

   ext2_statfs() <- disables preempt
   -> ext2_count_free_blocks()
      -> ext2_get_group_desc()

Fix this by using WARN() to print an error message and a stack trace
instead of using ext2_error().

Link: https://lore.kernel.org/r/20210921203233.GA16529@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-09 14:09:46 +02:00
Mikulas Patocka
56525e594a ext2: fix missing percpu_counter_inc
commit bc2fbaa4d3808aef82dd1064a8e61c16549fe956 upstream.

sbi->s_freeinodes_counter is only decreased by the ext2 code, it is never
increased. This patch fixes it.

Note that sbi->s_freeinodes_counter is only used in the algorithm that
tries to find the group for new allocations, so this bug is not easily
visible (the only visibility is that the group finding algorithm selects
inoptinal result).

Link: https://lore.kernel.org/r/alpine.LRH.2.02.2004201538300.19436@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 09:48:18 +02:00
Jan Kara
b2c5f60ccc ext2: fix debug reference to ext2_xattr_cache
[ Upstream commit 32302085a8d90859c40cf1a5e8313f575d06ec75 ]

Fix a debug-only build error in ext2/xattr.c:

When building without extra debugging, (and with another patch that uses
no_printk() instead of <empty> for the ext2-xattr debug-print macros,
this build error happens:

../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’:
../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in
this function); did you mean ‘ext2_xattr_list’?
     atomic_read(&ext2_xattr_cache->c_entry_count));

Fix the problem by removing cached entry count from the debug message
since otherwise we'd have to export the mbcache structure just for that.

Fixes: be0726d33cb8 ("ext2: convert to mbcache2")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24 08:01:22 +02:00
Randy Dunlap
7ca6f11ed1 ext2: fix empty body warnings when -Wextra is used
[ Upstream commit 44a52022e7f15cbaab957df1c14f7a4f527ef7cf ]

When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros
to use the no_printk() macro instead of <nothing>.
This fixes gcc warnings when -Wextra is used:

../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]

I have verified that the only object code change (with gcc 7.5.0) is
the reversal of some instructions from 'cmp a,b' to 'cmp b,a'.

Link: https://lore.kernel.org/r/e18a7395-61fb-2093-18e8-ed4f8cf56248@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24 08:01:21 +02:00
Nathan Chancellor
00ab265a94 ext2: Adjust indentation in ext2_fill_super
commit d9e9866803f7b6c3fdd35d345e97fb0b2908bbbc upstream.

Clang warns:

../fs/ext2/super.c:1076:3: warning: misleading indentation; statement is
not part of the previous 'if' [-Wmisleading-indentation]
        sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
        ^
../fs/ext2/super.c:1074:2: note: previous statement is here
        if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
        ^
1 warning generated.

This warning occurs because there is a space before the tab on this
line. Remove it so that the indentation is consistent with the Linux
kernel coding style and clang no longer warns.

Fixes: 41f04d852e35 ("[PATCH] ext2: fix mounts at 16T")
Link: https://github.com/ClangBuiltLinux/linux/issues/827
Link: https://lore.kernel.org/r/20191218031930.31393-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14 16:32:17 -05:00
Chengguang Xu
f4845e598a ext2: check err when partial != NULL
commit e705f4b8aa27a59f8933e8f384e9752f052c469c upstream.

Check err when partial == NULL is meaningless because
partial == NULL means getting branch successfully without
error.

CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191105045100.7104-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:43 +01:00
Jan Kara
7412f34ff8 ext2: Fix underflow in ext2_max_size()
commit 1c2d14212b15a60300a2d4f6364753e87394c521 upstream.

When ext2 filesystem is created with 64k block size, ext2_max_size()
will return value less than 0. Also, we cannot write any file in this fs
since the sb->maxbytes is less than 0. The core of the problem is that
the size of block index tree for such large block size is more than
i_blocks can carry. So fix the computation to count with this
possibility.

File size limits computed with the new function for the full range of
possible block sizes look like:

bits file_size
10     17247252480
11    275415851008
12   2196873666560
13   2197948973056
14   2198486220800
15   2198754754560
16   2198888906752

CC: stable@vger.kernel.org
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 14:35:23 +01:00
Pan Bian
086d1f60f8 ext2: fix potential use after free
commit ecebf55d27a11538ea84aee0be643dd953f830d5 upstream.

The function ext2_xattr_set calls brelse(bh) to drop the reference count
of bh. After that, bh may be freed. However, following brelse(bh),
it reads bh->b_data via macro HDR(bh). This may result in a
use-after-free bug. This patch moves brelse(bh) after reading field.

CC: stable@vger.kernel.org
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:41:25 +01:00
Dave Jiang
8214347c26 dax: change bdev_dax_supported() to support boolean returns
commit 80660f20252d6f76c9f203874ad7c7a4a8508cf8 upstream.

The function return values are confusing with the way the function is
named. We expect a true or false return value but it actually returns
0/-errno.  This makes the code very confusing. Changing the return values
to return a bool where if DAX is supported then return true and no DAX
support returns false.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11 16:29:22 +02:00
Darrick J. Wong
a19385766b fs: allow per-device dax status checking for filesystems
commit ba23cba9b3bdc967aabdc6ff1e3e9b11ce05bb4f upstream.

Change bdev_dax_supported so it takes a bdev parameter.  This enables
multi-device filesystems like xfs to check that a dax device can work for
the particular filesystem.  Once that's in place, actually fix all the
parts of XFS where we need to be able to distinguish between datadev and
rtdev.

This patch fixes the problem where we screw up the dax support checking
in xfs if the datadev and rtdev have different dax capabilities.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11 16:29:22 +02:00
Al Viro
f440ea85d4 do d_instantiate/unlock_new_inode combinations safely
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.

For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:51:47 +02:00
Al Viro
131ff238b8 ext2: fix a block leak
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream.

open file, unlink it, then use ioctl(2) to make it immutable or
append only.  Now close it and watch the blocks *not* freed...

Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93b972
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25 16:17:31 +02:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Linus Torvalds
0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Linus Torvalds
89fd915c40 libnvdimm for 4.14
* Media error handling support in the Block Translation Table (BTT)
   driver is reworked to address sleeping-while-atomic locking and
   memory-allocation-context conflicts.
 
 * The dax_device lookup overhead for xfs and ext4 is moved out of the
   iomap hot-path to a mount-time lookup.
 
 * A new 'ecc_unit_size' sysfs attribute is added to advertise the
   read-modify-write boundary property of a persistent memory range.
 
 * Preparatory fix-ups for arm and powerpc pmem support are included
   along with other miscellaneous fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZtsAGAAoJEB7SkWpmfYgCrzMP/2vPvZvrFjZn5pAoZjlmTmHM
 ySceoOC7vwvVXIsSs52FhSjcxEoXo9cklXPwhXOPVtVUFdSDJBUOIUxwIziE6Y+5
 sFJ2xT9K+5zKBUiXJwqFQDg52dn//eBNnnnDz+HQrBSzGrbWQhIZY2m19omPzv1I
 BeN0OCGOdW3cjSo3BCFl1d+KrSl704e7paeKq/TO3GIiAilIXleTVxcefEEodV2K
 ZvWHpFIhHeyN8dsF8teI952KcCT92CT/IaabxQIwCxX0/8/GFeDc5aqf77qiYWKi
 uxCeQXdgnaE8EZNWZWGWIWul6eYEkoCNbLeUQ7eJnECq61VxVajJS0NyGa5T9OiM
 P046Bo2b1b3R0IHxVIyVG0ZCm3YUMAHSn/3uRxPgESJ4bS/VQ3YP5M6MLxDOlc90
 IisLilagitkK6h8/fVuVrwciRNQ71XEC34t6k7GCl/1ZnLlLT+i4/jc5NRZnGEZh
 aXAAGHdteQ+/mSz6p2UISFUekbd6LerwzKRw8ibDvH6pTud8orYR7g2+JoGhgb6Y
 pyFVE8DhIcqNKAMxBsjiRZ46OQ7qrT+AemdAG3aVv6FaNoe4o5jPLdw2cEtLqtpk
 +DNm0/lSWxxxozjrvu6EUZj6hk8R5E19XpRzV5QJkcKUXMu7oSrFLdMcC4FeIjl9
 K4hXLV3fVBVRMiS0RA6z
 =5iGY
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm from Dan Williams:
 "A rework of media error handling in the BTT driver and other updates.
  It has appeared in a few -next releases and collected some late-
  breaking build-error and warning fixups as a result.

  Summary:

   - Media error handling support in the Block Translation Table (BTT)
     driver is reworked to address sleeping-while-atomic locking and
     memory-allocation-context conflicts.

   - The dax_device lookup overhead for xfs and ext4 is moved out of the
     iomap hot-path to a mount-time lookup.

   - A new 'ecc_unit_size' sysfs attribute is added to advertise the
     read-modify-write boundary property of a persistent memory range.

   - Preparatory fix-ups for arm and powerpc pmem support are included
     along with other miscellaneous fixes"

* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, btt: fix format string warnings
  libnvdimm, btt: clean up warning and error messages
  ext4: fix null pointer dereference on sbi
  libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
  dax: fix FS_DAX=n BLOCK=y compilation
  libnvdimm: fix integer overflow static analysis warning
  libnvdimm, nd_blk: remove mmio_flush_range()
  libnvdimm, btt: rework error clearing
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: cache sector_size in arena_info
  libnvdimm, btt: ensure that flags were also unchanged during a map_read
  libnvdimm, btt: refactor map entry operations with macros
  libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
  ext4: perform dax_device lookup at mount
  ext2: perform dax_device lookup at mount
  xfs: perform dax_device lookup at mount
  dax: introduce a fs_dax_get_by_bdev() helper
  libnvdimm, btt: check memory allocation failure
  libnvdimm, label: fix index block size calculation
  ...
2017-09-11 13:10:57 -07:00
Ross Zwisler
91d25ba8a6 dax: use common 4k zero page for dax mmap reads
When servicing mmap() reads from file holes the current DAX code
allocates a page cache page of all zeroes and places the struct page
pointer in the mapping->page_tree radix tree.

This has three major drawbacks:

1) It consumes memory unnecessarily. For every 4k page that is read via
   a DAX mmap() over a hole, we allocate a new page cache page. This
   means that if you read 1GiB worth of pages, you end up using 1GiB of
   zeroed memory. This is easily visible by looking at the overall
   memory consumption of the system or by looking at /proc/[pid]/smaps:

	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
	Size:            1048576 kB
	Rss:             1048576 kB
	Pss:             1048576 kB
	Shared_Clean:          0 kB
	Shared_Dirty:          0 kB
	Private_Clean:   1048576 kB
	Private_Dirty:         0 kB
	Referenced:      1048576 kB
	Anonymous:             0 kB
	LazyFree:              0 kB
	AnonHugePages:         0 kB
	ShmemPmdMapped:        0 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	KernelPageSize:        4 kB
	MMUPageSize:           4 kB
	Locked:                0 kB

2) It is slower than using a common zero page because each page fault
   has more work to do. Instead of just inserting a common zero page we
   have to allocate a page cache page, zero it, and then insert it. Here
   are the average latencies of dax_load_hole() as measured by ftrace on
   a random test box:

    Old method, using zeroed page cache pages:	3.4 us
    New method, using the common 4k zero page:	0.8 us

   This was the average latency over 1 GiB of sequential reads done by
   this simple fio script:

     [global]
     size=1G
     filename=/root/dax/data
     fallocate=none
     [io]
     rw=read
     ioengine=mmap

3) The fact that we had to check for both DAX exceptional entries and
   for page cache pages in the radix tree made the DAX code more
   complex.

Solve these issues by following the lead of the DAX PMD code and using a
common 4k zero page instead.  As with the PMD code we will now insert a
DAX exceptional entry into the radix tree instead of a struct page
pointer which allows us to remove all the special casing in the DAX
code.

Note that we do still pretty aggressively check for regular pages in the
DAX radix tree, especially where we take action based on the bits set in
the page.  If we ever find a regular page in our radix tree now that
most likely means that someone besides DAX is inserting pages (which has
happened lots of times in the past), and we want to find that out early
and fail loudly.

This solution also removes the extra memory consumption.  Here is that
same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
code:

	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
	Size:            1048576 kB
	Rss:                   0 kB
	Pss:                   0 kB
	Shared_Clean:          0 kB
	Shared_Dirty:          0 kB
	Private_Clean:         0 kB
	Private_Dirty:         0 kB
	Referenced:            0 kB
	Anonymous:             0 kB
	LazyFree:              0 kB
	AnonHugePages:         0 kB
	ShmemPmdMapped:        0 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	KernelPageSize:        4 kB
	MMUPageSize:           4 kB
	Locked:                0 kB

Overall system memory consumption is similarly improved.

Another major change is that we remove dax_pfn_mkwrite() from our fault
flow, and instead rely on the page fault itself to make the PTE dirty
and writeable.  The following description from the patch adding the
vm_insert_mixed_mkwrite() call explains this a little more:

   "To be able to use the common 4k zero page in DAX we need to have our
    PTE fault path look more like our PMD fault path where a PTE entry
    can be marked as dirty and writeable as it is first inserted rather
    than waiting for a follow-up dax_pfn_mkwrite() =>
    finish_mkwrite_fault() call.

    Right now we can rely on having a dax_pfn_mkwrite() call because we
    can distinguish between these two cases in do_wp_page():

            case 1: 4k zero page => writable DAX storage
            case 2: read-only DAX storage => writeable DAX storage

    This distinction is made by via vm_normal_page(). vm_normal_page()
    returns false for the common 4k zero page, though, just as it does
    for DAX ptes. Instead of special casing the DAX + 4k zero page case
    we will simplify our DAX PTE page fault sequence so that it matches
    our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
    We will instead use dax_iomap_fault() to handle write-protection
    faults.

    This means that insert_pfn() needs to follow the lead of
    insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
    'mkwrite' is set insert_pfn() will do the work that was previously
    done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"

Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:24 -07:00
Dan Williams
8cf037a8b2 ext2: perform dax_device lookup at mount
The ->iomap_begin() operation is a hot path, so cache the
fs_dax_get_by_host() result at mount time to avoid the incurring the
hash lookup overhead on a per-i/o basis.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-31 09:33:30 -07:00
Ernesto A. Fernández
fe26569eb9 ext2: preserve i_mode if ext2_set_acl() fails
When changing a file's acl mask, ext2_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

[JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"]
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-18 11:23:56 +02:00
Jan Kara
a992f2d38e ext2: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __ext2_set_acl() function that does not call
posix_acl_update_mode() and use it when inheriting ACLs. That prevents
SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-17 10:15:31 +02:00
David Howells
bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Linus Torvalds
19c6e12c07 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf, reiserfs fixes from Jan Kara:
 "Several ext2, udf, and reiserfs fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix memory leak when truncate races ext2_get_blocks
  reiserfs: fix race in prealloc discard
  reiserfs: don't preallocate blocks for extended attributes
  udf: Convert udf_disk_stamp_to_time() to use mktime64()
  udf: Use time64_to_tm for timestamp conversion
  udf: Fix deadlock between writeback and udf_setsize()
  udf: Use i_size_read() in udf_adinicb_writepage()
  udf: Fix races with i_size changes during readpage
  udf: Remove unused UDF_DEFAULT_BLOCKSIZE
2017-07-13 13:53:54 -07:00
Ernesto A. Fernández
4d9bcaddac ext2: Fix memory leak when truncate races ext2_get_blocks
Buffer heads referencing indirect blocks may not be released if the file
is truncated at the right time. This happens because ext2_get_branch()
returns NULL when it finds the whole chain of indirect blocks already
set, and when truncate alters the chain this value of NULL is
treated as the address of the last head to be released. Handle this in the
same way as it's done after the got_it label.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-13 13:45:08 +02:00
Linus Torvalds
bc2c6421cb The first major feature for ext4 this merge window is the largedir
feature, which allows ext4 directories to support over 2 billion
 directory entries (assuming ~64 byte file names; in practice, users
 will run into practical performance limits first.)  This feature was
 originally written by the Lustre team, and credit goes to Artem
 Blagodarenko from Seagate for getting this feature upstream.
 
 The second major major feature allows ext4 to support extended
 attribute values up to 64k.  This feature was also originally from
 Lustre, and has been enhanced by Tahsin Erdogan from Google with a
 deduplication feature so that if multiple files have the same xattr
 value (for example, Windows ACL's stored by Samba), only one copy will
 be stored on disk for encoding and caching efficiency.
 
 We also have the usual set of bug fixes, cleanups, and optimizations.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllhl5AACgkQ8vlZVpUN
 gaOiNQf+L23sT9KIQmFwQP38vkBVw67Eo7gBfevmk7oqQLiRppT5mmLzW8EWEDxR
 PVaDQXvSZi18wSCAAcCd1ZqeIZk0P6tst0ufnIT60tGlZdUlwSLyrqvV/30axR2g
 6kcnv90ZszrQNx5U8q8bMzNrs1KtyPHFCRzavFsBX11WezNSpWnH2in/uxO+t9Jy
 F2zlrLUrE2m9AVMH48Dh6LbeaB6pqgr4k3jq1jG4Iqb2h9xgU8OKhs8gL07YS+Qi
 5A7s8GIvYQSoZUO9DOOie2f1zhpO0KrhXchyZTJukVQH7TsmFxoSh0vhXnP1Bohu
 CNLV6dzetDT0VfmPr1WhVe7lhZeeVw==
 =FFkF
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The first major feature for ext4 this merge window is the largedir
  feature, which allows ext4 directories to support over 2 billion
  directory entries (assuming ~64 byte file names; in practice, users
  will run into practical performance limits first.) This feature was
  originally written by the Lustre team, and credit goes to Artem
  Blagodarenko from Seagate for getting this feature upstream.

  The second major major feature allows ext4 to support extended
  attribute values up to 64k. This feature was also originally from
  Lustre, and has been enhanced by Tahsin Erdogan from Google with a
  deduplication feature so that if multiple files have the same xattr
  value (for example, Windows ACL's stored by Samba), only one copy will
  be stored on disk for encoding and caching efficiency.

  We also have the usual set of bug fixes, cleanups, and optimizations"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: fix spelling mistake: "prellocated" -> "preallocated"
  ext4: fix __ext4_new_inode() journal credits calculation
  ext4: skip ext4_init_security() and encryption on ea_inodes
  fs: generic_block_bmap(): initialize all of the fields in the temp bh
  ext4: change fast symlink test to not rely on i_blocks
  ext4: require key for truncate(2) of encrypted file
  ext4: don't bother checking for encryption key in ->mmap()
  ext4: check return value of kstrtoull correctly in reserved_clusters_store
  ext4: fix off-by-one fsmap error on 1k block filesystems
  ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry()
  ext4: return EIO on read error in ext4_find_entry
  ext4: forbid encrypting root directory
  ext4: send parallel discards on commit completions
  ext4: avoid unnecessary stalls in ext4_evict_inode()
  ext4: add nombcache mount option
  ext4: strong binding of xattr inode references
  ext4: eliminate xattr entry e_hash recalculation for removes
  ext4: reserve space for xattr entries/names
  quota: add get_inode_usage callback to transfer multi-inode charges
  ext4: xattr inode deduplication
  ...
2017-07-09 09:31:22 -07:00
Linus Torvalds
088737f44b Writeback error handling fixes (pile #2)
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZXhmCAAoJEAAOaEEZVoIVpRkP/1qlYn3pq6d5Kuz84pejOmlL
 5jbkS/cOmeTxeUU4+B1xG8Lx7bAk8PfSXQOADbSJGiZd0ug95tJxplFYIGJzR/tG
 aNMHeu/BVKKhUKORGuKR9rJKtwC839L/qao+yPBo5U3mU4L73rFWX8fxFuhSJ8HR
 hvkgBu3Hx6GY59CzxJ8iJzj+B+uPSFrNweAk0+0UeWkBgTzEdiGqaXBX4cHIkq/5
 hMoCG+xnmwHKbCBsQ5js+YJT+HedZ4lvfjOqGxgElUyjJ7Bkt/IFYOp8TUiu193T
 tA4UinDjN8A7FImmIBIftrECmrAC9HIGhGZroYkMKbb8ReDR2ikE5FhKEpuAGU3a
 BXBgX2mPQuArvZWM7qeJCkxV9QJ0u/8Ykbyzo30iPrICyrzbEvIubeB/mDA034+Z
 Z0/z8C3v7826F3zP/NyaQEojUgRq30McMOIS8GMnx15HJwRsRKlzjfy9Wm4tWhl0
 t3nH1jMqAZ7068s6rfh/oCwdgGOwr5o4hW/bnlITzxbjWQUOnZIe7KBxIezZJ2rv
 OcIwd5qE8PNtpagGj5oUbnjGOTkERAgsMfvPk5tjUNt28/qUlVs2V0aeo47dlcsh
 oYr8WMOIzw98Rl7Bo70mplLrqLD6nGl0LfXOyUlT4STgLWW4ksmLVuJjWIUxcO/0
 yKWjj9wfYRQ0vSUqhsI5
 =3Z93
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
 "This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
2017-07-07 19:38:17 -07:00
Jeff Layton
dac257f741 fs: check for writeback errors after syncing out buffers in generic_file_fsync
ext2 currently does a test+clear of the AS_EIO flag, which is
is problematic for some coming changes.

What we really need to do instead is call filemap_check_errors
in __generic_file_fsync after syncing out the buffers. That
will be sufficient for this case, and help other callers detect
these errors properly as well.

With that, we don't need to twiddle it in ext2.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-07-06 07:02:21 -04:00
Jeff Layton
2b69c8280c mm: drop "wait" parameter from write_one_page()
The callers all set it to 1.

Also, make it clear that this function will not set any sort of AS_*
error, and that the caller must do so if necessary.  No existing caller
uses this on normal files, so none of them need it.

Also, add __must_check here since, in general, the callers need to handle
an error here in some fashion.

Link: http://lkml.kernel.org/r/20170525103303.6524-1-jlayton@redhat.com
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2017-07-05 18:44:22 -04:00
Tahsin Erdogan
47387409ee ext2, ext4: make mb block cache names more explicit
There will be a second mb_cache instance that tracks ea_inodes. Make
existing names more explicit so that it is clear that they refer to
xattr block cache.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-22 11:28:55 -04:00
Tahsin Erdogan
c07dfcb458 mbcache: make mbcache naming more generic
Make names more generic so that mbcache usage is not limited to
block sharing. In a subsequent patch in the series
("ext4: xattr inode deduplication"), we start using the mbcache code
for sharing xattr inodes. With that patch, old mb_cache_entry.e_block
field could be holding either a block number or an inode number.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-22 10:29:53 -04:00
Dan Williams
f5705aa8cf dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case
Tetsuo reports:

  fs/built-in.o: In function `xfs_file_iomap_end':
  xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
  fs/built-in.o: In function `xfs_file_iomap_begin':
  xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
  make: *** [vmlinux] Error 1
  $ grep DAX .config
  CONFIG_DAX=m
  # CONFIG_DEV_DAX is not set
  # CONFIG_FS_DAX is not set

When FS_DAX=n we can/must throw away the dax code in filesystems.
Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
nops in the FS_DAX=n case.

Cc: <linux-xfs@vger.kernel.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-13 17:52:16 -07:00
Linus Torvalds
0fcc3ab23d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "Incremental fixes and a small feature addition on top of the main
  libnvdimm 4.12 pull request:

   - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
     The size regression is fixed by moving all dax helpers into the
     dax-core and only specifying "select DAX" for FS_DAX and
     dax-capable drivers. He also asked for clarification of the
     NR_DEV_DAX config option which, on closer look, does not need to be
     a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
     for good measure.

   - Ben's attention to detail on -stable patch submissions caught a
     case where the recent fixes to arch_copy_from_iter_pmem() missed a
     condition where we strand dirty data in the cache. This is tagged
     for -stable and will also be included in the rework of the pmem api
     to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

   - Vishal adds a feature that missed the initial pull due to pending
     review feedback. It allows the kernel to clear media errors when
     initializing a BTT (atomic sector update driver) instance on a pmem
     namespace.

   - Ross noticed that the dax_device + dax_operations conversion broke
     __dax_zero_page_range(). The nvdimm unit tests fail to check this
     path, but xfstests immediately trips over it. No excuse for missing
     this before submitting the 4.12 pull request.

  These all pass the nvdimm unit tests and an xfstests spot check. The
  set has received a build success notification from the kbuild robot"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  filesystem-dax: fix broken __dax_zero_page_range() conversion
  libnvdimm, btt: ensure that initializing metadata clears poison
  libnvdimm: add an atomic vs process context flag to rw_bytes
  x86, pmem: Fix cache flushing for iovec write < 8 bytes
  device-dax: kill NR_DEV_DAX
  block, dax: move "select DAX" from BLOCK to FS_DAX
  device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
2017-05-12 15:43:10 -07:00
Dan Williams
ef51042472 block, dax: move "select DAX" from BLOCK to FS_DAX
For configurations that do not enable DAX filesystems or drivers, do not
require the DAX core to be built.

Given that the 'direct_access' method has been removed from
'block_device_operations', we can also go ahead and remove the
block-related dax helper functions from fs/block_dev.c to
drivers/dax/super.c. This keeps dax details out of the block layer and
lets the DAX core be built as a module in the FS_DAX=n case.

Filesystems need to include dax.h to call bdev_dax_supported().

Cc: linux-xfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-08 10:55:27 -07:00
Linus Torvalds
53ef7d0e20 libnvdimm for 4.12
* Region media error reporting: A libnvdimm region device is the parent
 to one or more namespaces. To date, media errors have been reported via
 the "badblocks" attribute attached to pmem block devices for namespaces
 in "raw" or "memory" mode. Given that namespaces can be in "device-dax"
 or "btt-sector" mode this new interface reports media errors
 generically, i.e. independent of namespace modes or state. This
 subsequently allows userspace tooling to craft "ACPI 6.1 Section
 9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and
 submit them via the ioctl path for NVDIMM root bus devices.
 
 * Introduce 'struct dax_device' and 'struct dax_operations': Prompted by
 a request from Linus and feedback from Christoph this allows for dax
 capable drivers to publish their own custom dax operations. This fixes
 the broken assumption that all dax operations are related to a
 persistent memory device, and makes it easier for other architectures
 and platforms to add customized persistent memory support.
 
 * 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
 available for storage appliance applications to manually trigger memory
 controllers to drain write-pending buffers that would otherwise be
 flushed automatically by the platform ADR (asynchronous-DRAM-refresh)
 mechanism at a power loss event. Support for "locked" DIMMs is included
 to prevent namespaces from surfacing when the namespace label data area
 is locked. Finally, fixes for various reported deadlocks and crashes,
 also tagged for -stable.
 
 * ACPI / nfit driver updates: General updates of the nfit driver to add
 DSM command overrides, ACPI 6.1 health state flags support, DSM payload
 debug available by default, and various fixes.
 
 Acknowledgements that came after the branch was pushed:
 
 commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock"
 Tested-by: Yi Zhang <yizhan@redhat.com>
 
 commit 23f498448362 "libnvdimm: rework region badblocks clearing"
 Tested-by: Toshi Kani <toshi.kani@hpe.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZDONJAAoJEB7SkWpmfYgC3SsP/2KrLvTUcz646ViuPOgZ2cC4
 W6wAx6cvDSt+H52kLnFEsYoFt7WAj20ggPirb/Bc5jkGlvwE0lT9Xtmso9GpVkYT
 J9ZJ9pP/4YaAD3II1gmTwaUjYi0FxoOdx3Eb92yuWkO/8ylz4b2Nu3cBpYwyziGQ
 nIfEVwDXRLE86u6x0bWuf6TlVuvsbdiAI55CDqDMVQC6xIOLbSez7b8QIHlpiKEb
 Mw+xqdQva0esoreZEOXEhWNO+qtfILx8/ceBEGTNMp4e/JjZ2FbrSNplM+9bH5k7
 ywqP8lW+mBEw0fmBBkYoVG/xyesiiBb55JLnbi8Ew+7IUxw8a3iV7wftRi62lHcK
 zAjsHe4L+MansgtZsCL8wluvIPaktAdtB4xr7l9VNLKRYRUG73jEWU0gcUNryHIL
 BkQJ52pUS1PkClyAsWbBBHl1I/CvzVPd21VW0YELmLR4OywKy1c+eKw2bcYgjrb4
 59HZSv6S6EoKaQC+2qvVNpePil7cdfg5V2ubH/ki9HoYVyoxDptEWHnvf0NNatIH
 Y7mNcOPvhOksJmnKSyHbDjtRur7WoHIlC9D7UjEFkSBWsKPjxJHoidN4SnCMRtjQ
 WKQU0seoaKj04b68Bs/Qm9NozVgnsPFIUDZeLMikLFX2Jt7YSPu+Jmi2s4re6WLh
 TmJQ3Ly9t3o3/weHSzmn
 =Ox0s
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this has been in multiple -next releases. There were a few
  late breaking fixes and small features that got added in the last
  couple days, but the whole set has received a build success
  notification from the kbuild robot.

  Change summary:

   - Region media error reporting: A libnvdimm region device is the
     parent to one or more namespaces. To date, media errors have been
     reported via the "badblocks" attribute attached to pmem block
     devices for namespaces in "raw" or "memory" mode. Given that
     namespaces can be in "device-dax" or "btt-sector" mode this new
     interface reports media errors generically, i.e. independent of
     namespace modes or state.

     This subsequently allows userspace tooling to craft "ACPI 6.1
     Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
     requests and submit them via the ioctl path for NVDIMM root bus
     devices.

   - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
     by a request from Linus and feedback from Christoph this allows for
     dax capable drivers to publish their own custom dax operations.
     This fixes the broken assumption that all dax operations are
     related to a persistent memory device, and makes it easier for
     other architectures and platforms to add customized persistent
     memory support.

   - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
     available for storage appliance applications to manually trigger
     memory controllers to drain write-pending buffers that would
     otherwise be flushed automatically by the platform ADR
     (asynchronous-DRAM-refresh) mechanism at a power loss event.
     Support for "locked" DIMMs is included to prevent namespaces from
     surfacing when the namespace label data area is locked. Finally,
     fixes for various reported deadlocks and crashes, also tagged for
     -stable.

   - ACPI / nfit driver updates: General updates of the nfit driver to
     add DSM command overrides, ACPI 6.1 health state flags support, DSM
     payload debug available by default, and various fixes.

  Acknowledgements that came after the branch was pushed:

   - commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
     Tested-by: Yi Zhang <yizhan@redhat.com>

   - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
     Tested-by: Toshi Kani <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
  libnvdimm, pfn: fix 'npfns' vs section alignment
  libnvdimm: handle locked label storage areas
  libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
  brd: fix uninitialized use of brd->dax_dev
  block, dax: use correct format string in bdev_dax_supported
  device-dax: fix sysfs attribute deadlock
  libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
  libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
  libnvdimm: rework region badblocks clearing
  acpi, nfit: kill ACPI_NFIT_DEBUG
  libnvdimm: fix clear length of nvdimm_forget_poison()
  libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
  libnvdimm, region: sysfs trigger for nvdimm_flush()
  libnvdimm: fix phys_addr for nvdimm_clear_poison
  x86, dax, pmem: remove indirection around memcpy_from_pmem()
  block: remove block_device_operations ->direct_access()
  block, dax: convert bdev_dax_supported() to dax_direct_access()
  filesystem-dax: convert to dax_direct_access()
  Revert "block: use DAX for partition table reads"
  ext2, ext4, xfs: retrieve dax_device for iomap operations
  ...
2017-05-05 18:49:20 -07:00
Dan Williams
fa5d932c32 ext2, ext4, xfs: retrieve dax_device for iomap operations
In preparation for converting fs/dax.c to use dax_direct_access()
instead of bdev_direct_access(), add the plumbing to retrieve the
dax_device associated with a given block_device.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Jan Kara
420768d319 ext2: Remove ext2_get_inode_flags()
Now that all places setting inode->i_flags that should be reflected in
on-disk flags are gone, we can remove ext2_get_inode_flags() call.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-19 14:21:23 +02:00
Jan Kara
161f3b7447 ext2: Set flags on quota files directly
Currently immutable and noatime flags on quota files are set by quota
code which requires us to copy inode->i_flags to our on disk version of
quota flags in GETFLAGS ioctl and __ext2_write_inode().  Move to setting
/ clearing these on-disk flags directly to save that copying.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-19 14:21:23 +02:00
Jan Kara
6554766150 ext2: Call dquot_writeback_dquots() with s_umount held
ext2_sync_fs() could be called without s_umount semaphore held when
called through ext2_write_super() from __ext2_write_inode(). This
function then calls dquot_writeback_dquots() which relies on s_umount to
be held for protection against other quota operations.

In fact __ext2_write_inode() does not need all the functionality
ext2_write_super() provides. It is enough to just write the superblock.
So use ext2_sync_super() instead.

Fixes: 9d1ccbe70e0b14545caad12dc73adb3605447df0
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-05 14:23:45 +02:00
Ingo Molnar
5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Dave Jiang
c791ace1e7 mm: replace FAULT_FLAG_SIZE with parameter to huge_fault
Since the introduction of FAULT_FLAG_SIZE to the vm_fault flag, it has
been somewhat painful with getting the flags set and removed at the
correct locations.  More than one kernel oops was introduced due to
difficulties of getting the placement correctly.

Remove the flag values and introduce an input parameter to huge_fault
that indicates the size of the page entry.  This makes the code easier
to trace and should avoid the issues we see with the fault flags where
removal of the flag was necessary in the fallback paths.

Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
a2d581675d mm,fs,dax: change ->pmd_fault to ->huge_fault
Patch series "1G transparent hugepage support for device dax", v2.

The following series implements support for 1G trasparent hugepage on
x86 for device dax.  The bulk of the code was written by Mathew Wilcox a
while back supporting transparent 1G hugepage for fs DAX.  I have
forward ported the relevant bits to 4.10-rc.  The current submission has
only the necessary code to support device DAX.

Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs.  Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file.  We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].

[1]: https://lkml.org/lkml/2016/1/31/52

Comments from Nilesh @ Oracle:

There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:

processes         : 10,000
memory            :    6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB

As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally brings
it down to a reasonable level.  Memory sizes will keep increasing; so
this number will keep increasing.

An argument can be made to convert the applications from process model
to thread model, but in the real world that may not be always practical.
Hopefully this helps explain the use case where this is valuable.

This patch (of 3):

In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault.  The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.

[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
  Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
  Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Christoph Hellwig
8ff6daa17b iomap: constify struct iomap_ops
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-30 16:32:25 -08:00
Ross Zwisler
6affb9d7b1 dax: fix build warnings with FS_DAX and !FS_IOMAP
As reported by Arnd:

  https://lkml.org/lkml/2017/1/10/756

Compiling with the following configuration:

  # CONFIG_EXT2_FS is not set
  # CONFIG_EXT4_FS is not set
  # CONFIG_XFS_FS is not set
  # CONFIG_FS_IOMAP depends on the above filesystems, as is not set
  CONFIG_FS_DAX=y

generates build warnings about unused functions in fs/dax.c:

  fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function]
   static int dax_insert_mapping(struct address_space *mapping,
              ^~~~~~~~~~~~~~~~~~
  fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function]
   static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
              ^~~~~~~~~~~~~
  fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function]
   static int dax_load_hole(struct address_space *mapping, void **entry,
              ^~~~~~~~~~~~~
  fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function]
   static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
                ^~~~~~~~~~~~~~~~~~

Now that the struct buffer_head based DAX fault paths and I/O path have
been removed we really depend on iomap support being present for DAX.
Make this explicit by selecting FS_IOMAP if we compile in DAX support.

This allows us to remove conditional selections of FS_IOMAP when FS_DAX
was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c.

Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00