54505 Commits

Matthew Wilcox
0f2de9a420
export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:48 +05:30
John Galt
791a20fee1
Revert "erofs: fixes for compilation"
This reverts commit c7bf11979051cda0e7b37857289503fa4831c549.

Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:47 +05:30
Hongyu Jin
a0d2b5b999
erofs: fix use-after-free of on-stack io[]
The root cause is the race as follows:
Thread #1                              Thread #2(irq ctx)

z_erofs_runqueue()
  struct z_erofs_decompressqueue io_A[];
  submit bio A
  z_erofs_decompress_kickoff(,,1)
                                       z_erofs_decompressqueue_endio(bio A)
                                       z_erofs_decompress_kickoff(,,-1)
                                       spin_lock_irqsave()
                                       atomic_add_return()
  io_wait_event()	-> pending_bios is already 0
  [end of function]
                                       wake_up_locked(io_A[]) // crash

Referenced backtrace in kernel 5.4:

[   10.129422] Unable to handle kernel paging request at virtual address eb0454a4
[   10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G        WC O      5.4.147-ab09225 #1
[   11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
[   11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
[   11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
[   11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
[   11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
[   11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
[   11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
[   11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
[   11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
[   11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
[   11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
[   11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)
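
To make the hazard concrete, here is a minimal C sketch of the pattern
(names and layout are illustrative, not the actual erofs code):

	struct demo_queue {
		atomic_t pending;
		wait_queue_head_t wq;
	};

	static void demo_kickoff(struct demo_queue *q, int bios)
	{
		/* Once pending hits 0, the waiter may return and its
		 * stack frame vanishes, so q must not be touched after
		 * the waiter can observe the atomic below as 0. */
		if (atomic_add_return(bios, &q->pending) == 0)
			wake_up(&q->wq);	/* potential use-after-free */
	}

	static void demo_runqueue(void)
	{
		struct demo_queue q = {
			.pending = ATOMIC_INIT(1),
			.wq = __WAIT_QUEUE_HEAD_INITIALIZER(q.wq),
		};

		/* submit a bio whose endio calls demo_kickoff(&q, -1) */
		wait_event(q.wq, !atomic_read(&q.pending));
	}	/* q is gone here, but the irq path may still hold &q */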

Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:46 +05:30
Yue Hu
2f1264f80c
erofs: remove the fast path of per-CPU buffer decompression
As Xiang mentioned, this path has no real impact on our current
decompression strategy; remove it directly. Also, update the return
value of z_erofs_lz4_decompress() to 0 on success to keep it
consistent with LZMA, which will also return 0 in that case.

Link: https://lore.kernel.org/r/20211014065744.1787-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:46 +05:30
Yue Hu
4ef4bfee7a
erofs: clear compacted_2b if compacted_4b_initial > totalidx
Currently, the whole index will only be compacted 4B if
compacted_4b_initial > totalidx, so the calculated compacted_2b is
worthless in that case and may waste CPU resources.

There is no need to update compacted_4b_initial as mkfs does, since
it's used to fulfill the alignment of the 1st compacted_2b pack and
already handles the case above.

We also need to clarify compacted_4b_end here. It's used for the
last lclusters which don't fit in the previous compacted_2b packs.

Some messages are from Xiang.

Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:45 +05:30
Yue Hu
3b1bc52404
erofs: remove the mapping parameter from erofs_try_to_free_cached_page()
The mapping is not used at all, remove it and update related code.

Link: https://lore.kernel.org/r/20210810072416.1392-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:44 +05:30
Yue Hu
c7e9c1ebf7
erofs: directly use wrapper erofs_page_is_managed() when shrinking
We already have a wrapper function to identify managed pages.

Link: https://lore.kernel.org/r/20210810065450.1320-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:44 +05:30
Yue Hu
fe9d17e36f
erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()
The `occupied' variable has no effect in z_erofs_attach_page(), which
is the only caller of z_erofs_pagevec_enqueue().

Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:43 +05:30
Gao Xiang
11d2af2630
erofs: fix 1 lcluster-sized pcluster for big pcluster
If the 1st NONHEAD lcluster of a pcluster isn't a CBLKCNT lcluster
but a HEAD or PLAIN type instead, its pclustersize _must_ be 1
lcluster (since its uncompressed size is < 2 lclusters), as
illustrated below:

       HEAD     HEAD / PLAIN    lcluster type
   ____________ ____________
  |_:__________|_________:__|   file data (uncompressed)
   .                .
  .____________.
  |____________|                pcluster data (compressed)

Such an on-disk case was explained before [1] but wasn't handled
properly in the runtime implementation.

It can be observed by manually generating a 1 lcluster-sized pcluster
with 2 lclusters (so that no CBLKCNT exists). Let's fix it now.

[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org

Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:42 +05:30
Gao Xiang
697b5d24af
erofs: enable big pcluster feature
Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
all settled properly.

Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:42 +05:30
Gao Xiang
12ec507a72
erofs: support decompress big pcluster for lz4 backend
Prior to big pcluster, there was only one compressed page, so it was
easy to map. However, when big pcluster is enabled, more work needs
to be done to handle multiple compressed pages. In detail,

 - (maptype 0) if there is only one compressed page + no need
   to copy inplace I/O, just map it directly as we did before;

 - (maptype 1) if there are more compressed pages + no need to
   copy inplace I/O, vmap such compressed pages instead;

 - (maptype 2) if inplace I/O needs to be copied, use per-CPU
   buffers for decompression then.

Another thing is how to detect whether inplace decompression is
feasible (it's still quite easy for non-big pclusters): apart from
the inplace margin calculation, the inplace I/O page reuse order also
needs to be considered for each compressed page. Currently, if the
compressed page is the xth page, it shouldn't be reused as [0 ...
nrpages_out - nrpages_in + x], otherwise a full copy will be triggered.

Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first; it can obviously be
optimized further later since it has nothing to do with the on-disk
format at all.

Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:41 +05:30
Gao Xiang
ec3f9e7945
erofs: support parsing big pcluster compact indexes
Different from non-compact indexes, several lclusters are packed
in compact form at once and a unique base blkaddr is stored for
each pack, so each lcluster index takes less space on average
(e.g. 2 bytes for COMPACT_2B). By the way, that is also why the
BIG_PCLUSTER switch should be consistent for compact head0/1.

Prior to big pcluster, the size of all pclusters was 1 lcluster.
Therefore, when a new HEAD lcluster was scanned, blkaddr would be
bumped by 1 lcluster. However, that way doesn't work anymore for
big pcluster since we actually don't know the compressed size of
pclusters in advance (before reading CBLKCNT lcluster).

So, instead, let blkaddr of each pack be the first pcluster blkaddr
with a valid CBLKCNT, in detail,

 1) if CBLKCNT starts at the pack, this first valid pcluster is
    itself, e.g.
  _____________________________________________________________
 |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
 ^ = blkaddr base          ^ += CBLKCNT0           ^ += CBLKCNT1

 2) if CBLKCNT doesn't start at the pack, the first valid pcluster
    is the next pcluster, e.g.
  _________________________________________________________
 | NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
                ^ = blkaddr base        ^ += CBLKCNT0
                                               ^ += 1

When a CBLKCNT is found, blkaddr is increased by CBLKCNT lclusters;
if a new HEAD is found immediately instead, blkaddr is bumped by 1
(see the picture above).

Also note that if CBLKCNT is at the end of the pack, instead of
storing delta1 (the distance to the next HEAD lcluster) as normal
NONHEADs do, it still stores the compressed block count (delta0),
since delta1 can be calculated indirectly but the block count can't.

Adjust decoding logic to fit big pcluster compact indexes as well.
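
As a sketch, the blkaddr accounting described above boils down to
(field and helper names assumed):

	if (lcluster_is_nonhead(m) && m->has_cblkcnt)
		blkaddr += m->cblkcnt;	/* skip the compressed extent */
	else if (lcluster_is_head(m))
		blkaddr += 1;		/* pcluster without a CBLKCNT */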

Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:40 +05:30
Gao Xiang
f11d858b6f
erofs: support parsing big pcluster compress indexes
When the INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress
indexes will also have the same on-disk header as compact indexes to
keep per-file configurations instead of leaving it zeroed.

If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for
each pcluster in this file by parsing the 1st non-head lcluster.

Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:40 +05:30
Gao Xiang
59868b3726
erofs: adjust per-CPU buffers according to max_pclusterblks
Adjust per-CPU buffers on demand since the big pcluster definition is
now available. Also, bail out on unsupported pcluster sizes according
to Z_EROFS_PCLUSTER_MAX_SIZE.

Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:39 +05:30
Gao Xiang
51e41ca5eb
erofs: add big physical cluster definition
Big pcluster means that the size of compressed data for each physical
pcluster is no longer fixed to the block size, but can be more than
1 block (more accurately, more than 1 logical cluster).

When the big pcluster feature is enabled for head0/1, delta0 of the
1st non-head lcluster index will keep the block count of this
pcluster in lcluster units instead of 1. Otherwise, the compressed
size of the pcluster should be 1 lcluster if the pcluster has no
non-head lcluster index.

Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
it depends on COMPR_CFGS and will be released together.

Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:39 +05:30
Gao Xiang
1442ef7677
erofs: fix up inplace I/O pointer for big pcluster
When picking up inplace I/O pages, they should be traversed in
reverse order, in alignment with the traversal order of file-backed
online pages. Also, the index should be updated together when
preloading compressed pages.

Previously, only page-sized pclustersize was supported, so there was
no problem at all. Also rename `compressedpages' to `icpage_ptr' to
reflect its functionality.

Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:38 +05:30
Gao Xiang
3b3e78fdf9
erofs: introduce physical cluster slab pools
Since multiple pcluster sizes could be used at once, the number of
compressed pages will become a variable factor. It's necessary to
introduce slab pools rather than a single slab cache now.

This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE) and
gets rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which is no
longer used.
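
A condensed sketch of what such per-size slab pools look like (sizes
illustrative, names assumed):

	static struct z_erofs_pcluster_slab {
		struct kmem_cache *slab;
		unsigned int maxpages;
	} pcluster_pool[] = {
		{ .maxpages = 1 }, { .maxpages = 4 },
		{ .maxpages = 16 }, { .maxpages = 64 },
		{ .maxpages = 128 },
		{ .maxpages = 256 },	/* 1M / PAGE_SIZE */
	};

	/* allocation then picks the smallest pool whose maxpages
	 * covers the number of compressed pages of the pcluster */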

Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:37 +05:30
Gao Xiang
12fba2912b
erofs: introduce multipage per-CPU buffers
Per-CPU buffers were introduced to deal with the cases in which
inplace decompression is infeasible for some inplace I/O, getting rid
of page allocation latency and thrashing for low-latency
decompression algorithms such as lz4.

For the big pcluster feature, introduce multipage per-CPU buffers to
keep such inplace I/O pclusters temporarily as well, noting that
per-CPU pages are only virtually consecutive.

When a new big pcluster fs is mounted, its max pclustersize is read
and the per-CPU buffers can be grown if needed. Shrinking adjustable
per-CPU buffers is more complex (because we don't know whether such a
size is still in use), so currently we just release them all when
unloading.

Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:37 +05:30
Gao Xiang
cf51a981a9
erofs: reserve physical_clusterbits[]
The formal big pcluster design is actually more powerful / flexible
than the previous idea, whose pclustersize was fixed as power-of-2
blocks, which was obviously inefficient and space-wasting. Instead,
pclustersize can now be set independently for each pcluster, so
various pcluster sizes can also be used together in one file if mkfs
wants (for example, according to data type and/or compression ratio).

Let's get rid of the previous physical_clusterbits[] setting (also
note that the corresponding on-disk fields are still 0 for now).
Therefore, head1/2 can be used for at most 2 different algorithms in
one file, and again, pclustersize is now independent of these.

Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:36 +05:30
Ruiqi Gong
5981ae68a4
erofs: Clean up spelling mistakes found in fs/erofs
zmap.c: s/correspoinding/corresponding
zdata.c: s/endding/ending

Link: https://lore.kernel.org/r/20210331093920.31923-1-gongruiqi1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ruiqi Gong <gongruiqi1@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:35 +05:30
Gao Xiang
ad17a42ae7
erofs: add on-disk compression configurations
Add a bitmap for available compression algorithms and a
variable-sized on-disk table for compression options, which follows
the end of the super block, in preparation for the upcoming big
pcluster and LZMA algorithm.

To parse the compression options, the bitmap is scanned bit by bit.
For each available algorithm there is a 2-byte `length' field
followed by the corresponding data (this is enough for most cases;
otherwise entire fs blocks should be used.)

With such an available-algorithm bitmap, the kernel itself can also
refuse to mount such a filesystem if any unsupported compression
algorithm exists.

Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER.
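
A sketch of the parsing loop described above (variable names and the
z_erofs_load_lz4_config() signature are assumed):

	unsigned int algs = available_compr_algs;
	unsigned int alg;
	int err = 0;

	for (alg = 0; algs; ++alg, algs >>= 1) {
		if (!(algs & 1))
			continue;
		/* each present algorithm: a 2-byte `length' field
		 * plus `length' bytes of per-algorithm options */
		switch (alg) {
		case Z_EROFS_COMPRESSION_LZ4:
			err = z_erofs_load_lz4_config(sbi, data, len);
			break;
		default:
			err = -EOPNOTSUPP;	/* refuse to mount */
			break;
		}
		if (err)
			break;
	}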

Link: https://lore.kernel.org/r/20210329100012.12980-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:35 +05:30
Gao Xiang
1d6214128a
erofs: introduce on-disk lz4 fs configurations
Introduce z_erofs_lz4_cfgs to store all lz4 configurations.
Currently it only holds max_distance, but it will be used for new
features later.

Link: https://lore.kernel.org/r/20210329012308.28743-4-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:34 +05:30
Huang Jianan
cc21319e3a
erofs: support adjust lz4 history window size
lz4 uses LZ4_DISTANCE_MAX to decide how much history to preserve.
When using rolling decompression, a block with a higher compression
ratio will cause a larger memory allocation (up to 64k). This may
cause a large resource burden in extreme cases on devices with
small memory and a large number of concurrent IOs, so appropriately
reducing this value can improve performance.

Decreasing this value will reduce the compression ratio (except
when input_size < LZ4_DISTANCE_MAX). But considering that erofs
currently only supports 4k output, reducing this value will not
significantly reduce the compression benefits.

The maximum value of LZ4_DISTANCE_MAX defined by lz4 is 64k, and
we can only reduce this value. Old kernels simply can't reduce
the memory allocation during rolling decompression without
affecting the decompression result.
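
A sketch of how the smaller window bounds the rolling-decompression
allocation (field names assumed):

	unsigned int distance = min_t(unsigned int,
				      sbi->lz4.max_distance,
				      LZ4_DISTANCE_MAX);
	/* history pages to keep instead of the worst-case 64k */
	unsigned int nr_history = DIV_ROUND_UP(distance, PAGE_SIZE);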

Link: https://lore.kernel.org/r/20210329012308.28743-3-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
[ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:33 +05:30
Gao Xiang
04b85242d0
erofs: introduce erofs_sb_has_xxx() helpers
Introduce erofs_sb_has_xxx() to make long checks short, especially
for later big pcluster & LZMA features.
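
Such helpers are typically stamped out with a macro, roughly like
this sketch (flag names assumed):

	#define EROFS_FEATURE_FUNCS(name, compat, feature)		\
	static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \
	{								\
		return sbi->feature_##compat & EROFS_FEATURE_##feature;	\
	}

	EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS)
	EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)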

Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:33 +05:30
Yue Hu
5191e4d90b
erofs: don't use erofs_map_blocks() any more
Currently, erofs_map_blocks() will be called only from
erofs_{bmap, read_raw_page} which are all for uncompressed files.
So, the compression branch in erofs_map_blocks() is pointless. Let's
remove it and use erofs_map_blocks_flatmode() directly. Also update
related comments.

Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:32 +05:30
Gao Xiang
c48c373bf5
erofs: complete a missing case for inplace I/O
Add a missing case which could cause unnecessary page allocation
instead of directly using inplace I/O, increasing the runtime
memory footprint.

In detail: consider an online file-backed page whose right half is
chosen to be cached (e.g. the end page of a readahead request) while
some of its data doesn't exist in the managed cache, so the pcluster
will definitely be kept in the submission chain (IOWs, it cannot be
decompressed without I/O, e.g., due to the bypass queue).

Currently, DELAYEDALLOC/TRYALLOC cases can be downgraded to
NOINPLACE, stopping online pages from using inplace I/O. After this
patch, unneeded page allocations won't be observed in
pickup_page_for_submission() anymore.

Link: https://lore.kernel.org/r/20210321183227.5182-1-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:32 +05:30
Huang Jianan
5dfa9f9b1a
erofs: use workqueue decompression for atomic contexts only
z_erofs_decompressqueue_endio may not be executed in atomic
context, for example, when dm-verity is turned on. In this scenario,
data can be decompressed directly to avoid the additional kworker
scheduling overhead.
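
The gist of the change, as a sketch (queue and work names assumed):

	/* decompress inline unless we are in atomic/irq context,
	 * where blocking is not allowed and a kworker must be used */
	if (in_atomic() || irqs_disabled())
		queue_work(z_erofs_workqueue, &io->u.work);
	else
		z_erofs_decompressqueue_work(&io->u.work);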

Link: https://lore.kernel.org/r/20210317035448.13921-2-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:31 +05:30
Huang Jianan
d7f08d2b81
erofs: avoid memory allocation failure during rolling decompression
Currently, a memory allocation failure would be treated as an I/O
error. Therefore, it'd be better to guarantee memory allocation
during rolling decompression to avoid such an I/O error.

In the long term, we might consider adding another !Uptodate case
for this situation.

Link: https://lore.kernel.org/r/20210316031515.90954-1-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:30 +05:30
John Galt
28bfad6991
erofs: compression fixes
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:30 +05:30
Luan Cachoroski Halaiko
5e066335c5
erofs: fixes for compilation
Signed-off-by: Luan Cachoroski Halaiko <luhalaiko@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:29 +05:30
Gao Xiang
df3fe3ae55
erofs: force inplace I/O under low memory scenario
Try to forcibly switch to inplace I/O under low-memory scenarios in
order to avoid direct memory reclaim due to cached page allocation.

Link: https://lore.kernel.org/r/20201209123717.12430-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I8ea2d3b59c68125271f66853cf5dc6ca39e7aaa9
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:29 +05:30
Gao Xiang
0ee91cad79
erofs: simplify try_to_claim_pcluster()
Simplify try_to_claim_pcluster() by directly using cmpxchg() here
(the retry loop caused more overhead). Also, move the chain loop
detection in and rename it to z_erofs_try_to_claim_pcluster().

Link: https://lore.kernel.org/r/20201208095834.3133565-3-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I8d091ff44123b099ef199eaa4200a00b8854623f
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:28 +05:30
Gao Xiang
afd7d0f494
erofs: insert to managed cache after adding to pcl
Previously, there was some concern about calling
add_to_page_cache_lru() with page->mapping == Z_EROFS_MAPPING_STAGING
(!= NULL).

In contrast, page->private is used instead now, so partially revert
commit 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
with some adaptation for simplicity.

Link: https://lore.kernel.org/r/20201208095834.3133565-2-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: If250d62b47083649e96d0937eb1990b6c84d768f
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:27 +05:30
Gao Xiang
1433914b3b
erofs: get rid of magical Z_EROFS_MAPPING_STAGING
Previously, we played around with a magical page->mapping for
short-lived temporary pages since we needed to identify different
types of pages in the same pcluster, but both invalidated and
short-lived temporary pages can have page->mapping == NULL. It was
considered safe because those temporary pages are all non-LRU /
non-movable pages.

This patch uses a specific page->private value to identify
short-lived pages instead, so it no longer relies on page->mapping.
Details are described in "compress.h" as well.

Link: https://lore.kernel.org/r/20201208095834.3133565-1-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I2c8650e80cb6016ed828d04f89f8bd3512ca3fb2
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:27 +05:30
Vladimir Zapolskiy
0eaa1db863
erofs: remove a void EROFS_VERSION macro set in Makefile
Since commit 4f761fa253b4 ("erofs: rename errln/infoln/debugln to
erofs_{err, info, dbg}") the defined macro EROFS_VERSION has had no
effect, therefore removing it from the Makefile is a non-functional
change.

Link: https://lore.kernel.org/r/20201030122839.25431-1-vladimir@tuxera.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: Id63ad279985db2a156d62be814bf381c9bea8342
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:26 +05:30
Gao Xiang
a36de8f402
erofs: move from drivers/staging/ to fs/
Since 5.4, erofs has been moved into fs/.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: I95dd967a0097629a9d8eaed1dc11e2cd04f47701
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-22 14:21:12 +05:30
Joe Perches
fead2462e6
sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output
commit 2efc459d06f1630001e3984854848a5647086232 upstream.

Output defects can exist in sysfs content using sprintf and snprintf.

sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.

Add a generic sysfs_emit function that knows the size of the
temporary buffer and ensures that no overrun is done.

Add a generic sysfs_emit_at function that can be used in multiple
call situations that also ensures that no overrun is done.

Validate the output buffer argument to be page aligned.
Validate the offset len argument to be within the PAGE_SIZE buf.
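
The result is essentially a PAGE_SIZE-bounded vscnprintf with the
alignment check described above; a sketch close to the upstream code:

	int sysfs_emit(char *buf, const char *fmt, ...)
	{
		va_list args;
		int len;

		/* sysfs buffers are exactly one page-aligned page */
		if (WARN(!buf || offset_in_page(buf),
			 "invalid sysfs_emit: buf:%p\n", buf))
			return 0;

		va_start(args, fmt);
		len = vscnprintf(buf, PAGE_SIZE, fmt, args);
		va_end(args);

		return len;
	}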

Signed-off-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/r/884235202216d464d61ee975f7465332c86f76b2.1600285923.git.joe@perches.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie5b2aa618d8fbdc93a61bce992fb8c6e020d2957
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-15 10:43:52 +05:30
John Galt
70e8b4d11f
fuse: force new feature on
We know our users, so there is no need to gate this behind a flag.

Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-07 23:00:51 +05:30
Dharmendra Singh
66e0d71192
FROMLIST: fuse: Allow non-extending parallel direct writes
In general, as of now, in FUSE, direct writes on the same file are
serialized over the inode lock, i.e. we hold the inode lock for the
full duration of the write request. I could not find a comment in the
fuse code which clearly explains why this exclusive lock is taken for
direct writes. Our guess is that some user-space fuse implementations
might be relying on this lock for serialization, and that it also
protects against issues arising from file size assumptions or write
failures. This patch relaxes this exclusive lock in some cases of
direct writes.

With these changes, we allow non-extending parallel direct writes
on the same file with the help of a flag called FOPEN_PARALLEL_WRITES.
If this flag is set on the file (the flag is passed from libfuse to
the fuse kernel as part of file open/create), we do not take the
exclusive lock but a shared lock instead, so that all non-extending
writes can run in parallel.

Best practice would be to enable parallel direct writes of all
kinds, including extending writes, but we see some issues: for
example, when one write completes and another fails, how should we
truncate the file (if needed) if the underlying file system does not
support holes? (For file systems which support holes, there might be
a possibility of enabling parallel writes in all cases.)

FUSE implementations which rely on this inode lock for serialisation
can continue to do so and this is default behaviour i.e no parallel
direct writes.
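
A sketch of the resulting lock selection (field and helper names
assumed for this 4.14 adaptation):

	/* shared lock only for non-extending parallel direct writes */
	bool exclusive = !(ff->open_flags & FOPEN_PARALLEL_WRITES) ||
			 iocb->ki_pos + iov_iter_count(from) >
				i_size_read(inode);

	if (exclusive)
		inode_lock(inode);
	else
		inode_lock_shared(inode);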

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
[cyberknight777: backport and adapt to 4.14 fuse implementation]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-07 23:00:50 +05:30
Sebastian Andrzej Siewior
1ea126ba21
kernel: sched: Provide a pointer to the valid CPU mask
In commit 4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed()
wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT, but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to a single CPU while the "normal" CPU mask remains
untouched.

As an alternative implementation Ingo suggested to use
	struct task_struct {
		const cpumask_t		*cpus_ptr;
		cpumask_t		cpus_mask;
        };
with
	t->cpus_allowed_ptr = &t->cpus_allowed;

In -RT we then can switch the cpus_ptr to
	t->cpus_allowed_ptr = &cpumask_of(task_cpu(p));

in a migration disabled region. The rules are simple:
- Code that 'uses' ->cpus_allowed would use the pointer.
- Code that 'modifies' ->cpus_allowed would use the direct mask.

While converting the existing users I tried to stick with the rules
above, however… well, mostly CPUFREQ tries to temporarily switch the
CPU mask to do something on a certain CPU and then switches the mask
back to its original value. So in theory `cpus_ptr' could or should
be used. However, if this is invoked in a migration disabled region
(which is not the case, because it would require something like
preempt_disable(), and set_cpus_allowed_ptr() might sleep so it can't
be) then the "restore" part would restore the wrong mask. So it only
looks strange and I go for the pointer…

Some drivers copy the cpumask without cpumask_copy() and others use
cpumask_copy but without alloc_cpumask_var(). I did not fix those as
part of this, could do this as a follow up…

So is this the way we want it?
Is the usage of `cpus_ptr' vs `cpus_mask' for the set + restore part
(see cpufreq users) what we want? At some point it looks like they
should use a different interface for their doing. I am not sure why
switching to a certain CPU is important, but maybe it could be done
via a workqueue from the CPUFREQ core (so we have a comment describing
why we are doing this and a get_online_cpus() to ensure that the CPU
does not go offline too early).
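
As a sketch, the two rules translate to:

	int cpu;

	/* reader: always go through the pointer */
	for_each_cpu(cpu, p->cpus_ptr)
		do_something(cpu);	/* do_something() is a placeholder */

	/* writer: modify the underlying mask; outside a
	 * migrate_disable() region the pointer still refers to it */
	cpumask_copy(&p->cpus_mask, new_mask);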

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[Sultan Alsawaf: adapt to floral]
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-06-07 21:47:29 +05:30
Nick Terrell
982c7e4758
BACKPORT: lib: zstd: Add kernel-specific API
This patch:
- Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
- Updates modified zstd headers to yearless copyright
- Adds a new API in `include/linux/zstd.h` that is functionally
  equivalent to the in-use subset of the current API. Functions are
  renamed to avoid symbol collisions with zstd, to make it clear it is
  not the upstream zstd API, and to follow the kernel style guide.
- Updates all callers to use the new API.

There are no functional changes in this patch. Since there are no
functional changes, I felt it was okay to update all the callers in a
single patch. Once the API is approved, the callers are mechanically
changed.

This patch is preparing for the 3rd patch in this series, which updates
zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
to callers, the update can happen transparently.
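
For reference, the renamed entry points look roughly like this (a
sketch of the proposed kernel API, not the upstream zstd one):

	size_t zstd_compress_cctx(zstd_cctx *cctx,
				  void *dst, size_t dst_capacity,
				  const void *src, size_t src_size,
				  const zstd_parameters *parameters);

	size_t zstd_decompress_dctx(zstd_dctx *dctx,
				    void *dst, size_t dst_capacity,
				    const void *src, size_t src_size);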

Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
[cyberknight777: backport to 4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 16:27:11 +05:30
Geert Uytterhoeven
61be0c1a78
BACKPORT: f2fs: compress: Allow modular (de)compression algorithms
If F2FS_FS is modular, enabling the compressions options
F2FS_FS_{LZ4,LZ4HZ,LZO,LZORLE,ZSTD} will make the (de)compression
algorithms {LZ4,LZ4HC,LZO,ZSTD}_{,DE}COMPRESS builtin instead of
modular, as the former depend on an intermediate boolean
F2FS_FS_COMPRESSION, which in turn depends on the tristate F2FS_FS.

Indeed, if a boolean symbol A depends directly on a tristate symbol B
and selects another tristate symbol C:

    tristate B

    tristate C

    bool A
      depends on B
      select C

and B is modular, then C will also be modular.

However, if there is an intermediate boolean D in the dependency chain
between A and B:

    tristate B

    tristate C

    bool D
      depends on B

    bool A
      depends on D
      select C

then the modular state won't propagate from B to C, and C will be
builtin instead of modular.

As modular dependency propagation through intermediate symbols is
obscure, fix this in a robust way by moving the selection of tristate
(de)compression algorithms from the boolean compression options to the
tristate main F2FS_FS option.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[cyberknight777: backport to 4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 16:27:11 +05:30
Alessio Balsini
e9dc3679b0
FROMLIST: fuse: Fix credentials leak in passthrough read_iter
If the system doesn't have enough memory when fuse_passthrough_read_iter
is requested in asynchronous IO, an error is directly returned without
restoring the caller's credentials.
Fix by always ensuring credentials are restored.

Fixes: aa29f32988c1f84c96e2457b049dea437601f2cc ("FROMLIST: fuse: Use daemon creds in passthrough mode")
Link: https://lore.kernel.org/lkml/YB0qPHVORq7bJy6G@google.com/
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I4aff43f5dd8ddab2cc8871cd9f81438963ead5b6
(cherry picked from commit 79a47db66416232bbc5b9fce8f417c9fad025fb1)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:56 +05:30
Alessio Balsini
a9490d32bc
FROMLIST: fuse: Introduce passthrough for mmap
Enabling FUSE passthrough for mmap-ed operations not only affects
performance, but has also been shown to be mandatory for the correct
functioning of FUSE passthrough.
yanwu noticed [1] that a FUSE file with passthrough enabled may suffer
data inconsistencies if the same file is also accessed with mmap. What
happens is that read/write operations are directly applied to the lower
file system (and its cache), while mmap-ed operations affect the FUSE
cache.

Extend the FUSE passthrough implementation to also handle memory-mapped
FUSE files, to both fix the cache inconsistencies and extend the
passthrough performance benefits to mmap-ed operations.

[1] https://lore.kernel.org/lkml/20210119110654.11817-1-wu-yan@tcl.com/

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-9-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Ifad4698b0380f6e004c487940ac6907b9a9f2964
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit bf5cb932f0e0dd028dcebf3a6c2fcfedb4fd8265)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:56 +05:30
Alessio Balsini
658dc211a1
FROMLIST: fuse: Use daemon creds in passthrough mode
When using FUSE passthrough, read/write operations are directly
forwarded to the lower file system file through VFS, but there is no
guarantee that the process that is triggering the request has the right
permissions to access the lower file system. This would cause the
read/write access to fail.

In passthrough file systems, where the FUSE daemon is responsible for
enforcing the lower file system access policies, it often happens that
the process dealing with the FUSE file system doesn't have access to
the lower file system.
Since the FUSE daemon is in charge of implementing the FUSE file
operations, which in the case of read/write operations usually simply
copy memory buffers from/to the lower file system, these operations
are executed with the FUSE daemon's privileges.

This patch adds a reference to the FUSE daemon credentials, referenced
at FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used
to temporarily raise the user credentials when accessing lower file
system files in passthrough.
The process accessing the FUSE file with passthrough enabled temporarily
receives the privileges of the FUSE daemon while performing read/write
operations. Similar behavior is implemented in overlayfs.
These privileges will be reverted as soon as the IO operation completes.
This feature does not provide any higher security privileges to those
processes accessing the FUSE file system with passthrough enabled. This
is because it is still the FUSE daemon responsible for enabling or not
the passthrough feature at file open time, and should enable the feature
only after appropriate access policy checks.
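
The mechanism reduces to the usual credential-override pattern; a
sketch (field names assumed):

	/* perform the lower-fs I/O with the daemon's credentials */
	const struct cred *old_cred =
		override_creds(ff->passthrough.cred);
	ssize_t ret;

	ret = vfs_iter_read(passthrough_filp, iter, &iocb->ki_pos, 0);

	/* always restore the caller's credentials, even on error */
	revert_creds(old_cred);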

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-8-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Idb4f03a2ce7c536691e5eaf8fadadfcf002e1677
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit 5f3d78268b21d381310574af1c16882c7680ceb1)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:56 +05:30
Alessio Balsini
5a0f00bb01
FROMLIST: fuse: Handle asynchronous read and write in passthrough
Extend the passthrough feature by handling asynchronous IO both for read
and write operations.

When an AIO request is received, if the request targets a FUSE file with
the passthrough functionality enabled, a new identical AIO request is
created. The new request targets the lower file system file and gets
assigned a special FUSE passthrough AIO completion callback.
When the lower file system AIO request is completed, the FUSE
passthrough AIO completion callback is executed and propagates the
completion signal to the FUSE AIO request by triggering its completion
callback as well.

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-7-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I47671ef36211102da6dd3ee8b2f226d1e6cd9d5c
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit ea2b7a36847b14dee60d1f5dbf2aa26cf101c426)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:56 +05:30
Alessio Balsini
0fe77264fe
FROMLIST: fuse: Introduce synchronous read and write for passthrough
All the read and write operations performed on fuse_files which have the
passthrough feature enabled are forwarded to the associated lower file
system file via VFS.

Sending the request directly to the lower file system avoids the
userspace round-trip that, because of possible context switches and
additional operations, might reduce the overall performance, especially
in those cases where caching doesn't help, for example in reads at
random offsets.

Verifying whether a fuse_file has a lower file system file associated
with it can be done by checking the validity of its passthrough_filp
pointer. This pointer is not NULL only if passthrough has been
successfully enabled via the appropriate ioctl().
When a read/write operation is requested for a FUSE file with
passthrough enabled, a new equivalent VFS request is generated, which
instead targets the lower file system file.
The VFS layer performs additional checks that allow for safer operations
but may cause the operation to fail if the process accessing the FUSE
file system does not have access to the lower file system.

This change only implements synchronous requests in passthrough,
returning an error in the case of asynchronous operations, yet covering
the majority of the use cases.

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-6-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: Ifbe6a247fe7338f87d078fde923f0252eeaeb668
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit ea9685a7f9cb16b30e25386386274fdd30627c3a)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:56 +05:30
Alessio Balsini
58aebf8c21
FROMLIST: fuse: Passthrough initialization and release
Implement the FUSE passthrough ioctl that associates the lower
(passthrough) file system file with the fuse_file.

The file descriptor passed to the ioctl by the FUSE daemon is used to
access the corresponding file pointer, which will be copied to the
fuse_file data structure to consolidate the link between the FUSE and
the lower file system.

To enable the passthrough mode, user space triggers the
FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives
back an identifier that will be used at open/create response time in the
fuse_open_out field to associate the FUSE file to the lower file system
file.
The value returned by the ioctl to user space can be:
- > 0: success, the identifier can be used as part of an open/create
reply.
- <= 0: an error occurred.
The value 0 represents an error to preserve backward compatibility: the
fuse_open_out field that is used to pass the passthrough_fh back to the
kernel uses the same bits that previously served as struct padding, and
is commonly zero-initialized (e.g., in the libfuse implementation).
Excluding 0 from the valid values removes the ambiguity between the
cases in which 0 corresponds to a real passthrough_fh, a missing
implementation of FUSE passthrough, or a request for a normal FUSE
file, simplifying the user space implementation.

For the passthrough mode to be successfully activated, the lower file
system file must implement both read_iter and write_iter file
operations. This extra check prevents special pseudo files from being
targeted by this feature.
Passthrough comes with another limitation: no further file system
stacking is allowed for those FUSE file systems using passthrough.

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-5-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I4d8290012302fb4547bce9bb261a03cc4f66b5aa
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit 28e86146c501a0f943fe9dc0ec0252df066a2b3d)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:55 +05:30
Alessio Balsini
1f04880bf8
FROMLIST: fuse: Definitions and ioctl for passthrough
Expose the FUSE_PASSTHROUGH interface to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl, which allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl requires user space to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data structure introduced in this patch. This structure includes extra
fields for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.
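
A sketch of the user-visible pieces (ioctl number illustrative; the
4.14 backport may differ):

	struct fuse_passthrough_out {
		uint32_t	fd;
		/* For future implementation */
		uint32_t	len;
		void		*vec;
	};

	#define FUSE_DEV_IOC_PASSTHROUGH_OPEN \
		_IOW(FUSE_DEV_IOC_MAGIC, 126, struct fuse_passthrough_out)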

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I732532581348adadda5b5048a9346c2b0868d539
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit d02368d67989781a3484cd8dd71e0079d0d1bda2)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:55 +05:30
Alessio Balsini
248bfcfd3d
FROMLIST: fuse: 32-bit user space ioctl compat for fuse device
With a 64-bit kernel build the FUSE device cannot handle ioctl requests
coming from 32-bit user space.
This is due to ioctl command translation, which generates different
command identifiers that therefore cannot be used for direct
comparisons without proper manipulation.

Explicitly extract type and number from the ioctl command to enable
32-bit user space compatibility on 64-bit kernel builds.
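
The fix amounts to matching only the type and number bits of the
command, since the size bits encode a pointer-bearing struct whose
size differs between 32- and 64-bit user space; a sketch (handler
name assumed):

	if (_IOC_TYPE(cmd) != FUSE_DEV_IOC_MAGIC)
		return -ENOTTY;

	switch (_IOC_NR(cmd)) {
	case _IOC_NR(FUSE_DEV_IOC_PASSTHROUGH_OPEN):
		res = fuse_passthrough_open(fud, arg);
		break;
	default:
		res = -ENOTTY;
		break;
	}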

Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-3-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I595517c54d551be70e83c7fcb4b62397a3615004
Signed-off-by: Alessio Balsini <balsini@google.com>
(cherry picked from commit af4048924e191bda0bb85b4bf127f22cf3c70fba)
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
2022-04-02 13:39:55 +05:30