Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: mrsrimar22 <mar.pashter1922@gmail.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Change-Id: I91c9031a7320e996b1cd931a18d79bfab05ee959
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: mrsrimar22 <mar.pashter1922@gmail.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Move the static keyword to the front of declarations of nfdi_test_data
and nfdicf_test_data, and resolve the following compiler warnings that
can be seen when building with warnings enabled (W=1):
fs/unicode/utf8-selftest.c:38:1: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]
fs/unicode/utf8-selftest.c:92:1: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: mrsrimar22 <mar.pashter1922@gmail.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Commit 2b3d04787012 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.
The comments should be updated too, but at least they don't cause any
visible breakage. But the gitignore file needs changing to avoid git
complaining about untracked files.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: mrsrimar22 <mar.pashter1922@gmail.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
The exported symbols in utf8-norm.c are not needed for normal
file system consumers, so move them to conditional _GPL exports
just for the selftest.
Change-Id: Ie9170398b992e5b4ece631ba302e3e8feb8b5ef7
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.
Allow building it into a separate module.
Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>.
Change-Id: I6ce97a0249b1fedbd27b3f05d73459f5a02d3e75
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Instead of repeatedly looking up the version add pointers to the
NFD and NFD+CF tables to struct unicode_map, and pass a
unicode_map plus index to the functions using the normalization
tables.
Change-Id: Iced7cfaa61057defd15581258a8d1b093f041759
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Only used by the tests, so no need to keep it in the core.
Change-Id: Id1f83cc3575f2683e7020268e34336b68eb55318
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Just use the utf8nlen implementation with a (size_t)-1 len argument,
similar to utf8_lookup. Also move the function to utf8-selftest.c, as
it isn't used anywhere else.
Change-Id: I6a5b7f73de9493ed76a2aa884db1cc6ee99655d3
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
No actually used anywhere.
Change-Id: Ia319ea78385ec20c96036e83aac378638429e6cd
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
Don't bother with pointless string parsing when the caller can just pass
the version in the format that the core expects. Also remove the
fallback to the latest version that none of the callers actually uses.
Change-Id: I4c0056bcdce44ea655d5da0f69f92f7a1a590445
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
It is hardcoded and only used for a f2fs sysfs file where it can be
hardcoded just as easily.
Change-Id: I4a0e2e143a62c411dbcff89a2398ed72e8125500
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: chrisl7 <wandersonrodriguesf1@gmail.com>
Signed-off-by: Richard Raya <rdxzv.dev@gmail.com>
This adds a case insensitive hash function to allow taking the hash
without needing to allocate a casefolded copy of the string.
The existing d_hash implementations for casefolding allocate memory
within rcu-walk, by avoiding it we can be more efficient and avoid
worrying about a failed allocation.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Temporarily cache a casefolded version of the file name under lookup in
ext4_filename, to avoid repeatedly casefolding it. I got up to 30%
speedup on lookups of large directories (>100k entries), depending on
the length of the string under lookup.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
scripts/mkutf8data is used only when regenerating utf8data.h,
which never happens in the normal kernel build. However, it is
irrespectively built if CONFIG_UNICODE is enabled.
Moreover, there is no good reason for it to reside in the scripts/
directory since it is only used in fs/unicode/.
Hence, move it from scripts/ to fs/unicode/.
In some cases, we bypass build artifacts in the normal build. The
conventional way to do so is to surround the code with ifdef REGENERATE_*.
For example,
- 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
- 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")
I rewrote the rule in a more kbuild'ish style.
In the normal build, utf8data.h is just shipped from the check-in file.
$ make
[ snip ]
SHIPPED fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
If you want to generate utf8data.h based on UCD, put *.txt files into
fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
The mkutf8data tool will be automatically compiled to generate the
utf8data.h from the *.txt files.
$ make REGENERATE_UTF8DATA=1
[ snip ]
HOSTCC fs/unicode/mkutf8data
GEN fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
I renamed the check-in utf8data.h to utf8data.h_shipped so that this
will work for the out-of-tree build.
You can update it based on the latest UCD like this:
$ make REGENERATE_UTF8DATA=1 fs/unicode/
$ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped
Also, I added entries to .gitignore and dontdiff.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Regenerate utf8data.h based on the latest UCD files and run tests
against the latest version.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This implements a in-kernel sanity test module for the utf8
normalization core. At probe time, it will run basic sequences through
the utf8n core, to identify problems will equivalent sequences and
normalization/casefold code. This is supposed to be useful for
regression testing when adding support for a new version of utf8 to
linux.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch integrates the utf8n patches with some higher level API to
perform UTF-8 string comparison, normalization and casefolding
operations. Implemented is a variation of NFD, and casefold is
performed by doing full casefold on top of NFD. These algorithms are
based on the core implemented by Olaf Weber from SGI.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove the Hangul decompositions from the utf8data trie, and do
algorithmic decomposition to calculate them on the fly. To store the
decomposition the caller of utf8lookup()/utf8nlookup() must provide a
12-byte buffer, which is used to synthesize a leaf with the
decomposition. This significantly reduces the size of the utf8data[]
array.
Changes made by Gabriel:
Rebase to mainline
Fix checkpatch errors
Extract robustness fixes and merge back to original mkutf8data.c patch
Regenerate utf8data.h
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Supporting functions for UTF-8 normalization are in utf8norm.c with the
header utf8norm.h. Two normalization forms are supported: nfdi and
nfdicf.
nfdi:
- Apply unicode normalization form NFD.
- Remove any Default_Ignorable_Code_Point.
nfdicf:
- Apply unicode normalization form NFD.
- Remove any Default_Ignorable_Code_Point.
- Apply a full casefold (C + F).
For the purposes of the code, a string is valid UTF-8 if:
- The values encoded are 0x1..0x10FFFF.
- The surrogate codepoints 0xD800..0xDFFFF are not encoded.
- The shortest possible encoding is used for all values.
The supporting functions work on null-terminated strings (utf8 prefix)
and on length-limited strings (utf8n prefix).
From the original SGI patch and for conformity with coding standards,
the utf8data_t typedef was dropped, since it was just masking the struct
keyword. On other occasions, namely utf8leaf_t and utf8trie_t, I
decided to keep it, since they are simple pointers to memory buffers,
and using uchars here wouldn't provide any more meaningful information.
From the original submission, we also converted from the compatibility
form to canonical.
Changes made by Gabriel:
Rebase to Mainline
Fix up checkpatch.pl warnings
Drop typedefs
move out of libxfs
Convert from NFKD to NFD
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The decomposition and casefolding of UTF-8 characters are described in a
prefix tree in utf8data.h, which is a generate from the Unicode
Character Database (UCD), published by the Unicode Consortium, and
should not be edited by hand. The structures in utf8data.h are meant to
be used for lookup operations by the unicode subsystem, when decoding a
utf-8 string.
mkutf8data.c is the source for a program that generates utf8data.h. It
was written by Olaf Weber from SGI and originally proposed to be merged
into Linux in 2014. The original proposal performed the compatibility
decomposition, NFKD, but the current version was modified by me to do
canonical decomposition, NFD, as suggested by the community. The
changes from the original submission are:
* Rebase to mainline.
* Fix out-of-tree-build.
* Update makefile to build 11.0.0 ucd files.
* drop references to xfs.
* Convert NFKD to NFD.
* Merge back robustness fixes from original patch. Requested by
Dave Chinner.
The original submission is archived at:
<https://linux-xfs.oss.sgi.narkive.com/Xx10wjVY/rfc-unicode-utf-8-support-for-xfs>
The utf8data.h file can be regenerated using the instructions in
fs/unicode/README.utf8data.
- Notes on the update from 8.0.0 to 11.0:
The structure of the ucd files and special cases have not experienced
any changes between versions 8.0.0 and 11.0.0. 8.0.0 saw the addition
of Cherokee LC characters, which is an interesting case for
case-folding. The update is accompanied by new tests on the test_ucd
module to catch specific cases. No changes to mkutf8data script were
required for the updates.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>