The only reason we needed this alignment was for the lookahead buffer.
Now that the lookahead buffer is relaxed to operate on bytes, we can
relax our malloc alignment requirement all the way down to the byte
level, since we mainly use lfs_malloc to allocate byte-level buffers.
This does introduce a risk that we might need word-level mallocs in the
future. If that happens we will need to decide if changing the malloc
alignment is a breaking change, or gate alignment requirements behind
user provided defines.
Found by HiFiPhile.
Now you can override littlefs's CRC implementation with some simple
defines:
-DLFS_CRC=lfs_crc
The motivation for this is the same for LFS_MALLOC/LFS_FREE. I think
these are the main "system-level" utils that users want to override.
Don't override with this something that's not CRC32! Your filesystem
will no longer be compatible with other tools! This is only intended for
provided hardware acceleration!
Now you can override littlefs's malloc with some simple defines:
-DLFS_MALLOC=my_malloc
-DLFS_FREE=my_free
This is probably what most users expected when wanting to override
malloc/free in littlefs, but it hasn't been available, since instead
littlefs provides a file-level override of builtin utils.
The thinking was that there's just too many builtins that could be
overriden, lfs_max/min/alignup/npw2/etc/etc/etc, so allowing users to
just override the util file provides the best flexibility without a ton
of ifdefs.
But it's become clear this is awkward for users that just want to
replace malloc.
Maybe the original goal was too optimistic, maybe there's a better way
to structure this file, or maybe the best API is just a bunch of ifdefs,
I have no idea! This will hopefully continue to evolve.
Likely included at some point for ssize_t, this is no longer needed and
causes some problems for embedded compilers.
Currently littlefs doesn't even use size_t/ssize_t in its definition of
lfs_size_t/lfs_ssize_t, so I don't think this will ever be required.
Found by LDong-Arm, vvn-git
The logic for endiannes conversion was wrong when LFS_NO_INTRINSICS was
set, since on endinanes match a check of that macro would prevent the
unchanged value from being returned.
Signed-off-by: Carles Cufi <carles.cufi@nordicsemi.no>
The main change here from the previous test framework design is:
1. Powerloss testing remains in-process, speeding up testing.
2. The state of a test, included all powerlosses, is encoded in the
test id + leb16 encoded powerloss string. This means exhaustive
testing can be run in CI, but then easily reproduced locally with
full debugger support.
For example:
./scripts/test.py test_dirs#reentrant_many_dir#10#1248g1g2 --gdb
Will run the test test_dir, case reentrant_many_dir, permutation #10,
with powerlosses at 1, 2, 4, 8, 16, and 32 cycles. Dropping into gdb
if an assert fails.
The changes to the block-device are a work-in-progress for a
lazily-allocated/copy-on-write block device that I'm hoping will keep
exhaustive testing relatively low-cost.
This is useful for testing the new erroring assert behavior in CI.
Asserts do not error by default, so this macro needs to be overriden.
It is possible to test this behavior using the existing option of
overriding lfs_util.h with a custom file, by using a small sed
one-line script. But this is much simpler.
This does raise the question if more of the configuration options in
lfs_util.h should be opened up for function-like macro overrides.
__VA_ARGS__ are frustrating in C. Even for their main purpose (printf),
they fall short in that they don't have a _portable_ way to have zero
arguments after the format string in a printf call.
Even if we detect compilers and use ##__VA_ARGS__ where available, GCC
emits a warning with -pedantic that is _impossible_ to explicitly
disable.
This commit contains the best solution we can think of. A bit of
indirection that adds a hidden "%s" % "" to the end of the format
string. This solution does not work everywhere as it has a runtime
cost, but it is hopefully ok for debug statements.
This is the start of reworking littlefs's testing framework based on
lessons learned from the initial testing framework.
1. The testing framework needs to be _flexible_. It was hacky, which by
itself isn't a downside, but it wasn't _flexible_. This limited what
could be done with the tests and there ended up being many
workarounds just to reproduce bugs.
The idea behind this revamped framework is to separate the
description of tests (tests/test_dirs.toml) and the running of tests
(scripts/test.py).
Now, with the logic moved entirely to python, it's possible to run
the test under varying environments. In addition to the "just don't
assert" run, I'm also looking to run the tests in valgrind for memory
checking, and an environment with simulated power-loss.
The test description can also contain abstract attributes that help
control how tests can be ran, such as "leaky" to identify tests where
memory leaks are expected. This keeps test limitations at a minimum
without limiting how the tests can be ran.
2. Multi-stage-process tests didn't really add value and limited what
the testing environment.
Unmounting + mounting can be done in a single process to test the
same logic. It would be really difficult to make this fail only
when memory is zeroed, though that can still be caught by
power-resilient tests.
Requiring every test to be a single process adds several options
for test execution, such as using a RAM-backed block device for
speed, or even running the tests on a device.
3. Added fancy assert interception. This wasn't really a requirement,
but something I've been wanting to experiment with for a while.
During testing, scripts/explode_asserts.py is added to the build
process. This is a custom C-preprocessor that parses out assert
statements and replaces them with _very_ verbose asserts that
wouldn't normally be possible with just C macros.
It even goes as far as to report the arguments to strcmp, since the
lack of visibility here was very annoying.
tests_/test_dirs.toml:186:assert: assert failed with "..", expected eq "..."
assert(strcmp(info.name, "...") == 0);
One downside is that simply parsing C in python is slower than the
entire rest of the compilation, but fortunately this can be
alleviated by parallelizing the test builds through make.
Other neat bits:
- All generated files are a suffix of the test description, this helps
cleanup and means it's (theoretically) possible to parallelize the
tests.
- The generated test.c is shoved base64 into an ad-hoc Makefile, this
means it doesn't force a rebuild of tests all the time.
- Test parameterizing is now easier.
- Hopefully this framework can be repurposed also for benchmarks in the
future.
To use, compile and run with LFS_YES_TRACE defined:
make CFLAGS+=-DLFS_YES_TRACE=1 test_format
The name LFS_YES_TRACE was chosen to match the LFS_NO_DEBUG and
LFS_NO_WARN defines for the similar levels of output. The YES is
necessary to avoid a conflict with the actual LFS_TRACE macro that
gets emitting. LFS_TRACE can also be defined directly to provide
a custom trace formatter.
Hopefully having trace statements at the littlefs C API helps
debugging and reproducing issues.
In v2, the lookahead_buffer was changed from requiring 4-byte alignment
to requiring 8-byte alignment. This was not documented as well as it
could be, and as FabianInostroza noted, this also implies that
lfs_malloc must provide 8-byte alignment.
To protect against this, I've also added an assert on the alignment of
both the lookahead_size and lookahead_buffer.
found by FabianInostroza and amitv87
The main difference here is a change from encoding "hasorphans" and
"hasmove" bits in the tag itself. This worked with the old format, but
in the new format the space these bits take up must be consistent for
each tag type. The tradeoff is that the new tag format allows for up to
256 different global states which may be useful in the future (for
example, a global free list).
The new format encodes this info in the data blob, using an additional
word of storage. This word is actually formatted the same as though it
was a tag, which simplified internal handling and may allow other tag
types in the future.
Format for global state:
[---- 96 bits ----]
[1|- 11 -|- 10 -|- 10 -|--- 64 ---]
^ ^ ^ ^ ^- move dir pair
| | | \-------------------------- unused, must be 0s
| | \--------------------------------- move id
| \---------------------------------------- type, 0xfff for move
\--------------------------------------------- has orphans
This also included another iteration over globals (renamed to gstate)
with some simplifications to how globals are handled.
There was an interesting subtlety with the existing layout of tags that
could become a problem in the future. Basically, littlefs avoids writing to
any region of storage it is not absolutely sure has been erased
beforehand. This is a part of limiting the number of assumptions about
storage. It's possible a storage technology can't support writes without
erases in a way that is undetectable at write time (Maybe changing a bit
without an erase decreases the longevity of the information stored on
the bit).
But the existing layout had a very tiny corner case where this wasn't
true. Consider the location of the valid bit in the tag struct:
[1|--- 31 ---]
^--- valid bit
The responsibility of this bit is to indicate if an attempt has been
made to write the following commit. If it is not set (the specific value
is dependent on a previous read and identified by the preceeding commit),
the assumption is that it is safe to write to the next region because it
has been erased previously. If it is set, we check if the next commit is
valid, if it isn't (because of CRC failure, likely due to power-loss), we
discard the commit. But because an attempt has been made to write to
that storage, we must then do a compaction to move to the other block in
the metadata-pair.
This plan looks good on paper, but what does it look like on storage?
The problem is that words in littlefs are in little-endian. So on
storage the tag actually looks like this:
[- 8 -|- 8 -|- 8 -|1|- 7 -]
^-- valid bit
This means that we don't actually set the valid bit before writing the
tag! We write the lower bytes first. If we lose power, we may have
written 3 bytes without this fact being detectable.
We could restructure the tag structure to store the valid bit lower,
however because none of the fields are 7 bits, this would make the
extraction more costly, and we then lose the ability to check this
valid bit with a sign comparison.
The simple solution is to just store the tag in big-endian. A small
benefit is that this will actually have a negative code cost on
big-endian machines.
This mixture of endiannesses is frustrating, however it is a pragmatic
solution with only a 20-byte code size cost.
Found while testing big-endian support. Basically, if littlefs is really
really unlucky, the block allocator could kick in while committing a
file's CTZ reference. If this happens, the block allocator will need to
traverse all CTZ skip-lists in memory, including the skip-list we're
committing. This means we can't convert the CTZ's endianness in place,
and need to make a copy on big-endian systems.
We rely on dead-code elimination from the compiler to make the
conditional behaviour for big-endian vs little-endian system a noop
determined by the lfs_tole32 intrinsic.
In looking at the common CRC APIs out there, this seemed the most
common. At least more common than the current modified-in-place pointer
API. It also seems to have a slightly better code footprint. I'm blaming
pointer optimization issues.
One downside is that lfs_crc can't report errors, however it was already
assumed that lfs_crc can not error.
The introduction of an explicit cache_size configuration allows
customization of the cache buffers independently from the hardware
read/write sizes.
This has been one of littlefs's main handicaps. Without a distinction
between cache units and hardware limitations, littlefs isn't able to
read or program _less_ than the cache size. This leads to the
counter-intuitive case where larger cache sizes can actually be harmful,
since larger read/prog sizes require sending more data over the bus if
we're only accessing a small set of data (for example the CTZ skip-list
traversal).
This is compounded with metadata logging, since a large program size
limits the number of commits we can write out in a single metadata
block. It really doesn't make sense to link program size + cache
size here.
With a separate cache_size configuration, we can be much smarter about
what we actually read/write from disk.
This also simplifies cache handling a bit. Before there were two
possible cache sizes, but these were rarely used. Note that the
cache_size is NOT written to the superblock and can be freely changed
without breaking backwards compatibility.
This is a big change stemming from the fact that resizable entries
were surprisingly complicated to implement and came in with a sizable
code cost.
The theory is that the journalling has a comparable cost to resizable
entries. Both need to handle overflowing blocks, and managing offsets is
comparable to managing attribute IDs. But by jumping all the way to full
journaling, we can statically wear-level the metadata written to
metadata pairs.
The idea of journaling littlefs's metadata has been mentioned several times in
discussions and fits well into how littlefs works. You could even view the
existing metadata log as a log of size 2.
The downside of this approach is that changing the metadata in this way
would break compatibility from the existing layout on disk. Something
that resizable entries does not do.
That being said, adopting journaling at the metadata layer offers a big
improvement to littlefs's performance and wear-leveling, with very
little cost (maybe even none or negative after resizable entries?).
- Fixed shadowed variable warnings in lfs_dir_find.
- Fixed unused parameter warnings when LFS_NO_MALLOC is enabled.
- Added extra warning flags to CFLAGS.
- Updated tests so they don't shadow the "size" variable for -Wshadow
Suggested by sn00pster, LFS_CONFIG is an opt-in user provided
configuration file that will override the util implementation in
lfs_util.h. This is useful for allowing system-specific overrides
without needing to rely on git merges or other forms of patching
for updates.
Note: It's still expected to modify lfs_utils.h when porting littlefs
to a new target/system. There's just too much room for system-specific
improvements, such as taking advantage of CRC hardware.
Rather, encouraging modification of lfs_util.h and making it easy to
modify and debug should result in better integration with the consuming
systems.
This just adds a bunch of quality-of-life improvements that should help
development and integration in littlefs.
- Macros that require no side-effects are all-caps
- System includes are only brought in when needed
- Malloc/free wrappers
- LFS_NO_* checks for quickly disabling things at the command line
- At least a little-bit more docs
Required to support big-endian processors, with the most notable being
the PowerPC architecture.
On little-endian architectures, these conversions can be optimized out
and have no code impact.
Initial patch provided by gmouchard
This helps significantly with supporting different compilers. Intrinsics for
different compilers can be added as they are found.
Note that for ARMCC, __builtin_ctz is not used. This was the result of a
strange issue where ARMCC only emits __builtin_ctz when passed the
--gnu flag, but __builtin_clz and __builtin_popcount are always emitted.
This isn't a big problem since the ARM instruction set doesn't have a
ctz instruction, and the npw2 based implementation is one of the most
efficient.
Also note that for littefs's purposes, we consider ctz(0) to be
undefined. This lets us save a branch in the software lfs_ctz
implementation.
This reduces the O(n^2logn) runtime to read a file to only O(nlog).
The extra O(n) did not touch the disk, so it isn't a problem until the
files become very large, but this solution comes with very little cost.
Long story short, you can find the block index + offset pair for a
CTZ linked-list with this series of formulas:
n' = floor(N / (B - 2w/8))
N' = (B - 2w/8)n' + (w/8)popcount(n')
off' = N - N'
n, off =
n'-1, off'+B if off' < 0
n', off'+(w/8)(ctz(n')+1) if off' >= 0
For the long story, you will need to see the updated DESIGN.md
Before, the littlefs relied on the underlying block device
to report corruption that occurs when writing data to disk.
This requirement is easy to miss or implement incorrectly, since
the error detection is only required when a block becomes corrupted,
which is very unlikely to happen until late in the block device's
lifetime.
The littlefs can detect corruption itself by reading back written data.
This requires a bit of care to reuse the available buffers, and may rely
on checksums to avoid additional RAM requirements.
This does have a runtime penalty with the extra read operations, but
should make the littlefs much more robust to different implementations.
Adopted buffer followed by size. The other order was original
chosen due to some other functions with a more complicated
parameter list.
This convention is important, as the bd api is one of the main
apis facing porting efforts.
This adds caching of the most recent read/program blocks, allowing
support of devices that don't have byte-level read+writes, along
with reduced device access on devices that do support byte-level
read+writes.
Note: The current implementation is a bit eager to drop caches where
it simplifies the cache layer. This layer is already complex enough.
Note: It may be worthwhile to add a compile switch for caching to
reduce code size, note sure.
Note: This does add a dependency on malloc, which could have a porting
layer, but I'm just using the functions from stdlib for now. These can be
overwritten with noops if the user controls the system, and keeps things
simple for now.
After quite a bit of prototyping, settled on the following functions:
- lfs_dir_alloc - create a new dir
- lfs_dir_fetch - load and check a dir pair from disk
- lfs_dir_commit - save a dir pair to disk
- lfs_dir_shift - shrink a dir pair to disk
- lfs_dir_append - add a dir entry, creating dirs if needed
- lfs_dir_remove - remove a dir entry, dropping dirs if needed
Additionally, followed through with a few other tweaks