Inlined files live in metadata and decrease storage requirements, but
may be limited to improve metadata-related performance. This is
especially important given the metadata performance problems littlefs
currently suffers from. That said, decreasing inline_max may make metadata
more dense and increase block usage, so it's important to benchmark if
optimizing for speed.
The underlying limits of inlined files haven't changed:
1. Inlined files need to fit in RAM, so <= cache_size
2. Inlined files need to fit in a single attr, so <= attr_max
3. Inlined files need to fit in 1/8 of a block to avoid metadata
   overflow issues. This is applied after limiting by metadata_max,
   so <= min(metadata_max, block_size)/8
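As a rough sketch of how the three limits combine, the largest usable
inline_max is the minimum of all three. The helper name inline_max_limit
and its explicit parameters are hypothetical and only mirror the
configuration names used above:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of littlefs's API: returns the largest
// inline_max permitted by the three limits listed above.
static uint32_t inline_max_limit(uint32_t cache_size, uint32_t attr_max,
        uint32_t metadata_max, uint32_t block_size) {
    assert(block_size > 0);
    // limit 3: 1/8 of a block, after limiting by metadata_max
    uint32_t limit =
            (metadata_max < block_size ? metadata_max : block_size) / 8;
    // limit 1: inlined files need to fit in RAM
    if (cache_size < limit) {
        limit = cache_size;
    }
    // limit 2: inlined files need to fit in a single attr
    if (attr_max < limit) {
        limit = attr_max;
    }
    return limit;
}
```

With a 4096-byte block and a 64-byte cache, the cache is the binding
limit. With a larger cache, the 1/8-block rule takes over.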
By default, the largest possible inline_max is used. This preserves
backwards compatibility and is probably a good default for most use
cases.
This does have the awkward effect of requiring inline_max=-1 to
indicate disabled inlined files, but I don't think there's a good
way around this.
This extends lfs_fs_gc to now handle three things:
1. Calls mkconsistent if not already consistent
2. Compacts metadata > compact_thresh
3. Populates the block allocator
Which should be all of the janitorial work that can be done without
additional on-disk data structures.
Normally, metadata compaction occurs when an mdir is full, and results in
mdirs that are at most block_size/2.
Now, if you call lfs_fs_gc, littlefs will eagerly compact any mdirs that
exceed the compact_thresh configuration option. Because the resulting
mdirs are at most block_size/2, it only makes sense for compact_thresh to
be >= block_size/2 and <= block_size.
Additionally, there are some special values:
- compact_thresh=0 => defaults to ~88% block_size, may change
- compact_thresh=-1 => disables metadata compaction during lfs_fs_gc
Note that compact_thresh only affects lfs_fs_gc. Normal compactions
still only occur when full.
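The threshold rules can be sketched roughly as follows. The helper name
gc_should_compact is hypothetical, and the default of block_size minus
block_size/8 (87.5%) is only an assumption standing in for the "~88%"
mentioned above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: decide whether gc-time eager compaction should
// trigger for an mdir currently using mdir_size bytes.
static bool gc_should_compact(int32_t compact_thresh, uint32_t block_size,
        uint32_t mdir_size) {
    assert(block_size > 0);
    // compact_thresh=-1 disables metadata compaction during gc
    if (compact_thresh == -1) {
        return false;
    }
    uint32_t thresh;
    if (compact_thresh == 0) {
        // assumed default: 7/8 of a block, roughly 88%
        thresh = block_size - block_size/8;
    } else {
        thresh = (uint32_t)compact_thresh;
    }
    // only mdirs strictly larger than the threshold are compacted
    return mdir_size > thresh;
}
```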
In separating the configuration of littlefs from the physical geometry
of the underlying device, we can no longer rely solely on lfs_config to
contain all of the information necessary for the simulated block devices
we use for testing.
This adds a new lfs_*bd_config struct for each of the block devices, and
new erase_size/erase_count fields. The erase_* name was chosen since
these reflect the (simulated) physical erase size and count of
erase-sized blocks, unlike the block_* variants which represent logical
block sizes used for littlefs's bookkeeping.
It may be worth adopting erase_size/erase_count in littlefs's config at
some point in the future, but at the moment it doesn't seem necessary.
Changing the lfs_bd_config structs to be required is probably a good
idea anyways, as it moves us more towards separating the bds from
littlefs. Though we can't quite get rid of the lfs_config parameter
because of the block-device API in lfs_config. Eventually it would be
nice to get rid of it, but that would require API breakage.
The intention is to help interop with older minor versions of littlefs.
Unfortunately, since lfs2.0 drivers cannot mount lfs2.1 images, there are
situations where it would be useful to write strictly lfs2.0
compatible images. The solution here adds a "disk_version" configuration
option which determines the behavior of lfs2.1 dependent features.
Normally you would expect this to only change write behavior. But since the
main change in lfs2.1 increased validation of erased data, we also need to
skip this extra validation (fcrc) or see terrible slowdowns when writing.
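A minimal sketch of the gating this implies. The helper name
disk_uses_fcrc and the version encoding (major in the upper 16 bits,
minor in the lower 16 bits) are assumptions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed encoding: major version in the upper 16 bits, minor in the lower
#define DISK_VERSION_2_0 0x00020000u
#define DISK_VERSION_2_1 0x00020001u

// Hypothetical helper: should fcrc be written and validated for an image
// targeting the given disk_version?
static bool disk_uses_fcrc(uint32_t disk_version) {
    // this sketch only understands major version 2
    assert((disk_version >> 16) == 2);
    // fcrc is an lfs2.1 feature, so skip it entirely for lfs2.0 images
    return (disk_version & 0xffff) >= 1;
}
```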
When you add a function to every benchmark suite, you know it should
probably be provided by the benchmark runner itself. That being said,
randomness in tests/benchmarks is a bit tricky because it needs to be
strictly controlled and reproducible.
No global state is used, allowing tests/benches to maintain multiple
randomness streams, which can be useful for checking results during a run.
There's an argument for having global prng state in that the prng could
be preserved across power-loss, but I have yet to see a use for this,
and it would add a significant requirement to any future test/bench runner.
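To illustrate what "no global state" looks like in practice, here is a
sketch of a prng where all state lives in a caller-owned variable. The
choice of xorshift32 and the name prng_next are assumptions, not
necessarily what the runner provides:

```c
#include <assert.h>
#include <stdint.h>

// One xorshift32 step. All state lives in the caller-owned *state, so a
// test can keep several independent, reproducible streams at once.
static uint32_t prng_next(uint32_t *state) {
    assert(*state != 0); // xorshift32 gets stuck at zero
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

A test can then write data using one stream and later replay a second
stream from the same seed to check what it reads back.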
- Fixed prettyasserts.py parsing when '->' is in expr
- Made prettyasserts.py failures not crash (yay dynamic typing)
- Fixed the initial state of the emubd disk file to match the internal
state in RAM
- Fixed true/false getting changed to True/False in test.py/bench.py
defines
- Fixed accidental substring matching in plot.py's --by comparison
- Fixed a missed LFS_BLOCk_CYCLES in test_superblocks.toml
- Changed test.py/bench.py -v to only show commands being run
Including the test output is still possible with test.py -v -O-, making
the implicit inclusion redundant and noisy.
- Added license comments to bench_runner/test_runner
These are really just different flavors of test.py and test_runner.c
without support for power-loss testing, but with support for measuring
the cumulative number of bytes read, programmed, and erased.
Note that the existing define parameterization should work perfectly
fine for running benchmarks across various dimensions:
  ./scripts/bench.py \
      runners/bench_runner \
      bench_file_read \
      -gnor \
      -DSIZE='range(0,131072,1024)'
Also added a couple basic benchmarks as a starting point.
The main benefit is small test ids everywhere, though this is with the
downside of needing longer names to properly prefix and avoid
collisions. But this fits into the rest of the scripts with globally
unique names a bit better. This is a C project after all.
The other small benefit is test generators may have an easier time since
per-case symbols can expect to be unique.
This is really more work for the bench runner. With this change defines
can be manipulated at a rather high level at runtime. Which should be
useful for generating benchmarks across various dimensions.
The define grammar in the test_runner is now a bit more powerful,
accepting:
1. A single value: -DN=42
2. A list of values, which get permuted: -DN=1,2,3
3. A range: -DN=range(10)
4. Some combo: -DN=1,2,range(3,0,-1)
This is more complex in the test .toml defines, which can also be C
expressions:
1. A single value: define=42
2. A single expression: define='42*42'
3. A list: define=[1,2,3]
4. A comma separated string: define='1,2,3'
5. A range: define='42*range(10)'
6. This mess: define=[1,2,'3,4,range(2)*range(2)+3']
This is probably how the test runner should have been implemented in the
first place, but it took a few tries to get here.
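For the command-line form, the grammar can be sketched as a small parser.
This is only an illustration: the name define_parse is hypothetical, it
handles plain integers only (no C expressions), and it assumes range()
follows Python-like start/stop/step semantics:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical parser for comma-separated integers and
// range(stop) / range(start,stop[,step]).
// Writes up to cap values into out, returns the count or -1 on error.
static int define_parse(const char *s, long *out, int cap) {
    assert(s && out);
    int n = 0;
    while (*s) {
        char *end;
        if (strncmp(s, "range(", 6) == 0) {
            long args[3];
            int argc = 0;
            s += 6;
            for (;;) {
                if (argc >= 3) return -1;
                args[argc] = strtol(s, &end, 0);
                if (end == s) return -1;
                argc++;
                s = end;
                if (*s == ',') { s++; continue; }
                if (*s == ')') { s++; break; }
                return -1;
            }
            long start = (argc >= 2) ? args[0] : 0;
            long stop = (argc >= 2) ? args[1] : args[0];
            long step = (argc == 3) ? args[2] : 1;
            if (step == 0) return -1;
            // expand the range, stop is exclusive
            for (long i = start; step > 0 ? i < stop : i > stop; i += step) {
                if (n >= cap) return -1;
                out[n++] = i;
            }
        } else {
            long v = strtol(s, &end, 0);
            if (end == s || n >= cap) return -1;
            out[n++] = v;
            s = end;
        }
        // values are separated by commas
        if (*s == ',') s++;
        else if (*s != '\0') return -1;
    }
    return n;
}
```

Under these assumptions -DN=1,2,range(3,0,-1) expands to the five values
1, 2, 3, 2, 1, each of which becomes its own permutation.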
This makes it so the test identifier, which is a bit longer now, fully
encodes the state of the defines in the test. This removes the need for
the extra geometry field and allows reproduction of tests with custom
defines at runtime.
The test runner may have already seemed like a solved problem, but these
changes are really to enable repurposing the test runner as a bench
runner.
Previously didn't think this would work without making test.py aware of
the number of implicit defines, which risks being incredibly fragile.
Fortunately it turns out we can defer the actual array size calculation
until the C preprocessor. This simplifies a few things.
Also added a bitmap-based caching layer for the defines. Since the test
defines have been upgraded to callbacks, recursive defines risk spending
a decent amount of time evaluating on every lookup. Some quick testing
shows 408015154 hits to 46160 misses, so that's a good sign.
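One way such a cache could look, as a sketch only. The struct layout, the
fixed DEFINE_COUNT, and the names define_lookup/define_flush are all
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEFINE_COUNT 64 // hypothetical upper bound on defines

typedef uintmax_t (*define_cb_t)(void);

typedef struct define_cache {
    define_cb_t cbs[DEFINE_COUNT];        // callback per define
    uintmax_t values[DEFINE_COUNT];       // cached results
    uint32_t valid[(DEFINE_COUNT+31)/32]; // one valid bit per define
    unsigned long hits, misses;
} define_cache_t;

// Look up define i, evaluating its callback only on a cache miss
static uintmax_t define_lookup(define_cache_t *c, unsigned i) {
    assert(i < DEFINE_COUNT && c->cbs[i]);
    if (c->valid[i/32] & (UINT32_C(1) << (i%32))) {
        c->hits++;
        return c->values[i];
    }
    c->misses++;
    c->values[i] = c->cbs[i]();
    c->valid[i/32] |= UINT32_C(1) << (i%32);
    return c->values[i];
}

// Invalidate everything, e.g. when moving to the next permutation
static void define_flush(define_cache_t *c) {
    for (unsigned j = 0; j < sizeof(c->valid)/sizeof(c->valid[0]); j++) {
        c->valid[j] = 0;
    }
}

// Example callback standing in for a generated define
static uintmax_t example_block_size(void) { return 4096; }
```

Flushing only needs to clear the bitmap, which keeps the cost of moving
between permutations small compared to re-evaluating every define.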
Also changed the geometries to be their own leb16-encoded part of the
test identifier. This means any geometry can be captured and reproduced
with just the test identifier. Here are the current test geometries:
  ./runners/test_runner --list-geometries
  geometry   read  prog  erase  count     size  leb16
  d,default    16    16    512   2048  1048576  g1gg2
  e,eeprom      1     1    512   2048  1048576  1gg2
  E,emmc      512   512    512   2048  1048576  gg2
  n,nor         1     1   4096    256  1048576  1ggg1
  N,nand     4096  4096  32768     32  1048576  ggg1ggg8
This mostly involved futzing around with some of the less intuitive
parts of Unix's named-pipes behavior.
This is a bit important since the tests can quickly generate several
gigabytes of trace output.
The main changes here from the previous test framework design are:
1. Powerloss testing remains in-process, speeding up testing.
2. The state of a test, including all powerlosses, is encoded in the
test id + leb16 encoded powerloss string. This means exhaustive
testing can be run in CI, but then easily reproduced locally with
full debugger support.
For example:
  ./scripts/test.py test_dirs#reentrant_many_dir#10#1248g1g2 --gdb
This will run the test test_dirs, case reentrant_many_dir, permutation #10,
with powerlosses at 1, 2, 4, 8, 16, and 32 cycles, dropping into gdb
if an assert fails.
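The leb16 strings in these identifiers can be decoded with a few lines of
C. The rule used here is inferred from the examples in this log: each
character is a little-endian nibble, 0-9 and a-f end a value, and g-v
carry the same nibble plus a continuation bit. The name leb16_decode is
hypothetical and the runner's own decoder may accept more than this
sketch does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Decode one leb16 value starting at *s, advancing *s past it.
// Returns 0 and leaves *s untouched on invalid input.
static uintmax_t leb16_decode(const char **s) {
    assert(s && *s);
    const char *p = *s;
    uintmax_t x = 0;
    for (unsigned shift = 0;; shift += 4) {
        // too many continuation nibbles to fit in uintmax_t
        if (shift >= 8*sizeof(uintmax_t)) return 0;
        unsigned d;
        if (*p >= '0' && *p <= '9') d = (unsigned)(*p - '0');
        else if (*p >= 'a' && *p <= 'v') d = (unsigned)(*p - 'a') + 10;
        else return 0;
        p++;
        // low 4 bits are data, bit 4 marks "more nibbles follow"
        x |= (uintmax_t)(d & 0xf) << shift;
        if (!(d & 0x10)) break;
    }
    *s = p;
    return x;
}
```

Calling this repeatedly on "1248g1g2" yields 1, 2, 4, 8, 16, 32, which
matches the powerloss cycles described above.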
The changes to the block-device are a work-in-progress for a
lazily-allocated/copy-on-write block device that I'm hoping will keep
exhaustive testing relatively low-cost.
This simplifies the interaction between code generation and the
test-runner.
In theory it also reduces compilation dependencies, but internal tests
make this difficult.
This mostly required names for each test case, declarations of
previously-implicit variables since the new test framework is more
conservative with what it declares (the small extra effort to add
declarations is well worth the simplicity and improved readability),
and tweaks to work with not-really-constant defines.
Also renamed test_ -> test, replacing the old ./scripts/test.py.
Unfortunately git seems to have had a hard time with this.
- Added --exec for wrapping the test-runner with external commands, such as
Qemu or Valgrind.
- Added --valgrind, which just aliases --exec=valgrind with a few extra
flags useful during testing.
- Dropped the "valgrind" type for tests. These aren't separate tests
that run in the test-runner, and I don't see a need for disabling
Valgrind for any tests. This can be added back later if needed.
- Readded support for dropping directly into gdb after a test failure,
either at the assert failure, entry point of test case, or entry point
of the test runner with --gdb, --gdb-case, or --gdb-main.
- Added --isolate for running each test permutation in its own process.
This is required for associating Valgrind errors with the right test
case.
- Fixed an issue where explicit test identifiers conflicted with
per-stage test identifiers generated as a part of --by-suite and
--by-case.
Previously test defines were implemented using layers of index-mapped
uintmax_t arrays. This worked well for lookup, but limited defines to
constants computed at compile-time. Since test defines themselves are
actually calculated at _run-time_ (yeah, they have deviated quite
a bit from the original, compile-time evaluated defines, which makes
the name make less sense), this means defines can't depend on other
defines. Which was limiting since a lot of test defines relied on
defines generated from the geometry being tested.
This new implementation uses callbacks for the per-case defines. This
means they can easily contain full C statements, which can depend on
other test defines. This does mean you can create infinitely-recursive
defines, but the test-runner will just break at run-time, so don't do that.
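A small sketch of what callback-based defines make possible. The define
names, the define_get accessor, and the table layout are hypothetical.
The point is only that one define's callback can call back into the
lookup for other defines:

```c
#include <assert.h>
#include <stdint.h>

typedef uintmax_t (*define_cb_t)(void);

// Hypothetical define indices
enum { DEF_BLOCK_SIZE, DEF_BLOCK_COUNT, DEF_DISK_SIZE, DEF_COUNT };

static uintmax_t define_get(int i);

// Plain constants, as before
static uintmax_t cb_block_size(void) { return 4096; }
static uintmax_t cb_block_count(void) { return 256; }
// A define that depends on other defines, evaluated at run-time
static uintmax_t cb_disk_size(void) {
    return define_get(DEF_BLOCK_SIZE) * define_get(DEF_BLOCK_COUNT);
}

static const define_cb_t define_cbs[DEF_COUNT] = {
    [DEF_BLOCK_SIZE] = cb_block_size,
    [DEF_BLOCK_COUNT] = cb_block_count,
    [DEF_DISK_SIZE] = cb_disk_size,
};

// Every lookup goes through the callback table
static uintmax_t define_get(int i) {
    assert(i >= 0 && i < DEF_COUNT && define_cbs[i]);
    return define_cbs[i]();
}
```

A callback that ends up looking up its own define would recurse forever,
which is the failure mode mentioned above.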
One concern is that there might be a performance hit for evaluating all
defines through callbacks, but if there is it is well below the noise
floor:
- constants: 43.55s
- callbacks: 42.05s
- Added internal tests, which can run tests inside other source files,
allowing access to "private" functions and data
Note this required a special bit of handling for defining and later
undefining test configurations to not pollute the namespace of the
source file, since it can end up with test cases from different
suites/configuration namespaces.
- Removed unnecessary/unused permutation argument to generated test
functions.
- Some cleanup to progress output of test.py.
In the test-runner, defines are parameterized constants (limited
to integers) that are generated from the test suite tomls resulting
in many permutations of each test.
In order to make this efficient, these defines are implemented as
multi-layered lookup tables, using per-layer/per-scope indirect
mappings. This lets the test-runner and test suites define their
own defines with compile-time indexes independently. It also makes
building of the lookup tables very efficient, since they can be
incrementally populated as we expand the test permutations.
The four current define layers and when we need to build them:
  layer                    defines       predefine_map  define_map
  user-provided overrides  per-run       per-run        per-suite
  per-permutation defines  per-perm      per-case       per-perm
  per-geometry defines     per-perm      compile-time   -
  default defines          compile-time  compile-time   -
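One way to picture the layered lookup, as a sketch under assumptions. The
struct, the DEFINE_NONE sentinel, and the function name are hypothetical.
Each layer keeps a densely packed value array plus an indirect map from
the shared define index to a slot in that array, and lookups fall through
from the highest-priority layer down:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFINE_NONE 0xff // hypothetical "not defined in this layer" marker

typedef struct define_layer {
    const uint8_t *map;      // shared define index -> local slot
    const uintmax_t *values; // densely packed per-layer values
} define_layer_t;

// Search layers in priority order, the first layer that maps i wins
static uintmax_t define_lookup(const define_layer_t *layers,
        size_t layer_count, size_t i) {
    for (size_t l = 0; l < layer_count; l++) {
        if (layers[l].map && layers[l].map[i] != DEFINE_NONE) {
            return layers[l].values[layers[l].map[i]];
        }
    }
    assert(!"define not found in any layer");
    return 0;
}
```

Because each layer only stores the defines it actually provides, an
override layer with a single value stays tiny, and only its map needs
rebuilding when the set of overridden defines changes.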
- Added filtering based on suite, case, perm, type, geometry
- Added --skip, --count, and --every (will be used for parallelism)
- Implemented --list-defines
- Better helptext for flags with arguments
- Other minor tweaks
- Indirect index map instead of bitmap+sparse array
- test_define_t and test_type_t
- Added back conditional filtering
- Added suite-level defines and filtering
This moves defines entirely into the runtime of the test_runner,
simplifying things and reducing the amount of generated code that needs
to be built, at the cost of limiting test-defines to uintmax_t types.
This is implemented using a set of index-based scopes (created by
test.py) that allow different layers to override defines from other
layers, accessible through the global `test_define` function.
layers:
1. command-line overrides
2. per-case defines
3. per-geometry defines
This is to try a different design for testing. The goals are to make the
test infrastructure a bit simpler, with clear stages for building and
running, and faster, by avoiding rebuilding lfs.c n-times.