- Write on read-only file to return LFS_ERR_BADF
- Renaming directory onto file to return LFS_ERR_NOTEMPTY
- Changed LFS_ERR_INVAL in lfs_file_seek to assert
An annoying part of filesystems is that the software library can change
independently of the on-disk structures. For this reason versioning is
very important, and must be handled separately for the software and
on-disk parts.
In this patch, littlefs provides two version numbers at compile time,
with major and minor parts, in the form of 6 macros.
LFS_VERSION // Library version, uint32_t encoded
LFS_VERSION_MAJOR // Major - Backwards incompatible changes
LFS_VERSION_MINOR // Minor - Feature additions
LFS_DISK_VERSION // On-disk version, uint32_t encoded
LFS_DISK_VERSION_MAJOR // Major - Backwards incompatible changes
LFS_DISK_VERSION_MINOR // Minor - Feature additions
Note that littlefs will error if it finds a major version number that
is different, or a minor version number that has regressed.
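A minimal sketch of how such a version check could look. The EX_* macros and ex_version_compatible are hypothetical names, and the 16-bit major / 16-bit minor split is an assumption made for illustration. "Regressed" is read here as the library's minor version being older than the one found on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed encoding for this sketch: major in the upper 16 bits,
// minor in the lower 16 bits of a uint32_t.
#define EX_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | ((uint32_t)(minor) & 0xffff))
#define EX_VERSION_MAJOR(v) ((uint16_t)(((v) >> 16) & 0xffff))
#define EX_VERSION_MINOR(v) ((uint16_t)((v) & 0xffff))

// Returns true if a disk carrying version `disk` can be used by a
// library that implements on-disk version `lib`.
static bool ex_version_compatible(uint32_t disk, uint32_t lib) {
    // a different major version is always an error
    if (EX_VERSION_MAJOR(disk) != EX_VERSION_MAJOR(lib)) {
        return false;
    }
    // the library's minor version must not be older than the disk's
    return EX_VERSION_MINOR(disk) <= EX_VERSION_MINOR(lib);
}
```

With this encoding, EX_VERSION(1, 2) is 0x00010002, and a v1.2 library accepts a v1.1 disk but rejects v1.3 and v2.0 disks.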
As a copy-on-write filesystem, the truncate function is a very nice
function to have, as it can take advantage of reusing the data already
written out to disk.
littlefs had an unwritten assumption that the block device's program
size would be a multiple of the read size, and the block size would
be a multiple of the program size. This has already caused confusion
for users. Added a note and assert to catch unexpected geometries
early.
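The geometry rule can be sketched as a small check. The struct and function names below are hypothetical and only mirror the three sizes mentioned above. The actual patch uses asserts rather than a boolean return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical container for the three sizes discussed above.
struct ex_geometry {
    uint32_t read_size;
    uint32_t prog_size;
    uint32_t block_size;
};

// Returns true if prog_size is a multiple of read_size and
// block_size is a multiple of prog_size.
static bool ex_geometry_valid(const struct ex_geometry *g) {
    if (g->read_size == 0 || g->prog_size == 0 || g->block_size == 0) {
        return false;
    }
    return g->prog_size % g->read_size == 0
        && g->block_size % g->prog_size == 0;
}
```

For example, read=16/prog=64/block=4096 passes, while read=16/prog=24/block=4096 is rejected because 24 is not a multiple of 16.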
Also found that the prog/erase functions indicated they must return
LFS_ERR_CORRUPT to catch bad blocks. This is no longer true as errors
are found by CRC.
As it was, if a user operated on a directory while at the same
time iterating over the directory, the directory objects could
fall out of sync. In the best case, files may be skipped while
removing everything in a directory; in the worst case, a very poorly
timed directory relocate could be missed.
Simple fix is to add the same directory tracking that is currently
in use for files, at a small code+complexity cost.
Short story, files are no longer committed to directories during
file sync/close if the last write did not complete successfully.
This avoids a set of interesting user-experience issues related
to the end-of-life behaviour of the filesystem.
As a filesystem approaches end-of-life, the chances of running into
LFS_ERR_NOSPC grow rather quickly. Since this condition occurs at
the end of a device's life, it's likely that operating in these
conditions hasn't been tested thoroughly.
In the specific case of file-writes, you can hit an LFS_ERR_NOSPC after
parts of the file have been written out. If the program simply continues
and closes the file, the file is written out half completed. Since
littlefs has a strong guarantee that prevents half-writes, it's unlikely
this state of the file would be expected.
To make things worse, since close is also responsible for memory
cleanup, it's actually _impossible_ to continue working as it was
without leaking memory.
By preventing the file commits, end-of-life behaviour should at least retain
a previous copy of the filesystem without any surprises.
The littlefs allows buffers to be passed statically in the case
that a system does not have a heap. Unfortunately, this means we
can't round up in the case of an unaligned lookahead buffer.
Double unfortunately, rounding down after clamping to the block device
size could result in a lookahead of zero for block devices < 32 blocks
large.
The assert in littlefs does catch this case, but rounding down prevents
support for < 32 block devices.
The solution is to simply require a 32-bit aligned buffer with an
assert. This avoids runtime problems while allowing a user to pass
in the correct buffer for < 32 block devices. Rounding up can be
handled at higher API levels.
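To illustrate the difference, here is a rough sketch with hypothetical helper names. The first function models the clamp-then-round-down behaviour described above. The second asserts alignment of the configured lookahead and only clamps the number of blocks scanned.

```c
#include <assert.h>
#include <stdint.h>

// Behaviour described above: clamp to the device size, then round
// down to a multiple of 32. Yields 0 for devices < 32 blocks.
static uint32_t ex_lookahead_rounded(uint32_t lookahead,
        uint32_t block_count) {
    uint32_t clamped = lookahead < block_count ? lookahead : block_count;
    return clamped - (clamped % 32);
}

// Sketch of the fix: the buffer itself must be 32-bit aligned (checked
// with an assert), and only the scanned window is clamped.
static uint32_t ex_lookahead_window(uint32_t lookahead,
        uint32_t block_count) {
    assert(lookahead > 0 && lookahead % 32 == 0);
    return lookahead < block_count ? lookahead : block_count;
}
```

On a 16-block device, ex_lookahead_rounded(128, 16) gives 0, while ex_lookahead_window(32, 16) gives a usable window of 16 blocks.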
Deduplication and deorphan steps aren't required under identical
conditions, but they can be processed in the same iteration of the
filesystem. Since lfs_alloc (requires deorphan) occurs on most write
calls to the filesystem (requires deduplication), it was simpler to
just combine the steps into a single lfs_deorphan step.
Also traded out the places where lfs_rename/lfs_remove just defer
operations to the deorphan step. This adds a bit of code, but also
significantly speeds up directory operations.
The "move problem" has been present in littlefs for a while, but I haven't
come across a solution worth implementing for various reasons.
The problem is simple: how do we move directory entries across
directories atomically? Since multiple directory entries are involved,
we can't rely entirely on the atomic block updates. It ends up being
a bit of a puzzle.
To make the problem more complicated, any directory block update can
fail due to wear, and cause the directory block to need to be relocated.
This happens rarely, but brings a large number of corner cases.
---
The solution in this patch is simple:
1. Mark source as "moving"
2. Copy source to destination
3. Remove source
If littlefs ever runs into a "moving" entry, that means a power loss
occurred during a move. Either the destination entry exists or it
doesn't. In this case we just search the entire filesystem for the
destination entry.
This is expensive, however the chance of a power loss during a move
is relatively low.
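The recovery step can be sketched over a flat in-memory array of entries. All names here are hypothetical, and the resolution rule (drop the source if a copy exists, otherwise clear the "moving" mark) is an assumption about how the two cases would be handled, not a description of the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ex_state { EX_EMPTY, EX_NORMAL, EX_MOVING };

struct ex_entry {
    enum ex_state state;
    int id;  // stands in for whatever the entry points at
};

// Search every entry (the "entire filesystem" in this toy model)
// for a non-moving copy of the given id.
static bool ex_find_copy(const struct ex_entry *e, size_t n, int id) {
    for (size_t i = 0; i < n; i++) {
        if (e[i].state == EX_NORMAL && e[i].id == id) {
            return true;
        }
    }
    return false;
}

// Resolve any "moving" entries left behind by a power loss.
static void ex_fix_moves(struct ex_entry *e, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (e[i].state != EX_MOVING) {
            continue;
        }
        if (ex_find_copy(e, n, e[i].id)) {
            e[i].state = EX_EMPTY;   // copy exists, finish step 3
        } else {
            e[i].state = EX_NORMAL;  // copy never landed, undo step 1
        }
    }
}
```

The inner search makes each recovery linear in the number of entries, which matches the "expensive but rare" trade-off noted above.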
Simply limiting the lookahead region to the size of
the block device fixes the problem.
Also added logic to limit the allocated region and
floor to nearest word, since the additional memory
couldn't really be used effectively.
No attributes are actually supported at the moment, but this change
will allow entry attributes to be added in a backwards compatible manner.
Each dir entry is now prefixed with a 32 bit tag:
4b - entry type
4b - data structure
8b - entry len
8b - attribute len
8b - name len
A full entry on disk looks a bit like this:
[- 8 -|- 8 -|- 8 -|- 8 -|-- elen --|-- alen --|-- nlen --]
[ type | elen | alen | nlen | entry | attrs | name ]
The actual contents of the attributes section is a bit handwavey
until the first attributes are implemented, but to put plans in place:
Each attribute will be prefixed with only a byte that indicates the type
of attribute. Attributes should be sorted based on portability, since
unknown attributes will force attribute parsing to stop.
This provides a path for adding inlined files in the future, which
requires multiple lengths to distinguish between the file data and name.
As an extra bonus, the directory can now be iterated over even if the
types are unknown, since the name's representation is consistent on all
entry types.
This does come at the cost of reducing types from 16-bits to 8-bits, but
I doubt this will become a problem.
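A rough sketch of walking entries using only the tag lengths. The names are hypothetical, and the byte order (type, elen, alen, nlen) is taken from the diagram above. The type byte is kept whole because the text does not say which nibble holds the entry type and which the data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_tag {
    uint8_t type;  // 4b entry type + 4b data structure, not split here
    uint8_t elen;  // entry length
    uint8_t alen;  // attribute length
    uint8_t nlen;  // name length
};

// Decode the 4-byte tag in the order shown in the diagram.
static struct ex_tag ex_tag_decode(const uint8_t *p) {
    struct ex_tag t = {p[0], p[1], p[2], p[3]};
    return t;
}

// Total size of an entry on disk, including the tag itself.
static size_t ex_entry_size(const struct ex_tag *t) {
    return 4 + (size_t)t->elen + t->alen + t->nlen;
}

// Offset of the name, which is the same for every entry type.
static size_t ex_name_offset(const struct ex_tag *t) {
    return 4 + (size_t)t->elen + t->alen;
}

// Count entries in a buffer without interpreting their types.
static size_t ex_count_entries(const uint8_t *buf, size_t len) {
    size_t off = 0;
    size_t count = 0;
    while (len - off >= 4) {
        struct ex_tag t = ex_tag_decode(&buf[off]);
        size_t size = ex_entry_size(&t);
        if (size > len - off) {
            break;  // truncated entry, stop iterating
        }
        off += size;
        count++;
    }
    return count;
}
```

Because the name always sits at 4 + elen + alen, a reader can list names even for entry types it does not understand.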
Before, the littlefs relied on the underlying block device
to report corruption that occurs when writing data to disk.
This requirement is easy to miss or implement incorrectly, since
the error detection is only required when a block becomes corrupted,
which is very unlikely to happen until late in the block device's
lifetime.
The littlefs can detect corruption itself by reading back written data.
This requires a bit of care to reuse the available buffers, and may rely
on checksums to avoid additional RAM requirements.
This does have a runtime penalty with the extra read operations, but
should make the littlefs much more robust to different implementations.
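As an illustration of the read-back idea, here is a toy RAM-backed block with simulated stuck bits. Every name and the error value are hypothetical. Unlike what is described above, this sketch uses a separate check buffer rather than reusing existing buffers or comparing checksums.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_BLOCK_SIZE 16
#define EX_ERR_CORRUPT (-1)  // hypothetical error code for this sketch

// RAM-backed block with a mask of worn-out bits that always store 0.
struct ex_block {
    uint8_t data[EX_BLOCK_SIZE];
    uint8_t stuck_mask;
};

// Program a block; worn bits silently fail to take the new value.
static void ex_prog(struct ex_block *b, const uint8_t *src) {
    for (int i = 0; i < EX_BLOCK_SIZE; i++) {
        b->data[i] = src[i] & (uint8_t)~b->stuck_mask;
    }
}

static void ex_read(const struct ex_block *b, uint8_t *dst) {
    memcpy(dst, b->data, EX_BLOCK_SIZE);
}

// Program, then read back and compare to detect corruption without
// relying on the block device to report it.
static int ex_prog_verified(struct ex_block *b, const uint8_t *src) {
    uint8_t check[EX_BLOCK_SIZE];
    ex_prog(b, src);
    ex_read(b, check);
    return memcmp(check, src, EX_BLOCK_SIZE) == 0 ? 0 : EX_ERR_CORRUPT;
}
```

The extra ex_read per program is the runtime penalty mentioned above.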
More documentation may still be worthwhile (design documentation?),
but for now this provides a reasonable baseline.
- readme
- license
- header documentation
This provides a limited form of wear leveling. While wear is
not actually balanced across blocks, the filesystem can recover
from corrupted blocks and extend the lifetime of a device nearly
as much as dynamic wear leveling.
For use-cases where wear is important, it would be better to use
a full form of dynamic wear-leveling at the block level (or
consider a logging filesystem).
Corrupted block handling was simply added on top of the existing
logic in place for the filesystem, so it's a bit more noodly than
it may have to be, but it gets the work done.
This adds a fully independent layer between the rest of the filesystem
and the block device. This requires some additional logic around cache
invalidation and flushing, but removes the need for any higher layer to
consider read/write sizes less than what is supported by the hardware.
Additionally, these caches can be used for possible speed improvements.
This is left up to the user to optimize for their use cases. For very
limited embedded systems with byte-level read/writes, the caches could
be omitted completely, or they could even be the size of a full block
for minimizing storage access.
(A full block may not be the best for speed, consider if only a small
portion of the read block is used, but I'll leave that evaluation as an
exercise for any consumers of this library)
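A minimal sketch of such a read cache, assuming a device that can only be read in aligned EX_READ_SIZE chunks. All names are hypothetical, and the write side (program cache, flushing, invalidation) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_READ_SIZE 8     // smallest read the "hardware" supports
#define EX_DEVICE_SIZE 64

struct ex_dev {
    uint8_t storage[EX_DEVICE_SIZE];
    unsigned reads;        // counts hardware reads
};

struct ex_rcache {
    uint32_t off;          // device offset of the cached chunk
    int valid;
    uint8_t buffer[EX_READ_SIZE];
};

// Hardware-level read: aligned, exactly EX_READ_SIZE bytes.
static void ex_dev_read(struct ex_dev *d, uint32_t off, uint8_t *dst) {
    assert(off % EX_READ_SIZE == 0 && off + EX_READ_SIZE <= EX_DEVICE_SIZE);
    memcpy(dst, &d->storage[off], EX_READ_SIZE);
    d->reads++;
}

// Byte-granular read served through the cache.
static void ex_cached_read(struct ex_dev *d, struct ex_rcache *c,
        uint32_t off, uint8_t *dst, uint32_t size) {
    while (size > 0) {
        uint32_t base = off - (off % EX_READ_SIZE);
        if (!c->valid || c->off != base) {
            ex_dev_read(d, base, c->buffer);  // cache miss
            c->off = base;
            c->valid = 1;
        }
        uint32_t avail = EX_READ_SIZE - (off - base);
        uint32_t n = size < avail ? size : avail;
        memcpy(dst, &c->buffer[off - base], n);
        dst += n;
        off += n;
        size -= n;
    }
}
```

Callers can ask for any offset and size; only cache misses reach the device, so repeated small reads within one chunk cost a single hardware read.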
Adopted buffer followed by size. The other order was originally
chosen due to some other functions with a more complicated
parameter list.
This convention is important, as the bd api is one of the main
apis facing porting efforts.
Originally had two separate positions for reading/writing,
but this is inconsistent with the POSIX standard, which
has a single position for reading and writing.
Also added proper handling of when the file is dirty, just
added an internal flag for this state.
Also moved the entry out of the file struct, and rearranged
some members to clean things up.
A rather involved upgrade for both files and directories, seek and
related functions are now completely supported:
- lfs_file_seek
- lfs_file_tell
- lfs_file_rewind
- lfs_file_size
- lfs_dir_seek
- lfs_dir_tell
- lfs_dir_rewind
This change also highlighted the concern that lfs_off_t is unsigned,
whereas off_t is traditionally signed. Unfortunately, lfs_off_t is
already used extensively through the codebase, so in focusing on
moving forward and avoiding getting bogged down by details, I'm going to
keep it as is and use the signed type lfs_soff_t where necessary.
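A sketch of how a signed return type can sit next to an unsigned offset type in a seek function. The ex_* types, the whence constants, and the error value are hypothetical stand-ins rather than the library's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ex_off_t;   // unsigned offset, like lfs_off_t
typedef int32_t ex_soff_t;   // signed offset, like lfs_soff_t

enum ex_whence { EX_SEEK_SET, EX_SEEK_CUR, EX_SEEK_END };
#define EX_ERR_INVAL (-1)    // hypothetical error code for this sketch

// Compute the new file position. A signed result lets the same return
// value carry either a position or a negative error.
static ex_soff_t ex_seek(ex_off_t pos, ex_off_t size,
        ex_soff_t off, enum ex_whence whence) {
    int64_t base = (whence == EX_SEEK_SET) ? 0
                 : (whence == EX_SEEK_CUR) ? (int64_t)pos
                 : (int64_t)size;
    int64_t npos = base + off;
    if (npos < 0 || npos > INT32_MAX) {
        return EX_ERR_INVAL;  // before the start or not representable
    }
    return (ex_soff_t)npos;
}
```

For example, ex_seek(10, 100, -4, EX_SEEK_CUR) gives 6, and seeking before the start of the file gives the negative error value.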
Now all of the open flags are correctly handled, even the annoying
cases where we can't trust the blocks that are already on file, such
as appending to existing files and writing to the middle of files.
Files are now stored directly in the index-list, instead of being
referenced by pointers that used to live there. This somewhat reduces
the complexity around handling files, while still keeping the O(logn)
lookup cost.
Removed scanning for stride
- Adds complexity with questionable benefit
- Can be added as an optimization later
Fixed handling around device boundaries and where lookahead may not be a
factor of the device size (consider small devices with only a few
blocks)
Added support for configuration with optional dynamic memory as found in
the caching configuration
This adds caching of the most recent read/program blocks, allowing
support of devices that don't have byte-level read+writes, along
with reduced device access on devices that do support byte-level
read+writes.
Note: The current implementation is a bit eager to drop caches where
it simplifies the cache layer. This layer is already complex enough.
Note: It may be worthwhile to add a compile switch for caching to
reduce code size, not sure.
Note: This does add a dependency on malloc, which could have a porting
layer, but I'm just using the functions from stdlib for now. These can be
overwritten with noops if the user controls the system, and keeps things
simple for now.
Before, the lfs had multiple paths to determine config options:
- lfs_config struct passed during initialization
- lfs_bd_info struct passed during block device initialization
- compile time options
This allowed different developers to provide their own needs
to the filesystem, such as the block device capabilities and
the higher level user's own tweaks.
However, this comes with additional complexity and action required
when the configurations are incompatible.
For now, this has been reduced to all information (including block
device function pointers) being passed through the lfs_config struct.
We just defer more complicated handling of configuration options to
the top level user.
This simplifies configuration handling and gives the top level user
the responsibility to handle configuration, which they probably would
have wanted to do anyways.
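To give a feel for a single config struct carrying both geometry and block device function pointers, here is a hypothetical ex_config backed by RAM. The field names and signatures are assumptions for illustration only and are not the actual lfs_config layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical config: everything the filesystem needs in one struct.
struct ex_config {
    void *context;  // opaque pointer handed back to the callbacks
    int (*read)(const struct ex_config *c, uint32_t block,
            uint32_t off, void *buffer, uint32_t size);
    int (*prog)(const struct ex_config *c, uint32_t block,
            uint32_t off, const void *buffer, uint32_t size);
    int (*erase)(const struct ex_config *c, uint32_t block);
    uint32_t block_size;
    uint32_t block_count;
};

// RAM-backed callbacks; context points at the backing array.
static int ex_ram_read(const struct ex_config *c, uint32_t block,
        uint32_t off, void *buffer, uint32_t size) {
    const uint8_t *ram = c->context;
    assert(block < c->block_count && off + size <= c->block_size);
    memcpy(buffer, &ram[block * c->block_size + off], size);
    return 0;
}

static int ex_ram_prog(const struct ex_config *c, uint32_t block,
        uint32_t off, const void *buffer, uint32_t size) {
    uint8_t *ram = c->context;
    assert(block < c->block_count && off + size <= c->block_size);
    memcpy(&ram[block * c->block_size + off], buffer, size);
    return 0;
}

static int ex_ram_erase(const struct ex_config *c, uint32_t block) {
    uint8_t *ram = c->context;
    assert(block < c->block_count);
    memset(&ram[block * c->block_size], 0xff, c->block_size);
    return 0;
}
```

The top level user fills in one struct with the callbacks and geometry, and the filesystem only ever calls through it.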
After quite a bit of prototyping, settled on the following functions:
- lfs_dir_alloc - create a new dir
- lfs_dir_fetch - load and check a dir pair from disk
- lfs_dir_commit - save a dir pair to disk
- lfs_dir_shift - shrink a dir pair to disk
- lfs_dir_append - add a dir entry, creating dirs if needed
- lfs_dir_remove - remove a dir entry, dropping dirs if needed
Additionally, followed through with a few other tweaks
Removing the dependency to the parent pointer solves
many issues with non-atomic updates of children's
parent pointers with respect to any move operations.
However, this comes with an embarrassingly terrible
runtime as the only other option is to exhaustively
check every dir entry to find a child's parent.
Fortunately, deorphaning should be a relatively rare
operation.
In writing the initial allocator, I ran into the rather
difficult problem of trying to iterate through the entire
filesystem cheaply and with only constant memory consumption
(which prohibits recursive functions).
The solution was to simply thread all directory blocks onto a
massive linked-list that spans the entire filesystem.
With the linked-list it was easy to create a traverse function
for all blocks in use on the filesystem (which has potential
for other utility), and add the rudimentary block allocator
using a bit-vector.
While the linked-list may add complexity (especially where
needing to maintain atomic operations), the linked-list helps
simplify what is currently the most expensive operation in
the filesystem, with no cost to space (the linked-list can
reuse the pointers used for chained directory blocks).
The free-list structure, while efficient for allocations, had one big
issue: complexity. Storing free blocks as a simple fifo made sense
when dealing with a single file, but as soon as you have two files
open for writing, updating the free list atomically when the two files
can not necessarily even be written atomically proved problematic. It's a
solvable problem, but requires many writes to keep track of everything.
Now changing direction to pursue a more "drop it on the floor" strategy.
Since allocated blocks are tracked by the filesystem, we can simply
subtract from all available blocks the blocks we know of to allocate new
blocks. This is very expensive (O(blocks in use * blocks on device)),
but greatly simplifies any interactions that result in deallocated
blocks.
Additionally, it's impossible to corrupt the free list structure
during a power failure. Any blocks that aren't tracked are simply
"dropped on the floor", and can be allocated later.
There's still a bit of work around the actual allocator to make it
run in a somewhat reasonable frame of time while still avoiding
dynamic allocations. Currently looking at a bit-vector of free
blocks so at least strides of blocks can be skipped in a single
filesystem iteration.
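A rough sketch of the "subtract the blocks we know of" idea with a 32-block bit-vector window. The names are hypothetical, and a plain array of in-use block numbers stands in for the filesystem traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_LOOKAHEAD 32  // window size in blocks, one bit per block

// Find a free block in [start, start + EX_LOOKAHEAD), clamped to the
// device size. Returns the block number, or -1 if the window is full.
static int32_t ex_alloc(const uint32_t *in_use, size_t n_in_use,
        uint32_t start, uint32_t block_count) {
    uint32_t bits = 0;
    assert(start < block_count);

    // one pass over all tracked blocks marks those inside the window
    for (size_t i = 0; i < n_in_use; i++) {
        uint32_t rel = in_use[i] - start;  // wraps if block < start
        if (rel < EX_LOOKAHEAD) {
            bits |= (uint32_t)1 << rel;
        }
    }

    // anything not marked was never tracked, so it is free to use
    for (uint32_t rel = 0;
            rel < EX_LOOKAHEAD && start + rel < block_count; rel++) {
        if (!(bits & ((uint32_t)1 << rel))) {
            return (int32_t)(start + rel);
        }
    }
    return -1;
}
```

One traversal now classifies a whole window of blocks instead of a single block, which is the kind of stride skipping described above.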
Missing seek, but these are the core filesystem operations
provided by this filesystem:
- Read a file
- Append to a file
Additional work is needed around freeing the previous file, so
right now it's limited to appending to existing files, a real
append only filesystem. Unfortunately the overhead of the free
list with multiple open files is becoming tricky.
This comes with a lot of scaffolding put into place around the core
of the filesystem.
Added operations:
- append an entry to a directory
- find an entry in a directory
- iterate over entries in a directory
Some to do:
- Chaining multiple directory blocks
- Recursion on directory operations
The core algorithm that backs this filesystem's goal of fault
tolerance is the alternating of "metadata pairs". Backed by a
simple core function for reading and writing, makes heavy use
of c99 designated initializers for passing info about multiple
chunks in an erase block.
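A toy model of alternating writes across a pair of blocks. The revision counter and checksum fields, and all ex_* names, are assumptions made for this sketch. The XOR checksum is a placeholder, not a real CRC, and revision wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ex_meta {
    uint32_t rev;    // revision count, bumped on every commit
    uint32_t data;   // stand-in for the block's contents
    uint32_t check;  // toy checksum over rev and data
};

static uint32_t ex_check(const struct ex_meta *m) {
    return m->rev ^ m->data ^ 0x5a5a5a5a;
}

static bool ex_valid(const struct ex_meta *m) {
    return m->check == ex_check(m);
}

// Returns the index of the newest valid block, or -1 if neither is.
static int ex_pair_fetch(const struct ex_meta pair[2]) {
    bool v0 = ex_valid(&pair[0]);
    bool v1 = ex_valid(&pair[1]);
    if (v0 && v1) {
        return pair[1].rev > pair[0].rev ? 1 : 0;
    }
    return v0 ? 0 : (v1 ? 1 : -1);
}

// Write the new state to the block that is NOT current, so the old
// copy survives if power is lost partway through.
static void ex_pair_commit(struct ex_meta pair[2], uint32_t data) {
    int cur = ex_pair_fetch(pair);
    int next = (cur == 0) ? 1 : 0;
    pair[next].rev = (cur < 0) ? 1 : pair[cur].rev + 1;
    pair[next].data = data;
    pair[next].check = ex_check(&pair[next]);
}
```

If a commit is interrupted before the checksum is written, ex_pair_fetch falls back to the other block, which still holds the previous state.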
Really started working out how the internal structure of the driver
will be organized. There are a few hazy lines between the intended
data structures with the goal of code reuse, so the function boundaries
may end up a bit weird.
The primary data structure backing the little fs was planned
to be a little ctz based skip-list for O(logn) lookup and
O(1) append.
Was initially planning to start with a simple linked list of
index blocks, but was having trouble implementing the free-list
on top of the structure. Went ahead and adopted the skip-list
structure since it may have actually been easier.
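For intuition on the O(logn) lookup, here is a sketch that counts hops through a ctz-based skip-list. It assumes block n stores ctz(n)+1 back-pointers, with pointer i reaching block n - 2^i. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Count trailing zeros; n must be nonzero.
static unsigned ex_ctz(uint32_t n) {
    unsigned c = 0;
    while ((n & 1) == 0) {
        n >>= 1;
        c++;
    }
    return c;
}

// floor(log2(n)); n must be nonzero.
static unsigned ex_log2(uint32_t n) {
    unsigned l = 0;
    while ((n >>= 1) != 0) {
        l++;
    }
    return l;
}

// Number of back-pointers stored in block n (n > 0) in this model.
static unsigned ex_pointers(uint32_t n) {
    return ex_ctz(n) + 1;
}

// Number of pointer hops to walk from block `from` back to `to`,
// always taking the largest skip that does not overshoot.
static unsigned ex_hops(uint32_t from, uint32_t to) {
    unsigned hops = 0;
    assert(to <= from);
    while (from > to) {
        unsigned skip = ex_ctz(from);
        unsigned limit = ex_log2(from - to);
        if (skip > limit) {
            skip = limit;
        }
        from -= (uint32_t)1 << skip;
        hops++;
    }
    return hops;
}
```

In this model, walking from block 8 to block 0 takes one hop, while 7 to 0 takes three (7, 6, 4, 0). Appending only needs the pointers of the new last block, which is where the O(1) append comes from.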