mirror of
https://github.com/llvm-mirror/libcxx.git
synced 2025-10-24 20:29:39 +08:00

Summary: Freestanding is *weird*. The standard allows it to differ in a bunch of odd manners from regular C++, and the committee would like to improve that situation. I'd like to make libc++ behave better with what freestanding should be, so that it can be a tool we use in improving the standard. To do that we need to try stuff out, both with "freestanding the language mode" and "freestanding the library subset". Let's start with the super basic: run the libc++ tests in freestanding, using clang as the compiler, and see what works. The easiest hack to do this: In utils/libcxx/test/config.py add: self.cxx.compile_flags += ['-ffreestanding'] Run the tests and they all fail. Why? Because in freestanding `main` isn't special. This "not special" property has two effects: main doesn't get mangled, and main isn't allowed to omit its `return` statement. The first means main gets mangled and the linker can't create a valid executable for us to test. The second means we spew out warnings (ew) and the compiler doesn't insert the `return` we omitted, and main just falls of the end and does whatever undefined behavior (if you're luck, ud2 leading to non-zero return code). Let's start my work with the basics. This patch changes all libc++ tests to declare `main` as `int main(int, char**` so it mangles consistently (enabling us to declare another `extern "C"` main for freestanding which calls the mangled one), and adds `return 0;` to all places where it was missing. This touches 6124 files, and I apologize. The former was done with The Magic Of Sed. The later was done with a (not quite correct but decent) clang tool: https://gist.github.com/jfbastien/793819ff360baa845483dde81170feed This works for most tests, though I did have to adjust a few places when e.g. the test runs with `-x c`, macros are used for main (such as for the filesystem tests), etc. Once this is in we can create a freestanding bot which will prevent further regressions. After that, we can start the real work of supporting C++ freestanding fairly well in libc++. <rdar://problem/47754795> Reviewers: ldionne, mclow.lists, EricWF Subscribers: christof, jkorous, dexonsmith, arphaman, miyuki, libcxx-commits Differential Revision: https://reviews.llvm.org/D57624 git-svn-id: https://llvm.org/svn/llvm-project/libcxx/trunk@353086 91177308-0d34-0410-b5e6-96231b3b80d8
496 lines
19 KiB
ReStructuredText
496 lines
19 KiB
ReStructuredText
==============
|
||
File Time Type
|
||
==============
|
||
|
||
.. contents::
|
||
:local:
|
||
|
||
.. _file-time-type-motivation:
|
||
|
||
Motivation
|
||
==========
|
||
|
||
The filesystem library provides interfaces for getting and setting the last
|
||
write time of a file or directory. The interfaces use the ``file_time_type``
|
||
type, which is a specialization of ``chrono::time_point`` for the
|
||
"filesystem clock". According to [fs.filesystem.syn]
|
||
|
||
trivial-clock is an implementation-defined type that satisfies the
|
||
Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
|
||
representing and measuring file time values. Implementations should ensure
|
||
that the resolution and range of file_time_type reflect the operating
|
||
system dependent resolution and range of file time values.
|
||
|
||
|
||
On POSIX systems, file times are represented using the ``timespec`` struct,
|
||
which is defined as follows:
|
||
|
||
.. code-block:: cpp
|
||
|
||
struct timespec {
|
||
time_t tv_sec;
|
||
long tv_nsec;
|
||
};
|
||
|
||
To represent the range and resolution of ``timespec``, we need to (A) have
|
||
nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).
|
||
|
||
As the standard requires us to use the ``chrono`` interface, we have to define
|
||
our own filesystem clock which specifies the period and representation of
|
||
the time points and duration it provides. It will look like this:
|
||
|
||
.. code-block:: cpp
|
||
|
||
struct _FilesystemClock {
|
||
using period = nano;
|
||
using rep = TBD; // What is this?
|
||
|
||
using duration = chrono::duration<rep, period>;
|
||
using time_point = chrono::time_point<_FilesystemClock>;
|
||
|
||
// ... //
|
||
};
|
||
|
||
using file_time_type = _FilesystemClock::time_point;
|
||
|
||
|
||
To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
|
||
But what type can we use as the arithmetic representation that is capable
|
||
of representing the range of the ``timespec`` struct?
|
||
|
||
Problems To Consider
|
||
====================
|
||
|
||
Before considering solutions, let's consider the problems they should solve,
|
||
and how important solving those problems are:
|
||
|
||
|
||
Having a Smaller Range than ``timespec``
|
||
----------------------------------------
|
||
|
||
One solution to the range problem is to simply reduce the resolution of
|
||
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
|
||
initial implementation of ``file_time_type`` did; it's also what
|
||
``std::system_clock`` does. As a result, it can represent time points about
|
||
292 thousand years on either side of the epoch, as opposed to only 292 years
|
||
at nanosecond resolution.
|
||
|
||
``timespec`` can represent time points +/- 292 billion years from the epoch
|
||
(just in case you needed a time point 200 billion years before the big bang,
|
||
and with nanosecond resolution).
|
||
|
||
To get the same range, we would need to drop our resolution to that of seconds
|
||
to come close to having the same range.
|
||
|
||
This begs the question, is the range problem "really a problem"? Sane usages
|
||
of file time stamps shouldn't exceed +/- 300 years, so should we care to support it?
|
||
|
||
I believe the answer is yes. We're not designing the filesystem time API, we're
|
||
providing glorified C++ wrappers for it. If the underlying API supports
|
||
a value, then we should too. Our wrappers should not place artificial restrictions
|
||
on users that are not present in the underlying filesystem.
|
||
|
||
Having a smaller range that the underlying filesystem forces the
|
||
implementation to report ``value_too_large`` errors when it encounters a time
|
||
point that it can't represent. This can cause the call to ``last_write_time``
|
||
to throw in cases where the user was confident the call should succeed. (See below)
|
||
|
||
|
||
.. code-block:: cpp
|
||
|
||
#include <filesystem>
|
||
using namespace std::filesystem;
|
||
|
||
// Set the times using the system interface.
|
||
void set_file_times(const char* path, struct timespec ts) {
|
||
timespec both_times[2];
|
||
both_times[0] = ts;
|
||
both_times[1] = ts;
|
||
int result = ::utimensat(AT_FDCWD, path, both_times, 0);
|
||
assert(result != -1);
|
||
}
|
||
|
||
// Called elsewhere to set the file time to something insane, and way
|
||
// out of the 300 year range we might expect.
|
||
void some_bad_persons_code() {
|
||
struct timespec new_times;
|
||
new_times.tv_sec = numeric_limits<time_t>::max();
|
||
new_times.tv_nsec = 0;
|
||
set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
|
||
}
|
||
|
||
int main(int, char**) {
|
||
path p = "/tmp/foo";
|
||
file_status st = status(p);
|
||
if (!exists(st) || !is_regular_file(st))
|
||
return 1;
|
||
if ((st.permissions() & perms::others_read) == perms::none)
|
||
return 1;
|
||
// It seems reasonable to assume this call should succeed.
|
||
file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
|
||
return 0;
|
||
}
|
||
|
||
|
||
Having a Smaller Resolution than ``timespec``
|
||
---------------------------------------------
|
||
|
||
As mentioned in the previous section, one way to solve the range problem
|
||
is by reducing the resolution. But matching the range of ``timespec`` using a
|
||
64 bit representation requires limiting the resolution to seconds.
|
||
|
||
So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
|
||
I limit my consideration of the point to this: Why was it not good enough for
|
||
the underlying system interfaces? If it wasn't good enough for them, then it
|
||
isn't good enough for us. Our job is to match the filesystems range and
|
||
representation, not design it.
|
||
|
||
|
||
Having a Larger Range than ``timespec``
|
||
----------------------------------------
|
||
|
||
We should also consider the opposite problem of having a ``file_time_type``
|
||
that is able to represent a larger range than ``timespec``. At least in
|
||
this case ``last_write_time`` can be used to get and set all possible values
|
||
supported by the underlying filesystem; meaning ``last_write_time(p)`` will
|
||
never throw a overflow error when retrieving a value.
|
||
|
||
However, this introduces a new problem, where users are allowed to attempt to
|
||
create a time point beyond what the filesystem can represent. Two particular
|
||
values which cause this are ``file_time_type::min()`` and
|
||
``file_time_type::max()``. As a result, the following code would throw:
|
||
|
||
.. code-block:: cpp
|
||
|
||
void test() {
|
||
last_write_time("/tmp/foo", file_time_type::max()); // Throws
|
||
last_write_time("/tmp/foo", file_time_type::min()); // Throws.
|
||
}
|
||
|
||
Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
|
||
a valid time point, adding a couple hundred billions of years in error,
|
||
and then trying to update a file's write time to that value very often.
|
||
|
||
Compared to having a smaller range, this problem seems preferable. At least
|
||
now we can represent any time point the filesystem can, so users won't be forced
|
||
to revert back to system interfaces to avoid limitations in the C++ STL.
|
||
|
||
I posit that we should only consider this concern *after* we have something
|
||
with at least the same range and resolution of the underlying filesystem. The
|
||
latter two problems are much more important to solve.
|
||
|
||
Potential Solutions And Their Complications
|
||
===========================================
|
||
|
||
Source Code Portability Across Implementations
|
||
-----------------------------------------------
|
||
|
||
As we've discussed, ``file_time_type`` needs a representation that uses more
|
||
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
|
||
128 bit integer using a class, or potentially defining a ``timespec`` like
|
||
arithmetic type. All three will allow us to, at minimum, match the range
|
||
and resolution, and the last one might even allow us to match them exactly.
|
||
|
||
But when considering these potential solutions we need to consider more than
|
||
just the values they can represent. We need to consider the effects they will
|
||
have on users and their code. For example, each of them breaks the following
|
||
code in some way:
|
||
|
||
.. code-block:: cpp
|
||
|
||
// Bug caused by an unexpected 'rep' type returned by count.
|
||
void print_time(path p) {
|
||
// __int128_t doesn't have streaming operators, and neither would our
|
||
// custom arithmetic types.
|
||
cout << last_write_time(p).time_since_epoch().count() << endl;
|
||
}
|
||
|
||
// Overflow during creation bug.
|
||
file_time_type timespec_to_file_time_type(struct timespec ts) {
|
||
// woops! chrono::seconds and chrono::nanoseconds use a 64 bit representation
|
||
// this may overflow before it's converted to a file_time_type.
|
||
auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
|
||
return file_time_type(dur);
|
||
}
|
||
|
||
file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
|
||
// This is the correct version of the above example, where we
|
||
// avoid using the chrono typedefs as they're not sufficient.
|
||
// Can we expect users to avoid this bug?
|
||
using fs_seconds = chrono::duration<file_time_type::rep>;
|
||
using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
|
||
auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(tv.tv_nsec);
|
||
return file_time_type(dur);
|
||
}
|
||
|
||
// Implicit truncation during conversion bug.
|
||
intmax_t get_time_in_seconds(path p) {
|
||
using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
|
||
auto tp = last_write_time(p);
|
||
|
||
// This works with truncation for __int128_t, but what does it do for
|
||
// our custom arithmetic types.
|
||
return duration_cast<fs_seconds>().count();
|
||
}
|
||
|
||
|
||
Each of the above examples would require a user to adjust their filesystem code
|
||
to the particular eccentricities of the representation, hopefully only in such
|
||
a way that the code is still portable across implementations.
|
||
|
||
At least some of the above issues are unavoidable, no matter what
|
||
representation we choose. But some representations may be quirkier than others,
|
||
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
|
||
provides the least aberrant behavior.
|
||
|
||
|
||
Chrono and ``timespec`` Emulation.
|
||
----------------------------------
|
||
|
||
One of the options we've considered is using something akin to ``timespec``
|
||
to represent the ``file_time_type``. It only seems natural seeing as that's
|
||
what the underlying system uses, and because it might allow us to match
|
||
the range and resolution exactly. But would it work with chrono? And could
|
||
it still act at all like a ``timespec`` struct?
|
||
|
||
For ease of consideration, let's consider what the implementation might
|
||
look like.
|
||
|
||
.. code-block:: cpp
|
||
|
||
struct fs_timespec_rep {
|
||
fs_timespec_rep(long long v)
|
||
: tv_sec(v / nano::den), tv_nsec(v % nano::den)
|
||
{ }
|
||
private:
|
||
time_t tv_sec;
|
||
long tv_nsec;
|
||
};
|
||
bool operator==(fs_timespec_rep, fs_timespec_rep);
|
||
fs_int128_rep operator+(fs_timespec_rep, fs_timespec_rep);
|
||
// ... arithmetic operators ... //
|
||
|
||
The first thing to notice is that we can't construct ``fs_timespec_rep`` like
|
||
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
|
||
constructing it from a single 64 bit integer.
|
||
|
||
We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
|
||
directly. A ``chrono::duration`` represents its value as a tick period and a
|
||
number of ticks stored using ``rep``. The representation is unaware of the
|
||
tick period it is being used to represent, but ``timespec`` is setup to assume
|
||
a nanosecond tick period; which is the only case where the names ``tv_sec``
|
||
and ``tv_nsec`` match the values they store.
|
||
|
||
When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
|
||
use ``tv_sec`` to represent the number of giga seconds, and ``tv_nsec`` the
|
||
remaining seconds. Let's consider how this might cause a bug were users allowed
|
||
to manipulate the fields directly.
|
||
|
||
.. code-block:: cpp
|
||
|
||
template <class Period>
|
||
timespec convert_to_timespec(duration<fs_time_rep, Period> dur) {
|
||
fs_timespec_rep rep = dur.count();
|
||
return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
|
||
}
|
||
|
||
template <class Duration>
|
||
Duration convert_to_duration(timespec ts) {
|
||
Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
|
||
return file_time_type(dur);
|
||
file_time_type tp = last_write_time(p);
|
||
auto dur =
|
||
}
|
||
|
||
time_t extract_seconds(file_time_type tp) {
|
||
// Converting to seconds is a silly bug, but I could see it happening.
|
||
using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
|
||
auto secs = duration_cast<Secs>(tp.time_since_epoch());
|
||
// tv_sec is now representing gigaseconds.
|
||
return secs.count().tv_sec; // Oops!
|
||
}
|
||
|
||
Despite ``fs_timespec_rep`` not being usable in any manner resembling
|
||
``timespec``, it still might buy us our goal of matching its range exactly,
|
||
right?
|
||
|
||
Sort of. Chrono provides a specialization point which specifies the minimum
|
||
and maximum values for a custom representation. It looks like this:
|
||
|
||
.. code-block:: cpp
|
||
|
||
template <>
|
||
struct duration_values<fs_timespec_rep> {
|
||
static fs_timespec_rep zero();
|
||
static fs_timespec_rep min();
|
||
static fs_timespec_rep max() { // assume friendship.
|
||
fs_timespec_rep val;
|
||
val.tv_sec = numeric_limits<time_t>::max();
|
||
val.tv_nsec = nano::den - 1;
|
||
return val;
|
||
}
|
||
};
|
||
|
||
Notice that ``duration_values`` doesn't tell the representation what tick
|
||
period it's actually representing. This would indeed correctly limit the range
|
||
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
|
||
nanoseconds isn't the only tick period it will be used to represent. For
|
||
example:
|
||
|
||
.. code-block:: cpp
|
||
|
||
void test() {
|
||
using rep = file_time_type::rep;
|
||
using fs_nsec = duration<rep, nano>;
|
||
using fs_sec = duration<rep>;
|
||
fs_nsec nsecs(fs_seconds::max()); // Truncates
|
||
}
|
||
|
||
Though the above example may appear silly, I think it follows from the incorrect
|
||
notion that using a ``timespec`` rep in chrono actually makes it act as if it
|
||
were an actual ``timespec``.
|
||
|
||
Interactions with 32 bit ``time_t``
|
||
-----------------------------------
|
||
|
||
Up until now we've only be considering cases where ``time_t`` is 64 bits, but what
|
||
about 32 bit systems/builds where ``time_t`` is 32 bits? (this is the common case
|
||
for 32 bit builds).
|
||
|
||
When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
|
||
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
|
||
involved. And nor should we, as it would suffer from the numerous complications
|
||
described by this paper.
|
||
|
||
Obviously our implementation for 32-bit builds should act as similarly to the
|
||
64-bit build as possible. Code which compiles in one, should compile in the other.
|
||
This consideration is important when choosing between ``__int128_t`` and
|
||
emulating ``timespec``. The solution which provides the most uniformity with
|
||
the least eccentricity is the preferable one.
|
||
|
||
Summary
|
||
=======
|
||
|
||
The ``file_time_type`` time point is used to represent the write times for files.
|
||
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
|
||
underlying filesystem uses the ``timespec`` struct for the same purpose.
|
||
|
||
However, the initial implementation of ``file_time_type`` could not represent
|
||
either the range or resolution of ``timespec``, making it unsuitable. Fixing
|
||
this requires an implementation which uses more than 64 bits to store the
|
||
time point.
|
||
|
||
We primarily considered two solutions: Using ``__int128_t`` and using a
|
||
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
|
||
come with more than one complication.
|
||
|
||
The Potential Solutions
|
||
-----------------------
|
||
|
||
``long long`` - The Status Quo
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Pros:
|
||
|
||
* As a type ``long long`` plays the nicest with others:
|
||
|
||
* It works with streaming operators and other library entities which support
|
||
builtin integer types, but don't support ``__int128_t``.
|
||
* Its the representation used by chrono's ``nanosecond`` and ``second`` typedefs.
|
||
|
||
Cons:
|
||
|
||
* It cannot provide the same resolution as ``timespec`` unless we limit it
|
||
to a range of +/- 300 years from the epoch.
|
||
* It cannot provide the same range as ``timespec`` unless we limit its resolution
|
||
to seconds.
|
||
* ``last_write_time`` has to report an error when the time reported by the filesystem
|
||
is unrepresentable.
|
||
|
||
__int128_t
|
||
~~~~~~~~~~~
|
||
|
||
Pros:
|
||
|
||
* It is an integer type.
|
||
* It makes the implementation simple and efficient.
|
||
* Acts exactly like other arithmetic types.
|
||
* Can be implicitly converted to a builtin integer type by the user.
|
||
|
||
* This is important for doing things like:
|
||
|
||
.. code-block:: cpp
|
||
|
||
void c_interface_using_time_t(const char* p, time_t);
|
||
|
||
void foo(path p) {
|
||
file_time_type tp = last_write_time(p);
|
||
time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
|
||
c_interface_using_time_t(p.c_str(), secs);
|
||
}
|
||
|
||
Cons:
|
||
|
||
* It isn't always available (but on 64 bit machines, it normally is).
|
||
* It causes ``file_time_type`` to have a larger range than ``timespec``.
|
||
* It doesn't always act the same as other builtin integer types. For example
|
||
with ``cout`` or ``to_string``.
|
||
* Allows implicit truncation to 64 bit integers.
|
||
* It can be implicitly converted to a builtin integer type by the user,
|
||
truncating its value.
|
||
|
||
Arithmetic ``timespec`` Emulation
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Pros:
|
||
|
||
* It has the exact same range and resolution of ``timespec`` when representing
|
||
a nanosecond tick period.
|
||
* It's always available, unlike ``__int128_t``.
|
||
|
||
Cons:
|
||
|
||
* It has a larger range when representing any period longer than a nanosecond.
|
||
* Doesn't actually allow users to use it like a ``timespec``.
|
||
* The required representation of using ``tv_sec`` to store the giga tick count
|
||
and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
|
||
but complicates a lot.
|
||
* It isn't a builtin integer type, and can't be used anything like one.
|
||
* Chrono can be made to work with it, but not nicely.
|
||
* Emulating arithmetic classes come with their own host of problems regarding
|
||
overload resolution (Each operator needs three SFINAE constrained versions of
|
||
it in order to act like builtin integer types).
|
||
* It offers little over simply using ``__int128_t``.
|
||
* It acts the most differently than implementations using an actual integer type,
|
||
which has a high chance of breaking source compatibility.
|
||
|
||
|
||
Selected Solution - Using ``__int128_t``
|
||
=========================================
|
||
|
||
The solution I selected for libc++ is using ``__int128_t`` when available,
|
||
and otherwise falling back to using ``long long`` with nanosecond precision.
|
||
|
||
When ``__int128_t`` is available, or when ``time_t`` is 32-bits, the implementation
|
||
provides same resolution and a greater range than ``timespec``. Otherwise
|
||
it still provides the same resolution, but is limited to a range of +/- 300
|
||
years. This final case should be rather rare, as ``__int128_t``
|
||
is normally available in 64-bit builds, and ``time_t`` is normally 32-bits
|
||
during 32-bit builds.
|
||
|
||
Although falling back to ``long long`` and nanosecond precision is less than
|
||
ideal, it also happens to be the implementation provided by both libstdc++
|
||
and MSVC. (So that makes it better, right?)
|
||
|
||
Although the ``timespec`` emulation solution is feasible and would largely
|
||
do what we want, it comes with too many complications, potential problems
|
||
and discrepancies when compared to "normal" chrono time points and durations.
|
||
|
||
An emulation of a builtin arithmetic type using a class is never going to act
|
||
exactly the same, and the difference will be felt by users. It's not reasonable
|
||
to expect them to tolerate and work around these differences. And once
|
||
we commit to an ABI it will be too late to change. Committing to this seems
|
||
risky.
|
||
|
||
Therefore, ``__int128_t`` seems like the better solution.
|