RTEMS-NFS
=========
An NFS-V2 client implementation for the RTEMS real-time
executive.
Author: Till Straumann <strauman@slac.stanford.edu>, 2002
Copyright 2002, Stanford University and
Till Straumann <strauman@slac.stanford.edu>
Stanford Notice
***************
Acknowledgement of sponsorship
* * * * * * * * * * * * * * * *
This software was produced by the Stanford Linear Accelerator Center,
Stanford University, under Contract DE-AC03-76SFO0515 with the Department
of Energy.
Contents
--------
I Overview
1) Performance
2) Reference Platform / Test Environment
II Usage
1) Initialization
2) Mounting Remote Server Filesystems
3) Unmounting
4) Unloading
5) Dumping Information / Statistics
III Implementation Details
1) RPCIOD
2) NFS
3) RTEMS Resources Used By NFS/RPCIOD
4) Caveats & Bugs
IV Licensing & Disclaimers
I Overview
-----------
This package implements a simple non-caching NFS
client for RTEMS. Most of the system calls are
supported with the exception of 'mount', i.e. it
is not possible to mount another FS on top of NFS
(mostly because of the difficulty that arises when
mount points are deleted on the server). It
shouldn't be hard to do, though.
Note: this client supports NFS vers. 2 / MOUNT vers. 1;
NFS version 3 or higher is NOT supported.
The package consists of two modules: RPCIOD and NFS
itself.
- RPCIOD is a UDP/RPC multiplexor daemon. It takes
RPC requests from multiple local client threads,
funnels them through a single socket to multiple
servers and dispatches the replies back to the
(blocked) requestor threads.
RPCIOD does packet retransmission and handles
timeouts etc.
Note however, that it does NOT do any XDR
marshalling - it is up to the requestor threads
to do the XDR encoding/decoding. RPCIOD _is_ RPC
specific, though, because its message dispatching
is based on the RPC transaction ID.
- The NFS package maps RTEMS filesystem calls
to proper RPCs, it does the XDR work and
hands marshalled RPC requests to RPCIOD.
All of the calls are synchronous, i.e. they
block until they get a reply.
1) Performance
- - - - - - - -
Performance sucks (due to the lack of
readahead/delayed write and caching). On a fast
(100Mb/s) ethernet, it takes about 20s to copy a
10MB file from NFS to NFS. I found, however, that
vxWorks' NFS client doesn't seem to be any
faster...
Since there is no buffer cache with read-ahead
implemented, all NFS reads are synchronous RPC
calls. Every read operation involves sending a
request and waiting for the reply. As long as the
overhead (sending request + processing it on the
server) is significant compared to the time it
takes to transfer the actual data, increasing
the amount of data per request results in better
throughput. The UDP packet size limit imposes a
limit of 8k per RPC call, hence reading from NFS
in chunks of 8k is better than chunks of 1k [but
chunks >8k are not possible, i.e., simply not
honoured: read(a_nfs_fd, buf, 20000) returns
8192]. This is similar to the old linux days
(mount with rsize=8k). You can let stdio take
care of the buffering or use 8k buffers with
explicit read(2) operations. Note that stdio
honours the file-system's st_blksize field
if newlib is compiled with HAVE_BLKSIZE defined.
In this case, stdio uses 8k buffers for files
on NFS transparently. The blocksize NFS
reports can be tuned with a global variable
setting (see nfs.c for details).
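As an illustration only (the helper names, file names and error
handling below are assumptions, not part of this package), reading in
8k chunks with read(2), or giving stdio an explicit 8k buffer, could
look like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define NFS_CHUNK 8192 /* matches the 8k per-RPC limit described above */

    /* read(2) in 8k chunks; larger requests would be cut to 8k anyway */
    static long read_whole_file(const char *path)
    {
      char   *buf = malloc(NFS_CHUNK);
      long    total = 0;
      ssize_t n;
      int     fd = open(path, O_RDONLY);

      if (fd < 0 || !buf) {
        if (fd >= 0) close(fd);
        free(buf);
        return -1;
      }
      while ((n = read(fd, buf, NFS_CHUNK)) > 0)
        total += n; /* process 'buf' here */
      close(fd);
      free(buf);
      return total;
    }

    /* alternatively, give stdio an explicit 8k buffer (only needed if
     * newlib was built without HAVE_BLKSIZE) */
    static void read_via_stdio(const char *path)
    {
      static char iobuf[NFS_CHUNK];
      FILE *f = fopen(path, "r");

      if (f) {
        setvbuf(f, iobuf, _IOFBF, sizeof(iobuf));
        /* ... fread()/fgets() as usual ... */
        fclose(f);
      }
    }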
Further increase of throughput can be achieved
with read-ahead (issuing RPC calls in parallel
[send out request for block n+1 while you are
waiting for data of block n to arrive]). Since
this is not handled by the file system itself, you
would have to code this yourself e.g., using
parallel threads to read from a single file from
interleaved offsets.
Another obvious improvement can be achieved if
processing the data takes a significant amount of
time. Then, having a pipeline of threads for
reading data and processing them makes sense
[thread b processes chunk n while thread a blocks
in read(chunk n+1)].
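A minimal sketch of the interleaved-reader idea (this is NOT the
nfsTest.c code; the two-thread split, chunking and error handling are
assumptions made for illustration):

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK 8192

    struct rdarg { int fd; int first; int nreaders; };

    /* each reader pread()s every nreaders-th 8k chunk, starting at 'first' */
    static void *reader(void *p)
    {
      struct rdarg *a   = p;
      char         *buf = malloc(CHUNK);
      off_t         off = (off_t)a->first * CHUNK;

      if (buf) {
        while (pread(a->fd, buf, CHUNK, off) > 0)
          off += (off_t)a->nreaders * CHUNK; /* skip the other reader's chunks */
        free(buf);
      }
      return 0;
    }

    /* two readers working on interleaved offsets of the same file */
    static int parallel_read(const char *path)
    {
      struct rdarg a0 = { -1, 0, 2 }, a1 = { -1, 1, 2 };
      pthread_t    t0, t1;
      int          fd = open(path, O_RDONLY);

      if (fd < 0)
        return -1;
      a0.fd = a1.fd = fd;
      pthread_create(&t0, 0, reader, &a0);
      pthread_create(&t1, 0, reader, &a1);
      pthread_join(t0, 0);
      pthread_join(t1, 0);
      close(fd);
      return 0;
    }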
Some performance figures:
Software: src/nfsTest.c:nfsReadTest() [data not
processed in any way].
Hardware: MVME6100
Network: 100baseT-FD
Server: Linux-2.6/RHEL4-smp [dell precision 420]
File: 10MB
Results:
Single threaded ('normal') NFS read, 1k buffers: 3.46s (2.89MB/s)
Single threaded ('normal') NFS read, 8k buffers: 1.31s (7.63MB/s)
Multi threaded; 2 readers, 8k buffers/xfers: 1.12s (8.9 MB/s)
Multi threaded; 3 readers, 8k buffers/xfers: 1.04s (9.6 MB/s)
2) Reference Platform
- - - - - - - - - - -
RTEMS-NFS was developed and tested on
o RTEMS-ss20020301 (local patches applied)
o PowerPC G3, G4 on Synergy SVGM series board
(custom 'SVGM' BSP, to be released soon)
o PowerPC 604 on MVME23xx
(powerpc/shared/motorola-powerpc BSP)
o Test Environment:
- RTEMS executable running CEXP
- rpciod/nfs dynamically loaded from TFTPfs
- EPICS application dynamically loaded from NFS;
the executing IOC accesses all of its files
on NFS.
II Usage
---------
After linking into the system and proper initialization
(rtems-NFS supports 'magic' module initialization when
loaded into a running system with the CEXP loader),
you are ready for mounting NFSes from a server
(I avoid the term NFS filesystem because NFS already
stands for 'Network File System').
You should also read the
- "RTEMS Resources Used By NFS/RPCIOD"
- "CAVEATS & BUGS"
below.
1) Initialization
- - - - - - - - -
NFS consists of two modules which must be initialized:
a) the RPCIO daemon package; by calling
rpcUdpInit();
Note that this step must be performed prior to
initializing NFS.
b) NFS is initialized by calling
nfsInit( smallPoolDepth, bigPoolDepth );
if you supply 0 (zero) values for the pool
depths, the compile-time default configuration
is used, which should work fine.
NOTE: when using CEXP to load these modules into a
running system, initialization will be performed
automagically.
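For illustration, the manual initialization sequence (not needed when
CEXP loads and initializes the modules; the header name below is an
assumption):

    #include <librtemsNfs.h> /* assumed to declare rpcUdpInit()/nfsInit() */

    /* e.g. from your network/system initialization code;
     * order matters: RPCIO daemon first, then NFS        */
    rpcUdpInit();
    nfsInit(0, 0); /* 0,0 -> compile-time default pool depths */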
2) Mounting Remote Server Filesystems
- - - - - - - - - - - - - - - - - - -
There are two interfaces for mounting an NFS:
- The (non-POSIX) RTEMS 'mount()' call:
mount( &mount_table_entry_pointer,
&filesystem_operations_table_pointer,
options,
device,
mount_point )
Note that you must specify a 'mount_table_entry_pointer'
(use a dummy) - RTEMS' mount() doesn't grok a NULL for
the first argument.
o for the 'filesystem_operations_table_pointer', supply
&nfs_fs_ops
o options are constants (see RTEMS headers) for specifying
read-only / read-write mounts.
o the 'device' string specifies the remote filesystem
which is to be mounted. NFS expects a string conforming
to the following format (EBNF syntax):
[ <uid> '.' <gid> '@' ] <hostip> ':' <path>
The first optional part of the string allows you
to specify the credentials to be used for all
subsequent transactions with this server. If this
part is omitted, the EUID/EGID of the executing
thread (i.e. the thread performing the 'mount') are
used - NFS will still 'remember' these values and use
them for all future communication with this server.
The <hostip> part denotes the server IP address
in standard 'dot' notation. It is followed by
a colon and the (absolute) path on the server.
Note that the string must not contain extra characters
or whitespace. Example 'device' strings
are:
"300.99@192.168.44.3:/remote/rtems/root"
"192.168.44.3:/remote/rtems/root"
o the 'mount_point' string identifies the local
directory (most probably on IMFS) where the NFS
is to be mounted. Note that the mount point must
already exist with proper permissions.
- Alternate 'mount' interface. NFS offers a more
convenient wrapper taking three string arguments:
nfsMount(uidgid_at_host, server_path, mount_point)
This interface does DNS lookup (see reentrancy note
below) and creates the mount point if necessary.
o the first argument specifies the server and
optionally the uid/gid to be used for authentication.
The semantics are exactly as described above:
[ <uid> '.' <gid> '@' ] <host>
The <host> part may be either a host _name_ or
an IP address in 'dot' notation. In the former
case, nfsMount() uses 'gethostbyname()' to do
a DNS lookup.
IMPORTANT NOTE: gethostbyname() is NOT reentrant/
thread-safe and 'nfsMount()' (if not provided with an
IP/dot address string) is hence subject to race conditions.
o the 'server_path' and 'mount_point' arguments
are described above.
NOTE: If the mount point does not exist yet,
nfsMount() tries to create it.
o if nfsMount() is called with a NULL 'uidgid_at_host'
argument, it lists all currently mounted NFS.
Examples of both mount interfaces are sketched below.
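A sketch of both interfaces; the server address, uid/gid and the
mount point "/mnt/nfs" are made-up values, the header name is an
assumption, and the mount() prototype follows the (old, non-POSIX)
RTEMS interface described above:

    #include <rtems/libio.h>
    #include <librtemsNfs.h> /* assumed to declare nfs_fs_ops and nfsMount() */

    static void mount_examples(void)
    {
      /* (a) raw RTEMS mount(): dummy mount table entry pointer, &nfs_fs_ops,
       *     a read-write option constant, device string, existing mount point */
      rtems_filesystem_mount_table_entry_t *mte; /* dummy - must not be NULL */

      mount(&mte, &nfs_fs_ops, RTEMS_FILESYSTEM_READ_WRITE,
            "300.99@192.168.44.3:/remote/rtems/root", "/mnt/nfs");

      /* (b) convenience wrapper: a host name is allowed here (DNS lookup,
       *     see the reentrancy note above); the mount point is created
       *     if it does not exist yet */
      nfsMount("300.99@192.168.44.3", "/remote/rtems/root", "/mnt/nfs");
    }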
3) Unmounting
- - - - - - -
An NFS can be unmounted using the RTEMS 'unmount()'
call (yep, it is unmount() - not umount()):
unmount(mount_point)
Note that you _must_ supply the mount point (string
argument). It is _not_ possible to specify the
'mountee' when unmounting. NFS implements no
convenience wrapper for this (yet), essentially because
(although this sounds unbelievable) it is non-trivial
to look up the path leading to an RTEMS filesystem
directory node.
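For illustration (using the hypothetical mount point from the example
above):

    /* unmount by mount point; 0 is returned on success */
    if (unmount("/mnt/nfs") != 0)
      perror("unmount");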
4) Unloading
- - - - - - -
After unmounting all NFS from the system, the NFS
and RPCIOD modules may be stopped and unloaded.
Just call 'nfsCleanup()' and 'rpcUdpCleanup()'
in this order. You should evaluate the return value
of these routines, which is non-zero if the respective
module refuses to yield (e.g. because there are
still mounted filesystems).
Again, when unloading is done by CEXP this is
transparently handled.
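A sketch of this shutdown order, using the return value convention
described above:

    /* unload NFS first, then RPCIOD; a non-zero return means the module
     * refuses to yield (e.g. filesystems are still mounted)             */
    if (nfsCleanup() || rpcUdpCleanup())
      printf("NFS/RPCIOD still in use - not unloaded\n");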
5) Dumping Information / Statistics
- - - - - - - - - - - - - - - - - -
Rudimentary RPCIOD statistics are printed
to a file (stdout when NULL) by
int rpcUdpStats(FILE *f)
A list of all currently mounted NFS can be
printed to a file (stdout if NULL) using
int nfsMountsShow(FILE *f)
For convenience, this routine is also called
by nfsMount() when supplying NULL arguments.
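For example:

    rpcUdpStats(NULL);     /* NULL -> print to stdout    */
    nfsMountsShow(stdout); /* list currently mounted NFS */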
III Implementation Details
--------------------------
1) RPCIOD
- - - - -
RPCIOD was created to
a) avoid non-reentrant librpc calls.
b) support 'asynchronous' operation over a single
socket.
RPCIOD is a daemon thread handling 'transaction objects'
(XACTs) through an UDP socket. XACTs are marshalled RPC
calls/replies associated with RPC servers and requestor
threads.
 requestor thread:                   network:

      XACT                            packet
        |                                |
        V                                V
 | message queue |                   ( socket )
        |                              |     ^
        +--------->  RPCIOD  <---------+     |
                        |                    |
                        +--------------------+
                   timeout / (re)transmission
A requestor thread drops a transaction into
the message queue and goes to sleep. The XACT is
picked up by RPCIOD, which listens for events from
three sources:
o the request queue
o packet arrival at the socket
o timeouts
RPCIOD sends the XACT to its destination server and
enqueues the pending XACT into an ordered list of
outstanding transactions.
When a packet arrives, RPCIOD (based on the RPC transaction
ID) looks up the matching XACT and wakes up the requestor
who can then XDR-decode the RPC results found in the XACT
object's buffer.
When a timeout expires, RPCIOD examines the outstanding
XACT that is responsible for the timeout. If its lifetime
has not expired yet, RPCIOD resends the request. Otherwise,
the XACT's error status is set and the requestor is woken up.
RPCIOD dynamically adjusts the retransmission intervals
based on the average round-trip time measured (on a per-server
basis).
Having the requestors event-driven (rather than blocking
e.g. on a semaphore) is geared towards supporting many different
requestors (otherwise, one synchronization object per requestor
would be needed).
Requestors who want to do asynchronous IO need a different
interface which will be added in the future.
1.a) Reentrancy
- - - - - - - -
RPCIOD makes no non-reentrant librpc calls.
1.b) Efficiency
- - - - - - - -
We shouldn't bother about efficiency until pipelining (read-ahead/
delayed write) and caching are implemented. The round-trip delay
associated with every single RPC transaction clearly is a big
performance killer.
Nevertheless, I could not resist the temptation to eliminate
the extra copy step involved with socket IO:
A user data object has to be XDR encoded into a buffer. The
buffer is then given to the socket, where it is copied into MBUFs.
(The network chip driver might even do more copying).
Likewise, on reception 'recvfrom' copies MBUFS into a user
buffer which is XDR decoded into the final user data object.
Eliminating the copying into (possibly multiple) MBUFS by
'sendto()' is actually a piece of cake. RPCIOD uses the
'sosend()' routine [properly wrapped] supplying a single
MBUF header that directly points to the marshalled buffer
:-)
Getting rid of the extra copy on reception was (only a little)
harder: I derived an 'XDR-mbuf' stream from SUN's xdr_mem which
allows for XDR-decoding out of an MBUF chain that is obtained by
soreceive().
2) NFS
- - - -
The actual NFS implementation is straightforward and essentially
'passive' (no threads created). Any RTEMS task executing a
filesystem call dispatched to NFS (such as 'opendir()', 'lseek()'
or 'unlink()') ends up XDR encoding arguments, dropping an
XACT into RPCIOD's message queue and going to sleep.
When the task is woken up by RPCIOD, it decodes the XACT (using the XDR-mbuf
stream mentioned above) and the properly cooked-up results are
returned.
3) RTEMS Resources Used By NFS/RPCIOD
- - - - - - - - - - - - - - - - - - -
The RPCIOD/NFS package uses the following resources. Some
parameters are compile-time configurable - consult the
source files for details.
RPCIOD:
o 1 task
o 1 message queue
o 1 socket/filedescriptor
o 2 semaphores (a third one is temporarily created during
rpcUdpCleanup()).
o 1 RTEMS EVENT (by default RTEMS_EVENT_30).
IMPORTANT: this event is used by _every_ thread executing
NFS system calls and hence is RESERVED.
o 3 events only used by RPCIOD itself, i.e. these must not
be sent to RPCIOD by any other thread (except for the intended
use, of course). The events involved are 1, 2 and 3.
o preemption disabled sections: NONE
o sections with interrupts disabled: NONE
o NO 'timers' are used (timer code would run in IRQ context)
o memory usage: n.a.
NFS:
o 2 message queues
o 2 semaphores
o 1 semaphore per mounted NFS
o 1 slot in driver entry table (for major number)
o preemption disabled sections: NONE
o sections with interrupts disabled: NONE
o 1 task + 1 semaphore temporarily created when
listing mounted filesystems (rtems_filesystem_resolve_location())
4) CAVEATS & BUGS
- - - - - - - - -
Unfortunately, some bugs crawl around in the filesystem generics.
(Some of them might already be fixed in versions later than
rtems-ss-20020301).
I recommend using the patch distributed with RTEMS-NFS.
o RTEMS uses/used (Joel said it has been fixed already) a 'short'
ino_t which is not enough for NFS.
The driver detects this problem and enables a workaround. In rare
situations (mainly involving 'getcwd()'), improper inode comparison
may result (due to the restricted size, stat() returns st_ino modulo
2^16). In most cases, however, st_dev is compared along with st_ino
which will give correct results (different files may yield identical
st_ino but they will have different st_dev). However, there is
code (in getcwd(), for example) that assumes that files residing
in one directory must be hosted by the same device and hence omits
the st_dev comparison. In such a case, the workaround will fail.
NOTE: changing the size (sys/types.h) of ino_t from 'short' to 'long'
is strongly recommended. It is NOT included in the patch, however,
as this is a major change requiring ALL of your sources to
be recompiled.
THE ino_t SIZE IS FIXED IN GCC-3.2/NEWLIB-1.10.0-2 DISTRIBUTED BY
OAR.
o You may work around most filesystem bugs by observing the following
rules:
* never use chroot() (fixed by the patch)
* never use getpwent(), getgrent() & friends - they are NOT
THREAD-SAFE (fixed by the patch)
* NEVER use rtems_libio_share_private_env() - not even with the
patch applied. Just DON'T - it is broken by design.
* All threads that have their own userenv (i.e. that have called
rtems_libio_set_private_env()) SHOULD 'chdir("/")' before
terminating. Otherwise (i.e. if their cwd remains on NFS), it will
be impossible to unmount the NFS involved.
o The patch slightly changes the semantics of 'getpwent()' and
'getgrent()' & friends (to what is IMHO correct anyways - the patch is
also needed to fix another problem, however): with the patch applied,
the passwd and group files are always accessed from the 'current' user
environment, i.e. a thread that has changed its 'root' or 'uid' might
not be able to access these files anymore.
o NOTE: RTEMS 'mount()' / 'unmount()' are NOT THREAD SAFE.
o The NFS protocol has no 'append' or 'seek_end' primitive. The client
must query the current file size (this client uses cached info) and
change the local file pointer accordingly (in 'O_APPEND' mode).
Obviously, this involves a race condition and hence multiple clients
writing the same file may lead to corruption.
IV Licensing & Disclaimers
--------------------------
NFS is distributed under the SLAC License - consult the
separate 'LICENSE' file.
Government disclaimer of liability
- - - - - - - - - - - - - - - - -
Neither the United States nor the United States Department of Energy,
nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any data, apparatus, product, or process
disclosed, or represents that its use would not infringe privately
owned rights.
Stanford disclaimer of liability
- - - - - - - - - - - - - - - - -
Stanford University makes no representations or warranties, express or
implied, nor assumes any liability for the use of this software.
Maintenance of notice
- - - - - - - - - - -
In the interest of clarity regarding the origin and status of this
software, Stanford University requests that any recipient of it maintain
this notice affixed to any distribution by the recipient that contains a
copy or derivative of this software.