c-user: Add SMP application issues section

This commit is contained in:
Sebastian Huber 2017-02-02 10:46:05 +01:00
parent 7b1c63cf91
commit b033e3960b
3 changed files with 213 additions and 151 deletions


@ -25,7 +25,7 @@ Glossary
manager are used to service signals.
:dfn:`atomic operations`
Atomic operations are defined in terms of *ISO/IEC 9899:2011*. Atomic operations are defined in terms of :ref:`C11 <C11>`.
:dfn:`awakened`
A term used to describe a task that has been unblocked and may be scheduled
@ -61,6 +61,16 @@ Glossary
:dfn:`buffer`
A fixed length block of memory allocated from a partition.
.. _C11:
:dfn:`C11`
The standard ISO/IEC 9899:2011.
.. _C++11:
:dfn:`C++11`
The standard ISO/IEC 14882:2011.
:dfn:`calling convention`
The processor and compiler dependent rules which define the mechanism used
to invoke subroutines in a high-level language. These rules define the
@ -701,6 +711,13 @@ Glossary
:dfn:`timeslice`
The application defined unit of time in which the processor is allocated.
.. _TLS:
:dfn:`TLS`
An acronym for Thread-Local Storage :cite:`Drepper:2013:TLS`. TLS is
available in :ref:`C11 <C11>` and :ref:`C++11 <C++11>`. The support for
TLS depends on the CPU port :cite:`RTEMS:CPU`.
:dfn:`TMCB`
An acronym for Timer Control Block.


@ -271,156 +271,6 @@ tree of the task needing help and other resource trees in case tasks in need
for help are produced during this operation. Thus the worst-case latency in
the system depends on the maximum resource tree size of the application.
Critical Section Techniques and SMP
-----------------------------------
As discussed earlier, SMP systems have opportunities for true parallelism which
was not possible on uniprocessor systems. Consequently, multiple techniques
that provided adequate critical sections on uniprocessor systems are unsafe on
SMP systems. In this section, some of these unsafe techniques will be
discussed.
In general, applications must use proper operating system provided mutual
exclusion mechanisms to ensure correct behavior. This primarily means the use
of binary semaphores or mutexes to implement critical sections.
Disable Interrupts and Interrupt Locks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A low overhead means to ensure mutual exclusion in uni-processor configurations
is to disable interrupts around a critical section. This is commonly used in
device driver code and throughout the operating system core. In SMP
configurations, however, disabling the interrupts on one processor has no
effect on other processors. So, this is insufficient to ensure system wide
mutual exclusion. The macros
- ``rtems_interrupt_disable()``,
- ``rtems_interrupt_enable()``, and
- ``rtems_interrupt_flush()``
are disabled in SMP configurations and its use will lead to compiler warnings
and linker errors. In the unlikely case that interrupts must be disabled on
the current processor, then the
- ``rtems_interrupt_local_disable()``, and
- ``rtems_interrupt_local_enable()``
macros are now available in all configurations.
Since disabling of interrupts is not enough to ensure system wide mutual
exclusion on SMP, a new low-level synchronization primitive was added - the
interrupt locks. They are a simple API layer on top of the SMP locks used for
low-level synchronization in the operating system core. Currently they are
implemented as a ticket lock. On uni-processor configurations they degenerate
to simple interrupt disable/enable sequences. It is disallowed to acquire a
single interrupt lock in a nested way. This will result in an infinite loop
with interrupts disabled. While converting legacy code to interrupt locks care
must be taken to avoid this situation.

.. code-block:: c
    :linenos:

    void legacy_code_with_interrupt_disable_enable( void )
    {
      rtems_interrupt_level level;

      rtems_interrupt_disable( level );
      /* Some critical stuff */
      rtems_interrupt_enable( level );
    }

    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" );

    void smp_ready_code_with_interrupt_lock( void )
    {
      rtems_interrupt_lock_context lock_context;

      rtems_interrupt_lock_acquire( &lock, &lock_context );
      /* Some critical stuff */
      rtems_interrupt_lock_release( &lock, &lock_context );
    }

The ``rtems_interrupt_lock`` structure is empty on uni-processor
configurations. Empty structures have a different size in C
(implementation-defined, zero in case of GCC) and C++ (implementation-defined
non-zero value, one in case of GCC). Thus the
``RTEMS_INTERRUPT_LOCK_DECLARE()``, ``RTEMS_INTERRUPT_LOCK_DEFINE()``,
``RTEMS_INTERRUPT_LOCK_MEMBER()``, and ``RTEMS_INTERRUPT_LOCK_REFERENCE()``
macros are provided to ensure ABI compatibility.
Highest Priority Task Assumption
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On a uniprocessor system, it is safe to assume that when the highest priority
task in an application executes, it will execute without being preempted until
it voluntarily blocks. Interrupts may occur while it is executing, but there
will be no context switch to another task unless the highest priority task
voluntarily initiates it.
Given the assumption that no other tasks will have their execution interleaved
with the highest priority task, it is possible for this task to be constructed
such that it does not need to acquire a binary semaphore or mutex for protected
access to shared data.
In an SMP system, it cannot be assumed there will never be a single task
executing. It should be assumed that every processor is executing another
application task. Further, those tasks will be ones which would not have been
executed in a uniprocessor configuration and should be assumed to have data
synchronization conflicts with what was formerly the highest priority task
which executed without conflict.
Disable Preemption
~~~~~~~~~~~~~~~~~~
On a uniprocessor system, disabling preemption in a task is very similar to
making the highest priority task assumption. While preemption is disabled, no
task context switches will occur unless the task initiates them
voluntarily. And, just as with the highest priority task assumption, there are
N-1 processors also running tasks. Thus the assumption that no other tasks will
run while the task has preemption disabled is violated.
Task Unique Data and SMP
------------------------
Per task variables are a service commonly provided by real-time operating
systems for application use. They work by allowing the application to specify a
location in memory (typically a ``void *``) which is logically added to the
context of a task. On each task switch, the location in memory is stored and
each task can have a unique value in the same memory location. This memory
location is directly accessed as a variable in a program.
This works well in a uniprocessor environment because there is one task
executing and one memory location containing a task-specific value. But it is
fundamentally broken on an SMP system because there are always N tasks
executing. With only one location in memory, N-1 tasks will not have the
correct value.
This paradigm for providing task unique data values is fundamentally broken on
SMP systems.
Classic API Per Task Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Classic API provides three directives to support per task variables. These are:
- ``rtems_task_variable_add`` - Associate per task variable
- ``rtems_task_variable_get`` - Obtain value of a a per task variable
- ``rtems_task_variable_delete`` - Remove per task variable
As task variables are unsafe for use on SMP systems, the use of these services
must be eliminated in all software that is to be used in an SMP environment.
The task variables API is disabled on SMP. Its use will lead to compile-time
and link-time errors. It is recommended that the application developer consider
the use of POSIX Keys or Thread Local Storage (TLS). POSIX Keys are available
in all RTEMS configurations. For the availablity of TLS on a particular
architecture please consult the *RTEMS CPU Architecture Supplement*.
The only remaining user of task variables in the RTEMS code base is the Ada
support. So basically Ada is not available on RTEMS SMP.
OpenMP
------
@ -521,6 +371,197 @@ the heir thread must be used during interrupt processing. For this purpose a
temporary per-processor stack is set up which may be used by the interrupt
prologue before the stack is switched to the interrupt stack.
Application Issues
==================
Most operating system services provided by the uni-processor RTEMS are
available in SMP configurations as well. However, applications designed for a
uni-processor environment may need some changes to run correctly in an SMP
configuration.
As discussed earlier, SMP systems offer true parallelism, which is not possible
on uni-processor systems. Consequently, several techniques that provide
adequate critical sections on uni-processor systems are unsafe on SMP systems.
This section discusses some of these unsafe techniques.
In general, applications must use the mutual exclusion mechanisms provided by
the operating system to ensure correct behavior.
Task variables
--------------
Task variables are ordinary global variables with a dedicated value for each
thread. During a context switch from the executing thread to the heir thread,
the value of each task variable is saved to the thread control block of the
executing thread and restored from the thread control block of the heir thread.
This is inherently broken if more than one executing thread exists.
Alternatives to task variables are POSIX keys and :ref:`TLS <TLS>`. All use
cases of task variables in the RTEMS code base were replaced with alternatives.
The task variable API has been removed in RTEMS 4.12.
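The two alternatives can be sketched as follows. This is a minimal example, assuming a toolchain with C11 ``_Thread_local`` support and POSIX threads. The names ``tls_value``, ``key``, ``worker`` and ``per_thread_values_are_independent`` are hypothetical and not part of RTEMS. Each thread, including the caller, stores its own value in a TLS object and under a POSIX key. The stores done by the worker threads do not disturb the values of the calling thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* C11 thread-local storage: every thread has its own instance. */
static _Thread_local intptr_t tls_value;

/* POSIX key: one value slot per thread, selected via the key. */
static pthread_key_t key;

static void *worker( void *arg )
{
  intptr_t id = (intptr_t) arg;
  int      error;

  /* These stores affect only the calling worker thread. */
  tls_value = id;
  error = pthread_setspecific( key, (void *) ( id * 10 ) );
  assert( error == 0 );

  /* Report (non-NULL on success) whether this thread reads its own values. */
  return (void *) (intptr_t) (
    tls_value == id && (intptr_t) pthread_getspecific( key ) == id * 10
  );
}

/* Returns 1 if each thread observed only its own values, otherwise 0. */
int per_thread_values_are_independent( void )
{
  pthread_t threads[ 2 ];
  void     *result;
  int       ok = 1;
  int       error;
  intptr_t  i;

  error = pthread_key_create( &key, NULL );
  assert( error == 0 );

  /* Values which belong to the calling thread. */
  tls_value = 100;
  error = pthread_setspecific( key, (void *) (intptr_t) 1000 );
  assert( error == 0 );

  for ( i = 0; i < 2; ++i ) {
    error = pthread_create( &threads[ i ], NULL, worker, (void *) ( i + 1 ) );
    assert( error == 0 );
  }

  for ( i = 0; i < 2; ++i ) {
    error = pthread_join( threads[ i ], &result );
    assert( error == 0 );
    ok = ok && result != NULL;
  }

  /* The values of the calling thread are unchanged. */
  ok = ok && tls_value == 100
    && (intptr_t) pthread_getspecific( key ) == 1000;

  error = pthread_key_delete( key );
  assert( error == 0 );

  return ok;
}
```

TLS objects are accessed like ordinary variables, while POSIX keys need an explicit get and set call for each access.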
Highest Priority Thread Never Walks Alone
-----------------------------------------
On a uni-processor system, it is safe to assume that when the highest priority
task in an application executes, it will execute without being preempted until
it voluntarily blocks. Interrupts may occur while it is executing, but there
will be no context switch to another task unless the highest priority task
voluntarily initiates it.
Given the assumption that no other tasks will have their execution interleaved
with the highest priority task, it is possible for this task to be constructed
such that it does not need to acquire a mutex for protected access to shared
data.
In an SMP system, it cannot be assumed that only a single task executes at a
time. It should be assumed that every processor is executing another
application task. Further, these tasks are ones which would not have been
executing in a uni-processor configuration, and they should be assumed to have
data synchronization conflicts with what was formerly the highest priority
task, which executed without conflict.
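As an illustration, shared data which a highest priority task used to access without protection can be guarded by a mutex. The following is a minimal sketch using POSIX mutexes. The names ``shared_counter``, ``counter_mutex``, ``increment_task`` and ``run_two_incrementing_threads`` are hypothetical. Two threads which may run in parallel on different processors update the same counter, and the mutex ensures that no update is lost.

```c
#include <assert.h>
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long   shared_counter;

static void *increment_task( void *arg )
{
  int i;

  (void) arg;

  for ( i = 0; i < INCREMENTS_PER_THREAD; ++i ) {
    int error;

    error = pthread_mutex_lock( &counter_mutex );
    assert( error == 0 );

    /* Critical section: only one thread at a time updates the counter. */
    ++shared_counter;

    error = pthread_mutex_unlock( &counter_mutex );
    assert( error == 0 );
  }

  return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
unsigned long run_two_incrementing_threads( void )
{
  pthread_t threads[ 2 ];
  int       error;
  int       i;

  shared_counter = 0;

  for ( i = 0; i < 2; ++i ) {
    error = pthread_create( &threads[ i ], NULL, increment_task, NULL );
    assert( error == 0 );
  }

  for ( i = 0; i < 2; ++i ) {
    error = pthread_join( threads[ i ], NULL );
    assert( error == 0 );
  }

  return shared_counter;
}
```

Without the lock and unlock calls, the increments of the two threads could interleave and the final value would be unpredictable.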
Disabling of Thread Pre-Emption
-------------------------------
A thread which disables pre-emption prevents a higher priority thread from
taking over its processor involuntarily. In uni-processor configurations, this
can be used to ensure mutual exclusion at thread level. In SMP configurations,
however, more than one executing thread may exist. Thus, it is impossible to
ensure mutual exclusion using this mechanism. To prevent applications which
disable pre-emption for this purpose from showing inappropriate behaviour, this
feature is disabled in SMP configurations and its use would cause run-time
errors.
Disabling of Interrupts
-----------------------
A low-overhead means to ensure mutual exclusion in uni-processor
configurations is to disable interrupts around a critical section. This
is commonly used in device driver code. In SMP configurations, however,
disabling the interrupts on one processor has no effect on other processors.
So, this is insufficient to ensure system-wide mutual exclusion. The macros
* :ref:`rtems_interrupt_disable() <rtems_interrupt_disable>`,
* :ref:`rtems_interrupt_enable() <rtems_interrupt_enable>`, and
* :ref:`rtems_interrupt_flash() <rtems_interrupt_flash>`
are disabled in SMP configurations and their use will cause compile-time warnings
and link-time errors. In the unlikely case that interrupts must be disabled on
the current processor, the
* :ref:`rtems_interrupt_local_disable() <rtems_interrupt_local_disable>`, and
* :ref:`rtems_interrupt_local_enable() <rtems_interrupt_local_enable>`
macros are now available in all configurations.
Since disabling of interrupts is insufficient to ensure system-wide mutual
exclusion on SMP, a new low-level synchronization primitive was added:
interrupt locks. The interrupt locks are a simple API layer on top of the SMP
locks used for low-level synchronization in the operating system core.
Currently, they are implemented as a ticket lock. In uni-processor
configurations, they degenerate to simple interrupt disable/enable sequences by
means of the C pre-processor. It is disallowed to acquire a single interrupt
lock in a nested way. This will result in an infinite loop with interrupts
disabled. While converting legacy code to interrupt locks, care must be taken
to avoid this situation.

.. code-block:: c
    :linenos:

    #include <rtems.h>

    void legacy_code_with_interrupt_disable_enable( void )
    {
      rtems_interrupt_level level;

      rtems_interrupt_disable( level );
      /* Critical section */
      rtems_interrupt_enable( level );
    }

    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" )

    void smp_ready_code_with_interrupt_lock( void )
    {
      rtems_interrupt_lock_context lock_context;

      rtems_interrupt_lock_acquire( &lock, &lock_context );
      /* Critical section */
      rtems_interrupt_lock_release( &lock, &lock_context );
    }

POSIX spinlocks are an alternative to the RTEMS-specific interrupt locks. The
:c:type:`pthread_spinlock_t` is defined as a self-contained object, i.e. the
user must provide the storage for this synchronization object.

.. code-block:: c
    :linenos:

    #include <assert.h>
    #include <pthread.h>

    pthread_spinlock_t lock;

    void smp_ready_code_with_posix_spinlock( void )
    {
      int error;

      error = pthread_spin_lock( &lock );
      assert( error == 0 );
      /* Critical section */
      error = pthread_spin_unlock( &lock );
      assert( error == 0 );
    }

In contrast to the POSIX spinlock implementations on Linux or FreeBSD, it is not
allowed to call blocking operating system services inside the critical section.
A recursive lock attempt is a severe usage error resulting in an infinite loop
with interrupts disabled. Nesting of different locks is allowed. The user
must ensure that no deadlock can occur. As a non-portable feature, the locks
are zero-initialized, e.g. statically initialized global locks reside in the
``.bss`` section and there is no need to call :c:func:`pthread_spin_init`.
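Code which must also run on other POSIX systems cannot rely on zero-initialized locks. A portable variant, sketched here under the assumption of a POSIX.1-2008 environment, initializes the lock explicitly with ``pthread_spin_init()``. The names ``portable_lock``, ``protected_value`` and ``portable_spinlock_example`` are hypothetical.

```c
/* Request the POSIX.1-2008 interfaces, which include the spinlocks. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>

static pthread_spinlock_t portable_lock;
static int                protected_value;

/* Increments the protected value under the lock and returns the new value. */
int portable_spinlock_example( void )
{
  int error;
  int value;

  /* Explicit initialization, required by POSIX before the first use. */
  error = pthread_spin_init( &portable_lock, PTHREAD_PROCESS_PRIVATE );
  assert( error == 0 );

  error = pthread_spin_lock( &portable_lock );
  assert( error == 0 );

  /* Critical section: keep it short and do not call blocking services. */
  value = ++protected_value;

  error = pthread_spin_unlock( &portable_lock );
  assert( error == 0 );

  error = pthread_spin_destroy( &portable_lock );
  assert( error == 0 );

  return value;
}
```

In a real application the lock would typically be initialized once during system start and not in every call. The initialization is placed inside the function only to keep the sketch self-contained.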
Interrupt Service Routines Execute in Parallel With Threads
-----------------------------------------------------------
On a machine with more than one processor, interrupt service routines (this
includes timer service routines installed via :ref:`rtems_timer_fire_after()
<rtems_timer_fire_after>`) and threads can execute in parallel. Interrupt
service routines must take this into account and use proper locking mechanisms
to protect critical sections from interference by threads (interrupt locks or
POSIX spinlocks). This likely requires code modifications in legacy device
drivers.
Timers Do Not Stop Immediately
------------------------------
Timer service routines run in the context of the clock interrupt. On
uni-processor configurations, it is sufficient to disable interrupts and remove
a timer from the set of active timers to stop it. In SMP configurations,
however, the timer service routine may already be running and waiting on an SMP lock
owned by the thread which is about to stop the timer. This opens the door to
subtle synchronization issues. During destruction of objects, special care
must be taken to ensure that timer service routines cannot access (partly or
fully) destroyed objects.
False Sharing of Cache Lines Due to Objects Table
-------------------------------------------------
The Classic API and most POSIX API objects are indirectly accessed via an
object identifier. The user-level functions validate the object identifier and
map it to the actual object structure which resides in a global objects table
for each object class. So, unrelated objects are packed together in a table.
This may result in false sharing of cache lines. The effect of false sharing
of cache lines can be observed with the `TMFINE 1
<https://git.rtems.org/rtems/tree/testsuites/tmtests/tmfine01>`_ test program
on a suitable platform, e.g. QorIQ T4240. High-performance SMP applications
need full control of the object storage :cite:`Drepper:2007:Memory`.
Therefore, self-contained synchronization objects are now available for RTEMS.
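With user-provided storage, the application can place each synchronization object together with the data it protects on its own cache line. The following is a minimal sketch of this idea, not the RTEMS implementation. It assumes a cache line size of 64 bytes (``CACHE_LINE_SIZE`` is a hypothetical constant which must match the target) and uses ``pthread_mutex_t`` merely as a stand-in for a synchronization object whose storage is provided by the user. The names ``per_processor_data`` and ``elements_use_distinct_cache_lines`` are hypothetical as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, adjust this for the actual target. */
#define CACHE_LINE_SIZE 64

#define ELEMENT_COUNT 4

/*
 * The alignment of the first member forces the alignment of the whole
 * structure.  The compiler pads the structure size to a multiple of the
 * alignment, so array elements never share a cache line.
 */
typedef struct {
  alignas( CACHE_LINE_SIZE ) pthread_mutex_t mutex;
  unsigned long                              counter;
} per_processor_data;

static per_processor_data data[ ELEMENT_COUNT ];

/* Returns 1 if no two elements of data[] share a cache line, otherwise 0. */
int elements_use_distinct_cache_lines( void )
{
  size_t i;

  for ( i = 0; i + 1 < ELEMENT_COUNT; ++i ) {
    uintptr_t last_byte = (uintptr_t) &data[ i ] + sizeof( data[ i ] ) - 1;
    uintptr_t next_begin = (uintptr_t) &data[ i + 1 ];

    if ( last_byte / CACHE_LINE_SIZE >= next_begin / CACHE_LINE_SIZE ) {
      return 0;
    }
  }

  return 1;
}
```

The price for avoiding false sharing in this way is memory: each element occupies at least one full cache line.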
Directives
==========


@ -284,3 +284,7 @@
year = {2016},
url = {https://hal.archives-ouvertes.fr/hal-01295194/document},
}
@misc{RTEMS:CPU,
title = {{RTEMS CPU Architecture Supplement}},
url = {https://docs.rtems.org/branches/master/cpu-supplement.pdf},
}