
e-mail: <iripoll@disca.upv.es>
e-mail: <pisa@cmp.felk.cvut.cz>
e-mail: <luca@ocera.org>
e-mail: <pj@sssup.it>
e-mail: <lANUSSE@ortolan.cea.fr>
e-mail: <ssaez@disca.upv.es>
Copyright © 2002 OCERA
Table 1. Project Co-ordinator
Organisation: | UPVLC |
Responsible person: | Alfons Crespo |
Address: | Camino Vera, 14. CP: 46022, Valencia, Spain |
Phone: | +34 9877576 |
Fax: | +34 9877579 |
| E-mail: | alfons@disca.upv.es |
Table 2. Participant List
| Role | Id. | Name | Acronym | Country |
|---|---|---|---|---|
| CO | 1 | Universidad Politécnica de Valencia | UPVLC | E |
| CR | 2 | Scuola Superiore S. Anna | SSSA | I |
| CR | 3 | Czech Technical University in Prague | CTU | CZ |
| CR | 4 | CEA | CEA | FR |
| CR | 5 | UNICONTROLS | UC | CZ |
| CR | 6 | MNIS | MNIS | FR |
| CR | 7 | VISUAL TOOLS S.A. | VT | E |
The objective of this workpackage is to make a study of the state of the art of real-time technology which is made available by the research community, and to determine what types of mechanisms actually turn out to be most useful for real-time applications. In concrete, this workpackage will analyse the real-time operating systems (RTOS) features and extract the main characteristics that will be included in the OCERA development.
The project is to produce and integrate prototypes of various innovative real-time techniques, research areas of particular interest include scheduling, resource management, fault-tolerance and communication with real-time constraints, which are identified as the key elements to provide predictable and high performance distributed real-time operating systems.
RTOS is a generic term for a set of operating systems that provide support for real-time applications. There is a wide range of RTOS, from the small and simple enough to fit in a few kilobytes of memory that can run on simple processors, to the high-end range RTOS that provides full graphical user interface that require several megabytes of RAM and powerful processors (MMU, protected mode, etc.).
Originally, Linux was designed to be used in a server or desktop environment. Since then, Linux has evolved and grow to be used in almost all the computer areas, among others, in embedded systems, parallel clusters, realtime systems, etc.
There is a large group of researchers, hackers and companies adding realtime capabilities: reducing the memory requirement, porting Linux to embedded processors, improving the response time, etc. Real Time Linux is just another piece of software developed in a Open-Source development methodology. There is not a single company or research group that concentrates all the development of realtime Linux, but a set of not connected, overlapped or even rival implementations are taking place simultaneously.
The two main contributions of this working package are:
A list of features that are available in commercial RTOS.
A detailed description of the real-time features already implemented in Linux. Features that are included in the Linux kernel by default or that are distributed separately.
The rest of this deliverable is organised as follows: In the first part a summary of the POSIX and OSEK standard followed by the list of main features we are interested and are taken into consideration in the RTOS analysis. Several summary tables are presented in the last section of the fist part. In the second part of the deliverable a study of each RTOS is presented.
The conclusions of this deliverable can be found in the deliverable D3.2 "Functionality Not Available in Open RTOS" of the Workpackage 3 "Market Analysis".
POSIX, that stands for Portable Operating System Interface, is a standard that is being jointly developed by the IEEE and The Open Group. It defines a standard operating system interface and environment, including a command interpreter (or "shell"), and common utility programs to support applications portability at the source code level. The current revision of POSIX is The Open Group Base Specifications Issue 6 and also the IEEE Std 1003.1-2001.
This standard is composed by four major components:
Base Definitions: This include general terms, concepts and interfaces common to entire standard.
System Interfaces: This comprises the definitions for system service functions for the C programming language, function and portability issues, error handling and recovery.
Shell and Utilities: It contains the definitions for a standard source code-level interface to command interpretation services.
Rationale: It contains information that does not fit well into the rest of the document structure.
The IEEE Std 1003.1-2001 standard is a single common revision to IEEE Std 1003.1-1996, IEEE Std 1003.2-1992, and the Base Specifications of The Open Group Single UNIX Specification, Version 2. In order to develope the current revision several base documents has been used. The base documents that are involved in the definition of system interfaces are:
IEEE Std 1003.1-1996 (POSIX-1) (incorporating IEEE Stds 1003.1-1990, 1003.1b-1993, 1003.1c-1995, and 1003.1i-1995)
The following amendments to the POSIX.1-1990 standard:
IEEE P1003.1a draft standard (Additional System Services)
IEEE Std 1003.1d-1999 (Additional Realtime Extensions)
IEEE Std 1003.1g-2000 (Protocol-Independent Interfaces (PII))
IEEE Std 1003.1j-2000 (Advanced Realtime Extensions)
IEEE Std 1003.1q-2000 (Tracing)
Open Group Technical Standard, February 1997, System Interface Definitions, Issue 5 (XBD5)
Open Group Technical Standard, February 1997, System Interfaces and Headers, Issue 5 (XSH5)
Open Group Technical Standard, January 2000, Networking Services, Issue 5.2 (XNS5.2)
ISO/IEC 9899:1999, Programming Languages - C.
As it can be observed this standard includes support for source portability of applications with realtime requirements, but this support is maintly optional for POSIX-conforming implementations. The specific functional areas included for realtime support and their definitions[1] are basically the following:
Semaphores.
A minimum synchronization primitive to serve as a basis for more complex synchronization mechanisms to be defined by the application program.
Process Memory Locking.
A performance improvement facility to bind application programs into the high-performance random access memory of a computer system. This avoids potential latencies introduced by the operating system in storing parts of a program that were not recently referenced on secondary memory devices.
Memory Mapped Files.
A facility to allow applications to access files as part of the address space.
Shared Memory Objects
An object that represents memory that can be mapped concurrently into the address space of more than one process.
Priority Scheduling.
A performance and determinism improvement facility to allow applications to determine the order in which threads that are ready to run are granted access to processor resources.
Realtime Signal Extension.
A determinism improvement facility to enable asynchronous signal notifications to an application to be queued without impacting compatibility with the existing signal functions.
Timers.
A mechanism that can notify a thread when the time as measured by a particular clock has reached or passed a specified value, or when a specified amount of time has passed.
Interprocess Communication.
A functionality enhancement to add a high-performance, deterministic interprocess communication facility for local communication.
Synchronized Input and Output.
A determinism and robustness improvement mechanism to enhance the data input and output mechanisms, so that an application can ensure that the data being manipulated is physically present on secondary mass storage devices.
Asynchronous Input and Output.
A functionality enhancement to allow an application process to queue data input and output commands with asynchronous notification of completion.
Another optional support that can be interesting for the developemnt of embedded applications is the Threads support. This extension to POSIX defines functionality to support multiple flows of control within a process. These flows of control are called threads and they share their address space and most of the resources and attributes defined in the operating system for the owner process.
The specific functional areas included in threads support are:
Thread management: the creation, control, and termination of multiple flows of control that share a common address space.
Synchronization primitives optimized for tightly coupled operation of multiple control flows in a common, shared address space.
Finally, the IEEE Std 1003.1-2001 standard also proposes a set of tracing facilities that can be quite useful at the development stage of a embedded real-time application. The tracing facilities defined in the standard allow a process to select a set of trace event types, to activate a trace stream of the selected trace events as they occur in the flow of execution, and to retrieve the recorded trace events. A trace event is a data object that represents an action executed by the system, and that is recorded in a trace stream. The trace events can be retrieved later from the trace stream, allowing the system behaviour analysis.
All these functionalities are not mandotory in a POSIX-conforming implementation, but defined as a set of options that may be supported by that system. In this line, the IEEE Std 1003.1-2001 standard defines several XSI extensions that groups together several of these options in so-called XSI Option Groups. The option groups that can be of interest for embedded real-time applications are described in the next section.
The X/Open System Interface is the core application programming interface for systems conforming to the Single UNIX Specification. This is a superset of the mandatory requirements for conformance to IEEE Std 1003.1-2001.
A system that wants to be a XSI-conforming implementation shall meet the criteria for POSIX conformance and support all functions and headers defined in IEEE Std 1003.1-2001 as part of the XSI extension. Additionally, it shall support the following options defined in the standard: File Synchronization, Memory Mapped Files, Memory Protection, Threads, Thread Process-Shared Synchronization, Thread Stack Address Attribute and Thread Stack Address Size.
Despite of these options that are mandatory to be a XSI-conforming system, the system may also support one or more of the following XSI Option Groups: Encryption, Realtime, Advanced Realtime, Realtime Threads, Advanced Realtime Threads, Tracing, XSI STREAMS and Legacy. All the realtime option groups jointly with the tracing option group are clearly of great interest for developing embedded real-time applications. The options each option groups requires are detailed next:
The options of IEEE Std 1003.1-2001 that are grouped together in this option group are: Asynchronous, Synchronized and Prioritized Input and Output, File Synchronization, Memory Mapped Files, Shared Memory Objects, Process and Range Memory Locking, Memory Protection, Semaphores, Timers, Realtime Signals Extension, Message Passing and Process Scheduling.
The options of the standard that are grouped together in the Advanced Realtime option group are: Advisory Information, Clock Selection, Process CPU-Time Clocks, Monotonic Clock, Timeouts, Typed Memory Objects, Spawn and Process Sporadic Server.
This option group includes the following options: Thread Priority Inheritance and Protection, and Thread Execution Scheduling.
The Advanced Realtime Threads option group requires the following POSIX options: Thread CPU-Time Clocks, Thread Sporadic Server, Spin Locks and Barriers.
This option group includes the following tracing facility options: Trace, Trace Event Filter, Trace Inherit and Trace Log.
OSEK/VDX [2] is a joint project of the automotive industry that aims to the definition of an industry standard for an open-ended architecture for distributed control units in vehicles.
The objective of the standard is to describe an environment which supports efficient utilization of resources for automotive control unit application software. This standard can be viewed as a set of API for real-time operating system (OSEK) integrated on a network management system (VDX) that together describes the characteristics of a distributed environment that can be used for developing automotive applications.
The typical applications that have to be implemented have tight real-time constraints and an high criticality (for example, a power-train application). Moreover, these applications have to be made in a huge number of unit, therefore there is a need to reduce the memory footprint to a minimum enhancing as possible the OS performance.
Here are some keywords that helps to better characterize the philosophy that drove the main architectural choices of the OSEK Operating System:
The operating system is intended for use on a wide range control units (either system with minimal hardware resources like RAM, ROM, CPU time, i.e. 8 bit microcontrollers). To support a wide range of systems the standard defines four conformance classes that tightly specifies the main features of an OS. Note that memory protection is not supported at all.
The standard specifies an ISO/ANSI-C interface between the application and the operating system that is identical in all the implementations of the OS. The aim of this interface is to give the ability to transfer an application software from one ECU to another ECU without bigger changes inside the application. Due to the wide variety of hardware where the OS has to work in, the standard does not specify any interface for the Input/Output subsystem. Note that this fact reduces (if not prohibits) the portability of the application source code, since the I/O system is one of the main software part that impacts on the architecture of the software. We can say that the prime focus is not to achieve 100% compatibility between the application modules, but to ease their direct portability between compliant operating systems.
Another prerequisite needed to adapt the OS to a wide range of hardware is a high degree of modularity and configurability. This configurability is reflected by the toolchain proposed by the OSEK standard, where some configuration tools help the designer in tuning the system services and the system footprint. Moreover, a language called OIL (OSEK Implementation Language) is proposed to help the definition of a standardized configuration information.
All the OS objects and features are statically allocated. This fact allow to simplify all the OS: the number of application tasks, resources and services requested are defined at compile time. Note that this approach ease the implementation of an OS capable of running on ROM, and moreover it is completely different from a dynamic approach followed in other OS standards like for example POSIX.
The OSEK Standard provides the specification of OSEKTime OS, a time triggered OS that can be fully integrated in the OSEK/VDX framework.
In the following sections the main features of the OSEK/VDX standard will be analyzed in detail.
The architecture on which an OSEK Operating System is based can be viewed as a traditional fixed priority approach.
Each task in the system can be a basic task (BT) or an extended task (ET) (extended tasks are basic tasks that can react to external asynchronous events).
Every task in the system has assigned a fixed priority (statically assigned at compile time), and the scheduler always selects the higher priority task from the ready task queue. Interrupt service routines typically preempt the running task (except in case the running task uses resources).
To provide support for different features in the Operating system, the various requirements of the application in terms of number of tasks, memory consumption and like are listed in four conformance classes. The compliance of an OSEK OS is always stated with respect to one conformance class. Basically, conformance classes exist to allow partial implementations of the standard along pre-defined lines, creating an upgrade path from classes of lesser functionality to classes of higher functionality with no change to the application tasks.
The conformance classes specifies different requirements for the following attributes:
Multiple requesting task activations (only one activation or more than one)
Task types (basic tasks only or basic and extended tasks)
Number of tasks per priority (one or more than one)
The following conformance classes are defined by the standard :
Only basic tasks limited to one activation request per task and one task per priority, while all tasks have different priorities.
Like BCC1, plus more than one activation request per task and more than one task per priority.
Like BCC1, plus extended tasks.
Like ECC1, plus more than one task per priority and multiple requesting of task activation allowed for basic tasks.
In the OSEK OS, a task provides the framework for the concurrent and asynchronous execution of functions. The Scheduler is then responsible for scheduling tasks following a well defined scheduling algorithm.
The OSEK operating system provides two kind of tasks: basic tasks and extended tasks. The only difference between the two concepts is that extended tasks are allowed to use the operating system call WaitEvent(). Basically that call allow an extended task to release the CPU waiting for an asynchronous event without terminating the current instance.
Each task in the system has assigned a fixed priority (statically assigned at compile time; the value 0 is defined as the lowest priority of a task), and it can be preemptive or non-preemptive. If the running task is preemptive the scheduler always made a preemption when needed, otherwise it reschedules the system at the end of the running task instance. A preemptive task can disable preemption for a while locking a resource called RES_SCHEDULER.
In any moment of its life a task is characterized by its state. The OSEK standard defines four task states:
In the running state, the CPU is assigned to the task, so that its instructions can be executed. Only one task can be in this state at any point in time, while all the other states can be adopted simultaneously by several tasks.
All functional prerequisites for a transition into the running state exist, and the task only waits for allocation of the processor. The scheduler decides which ready task is executed next.
A task cannot continue execution because it has to wait for at least one event. Only Extended tasks can jump into this state (because they are the only that can use events).
In the suspended state the task is passive and can be activated.
Note that basic tasks have no waiting state: a basic task can only represent a synchronization point at the beginning and at the end of the task. Application parts with internal synchronization points have to be implemented by more than one basic task. An advantage of extended tasks is that they can handle a coherent job in a single task, no matter which synchronization requests are active. Whenever current information for further processing is missing, the extended task switches over into the waiting state. It exits this state whenever corresponding events signal the receipt or the update of the desired data or events.
Depending on the conformance class a basic task can be activated once or multiple times. The latter means that an activation issued when a task is not in the suspended state will be recorded and then executed when the task will finish the current instance.
The termination of a task instance only happen when a task terminates itself (to simplify the OS, no explicit task kill primitives are provided).
The OSEK Operating system gives a support for Application Modes. In real applications, an embedded system may execute different applications in a mutually exclusive way (for example, the normal operation, a factory test, and like). The application mode is a means to structure the software running in the system according to those different conditions and are a clean mechanism for development of totally separate systems. Once the operating system has been started, it is not allowed to change the application mode. Typically each application mode uses its own subset of tasks, ISRs, alarms and timing conditions, although if some kind of sharing between modes is possible.
The start up performance is another safety critical issue for embedded system in automotive applications since reset conditions may occur during normal operation (for example, a power-train application should be capable of rebooting the whole system in a few microseconds, because the system must safely control the spark on the engine cylinders). The system startup is completely left to the particular implementation, although if some hint is given on how design the boot-up sequence. In any case the standard suggest the avoidance of lengthy or complicated starting procedures.
Since the standard must be suitable for different microcontrollers, the specification of interrupt handling routines only cover the general approach that a compliant OS should follow, without coping with any hardware related issues.
In particular, the standard provides two kind of ISR handlers:
The ISR does not use an operating system service. In practice, the OS does not handle these interrupts, and the designer is free to write his handler, with the only restriction that he can not call any OS service. Typically, these are the fastest highest priority interrupts.
The ISR is handled by the system, so OS calls can be called from the handler.
Inside any ISR no rescheduling will take place. Rescheduling takes place on termination of the ISR category 2 if a preemptable task has been interrupted and if no other interrupt is active. At the end of the ISR category 1 no rescheduling takes place too, and this is the reason because ISR category 1 should have the highest priority in a correct design.
The event mechanism is only provided for extended tasks and can be used to communicate binary information that synchronize these tasks on asynchronous events. Each extended task owns a set of events, that can be triggered by other (basic and extended) tasks or by ISR of category 2.
The typical behavior of an extended task is to wait for asynchronous events calling the OS service WaitEvent(). This service usually blocks the task until an event arrives. After servicing the event, the task calls again WaitEvent() to wait other events.
Events can be set only if the task is not in the suspended state. This seems to suggest that an extended task should never be in the suspended state.
The scheduler decides on the basis of the task priority which is the next of the ready tasks to be transferred into the running state (dynamic priority management is not supported). Tasks on the same priority level are started depending on their order of activation.
The OSEK standard provides four flavors of fixed priority scheduling, outlined below:
Full preemptive scheduling means that the running task may be rescheduled at any instruction by the arrival of high priority tasks.
Non preemptive scheduling means that task switching is only performed via one of a selection of explicitly defined system services (like task termination, explicit call to the scheduler and arrival of an event that wakes up an extended task).
Since preemptiveness is a task attribute, preemptive and non-preemptive tasks can be mixed in the same application. The running task will influence the policy really used.
This scheduling policy is very similar to the preemption threshold technology, where threshold values are implemented using the OSEK Priority Ceiling protocol together with internal resources locked and unlocked at the start and at the end of every task instance.
The standard provides support for binary resources that can be used to implement critical sections. Priority inversion and deadlock are avoided using a variant of the SRP called OSEK Priority Ceiling.
The protocol in fact is a version of SRP adapted to fixed priority:
every resource as assigned a ceiling that is the maximum priority of the tasks (and ISRs) that use the resource;
when a task requires a resource, its current priority is raised to the ceiling of the resource;
when a task releases a resource, the priority of this task is reset to the priority which was dynamically assigned before requiring that resource.
The normal properties of SRP applies to the protocol. In particular, Priority inversion, chained blocking and deadlocks are avoided. Moreover, there is no need for waiting queues, since a task can be scheduled only when all the resources it needs are free.
Resources are typically used by task only. In the OSEK standard, resources can be used either by a task or by an ISR of category 2. An ISR that use a resource can be thought as an high priority task: its execution can be delayed due to lower ISRs or tasks accessing resources with ceiling greater or equal than the IRS priority. This is the natural behavior in those systems where tasks activations and priorities are mapped on interrupts, and the raising of task's priorities is done with a proper programming of the interrupt controller.
The OSEK standard also provides a support for Preemption Thresholds through the use of internal resources. An internal resource is simply a resource that is locked when a task instance starts, and is unlocked when the task instance ends. The ceiling of the internal resources can be thought as the Preemption Threshold of the tasks.
In the same way the standard provides a special resource called RES_SCHEDULER that can be used to disable preemption. In practice, the RES_SCHEDULER is a resource with ceiling equal to the maximum priority in the system. In the same way a non preemptive task can be thought as a task that use an internal resource with the same ceiling of RES_SCHEDULER.
Finally note that, although a technique similar to Preemption Threshold is used, stack sharing between tasks of a same Non Preemption Group can not be exploited due of the OS calls WaitEvent() and Schedule() [3]. In fact, these calls releases the internal resource taken by a task, letting execution to more than one task in the same Non Preemption Group.
The OSEK operating system provides services for processing recurring events (for example, timers that provide an interrupt at regular intervals, or encoders at axles that generate an interrupt in case of a constant change of an angle). These events are recorded into implementation dependent counters, then used by software alarms. When an alarm (that can be one-shot or periodic) fires, a task can be activated, or an event can be set, or finally an alarm-callback routine can be called. Alarms and counters are statically defined at compile time. The only dynamic parameters that can be set are when an alarm has to expire and the period of a cyclic alarm.
To ease the tracing and the debugging of the system the OSEK standard provides system specific hook routines to allow user-defined actions within the OS internal processing. These hook routines are called by the operating system and they are composed by user code that is executed into an OS primitive, usually with ISR of category 2 disabled. These routines are only allowed to use a subset of API functions (mainly they can use functions for get internal OS states, to ease the tracing of the application). They are called at system startup, at system shutdown, before and after a preemption, and in case of an error. In particular, two different kinds of errors are distinguished:
The operating system could not execute the requested service correctly, but assumes the correctness of its internal data.
The operating system can no longer assume correctness of its internal data. In this case the operating system calls the centralized system shutdown.
The standard gives two ways of handling errors: a centralized way (using an Error Hook that is called every time an error occurs in a system primitive), and a decentralized way (where the application code must check itself for the correctness of the return value of every primitive).
The OSEK standard comprises also an agreement on interfaces and protocols for in-vehicle communication called OSEK COM. The term in-vehicle communication means both communication between nodes and internal communication in a node of the whole vehicle. The basic idea is to provide a standardized API for software communication that is independed from the particular communication media used in a way to ease porting of applications between different hardwares.
The OSEK COM standard is composed by:
An Interaction layer which provides communication services for the transfer of application messages.
A Network layer which provides services for the unacknowledged and segmented transfer of application messages. The network layer provides flow control mechanisms to enable interfacing of communication peers featuring different level of performance and capabilities.
A Data link layer interface which provides services for the unacknowledged transfer of individual data packets over a network to the layers above.
OSEK COM provides a rich set of communication facilities but it is likely that many applications will only require a subset of this functionality. For that reason, the standard defines a set of conformance classes to enable the integration of OSEK COM in systems featuring various levels of capabilities in a scalable way, enabling the car producer to integrate software parts produced by different suppliers.
OSEK COM defines these levels as Communication Conformance Classes (CCCs). The main purpose of the conformance classes is to ensure that applications which have been for a particular conformance class are portable across different OSEK implementations and ECUs featuring that same or higher level of communication functionality. OSEK COM defines five communication conformance classes to provide support from ECU internal communication only (CCCA) up to inter-ECU external communication (CCC2).
For an OSEK implementation to be compliant, message handling for intra processor communication has to be offered. The minimum functionality required is CCCA as described in the OSEK COM specification.
The OSEK standard also cover a standardization of basic and non-competitive infrastructure between the various embedded systems that can be present in a vehicle. In fact, very often electronic control units made by different manufacturers are networked within vehicles by serial data communication links.
For that reason the standard propose a Network Management system (OSEK NM) that provides standardised features which ensure the functionality of inter-networking by standardised interfaces.
The essential task of NM is to ensure the safety and the reliability of a communication network. This is obtained implementing access restriction to each node (access must be restricted only from authorized entities), keeping the whole network tolerant to faults, and implementing diagnostic features capable of monitoring the status of the network in an indirect (monitoring application messages) or in a direct way (monitoring by dedicated NM communication using token principle).
Moreover, the network management also cover the initialization of network resources, network configuration the co-ordination of global operation modes (e.g. network wide sleep mode), and a support for diagnosis.
The OSEK Standard produced a specification for a Time Triggered Operating System (OSEKtime OS) that aims to represent a uniform functioning environment for single processor distributed embedded control units with a fault-tolerant communication layer.
The OSEKtime operating system supports static scheduling and offers all basic services for real-time applications, i.e., interrupt handling, dispatching, system time and clock synchronization, local message handling, and error detection mechanisms.
The OSEKtime operating system serves as a basis for application programs which are independent of each other, and provides their environment on a processor. The are two types of entities: interrupt services routines managed by the operating system and time triggered tasks.
The Osek Operating system can coexist with a OSEKTime OS, for handling both time triggered and event-driven computations on the same Embedded Control Unit. Basically, the interface of the OSEK OS remains the same (apart from some small changes in the startup/shutdown procedures). The main concept is that the OSEKTime OS assigns its idle time to the OSEK OS.
The OSEK OS, its tasks and its interrupts have always a lower priority than the similar entities in the OSEKTime OS. Non-preemptive tasks remains non-preemptive only in the OSEK/VDX domain, and they can be preempted bt the Time Triggered dispatcher and by the Time Triggered Tasks.
One of the first conclusions achieved in the first project meeting was then importance of the standards, and in particular the realtime extensions defined in the POSIX standard. POSIX is a mature, well developed, independent set of standards then are followed by most of the UNIX industry. Also the use of already existing standards in the open source community is a must.
OSEK is a RTOS specification designed to fulfil the requirement of the automotive industry. It was designed for systems with small hardware resources like 8 bit processes with no MMU. Although there are some ports of Linux to small processors like MC68000 family, Linux and RTLinux are specially designed for mid-range to high range processors. This is why we decided to select POSIX as the reference standard used in this analysis.
POSIX stands for stands for Portable Operating System Interface, and is an IEEE standard designed to facilitate application portability. It is the codification and standardisation of the common core of UNIX™ APIs, and realtime OS. A detailed list and status of all the sub-standards that are part of POSIX can be found in the PACS web page. The subset of POSIX standards that are related to real time or embedded systems are listed below:
1003.1b: Realtime Extensions (IEEE Approved)
1003.1c: Threads (IEEE Approved)
1003.1d: Additional Realtime Extensions (IEEE Approved)
1003.1j: Advanced Realtime Extensions (IEEE Approved)
1003.1q: Tracing (IEEE Approved)
1003.5: Ada binding to 1003.1 (IEEE Approved)
1003.5a: Ada Update (IEEE Reaffirmed)
1003.5b: Ada Realtime (IEEE Reaffirmed)
Following is the list of features that has been studied in every analyzed RTOS:
General overview and architecture. The internal architecture of the RTOS determines the overall performance and the type of environments where it can be applied. For example, a RTOS with protected address space for processes will have longer context switches times than that with a flat memory space.
Hardware support
Processors supported.
Multiprocessor support.
Process management
Weighted and/or lightweight processes. Lightweight processes (threads) is an efficient method to implement concurrent systems due to the efficient resource management and fast context switching. The POSIX thread standard define many realtime features.
Scheduling policy. POSIX define four scheduling policies: SCHED_FIFO, SCHED_RR, SCHED_SPORADIC and SCHED_OTHER. What kind of scheduling policies provide for hard and soft tasks?
Periodic tasks. Most of the scheduling theory was developed to provide support to periodic tasks. POSIX do not provide specific system calls to implement periodic activities; it has to be implemented by mean of timers and signals. Do the system provide any method to implement periodic tasks?
Range of priorities. POSIX require at least 32 priorities.
Thread creation and cancellation. POSIX standard provides a lot of facilities to deal the concurrency problems related to the termination of the running tasks. Do the RTOS provide a way to start and terminate threads in a safe way?
Memory management
Memory protection. This is a mayor issue in realtime, since it is a key feature to provide fault tolerance.
Dynamic memory allocation. Fixed block size memory allocation? do provide the standard malloc() and free() functions?
Inter-process communication
Semaphores. The most common and versatile synchronisation primitive.
Mutex. Mutexes are a special type of semaphore used to protect critical sections. The main difference with normal semaphores is that the thread that locks the mutex becomes the owner of the mutex, and this thread is the only one that can unlock it. Which type of mutex are available (spin-lock, recursive, etc.)?
Priority inversion control. Do provide some type of priority inversion control in concurrency control primitives?
Message queues. Real time extensions of POSIX include a message queue interface.
Mailboxes
Shared memory Since normal weight processes are executed in a protected memory space and threads (lightweight processes) are executed in a single common memory space. This functionality is only applicable to systems with separate memory spaces.
Signals (POSIX signals)
Time and timers
Time facilities. How the time is handled is one of the most important issues in a RTOS: Absolute/relative delays, time resolution, how timers are managed, configurable timer resolution, types of clocks, etc.
Facilities to add new hardware timers. Embedded boards are usually equipped with special timer and watchdog hardware. Do the RTOS provide some utility to use it?
Driver programming (low level programming)
Interrupts. How interrupts are handled? What facilities provide the RTOS to write interrupt handlers?
System facilities to do IO. Embedded systems usually have some custom devices. The RTOS has to provide some facilities to allow the engineer to write its own device drivers.
Quality of Service
Network
Available protocols.
Filesystems
Filesystem support.
Reservation features.
Trace and debug
Program debugging. Debugging and trace utilities are fundamental features that a RTOS must provide. There are two characteristics that stress the need of these type of tools: on one hand, embedded systems use to have limited communication capabilities (sometime only a serial line); on the other hand realtime systems should contain no bugs, this is achieved by means of a careful design and extensive debugging.
Event/timing debugging. Timing correctness is as important as logical correctness in realtime systems. The RTOS must provide timing information about the execution of the application. Recently, POSIX.1q added tracing facilities that allow the user to trace system and user events.
Miscelanea
Graphic support.
Development environment. Most RTOS, commercial and open source, use GNU development utilities: gcc, binutils, gdb, etc.
Programming languages. The "C" language is available in all the RTOS, but more and more other languages like ADA or C++ are also used.
Compatibility with other RTOS APIS. Like pSOS, VxWorks, OSEK, self propietary API.
|
Hardware |
Multi-processor |
Scheduling |
Concurrency | |
|---|---|---|---|---|
|
Linux |
Alpha, ARM, i386, MIPS, PowerPC, Sparc, SuperH, Etrax, m68k, PA-RISC |
Yes |
SCHED_FIFO, SCHED_RR, SCHED_OTHER |
UNIX-processes & Pthreads |
|
RTLinux |
i386, PPC, ARM |
Yes |
SCHED_FIFO |
Pthreads |
|
RTAI |
i386, MIPS, PPC, ARM, m68k-nommu |
Yes |
Fixed priority |
Lightweight processes |
|
RTEMS 4.5+ |
M68k, ColdFire, SuperH, i386, MIPS, i960, PowerPC, SPARC, AMD A29K. PA-RISC |
Static allocation |
SCHED_FIFO, SCHED_RR, SCHED_OTHER |
Pthreads |
|
QNX |
I386, ARM, MIPS, PowerPC, SuperH |
Yes |
SCHED_FIFO, SCHED_RR, SCHED_OTHER |
UNIX-processes, Lightweight processes, Pthreads |
|
VxWorks 5.x |
m68k/CPU32/ColdFire/PowerPC, i386, ARM, SuperH, MIPS |
Optional |
Fixed priority, Rodun-robin |
Lightweight processes |
|
LynxOS |
i386, m68k, PowerPC, ARM, SPARC, PA-RISC |
Yes |
FIFO, Priority Quantum, Round-Robin, Non-preemptive |
UNIX-processes & Pthreads |
|
Priorities lower- higher |
Memory protection |
Dynamic memory |
Inter-process communication | |
|---|---|---|---|---|
|
Linux |
(0-100) |
Yes |
Yes |
Semaphores, Mutexes, Condition-var., shared-mem, signals, pipes. |
|
RTLinux |
(0-1000000) |
No |
No |
Semaphores, Mutexes, Condition-var., FIFO |
|
RTAI |
(0x3fffFfff-0) |
No |
Yes |
Semaphores, Mutexes, Condition-var., FIFO, Mailbox, shared-mem, net_rpc, Pqueues. |
|
RTEMS 4.5+ |
(255-1) (254-1) Posix |
No |
Yes |
Semaphores, Mutexes, Condition-var., Pqueues, Events. |
|
QNX |
(0-31) |
Yes |
Yes |
Semaphores, Mutexes, Condition-var., Barriers, Atomic operations, rd/wr-locks, Pqueues, shared-mem, FIFO. |
|
VxWorks 5.x |
(255-0) |
No |
Optional |
Semaphores, Mutexes, Message, RTSignals, |
|
LynxOS |
(0-255) |
Yes |
Yes |
Semaphores, Mutexes, shared-mem, Pqueues, signals, pipes, Condition-Var. |
|
Priority inversion control |
Time resolution |
Timers |
Low level programming | |
|---|---|---|---|---|
|
Linux |
None |
Configurable (HighResTimers) |
POSIX timers |
No interrupt programming. |
|
RTLinux |
Immediate ceiling |
Hardware dependant |
None |
Full control HW |
|
RTAI |
Inheritance |
Hardware dependant |
None |
Full control HW |
|
RTEMS 4.5+ |
Inheritance & Immediate ceiling |
Hardware dependant |
POSIX timers |
Full control HW |
|
QNX |
Immediate ceiling |
Configurable |
POSIX timers |
Interrupts can be handled by user processes. |
|
VxWorks 5.x |
Inheritance |
Configurable |
Watchdog timers, POSIX timers |
Full control HW |
|
LynxOS |
Inheritance |
Configurable |
POSIX timers |
POSIX-style threads of execution within the kernel for interrupt handling. |
|
QoS |
Network |
Filesystem | |
|---|---|---|---|
|
Linux |
FIFO, CBQ, CSZ, ATM, PRIO, RED, SFQ, TLE, TBF, GRED, Diffserv, Ingress, RSVP |
IP, UDP, TCP, SLIP, PPP, ICMP, DHCP, RARP, RARP, TFTP, RPC, FTP, HTTP |
ReiserFS, ext2, ext3, NFS, CIFS, ADFS, FAT, VFAT, NTFS, CRAMFS, ISO9660, MINIX, QNX4, ROM, JFS, XFS, Flash |
|
RTLinux |
None |
None |
None |
|
RTAI |
None |
IP, UDP |
None |
|
RTEMS 4.5+ |
None |
IP, UDP, TCP, SLIP, PPP, ICMP, DHCP, RARP, TFTP, RPC, FTP, HTTP, CORBA |
IMFS, DOSFS/FAT |
|
QNX |
None |
IP, UDP, TCP, ARP, ICMP, IGMP, QNET |
RAMFS, QNX4, DOS, ISO9660, ext2, NFS, CIFS, Flash |
|
VxWorks 5.x |
None |
TCP/IP IP, UDP, TCP, IGMP, ICMP, ARP RIP 1/2 SLIP, CSLIP, PPP BOOTP, DNS, DHCP, TFTP FTP, RLOGIN, RSH, TELNET |
FAT, NFS, raw, TrueFFS |
|
LynxOS |
None |
IP, UDP, TCP, ICMP, IGMP, ARP, RARP, DHCP, NAT, RPC, NTPv3, Raw, Zebra routing, TFTP |
Lynx Fast File system, ISO9660, NFS, RAMFS. |
|
Debug |
Languages |
API compatibilitiy | |
|---|---|---|---|
|
Linux |
GDB, DDD, Insight, System debugg, LTT |
C, C++, ADA, Java, |
POSIX 1003.1, VxWorks, pSOS |
|
RTLinux |
Simple trace, GDB |
C, C++ |
POSIX 1003.1c |
|
RTAI |
KGDB |
C |
Custom, POSIX 1003.1b |
|
RTEMS 4.5+ |
GDB, DDD, Debug over: ethernet, serial, BDM |
C, C++, ADA |
RTEID/ORKID,uITRON 3.0, POSIX 1003.1b |
|
QNX |
GDB, memory overrruns. |
C, C++, Java |
POSIX 1003.1, POSIX 1003.1b |
|
VxWorks 5.x |
GDB, Debug over: ethernet, serial, WindView, Trace |
C, C++ |
Propietary (VxWorks), POSIX 1003.1, POSIX 1003.1b |
|
LynxOS |
GDB, Insight, Debug over: ethernet, serial |
C, C++, ADA |
Propietary, POSIX.1/.1b/.1c, Unix BSD 4.3. ABI compatibility with Linux 2.4 |
Linux is a full-featured UNIX® implementation. The main design criteria of the Linux kernel is the throughput while realtime and predictability is not an issue. The main handicap to consider Linux as a realtime system is that the kernel is not preemptable; that is, while the processor executes kernel code, no other process or event can preempt kernel execution.
Although Linux is not a realtime system, it has some features, already included in the mainstream source code or distributed as patch files, designed to provide realtime to Linux. These are the features described in this section.
From the programmer point of view, there are two main programming paradigms to build a realtime application: weight-processes (normal UNIX processes) and lightweight-processes (known as threads or LWP). Linux provides support for both execution environments, mostly based on POSIX standards 1003.b and 1003.c respectively.
Most of the realtime extensions are not included into the standard "C" library "libc". These system calls are implemented and distributed in in the GNU "C" library (glibc) but are located in a separate library file called "librt". Therefore, to compile a program that makes use of realtime features, the compiler must be invoked flag the "-lrt".
From the very first versions of Linux, the scheduler was realtime POSIX compatible. It supports, among others, the fixed priority (SCHED_FIFO) policy, which is the base feature to build a realtime systems. It also provides the POSIX required: SCHED_RR and SCHED_OTHER. The range of priorities is [0..99].
A lot of work has been done to improve the performance of the scheduler through a careful design which yield to a new scheduler code and structure. Next stable Linux kernel (2.4.19), as well as unstable kernel development (2.5.x), will replace the old scheduler code with an improved "O(1) scheduler" developed by Ingo Molnar . This new scheduler is able to manage a large number of processes with no overhead degradation.
It is not possible to build realtime applications on a system with virtual memory. The random and long delays introduced when RAM is exhausted and swapping is required is intolerable in a realtime system. Linux provides the mlock() and mlockall() functions that disables paging for the specified range of memory, or for the whole process respectively. Therefore, all the "locked" memory will stay in RAM until the process exits or unlocks the memory.
mlock() and mlockall() are included in the POSIX realtime extensions.
One of the main communications paradigms used in realtime applications is shared memory.
Linux processes can share memory with each other and with drivers the POSIX.1b call mmap(). This function can be used both, to map into main memory a regular file and to map shared memory objects. When multiple processes map the same memory object (or file), they share access to the underlying data, which is a efficient way to communicate large amounts of data between processed.
Since version 2.4.x and glibc 2.2 (GNU "C" library), Linux provides open shared memory objects, which is part of the POSIX realtime extensions. This API has the following functions: shm_open() and shm_unlink().
Current POSIX API defines two different timer facilities:
BSD timers: setitimer() and getitimer() functions.
IEEE 1003.1b REALTIME timers: timer_create(), timer_settime(), timer_getoverrun(), etc.
This patch can provide high resolution timers with very low overhead because of two main design issues: the use of several timing and interrupt hardware sources (the old 8254, the Pentiun internal instruction TSC, and the ACPI[4] timers when available), and a clever data structure to maintain the timers.
The patch provides high resolution clocks: CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR. And the accompanying functions: clock_settime(), clock_gettime(), clock_getres(). Also the POSIX timers functions are implemented: timer_create(), timer_delete(), timer_settime(), timer_gettime() and timer_getoverrun().
POSIX extended the signals generation and delivery to improve the realtime capabilities. Signals take an important role in realtime as the way to inform the processes of the occurrence of asynchronous events like high-resolution timer expiration, fast inter-process message arrival, asynchronous I/O completion and explicit signal delivery.
The main characteristics of this type of signals are:
The range of realtime signals supported by Linux is form 32 (SIGRTMIN) to 63 (SIGRTMAX).
Signals can deliver a small piece of data (an integer or a pointer) to the signal handler (signal-catching function).
Signals that carry information are delivered in chronological FIFO order.
It is possible to automatically create a thread in response to a signal.
Linux fully supports the POSIX realtime signals standard.
Asynchronous I/O (AIO) is the POSIX interface to provide high efficiency asynchronous I/O access. The standard way to access I/O devices (files, drivers, sockets, fifos, etc.) defined by UNIX the read() write() blocking sequence, where a next file access is performed only when the previous request has been completed. AIO mechanism provides the ability to overlap application processing and I/O operations initiated by the application.
A process can start one or more IO requests to a single file or multiples files and continue its execution. Also, a single system call can start a sequence of I/O operation on one or several files, which reduces the overhead due to context switches.
There are two implementations available in Linux: the one provided at the library level by using non-aio system calls (included in the glibc/librt since version 2.1); and the kernel implementation first developed by SGI™ called KAIO (till linux kernel 2.4.0), and now the Linux-AIO which provides this functionality to newer kernels.
Current Linux implementation of POSIX threads (POSIX 1003.1c) is based on the work done by Xavier Leroy, known as LinuxThreads. LinuxThreads is now integrated in the glibc, and distributed as a part of the it. It provides kernel-level threads: threads are created with the clone() system call and all scheduling is done in the kernel. This kind of threading is defined as 1:1, i.e. each thread is mapped to a Linux process.
LinuxThreads implements most of the POSIX API: mutex, condition variables, cancellation, signals, timed calls, etc. The library also provides POSIX semaphores. Mutex do not implement any protocol to prevent priority inversion. Since threads are scheduled by the Linux scheduler, the scheduling policies are the same as in Linux: SCHED_FIFO, SCHED_RR and SCHED_OTHER.
Nowadays, linux offers a sophisticated component for bandwidth management called Traffic Control. This component supports method for classifying, prioritising, and limiting both incoming and out-coming traffic. Therefore, linux can do the following list of things: limit bandwidth for certain computers, help to fairly share bandwidth, protect the Internet from abuses, restrict access, do routing based on user id, MAC address, source IP address ... and so on.
For working with this subsystem, the kernel versions 2.2.x has to be patched, but the versions 2.4.x and uppers implement directly this functioning.
The following figure shows the network subsystem:
There are four components:
Input demultiplexing: Decides if a incoming packets are passed to higher layers or are directly forwarded to the network.
Upper Layers: Processes packets and may also generate new traffic and pass it to the lower layers.
Forwarding: This layer performs the selection of the output interface, the selection of the next hop, encapsulation, etc.
Output Queueing or Traffic Control: This is the most important component and decides if packets are queued or dropped, decides in which order packets are sent, etc.
Once the traffic control releases a packet for sending, the network device driver sends it to the network.
The traffic control component consist of the following elements: queueing disciplines (qdisc), classes (within a queueing discipline), filters and policing
In this way, queueing discipline provides a method to enqueue a packet. A class is the place where packets are stored and processed in a specific way, afterwards, the qdisc selects the following packet for sending from classes. Filters are used by a qdisc to assign incoming packets to one of its classes. And finally, policing is used to ensure that incoming traffic does not exceed certain bounds.
The following picture illustrates an example of traffic control configuration:
This configuration consists of a queuing discipline with two delay priorities, as well as, two classes: the higher class contains a token bucket filter discipline that limits the traffic, while the lower class contains a FIFO qdisc. Therefore, while the higher class has packets for sending (rate < 1Mbps), the priority qdisc selects a packets from this class. The filter decides which packets are sent to the higher class. Once a priority qdisc selects the following packet for sending, the network driver sends it on the network.
In conclusion, the traffic control layer decides whether the packets are queued or dropped, in which order the packets are sent, and finally it may delay packet transmission. Moreover, the traffic control elements can be combined in a modular way to support Differentiated Service (DS), Integrated Service (RSVP), ATM and so on.
The following four sections describe the traffic control elements.
Each network interface has a queue discipline attached with it, which controls how packets are enqueued and treated.
A qdisc is a black box, which is able to enqueue packets and dequeue them using its own algorithm, for example, a CBQ qdisc uses a WRR (Weight Round Robin) scheduling to select the following packet for sending on the network.
Moreover, qdisc are divided into two categories:
Classfull: Qdiscs that may have child qdiscs attached to them.
Leafs: Qdiscs that have not child's.
The available classfull qdiscs are:
PRIO a n-band strict priority scheduler,
CBQ Class Based Queue,
CSZ Clak-Scott-Zhang,
ATM Asynchronous Transfer Model,
DSMARK - DSCP a Diff-Serv Code Point marker and
INGRESS
The available leafs qdiscs are:
FIFO a simple FIFO (it is the default qdisc),
TBF Token Bucket Filter,
RED Random Early Detection,
GRED Generalised Random Early Detection,
TEQL Traffic Equaliser and
SFQ Stochastic Fair Queue.
A class is attached to a qdisc. However, queueing disciplines and classes are intimately tied together; the presence of classes and their semantics are fundamental properties of the queueing discipline. There is only one available class. This is the CBQ class. Note that a CBQ may work as queue discipline or class.
Filters are used to classify packets based on certain properties of them (address IP ...). The supported filters are:
rsvp (use RSVP protocol for classification),
u32 (anything in the header may be used for classification),
fw (use the firewall rules for classification),
route (use routing table for classification decisions) and
tcindex (use the DS field for classification).
Note that the u32 filter is the most advanced filter available and the tcindex filter is used in DiffServ (differentiation services).
The goal of policing is to ensure that traffic does not exceed certain bounds. There are four types of policing mechanisms: policing decisions by filter, refusal to enqueue a packet, dropping a packet from an inner queueing discipline and dropping a packet when en-queuing a new one.
Linux was developed around UNIX and POSIX standards. Therefore, its native API is mostly POSIX compatible.
Also MontaVista™ developed Wind River pSOS® and Wind River VxWorks® two emulation environments. These emulators are designed to ease the port of legacy RTOS code Linux. The emulation libraries are available at Sourceforge.
When scheduling a real-time user-level Linux process, the real-time performance can be affected by bottom halves (BHs) execution, kernel non-preemptable sections, and so on. The kernel latency is a quantity used to measure the difference between the theoretical schedule and the actual one.
The kernel latency is defined as follows: Let T be a real-time process that requires execution at time t, and let t' be the time at which T is actually scheduled; we define the kernel latency experienced by T as L = t' - t.
The biggest source of kernel latency are kernel non-preemptable sections (including BHs and Interrupt Service Routines - ISRs). In fact, non-preemptable sections in the kernel can prevent a high priority task from being scheduled for a very long time (up to 100ms). For example, if interrupts are disabled at time t, task T can only enter the ready queue later when interrupts are re-enabled. In addition, even if T enters the ready queue at the correct time t and has the highest real-time priority in the system, it may still not be scheduled if another task is running in the kernel in a non-preemptable section. In this case, T will be scheduled when the kernel exits the non-preemptable section at time t'.
The length of a kernel non-preemptable section depends on the strategy that the kernel uses to guarantee the consistency of its internal structures, and on the internal organization of the kernel. The standard Linux kernel is based on the classical monolithic structure, in which the consistency of kernel structures is enforced by allowing at most one execution flow in the kernel at any given time. This is achieved by disabling preemption when an execution flow enters the kernel, i.e., when a system call is invoked or when an interrupt fires. When a system call is invoked, the thread that invokes it enters in the kernel and becomes non-preemptable, returning preemptable when the execution exits from the kernel. When an interrupt fires, a short ISR is invoked with interrupts disabled: the ISR acknowledges the hardware interrupt and schedules a BH for execution. The BH will be executed in a non-preemptive way immediately before returning to user mode; hence, if the ISR interrupts a system call, the BH will be executed only after that the system call is completed (system calls can synchronize with ISRs by explicitly disabling interrupts). Summing up, in a standard Linux kernel, the maximum latency is equal to the maximum length of a system call plus the processing time of all the interrupts that fire before returning to user mode. Unfortunately, this value can be as large as 100ms.
Two different approaches can be used to reduce the size of kernel non-preemptable sections: the one used by the Low-Latency patches (Ingo Molnar and Andrew Morton)[LowLat], and the one used by the kernel preemptability patch (MontaVista, TimeSys)[kpreem, TimeSys].
This approach ``corrects'' the monolithic structure by inserting explicit rescheduling points (that effectively are preemption points) inside the kernel. In this approach, when a thread is executing inside the kernel it can explicitly decide to yield the CPU to some other thread. In this way, the size of non-preemptable sections is reduced, thus decreasing the latency. In a low-latency kernel, the consistency of kernel data is enforced by using cooperative scheduling (instead of non-preemptive scheduling) when the execution flow enters the kernel.
This approach is also used by some real-time versions of Linux, such as RED Linux. In a low-latency kernel, the maximum latency decreases to the maximum time between two rescheduling points. Since the low-latency patch has been carefully hand-tuned for quite a long time, it performs surprisingly well.
The preemptable approach, used in most real-time systems, removes the constraint of a single execution flow inside the kernel. Thus it is not necessary to disable preemption when an execution flow enters the kernel. To support full kernel preemptability, kernel data must be explicitly protected using mutexes or spinlocks. The Linux preemptable kernel patch, sponsored by MontaVista, uses this approach and makes the kernel fully preemptable. Kernel preemption is disabled only when a spinlock is held.
A similar approach is used by TimeSys, that uses mutexes instead of spinlocks, and provide priority inheritance. While the MontaVista patch disables preemption when a spinlock is acquired, the TimeSys one is based on blocking synchronization.
In a preemptable kernel, the maximum latency is determined by the maximum amount of time for which a spinlock is held inside the kernel. Again, it is important to note that BHs are serialized using a spinlock, thus they contribute to the latency.
An additional patch (lock-breaking) merges some of the low-latency rescheduling points into the preemptable kernel, for decreasing the amount of time for which spinlocks are held.
We measured the latency for the standard, Low-Latency, preemptable and preemptable with lock-breaking kernels while running different loads in background. All the experiments were performed using the Latency Benchmark tool downloadable from [FT]. The results are shown in the following figures:
From the first figure, it is possible to see that standard Linux exhibits high latencies at the end of the memory stress (a program that reads and writes a large array in memory), during the I/O stress (a program that reads and writes large amount of data on a file), when accessing the /proc file system, and when switching the caps/lock led. The large latency at the end of the memory stress is due to the munmap() system call. Comparing the figures it is possible to see that the low latency kernel solves all the problems except the /proc file system access and the caps/lock switch. On the other hand, the preemptable kernel eliminates the large latency in the /proc fs access, but does not solve the problem with the memory stress, and is not as effective as the low latency kernel in reducing the latency during the I/O stress. Finally, the lock-breaking preemptable kernel seems to provide good real-time performance, but still has some problem during the I/O stress.
By repeating similar experiments for longer amounts of time, we verified that the low-latency kernel is characterized by larger average latencies respect to the preemptable and lock-breaking preemptable kernels, but reduces the worst case latencies. In other words, the tail of the probability distribution function is shorter for a low-latency kernel. As an example, the PDFs of the latency during the I/O stress are reported in the following figures (note that this new experiments were performed on a computer that is faster than the one used for the previous experiments).
According to the previous figures, it seems that the low-latency patch provides better real-time performance. This is probably due to the fact that the Linux kernel still contains ``big spinlocks'' that are held for long amounts of time, and that the lock-breaking patch does not ``break'' properly. A classical example is the spinlock used to serialize BHs. However, in the future Linux developers will likely decrease the size of kernel critical sections (to improve SMP performance), hence it seems to be reasonable to think that the latency of the preemptable and lock-breaking preemptable kernel will decrease.
There are two different versions of RTLinux: RTLinux/Pro and RTLinux/Open.
RTLinux/Open is available on the web. The main (only) developer of this version was FSMLabs and stops its development in 2Q/2001. RTLinux/Pro is available for a free from FSMLabs and the license is non-GPL.
This paper is focused of the RTLinux/Open version.
There are two approaches to provide real-time performance in a Linux system:
Improving the Linux kernel preemption.
Adding a new software layer beneath Linux kernel with full control of interrupts and processor key features.
RTLinux is a small, fast operating system, following the POSIX 1003.13 "minimal realtime operating system" standard.
RTLinux adds a layer of virtual hardware between the standard Linux kernel and the computer hardware (see Figure 8-1). As far as the standard Linux kernel is concerned, this new layer appears to be the actual hardware. RTLinux implements a complete and predictable RTOS with no interference from the non-real-time Linux. The RTLinux threads are executed directly by a fixed-priority scheduler. The whole Linux kernel, and all the normal Linux processes, are managed by the RTLinux scheduler as the background task. This way, it is possible to have a complete general purpose operating system running on top of a small predictable RTOS.
There are three main modifications done to the Linux kernel in order to virtualize the hardware so that RTLinux can take full control of the machine: The RTLinux layer takes direct control of all the hardware interrupts, interrupts that are not controlled by real-time threads are forwarded to the Linux upper level; RTLinux also takes the control of the timer hardware (8254 and APIC when available) and implements a virtual timer to Linux; and the last modification done to Linux to remove the basic control of the hardware from Linux is to replace all the cli and sti (disable and enable interrupt flag) functions calls from the Linux kernel so that Linux can no make a real disable but a virtual interrupt disable. These modifications are quiet complex and tricky, but do not require large code (Linux) modifications.
RTLinux provides an execution environment "below" the Linux kernel. One consequence of this, is that realtime threads can not use Linux services because deadlock or system inconsistencies may happen. To overcome this problem, the realtime system has to be divided into two separated layers: the hard realtime layer, executed on top of RTLinux, and the soft realtime, executed as normal Linux processes. Several mechanisms (FIFO, shared memory) can be used to communicate threads in both layers.
The two layer approach is a useful method to provide hard realtime while having all the features of a desktop operating system. It decouples the mechanism of the realtime kernel from the mechanism of the general purpose Linux kernel so that each system can be optimized independently.
i386, PPC, ARM (StrongARM/iPAQ).
Yes. It is available for i386 architectures.
There are three scheduling policies available: SCHED_FIFO, SCHED_SPORADIC SCHED_EDF. SCHED_FIFO is a fixed priority scheduling and threads with the same priority are scheduled in FIFO order. SCHED_SPORADIC an implementaion of the sporadic server used to run aperiodic activities. SCHED_EDF implements a dynamic priority scheduling policy the EDF ( Earliest Deadline First). Each thread has a fixed priority and a deadline. Threads are sorted by priority, but same priority threads are scheduled according to the EDF policy.
The system provides special system calls to implement periodic threads. pthread_make_periodic_np() and pthread_wait_np()
Minimum and maximum priority is 0 and 1000000 respectively. There is no limit in the number of running threads, but the scheduling cost is proportional to the number of threads. Current scheduler code is designed to handle efficiently a low number of threads (around 10).
It provides all the POSIX Thread termination facilities and also some extensions to remove termitate threads easier.
A thread can terminate itself by calling pthread_exit() function. pthread_join() suspend execution the execution of the calling thread until the target thread terminates. To implement this behavior, when a thread exits, the system do not delete the supporting data until other thread has joined. The system also provides the pthread_detach() function to indicate that the target thread will not be joined and the system support data can be reclaimed as soon as the thread exits.
A thread can request the cancellation (termination) of other thread: pthread_cancel(). The thread that receives the cancellation request, depending on the cancelability state, can do one o the following action:
PTHREAD_CANCEL_DISABLE: The cancel request is ignored.
PTHREAD_CANCEL_DEFERRED: The thread will be canceled but only at some safe points.
PTHREAD_CANCEL_ASYNCHRONOUS:The thread is canceled immediately.
A thread can install cancellation cleanup handlers: pthread_cleanup_push() and pthread_cleanup_pop(). The cleanup handlers are called when the thread exits, is canceled or the handler is removed.
pthread_delete_np() function can be used to termintate inmediately a thread and can be used instead of the pair: pthread_cancel()/pthread_join()
Although RTLinux is designed to run in processors with MMU, all the application threads and the RTLinux kernel run in the same address space. There is no memory protection between threads and the kernel and also between threads themselves.
From the point of view of memory management, RTLinux is the guest operating system of the Linux Kernel. The Linux kernel has the whole control of the memory.
RTLinux do not provide dynamic memory allocation nor use it internally. The main argument is that dynamic memory allocation is not predictable if implemented efficiently. The Real-Time goal of predictability is usually achieved by preallocating most of the resources the threads will use at run time.
It i