WP1 - RTOS State of the Art Analysis

Deliverable D1.1 - RTOS Analysis

Ismael Ripoll

DISCA, Universidad Politecnica de Valencia

Pavel Pisa

Luca Abeni

ReTiS Lab --- Scuola Superiore S. Anna

e-mail: 

Paolo Gai

ReTiS Lab --- Scuola Superiore S. Anna

e-mail: 

Agnes Lanusse

Sergio Saez

DISCA, Universidad Politecnica de Valencia

Bruno Privat


Table of Contents
I. Introduction
1. Introduction
2. POSIX Standard
Introduction
XSI System Interfaces and Extensions
3. OSEK/VDX
Introduction
Architecture of the OSEK/VDX operating system
Task Management
Application modes and system startup
Interrupt processing
Events
Scheduling
Resource Management
Miscellaneous
OSEK COM
OSEK NM
OSEKTime
4. Analyzed features
5. Summary
II. RTOS Analysis
6. Realtime Support in Linux
Scheduling
Virtual Memory
Shared Memory
High Resolution POSIX Timers (HRT)
Realtime Signals
POSIX Asynchronous I/O
POSIX Threads
Quality of Service
Compatibility
7. Low-Latency Patches for Linux
Kernel Latency
Possible Solutions
Evaluation
8. RTLinux/GPL
Architecture overview
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscelanea
9. RTAI
Architecture overview
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscelanea
10. RTEMS 4.5+
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscellaneous
11. QNX
Architecture overview
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscelanea
12. VxWorks 5.x
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscelaneous
13. LynxOS
Hardware characteristics
Process management
Memory management
Inter-Process communication
Time and timers
Driver programming
Quality of Service
Network
Filesystems
Trace and debug
Miscelanea
Modularity
Bibliography
A. GNU Free Documentation License
PREAMBLE
APPLICABILITY AND DEFINITIONS
VERBATIM COPYING
COPYING IN QUANTITY
MODIFICATIONS
COMBINING DOCUMENTS
COLLECTIONS OF DOCUMENTS
AGGREGATION WITH INDEPENDENT WORKS
TRANSLATION
TERMINATION
FUTURE REVISIONS OF THIS LICENSE
How to use this License for your documents
List of Tables
1. Project Co-ordinator
2. Participant List
3. Document Version
10-1. Thread Creation and Deletion calls
11-1. Thread creation and deletion calls
11-2. Synchronization Services
11-3. Communication Services
11-4. Semaphore management primitives
11-5. Mutexes management primitives
11-6. Atomic Operations
11-7. Message Passing API
11-8. Signals range
11-9. Interrupt Handling API
List of Figures
6-1. Network Subsystem
6-2. Traffic Control configuration
7-1. Latency in the Standard Kernel
7-2. Latency in the Low Latency Kernel
7-3. Latency in the Preemptable Kernel
7-4. Latency in the Preemptable Lock-Breaking Kernel
7-5. PDF of the Latency in the Low-Latency Kernel
7-6. PDF of the Latency in the Lock-Breaking Kernel
8-1. RTLinux layer architecture
11-1. QNX architecture
11-2. QNX message passing model

Document presentation

Table 1. Project Co-ordinator

Organisation:

UPVLC

Responsible person:

Alfons Crespo

Address:

Camino Vera, 14. CP: 46022, Valencia, Spain

Phone:

+34 9877576

Fax:

+34 9877579

E-mail: alfons@disca.upv.es

Table 2. Participant List

RoleId.NameAcronymCountry
CO1Universidad Politécnica de ValenciaUPVLCE
CR2Scuola Superiore S. Anna SSSAI
CR3Czech Technical University in Prague CTUCZ
CR4CEACEAFR
CR5UNICONTROLSUCCZ
CR6MNISMNISFR
CR7VISUAL TOOLS S.A.VTE

Table 3. Document Version

ReleaseDateReason of change
1.0November, First Release

I. Introduction


Chapter 1. Introduction

The objective of this workpackage is to make a study of the state of the art of real-time technology which is made available by the research community, and to determine what types of mechanisms actually turn out to be most useful for real-time applications. In concrete, this workpackage will analyse the real-time operating systems (RTOS) features and extract the main characteristics that will be included in the OCERA development.

The project is to produce and integrate prototypes of various innovative real-time techniques, research areas of particular interest include scheduling, resource management, fault-tolerance and communication with real-time constraints, which are identified as the key elements to provide predictable and high performance distributed real-time operating systems.

RTOS is a generic term for a set of operating systems that provide support for real-time applications. There is a wide range of RTOS, from the small and simple enough to fit in a few kilobytes of memory that can run on simple processors, to the high-end range RTOS that provides full graphical user interface that require several megabytes of RAM and powerful processors (MMU, protected mode, etc.).

Originally, Linux was designed to be used in a server or desktop environment. Since then, Linux has evolved and grow to be used in almost all the computer areas, among others, in embedded systems, parallel clusters, realtime systems, etc.

There is a large group of researchers, hackers and companies adding realtime capabilities: reducing the memory requirement, porting Linux to embedded processors, improving the response time, etc. Real Time Linux is just another piece of software developed in a Open-Source development methodology. There is not a single company or research group that concentrates all the development of realtime Linux, but a set of not connected, overlapped or even rival implementations are taking place simultaneously.

The two main contributions of this working package are:

  1. A list of features that are available in commercial RTOS.

  2. A detailed description of the real-time features already implemented in Linux. Features that are included in the Linux kernel by default or that are distributed separately.

To achieve these goals, we have focused our study in a small (compared with the large amount to existing) number of RTOS: VxWorks, QNX, RTEMPS and LynxOS. And also the main Linux kernel and all the extensions to improve the real-time capabilities of Linux like RTLinux, RTAI and the preemptable patch are analysed and compared in this paper.

The rest of this deliverable is organised as follows: In the first part a summary of the POSIX and OSEK standard followed by the list of main features we are interested and are taken into consideration in the RTOS analysis. Several summary tables are presented in the last section of the fist part. In the second part of the deliverable a study of each RTOS is presented.

The conclusions of this deliverable can be found in the deliverable D3.2 "Functionality Not Available in Open RTOS" of the Workpackage 3 "Market Analysis".


Chapter 2. POSIX Standard

Introduction

POSIX, that stands for Portable Operating System Interface, is a standard that is being jointly developed by the IEEE and The Open Group. It defines a standard operating system interface and environment, including a command interpreter (or "shell"), and common utility programs to support applications portability at the source code level. The current revision of POSIX is The Open Group Base Specifications Issue 6 and also the IEEE Std 1003.1-2001.

This standard is composed by four major components:

  • Base Definitions: This include general terms, concepts and interfaces common to entire standard.

  • System Interfaces: This comprises the definitions for system service functions for the C programming language, function and portability issues, error handling and recovery.

  • Shell and Utilities: It contains the definitions for a standard source code-level interface to command interpretation services.

  • Rationale: It contains information that does not fit well into the rest of the document structure.

The IEEE Std 1003.1-2001 standard is a single common revision to IEEE Std 1003.1-1996, IEEE Std 1003.2-1992, and the Base Specifications of The Open Group Single UNIX Specification, Version 2. In order to develope the current revision several base documents has been used. The base documents that are involved in the definition of system interfaces are:

  • IEEE Std 1003.1-1996 (POSIX-1) (incorporating IEEE Stds 1003.1-1990, 1003.1b-1993, 1003.1c-1995, and 1003.1i-1995)

  • The following amendments to the POSIX.1-1990 standard:

    • IEEE P1003.1a draft standard (Additional System Services)

    • IEEE Std 1003.1d-1999 (Additional Realtime Extensions)

    • IEEE Std 1003.1g-2000 (Protocol-Independent Interfaces (PII))

    • IEEE Std 1003.1j-2000 (Advanced Realtime Extensions)

    • IEEE Std 1003.1q-2000 (Tracing)

  • Open Group Technical Standard, February 1997, System Interface Definitions, Issue 5 (XBD5)

  • Open Group Technical Standard, February 1997, System Interfaces and Headers, Issue 5 (XSH5)

  • Open Group Technical Standard, January 2000, Networking Services, Issue 5.2 (XNS5.2)

  • ISO/IEC 9899:1999, Programming Languages - C.

As it can be observed this standard includes support for source portability of applications with realtime requirements, but this support is maintly optional for POSIX-conforming implementations. The specific functional areas included for realtime support and their definitions[1] are basically the following:

  • Semaphores.

    A minimum synchronization primitive to serve as a basis for more complex synchronization mechanisms to be defined by the application program.

  • Process Memory Locking.

    A performance improvement facility to bind application programs into the high-performance random access memory of a computer system. This avoids potential latencies introduced by the operating system in storing parts of a program that were not recently referenced on secondary memory devices.

  • Memory Mapped Files.

    A facility to allow applications to access files as part of the address space.

  • Shared Memory Objects

    An object that represents memory that can be mapped concurrently into the address space of more than one process.

  • Priority Scheduling.

    A performance and determinism improvement facility to allow applications to determine the order in which threads that are ready to run are granted access to processor resources.

  • Realtime Signal Extension.

    A determinism improvement facility to enable asynchronous signal notifications to an application to be queued without impacting compatibility with the existing signal functions.

  • Timers.

    A mechanism that can notify a thread when the time as measured by a particular clock has reached or passed a specified value, or when a specified amount of time has passed.

  • Interprocess Communication.

    A functionality enhancement to add a high-performance, deterministic interprocess communication facility for local communication.

  • Synchronized Input and Output.

    A determinism and robustness improvement mechanism to enhance the data input and output mechanisms, so that an application can ensure that the data being manipulated is physically present on secondary mass storage devices.

  • Asynchronous Input and Output.

    A functionality enhancement to allow an application process to queue data input and output commands with asynchronous notification of completion.

Another optional support that can be interesting for the developemnt of embedded applications is the Threads support. This extension to POSIX defines functionality to support multiple flows of control within a process. These flows of control are called threads and they share their address space and most of the resources and attributes defined in the operating system for the owner process.

The specific functional areas included in threads support are:

  • Thread management: the creation, control, and termination of multiple flows of control that share a common address space.

  • Synchronization primitives optimized for tightly coupled operation of multiple control flows in a common, shared address space.

Finally, the IEEE Std 1003.1-2001 standard also proposes a set of tracing facilities that can be quite useful at the development stage of a embedded real-time application. The tracing facilities defined in the standard allow a process to select a set of trace event types, to activate a trace stream of the selected trace events as they occur in the flow of execution, and to retrieve the recorded trace events. A trace event is a data object that represents an action executed by the system, and that is recorded in a trace stream. The trace events can be retrieved later from the trace stream, allowing the system behaviour analysis.

All these functionalities are not mandotory in a POSIX-conforming implementation, but defined as a set of options that may be supported by that system. In this line, the IEEE Std 1003.1-2001 standard defines several XSI extensions that groups together several of these options in so-called XSI Option Groups. The option groups that can be of interest for embedded real-time applications are described in the next section.


XSI System Interfaces and Extensions

The X/Open System Interface is the core application programming interface for systems conforming to the Single UNIX Specification. This is a superset of the mandatory requirements for conformance to IEEE Std 1003.1-2001.

A system that wants to be a XSI-conforming implementation shall meet the criteria for POSIX conformance and support all functions and headers defined in IEEE Std 1003.1-2001 as part of the XSI extension. Additionally, it shall support the following options defined in the standard: File Synchronization, Memory Mapped Files, Memory Protection, Threads, Thread Process-Shared Synchronization, Thread Stack Address Attribute and Thread Stack Address Size.

Despite of these options that are mandatory to be a XSI-conforming system, the system may also support one or more of the following XSI Option Groups: Encryption, Realtime, Advanced Realtime, Realtime Threads, Advanced Realtime Threads, Tracing, XSI STREAMS and Legacy. All the realtime option groups jointly with the tracing option group are clearly of great interest for developing embedded real-time applications. The options each option groups requires are detailed next:

Realtime

The options of IEEE Std 1003.1-2001 that are grouped together in this option group are: Asynchronous, Synchronized and Prioritized Input and Output, File Synchronization, Memory Mapped Files, Shared Memory Objects, Process and Range Memory Locking, Memory Protection, Semaphores, Timers, Realtime Signals Extension, Message Passing and Process Scheduling.

Advanced Realtime

The options of the standard that are grouped together in the Advanced Realtime option group are: Advisory Information, Clock Selection, Process CPU-Time Clocks, Monotonic Clock, Timeouts, Typed Memory Objects, Spawn and Process Sporadic Server.

Realtime Threads

This option group includes the following options: Thread Priority Inheritance and Protection, and Thread Execution Scheduling.

Advanced Realtime Threads

The Advanced Realtime Threads option group requires the following POSIX options: Thread CPU-Time Clocks, Thread Sporadic Server, Spin Locks and Barriers.

Tracing

This option group includes the following tracing facility options: Trace, Trace Event Filter, Trace Inherit and Trace Log.


Chapter 3. OSEK/VDX

Introduction

OSEK/VDX [2] is a joint project of the automotive industry that aims to the definition of an industry standard for an open-ended architecture for distributed control units in vehicles.

The objective of the standard is to describe an environment which supports efficient utilization of resources for automotive control unit application software. This standard can be viewed as a set of API for real-time operating system (OSEK) integrated on a network management system (VDX) that together describes the characteristics of a distributed environment that can be used for developing automotive applications.

The typical applications that have to be implemented have tight real-time constraints and an high criticality (for example, a power-train application). Moreover, these applications have to be made in a huge number of unit, therefore there is a need to reduce the memory footprint to a minimum enhancing as possible the OS performance.

Here are some keywords that helps to better characterize the philosophy that drove the main architectural choices of the OSEK Operating System:

Scalability.

The operating system is intended for use on a wide range control units (either system with minimal hardware resources like RAM, ROM, CPU time, i.e. 8 bit microcontrollers). To support a wide range of systems the standard defines four conformance classes that tightly specifies the main features of an OS. Note that memory protection is not supported at all.

Portability of software.

The standard specifies an ISO/ANSI-C interface between the application and the operating system that is identical in all the implementations of the OS. The aim of this interface is to give the ability to transfer an application software from one ECU to another ECU without bigger changes inside the application. Due to the wide variety of hardware where the OS has to work in, the standard does not specify any interface for the Input/Output subsystem. Note that this fact reduces (if not prohibits) the portability of the application source code, since the I/O system is one of the main software part that impacts on the architecture of the software. We can say that the prime focus is not to achieve 100% compatibility between the application modules, but to ease their direct portability between compliant operating systems.

Configurability.

Another prerequisite needed to adapt the OS to a wide range of hardware is a high degree of modularity and configurability. This configurability is reflected by the toolchain proposed by the OSEK standard, where some configuration tools help the designer in tuning the system services and the system footprint. Moreover, a language called OIL (OSEK Implementation Language) is proposed to help the definition of a standardized configuration information.

Statically allocated OS.

All the OS objects and features are statically allocated. This fact allow to simplify all the OS: the number of application tasks, resources and services requested are defined at compile time. Note that this approach ease the implementation of an OS capable of running on ROM, and moreover it is completely different from a dynamic approach followed in other OS standards like for example POSIX.

Support for time triggered architectures.

The OSEK Standard provides the specification of OSEKTime OS, a time triggered OS that can be fully integrated in the OSEK/VDX framework.

In the following sections the main features of the OSEK/VDX standard will be analyzed in detail.


Architecture of the OSEK/VDX operating system

The architecture on which an OSEK Operating System is based can be viewed as a traditional fixed priority approach.

Each task in the system can be a basic task (BT) or an extended task (ET) (extended tasks are basic tasks that can react to external asynchronous events).

Every task in the system has assigned a fixed priority (statically assigned at compile time), and the scheduler always selects the higher priority task from the ready task queue. Interrupt service routines typically preempt the running task (except in case the running task uses resources).

To provide support for different features in the Operating system, the various requirements of the application in terms of number of tasks, memory consumption and like are listed in four conformance classes. The compliance of an OSEK OS is always stated with respect to one conformance class. Basically, conformance classes exist to allow partial implementations of the standard along pre-defined lines, creating an upgrade path from classes of lesser functionality to classes of higher functionality with no change to the application tasks.

The conformance classes specifies different requirements for the following attributes:

  • Multiple requesting task activations (only one activation or more than one)

  • Task types (basic tasks only or basic and extended tasks)

  • Number of tasks per priority (one or more than one)

The following conformance classes are defined by the standard :

BCC1

Only basic tasks limited to one activation request per task and one task per priority, while all tasks have different priorities.

BCC2

Like BCC1, plus more than one activation request per task and more than one task per priority.

ECC1

Like BCC1, plus extended tasks.

ECC2

Like ECC1, plus more than one task per priority and multiple requesting of task activation allowed for basic tasks.


Task Management

In the OSEK OS, a task provides the framework for the concurrent and asynchronous execution of functions. The Scheduler is then responsible for scheduling tasks following a well defined scheduling algorithm.

The OSEK operating system provides two kind of tasks: basic tasks and extended tasks. The only difference between the two concepts is that extended tasks are allowed to use the operating system call WaitEvent(). Basically that call allow an extended task to release the CPU waiting for an asynchronous event without terminating the current instance.

Each task in the system has assigned a fixed priority (statically assigned at compile time; the value 0 is defined as the lowest priority of a task), and it can be preemptive or non-preemptive. If the running task is preemptive the scheduler always made a preemption when needed, otherwise it reschedules the system at the end of the running task instance. A preemptive task can disable preemption for a while locking a resource called RES_SCHEDULER.

In any moment of its life a task is characterized by its state. The OSEK standard defines four task states:

running

In the running state, the CPU is assigned to the task, so that its instructions can be executed. Only one task can be in this state at any point in time, while all the other states can be adopted simultaneously by several tasks.

ready

All functional prerequisites for a transition into the running state exist, and the task only waits for allocation of the processor. The scheduler decides which ready task is executed next.

waiting

A task cannot continue execution because it has to wait for at least one event. Only Extended tasks can jump into this state (because they are the only that can use events).

suspended

In the suspended state the task is passive and can be activated.

Note that basic tasks have no waiting state: a basic task can only represent a synchronization point at the beginning and at the end of the task. Application parts with internal synchronization points have to be implemented by more than one basic task. An advantage of extended tasks is that they can handle a coherent job in a single task, no matter which synchronization requests are active. Whenever current information for further processing is missing, the extended task switches over into the waiting state. It exits this state whenever corresponding events signal the receipt or the update of the desired data or events.

Depending on the conformance class a basic task can be activated once or multiple times. The latter means that an activation issued when a task is not in the suspended state will be recorded and then executed when the task will finish the current instance.

The termination of a task instance only happen when a task terminates itself (to simplify the OS, no explicit task kill primitives are provided).


Application modes and system startup

The OSEK Operating system gives a support for Application Modes. In real applications, an embedded system may execute different applications in a mutually exclusive way (for example, the normal operation, a factory test, and like). The application mode is a means to structure the software running in the system according to those different conditions and are a clean mechanism for development of totally separate systems. Once the operating system has been started, it is not allowed to change the application mode. Typically each application mode uses its own subset of tasks, ISRs, alarms and timing conditions, although if some kind of sharing between modes is possible.

The start up performance is another safety critical issue for embedded system in automotive applications since reset conditions may occur during normal operation (for example, a power-train application should be capable of rebooting the whole system in a few microseconds, because the system must safely control the spark on the engine cylinders). The system startup is completely left to the particular implementation, although if some hint is given on how design the boot-up sequence. In any case the standard suggest the avoidance of lengthy or complicated starting procedures.


Interrupt processing

Since the standard must be suitable for different microcontrollers, the specification of interrupt handling routines only cover the general approach that a compliant OS should follow, without coping with any hardware related issues.

In particular, the standard provides two kind of ISR handlers:

ISR category 1

The ISR does not use an operating system service. In practice, the OS does not handle these interrupts, and the designer is free to write his handler, with the only restriction that he can not call any OS service. Typically, these are the fastest highest priority interrupts.

ISR category 2

The ISR is handled by the system, so OS calls can be called from the handler.

Inside any ISR no rescheduling will take place. Rescheduling takes place on termination of the ISR category 2 if a preemptable task has been interrupted and if no other interrupt is active. At the end of the ISR category 1 no rescheduling takes place too, and this is the reason because ISR category 1 should have the highest priority in a correct design.


Events

The event mechanism is only provided for extended tasks and can be used to communicate binary information that synchronize these tasks on asynchronous events. Each extended task owns a set of events, that can be triggered by other (basic and extended) tasks or by ISR of category 2.

The typical behavior of an extended task is to wait for asynchronous events calling the OS service WaitEvent(). This service usually blocks the task until an event arrives. After servicing the event, the task calls again WaitEvent() to wait other events.

Events can be set only if the task is not in the suspended state. This seems to suggest that an extended task should never be in the suspended state.


Scheduling

The scheduler decides on the basis of the task priority which is the next of the ready tasks to be transferred into the running state (dynamic priority management is not supported). Tasks on the same priority level are started depending on their order of activation.

The OSEK standard provides four flavors of fixed priority scheduling, outlined below:

Full Preemptive Scheduling

Full preemptive scheduling means that the running task may be rescheduled at any instruction by the arrival of high priority tasks.

Non Preemptive Scheduling

Non preemptive scheduling means that task switching is only performed via one of a selection of explicitly defined system services (like task termination, explicit call to the scheduler and arrival of an event that wakes up an extended task).

Mixed Preemptive Scheduling

Since preemptiveness is a task attribute, preemptive and non-preemptive tasks can be mixed in the same application. The running task will influence the policy really used.

Task Grouping Using Internal Resources

This scheduling policy is very similar to the preemption threshold technology, where threshold values are implemented using the OSEK Priority Ceiling protocol together with internal resources locked and unlocked at the start and at the end of every task instance.


Resource Management

The standard provides support for binary resources that can be used to implement critical sections. Priority inversion and deadlock are avoided using a variant of the SRP called OSEK Priority Ceiling.

The protocol in fact is a version of SRP adapted to fixed priority:

  • every resource as assigned a ceiling that is the maximum priority of the tasks (and ISRs) that use the resource;

  • when a task requires a resource, its current priority is raised to the ceiling of the resource;

  • when a task releases a resource, the priority of this task is reset to the priority which was dynamically assigned before requiring that resource.

The normal properties of SRP applies to the protocol. In particular, Priority inversion, chained blocking and deadlocks are avoided. Moreover, there is no need for waiting queues, since a task can be scheduled only when all the resources it needs are free.

Resources are typically used by task only. In the OSEK standard, resources can be used either by a task or by an ISR of category 2. An ISR that use a resource can be thought as an high priority task: its execution can be delayed due to lower ISRs or tasks accessing resources with ceiling greater or equal than the IRS priority. This is the natural behavior in those systems where tasks activations and priorities are mapped on interrupts, and the raising of task's priorities is done with a proper programming of the interrupt controller.

The OSEK standard also provides a support for Preemption Thresholds through the use of internal resources. An internal resource is simply a resource that is locked when a task instance starts, and is unlocked when the task instance ends. The ceiling of the internal resources can be thought as the Preemption Threshold of the tasks.

In the same way the standard provides a special resource called RES_SCHEDULER that can be used to disable preemption. In practice, the RES_SCHEDULER is a resource with ceiling equal to the maximum priority in the system. In the same way a non preemptive task can be thought as a task that use an internal resource with the same ceiling of RES_SCHEDULER.

Finally note that, although a technique similar to Preemption Threshold is used, stack sharing between tasks of a same Non Preemption Group can not be exploited due of the OS calls WaitEvent() and Schedule() [3]. In fact, these calls releases the internal resource taken by a task, letting execution to more than one task in the same Non Preemption Group.


Miscellaneous

The OSEK operating system provides services for processing recurring events (for example, timers that provide an interrupt at regular intervals, or encoders at axles that generate an interrupt in case of a constant change of an angle). These events are recorded into implementation dependent counters, then used by software alarms. When an alarm (that can be one-shot or periodic) fires, a task can be activated, or an event can be set, or finally an alarm-callback routine can be called. Alarms and counters are statically defined at compile time. The only dynamic parameters that can be set are when an alarm has to expire and the period of a cyclic alarm.

To ease the tracing and the debugging of the system the OSEK standard provides system specific hook routines to allow user-defined actions within the OS internal processing. These hook routines are called by the operating system and they are composed by user code that is executed into an OS primitive, usually with ISR of category 2 disabled. These routines are only allowed to use a subset of API functions (mainly they can use functions for get internal OS states, to ease the tracing of the application). They are called at system startup, at system shutdown, before and after a preemption, and in case of an error. In particular, two different kinds of errors are distinguished:

Application errors

The operating system could not execute the requested service correctly, but assumes the correctness of its internal data.

Fatal errors

The operating system can no longer assume correctness of its internal data. In this case the operating system calls the centralized system shutdown.

The standard gives two ways of handling errors: a centralized way (using an Error Hook that is called every time an error occurs in a system primitive), and a decentralized way (where the application code must check itself for the correctness of the return value of every primitive).


OSEK COM

The OSEK standard comprises also an agreement on interfaces and protocols for in-vehicle communication called OSEK COM. The term in-vehicle communication means both communication between nodes and internal communication in a node of the whole vehicle. The basic idea is to provide a standardized API for software communication that is independed from the particular communication media used in a way to ease porting of applications between different hardwares.

The OSEK COM standard is composed by:

  • An Interaction layer which provides communication services for the transfer of application messages.

  • A Network layer which provides services for the unacknowledged and segmented transfer of application messages. The network layer provides flow control mechanisms to enable interfacing of communication peers featuring different level of performance and capabilities.

  • A Data link layer interface which provides services for the unacknowledged transfer of individual data packets over a network to the layers above.

OSEK COM provides a rich set of communication facilities but it is likely that many applications will only require a subset of this functionality. For that reason, the standard defines a set of conformance classes to enable the integration of OSEK COM in systems featuring various levels of capabilities in a scalable way, enabling the car producer to integrate software parts produced by different suppliers.

OSEK COM defines these levels as Communication Conformance Classes (CCCs). The main purpose of the conformance classes is to ensure that applications which have been for a particular conformance class are portable across different OSEK implementations and ECUs featuring that same or higher level of communication functionality. OSEK COM defines five communication conformance classes to provide support from ECU internal communication only (CCCA) up to inter-ECU external communication (CCC2).

For an OSEK implementation to be compliant, message handling for intra processor communication has to be offered. The minimum functionality required is CCCA as described in the OSEK COM specification.


OSEK NM

The OSEK standard also cover a standardization of basic and non-competitive infrastructure between the various embedded systems that can be present in a vehicle. In fact, very often electronic control units made by different manufacturers are networked within vehicles by serial data communication links.

For that reason the standard propose a Network Management system (OSEK NM) that provides standardised features which ensure the functionality of inter-networking by standardised interfaces.

The essential task of NM is to ensure the safety and the reliability of a communication network. This is obtained implementing access restriction to each node (access must be restricted only from authorized entities), keeping the whole network tolerant to faults, and implementing diagnostic features capable of monitoring the status of the network in an indirect (monitoring application messages) or in a direct way (monitoring by dedicated NM communication using token principle).

Moreover, the network management also cover the initialization of network resources, network configuration the co-ordination of global operation modes (e.g. network wide sleep mode), and a support for diagnosis.


OSEKTime

The OSEK Standard produced a specification for a Time Triggered Operating System (OSEKtime OS) that aims to represent a uniform functioning environment for single processor distributed embedded control units with a fault-tolerant communication layer.

The OSEKtime operating system supports static scheduling and offers all basic services for real-time applications, i.e., interrupt handling, dispatching, system time and clock synchronization, local message handling, and error detection mechanisms.

The OSEKtime operating system serves as a basis for application programs which are independent of each other, and provides their environment on a processor. The are two types of entities: interrupt services routines managed by the operating system and time triggered tasks.

The Osek Operating system can coexist with a OSEKTime OS, for handling both time triggered and event-driven computations on the same Embedded Control Unit. Basically, the interface of the OSEK OS remains the same (apart from some small changes in the startup/shutdown procedures). The main concept is that the OSEKTime OS assigns its idle time to the OSEK OS.

The OSEK OS, its tasks and its interrupts have always a lower priority than the similar entities in the OSEKTime OS. Non-preemptive tasks remains non-preemptive only in the OSEK/VDX domain, and they can be preempted bt the Time Triggered dispatcher and by the Time Triggered Tasks.


Chapter 4. Analyzed features

One of the first conclusions achieved in the first project meeting was then importance of the standards, and in particular the realtime extensions defined in the POSIX standard. POSIX is a mature, well developed, independent set of standards then are followed by most of the UNIX industry. Also the use of already existing standards in the open source community is a must.

OSEK is a RTOS specification designed to fulfil the requirement of the automotive industry. It was designed for systems with small hardware resources like 8 bit processes with no MMU. Although there are some ports of Linux to small processors like MC68000 family, Linux and RTLinux are specially designed for mid-range to high range processors. This is why we decided to select POSIX as the reference standard used in this analysis.

POSIX stands for stands for Portable Operating System Interface, and is an IEEE standard designed to facilitate application portability. It is the codification and standardisation of the common core of UNIX™ APIs, and realtime OS. A detailed list and status of all the sub-standards that are part of POSIX can be found in the PACS web page. The subset of POSIX standards that are related to real time or embedded systems are listed below:

  • 1003.1b: Realtime Extensions (IEEE Approved)

  • 1003.1c: Threads (IEEE Approved)

  • 1003.1d: Additional Realtime Extensions (IEEE Approved)

  • 1003.1j: Advanced Realtime Extensions (IEEE Approved)

  • 1003.1q: Tracing (IEEE Approved)

  • 1003.5: Ada binding to 1003.1 (IEEE Approved)

  • 1003.5a: Ada Update (IEEE Reaffirmed)

  • 1003.5b: Ada Realtime (IEEE Reaffirmed)

Following is the list of features that has been studied in every analyzed RTOS:

  • General overview and architecture. The internal architecture of the RTOS determines the overall performance and the type of environments where it can be applied. For example, a RTOS with protected address space for processes will have longer context switches times than that with a flat memory space.

  • Hardware support

    • Processors supported.

    • Multiprocessor support.

  • Process management

    • Weighted and/or lightweight processes. Lightweight processes (threads) is an efficient method to implement concurrent systems due to the efficient resource management and fast context switching. The POSIX thread standard define many realtime features.

    • Scheduling policy. POSIX define four scheduling policies: SCHED_FIFO, SCHED_RR, SCHED_SPORADIC and SCHED_OTHER. What kind of scheduling policies provide for hard and soft tasks?

    • Periodic tasks. Most of the scheduling theory was developed to provide support to periodic tasks. POSIX do not provide specific system calls to implement periodic activities; it has to be implemented by mean of timers and signals. Do the system provide any method to implement periodic tasks?

    • Range of priorities. POSIX require at least 32 priorities.

    • Thread creation and cancellation. POSIX standard provides a lot of facilities to deal the concurrency problems related to the termination of the running tasks. Do the RTOS provide a way to start and terminate threads in a safe way?

  • Memory management

    • Memory protection. This is a mayor issue in realtime, since it is a key feature to provide fault tolerance.

    • Dynamic memory allocation. Fixed block size memory allocation? do provide the standard malloc() and free() functions?

  • Inter-process communication

    • Semaphores. The most common and versatile synchronisation primitive.

    • Mutex. Mutexes are a special type of semaphore used to protect critical sections. The main difference with normal semaphores is that the thread that locks the mutex becomes the owner of the mutex, and this thread is the only one that can unlock it. Which type of mutex are available (spin-lock, recursive, etc.)?

    • Priority inversion control. Do provide some type of priority inversion control in concurrency control primitives?

    • Message queues. Real time extensions of POSIX include a message queue interface.

    • Mailboxes

    • Shared memory Since normal weight processes are executed in a protected memory space and threads (lightweight processes) are executed in a single common memory space. This functionality is only applicable to systems with separate memory spaces.

    • Signals (POSIX signals)

  • Time and timers

    • Time facilities. How the time is handled is one of the most important issues in a RTOS: Absolute/relative delays, time resolution, how timers are managed, configurable timer resolution, types of clocks, etc.

    • Facilities to add new hardware timers. Embedded boards are usually equipped with special timer and watchdog hardware. Do the RTOS provide some utility to use it?

  • Driver programming (low level programming)

    • Interrupts. How interrupts are handled? What facilities provide the RTOS to write interrupt handlers?

    • System facilities to do IO. Embedded systems usually have some custom devices. The RTOS has to provide some facilities to allow the engineer to write its own device drivers.

  • Quality of Service

  • Network

    • Available protocols.

  • Filesystems

    • Filesystem support.

    • Reservation features.

  • Trace and debug

    • Program debugging. Debugging and trace utilities are fundamental features that a RTOS must provide. There are two characteristics that stress the need of these type of tools: on one hand, embedded systems use to have limited communication capabilities (sometime only a serial line); on the other hand realtime systems should contain no bugs, this is achieved by means of a careful design and extensive debugging.

    • Event/timing debugging. Timing correctness is as important as logical correctness in realtime systems. The RTOS must provide timing information about the execution of the application. Recently, POSIX.1q added tracing facilities that allow the user to trace system and user events.

  • Miscelanea

    • Graphic support.

    • Development environment. Most RTOS, commercial and open source, use GNU development utilities: gcc, binutils, gdb, etc.

    • Programming languages. The "C" language is available in all the RTOS, but more and more other languages like ADA or C++ are also used.

    • Compatibility with other RTOS APIS. Like pSOS, VxWorks, OSEK, self propietary API.


Chapter 5. Summary

Hardware

Multi-processor

Scheduling

Concurrency

Linux

Alpha, ARM, i386, MIPS, PowerPC, Sparc, SuperH, Etrax, m68k, PA-RISC

Yes

SCHED_FIFO, SCHED_RR, SCHED_OTHER

UNIX-processes &

Pthreads

RTLinux

i386, PPC, ARM

Yes

SCHED_FIFO

Pthreads

RTAI

i386, MIPS, PPC, ARM, m68k-nommu

Yes

Fixed priority

Lightweight processes

RTEMS 4.5+

M68k, ColdFire, SuperH, i386, MIPS, i960, PowerPC, SPARC, AMD A29K. PA-RISC

Static allocation

SCHED_FIFO, SCHED_RR, SCHED_OTHER

Pthreads

QNX

I386, ARM, MIPS, PowerPC, SuperH

Yes

SCHED_FIFO, SCHED_RR, SCHED_OTHER

UNIX-processes, Lightweight processes, Pthreads

VxWorks 5.x

m68k/CPU32/ColdFire/PowerPC, i386, ARM, SuperH, MIPS

Optional

Fixed priority, Rodun-robin

Lightweight processes

LynxOS

i386, m68k, PowerPC, ARM, SPARC, PA-RISC

Yes

FIFO, Priority Quantum, Round-Robin, Non-preemptive

UNIX-processes &

Pthreads

Priorities

lower- higher

Memory protection

Dynamic memory

Inter-process communication

Linux

(0-100)

Yes

Yes

Semaphores, Mutexes, Condition-var., shared-mem, signals, pipes.

RTLinux

(0-1000000)

No

No

Semaphores, Mutexes, Condition-var., FIFO

RTAI

(0x3fffFfff-0)

No

Yes

Semaphores, Mutexes, Condition-var., FIFO, Mailbox, shared-mem, net_rpc, Pqueues.

RTEMS 4.5+

(255-1)

(254-1) Posix

No

Yes

Semaphores, Mutexes, Condition-var., Pqueues, Events.

QNX

(0-31)

Yes

Yes

Semaphores, Mutexes, Condition-var., Barriers, Atomic operations, rd/wr-locks, Pqueues, shared-mem, FIFO.

VxWorks 5.x

(255-0)

No

Optional

Semaphores, Mutexes, Message, RTSignals,

LynxOS

(0-255)

Yes

Yes

Semaphores, Mutexes, shared-mem, Pqueues, signals, pipes, Condition-Var.

Priority inversion control

Time

resolution

Timers

Low level programming

Linux

None

Configurable (HighResTimers)

POSIX timers

No interrupt programming.

RTLinux

Immediate ceiling

Hardware dependant

None

Full control HW

RTAI

Inheritance

Hardware dependant

None

Full control HW

RTEMS 4.5+

Inheritance &

Immediate ceiling

Hardware dependant

POSIX timers

Full control HW

QNX

Immediate ceiling

Configurable

POSIX timers

Interrupts can be handled by user processes.

VxWorks 5.x

Inheritance

Configurable

Watchdog timers, POSIX timers

Full control HW

LynxOS

Inheritance

Configurable

POSIX timers

POSIX-style threads of execution within the kernel for interrupt handling.

QoS

Network

Filesystem

Linux

FIFO, CBQ, CSZ, ATM, PRIO, RED, SFQ, TLE, TBF, GRED, Diffserv, Ingress, RSVP

IP, UDP, TCP, SLIP, PPP, ICMP, DHCP, RARP, RARP, TFTP, RPC, FTP, HTTP

ReiserFS, ext2, ext3, NFS, CIFS, ADFS, FAT, VFAT, NTFS, CRAMFS, ISO9660, MINIX, QNX4, ROM, JFS, XFS, Flash

RTLinux

None

None

None

RTAI

None

IP, UDP

None

RTEMS 4.5+

None

IP, UDP, TCP, SLIP, PPP, ICMP, DHCP, RARP, TFTP, RPC, FTP, HTTP, CORBA

IMFS, DOSFS/FAT

QNX

None

IP, UDP, TCP, ARP, ICMP, IGMP, QNET

RAMFS, QNX4, DOS, ISO9660, ext2, NFS, CIFS, Flash

VxWorks 5.x

None

TCP/IP IP, UDP, TCP, IGMP, ICMP, ARP RIP 1/2 SLIP, CSLIP, PPP BOOTP, DNS, DHCP, TFTP FTP, RLOGIN, RSH, TELNET

FAT, NFS, raw, TrueFFS

LynxOS

None

IP, UDP, TCP, ICMP, IGMP, ARP, RARP, DHCP, NAT, RPC, NTPv3, Raw, Zebra routing, TFTP

Lynx Fast File system, ISO9660, NFS, RAMFS.

Debug

Languages

API compatibilitiy

Linux

GDB, DDD,

Insight,

System debugg, LTT

C, C++, ADA, Java,

POSIX 1003.1,

VxWorks, pSOS

RTLinux

Simple trace, GDB

C, C++

POSIX 1003.1c

RTAI

KGDB

C

Custom,

POSIX 1003.1b

RTEMS 4.5+

GDB, DDD,

Debug over: ethernet, serial, BDM

C, C++, ADA

RTEID/ORKID,uITRON 3.0, POSIX 1003.1b

QNX

GDB, memory overrruns.

C, C++, Java

POSIX 1003.1, POSIX 1003.1b

VxWorks 5.x

GDB, Debug over: ethernet, serial, WindView, Trace

C, C++

Propietary (VxWorks),

POSIX 1003.1, POSIX 1003.1b

LynxOS

GDB, Insight, Debug over: ethernet, serial

C, C++, ADA

Propietary, POSIX.1/.1b/.1c, Unix BSD 4.3. ABI compatibility with Linux 2.4


Chapter 6. Realtime Support in Linux

Linux is a full-featured UNIX® implementation. The main design criteria of the Linux kernel is the throughput while realtime and predictability is not an issue. The main handicap to consider Linux as a realtime system is that the kernel is not preemptable; that is, while the processor executes kernel code, no other process or event can preempt kernel execution.

Although Linux is not a realtime system, it has some features, already included in the mainstream source code or distributed as patch files, designed to provide realtime to Linux. These are the features described in this section.

From the programmer point of view, there are two main programming paradigms to build a realtime application: weight-processes (normal UNIX processes) and lightweight-processes (known as threads or LWP). Linux provides support for both execution environments, mostly based on POSIX standards 1003.b and 1003.c respectively.

Most of the realtime extensions are not included into the standard "C" library "libc". These system calls are implemented and distributed in in the GNU "C" library (glibc) but are located in a separate library file called "librt". Therefore, to compile a program that makes use of realtime features, the compiler must be invoked flag the "-lrt".


Scheduling

From the very first versions of Linux, the scheduler was realtime POSIX compatible. It supports, among others, the fixed priority (SCHED_FIFO) policy, which is the base feature to build a realtime systems. It also provides the POSIX required: SCHED_RR and SCHED_OTHER. The range of priorities is [0..99].

A lot of work has been done to improve the performance of the scheduler through a careful design which yield to a new scheduler code and structure. Next stable Linux kernel (2.4.19), as well as unstable kernel development (2.5.x), will replace the old scheduler code with an improved "O(1) scheduler" developed by Ingo Molnar . This new scheduler is able to manage a large number of processes with no overhead degradation.


Virtual Memory

It is not possible to build realtime applications on a system with virtual memory. The random and long delays introduced when RAM is exhausted and swapping is required is intolerable in a realtime system. Linux provides the mlock() and mlockall() functions that disables paging for the specified range of memory, or for the whole process respectively. Therefore, all the "locked" memory will stay in RAM until the process exits or unlocks the memory.

mlock() and mlockall() are included in the POSIX realtime extensions.


Shared Memory

One of the main communications paradigms used in realtime applications is shared memory.

Linux processes can share memory with each other and with drivers the POSIX.1b call mmap(). This function can be used both, to map into main memory a regular file and to map shared memory objects. When multiple processes map the same memory object (or file), they share access to the underlying data, which is a efficient way to communicate large amounts of data between processed.

Since version 2.4.x and glibc 2.2 (GNU "C" library), Linux provides open shared memory objects, which is part of the POSIX realtime extensions. This API has the following functions: shm_open() and shm_unlink().


High Resolution POSIX Timers (HRT)

Current POSIX API defines two different timer facilities:

  • BSD timers: setitimer() and getitimer() functions.

  • IEEE 1003.1b REALTIME timers: timer_create(), timer_settime(), timer_getoverrun(), etc.

Linux provides the BSD POSIX timers with a timing resolution around 10ms, which is clearly not suitable of realtime applications. The Hight Resolution Timers (HRT) project, sponsored by MontaVista, provides microsecond resolution with lower overhead following the IEEE 1003.b POSIX API. It is distributed as a patch file which can be downloaded from Sourceforge. Current distribution works with Linux kernel 2.4.18.

This patch can provide high resolution timers with very low overhead because of two main design issues: the use of several timing and interrupt hardware sources (the old 8254, the Pentiun internal instruction TSC, and the ACPI[4] timers when available), and a clever data structure to maintain the timers.

The patch provides high resolution clocks: CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR. And the accompanying functions: clock_settime(), clock_gettime(), clock_getres(). Also the POSIX timers functions are implemented: timer_create(), timer_delete(), timer_settime(), timer_gettime() and timer_getoverrun().


Realtime Signals

POSIX extended the signals generation and delivery to improve the realtime capabilities. Signals take an important role in realtime as the way to inform the processes of the occurrence of asynchronous events like high-resolution timer expiration, fast inter-process message arrival, asynchronous I/O completion and explicit signal delivery.

The main characteristics of this type of signals are:

  • The range of realtime signals supported by Linux is form 32 (SIGRTMIN) to 63 (SIGRTMAX).

  • Signals can deliver a small piece of data (an integer or a pointer) to the signal handler (signal-catching function).

  • Signals that carry information are delivered in chronological FIFO order.

  • It is possible to automatically create a thread in response to a signal.

Linux fully supports the POSIX realtime signals standard.


POSIX Asynchronous I/O

Asynchronous I/O (AIO) is the POSIX interface to provide high efficiency asynchronous I/O access. The standard way to access I/O devices (files, drivers, sockets, fifos, etc.) defined by UNIX the read() write() blocking sequence, where a next file access is performed only when the previous request has been completed. AIO mechanism provides the ability to overlap application processing and I/O operations initiated by the application.

A process can start one or more IO requests to a single file or multiples files and continue its execution. Also, a single system call can start a sequence of I/O operation on one or several files, which reduces the overhead due to context switches.

There are two implementations available in Linux: the one provided at the library level by using non-aio system calls (included in the glibc/librt since version 2.1); and the kernel implementation first developed by SGI™ called KAIO (till linux kernel 2.4.0), and now the Linux-AIO which provides this functionality to newer kernels.


POSIX Threads

Current Linux implementation of POSIX threads (POSIX 1003.1c) is based on the work done by Xavier Leroy, known as LinuxThreads. LinuxThreads is now integrated in the glibc, and distributed as a part of the it. It provides kernel-level threads: threads are created with the clone() system call and all scheduling is done in the kernel. This kind of threading is defined as 1:1, i.e. each thread is mapped to a Linux process.

LinuxThreads implements most of the POSIX API: mutex, condition variables, cancellation, signals, timed calls, etc. The library also provides POSIX semaphores. Mutex do not implement any protocol to prevent priority inversion. Since threads are scheduled by the Linux scheduler, the scheduling policies are the same as in Linux: SCHED_FIFO, SCHED_RR and SCHED_OTHER.


Quality of Service

Nowadays, linux offers a sophisticated component for bandwidth management called Traffic Control. This component supports method for classifying, prioritising, and limiting both incoming and out-coming traffic. Therefore, linux can do the following list of things: limit bandwidth for certain computers, help to fairly share bandwidth, protect the Internet from abuses, restrict access, do routing based on user id, MAC address, source IP address ... and so on.

For working with this subsystem, the kernel versions 2.2.x has to be patched, but the versions 2.4.x and uppers implement directly this functioning.

Network subsystem overview

The following figure shows the network subsystem:

Figure 6-1. Network Subsystem

There are four components:

  • Input demultiplexing: Decides if a incoming packets are passed to higher layers or are directly forwarded to the network.

  • Upper Layers: Processes packets and may also generate new traffic and pass it to the lower layers.

  • Forwarding: This layer performs the selection of the output interface, the selection of the next hop, encapsulation, etc.

  • Output Queueing or Traffic Control: This is the most important component and decides if packets are queued or dropped, decides in which order packets are sent, etc.

Once the traffic control releases a packet for sending, the network device driver sends it to the network.

Traffic Control overview

The traffic control component consist of the following elements: queueing disciplines (qdisc), classes (within a queueing discipline), filters and policing

In this way, queueing discipline provides a method to enqueue a packet. A class is the place where packets are stored and processed in a specific way, afterwards, the qdisc selects the following packet for sending from classes. Filters are used by a qdisc to assign incoming packets to one of its classes. And finally, policing is used to ensure that incoming traffic does not exceed certain bounds.

The following picture illustrates an example of traffic control configuration:

Figure 6-2. Traffic Control configuration

This configuration consists of a queuing discipline with two delay priorities, as well as, two classes: the higher class contains a token bucket filter discipline that limits the traffic, while the lower class contains a FIFO qdisc. Therefore, while the higher class has packets for sending (rate < 1Mbps), the priority qdisc selects a packets from this class. The filter decides which packets are sent to the higher class. Once a priority qdisc selects the following packet for sending, the network driver sends it on the network.

In conclusion, the traffic control layer decides whether the packets are queued or dropped, in which order the packets are sent, and finally it may delay packet transmission. Moreover, the traffic control elements can be combined in a modular way to support Differentiated Service (DS), Integrated Service (RSVP), ATM and so on.

The following four sections describe the traffic control elements.

Queueing discipline

Each network interface has a queue discipline attached with it, which controls how packets are enqueued and treated.

A qdisc is a black box, which is able to enqueue packets and dequeue them using its own algorithm, for example, a CBQ qdisc uses a WRR (Weight Round Robin) scheduling to select the following packet for sending on the network.

Moreover, qdisc are divided into two categories:

  • Classfull: Qdiscs that may have child qdiscs attached to them.

  • Leafs: Qdiscs that have not child's.

The available classfull qdiscs are:

  • PRIO a n-band strict priority scheduler,

  • CBQ Class Based Queue,

  • CSZ Clak-Scott-Zhang,

  • ATM Asynchronous Transfer Model,

  • DSMARK - DSCP a Diff-Serv Code Point marker and

  • INGRESS

The available leafs qdiscs are:

  • FIFO a simple FIFO (it is the default qdisc),

  • TBF Token Bucket Filter,

  • RED Random Early Detection,

  • GRED Generalised Random Early Detection,

  • TEQL Traffic Equaliser and

  • SFQ Stochastic Fair Queue.

Classes

A class is attached to a qdisc. However, queueing disciplines and classes are intimately tied together; the presence of classes and their semantics are fundamental properties of the queueing discipline. There is only one available class. This is the CBQ class. Note that a CBQ may work as queue discipline or class.

Filters

Filters are used to classify packets based on certain properties of them (address IP ...). The supported filters are:

  • rsvp (use RSVP protocol for classification),

  • u32 (anything in the header may be used for classification),

  • fw (use the firewall rules for classification),

  • route (use routing table for classification decisions) and

  • tcindex (use the DS field for classification).

Note that the u32 filter is the most advanced filter available and the tcindex filter is used in DiffServ (differentiation services).

Policing

The goal of policing is to ensure that traffic does not exceed certain bounds. There are four types of policing mechanisms: policing decisions by filter, refusal to enqueue a packet, dropping a packet from an inner queueing discipline and dropping a packet when en-queuing a new one.


Compatibility

Linux was developed around UNIX and POSIX standards. Therefore, its native API is mostly POSIX compatible.

Also MontaVista™ developed Wind River pSOS® and Wind River VxWorks® two emulation environments. These emulators are designed to ease the port of legacy RTOS code Linux. The emulation libraries are available at Sourceforge.


Chapter 7. Low-Latency Patches for Linux

Kernel Latency

When scheduling a real-time user-level Linux process, the real-time performance can be affected by bottom halves (BHs) execution, kernel non-preemptable sections, and so on. The kernel latency is a quantity used to measure the difference between the theoretical schedule and the actual one.

The kernel latency is defined as follows: Let T be a real-time process that requires execution at time t, and let t' be the time at which T is actually scheduled; we define the kernel latency experienced by T as L = t' - t.

The biggest source of kernel latency are kernel non-preemptable sections (including BHs and Interrupt Service Routines - ISRs). In fact, non-preemptable sections in the kernel can prevent a high priority task from being scheduled for a very long time (up to 100ms). For example, if interrupts are disabled at time t, task T can only enter the ready queue later when interrupts are re-enabled. In addition, even if T enters the ready queue at the correct time t and has the highest real-time priority in the system, it may still not be scheduled if another task is running in the kernel in a non-preemptable section. In this case, T will be scheduled when the kernel exits the non-preemptable section at time t'.

The length of a kernel non-preemptable section depends on the strategy that the kernel uses to guarantee the consistency of its internal structures, and on the internal organization of the kernel. The standard Linux kernel is based on the classical monolithic structure, in which the consistency of kernel structures is enforced by allowing at most one execution flow in the kernel at any given time. This is achieved by disabling preemption when an execution flow enters the kernel, i.e., when a system call is invoked or when an interrupt fires. When a system call is invoked, the thread that invokes it enters in the kernel and becomes non-preemptable, returning preemptable when the execution exits from the kernel. When an interrupt fires, a short ISR is invoked with interrupts disabled: the ISR acknowledges the hardware interrupt and schedules a BH for execution. The BH will be executed in a non-preemptive way immediately before returning to user mode; hence, if the ISR interrupts a system call, the BH will be executed only after that the system call is completed (system calls can synchronize with ISRs by explicitly disabling interrupts). Summing up, in a standard Linux kernel, the maximum latency is equal to the maximum length of a system call plus the processing time of all the interrupts that fire before returning to user mode. Unfortunately, this value can be as large as 100ms.


Possible Solutions

Two different approaches can be used to reduce the size of kernel non-preemptable sections: the one used by the Low-Latency patches (Ingo Molnar and Andrew Morton)[LowLat], and the one used by the kernel preemptability patch (MontaVista, TimeSys)[kpreem, TimeSys].


Low-Latency Linux

This approach ``corrects'' the monolithic structure by inserting explicit rescheduling points (that effectively are preemption points) inside the kernel. In this approach, when a thread is executing inside the kernel it can explicitly decide to yield the CPU to some other thread. In this way, the size of non-preemptable sections is reduced, thus decreasing the latency. In a low-latency kernel, the consistency of kernel data is enforced by using cooperative scheduling (instead of non-preemptive scheduling) when the execution flow enters the kernel.

This approach is also used by some real-time versions of Linux, such as RED Linux. In a low-latency kernel, the maximum latency decreases to the maximum time between two rescheduling points. Since the low-latency patch has been carefully hand-tuned for quite a long time, it performs surprisingly well.


Preemptable Linux

The preemptable approach, used in most real-time systems, removes the constraint of a single execution flow inside the kernel. Thus it is not necessary to disable preemption when an execution flow enters the kernel. To support full kernel preemptability, kernel data must be explicitly protected using mutexes or spinlocks. The Linux preemptable kernel patch, sponsored by MontaVista, uses this approach and makes the kernel fully preemptable. Kernel preemption is disabled only when a spinlock is held.

A similar approach is used by TimeSys, that uses mutexes instead of spinlocks, and provide priority inheritance. While the MontaVista patch disables preemption when a spinlock is acquired, the TimeSys one is based on blocking synchronization.

In a preemptable kernel, the maximum latency is determined by the maximum amount of time for which a spinlock is held inside the kernel. Again, it is important to note that BHs are serialized using a spinlock, thus they contribute to the latency.

An additional patch (lock-breaking) merges some of the low-latency rescheduling points into the preemptable kernel, for decreasing the amount of time for which spinlocks are held.


Evaluation

We measured the latency for the standard, Low-Latency, preemptable and preemptable with lock-breaking kernels while running different loads in background. All the experiments were performed using the Latency Benchmark tool downloadable from [FT]. The results are shown in the following figures:

Figure 7-1. Latency in the Standard Kernel

Figure 7-2. Latency in the Low Latency Kernel

Figure 7-3. Latency in the Preemptable Kernel

Figure 7-4. Latency in the Preemptable Lock-Breaking Kernel

From the first figure, it is possible to see that standard Linux exhibits high latencies at the end of the memory stress (a program that reads and writes a large array in memory), during the I/O stress (a program that reads and writes large amount of data on a file), when accessing the /proc file system, and when switching the caps/lock led. The large latency at the end of the memory stress is due to the munmap() system call. Comparing the figures it is possible to see that the low latency kernel solves all the problems except the /proc file system access and the caps/lock switch. On the other hand, the preemptable kernel eliminates the large latency in the /proc fs access, but does not solve the problem with the memory stress, and is not as effective as the low latency kernel in reducing the latency during the I/O stress. Finally, the lock-breaking preemptable kernel seems to provide good real-time performance, but still has some problem during the I/O stress.

By repeating similar experiments for longer amounts of time, we verified that the low-latency kernel is characterized by larger average latencies respect to the preemptable and lock-breaking preemptable kernels, but reduces the worst case latencies. In other words, the tail of the probability distribution function is shorter for a low-latency kernel. As an example, the PDFs of the latency during the I/O stress are reported in the following figures (note that this new experiments were performed on a computer that is faster than the one used for the previous experiments).

Figure 7-5. PDF of the Latency in the Low-Latency Kernel

Figure 7-6. PDF of the Latency in the Lock-Breaking Kernel

According to the previous figures, it seems that the low-latency patch provides better real-time performance. This is probably due to the fact that the Linux kernel still contains ``big spinlocks'' that are held for long amounts of time, and that the lock-breaking patch does not ``break'' properly. A classical example is the spinlock used to serialize BHs. However, in the future Linux developers will likely decrease the size of kernel critical sections (to improve SMP performance), hence it seems to be reasonable to think that the latency of the preemptable and lock-breaking preemptable kernel will decrease.


Chapter 8. RTLinux/GPL

There are two different versions of RTLinux: RTLinux/Pro and RTLinux/Open.

RTLinux/Open is available on the web. The main (only) developer of this version was FSMLabs and stops its development in 2Q/2001. RTLinux/Pro is available for a free from FSMLabs and the license is non-GPL.

This paper is focused of the RTLinux/Open version.


Architecture overview

There are two approaches to provide real-time performance in a Linux system:

  1. Improving the Linux kernel preemption.

  2. Adding a new software layer beneath Linux kernel with full control of interrupts and processor key features.

These two approaches are known as "preemption improvement" and "interrupt abstraction" respectively. This second approach is the one used by RTLinux.

RTLinux is a small, fast operating system, following the POSIX 1003.13 "minimal realtime operating system" standard.

RTLinux adds a layer of virtual hardware between the standard Linux kernel and the computer hardware (see Figure 8-1). As far as the standard Linux kernel is concerned, this new layer appears to be the actual hardware. RTLinux implements a complete and predictable RTOS with no interference from the non-real-time Linux. The RTLinux threads are executed directly by a fixed-priority scheduler. The whole Linux kernel, and all the normal Linux processes, are managed by the RTLinux scheduler as the background task. This way, it is possible to have a complete general purpose operating system running on top of a small predictable RTOS.

Figure 8-1. RTLinux layer architecture

There are three main modifications done to the Linux kernel in order to virtualize the hardware so that RTLinux can take full control of the machine: The RTLinux layer takes direct control of all the hardware interrupts, interrupts that are not controlled by real-time threads are forwarded to the Linux upper level; RTLinux also takes the control of the timer hardware (8254 and APIC when available) and implements a virtual timer to Linux; and the last modification done to Linux to remove the basic control of the hardware from Linux is to replace all the cli and sti (disable and enable interrupt flag) functions calls from the Linux kernel so that Linux can no make a real disable but a virtual interrupt disable. These modifications are quiet complex and tricky, but do not require large code (Linux) modifications.

RTLinux provides an execution environment "below" the Linux kernel. One consequence of this, is that realtime threads can not use Linux services because deadlock or system inconsistencies may happen. To overcome this problem, the realtime system has to be divided into two separated layers: the hard realtime layer, executed on top of RTLinux, and the soft realtime, executed as normal Linux processes. Several mechanisms (FIFO, shared memory) can be used to communicate threads in both layers.

The two layer approach is a useful method to provide hard realtime while having all the features of a desktop operating system. It decouples the mechanism of the realtime kernel from the mechanism of the general purpose Linux kernel so that each system can be optimized independently.


Hardware characteristics

Supported processors:

i386, PPC, ARM (StrongARM/iPAQ).

Supported multi processor:

Yes. It is available for i386 architectures.


Process management

Scheduling policy:

There are three scheduling policies available: SCHED_FIFO, SCHED_SPORADIC SCHED_EDF. SCHED_FIFO is a fixed priority scheduling and threads with the same priority are scheduled in FIFO order. SCHED_SPORADIC an implementaion of the sporadic server used to run aperiodic activities. SCHED_EDF implements a dynamic priority scheduling policy the EDF ( Earliest Deadline First). Each thread has a fixed priority and a deadline. Threads are sorted by priority, but same priority threads are scheduled according to the EDF policy.

Periodic threads:

The system provides special system calls to implement periodic threads. pthread_make_periodic_np() and pthread_wait_np()

Range of priorities and maximum number of threads:

Minimum and maximum priority is 0 and 1000000 respectively. There is no limit in the number of running threads, but the scheduling cost is proportional to the number of threads. Current scheduler code is designed to handle efficiently a low number of threads (around 10).

Thread creation and deletion:

It provides all the POSIX Thread termination facilities and also some extensions to remove termitate threads easier.

A thread can terminate itself by calling pthread_exit() function. pthread_join() suspend execution the execution of the calling thread until the target thread terminates. To implement this behavior, when a thread exits, the system do not delete the supporting data until other thread has joined. The system also provides the pthread_detach() function to indicate that the target thread will not be joined and the system support data can be reclaimed as soon as the thread exits.

A thread can request the cancellation (termination) of other thread: pthread_cancel(). The thread that receives the cancellation request, depending on the cancelability state, can do one o the following action:

  • PTHREAD_CANCEL_DISABLE: The cancel request is ignored.

  • PTHREAD_CANCEL_DEFERRED: The thread will be canceled but only at some safe points.

  • PTHREAD_CANCEL_ASYNCHRONOUS:The thread is canceled immediately.

A thread can install cancellation cleanup handlers: pthread_cleanup_push() and pthread_cleanup_pop(). The cleanup handlers are called when the thread exits, is canceled or the handler is removed.

pthread_delete_np() function can be used to termintate inmediately a thread and can be used instead of the pair: pthread_cancel()/pthread_join()


Memory management

Protected address spaces

Although RTLinux is designed to run in processors with MMU, all the application threads and the RTLinux kernel run in the same address space. There is no memory protection between threads and the kernel and also between threads themselves.

From the point of view of memory management, RTLinux is the guest operating system of the Linux Kernel. The Linux kernel has the whole control of the memory.

Dynamic memory allocation:

RTLinux do not provide dynamic memory allocation nor use it internally. The main argument is that dynamic memory allocation is not predictable if implemented efficiently. The Real-Time goal of predictability is usually achieved by preallocating most of the resources the threads will use at run time.

It i