Writing Device Drivers for LynxOS
Synchronization
This chapter describes synchronization issues and the LynxOS mechanisms available to device drivers to handle these issues.
Introduction
There are a number of synchronization mechanisms that can be used in a LynxOS device driver. These include:
- Kernel semaphores can be used to protect critical code regions as well as to manage shared data and resources in a controlled manner. The functions supporting kernel semaphores include: swait(), ssignal(), ssignaln(), and sreset().
- Disabling interrupts and preemption are mechanisms used to protect code segments that are considered atomic and must be completed without interruption. The calls that support disabling of interrupts and preemption include: disable(), restore(), sdisable(), and srestore().
The LynxOS synchronization functions that support device drivers are summarized below. A complete description of these functions is available in their respective man pages.
- swait() - Waits on (acquires) a kernel semaphore, blocking the task if the semaphore is not available
- ssignal() - Signals (releases) a kernel semaphore, waking the highest priority waiting task
- ssignaln() - Signals a kernel semaphore n times
- sreset() - Resets a kernel semaphore, waking all tasks blocked on it
- disable() - Disables interrupts and task preemption
- restore() - Restores the previous interrupt and preemption state
- sdisable() - Disables task preemption only
- srestore() - Restores the previous preemption state
What is Synchronization?
Synchronization ensures that certain events occur in a definite order within a non-deterministic environment (such as a concurrent, preemptive operating system). In a device driver, this usually means ensuring that shared resources such as devices, buffers, queues, and so on are accessed in a protected and controlled manner so that processes do not interfere with each other's access to them. Synchronization mechanisms provide:
- Methods that support the coordinated use of shared resources by causing processes to suspend execution when a shared resource is not available.
- Protection of critical code sections. These are code segments considered atomic: they must be completed in their entirety or not at all.
- Mechanisms to prevent system failure due to conditions inherent in concurrent, preemptive operating system environments, such as race conditions and deadlock.
Managing Shared Data Resources
Semaphores are a mechanism available to LynxOS device drivers to manage shared resources (the statics structure, shared buffers, and queues, for example). Semaphores can partition the device driver code into critical code regions that must obtain access to a shared resource before continuing to execute. The semaphore is the mechanism used to lock and release a shared resource. Code that must access the shared resource can only do so if the resource is unlocked. If the shared resource is unlocked, the code locks it and proceeds. If the shared resource is locked, the code must wait (block) until the resource becomes free.
The mechanism of locking and releasing shared resources with semaphores is described in more detail in "Kernel Semaphores".
Protecting Critical Code Sections
Within a device driver, it is necessary to prevent interrupt routines from accessing shared data or resources such as buffers or queues that are being modified by a process. To accomplish this, interrupts can be disabled with the disable() function and subsequently re-enabled with the restore() function.
It is important to keep the code being executed between the disable() and restore() functions short in order to avoid degradation of the overall system response to interrupts. (Note that disable() also disables task preemption.)
Following is a basic example using disable() and restore():
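int ps;

disable (ps);     /* disable interrupts and preemption */
...               /* critical code region: access the shared resource */
restore (ps);     /* restore the previous interrupt and preemption state */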
The variable ps must be a local variable and should never be modified by the driver. Each call to disable() must have a corresponding call to restore(), using the same variable.
The sdisable() and srestore() functions are used to disable task preemption only. Disabling of task preemption is necessary to prevent the kernel, other drivers, or applications from accessing shared data and resources while they are being modified by a device driver process. The kernel continues to handle interrupts while preemption is disabled.
The sdisable() and srestore() functions are used in much the same way as disable() and restore(). Following is a basic example of sdisable() and srestore():
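int ps;

sdisable (ps);    /* disable task preemption only */
...               /* critical code region: access the shared resource */
srestore (ps);    /* restore the previous preemption state */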
The variable ps must be a local variable and should never be modified by the driver. Each call to sdisable() must have a corresponding call to srestore(), using the same variable.
A critical code region is bracketed by the disable()/restore() or sdisable()/srestore() calls. Within a device driver, the critical code region should contain only the instructions necessary to complete an atomic transaction on a shared resource, and interrupts and task preemption must be re-enabled immediately after the transaction is complete.
Nesting Critical Regions
It is also possible to nest critical regions. As a general rule, a less selective mechanism can be nested inside a more selective one. For instance, the following is permissible:
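int ps1, ps2;     /* separate status variables for the two mechanisms */

sdisable (ps1);   /* outer region: preemption disabled */
...
disable (ps2);    /* inner region: interrupts disabled as well */
...
restore (ps2);
...
srestore (ps1);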
Note that different local variables must be used for the two mechanisms. However, the converse is not true. It is not permitted to do the following:
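disable (ps1);    /* outer region: interrupts and preemption disabled */
...
sdisable (ps2);   /* NOT permitted: nested inside disable() */
...
srestore (ps2);
...
restore (ps1);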
In any case, the inner sdisable()/srestore() is completely redundant, as preemption is already disabled by the outer disable().
Avoiding Deadlock & Race Conditions
Deadlock typically occurs when two semaphores are not accessed in the same order in two different processes (or threads). As a result, each process is holding a semaphore and is waiting to gain access to the semaphore that the other process is holding. In this condition the processes wait forever for a semaphore that will never be released.
Deadlock can be avoided by ensuring that multiple semaphores are always acquired in the same order by every process. This ensures that two processes do not gain access to two different semaphores and wait indefinitely for the other to release the second semaphore.
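As a minimal sketch (using two hypothetical semaphores, sem_a and sem_b, in the driver's statics structure), every task acquires the semaphores in the same fixed order and releases them in the reverse order:
swait (&s->sem_a, SEM_SIGIGNORE);   /* always acquire sem_a first... */
swait (&s->sem_b, SEM_SIGIGNORE);   /* ...then sem_b */
...                                 /* use both shared resources */
ssignal (&s->sem_b);                /* release in the reverse order */
ssignal (&s->sem_a);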
Race conditions occur when two or more processes access the same shared resource at the same time. In particular, problems occur when a process that is accessing a shared resource gets preempted by another process that accesses the same resource and changes the state of that resource before the first process has completed its transaction on the resource. The result is that the first process is now working with a compromised version of the shared resource.
To avoid race conditions, shared data and resources must be accessed in a controlled manner. The code that accesses shared resources should be considered a critical code region, which can be protected from preemption by disabling interrupts or preemption.
Kernel Semaphores
A kernel semaphore is an integer variable that is declared by the device driver. Semaphores must be visible in all contexts. This means that the memory for a semaphore must not be allocated on the stack.
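For example, a minimal sketch (assuming the semaphore is a field of the driver's statics structure, which is allocated in the install entry point rather than on any task's stack):
struct statics
{
    int event_sem;          /* kernel semaphore (illustrative field name) */
    ...
};

/* in the install entry point, s pointing to the statics structure */
s->event_sem = 0;           /* initialize to a non-negative value */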
Kernel semaphores are counting semaphores; they can be initialized to any non-negative value. A semaphore is acquired using the swait() function.
If the semaphore value is greater than zero, it is simply decremented and the task continues. If the semaphore value is less than or equal to zero, the task blocks and is put on the wait queue of the semaphore. Tasks on this queue are kept in priority order.
A semaphore is signaled using the ssignal() function. If there are tasks waiting on the semaphore's queue, the highest priority task is woken up. Otherwise the semaphore value is incremented.
Kernel semaphores have state. The semaphore's value remembers how many times the semaphore has been waited on or signaled. This is important for event synchronization. If an event occurs but there are no tasks waiting for that event, the fact that the event occurred is not forgotten.
Kernel semaphores are not owned by a particular task. Any task can signal a semaphore, not just the task that initialized it. This is necessary to allow kernel semaphores to be used as an event synchronization mechanism but requires care when the semaphore is used for mutual exclusion.
The flag argument to the swait() function allows a task to specify how signals are handled while it is blocked on a semaphore. If the task does not block, this argument is not used. There are three possibilities for flag, specified using symbolic constants defined in kernel.h:
- SEM_SIGIGNORE - Signals are ignored while the task is blocked on the semaphore.
- SEM_SIGABORT - A signal aborts the swait(), which returns a nonzero value; the driver must check the return code.
- SEM_SIGRETRY - Signals are delivered to the application and the swait() is automatically restarted.
Other Kernel Semaphore Functions
There are a number of other functions used to manipulate kernel semaphores. These are:
- ssignaln() - Signals a semaphore n times, making several resources or events available at once
- sreset() - Resets a semaphore, waking up all tasks blocked on it
- pi_init() - Initializes a semaphore for use with priority inheritance
Using Kernel Semaphores for Mutual Exclusion
When used to protect a critical code region, the kernel semaphore should be initialized to 1. This allows the first task to lock the semaphore and enter the region. Other tasks (including kernel threads) that attempt to enter the same region block until the semaphore is unlocked. Each call to swait() must have a corresponding call to ssignal(). For example:
swait (&s->mutex, SEM_SIGIGNORE);
/* enter critical code region */
...
...
/* access resource */
...
ssignal (&s->mutex); /* leave critical code region */
Signals can normally be ignored when using a kernel semaphore as a mutex. Compared to waiting for an I/O device, a critical code region is relatively short so there is little need to be able to interrupt a task that is waiting on the mutex. Unlike an event, which is never guaranteed to occur, execution of a critical code region cannot fail. The task holding the mutex is bound, sooner or later, to get to the point where the mutex is released.
Priority Inheritance Semaphores
In a multi-tasking system that uses a fixed priority scheduler, a problem known as priority inversion can occur. Consider a situation where a task holds some resource. This task is preempted by a higher priority task that requires access to the same resource. The higher priority task must wait until the lower priority task releases the resource. But the lower priority task may be prevented from executing (and therefore from releasing the resource) by other tasks of intermediate priority.
One solution to this problem is to use priority inheritance whereby the priority of the task holding the resource is temporarily raised to the priority of the highest priority task waiting for that resource until it releases the resource. LynxOS kernel semaphores support priority inheritance. In order to function with priority inheritance, the semaphore's value must be initialized by the kernel function pi_init().
This feature should only be used when the kernel semaphore is being used as a mutex mechanism.
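For example, a minimal sketch (assuming pi_init() takes the address of the semaphore and is called from the install entry point):
/* in the install entry point */
pi_init (&s->mutex);    /* set up the semaphore as a priority inheritance mutex */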
Event Synchronization
A kernel semaphore is the mechanism used to implement event synchronization in a LynxOS driver. The value of the semaphore should be initialized to 0, indicating that no events have occurred.
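A typical pattern (a sketch using the event_sem field from the examples in this chapter) is for the interrupt handler to signal the semaphore when the awaited event occurs, while the task waiting for the event blocks in swait():
/* in the interrupt handler */
ssignal (&s->event_sem);                    /* record that the event occurred */

/* in the task waiting for the event */
if (swait (&s->event_sem, SEM_SIGABORT))    /* block until the event or a signal */
{
    pseterr (EINTR);
    return (SYSERR);
}
The choice of the flag argument for swait() is discussed in the following section.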
Handling Signals
Because there is often no guarantee that an event will occur, signals should be allowed to abort the swait() using SEM_SIGABORT. This way, a task can be interrupted if the event it is waiting for never arrives. If signals are ignored, there is no way to interrupt the task in the case of problems, so the task can remain blocked indefinitely. The driver must check the return code from swait() to determine whether a signal has been received. As an alternative to SEM_SIGABORT, timeouts can be used if the timing of events is known in advance.
It is sometimes useful for an application to be able to handle signals while it is blocked on a semaphore but without aborting the wait. This is possible using the SEM_SIGRETRY flag to swait(). Signals are delivered to the application and the swait() automatically restarted. There is no way for the driver to know whether any signals were delivered while the task was blocked on the semaphore.
A word of caution is necessary concerning the use of SEM_SIGRETRY. If the signal handler in the application calls exit(3), then the swait() in the driver will never return. This could cause problems if the task had blocked while holding some resources. These resources will never be freed. To avoid this type of problem, a driver can use SEM_SIGABORT in conjunction with the kernel function deliversigs(). This allows the application to receive signals in a timely fashion, but without the risk of losing resources in the driver.
if (swait (&s->event_sem, SEM_SIGABORT))
{
    cleanup (s);        /* prepare for possible termination by signal handler */
    deliversigs ();     /* may never return */
}
Using sreset() with Event Synchronization Semaphores
Two example uses of sreset() discussed below are:
- Handling error conditions
- Variable length transfers
Handling Error Conditions
A driver must handle errors that may occur. For example, what should it do if an unrecoverable error is detected on a device? A frequent approach is to set an error flag and wake up any tasks that are waiting on the device:
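if (error_found)
{
    s->error++;                 /* record the error for waiting tasks */
    sreset (&s->event_sem);     /* wake all waiting tasks (sreset() assumed to take the semaphore address) */
}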
But the driver cannot assume that when swait() returns, the expected event has occurred. The swait() could have been woken up because an error was detected. So some extra logic is required when using the event synchronization semaphore:
if (swait (&s->event_sem, SEM_SIGABORT))
{
    pseterr (EINTR);
    return (SYSERR);
}
if (s->error)
{
    pseterr (EIO);
    return (SYSERR);
}
Variable Length Transfers
The second example with sreset() uses the following scenario: A device or producer process generates data at a variable rate. Data can also be consumed in variable sized pieces by multiple tasks. At some point, a number of consumer tasks may be blocked on an event synchronization semaphore, each waiting for a different amount of data.
When data becomes available, what should the driver do? Without adding extra complexity and overhead to the driver, there is no easy way for the driver to calculate how many of the waiting tasks it can satisfy (and should, therefore, wake up). A simple solution is to call sreset(), which will wake all tasks, which then consume the available data according to their priorities. Tasks that are awakened but find no data have to wait again on the event semaphore.
Caution when Using sreset()
To maintain coherency of the semaphore queue, sreset() must synchronize with calls to ssignal(). Because ssignal() can be called from an interrupt handler, sreset() disables interrupts internally while it is waking up all the blocked tasks. Because the number of tasks blocked on a semaphore is not limited, this could lead to unbounded interrupt disable times if sreset() is used without proper consideration.
To avoid this problem, another technique must be used in drivers where an unknown number of tasks could be blocked on a semaphore. One possibility is to wake tasks in a cascade manner. The call to sreset() is replaced by a call to ssignal(), which wakes up the first blocked task. This task is then responsible for unblocking the next blocked task, which unblocks the next one, and so on, until there are no more blocked tasks. A negative semaphore value indicates that there are blocked tasks. This is illustrated in the modified error handling code from the previous section:
if (error_found)
{
    s->error++;
    if (s->event_sem < 0)
        ssignal (&s->event_sem);
}
...
if (swait (&s->event_sem, SEM_SIGABORT))
{
    pseterr (EINTR);
    return (SYSERR);
}
if (s->error)
{
    if (s->event_sem < 0)
        ssignal (&s->event_sem);
    pseterr (EIO);
    return (SYSERR);
}
Because tasks are queued on a semaphore in priority order, they will still be awakened and executed in the same order as when using sreset(). There is no penalty for using this technique.
Resource Pool Management
LynxOS kernel semaphores can also be used as counting semaphores for managing a resource pool. The value of the semaphore should be initialized to the number of resources in the pool. To allocate a resource, swait() is used. ssignal() is used to free a resource. The following code shows an example of using swait() to allocate and ssignal() to free a resource.
/* allocate (illustrative name) : take a resource from the pool,
   blocking if the pool is empty */
struct resource *
allocate (s)
struct statics *s;
{
    struct resource *resource;
    int ps;

    swait (&s->pool_sem, SEM_SIGRETRY);
    sdisable (ps);
    resource = s->pool_freelist;
    s->pool_freelist = resource->next;
    srestore (ps);
    return (resource);
}

/* free : return a resource to the pool */
free (s, resource)
struct statics *s;
struct resource *resource;
{
    int ps;

    sdisable (ps);
    resource->next = s->pool_freelist;
    s->pool_freelist = resource;
    srestore (ps);
    ssignal (&s->pool_sem);
}
The counting semaphore functions implicitly as an event synchronization semaphore too. When the pool is empty, an attempt to allocate will block until another task frees a resource.
A mutex mechanism is still needed to protect the code that manipulates the free list. The combining of different synchronization techniques is discussed more fully in the following section.
Combining Synchronization Mechanisms
The examples discussed in the preceding sections have all been fairly straightforward in that they have only used one synchronization mechanism. In an actual driver, the scenarios are often far more complex and require combining different techniques. The following sections discuss when and how synchronization mechanisms should be combined.
Manipulating a Free List
This example illustrates the use of interrupt disabling to remove an item from a free list, but in particular, what the driver can do if the free list is empty.
One possibility is that the driver blocks until another task puts something back on the free list. This scenario requires the use of a mutex and an event synchronization semaphore. Two different approaches to this problem are illustrated in the following examples. The first example is deliberately complicated to demonstrate various synchronization techniques.
/* get_item : get item off free list, blocking if list is empty */
struct item *
get_item (s)
struct statics *s;
{
    struct item *p;
    int ps;

    do
    {
        disable (ps);               /* enter critical code */
        if (p = s->freelist)        /* take 1st item on list */
            s->freelist = p->next;
        else
            /* list was empty, so wait */
            swait (&s->freelist_sem, SEM_SIGIGNORE);
        restore (ps);               /* exit critical code */
    } while (!p);
    return (p);
}
/* put_item : put item on free list, wake up waiting tasks */
put_item (s, p)
struct statics *s;
struct item *p;
{
    int ps;

    disable (ps);                   /* enter critical code */
    p->next = s->freelist;          /* put item on list */
    s->freelist = p;
    if (s->freelist_sem < 0)
        ssignal (&s->freelist_sem); /* wake up waiter */
    restore (ps);                   /* exit critical code */
}
There are a number of points of interest illustrated by this example:
- The example uses SEM_SIGIGNORE for simplicity. If SEM_SIGABORT is used, the return value from swait() must be checked.
- The example uses the disable()/restore() mechanism for mutual exclusion. This allows the free list to be accessed from an interrupt handler using put_item(). get_item() should never be called from an interrupt handler though, as it may block. If the free list is not accessed by the interrupt handler, sdisable()/srestore() can be used instead.
- The get_item() function uses the value of the item taken off the list to determine whether the list was empty. Note that freelist_sem is being used simply as an event synchronization mechanism, not a counting semaphore. (Managing a free list with a counting semaphore is illustrated in the second approach.) As a consequence, the code that puts items back on the free list must signal the semaphore only if there is a task waiting. Otherwise, if the semaphore were signaled every time an item is put back, the semaphore count would become positive and a task calling swait() in get_item() would return immediately, even though the list is still empty.
- Blocking with interrupts disabled may seem at first like a dangerous thing to do. However, it is necessary here, as restoring interrupts before the swait() would introduce a race condition. LynxOS saves the interrupt state on a per task basis. So, when this task blocks and the scheduler switches to another task, the interrupt state will be set to that associated with the new task. But, from the point of view of the task executing the above code, the swait() executes atomically with interrupts disabled.
- swait()/ssignal() cannot be used as the mutex mechanism in this particular example, as this could lead to a deadlock situation where one task is blocked in the swait() while holding the mutex. Other tasks wishing to put items back on the list would not be able to enter the critical region. If a critical code region may block, care must be taken not to introduce the possibility of deadlock. To avoid a deadlock, sdisable()/srestore() or disable()/restore() should be used as the mutex mechanism rather than swait()/ssignal(). But, once again, the critical code region must be kept as short as possible to avoid having an adverse effect on the system's real-time responsiveness. An alternative would be to raise an error condition if the list is empty, rather than block. This would allow swait()/ssignal() to be used as the mutex mechanism.
- A call to ssignal() in put_item() may make a higher priority task eligible to execute but the context switch will not occur until preemption is re-enabled with restore().
In the second approach to this problem, a kernel semaphore is used as a counting semaphore to manage items on the free list. The value of the semaphore should be initialized to the number of items on the list.
struct item *
get_item (s)
struct statics *s;
{
    struct item *p;
    int ps;

    swait (&s->free_count, SEM_SIGRETRY);
    disable (ps);
    p = s->freelist;
    s->freelist = p->next;
    restore (ps);
    return (p);
}
This code illustrates the following points:
- A kernel semaphore used as a counting semaphore incorporates the functionality of an event synchronization semaphore. swait() blocks when no items are available and ssignal() wakes up waiting tasks.
- The example uses the disable()/restore() mechanism for mutual exclusion. This allows the free list to be accessed from an interrupt handler using put_item(). get_item() should never be called from an interrupt handler though, as it may block. If the free list is not accessed by the interrupt handler, sdisable()/srestore() can be used instead.
- The event synchronization is outside of the critical code region so there is no possibility of deadlock. Therefore, swait()/ssignal() could be used as the mutex mechanism if the code does not need to be called from an interrupt handler.
- The function put_item() could be modified to allow several items to be put back on the list using ssignaln(), as sketched below. But items can only be consumed one at a time, since there is no function swaitn().
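For example (a sketch, assuming ssignaln() takes the semaphore address and a count n of items just returned to the list):
/* after placing n items back on the free list and leaving the critical region */
ssignaln (&s->free_count, n);   /* signal the counting semaphore n times */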
Signal Handling and Real-Time Response
"Handling Signals" discussed the use of the SEM_SIGRETRY flag with swait(). It is not advisable to use swait() with this flag inside a critical code region protected with disable()/restore() or sdisable()/srestore(). The reason for this is that, internally, swait() calls the kernel function deliversigs() to deliver signals when the SEM_SIGRETRY flag is used. If the swait() is within a region with interrupts or preemption disabled, then the execution time for deliversigs() will contribute to the total interrupt or preemption disable time, as illustrated in the following example:
sdisable (ps); /* enter critical region */
...
swait (&s->event_sem, SEM_SIGRETRY);
/* may call deliversigs internally */
...
srestore (ps); /* leave critical region */
In order to minimize the disable times it is better to use SEM_SIGABORT and re-enable interrupts or preemption before calling deliversigs(). The above code then becomes:
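sdisable (ps);                          /* enter critical region */
...
if (swait (&s->event_sem, SEM_SIGABORT))
{
    srestore (ps);                      /* leave critical region before delivering signals */
    deliversigs ();                     /* may never return */
    pseterr (EINTR);
    return (SYSERR);
}
...
srestore (ps);                          /* leave critical region */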