
Writing Device Drivers for LynxOS


Synchronization

This chapter describes synchronization issues and the LynxOS mechanisms available to device drivers to handle these issues.

Introduction

There are a number of synchronization mechanisms that can be used in a LynxOS device driver. These include:

Kernel semaphores can be used to protect critical code regions as well as to manage shared data and resources in a controlled manner. The functions supporting kernel semaphores include: swait(), ssignal(), ssignaln(), and sreset().

Disabling interrupts and preemption are mechanisms used to protect code segments that are considered atomic and must be completed without interruption. The calls that support disabling of interrupts and preemption include: disable(), restore(), sdisable(), and srestore().

The following table summarizes the LynxOS synchronization functions that support device drivers. A complete description of these functions is available in their respective man pages.

Synchronization Support Functions
swait()
swait() causes the calling process to wait on a semaphore.
The prototype for swait() is:
int swait(int *s, int flag)
s is a pointer to a semaphore and flag is an argument that specifies whether or not signals are delivered to the process while it is waiting. swait() cannot be used from within an ISR.
ssignal()
ssignal() increments a semaphore and wakes up one process that is waiting on that semaphore. Processes are awakened in priority order.
The prototype for ssignal() is:
int ssignal(int *s)
s is a pointer to a semaphore.
disable()
disable() disables interrupts and task preemption.
The prototype for disable() is:
void disable(int ps)
ps must be a local stack variable of the invoking function.
restore()
restore() restores interrupts and task preemption.
The prototype for restore() is:
void restore(int ps)
ps is the same variable used in the corresponding disable() call.
sdisable()
sdisable() disables task preemption.
The prototype for sdisable() is:
void sdisable(int ps)
ps must be a local stack variable of the invoking function.
srestore()
srestore() restores task preemption.
The prototype for srestore() is:
void srestore(int ps)
ps is the same variable used in the corresponding sdisable() call.
sreset()
sreset() wakes up all processes that are waiting on a semaphore and sets the semaphore value to zero.
The prototype for sreset() is:
void sreset(int *s)
s is a pointer to a semaphore.
ssignaln()
ssignaln() signals a semaphore a specified number of times.
The prototype for ssignaln() is:
int ssignaln(int *s, int count)
s is a pointer to a semaphore and count is the number of times to signal the semaphore.
scount()
scount() returns the value of the semaphore.
The prototype for scount() is:
int scount(int *s)
If the semaphore value is negative, it indicates the number of processes that are waiting on the semaphore. If the value is zero, no processes are waiting on the semaphore. If the value is greater than zero, it is the number of times swait() can be called on the semaphore without blocking (see ssignal()).
pi_init()
pi_init() initializes a kernel semaphore for use with priority inheritance (see "Priority Inheritance Semaphores" later in this chapter).

What is Synchronization?

Synchronization ensures that certain events occur in a definite order within a non-deterministic environment (such as a concurrent, preemptive operating system). In a device driver this usually means ensuring that shared resources such as devices, buffers, queues, and so on are accessed in a protected and controlled manner so that processes do not interfere with each other's access to shared resources.

Synchronization provides the means to manage shared data and resources, protect critical code sections, and avoid deadlock and race conditions, as described in the following sections.

Managing Shared Data Resources

Semaphores are a mechanism available to LynxOS device drivers to manage shared resources (the statics structure, shared buffers, and queues, for example). Semaphores partition the device driver code into critical code regions that must obtain access to a shared resource before continuing to execute. The semaphore is the mechanism used to lock and release a shared resource. Code that must access the shared resource can only do so if the resource is unlocked. If the shared resource is unlocked, the code locks it and proceeds. If the shared resource is locked, the code must wait (block) until the resource becomes free.

The mechanism of locking and releasing shared resources with semaphores is described in more detail in "Kernel Semaphores".

Protecting Critical Code Sections

Within a device driver, it is necessary to prevent interrupt routines from accessing shared data or resources such as buffers or queues that are being modified by a process. To accomplish this, interrupts can be disabled with the disable() function and subsequently re-enabled with the restore() function.

It is important to keep the code being executed between the disable() and restore() functions short in order to avoid degradation of the overall system response to interrupts. (Note that disable() also disables task preemption.)

Following is a basic example using disable() and restore():

int ps;
disable (ps);   /* disable all interrupts */
...
...
restore (ps);     /* restore interrupt state */

The variable ps must be a local variable and should never be modified by the driver. Each call to disable() must have a corresponding call to restore(), using the same variable.

Note: The restore() function actually restores the state that existed before disable() was called. So, if interrupts were already disabled when disable() was called, the first call to restore() does not re-enable them.

The sdisable() and srestore() functions are used to disable task preemption only. Disabling of task preemption is necessary to prevent the kernel, other drivers, or applications from accessing shared data and resources while they are being modified by a device driver process. The kernel continues to handle interrupts while preemption is disabled.

The sdisable() and srestore() functions are used in much the same way as disable() and restore(). Following is a basic example of sdisable() and srestore():

int ps;
sdisable (ps);  /* disable task preemption */
...
...
srestore (ps);  /* restore preemption state */

The variable ps must be a local variable and should never be modified by the driver. Each call to sdisable() must have a corresponding call to srestore(), using the same variable.

Note: The srestore() function actually restores the state that existed before sdisable() was called. So, if preemption was already disabled when sdisable() was called, the first call to srestore() does not re-enable it.

A critical code region is bracketed by the disable()/restore() or sdisable()/srestore() calls. Within a device driver, the critical code region should contain only the instructions necessary to complete an atomic transaction on a shared resource; interrupts and task preemption must be re-enabled immediately after the transaction is complete.

Nesting Critical Regions

It is also possible to nest critical regions. As a general rule, a less selective mechanism can be nested inside a more selective one. For instance, the following is permissible:

int sps, ps;
sdisable (sps);
...
disable (ps);
...
restore (ps);
...
srestore (sps);

Note that different local variables must be used for the two mechanisms. However, the converse is not true. It is not permitted to do the following:

disable (ps);
...
sdisable (sps);
...
srestore (sps);
...
restore (ps);

In any case, the inner sdisable()/srestore() is completely redundant, as preemption is already disabled by the outer disable().

Avoiding Deadlock & Race Conditions

Deadlock typically occurs when two semaphores are not accessed in the same order in two different processes (or threads). As a result, each process is holding a semaphore and is waiting to gain access to the semaphore that the other process is holding. In this condition the processes wait forever for a semaphore that will never be released.

Deadlock can be avoided by ensuring that multiple semaphores are always acquired in the same order by every process. This ensures that two processes do not gain access to two different semaphores and wait indefinitely for the other to release the second semaphore.
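
For example, if a driver protects its statics structure with one semaphore and a transfer queue with another, every code path that needs both semaphores should acquire them in the same order and release them in the reverse order. The sketch below assumes illustrative field names (statics_mutex, queue_mutex) that are not part of the LynxOS API:

/* Every path acquires statics_mutex first, then queue_mutex */
swait (&s->statics_mutex, SEM_SIGIGNORE);
swait (&s->queue_mutex, SEM_SIGIGNORE);
...                            /* work on both shared resources */
ssignal (&s->queue_mutex);     /* release in the reverse order */
ssignal (&s->statics_mutex);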

Race conditions occur when two or more processes access the same shared resource at the same time. In particular, problems occur when a process that is accessing a shared resource gets preempted by another process that accesses the same resource and changes the state of that resource before the first process has completed its transaction on the resource. The result is that the first process is now working with a compromised version of the shared resource.

To avoid race conditions, shared data and resources must be accessed in a controlled manner. The code that accesses shared resources should be treated as a critical code region and protected by disabling interrupts or preemption.

Kernel Semaphores

A kernel semaphore is an integer variable that is declared by the device driver. Semaphores must be visible in all contexts. This means that the memory for a semaphore must not be allocated on the stack.

Kernel semaphores are counting semaphores; they can be initialized to any non-negative value. A semaphore is acquired using the swait() function.

If the semaphore value is greater than zero, it is simply decremented and the task continues. If the semaphore value is less than or equal to zero, the task blocks and is put on the wait queue of the semaphore. Tasks on this queue are kept in priority order.

A semaphore is signaled using the ssignal() function. If there are tasks waiting on the semaphore's queue, the highest priority task is woken up. Otherwise the semaphore value is incremented.
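
As a sketch of how a driver might declare and initialize its semaphores, the semaphores are simply int fields of the statics structure, set up in the install entry point before any other entry point can run. The entry point name, the device information structure, and the statics fields below are illustrative; sysbrk() is assumed to be the kernel memory allocation routine, and the install entry point is assumed to follow the usual LynxOS convention of returning a pointer to the statics structure:

struct statics
{
  int mutex;          /* protects shared driver data */
  int event_sem;      /* signaled when a device event occurs */
};

char *
devcinstall (info)
struct devcinfo *info;          /* hypothetical device information block */
{
  struct statics *s;

  s = (struct statics *) sysbrk ((long) sizeof (struct statics));
  /* (error handling and device setup omitted for brevity) */
  s->mutex = 1;                 /* 1 = unlocked; the first swait() proceeds */
  s->event_sem = 0;             /* 0 = no events have occurred yet */
  return ((char *) s);          /* return the statics pointer */
}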

Kernel semaphores have state. The semaphore's value remembers how many times the semaphore has been waited on or signaled. This is important for event synchronization. If an event occurs but there are no tasks waiting for that event, the fact that the event occurred is not forgotten.

Kernel semaphores are not owned by a particular task. Any task can signal a semaphore, not just the task that initialized it. This is necessary to allow kernel semaphores to be used as an event synchronization mechanism but requires care when the semaphore is used for mutual exclusion.

The flag argument to the swait() function allows a task to specify how signals are handled while it is blocked on a semaphore. If the task does not block, this argument is not used. There are three possibilities for flag, specified using symbolic constants defined in kernel.h:

SEM_SIGIGNORE
Signals have no effect on the blocked task. Any signals sent to the task while it is waiting on the semaphore remain pending and will be delivered at some future time.
SEM_SIGRETRY
Signals are delivered to the task. If the task's signal handler returns, the task automatically waits again on the semaphore. Signal delivery is transparent to the driver as the swait() function does not indicate whether any signals were delivered.
SEM_SIGABORT
If a signal is sent to the task while it is blocked on the semaphore, the swait() is aborted. The task is woken up and swait() returns a nonzero value. The signal remains pending.

Other Kernel Semaphore Functions

There are a number of other functions used to manipulate kernel semaphores. These are:

ssignaln()
Used to signal a semaphore n times. This is equivalent to calling ssignal() n times.
sreset()
Resets the semaphore value to 0 and wakes up all tasks that are waiting on the semaphore.
scount()
Returns the semaphore value.
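
As a brief sketch of how these might be used together (event_sem is the illustrative statics field used elsewhere in this chapter, and nready is an illustrative local count):

/* Wake up to 'nready' waiting tasks in one call rather than
 * calling ssignal() nready times in a loop. */
ssignaln (&s->event_sem, nready);

/* A negative count means tasks are currently blocked on the
 * semaphore; signal it only if at least one task is waiting. */
if (scount (&s->event_sem) < 0)
  ssignal (&s->event_sem);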

Using Kernel Semaphores for Mutual Exclusion

When used to protect a critical code region, the kernel semaphore should be initialized to 1. This allows the first task to lock the semaphore and enter the region. Other tasks (including kernel threads) that attempt to enter the same region block until the semaphore is unlocked. Each call to swait() must have a corresponding call to ssignal().

swait (&s->mutex, SEM_SIGIGNORE);
/* enter critical code region */
...
...
/* access resource */
...
ssignal (&s->mutex);     /* leave critical code region */

Signals can normally be ignored when using a kernel semaphore as a mutex. Compared to waiting for an I/O device, a critical code region is relatively short so there is little need to be able to interrupt a task that is waiting on the mutex. Unlike an event, which is never guaranteed to occur, execution of a critical code region cannot fail. The task holding the mutex is bound, sooner or later, to get to the point where the mutex is released.

Caution! sreset() and ssignaln() should never be used on a kernel semaphore that is used for mutual exclusion as in both cases this could lead to more than one task executing the critical code concurrently.

Priority Inheritance Semaphores

In a multi-tasking system that uses a fixed priority scheduler, a problem known as priority inversion can occur. Consider a situation where a task holds some resource. This task is preempted by a higher priority task that requires access to the same resource. The higher priority task must wait until the lower priority task releases the resource. But the lower priority task may be prevented from executing (and therefore from releasing the resource) by other tasks of intermediate priority.

One solution to this problem is to use priority inheritance whereby the priority of the task holding the resource is temporarily raised to the priority of the highest priority task waiting for that resource until it releases the resource. LynxOS kernel semaphores support priority inheritance. In order to function with priority inheritance, the semaphore's value must be initialized by the kernel function pi_init().

pi_init (&s->mutex);

This feature should be used only when a kernel semaphore is being used as a mutual exclusion mechanism.
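
A minimal sketch of this usage follows; the mutex field is the illustrative statics member used in the earlier mutual exclusion example, and pi_init() takes the place of setting the semaphore value directly:

/* In the install entry point */
pi_init (&s->mutex);              /* mutex with priority inheritance */

/* In any entry point that uses the shared resource */
swait (&s->mutex, SEM_SIGIGNORE);
...                               /* critical code region */
ssignal (&s->mutex);              /* release; the holder's priority reverts */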

Event Synchronization

A kernel semaphore is the mechanism used to implement event synchronization in a LynxOS driver. The value of the semaphore should be initialized to 0, indicating that no events have occurred.

Waiting for an event:

if (swait (&s->event_sem, SEM_SIGABORT))
{
  pseterr (EINTR);
  return (SYSERR);
}

Signaling an event:

ssignal (&s->event_sem);
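
Typically the event is signaled from the driver's interrupt handler, since ssignal() may be called from an interrupt handler while swait() may not. A sketch with a hypothetical handler name:

void
devcintr (s)
struct statics *s;
{
  /* acknowledge and service the device interrupt here */
  ssignal (&s->event_sem);   /* wake the highest priority waiting task */
}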

Handling Signals

Because there is often no guarantee that an event will occur, signals should be allowed to abort the swait() using SEM_SIGABORT. This way, a task can be interrupted if the event it is waiting for never arrives. If signals are ignored, there is no way to interrupt the task in the case of problems, so the task can remain blocked indefinitely. The driver must check the return code from swait() to determine whether a signal has been received. As an alternative to SEM_SIGABORT, timeouts can be used if the timing of events is known in advance.

It is sometimes useful for an application to be able to handle signals while it is blocked on a semaphore but without aborting the wait. This is possible using the SEM_SIGRETRY flag to swait(). Signals are delivered to the application and the swait() automatically restarted. There is no way for the driver to know whether any signals were delivered while the task was blocked on the semaphore.

A word of caution is necessary concerning the use of SEM_SIGRETRY. If the signal handler in the application calls exit(3), then the swait() in the driver will never return. This could cause problems if the task had blocked while holding some resources. These resources will never be freed. To avoid this type of problem, a driver can use SEM_SIGABORT in conjunction with the kernel function deliversigs(). This allows the application to receive signals in a timely fashion, but without the risk of losing resources in the driver.

if (swait (&s->event_sem, SEM_SIGABORT))
{
  cleanup (s); /* prepare for possible termination by signal handler */
  deliversigs (); /* may never return */
}

Using sreset() with Event Synchronization Semaphores

Two example uses of sreset() are discussed below: handling error conditions and variable length transfers.

Handling Error Conditions

A driver must handle errors that may occur. For example, what should it do if an unrecoverable error is detected on a device? A frequent approach is to set an error flag and wake up any tasks that are waiting on the device:

if (error_found)
{
  s->error++;
  sreset (&s->event_sem);
}

But the driver cannot assume that when swait() returns, the expected event has occurred. The swait() could have been woken up because an error was detected. So some extra logic is required when using the event synchronization semaphore:

if (swait (&s->event_sem, SEM_SIGABORT))
{
  pseterr (EINTR);
  return (SYSERR);
}
if (s->error)
{
  pseterr (EIO);
  return (SYSERR);
}

Variable Length Transfers

The second example with sreset() uses the following scenario: A device or producer process generates data at a variable rate. Data can also be consumed in variable sized pieces by multiple tasks. At some point, a number of consumer tasks may be blocked on an event synchronization semaphore, each waiting for different amounts of data, as illustrated below.

(Figure: Synchronization Mechanisms)

When data becomes available, what should the driver do? Without adding extra complexity and overhead to the driver, there is no easy way for the driver to calculate how many of the waiting tasks it can satisfy (and should, therefore, wake up). A simple solution is to call sreset(), which will wake all tasks, which then consume the available data according to their priorities. Tasks that are awakened but find no data have to wait again on the event semaphore.
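
A consumer task following this scheme might look like the sketch below (s->avail, nbytes, and ps are illustrative names assumed to be declared by the driver). After sreset() wakes it, the task re-checks how much data is actually available under the protection of sdisable(), and waits again if another task has already consumed it:

for (;;)
{
  sdisable (ps);                  /* protect the shared data count */
  if (s->avail >= nbytes)
  {
    s->avail -= nbytes;           /* claim this task's share */
    srestore (ps);
    break;                        /* go copy the data to the user */
  }
  srestore (ps);
  /* not enough data yet: wait for the producer's next sreset() */
  if (swait (&s->event_sem, SEM_SIGABORT))
  {
    pseterr (EINTR);
    return (SYSERR);
  }
}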

Caution when Using sreset()

To maintain coherency of the semaphore queue, sreset() must synchronize with calls to ssignal(). Because ssignal() can be called from an interrupt handler, sreset() disables interrupts internally while it is waking up all the blocked tasks. Because the number of tasks blocked on a semaphore is not limited, this could lead to unbounded interrupt disable times if sreset() is used without proper consideration.

To avoid this problem, another technique must be used in driver designs where an unknown number of tasks could be blocked on a semaphore. One possibility is to wake tasks in a cascade manner. The call to sreset() is replaced by a call to ssignal(), which wakes up the first blocked task. This task is then responsible for unblocking the next blocked task, which unblocks the next one, and so on, until there are no more blocked tasks. A negative semaphore value indicates that there are blocked tasks. This is illustrated in the modified error handling code from the previous section:

if (error_found)
{
  s->error++;
  if (s->event_sem < 0)
    ssignal (&s->event_sem);
}
...
if (swait (&s->event_sem, SEM_SIGABORT))
{
  pseterr (EINTR);
  return (SYSERR);
}
if (s->error)
{
  if (s->event_sem < 0)
    ssignal (&s->event_sem);
  pseterr (EIO);
  return (SYSERR);
}

Because tasks are queued on a semaphore in priority order, they are still awakened and executed in the same order as with sreset(), so there is no penalty for using this technique.

Resource Pool Management

LynxOS kernel semaphores can also be used as a counting semaphore for managing a resource pool. The value of the semaphore should be initialized to the number of resources in the pool. To allocate a resource, swait() is used. ssignal() is used to free a resource. The following code shows an example of using swait() to allocate and ssignal() to free a resource.

struct resource *
allocate (s)
struct statics *s;
{
  struct resource *resource;
  int ps;
  swait (&s->pool_sem, SEM_SIGRETRY);
  sdisable (ps);
  resource = s->pool_freelist;
  s->pool_freelist = resource->next;
  srestore (ps);
  return (resource);
}
free (s, resource)
struct statics *s;
struct resource *resource;
{
  int ps;
  sdisable (ps);
  resource->next = s->pool_freelist;
  s->pool_freelist = resource;
  srestore (ps);
  ssignal (&s->pool_sem);
}
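
Initialization of such a pool might look like the following sketch, typically in the install entry point (the pool array, the NRESOURCES constant, and the loop index are illustrative). The free list is built first and the counting semaphore is set to the number of resources placed on it:

s->pool_freelist = (struct resource *) 0;
for (i = 0; i < NRESOURCES; i++)
{
  s->pool[i].next = s->pool_freelist;   /* push onto the free list */
  s->pool_freelist = &s->pool[i];
}
s->pool_sem = NRESOURCES;               /* all resources initially free */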

The counting semaphore functions implicitly as an event synchronization semaphore too. When the pool is empty, an attempt to allocate will block until another task frees a resource.

A mutex mechanism is still needed to protect the code that manipulates the free list. The combining of different synchronization techniques is discussed more fully in the following section.

Combining Synchronization Mechanisms

The examples discussed in the preceding sections have all been fairly straightforward in that they have only used one synchronization mechanism. In an actual driver, the scenarios are often far more complex and require combining different techniques. The following sections discuss when and how synchronization mechanisms should be combined.

Manipulating a Free List

This example illustrates the use of interrupt disabling to remove an item from a free list and, in particular, what the driver can do if the free list is empty.

One possibility is that the driver blocks until another task puts something back on the free list. This scenario requires the use of a mutex and an event synchronization semaphore. Two different approaches to this problem are illustrated in the following examples. The first example is deliberately complicated to demonstrate various synchronization techniques.

/* get_item : get item off free list, blocking if list is empty */
struct item *
get_item (s)
struct statics *s;
{
  struct item *p;
  int ps;
  do
  {
    disable (ps); /* enter critical code */
    if (p = s->freelist) /* take 1st item on list */
      s->freelist = p->next;
    else
      /* list was empty, so wait */
      swait (&s->freelist_sem, SEM_SIGIGNORE);
    restore (ps); /* exit critical code */
  } while (!p);
  return (p);
}

/* put_item : put item on free list, wake up waiting tasks */
put_item (s, p)
struct statics *s;
struct item *p;
{
  int ps;
  disable (ps); /* enter critical code */
  p->next = s->freelist; /* put item on list */
  s->freelist = p;
  if (s->freelist_sem < 0)
    ssignal (&s->freelist_sem); /* wake up waiter */
  restore (ps); /* exit critical code */
}

There are a number of points of interest illustrated by this example:

The disable()/restore() pair protects the manipulation of the free list itself, while the semaphore freelist_sem is used for event synchronization, to wait for an item to be returned to an empty list.

The swait() call uses SEM_SIGIGNORE, so the wait cannot be aborted by a signal and get_item() always returns a valid item.

The do/while loop is needed because another task may empty the free list again between the ssignal() in put_item() and the moment the awakened task retests the list; the awakened task must therefore repeat the test.

As in the cascade technique described earlier, put_item() signals the semaphore only when its value is negative, that is, only when at least one task is actually waiting.

In the second approach to this problem, a kernel semaphore is used as a counting semaphore to manage items on the free list. The value of the semaphore should be initialized to the number of items on the list.

struct item *
get_item (s)
struct statics *s;
{
  struct item *p;
  int ps;

  swait (&s->free_count, SEM_SIGRETRY);
  disable (ps);
  p = s->freelist;
  s->freelist = p->next;
  restore (ps);
  return (p);
}

put_item (s, p)
struct statics *s;
struct item *p;
{
  int ps;
  disable (ps);
  p->next = s->freelist;
  s->freelist = p;
  restore (ps);
  ssignal (&s->free_count);
}

This code illustrates the following points:

The counting semaphore free_count tracks the number of items on the free list, so get_item() simply blocks in swait() whenever the list is empty; no explicit test of the list or retry loop is needed.

disable()/restore() is still required around the manipulation of the linked list itself.

Because the counting and the event synchronization are handled by a single semaphore, this version is considerably simpler than the first approach.

Signal Handling and Real-Time Response

"Handling Signals" discussed the use of the SEM_SIGRETRY flag with swait(). It is not advisable to use swait() with this flag inside a critical code region protected with disable()/restore() or sdisable()/srestore(). The reason for this is that, internally, swait() calls the kernel function deliversigs() to deliver signals when the SEM_SIGRETRY flag is used. If the swait() is within a region with interrupts or preemption disabled, then the execution time for deliversigs() will contribute to the total interrupt or preemption disable time, as illustrated in the following example:

sdisable (ps); /* enter critical region */
...
swait (&s->event_sem, SEM_SIGRETRY);
   /* may call deliversigs internally */
...
srestore (ps);      /* leave critical region */

In order to minimize the disable times it is better to use SEM_SIGABORT and re-enable interrupts or preemption before calling deliversigs(). The above code then becomes:

sdisable (ps);    /* enter critical region */
...
while (swait (&s->event_sem, SEM_SIGABORT))
{
  srestore (ps);  /* re-enable pre-emption before delivering signals */
  deliversigs (); /* may never return */
  sdisable (ps);
}
...
srestore (ps);     /* leave critical region */


