Chapter 2.1 Memory Management Unit and Multiprocessing

The MMU lets the software define the mapping of virtual addresses to
physical addresses, a.k.a. bus addresses.

2.1.0 What is the MMU good for?
The reasons to use an MMU include:
- Access more memory:
  Without an MMU, the PDP-11 CPU can access 2^16 bytes, which is the
  number of virtual addresses. With an MMU, the PDP-11 can access 2^18
  bytes, which is the number of bus addresses. To gain these two extra
  address bits, DEC gave the PDP-11 an MMU -- three years after its
  birth. The alternative, which is to widen the virtual addresses,
  turns out to be much more expensive: it affects the width of the
  general registers and nearly every machine instruction. It took DEC
  another three years to introduce a PDP-11 with 32-bit virtual
  addresses. The changes were drastic enough to justify a new family
  name for the 32-bit PDP-11, namely VAX (Virtual Address eXtension).
  On all modern architectures, the number of virtual addresses exceeds
  the size of installed memory, so the original incentive for using an
  MMU has disappeared.

- Relocation:
  If you want to load more than one program, they have to be assigned
  to non-overlapping memory ranges, leading to different origins.
  Addresses relative to the origin thus need to be adjusted. The MMU is
  exploited to avoid this by mapping identical virtual address ranges
  to different bus addresses. This is also called hardware relocation.
  Compare this to secondary boot programs, where the relocation is done
  by the assembler.

- Protection: 
  A program can only access memory that it can address. To protect
  memory (or device registers) from a process, don't let the MMU map to
  it.

2.1.1 The MMU of the PDP-11
The virtual address range is divided into eight 8K pages. The mapping
of a page is controlled by its page address register and its page
descriptor register (PAR0 - PAR7 and PDR0 - PDR7). The page address
register contains a click number, a click being a range of 64 bus
addresses. The 64 byte click can be viewed as the unit of memory
mapping and allocation.
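
The translation described above can be sketched in C. This is a minimal
illustration, not DEC's hardware description: the function name
va_to_bus and the array par are made up for this example, and the PDR
checks are left out.

```c
#include <stdint.h>

#define PAGE_SHIFT  13   /* a page covers 8K = 2^13 virtual addresses */
#define CLICK_SHIFT 6    /* a click covers 64 = 2^6 bus addresses */

/* Translate a 16-bit virtual address with the eight PARs of one mode.
 * par[i] is assumed to hold the click number at which page i starts.
 * (Hypothetical helper; access checks via the PDRs are omitted.) */
uint32_t va_to_bus(uint16_t va, const uint16_t par[8])
{
    unsigned page   = va >> PAGE_SHIFT;               /* top 3 bits */
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);  /* low 13 bits */
    uint32_t bus    = ((uint32_t)par[page] << CLICK_SHIFT) + offset;

    return bus & 0x3FFFF;                             /* 18-bit bus address */
}
```

With PAR1 set to click 0500 (octal), the virtual address 020004, which
is offset 4 in page 1, ends up at bus address 050004.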

The PDR contains the following subfields:
size	number of clicks in lower part less one, range: [0, 128).
	the lower part is [0, size], the upper part is [size, 128).
upart	A boolean, meaning upper (if true) or lower part of page is mapped.
racc	A boolean, meaning read access allowed.
wacc	A boolean, meaning write access allowed.
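
The check the MMU performs with these subfields might look as follows.
This is a sketch under the assumption that the subfields behave exactly
as listed above; the struct layout and the name access_ok are invented
for illustration and do not reflect the bit layout of the real register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of one PDR, using the field names from the text. */
struct pdr {
    unsigned size;   /* 0..127, boundary between lower and upper part */
    bool upart;      /* true: upper part mapped, false: lower part */
    bool racc;       /* read access allowed */
    bool wacc;       /* write access allowed */
};

/* Return true if the access to va passes the checks of its page's PDR. */
bool access_ok(uint16_t va, const struct pdr *d, bool write)
{
    unsigned click = (va & 017777) >> 6;   /* click within the page, 0..127 */
    bool mapped = d->upart ? click >= d->size   /* upper part: [size, 128) */
                           : click <= d->size;  /* lower part: [0, size]   */
    bool allowed = write ? d->wacc : d->racc;

    return mapped && allowed;
}
```

A page with size 3 and upart false maps four clicks, so offset 0377 is
still inside the mapped part while offset 0400 is not.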

With the MMU turned off, a hardwired mapping is in effect. It maps the
first seven pages to the identical bus addresses and the last page, the
I/O page, to the last page of bus addresses. All addresses grant
read/write permissions.

Exercise:
Describe the contents of the paging registers such that the MMU
emulates the hardwired address mapping.

Just as there are two stack pointers, there are two sets of paging
registers, one active in kernel mode, one active in user mode.

There are two instructions that let you access words as if in previous
mode (PM):

	move to previous space (mtpi)
	move from previous space (mfpi).

These instructions take one operand specification. mtpi pulls a word
from the current mode stack and writes it to the operand; mfpi reads
the operand and pushes the word onto the current mode stack. If the
operand is a memory word, its address is translated using the paging
registers of the PM; if the operand is the stack pointer, the one
active in previous mode is accessed.
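
The data flow of the two instructions can be modelled with a toy
machine. Everything here is an assumption made for illustration: memory
is a small word-indexed array per mode instead of a paged address
space, and only memory operands are handled.

```c
#include <stdint.h>

enum mode { KERNEL = 0, USER = 1 };
#define NWORDS 32                /* toy memory size per mode, in words */

struct machine {
    uint16_t mem[2][NWORDS];     /* one toy address space per mode */
    unsigned sp[2];              /* one stack pointer per mode (word index) */
    enum mode cur, prev;         /* current and previous mode from the PSW */
};

/* mfpi: read a word from the previous space, push it on the current stack. */
void mfpi(struct machine *m, unsigned addr)
{
    uint16_t w = m->mem[m->prev][addr];
    m->mem[m->cur][--m->sp[m->cur]] = w;
}

/* mtpi: pull a word from the current stack, write it to the previous space. */
void mtpi(struct machine *m, unsigned addr)
{
    uint16_t w = m->mem[m->cur][m->sp[m->cur]++];
    m->mem[m->prev][addr] = w;
}
```

With cur set to KERNEL and prev set to USER, mfpi is the way for kernel
code to fetch a word from the user process, and mtpi the way to store
one.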

Exercise:
When executed in user mode, the RTI instruction won't set the PM field
to kernel mode when restoring the PSW. Why is this important for Unix?

After power on, all 32 paging registers are set to zero.

The MMU executes a "segmentation fault trap" if memory is accessed
through virtual addresses that are not mapped or don't have the
appropriate read/write permissions.

2.1.2 The storage segments of a machine program
A machine program stored in a.out format consists of three segments:
- A text segment, containing the program code.
- A data segment, containing explicitly initialized data.
- A bss segment, representing implicitly initialized data. 

The data represented in the bss segment will all be initialized to zero
when the program is loaded. Since it's dull to store zeroes, the bss
segment is not written to the a.out file.

The C language puts string constants, global and local static variables
into these segments. The location of these data is fixed for the
lifetime of the program. In contrast, storage for local nonstatic
variables is allocated dynamically on the stack. Storage for these
variables is allocated when the subroutine is called and freed when the
subroutine returns.
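
A small C fragment shows where the storage classes mentioned above end
up. The variable and function names are invented for this example; the
comments follow the rules stated in the text, and the exact placement
of string constants depends on the compiler.

```c
int initialized = 42;            /* data segment: explicit initializer */
int zeroed[100];                 /* bss segment: implicitly zero */
const char *greeting = "hello";  /* the pointer is initialized data; the
                                    string constant has a fixed location
                                    as well */

/* Return a new number on each call. */
int next_id(void)
{
    static int id;   /* bss segment: local static, survives the call */
    int step = 1;    /* stack: allocated on call, freed on return */

    id += step;
    return id;
}
```

Because id lives in the bss segment, successive calls return 1, 2, 3,
and so on, whereas step is created anew on every call.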

The data and bss segments are separated only in the a.out structure.
When the program is loaded, there is only one combined data segment.

The a.out format specifies two types of programs: the type "executable",
which is the format of the boot and standalone programs, and the type
"pure executable". The text and data segments of an ordinary executable
are laid out contiguously in terms of virtual addresses, whereas the
data segment of a pure executable continues at the next page boundary
after the text segment.  The pure layout lets you control the mapping
of the data segment independently from the mapping of the text segment.

Exercise:
Three pure executables have text segments with the sizes 
	a) 16K-2
	b) 16K
	c) 16K+2
Where do the data segments start?

The machine programs of both types of executables are built relative to
origin zero.

The a.out header contains the sizes of each segment. They can be printed
by the size(I) command. For a small V6 kernel it prints
	23460+1382+15438=40280 (116530)
These decimal numbers are the sizes in bytes of the text, data and bss
segments and their sum. For the octal addict, the sum is given in octal
notation as well.
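
The arithmetic behind that line is easy to reproduce. The following
sketch only imitates the output format shown above; format_size is a
made-up name and not the source of the size(I) command.

```c
#include <stdio.h>

/* Write "text+data+bss=sum (sum in octal)" into buf, as in the line above.
 * Hypothetical helper; returns what snprintf returns. */
int format_size(char *buf, size_t n,
                unsigned text, unsigned data, unsigned bss)
{
    unsigned sum = text + data + bss;

    return snprintf(buf, n, "%u+%u+%u=%u (%o)", text, data, bss, sum, sum);
}
```

Called with 23460, 1382 and 15438, it yields the sum 40280, which is
116530 in octal.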

The file(I) command of V6 prints "executable" and "pure executable" to
indicate the type of the file.

2.1.3 Kernel mode address mapping
The kernel, like all standalone programs, is an ordinary executable.

Before the kernel turns on the MMU, it needs to set up the MMU kernel
mode paging registers.

Exercise:
Guess what happens if the MMU is turned on but the paging registers are
left as they are after power on?

The mapping of all but page six is set up to emulate the hardwired
mapping.  Of page six, only the lowest 1K of addresses is mapped to
memory. This memory is allocated to the "user block", which contains
the data private to a kernel process, namely the "user" structure
(290 bytes) and the kernel stack. Despite its name, the user block is
addressable only in kernel mode.

Exercise:
What do you think is the initial value of the kernel stack pointer?

A user block is allocated to each process. During a context switch,
PAR6 is updated so it points to the user block of the next process. The
other kernel mode paging registers are not modified after
initialization. The text and data segments are shared among the kernel
processes.

The part of the user state that needs to be restored on return to user
mode is saved on the kernel stack, that is, in the user block.
Furthermore, copies of the user mode paging registers are kept in the
user block. In the course of a context switch, these registers are
reloaded.  This way, the contents of the user block control which of
the user processes will be continued by the RTI instruction.
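
What happens to the paging registers during a context switch can be
sketched as follows. This is a simulation with invented names (struct
mmu, struct ublock, switch_to), not the V6 source: the registers are
plain arrays, and the user block is reached through a pointer instead
of through page six.

```c
#include <stdint.h>
#include <string.h>

struct ublock {              /* hypothetical excerpt of a user block */
    uint16_t upar[8];        /* copies of the user mode PARs */
    uint16_t updr[8];        /* copies of the user mode PDRs */
};

struct mmu {                 /* the 32 paging registers as plain arrays */
    uint16_t kpar[8], kpdr[8];
    uint16_t upar[8], updr[8];
};

struct proc {
    uint16_t ublock_click;   /* click number where the user block starts */
    struct ublock *u;        /* toy stand-in for the memory at that click */
};

/* Make `next` the running process as far as the MMU is concerned. */
void switch_to(struct mmu *m, const struct proc *next)
{
    m->kpar[6] = next->ublock_click;      /* page six now shows its user block */
    memcpy(m->upar, next->u->upar, sizeof m->upar);  /* reload user mode PARs */
    memcpy(m->updr, next->u->updr, sizeof m->updr);  /* reload user mode PDRs */
}
```

Note that only PAR6 of the kernel set changes; all other kernel mode
registers keep the values they got at initialization.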

2.1.4 User mode address mapping
As opposed to kernel mode, storage is not shared in user mode.  This
keeps the hard stuff related to shared memory confined to kernel code.

Starting with page 0, the addresses are mapped to include just enough
clicks for text and data. For ordinary executables, the PDRs are set to
map the lower part with read/write permission. For pure executables,
the text segment is mapped with read only permission, the data segment
with read/write permission. Since a text segment is not modifiable, it
can be shared among processes executing the same program, without
introducing shared memory complications to user land.
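
Filling the paging registers for a segment of a given size could look
like the sketch below. It is an illustration under assumptions: the
name map_lower, the struct layout and the idea of passing the segment's
first click as a parameter are all made up here, and the stack in page
7 is not handled.

```c
#include <stdint.h>

#define CLICKS_PER_PAGE 128

struct pdr { unsigned size; int upart, racc, wacc; };  /* illustrative model */

/* Map nclicks clicks of memory, starting at click base_click, to
 * consecutive pages beginning with `page`. Each page maps its lower part.
 * Returns the index of the next unused page. */
int map_lower(uint16_t par[8], struct pdr pdr[8], int page,
              unsigned base_click, unsigned nclicks, int writable)
{
    while (nclicks > 0 && page < 8) {
        unsigned n = nclicks < CLICKS_PER_PAGE ? nclicks : CLICKS_PER_PAGE;

        par[page] = (uint16_t)base_click;
        pdr[page].size  = n - 1;     /* number of clicks less one */
        pdr[page].upart = 0;         /* lower part is mapped */
        pdr[page].racc  = 1;
        pdr[page].wacc  = writable;
        base_click += n;
        nclicks -= n;
        page++;
    }
    return page;
}
```

For a pure executable one would call it twice: first for the text
segment with writable set to 0, then for the data segment with writable
set to 1, starting at the page index returned by the first call.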

In both types the upper part of page 7 is mapped to memory allocated to
the user mode stack. Initially, 20 memory clicks are allocated to the
user stack. Naturally, read/write permission is turned on for the
stack.

Exercise:
Determine the initial value of the user stack pointer.

Since addresses just below the stack are not mapped, a stack overflow
causes a segmentation fault. The trap routine then tries to allocate
more memory to the user stack, reprograms the user paging registers to
account for the larger stack, undoes any modifications to the registers
that were side effects of the trapped instruction, and returns to the
user process with the PC pointing to the offending instruction, thus
reexecuting it with a larger stack.  The MMU supports this task by
leaving the PC of the trap-causing instruction in a special MMU
register.

Exercise:
Determine the initial value of the size field in PDR7.

To allow for dynamic storage allocation, Unix provides the brk(II)
system call. It moves the break between mapped and unmapped addresses,
effectively changing the size of the data segment.
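
Since memory is mapped in units of clicks, moving the break amounts to
recomputing the number of clicks in the data segment. The sketch below
shows only this rounding step; struct image and set_break are invented
names, and the checks a real kernel needs (for example against a
collision with the stack) are left out.

```c
#include <stdint.h>

struct image {               /* hypothetical bookkeeping for one process */
    uint16_t data_start;     /* virtual address where the data segment starts */
    unsigned data_clicks;    /* clicks currently mapped for data */
};

/* Move the break to new_break, i.e. the first address beyond the data
 * segment. Returns 0 on success, -1 if the break would lie below the
 * start of the data segment. */
int set_break(struct image *im, uint16_t new_break)
{
    if (new_break < im->data_start)
        return -1;
    /* round the size in bytes up to whole 64-byte clicks */
    im->data_clicks = ((unsigned)(new_break - im->data_start) + 63) >> 6;
    return 0;
}
```

A break 64 bytes above the start of the data segment needs one click,
a break 65 bytes above it already needs two.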