DOS System Architecture - Part 2

DOS memory is divided into three distinct regions: conventional (the first 1MB of memory, directly addressable by the CPU, with the base 640k used for the operating system and the programs it runs), expanded (using page frame addressing to reach memory beyond the 1MB barrier), and extended (beyond 1MB and directly addressed by 80286 or better processors in protected mode). As mentioned in the first section, when the system starts up, it creates an interrupt vector table (a map of service locations for hardware interaction). This table is loaded into the very bottom of conventional memory, occupying addresses 00000h through 003FFh. The BIOS and DOS data areas follow at 00400h, and DOS itself is loaded next, along with the resident section of COMMAND.COM, the memory buffer space, and drivers loaded from the CONFIG.SYS file.
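
As a rough illustration of the table layout described above: in real mode each vector is a 4-byte far pointer (a 16-bit offset followed by a 16-bit segment), so 256 vectors fill 00000h through 003FFh. The sketch below is in Python, and the name vector_address is purely illustrative, not part of DOS or the BIOS.

```python
VECTOR_SIZE = 4      # each entry: 16-bit offset followed by 16-bit segment
VECTOR_COUNT = 256   # interrupts 00h through FFh


def vector_address(interrupt_number: int) -> int:
    """Return the absolute address of an interrupt's entry in the vector table.

    Illustrative helper only; the name is an assumption made for this sketch.
    """
    if not 0 <= interrupt_number < VECTOR_COUNT:
        raise ValueError("interrupt number must be in the range 00h-FFh")
    return interrupt_number * VECTOR_SIZE


if __name__ == "__main__":
    # The DOS services interrupt (21h) has its vector at 4 * 21h.
    print(f"INT 21h vector starts at {vector_address(0x21):05X}h")
    # The last byte of the table is the last byte of vector FFh.
    print(f"Table ends at {vector_address(0xFF) + VECTOR_SIZE - 1:05X}h")
```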

Once the operating system proper has loaded, the remaining memory in the base 640k is known as the transient program area (TPA). The non-resident (transient) part of COMMAND.COM is loaded at the top of this area, and externally executed programs run in the space that remains. If a program's memory requirements exceed the available space, the transient part of COMMAND.COM is overwritten with the program code. When the program terminates and releases the memory, the resident part of COMMAND.COM reloads the transient part into that space.

Within the transient program area, memory is divided into blocks. Each block is a whole number of 16-byte paragraphs, so it begins at an absolute address that is a multiple of 16. Each block is preceded by a one-paragraph header (the memory control block) that records the block's status: whether it is free or owned by a program, and how large it is. DOS walks these headers whenever memory is allocated, resized, or released, to determine whether a block is available and whether the chain of headers is still valid. If a header is valid and marks its block as free, the block is usable and program instructions can be loaded there. If, however, a header is damaged, DOS cannot use the memory it describes. It then calls an error handler, which returns a memory allocation error. Block allocation follows one of three strategies: bottom first (first fit), occupying the lowest-addressed free block large enough to service the request; top first (last fit), occupying the highest-addressed free block large enough to service the request; and optimum fit (best fit), using the smallest free contiguous block that is still large enough.
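
The three allocation strategies can be sketched as follows. This is a simplified model and not DOS's own code: free memory is represented as a plain list of (start paragraph, size in paragraphs) pairs with made-up values, the function name choose_block is hypothetical, and the sketch only selects a block without splitting it or updating any headers.

```python
from typing import List, Optional, Tuple

# (start_paragraph, size_in_paragraphs); a simplified stand-in for the header chain
FreeBlock = Tuple[int, int]


def choose_block(free_blocks: List[FreeBlock], request: int,
                 strategy: str) -> Optional[FreeBlock]:
    """Pick the free block that a given strategy would use for a request.

    `request` is the number of paragraphs wanted. Returns None when no free
    block is large enough (the case where an allocation error is reported).
    """
    candidates = [block for block in free_blocks if block[1] >= request]
    if not candidates:
        return None
    if strategy == "bottom_first":      # lowest starting address that fits
        return min(candidates, key=lambda block: block[0])
    if strategy == "top_first":         # highest starting address that fits
        return max(candidates, key=lambda block: block[0])
    if strategy == "optimum_fit":       # smallest block that fits, lowest first on ties
        return min(candidates, key=lambda block: (block[1], block[0]))
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Illustrative free list only; the addresses and sizes are invented.
    free = [(0x1000, 0x0200), (0x2000, 0x0050), (0x3000, 0x0100)]
    for name in ("bottom_first", "top_first", "optimum_fit"):
        start, size = choose_block(free, 0x0040, name)
        print(f"{name:12s} -> block at paragraph {start:04X}h, {size:04X}h paragraphs")
```

With the sample list, a request for 40h paragraphs lands in a different block under each strategy, which is the practical difference between them.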

There are two types of programs managed within the TPA. The .EXE files are loaded with a header field that identifies how much memory, beyond the program image itself, must be available for the program to run - this is the MINALLOC field. If there is available memory beyond the minimum required, a second field identifies the maximum memory the program should be given - this is the MAXALLOC field. Both values are expressed in 16-byte paragraphs, and together they instruct the DOS loader how to allocate memory for .EXE files. The second type of file is a binary image, identified by the .COM extension. These files, unlike .EXE files, are flat in memory use and have no header. Instead, they are allocated all free memory from the lowest available paragraph upward, up to the next block used by another program or the operating system. In practice, both .EXE and .COM files are leased the entire available memory space within the 640k range to execute.
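
To make the two header fields concrete, here is a minimal sketch that reads them from the start of an .EXE file. It assumes the standard MZ header layout, in which MINALLOC and MAXALLOC are little-endian 16-bit words at offsets 0Ah and 0Ch; the function name read_exe_allocation and the synthetic header in the demo are illustrative only.

```python
import struct

PARAGRAPH = 16          # bytes per paragraph
MINALLOC_OFFSET = 0x0A  # MAXALLOC follows immediately at 0x0C


def read_exe_allocation(header: bytes) -> tuple:
    """Return (MINALLOC, MAXALLOC) in paragraphs from an MZ .EXE header."""
    if len(header) < MINALLOC_OFFSET + 4 or header[:2] != b"MZ":
        raise ValueError("not an MZ executable header")
    min_alloc, max_alloc = struct.unpack_from("<HH", header, MINALLOC_OFFSET)
    return min_alloc, max_alloc


if __name__ == "__main__":
    # Build a synthetic header rather than relying on a real file on disk.
    sample = bytearray(0x1C)
    sample[0:2] = b"MZ"
    struct.pack_into("<HH", sample, MINALLOC_OFFSET, 0x0010, 0xFFFF)

    min_alloc, max_alloc = read_exe_allocation(bytes(sample))
    print(f"MINALLOC: {min_alloc} paragraphs ({min_alloc * PARAGRAPH} bytes)")
    print(f"MAXALLOC: {max_alloc} paragraphs ({max_alloc * PARAGRAPH} bytes)")
```

A MAXALLOC of FFFFh asks for more than can ever be free, which is why such a program ends up with all of the available memory.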

The remaining conventional memory above 640k was designated reserved and used to load ROM BIOS routines for additional expansion cards. Later versions of DOS included the LOADHIGH command, which allowed the loading of the APPEND, DOSKEY, DOSSHELL, GRAPHICS, KEYB, MODE, NLSFUNC, PRINT, and SHARE commands into the reserved memory space between 640k and 1MB. By loading these DOS programs high, more conventional base memory in the first 640k was available for programs to run. As program sizes increased, the need to provide more memory led to the development of expanded memory.

Expanded memory was designed to allow programs to go beyond the 1MB barrier of conventional memory and free them from the confines of the base 640k. In the 8086/8088 configuration, expanded memory consisted of an 8-bit memory board with an accompanying software driver that set up the expanded memory access handler. On 80286 and better systems with installed RAM beyond 1MB, the EMM driver was the only requirement. In every x86 implementation, the expanded memory manager is loaded from the CONFIG.SYS file as an installable device driver, which creates the page frame. The page frame is a 64k window in the reserved memory above 640k through which pages of expanded memory are mapped into the CPU's address space. Its location can be assigned in CONFIG.SYS to avoid addressing conflicts with other drivers or BIOS routines in the reserved memory space. Device drivers that map to memory pages maintain their address ownership until the system is restarted - they do not release the memory space.
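
The page frame idea can be modeled in a few lines. The sketch below is a toy simulation, not a driver: it assumes the common arrangement of a 64k frame split into four 16k slots, and the class and method names (ExpandedMemory, map_page, read, write) are invented for this example.

```python
PAGE_SIZE = 16 * 1024   # assumed page size: 16k
FRAME_SLOTS = 4         # 4 slots x 16k = the 64k page frame


class ExpandedMemory:
    """Toy model of expanded memory seen through a 64k page frame."""

    def __init__(self, logical_pages: int) -> None:
        # The expanded memory itself, far larger than the frame.
        self.pages = [bytearray(PAGE_SIZE) for _ in range(logical_pages)]
        # Which logical page is currently visible in each frame slot.
        self.frame = [None] * FRAME_SLOTS

    def map_page(self, slot: int, logical_page: int) -> None:
        """Make a logical page visible in one slot of the page frame."""
        if not 0 <= slot < FRAME_SLOTS:
            raise IndexError("slot is outside the page frame")
        if not 0 <= logical_page < len(self.pages):
            raise IndexError("no such logical page")
        self.frame[slot] = logical_page

    def _locate(self, frame_offset: int):
        slot, offset = divmod(frame_offset, PAGE_SIZE)
        if not 0 <= slot < FRAME_SLOTS:
            raise IndexError("offset is outside the page frame")
        logical_page = self.frame[slot]
        if logical_page is None:
            raise LookupError("no page is mapped into this slot")
        return self.pages[logical_page], offset

    def write(self, frame_offset: int, value: int) -> None:
        page, offset = self._locate(frame_offset)
        page[offset] = value

    def read(self, frame_offset: int) -> int:
        page, offset = self._locate(frame_offset)
        return page[offset]


if __name__ == "__main__":
    ems = ExpandedMemory(logical_pages=8)   # 128k of expanded memory
    ems.map_page(0, 5)
    ems.write(0x0010, 0xAB)                 # lands in logical page 5
    ems.map_page(0, 2)                      # same frame address, different page
    print(hex(ems.read(0x0010)))
    ems.map_page(1, 5)                      # page 5 reappears in slot 1
    print(hex(ems.read(PAGE_SIZE + 0x0010)))
```

The point of the model is that the program only ever touches addresses inside the frame; switching the mapping changes which part of the larger memory those addresses reach.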

Like conventional memory, extended memory is addressed directly, as a linear range starting at 100000h (1MB) and extending up to 15MB beyond it on an 80286. Unlike expanded memory, it does not need a page frame to reach it, but it is fully addressable only in protected mode. To let real-mode DOS programs use memory above 1MB, special interrupt services provided by the BIOS are used to access it. The first section of extended memory is called the HMA, or High Memory Area. This is the first 64k (less 16 bytes) of addresses starting at 100000h.

On 8086/8088 machines, which have only 20 address lines, any address past the 1MB physical limit wrapped around to the bottom of memory, so these systems could not use extended memory. On 80286 systems and up, there is a special toggle called the A20 line. Segment:offset addressing is 20-bit up to 1MB, and a 21st bit is required to access memory beyond 1MB (address bits are numbered starting at zero, which is why "20" names the 21st bit). With the A20 line turned off, the 21st bit is forced to zero and the maximum address range is 1MB, with the same wrap-around as on the 8086. With the A20 line turned on, the 21st bit becomes available. This is what lets DOS reach the first 64k above 1MB in real mode on an 80286, and it stems from a design quirk of the CPU. The 80286 was built for backward compatibility with the addressing scheme of the 8086, but with a 21st address line present, a segment:offset pair such as FFFF:0010 no longer wraps to the bottom of memory and instead lands just above 1MB. The A20 line therefore allows programs to load into the memory immediately above the 1MB boundary, and as of DOS version 5.00, parts of the operating system itself can be loaded into the HMA.
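
The wrap-around and the effect of the A20 line come down to a single masked bit. The following sketch models it in Python; the function name physical_address is an assumption for illustration, and the model ignores everything about real hardware except the 21st address bit.

```python
ONE_MEGABYTE = 0x100000
TWENTY_BIT_MASK = 0xFFFFF   # address lines A0 through A19


def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address as seen on the bus, with or without the A20 line."""
    linear = (segment << 4) + offset      # can reach 10FFEFh at most
    if not a20_enabled:
        linear &= TWENTY_BIT_MASK         # 21st bit dropped: wraps like an 8086
    return linear


if __name__ == "__main__":
    for a20 in (False, True):
        first = physical_address(0xFFFF, 0x0010, a20)
        last = physical_address(0xFFFF, 0xFFFF, a20)
        state = "on " if a20 else "off"
        print(f"A20 {state}: FFFF:0010 -> {first:06X}h, FFFF:FFFF -> {last:06X}h")
```

With A20 off, FFFF:0010 wraps to the very bottom of memory; with A20 on, the same pair reaches the first byte above 1MB, and FFFF:FFFF marks the top of the HMA at 10FFEFh.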

Technically, a segment of memory starts at what is known as a paragraph boundary. A paragraph is 16 bytes long, beginning at an absolute address that is a multiple of 16. This sets the last 4 bits (a single hex digit) of the address to zero. The easiest way to visualize this is to remember that the bottom (or first) paragraph in the entire memory space is at 00000H. Counting the number of paragraphs away from the bottom gives you the segment address. While this is a basic way to compute the memory segment, it is not practical for locating individual bytes. Instead, a memory location is identified by what is known as the absolute address, which combines the segment and offset into a single value. Take the segment address and shift it left one hexadecimal digit (four bits, the same as multiplying by 16). Then add the offset address to get the absolute address. Example: the segment address is 0070 and the offset is 0020. Shift 0070 left one hex digit and it becomes 0700. Add 0020 to 0700 and the absolute address is 0720H.
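
The worked example above translates directly into code. This is a small sketch with illustrative function names (absolute_address and paragraph_number are not DOS calls); it also shows that different segment:offset pairs can name the same byte.

```python
def absolute_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and 16-bit offset into an absolute address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset   # shift left one hex digit, then add


def paragraph_number(address: int) -> int:
    """Count paragraphs from the bottom of memory; this is the segment value."""
    if address % 16 != 0:
        raise ValueError("address is not on a paragraph boundary")
    return address // 16


if __name__ == "__main__":
    print(f"0070:0020 -> {absolute_address(0x0070, 0x0020):05X}h")
    # The same byte can be reached through a different pair.
    print(f"0072:0000 -> {absolute_address(0x0072, 0x0000):05X}h")
    print(f"paragraph at 00700h is segment {paragraph_number(0x00700):04X}h")
```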

DOS Memory Map

-------------- ------------------------------------------------
10FFF0h   Start of Unassigned Memory (absolute address)
FFFF:FFFF End of High Memory Area (HMA) - absolute 10FFEFh
FFFF:0010 EXTENDED MEMORY START - Start of HMA - absolute 100000h
-------------- ------------------------------------------------
F000:FFFF UPPER MEMORY END - 1MB barrier on 8086
F800:0000 System BIOS - Available
F000:E000 ROM Extension - ROM BIOS Shadowing
F000:4000 ROM BASIC
F000:0000 Reserved ROM
E000:0000 End Upper Memory Unassigned
D000:0000 EXPANDED MEMORY Page Frame Start
C800:0000 Start Upper Memory Unassigned
C000:8000 ROM Hard Disk Control
C000:0000 ROM Extension - Video Shadowing
B000:8000 CGA Video
B000:0000 Monochrome Video
A000:0000 UPPER MEMORY START: Reserved ROM/EGA Video for PS/2 systems
-------------- ------------------------------------------------
9000:FFFF |BASE MEMORY END: Top of Transient Program Area (TPA)
          |COMMAND.COM transient reloader
          |Start of Transient Program Area (TPA) - TSR Utilities
          DOS Interrupt 22H, 23H, and 24H services
          COMMAND.COM resident code
          DOS Interrupt 21H services
0000:0600 IBMBIO.COM / IO.SYS
0000:0500 DOS Communication Area
0000:04F0 User Communication Area
0000:04AC Reserved
0000:0400 BIOS Data Area
0000:0000 BASE MEMORY START: Interrupt Vector Table
-------------- ------------------------------------------------