DOS System Architecture - Part 1


The original IBM PC was based on the Intel 8088 processor, released on June 1, 1979. Although the 8088 is a 16-bit processor internally, it uses an 8-bit external data bus. The 8086, which has a full 16-bit external bus, had been released earlier, on June 8, 1978, but at the time a system built around it was too costly for mass production. IBM adopted the x86 platform as a result of a manufacturing agreement with Intel that allowed IBM to produce the 8086 for its Displaywriter word processing system. In exchange, Intel was granted rights to use IBM's bubble memory technology. The marketing strategy for the original PC was not just to release an innovative product but also to keep the purchase price down. To meet that goal, IBM chose the 8088, which could work with cheaper 8-bit support chips. The choice cost roughly 20% in processor performance compared with the 8086, but it achieved the objective of lowering the system cost.

The 8086/8088 design provides a 1MB address space. Addresses are 20 bits wide, which is enough to reach 1,048,576 (2^20) bytes. Because the processor's internal registers are only 16 bits wide, a single register cannot hold a full 20-bit address, leaving 4 bits unaccounted for. To work around this 16-to-20-bit mismatch, the segment and offset design was implemented. Every address is written as a segment and an offset, with a colon between them as a separator (for example, 07C0:0000). The processor multiplies the segment value by 16 (shifting it left by one hexadecimal digit) and adds the 16-bit offset to produce the 20-bit physical address. Each segment therefore spans up to 64KB, and because a segment can begin on any 16-byte boundary, segments overlap: the same physical byte can be reached through many different segment:offset pairs. Since a single hexadecimal digit represents 4 bits, the entire 20-bit address space can be written with five hex digits, from 00000H at the bottom to FFFFFH at the top.
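To make the arithmetic concrete, here is a small Python sketch (Python serves here only as executable pseudocode; the helper names are hypothetical) that converts a segment:offset pair to a physical address by multiplying the segment by 16 and adding the offset. It also shows two different pairs naming the same byte, and the wraparound past the top of the 1MB space that occurs on an 8086/8088.

```python
def physical_address(segment: int, offset: int) -> int:
    """Convert a real-mode segment:offset pair to a 20-bit physical address."""
    # Segment * 16 (a 4-bit left shift) plus the offset. Only 20 address
    # lines exist on the 8086/8088, so the result wraps at 1MB.
    return ((segment << 4) + offset) & 0xFFFFF


def fmt(segment: int, offset: int) -> str:
    """Format a pair and its physical address the way the text writes them."""
    return f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}h"


print(fmt(0x0000, 0x7C00))  # 0000:7C00 -> 07C00h
print(fmt(0x07C0, 0x0000))  # 07C0:0000 -> 07C00h (same byte, different pair)
print(fmt(0xFFFF, 0x000F))  # FFFF:000F -> FFFFFh (top of the 1MB space)
print(fmt(0xFFFF, 0x0010))  # FFFF:0010 -> 00000h (wraps around on an 8086)
```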

When the 8086/8088 forms an address, the segment part comes from one of four segment registers. The CS register (Code Segment) holds the segment of the code being executed; together with the Instruction Pointer (IP), it forms the address CS:IP of the next instruction. The DS register (Data Segment) points to the segment that contains the program's data, such as its global and static variables. The ES register (Extra Segment) is an additional data segment register, used for example as the destination segment by the string instructions that copy blocks of memory. The SS register (Stack Segment) points to the segment holding the active stack. The Stack Pointer (SP) gives the current top of the stack as SS:SP. The stack grows downward, so SP decreases as values are pushed and increases as they are popped, serving as a "gas gauge" for the stack space in use as code is processed. In small programs SS and DS often point to the same segment, but they are separate registers and need not be equal.
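The push and pop behavior of SS:SP can be modelled as below. This is a toy sketch rather than an emulator: `Stack8086` is a hypothetical class, memory is a flat 1MB `bytearray`, and only word-sized PUSH and POP are modelled (SP is decremented by 2 before a push and incremented by 2 after a pop, with each word stored low byte first).

```python
class Stack8086:
    """Toy model of the 8086 stack: SS selects the segment, SP the current top."""

    def __init__(self, ss: int, sp: int):
        self.ss = ss
        self.sp = sp
        self.memory = bytearray(1 << 20)  # flat 1MB real-mode address space

    def _top(self) -> int:
        # Physical address of SS:SP (segment * 16 + offset, wrapped to 20 bits).
        return ((self.ss << 4) + self.sp) & 0xFFFFF

    def push(self, word: int) -> None:
        # PUSH: decrement SP by 2, then store the word low byte first at SS:SP.
        self.sp = (self.sp - 2) & 0xFFFF
        a = self._top()
        self.memory[a] = word & 0xFF
        self.memory[(a + 1) & 0xFFFFF] = (word >> 8) & 0xFF

    def pop(self) -> int:
        # POP: read the word at SS:SP, then increment SP by 2.
        a = self._top()
        word = self.memory[a] | (self.memory[(a + 1) & 0xFFFFF] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return word


stack = Stack8086(ss=0x9000, sp=0xFFFE)
stack.push(0x1234)
stack.push(0xABCD)
print(hex(stack.sp))     # 0xfffa: two words in use
print(hex(stack.pop()))  # 0xabcd: last in, first out
```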

In addition to CS, DS, ES, and SS, there are four general-purpose registers: AX, BX, CX, and DX. Each holds a 16-bit value (a "word", two bytes), and each can also be accessed as two 8-bit halves, depending on whether an instruction needs the high-order or low-order byte. AX, for example, consists of AH (the high byte) and AL (the low byte), and likewise BX has BH/BL, CX has CH/CL, and DX has DH/DL. DOS uses these registers to pass parameters for its services: a program places a function number in AH, places any arguments in other registers, and issues software interrupt 21h. The processor suspends the current program, runs the DOS handler, and returns to the program with the results in the registers. Programs use this mechanism for file, disk, memory, and device control. For example, function 30h reports the DOS version the system is running (some programs require specific versions), returning the major version number in AL and the minor version in AH. DOS provides more than 70 such interrupt functions.
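A minimal sketch of the register halves and of the calling convention for the DOS version query follows. `Registers` and `int21` are hypothetical stand-ins: only function 30h is modelled, and the version it reports (6.22 here) is an arbitrary assumption rather than something read from a real system.

```python
class Registers:
    """AX, BX, CX, DX as 16-bit values, with AH/AL views of AX."""

    def __init__(self):
        self.ax = self.bx = self.cx = self.dx = 0

    @property
    def ah(self) -> int:
        return (self.ax >> 8) & 0xFF  # high-order byte of AX

    @ah.setter
    def ah(self, value: int) -> None:
        self.ax = ((value & 0xFF) << 8) | (self.ax & 0x00FF)

    @property
    def al(self) -> int:
        return self.ax & 0xFF  # low-order byte of AX

    @al.setter
    def al(self, value: int) -> None:
        self.ax = (self.ax & 0xFF00) | (value & 0xFF)


def int21(regs: Registers, dos_version=(6, 22)) -> None:
    """Hypothetical stand-in for the INT 21h dispatcher (function 30h only)."""
    if regs.ah == 0x30:
        # Get DOS version: major number in AL, minor number in AH.
        regs.al, regs.ah = dos_version
    else:
        raise NotImplementedError(f"function {regs.ah:02X}h is not modelled")


regs = Registers()
regs.ah = 0x30  # select the "get version" function
int21(regs)
print(f"DOS {regs.al}.{regs.ah:02d}")  # DOS 6.22
```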

Independent of DOS is the system code built into the IBM PC's BIOS (Basic Input/Output System) in read-only memory. It is a collection of routines and system calls responsible for the first stage of system initialization: running the bootstrap, interrogating and error-checking the hardware, initializing video with a built-in 128-character ASCII set, maintaining the system date and time, and performing related low-level hardware checks. The 8086/8088 BIOS is machine-specific, but IBM-compatible clones used BIOS chips with nearly identical routines, which kept DOS programs portable. DOS also has its own layer of I/O handlers that compensate for the remaining differences, making applications portable across machine platforms.

The BIOS needs hardware setup information to be available whenever the system starts. In the original IBM PC this was provided by a diagnostic diskette and small DIP switches that defined the hardware configuration. The system BIOS contains ISRs (Interrupt Service Routines) that perform all of these hardware-level functions. When the system is powered on, the BIOS builds the interrupt vector table at the bottom of memory: a 1024-byte map of 256 four-byte entries, each holding the segment:offset entry point of an interrupt service routine. The BIOS also creates a 256-byte area known as the BIOS Data Area. It records the number of ports, installed drives, video settings, keyboard flags, drive motor status, error codes, and so on, and it is consulted as the record of the actual system configuration.
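The layout of the vector table follows from its size: 256 vectors of 4 bytes each fill exactly 1024 (400h) bytes, so the BIOS Data Area at segment 0040h (physical 00400h) begins right where the table ends. The sketch below, with hypothetical helper names, computes where a vector lives and decodes the handler address stored there; each entry holds the offset first and the segment second, both little-endian.

```python
IVT_BASE = 0x00000   # the vector table starts at physical address 0
VECTOR_SIZE = 4      # 16-bit offset followed by 16-bit segment
NUM_VECTORS = 256
BDA_BASE = 0x00400   # BIOS Data Area (0040:0000h), directly after the table


def vector_address(n: int) -> int:
    """Physical address of the table entry for interrupt n."""
    if not 0 <= n < NUM_VECTORS:
        raise ValueError("interrupt number must be 0..255")
    return IVT_BASE + n * VECTOR_SIZE


def read_vector(memory: bytes, n: int) -> tuple[int, int]:
    """Return the (segment, offset) handler address stored for interrupt n."""
    a = vector_address(n)
    offset = int.from_bytes(memory[a:a + 2], "little")
    segment = int.from_bytes(memory[a + 2:a + 4], "little")
    return segment, offset


print(hex(vector_address(0x21)))  # 0x84: where the DOS INT 21h vector lives
```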

With the IBM PC/AT series, configuration moved from the reference disk and DIP switch arrangement to a 64-byte memory chip called CMOS (Complementary Metal-Oxide Semiconductor, originally part of the clock chip). An on-board battery on the motherboard kept the CMOS powered at low voltage so that its contents survived when the system was turned off. This greatly improved the computer's ability to manage its hardware configuration. The biggest improvement was the ability to keep the time and date: before CMOS, the original PC and XT models reset the date and time to January 1, 1980, 12:00 a.m. at every power-on, and the user had to set them manually.
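On the AT, programs reach the CMOS through I/O ports 70h (register index) and 71h (data), and the real-time clock fields are normally stored in binary-coded decimal (BCD). The sketch below works on a 64-byte copy of the CMOS instead of real ports, and it assumes BCD, 24-hour mode with the century byte at offset 32h as on the IBM AT. The function names are hypothetical.

```python
CMOS_SIZE = 64
# Clock register offsets in the MC146818-style RTC used by the AT.
SECONDS, MINUTES, HOURS, DAY, MONTH, YEAR = 0x00, 0x02, 0x04, 0x07, 0x08, 0x09
CENTURY = 0x32  # assumption: IBM AT convention for the century byte


def bcd_to_int(b: int) -> int:
    """Each nibble holds one decimal digit, e.g. 0x59 -> 59."""
    return (b >> 4) * 10 + (b & 0x0F)


def read_rtc(cmos: bytes) -> str:
    """Format the date and time stored in a copy of CMOS as ISO-style text."""
    year = bcd_to_int(cmos[CENTURY]) * 100 + bcd_to_int(cmos[YEAR])
    return (f"{year:04d}-{bcd_to_int(cmos[MONTH]):02d}-{bcd_to_int(cmos[DAY]):02d} "
            f"{bcd_to_int(cmos[HOURS]):02d}:{bcd_to_int(cmos[MINUTES]):02d}:"
            f"{bcd_to_int(cmos[SECONDS]):02d}")


cmos = bytearray(CMOS_SIZE)
cmos[CENTURY], cmos[YEAR], cmos[MONTH], cmos[DAY] = 0x19, 0x87, 0x04, 0x02
cmos[HOURS], cmos[MINUTES], cmos[SECONDS] = 0x13, 0x45, 0x30
print(read_rtc(cmos))  # 1987-04-02 13:45:30
```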

With configurable hardware settings now available, the system setup could be changed from the keyboard instead of by booting a reference disk whenever the hardware changed, and the stored settings were available to DOS whenever it issued a software interrupt to request device services. As newer hardware appeared, however, older PC BIOSes could not keep up, because their interrupt routines did not know how to communicate with the new designs. Manufacturers responded by offering BIOS upgrade chips with newer interrupt services built in, or OEM vendors shipped software device drivers to compensate. DOS loaded these drivers at startup so that it could recognize and communicate with the new devices.

Later BIOS setup programs became menu-driven and able to control distinct areas of system performance, including memory timing, device order, caching, integrated port settings, and more. As motherboard chipsets became more complex, BIOS options grew considerably, offering control over a variety of advanced settings. The BIOS chip itself evolved from ROM or EPROM that had to be physically replaced to upgrade, to EEPROM that could be electrically reprogrammed, and finally to the flash BIOS of today, which can be updated in place by software. Through each of these advances, DOS remained the intermediary between the application and the system, changing with each version to support new hardware and application features.

When a computer is powered on, the system BIOS runs the power-on self-test (POST). This routine first checks the CMOS for the system configuration, then checks the video BIOS on the graphics controller to make sure video memory and the graphics processor are initialized and ready. The BIOS then examines the reset flag word at address 0000:0472h (the same location as 0040:0072h). If the value is 1234h, the BIOS recognizes a warm boot, such as a Ctrl+Alt+Del restart, and skips the lengthy memory test; any other value, such as 0000h, is treated as a cold boot and the full POST continues. The cold boot sequence tests memory by writing to and reading back memory addresses, tests the serial and parallel communication ports, and then searches for input devices, verifying the keyboard first. The BIOS then checks the system board buses for installed cards and tests their communication.
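The warm/cold decision amounts to reading one little-endian word from low memory. In this sketch, with hypothetical function names, a plain `bytearray` stands in for low RAM and only the decision about running the memory test is modelled; it also confirms that 0040:0072h and 0000:0472h are the same physical address.

```python
RESET_FLAG = 0x0040 * 16 + 0x0072  # 0040:0072h == 0000:0472h == physical 00472h
WARM_BOOT_MAGIC = 0x1234           # value that marks a warm boot


def boot_type(memory: bytes) -> str:
    """Classify the boot from the reset flag word (stored low byte first)."""
    flag = int.from_bytes(memory[RESET_FLAG:RESET_FLAG + 2], "little")
    return "warm" if flag == WARM_BOOT_MAGIC else "cold"


def needs_memory_test(memory: bytes) -> bool:
    # Only a cold boot runs the full read/write memory test.
    return boot_type(memory) == "cold"


low_ram = bytearray(0x500)
print(boot_type(low_ram))  # cold: the flag is 0000h
low_ram[RESET_FLAG:RESET_FLAG + 2] = WARM_BOOT_MAGIC.to_bytes(2, "little")
print(boot_type(low_ram))  # warm
```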

If an error is detected at any point during this initial system check, the BIOS signals it with a beep code through the speaker. If everything passes, the monitor displays details about the BIOS version and the video system, followed by a list of the attached peripherals detected (on machines after the IBM XT). Once POST has completed, the CMOS is consulted again, this time to identify the boot devices. The BIOS refers to drives by number: floppy drives are 00h and 01h, and hard disks are 80h, 81h, and so on. The BIOS then attempts to load an operating system from the first device identified as a boot device. If that fails, it tries the next device in the list. If the BIOS cannot find a bootable device from the list in CMOS, the boot process fails and the system halts (the original IBM PC instead started ROM BASIC).
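The device-selection loop can be sketched as follows. `read_first_sector` is a hypothetical callback standing in for a BIOS disk read (INT 13h), and the test for the 55h AAh signature in the last two bytes of the sector is the check commonly used by later BIOSes to decide that a sector is bootable; treat that check as an assumption for any particular machine.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"  # expected in the last two bytes of a boot sector


def is_bootable(sector) -> bool:
    return (sector is not None and len(sector) == SECTOR_SIZE
            and sector[510:512] == BOOT_SIGNATURE)


def select_boot_device(boot_order, read_first_sector):
    """Return (drive_number, sector) for the first bootable drive, or None.

    read_first_sector(drive) returns 512 bytes, or None if the drive is
    absent or not ready. The chosen sector would be copied to 0000:7C00h.
    """
    for drive in boot_order:
        sector = read_first_sector(drive)
        if is_bootable(sector):
            return drive, sector
    return None  # nothing bootable: halt (or ROM BASIC on the original IBM PC)


# Typical order: first floppy (00h), then first hard disk (80h).
disks = {0x80: bytes(510) + BOOT_SIGNATURE}
found = select_boot_device([0x00, 0x80], disks.get)
print(hex(found[0]))  # 0x80: the floppy slot was empty, the hard disk boots
```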

If the BIOS finds a boot device, it reads the first sector of the disk into memory at 0000:7C00h. On a hard disk this sector is the master boot record (MBR), which contains the partition table and a small bootstrap loader. The loader is a primitive routine: it relocates itself to memory address 0000:0600h, scans the partition table for the partition marked active, and loads that partition's own boot sector to 7C00h. The DOS boot sector then looks for the system files; if it does not find them, it displays a "Non-System disk or disk error" message and prompts the user to replace the disk. Once the bootstrap on a DOS partition has taken place, the first DOS file, which handles basic input and output, is loaded. Partitions are assigned drive letters in order: primary partitions first, then logical drives within extended partitions. The first primary partition becomes C:, the next D:, and so on, with primary partitions taking precedence over logical drives.
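Reading the partition table is straightforward once its layout is known: four 16-byte entries start at offset 1BEh of the MBR, each with a boot flag (80h = active) at byte 0, a partition type at byte 4, and the starting sector (LBA) and sector count as little-endian 32-bit values at bytes 8 and 12. The sketch below, with hypothetical function names, decodes those fields and picks the active entry as the MBR loader does; it ignores the CHS fields and does not follow the chain of extended partition tables.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE
ENTRY_SIZE = 16
EXTENDED_TYPES = {0x05, 0x0F}  # partition types that hold logical drives


def parse_mbr(sector: bytes) -> list:
    """Decode the four primary partition entries of a 512-byte MBR."""
    if sector[510:512] != b"\x55\xAA":
        raise ValueError("missing 55AA boot signature")
    entries = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        # Boot flag, skip 3 CHS bytes, type, skip 3 CHS bytes, LBA start, count.
        flag, ptype, lba_start, count = struct.unpack(
            "<B3xB3xII", sector[start:start + ENTRY_SIZE])
        if ptype == 0:
            continue  # unused slot
        entries.append({"slot": i, "active": flag == 0x80, "type": ptype,
                        "extended": ptype in EXTENDED_TYPES,
                        "lba_start": lba_start, "sectors": count})
    return entries


def active_partition(entries: list) -> dict:
    """The entry whose boot sector the MBR loader would load to 7C00h."""
    active = [e for e in entries if e["active"]]
    if len(active) != 1:
        raise ValueError("expected exactly one active partition")
    return active[0]
```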

The system then reads the extended partition table (if one is found) to query and verify all logical partitions present and make them available for use. The operating system code is loaded from the active primary partition, and the first DOS file, IO.SYS (IBMBIO.COM in PC DOS), continues the boot process. This is the layer that serves as the intermediary between hardware and software. IO.SYS loads the initialization code and device drivers necessary to implement the DOS FAT file system, and it enumerates the disk partitions (unlike the BIOS routine, which enumerates the physical disks). IO.SYS then runs its SYSINIT routine, which releases the memory used by the first stage of the bootstrap and loads the MSDOS.SYS file.

MSDOS.SYS (IBMDOS.COM in PC DOS, or ZDOS.SYS) is the DOS kernel. It works independently of the hardware platform, servicing high-level requests for random or sequential disk access and file access by handing them off to lower-level device drivers. With the exception of IBM, which had Microsoft write DOS specifically for its hardware, all OEM versions of MSDOS.SYS are virtually identical to one another, unlike IO.SYS, which is hardware-dependent and written for specific manufacturers. Next, the CONFIG.SYS file in the root directory of the boot drive is read, and any device drivers listed in it are loaded and initialized for the specific configuration (in MS-DOS this processing is carried out by the SYSINIT code). Once initialization is complete, COMMAND.COM, the DOS command interpreter, is loaded. COMMAND.COM presents the user interface, accepts commands, and carries them out.
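The CONFIG.SYS processing described above can be sketched as a simple KEY=VALUE scan. This is an illustrative parser with hypothetical function names, not the DOS implementation: it handles only `DEVICE=`, `DEVICEHIGH=`, and `SHELL=` explicitly and ignores menus and other version-specific syntax.

```python
def parse_config_sys(text: str) -> list:
    """Return (DIRECTIVE, value) pairs, skipping blank lines and REM comments."""
    directives = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.split()[0].upper() == "REM":
            continue
        key, sep, value = line.partition("=")
        if sep:  # only KEY=VALUE lines are modelled here
            directives.append((key.strip().upper(), value.strip()))
    return directives


def device_drivers(directives: list) -> list:
    """Drivers to load, in file order (DEVICEHIGH= exists from MS-DOS 5 on)."""
    return [value for key, value in directives if key in ("DEVICE", "DEVICEHIGH")]


def shell_program(directives: list, default: str = "COMMAND.COM") -> str:
    """The command interpreter to start: SHELL= if present, else COMMAND.COM."""
    for key, value in directives:
        if key == "SHELL":
            return value.split()[0]  # drop any switches after the program name
    return default


sample = "\n".join(["REM sample configuration", "FILES=30", "BUFFERS=20",
                    r"DEVICE=C:\DOS\HIMEM.SYS"])
print(device_drivers(parse_config_sys(sample)))  # ['C:\\DOS\\HIMEM.SYS']
```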

COMMAND.COM then splits itself into two parts in memory: the resident portion and the transient portion. The resident portion installs itself in low memory immediately above MSDOS.SYS. It handles error trapping and the processing of commands in batch files, and it is responsible for reloading the transient portion when necessary. The transient portion, which interprets commands typed by the user, serves as a work area rather than as permanently resident code. It is loaded at the high end of conventional memory, where application programs that need the same memory may overwrite it; hence its "transient" nature and the need to reload it.
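One way for the resident portion to tell whether its transient portion has been overwritten is to compare a checksum of that memory against the value recorded at load time and reload only on a mismatch. The sketch below illustrates that idea; the class name and the simple additive 16-bit checksum are assumptions, and the real COMMAND.COM rereads its transient code from disk rather than from a copy held in memory as this toy does.

```python
def checksum(image: bytes) -> int:
    """Simple additive 16-bit checksum (an assumption, not DOS's exact method)."""
    return sum(image) & 0xFFFF


class ResidentPortion:
    """Hypothetical model of the resident code guarding the transient portion."""

    def __init__(self, transient_image: bytes, memory: bytearray, load_at: int):
        self.image = bytes(transient_image)  # stands in for rereading from disk
        self.expected = checksum(self.image)
        self.memory = memory
        self.load_at = load_at
        self.reloads = 0
        self._load()

    def _load(self) -> None:
        end = self.load_at + len(self.image)
        self.memory[self.load_at:end] = self.image

    def on_program_exit(self) -> None:
        # Called when an application returns: reload the transient code only
        # if the application overwrote it.
        end = self.load_at + len(self.image)
        if checksum(self.memory[self.load_at:end]) != self.expected:
            self.reloads += 1
            self._load()
```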

COMMAND.COM has a number of commands built into it. These are known as internal commands, and they are distinguished by the fact that they are not stand-alone executables, so COMMAND.COM can run them without loading a separate program file from disk. The set of internal commands varies by DOS version, as later versions added new ones. Once COMMAND.COM has completely loaded the resident and transient portions of its code, it checks the root directory for the AUTOEXEC.BAT file and, using its built-in batch file handler, executes the internal commands and applications listed there.
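The distinction between internal and external commands shows up in how a command line is resolved. The sketch below, with hypothetical names, checks a deliberately partial table of internal commands first, then searches the current directory and each PATH directory for a .COM, .EXE, or .BAT file in that order, and otherwise reports the familiar "Bad command or file name". The `exists` callback stands in for a directory lookup so the example runs without real files, and names typed with an explicit extension are not handled.

```python
# Partial list for illustration; the real set varies by DOS version.
INTERNAL_COMMANDS = {"CD", "CHDIR", "CLS", "COPY", "DATE", "DEL", "DIR", "ECHO",
                     "ERASE", "MD", "MKDIR", "PATH", "PROMPT", "RD", "REN",
                     "RENAME", "RMDIR", "SET", "TIME", "TYPE", "VER", "VOL"}
EXTENSIONS = (".COM", ".EXE", ".BAT")  # searched in this order in each directory


def resolve(command_line: str, cwd: str, path_dirs, exists) -> str:
    """Return 'internal:NAME', the full path of an external program, or an error."""
    name = command_line.split()[0].upper()
    if name in INTERNAL_COMMANDS:
        return "internal:" + name  # built into COMMAND.COM, no file to load
    for directory in [cwd, *path_dirs]:
        for ext in EXTENSIONS:
            candidate = directory.rstrip("\\") + "\\" + name + ext
            if exists(candidate):
                return candidate
    return "Bad command or file name"


files = {"C:\\DOS\\FORMAT.COM", "C:\\GAME.BAT"}
print(resolve("dir /w", "C:\\", ["C:\\DOS"], files.__contains__))    # internal:DIR
print(resolve("format a:", "C:\\", ["C:\\DOS"], files.__contains__))  # C:\DOS\FORMAT.COM
```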

At this point DOS is fully loaded and operational. The time spent loading the items in CONFIG.SYS and AUTOEXEC.BAT varies with the DOS version and with what those files contain, but common entries include drivers for IDE CD-ROM drives, serial devices, memory management, and disk compression.