As shown above, computer system hardware divides naturally into four major internal sections:
The physical arrangements may take many forms:
Our concern here is with the logical or conceptual organization of the computer system, which is shared by all modern personal computers, regardless of their external packaging.
The "word size" of the system bus is often used to characterize a computer. It refers to the number of bits simultaneously transferred, which therefore implies a minimum for the number of wires in the bus. Wider buses are of course more expensive, because more connections have to be made among the system components. The original IBM PC used an eight-bit bus; modern PCs use 16- or 32-bit buses. Large-scale systems use 64- or 128-bit buses.
Most business programming is done in a higher-level language, such as BASIC, C, or COBOL. There is an advantage, however, to understanding something about the internal operations by which the statements of those languages are accomplished. The CPU contains the arithmetic and logic unit that executes the instructions, operating on the data, and it also contains several "registers" that can store data. One register, known as the "program counter" (PC), is dedicated to holding the address in RAM of the next instruction to be executed. Other registers may contain, for example, data, the address of data, a value to be added to another register's contents to calculate the address of data, or the address of the memory location that itself contains the address of the data (a rough C sketch of these addressing styles follows). The CPU's internal word size is the largest unit of information that a single CPU instruction deals with "at once"; some computers have different memory, bus, and CPU word sizes. Simple microprocessors typically have fewer registers, more of them specialized, and a smaller word size than minicomputers or "mainframe" computers. Modern CISC microprocessors (such as the Motorola 68040 and the Intel Pentium family, of which the Pentium 4 is the most recent) and modern RISC microprocessors (such as the Apple/Motorola/IBM PowerPC, of which the G5 is the most recent, and the Compaq, formerly Digital, Alpha) are all quite comparable to mainframe computers in their word size, their number of registers, and their degree of register specialization.
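The addressing styles just listed can be sketched in C, where a pointer plays the role of "the address of data" (the variable names are invented for illustration; no particular CPU's register set is meant):

    #include <stdio.h>

    int main(void) {
        int data = 42;          /* a value a register might hold directly */
        int *addr = &data;      /* the address of the data                */
        int **indirect = &addr; /* the address of the address of the data */
        int table[4] = {10, 20, 30, 40};
        int base = 0, offset = 2;

        printf("%d\n", data);       /* the value itself: 42                  */
        printf("%d\n", *addr);      /* fetched via its address: 42           */
        printf("%d\n", **indirect); /* fetched via the address's address: 42 */
        printf("%d\n", table[base + offset]); /* base-plus-offset: 30        */
        return 0;
    }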
The instructions executed by the CPU fall into two general classes: data manipulation and instruction-sequence alteration. Data manipulation includes addition, subtraction, copying from RAM to a CPU register or vice versa, and so on. Sequence alterations include subroutine jumps and returns, branching to one of several points in a program based on the results of a previous operation, and the like. Each statement of a higher-level language will typically be accomplished by several machine instructions, as the sketch below suggests.
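As a hypothetical illustration, here is one higher-level statement together with the several generic machine instructions it might become (the LOAD/ADD/STORE mnemonics below are illustrative, not any specific machine language):

    #include <stdio.h>

    int main(void) {
        int a = 5, b = 7, c;
        /* The single statement ...          */
        c = a + b;
        /* ... typically becomes several machine instructions, roughly:
             LOAD  R1, a    ; copy a from RAM into a CPU register
             LOAD  R2, b    ; copy b from RAM into another register
             ADD   R1, R2   ; the arithmetic and logic unit adds them
             STORE R1, c    ; copy the result back out to RAM          */
        printf("%d\n", c);  /* prints 12 */
        return 0;
    }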
RAM is the main working memory: CPU instructions can only be executed from RAM, software has to be loaded into RAM (from disk, or tape, or through a network connection) before the instructions can be executed, and data has to be loaded into RAM before it can be manipulated by those instructions. RAM is typically the fastest memory and the most expensive per character. Sometimes called "core" although magnetic cores are now rarely used, RAM is today typically made with semiconductor integrated circuits. Electronic RAM, unlike disks, tapes, and the older magnetic core RAM, loses all stored information promptly when power is turned off. All of the specific items of information stored in RAM are available equally rapidly, typically within a tenth to a hundredth of a microsecond.
RAM is organized into "bits" (the smallest unit of information, which can have the value of either zero or one), "nibbles" (groups of four bits, half a byte), "bytes" (groups of eight bits), and "words" (the largest unit of information that can be transferred between memory and the CPU in one step). Since the introduction of the IBM System/360 mainframes in the 1960s, the eight-bit "byte" has been the most common unit for storing a character. (Since there are 256 different 8-bit numbers, one byte is more than adequate to store a character of text.) Some computers permit addressing of parts of a word; the common case is to have word sizes that are multiples of 8 bits, but with each 8-bit byte individually addressable, as sketched below.
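A small C sketch of byte addressing within a larger word, assuming the common case of four 8-bit bytes in a 32-bit word (the value 0x41424344 is arbitrary):

    #include <stdio.h>

    int main(void) {
        unsigned int word = 0x41424344u; /* one 32-bit word: four 8-bit bytes */
        for (int i = 0; i < 4; i++) {
            unsigned int byte = (word >> (8 * i)) & 0xFFu; /* select one byte */
            printf("byte %d: 0x%02X\n", i, byte);
        }
        return 0;   /* prints 0x44, 0x43, 0x42, 0x41 in turn */
    }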
RAM and disk data are addressed with binary numbers of a fixed width. For example, if 16 bits are used to address memory, there are two raised to the 16th power (65,536) different bytes that can be specified. Two raised to the tenth power is 1,024, so it is common to (ab)use the standard metric prefixes as shown here:
Depending on the context, this introduces a modest ambiguity, because people rarely are explicit about whether they are using a prefix with its metric or its computer meaning (the short calculation after the table makes the difference concrete).

    prefix   metric meaning          computer meaning
    kilo     1,000                   1,024
    Mega     1,000,000               1,048,576
    Giga     1,000,000,000           1,073,741,824
    Tera     1,000,000,000,000       1,099,511,627,776
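The 16-bit addressing example and the prefix ambiguity reduce to a few lines of C arithmetic (nothing here beyond the figures in the table above):

    #include <stdio.h>

    int main(void) {
        /* With a 16-bit address, 2 to the 16th distinct bytes can be named. */
        unsigned long addressable = 1UL << 16;
        printf("16-bit addresses reach %lu bytes\n", addressable); /* 65536 */

        /* The metric/computer ambiguity for "mega": */
        unsigned long metric_kilo = 1000UL;
        unsigned long binary_kilo = 1UL << 10;  /* 1,024 */
        printf("one megabyte is either %lu or %lu bytes\n",
               metric_kilo * metric_kilo,       /* 1,000,000 */
               binary_kilo * binary_kilo);      /* 1,048,576 */
        return 0;
    }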
Magnetic disks are the common mass storage memory. As rotating memories, disks involve a variable delay time before the data "come around again", but then the information is transferred quite rapidly. Magnetic tape memory is the cheapest per character and slowest, often requiring many minutes to find the needed information.
Tape drives and disk drives both store information as permanent magnetization of the coating on a surface. The information density in bits per square mm is comparable, but on average the tape drive takes longer to reach a given file because the surface is so very long and narrow. The disk drive, like an LP record or CD, uses the outer two thirds or so of a circular disk. Unlike the LP, whose information is arranged sequentially along a single spiral groove, the computer disk has its information arranged on a set of concentric, circular tracks, each of which is divided circumferentially into sectors. The usual arrangement provides a single electromagnet ("head"), used both to write and to read the information on a given surface, that is moved from one radius to another, depending on the location of the needed information. This imposes a delay to position the heads at the correct radius ("seek time", typically 10 to 40 milliseconds), before waiting for the right sector to come around under the head ("rotational latency", typically 5 to 10 milliseconds for a hard disk, ten or twenty times that for a floppy disk, which spins more slowly).
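A back-of-the-envelope C sketch of the delay just described, using the midpoints of the typical ranges quoted above (not any particular drive's specification):

    #include <stdio.h>

    int main(void) {
        double seek_ms    = 25.0; /* midpoint of the 10-40 ms seek-time range */
        double latency_ms = 7.5;  /* midpoint of the 5-10 ms rotational range */

        /* Average wait before the data begin to flow: about 32.5 ms. */
        printf("average access delay: about %.1f ms\n", seek_ms + latency_ms);
        return 0;
    }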
The following summarizes the characteristics of several information storage technologies:
             RAM        HD        DAT tape   DVD*       8xCD       ethernet   floppy     Flash (SD)
    type     volatile   NV        NV         NV         NV         [NA]       NV         NV
    delay    50 ns      15 ms     20 min     100 ms     300 ms     0.01 ms    200 ms     10 ms
    rate     60 MB/s    3 MB/s    0.8 MB/s   3 MB/s     1.2 MB/s   1 MB/s     0.5 MB/s   1 MB/s
    size     2 GB       12 GB     2 GB       4.7 GB     0.7 GB     [NA]       1.4 MB     1 GB
    cost     $100/GB    $1/GB     $.50/GB    $.20/GB    $1/GB      [NA]       $.50/MB    $75/GB

    * single layer

ns = nanosecond = one billionth of a second
ms = millisecond = one thousandth of a second
MB/s = megabyte per second
MB = megabyte
GB = gigabyte
NV = non-volatile
The cost figures given for tape and floppy drives do not include the cost of the drive, only the cost of the removable media (diskette or cartridge). The delay and rate figures given for ethernet assume that a file server connected by that network technology has so much RAM that the entire file requested is already in RAM, and that there is no competing traffic on the network. In real-world situations, of course, the delays will be longer, determined by events on the server unless the network is overloaded. Typical delays will be approximately equal to those for the hard disks the files are actually stored on, because the extra delay for the server software manipulations required to move the data through the network will be roughly offset by the fact that sometimes the data will already be in RAM. The data rates will almost always be slower than the raw network rate shown above (it is rare for an active network to provide even a third of its theoretical data transfer rate to a single connection), and therefore substantially slower than a hard disk. File servers accessed through dial-up internet connections are about 10-100 times slower even than ethernet.
High-speed ethernet (100 Mbits/sec, instead of the usual 10 Mbits/sec) network interfaces are becoming more common. Their actual advantage can be as much as a factor of twenty, because they typically use separate wires for incoming and outgoing data ("full-duplex"), while regular ethernet typically uses the same wires for both ("half-duplex"). It is still the case, however, that the sustained throughput for a real network connection is rarely as fast as one-third of the raw network data rate, as the arithmetic below suggests.
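That rule of thumb reduces to simple arithmetic; here it is as a short C sketch (the one-third fraction is the estimate from the text, not a measured value):

    #include <stdio.h>

    int main(void) {
        double raw_mbits[] = {10.0, 100.0}; /* regular and high-speed ethernet */
        for (int i = 0; i < 2; i++) {
            double raw_MBps = raw_mbits[i] / 8.0;  /* 8 bits per byte */
            printf("%3.0f Mbit/s: raw %5.2f MB/s, sustained under %4.2f MB/s\n",
                   raw_mbits[i], raw_MBps, raw_MBps / 3.0);
        }
        return 0;
    }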
A sector is the smallest amount of information that the disk drive can transfer at once; typical sector sizes are 256, 512, 1024, or 2048 bytes. The operating system software can be designed so that the smallest amount of information transferred in normal file activity (the "block" size) is any fraction or multiple of the disk's sector size; usually, however, the two sizes are in ratios of small integers, most often simply equal. An operating system will typically support only certain sector sizes for the disk drives used with it.
Video displays are characterized by the number of picture elements ("pixels") horizontally and vertically, by the number of different colors each pixel can display, and by the number of times per second that the display is refreshed. In some situations, the limiting characteristic will be the rate at which the displayed information can be changed. The original VGA display specification provides 640 pixels horizontally and 480 vertically. This is, respectively, slightly more and slightly less than a standard American ("NTSC") broadcast television signal (535 each) and much better than the signal that can be recorded and played back on a home VCR (about 250 horizontally and 535 vertically). NTSC video signals, including both broadcast signals and those played back from a VCR, include only about 100 pixels of color information per row (each of them correspondingly wide): the brightness can change about three times as suddenly as the color. For television viewing from normal distances, this coincides with the capabilities of human vision.
The standard VGA display provides for up to 8 bits of color information, that is, up to 256 different colors at any one time. IBM and compatible PCs usually operate with those 256 colors selected by storing color values into a color data table within the display interface electronics. Different versions of VGA displays permit those color values to have different numbers of bits, and hence to be chosen from a "palette" of thousands or millions of different colors. At any one time, only 256 different colors can be displayed, but which 256, from among the larger available palette, can be changed under program control by storing different values into the color data table. The various forms of "Super VGA" provide more pixels horizontally, more pixels vertically, more simultaneous colors, a larger palette, or some combination of those features.
All standard Macintosh color displays operate with a fixed palette, so that at all times the display will have the same 16 colors, or the same 256 colors, etc., and all Macintosh models set for a particular number of colors will have exactly the same colors available.
Human vision is capable of discerning about 40 different shades of intensity for light of any given color. If allowed to adapt to very dim light levels, the chemical reactions used by the retina change (it takes about fifteen or twenty minutes to become fully "dark adapted"). In that case, the sensitivity of the eyes is about one million times what it is in bright sunlight, but still only about 40 different shades of intensity can be distinguished. Five bits provide thirty-two light levels and six bits provide sixty-four; allowing for separate control of each of the red, green, and blue intensities, we can see that at most 18 bits are required to produce a color image that the human eye will see as indistinguishable from perfection. Since computer systems are routinely constructed to deal with 8-bit bytes, the most common high-fidelity color displays use 16 bits ("thousands" of colors) or 24 bits ("millions" of colors).
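The bit arithmetic can be checked with a short C sketch (the 40-shade figure is the estimate from the paragraph above):

    #include <stdio.h>

    int main(void) {
        int shades = 40;  /* distinguishable intensity levels, per the text */
        int bits = 0;
        while ((1 << bits) < shades)
            bits++;       /* smallest number of bits that covers 40 levels */
        printf("%d bits per channel, %d bits for red+green+blue\n",
               bits, 3 * bits);  /* prints: 6 bits per channel, 18 bits total */
        return 0;
    }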
The human eye's ability to discern detail is such that text and images printed with 600 pixels per inch are visibly superior to ("sharper" or "crisper" than) those printed with 300 pixels per inch. Some printers (particularly ink-jet models) are capable of more pixels per inch in one direction than in the other. For the most part, the human eye and brain will perceive the image quality based on the worse dimension. There is some limited benefit, with diminishing returns, in perceived image quality from improving only one dimension. It is rarely worth the cost to improve one dimension beyond the point where it has twice as many pixels as the other.
Video displays usually provide between 75 and 100 pixels per inch. Therefore, a printer at 300 to 600 pixels per inch provides roughly 9 to 64 times as many pixels per square inch as a video display does (see the sketch below). This is another one of the reasons why on-screen viewing is often less satisfactory than reading from printed output.
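The pixels-per-square-inch arithmetic, as a small C sketch pairing the best display case with the slower printer and the worst display case with the faster printer:

    #include <stdio.h>

    int main(void) {
        int printer_dpi[] = {300, 600};
        int display_ppi[] = {100, 75};  /* best and worst display cases */
        for (int i = 0; i < 2; i++) {
            double ratio = (double)(printer_dpi[i] * printer_dpi[i]) /
                           (display_ppi[i] * display_ppi[i]);
            printf("%d dpi printer vs %d ppi display: %.0fx per square inch\n",
                   printer_dpi[i], display_ppi[i], ratio);  /* 9x and 64x */
        }
        return 0;
    }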
The amount of information required to represent an image (when stored on disk, sent through a network, or sent to a printer), is the product of the number of pixels horizontally, times the number of pixels vertically, times the number of bytes required for each pixel. For example, a 2-inch by 3-inch photograph, displayed with 256 colors at 75 pixels per inch in both dimensions, will contain 33,750 bytes, but a 4-inch by 6-inch photograph, also displayed with 256 colors at 75 pixels per inch in both dimensions, being twice as large in each dimension will contain four times as many bytes: 135,000. Using "thousands of colors" (16-bit) requires twice as many bytes, and using "millions of colors" (24-bit) requires three times as many bytes.
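The same calculation as a small C sketch (image_bytes is an invented helper name; the figures match the examples above):

    #include <stdio.h>

    /* Bytes needed for an uncompressed image, per the formula above. */
    static long image_bytes(double inches_wide, double inches_high,
                            int ppi, int bytes_per_pixel) {
        long pixels_wide = (long)(inches_wide * ppi);
        long pixels_high = (long)(inches_high * ppi);
        return pixels_wide * pixels_high * bytes_per_pixel;
    }

    int main(void) {
        printf("%ld\n", image_bytes(2, 3, 75, 1)); /*  33750: 256 colors        */
        printf("%ld\n", image_bytes(4, 6, 75, 1)); /* 135000: four times as many */
        printf("%ld\n", image_bytes(4, 6, 75, 3)); /* 405000: millions of colors */
        return 0;
    }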
We can see why a lot of effort has been put into inventing image "compression" techniques (such as the GIF and JPEG algorithms used by most Web browsers, including Netscape and Internet Explorer) that permit an acceptable rendition of an image to be stored and transmitted using far fewer bytes. The basic concept is simple: it only takes a few bytes to say mathematically the equivalent of "this whole region here is pure blue," no matter how big the region is. Most graphic images contain regions of uniform or gradually changing colors, so that mathematical descriptions can be quite terse compared to simply listing out the strengths of the red, the green, and the blue components of each pixel within the region.
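A minimal C sketch of the underlying idea, using simple run-length encoding; this is far cruder than GIF or JPEG, but it shows why uniform regions compress so well:

    #include <stdio.h>

    /* Encode a row of pixel values as (count, value) pairs. */
    static int rle_encode(const unsigned char *pixels, int n, unsigned char *out) {
        int used = 0;
        for (int i = 0; i < n; ) {
            int run = 1;
            while (i + run < n && pixels[i + run] == pixels[i] && run < 255)
                run++;                         /* measure one uniform region */
            out[used++] = (unsigned char)run;  /* how many pixels ...        */
            out[used++] = pixels[i];           /* ... of this one value      */
            i += run;
        }
        return used;  /* bytes in the compressed form */
    }

    int main(void) {
        unsigned char row[300], out[600];
        for (int i = 0; i < 300; i++)
            row[i] = (i < 200) ? 17 : 200;     /* two uniform "color" regions */
        printf("300 bytes -> %d bytes\n", rle_encode(row, 300, out));
        return 0;                              /* prints: 300 bytes -> 4 bytes */
    }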
For any given task, system performance is usually limited by one primary "bottleneck." In some situations, a secondary bottleneck would impose a limit only slightly better than the primary bottleneck, so that very little improvement can be obtained by changes that just remove the primary bottleneck.
Conceptually, there are two kinds of bottlenecks:
In general the only reliable benchmark for predicting system performance with your workload is testing performed with your software. Even that can be difficult if the system in question needs to perform its tasks when being used by hundreds of different people doing a variety of different things. Typically, such tests are performed by writing special software to run on other machines attached to the network, permitting them to emulate network-connected users, with random delays between "keystrokes," simulating an interactive load.
A secondary difficulty with the use of MIPS to predict performance is that in many situations the speed of the CPU is not the limiting factor on system performance. We go into more detail in other sections, but other possible bottlenecks include the following:
MIPS is useful only when the two machines being compared are of very similar design: same machine language, same types of input/output circuitry, same types of peripheral devices, etc., and are expected to perform very similar work.
System performance enhancements typically address one or more bottlenecks.
An alternative technology, based on telephone central office design, is known as a "crossbar switch." With such a switch, there are multiple paths, so that any two devices on the system interconnect can exchange data without that activity blocking an exchange of data between any two other devices. Such switches are expensive, and have been used primarily in large-scale time-shared or server systems. It is possible that over the next few years they will become more common, as speed increases in other areas leave the system interconnect as the remaining bottleneck.
The original IBM PC's CPU operated at 4.8 MHz (208 ns/cycle), with RAM that responded in 150 ns, so that the CPU never had to wait an extra cycle. A modern CPU may operate at 400 MHz (2.50 ns/cycle), faster by a factor of at least 80. That modern CPU will, however, be connected to RAM that responds in 10 ns, faster by just a factor of 15. Thus, the modern CPU will routinely have to wait until the fourth or fifth cycle before the information is available from RAM. There is every reason to expect that these trends will continue: CPUs will get faster by leaps and bounds, and RAM will get faster too, but much more gradually. Therefore, we can see that memory cache technology is already (and will continue to be) vital to achieving optimum performance from personal computers.
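The wait-state arithmetic written out as a short C sketch (the clock and RAM figures are the ones quoted above):

    #include <stdio.h>

    int main(void) {
        double cycle_ns_1981 = 208.0, ram_ns_1981 = 150.0;
        double cycle_ns_now  = 2.5,   ram_ns_now  = 10.0;

        /* 1981: RAM answers within a single cycle, so the CPU never waits. */
        printf("original PC: %.1f cycles per RAM access\n",
               ram_ns_1981 / cycle_ns_1981);   /* 0.7 */

        /* Today: the CPU idles until the fourth cycle for uncached data. */
        printf("modern CPU:  %.1f cycles per RAM access\n",
               ram_ns_now / cycle_ns_now);     /* 4.0 */
        return 0;
    }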
Modern CPU chips usually have some cache designed into the chip itself; this is known as "primary cache," or "level 1 cache." Some CPU circuit boards include cache memory; this is known as "secondary cache," or "level 2 cache." The expensive part of the cache is the very high speed RAM. Often the CPU will be designed to include the circuitry that keeps track of the data, the cache controller, but with an empty socket for the cache RAM to be added as an optional upgrade.
An even more expensive technique is to install the L2 cache on a dedicated bus, operating at much higher speeds than the system bus. In the most extreme case, such a "backside cache" will operate at the same speed as the CPU itself, so that there are no wait states for data retrieved from the L2 cache. The expense occurs because the L2 cache RAM needs to be very fast and because the CPU has to be constructed with interface circuitry to drive both the backside cache bus and the regular system bus. This technique is used in Compaq (formerly Digital) Alpha systems, and in Apple Macintosh PowerPC G3 and G4 systems.
There are diminishing returns for increased cache sizes, but with a typical system design the cheapest performance boost is likely to be the installation of a 256 KB or 512 KB secondary cache. The Macintosh operating system running on PowerPC chips benefits particularly from larger cache sizes, because so much Macintosh software is still designed for the original Motorola 68000-series CPU, and runs on the PowerPC by being interpreted by a software emulator of the 68000. Those systems will benefit from having a cache that is large enough to contain both the entire emulator and also a significant portion of the application code.
The speed advantage provided by a cache will depend on the software being used and the type of work done. The cache provides more benefit if the same portion of RAM is repeatedly referenced for a while, then another portion, and so on. The cache provides more benefit if the references to a given address are mostly reads, with relatively few writes, because any modified value has to be written "all the way back out to RAM," and that still takes the full 10 - 50 ns. Some brave system designers use caches that pretend to the CPU that they are done writing to memory as soon as the value is stored in cache, so that the CPU can go on to its next step while the memory circuits complete the process of storing the new value into RAM.
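A toy C sketch of the two write policies just described, built around a single invented "cache line" structure (real cache controllers are hardware, of course, not software):

    #include <stdio.h>

    struct cache_line { int value; int dirty; };  /* one toy cache entry */

    static int ram_value = 0;  /* stands in for slow main RAM */

    /* Write-through: every store goes all the way out to RAM (slow but safe). */
    static void write_through(struct cache_line *c, int v) {
        c->value = v;
        ram_value = v;  /* the CPU waits the full RAM write time here */
    }

    /* Write-back: store only into the cache; RAM is brought up to date later. */
    static void write_back(struct cache_line *c, int v) {
        c->value = v;
        c->dirty = 1;   /* remember that RAM is now out of date */
    }

    /* The deferred RAM update, done when the line must be reused. */
    static void flush(struct cache_line *c) {
        if (c->dirty) { ram_value = c->value; c->dirty = 0; }
    }

    int main(void) {
        struct cache_line line = {0, 0};
        write_through(&line, 7);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=7  ram=7  */
        write_back(&line, 42);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=42 ram=7  */
        flush(&line);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=42 ram=42 */
        return 0;
    }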
In general, at any given cost level it is possible to design the electronics so that RISC CPUs can execute so many more instructions per second that they will outperform CISC designs of comparable technology. This includes the effects of the greater ease with which compilers can be written for the simpler RISC architectures. Even though, in principle, a CISC design would require fewer instructions to accomplish the same task, in practice, reliable compilers for CISC machines rarely actually generate machine language code that takes full advantage of those more powerful instructions.
Vector computers, such as the traditional Cray supercomputers, perform the same instruction on multiple items in an overlapping sequence. For example, 16 values stored in consecutive memory locations can be loaded into 16 internal CPU registers (for later arithmetic operations) by a single instruction, with the various stages of execution overlapping so that the time to complete loading the 16 values is much less than 16 times the time that would be required to complete the loading of one value.
Superscalar computers, such as the Digital Alpha AXP, can perform two or more different instructions, on different data, at the same time.
Getting efficient use out of as many as a dozen CPUs, even in an SMP configuration, is still an experimental art. There will be "overhead" to perform the operating system instructions required to coordinate among the multiple CPUs. One example of that overhead is known as "cache coherency": if each CPU has a cache, and one writes a changed value of the information at a particular address, the other CPU's caches may contain different values, so that inconsistent results would be obtained in later operations. Dedicated electronics and operating system routines must cooperate to prevent such problems.
In general, for an interactive multi-user load, the "overhead" of coordinating among the multiple CPUs will consume at least 5% of the total CPU capacity of a two-processor system. Most operating systems were not designed with SMP in mind and exhibit higher overhead than this. Furthermore, as each additional CPU is added, a smaller and smaller percentage of that added capacity is available for real work. The size of the overhead, and the way it grows with CPU count, are unforgiving measures of the quality of an SMP operating system.
The first point implies that the backups should be done with removable-media drives (floppy disks, cartridge disks, or tapes). The second point implies that personal computers should be provided with high-capacity, high-speed drives for backups (unless an adequately fast network is available and an operations staff is employed performing centralized backups).