As shown above, computer system hardware divides naturally into four major internal sections:
The physical arrangements may take many forms:
Our concern here is with the logical or conceptual organization of the computer system, which is shared by all modern personal computers, regardless of their external packaging.
The "word size" of the system bus is often used to characterize a computer. It refers to the number of bits simultaneously transferred, which therefore implies a minimum for the number of wires in the bus. Wider buses are of course more expensive, because more connections have to be made among the system components. The original IBM PC used an eight-bit bus; modern PCs use 16- or 32-bit buses. Large-scale systems use 64- or 128-bit buses.
Most business programming is done in a higher-level language, such as BASIC, C, or COBOL. There is an advantage, however, to understanding something about the internal operations by which the statements of those languages are accomplished. The CPU contains the arithmetic and logic unit that executes the instructions, operating on the data, and it also contains several "registers" that can store data. One register, known as the "program counter" (PC), is dedicated to holding the address in RAM of the next instruction to be executed. Other registers may contain, for example, data, the address of data, a value to be added to another register's contents to calculate the address of data, or the address of the memory location that itself contains the address of the data (a rough C sketch of these addressing styles follows). The CPU's internal word size is the largest unit of information that a single CPU instruction deals with "at once"; some computers have different memory, bus, and CPU word sizes. Simple microprocessors typically have fewer registers, more of them specialized, and a smaller word size than minicomputers or "mainframe" computers. Modern CISC microprocessors (such as the Motorola 68040 and the Intel Pentium family, of which the Pentium 4 is the most recent) and modern RISC microprocessors (such as the Apple/Motorola/IBM PowerPC, of which the G5 is the most recent, and the Compaq, formerly Digital, Alpha) are all quite comparable to mainframe computers in their word size, their number of registers, and their degree of register specialization.
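The addressing styles just listed can be sketched in C, where a pointer plays the role of "the address of data" (the variable names are invented for illustration; no particular CPU's register set is meant):

    #include <stdio.h>

    int main(void) {
        int data = 42;          /* a value a register might hold directly */
        int *addr = &data;      /* the address of the data                */
        int **indirect = &addr; /* the address of the address of the data */
        int table[4] = {10, 20, 30, 40};
        int base = 0, offset = 2;

        printf("%d\n", data);       /* the value itself: 42                  */
        printf("%d\n", *addr);      /* fetched via its address: 42           */
        printf("%d\n", **indirect); /* fetched via the address's address: 42 */
        printf("%d\n", table[base + offset]); /* base-plus-offset: 30        */
        return 0;
    }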
The instructions executed by the CPU fall into two general classes: data manipulation and instruction-sequence alteration. Data manipulation includes addition, subtraction, copying from RAM to a CPU register or vice versa, and so on. Sequence alterations include subroutine jumps and returns, branching to one of several points in a program based on the results of a previous operation, and the like. Each statement of a higher-level language will typically be accomplished by several machine instructions, as the sketch below suggests.
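As a hypothetical illustration, here is one higher-level statement together with the several generic machine instructions it might become (the LOAD/ADD/STORE mnemonics below are illustrative, not any specific machine language):

    #include <stdio.h>

    int main(void) {
        int a = 5, b = 7, c;
        /* The single statement ...          */
        c = a + b;
        /* ... typically becomes several machine instructions, roughly:
             LOAD  R1, a    ; copy a from RAM into a CPU register
             LOAD  R2, b    ; copy b from RAM into another register
             ADD   R1, R2   ; the arithmetic and logic unit adds them
             STORE R1, c    ; copy the result back out to RAM          */
        printf("%d\n", c);  /* prints 12 */
        return 0;
    }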
RAM is the main working memory: CPU instructions can only be executed from RAM, software has to be loaded into RAM (from disk, or tape, or through a network connection) before the instructions can be executed, and data has to be loaded into RAM before it can be manipulated by those instructions. RAM is typically the fastest memory and the most expensive per character. Sometimes called "core" although magnetic cores are now rarely used, RAM is today typically made with semiconductor integrated circuits. Electronic RAM, unlike disks, tapes, and the older magnetic core RAM, loses all stored information promptly when power is turned off. All of the specific items of information stored in RAM are available equally rapidly, typically within a tenth to a hundredth of a microsecond.
RAM is organized into "bits" (the smallest unit of information, which can have the value of either zero or one), "nibbles" (groups of four bits, half a byte), "bytes" (groups of eight bits), and "words" (the largest unit of information that can be transferred between memory and the CPU in one step). Since the introduction of the IBM System/360 mainframes in the 1960s, the eight-bit "byte" has been the most common unit for storing a character. (Since there are 256 different 8-bit numbers, one byte is more than adequate to store a character of text.) Some computers permit addressing of parts of a word; the common case is to have word sizes that are multiples of 8 bits, but with each 8-bit byte individually addressable, as sketched below.
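A small C sketch of byte addressing within a larger word, assuming the common case of four 8-bit bytes in a 32-bit word (the value 0x41424344 is arbitrary):

    #include <stdio.h>

    int main(void) {
        unsigned int word = 0x41424344u; /* one 32-bit word: four 8-bit bytes */
        for (int i = 0; i < 4; i++) {
            unsigned int byte = (word >> (8 * i)) & 0xFFu; /* select one byte */
            printf("byte %d: 0x%02X\n", i, byte);
        }
        return 0;   /* prints 0x44, 0x43, 0x42, 0x41 in turn */
    }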
RAM and disk data are addressed with binary numbers of a fixed width. For example, if 16 bits are used to address memory, there are two raised to the 16th power (65,536) different bytes that can be specified. Two raised to the tenth power is 1,024, so it is common to (ab)use the standard metric prefixes as shown here:
Depending on the context, this introduces a modest ambiguity, because people rarely are explicit about whether they are using a prefix with its metric or its computer meaning (the short calculation after the table makes the difference concrete).

    prefix   metric meaning          computer meaning
    kilo     1,000                   1,024
    Mega     1,000,000               1,048,576
    Giga     1,000,000,000           1,073,741,824
    Tera     1,000,000,000,000       1,099,511,627,776
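The 16-bit addressing example and the prefix ambiguity reduce to a few lines of C arithmetic (nothing here beyond the figures in the table above):

    #include <stdio.h>

    int main(void) {
        /* With a 16-bit address, 2 to the 16th distinct bytes can be named. */
        unsigned long addressable = 1UL << 16;
        printf("16-bit addresses reach %lu bytes\n", addressable); /* 65536 */

        /* The metric/computer ambiguity for "mega": */
        unsigned long metric_kilo = 1000UL;
        unsigned long binary_kilo = 1UL << 10;  /* 1,024 */
        printf("one megabyte is either %lu or %lu bytes\n",
               metric_kilo * metric_kilo,       /* 1,000,000 */
               binary_kilo * binary_kilo);      /* 1,048,576 */
        return 0;
    }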
Magnetic disks are the common mass storage memory. As rotating memories, disks involve a variable delay time before the data "come around again", but then the information is transferred quite rapidly. Magnetic tape memory is the cheapest per character and slowest, often requiring many minutes to find the needed information.
Tape drives and disk drives both store information as permanent magnetization of the coating on a surface. The information density in bits per square mm is comparable, but on average the tape drive takes longer to reach a given file because the surface is so very long and narrow. The disk drive, like an LP record or CD, uses the outer two thirds or so of a circular disk. Unlike the LP, whose information is arranged sequentially along a single spiral groove, the computer disk has its information arranged on a set of concentric, circular tracks, each of which is divided circumferentially into sectors. The usual arrangement provides a single electromagnet ("head"), used both to write and to read the information on a given surface, that is moved from one radius to another, depending on the location of the needed information. This imposes a delay to position the heads at the correct radius ("seek time", typically 10 to 40 milliseconds), before waiting for the right sector to come around under the head ("rotational latency", typically 5 to 10 milliseconds for a hard disk, ten or twenty times that for a floppy disk, which spins more slowly).
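A back-of-the-envelope C sketch of the delay just described, using the midpoints of the typical ranges quoted above (not any particular drive's specification):

    #include <stdio.h>

    int main(void) {
        double seek_ms    = 25.0; /* midpoint of the 10-40 ms seek-time range */
        double latency_ms = 7.5;  /* midpoint of the 5-10 ms rotational range */

        /* Average wait before the data begin to flow: about 32.5 ms. */
        printf("average access delay: about %.1f ms\n", seek_ms + latency_ms);
        return 0;
    }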
The following summarizes the characteristics of several information storage technologies:
             RAM        HD        DAT tape   DVD*       8xCD       ethernet   floppy     Flash (SD)
    type     volatile   NV        NV         NV         NV         [NA]       NV         NV
    delay    50 ns      15 ms     20 min     100 ms     300 ms     0.01 ms    200 ms     10 ms
    rate     60 MB/s    3 MB/s    0.8 MB/s   3 MB/s     1.2 MB/s   1 MB/s     0.5 MB/s   1 MB/s
    size     2 GB       12 GB     2 GB       4.7 GB     0.7 GB     [NA]       1.4 MB     1 GB
    cost     $100/GB    $1/GB     $.50/GB    $.20/GB    $1/GB      [NA]       $.50/MB    $75/GB

    * single layer

ns = nanosecond = one billionth of a second
ms = millisecond = one thousandth of a second
MB/s = megabyte per second
MB = megabyte
GB = gigabyte
NV = non-volatile
The cost figures given for tape and floppy drives do not include the cost of the drive, only the cost of the removable media (diskette or cartridge). The delay and rate figures given for ethernet assume that a file server connected by that network technology has so much RAM that the entire file requested is already in RAM, and that there is no competing traffic on the network. In real-world situations, of course, the delays will be longer, determined by events on the server unless the network is overloaded. Typical delays will be approximately equal to those for the hard disks the files are actually stored on, because the extra delay for the server software manipulations required to move the data through the network will be roughly offset by the fact that sometimes the data will already be in RAM. The data rates will almost always be slower than the raw network rate shown above (it is rare for an active network to provide even a third of its theoretical data transfer rate to a single connection), and therefore substantially slower than a hard disk. File servers accessed through dial-up internet connections are about 10-100 times slower even than ethernet.
High-speed ethernet (100 Mbits/sec, instead of the usual 10 Mbits/sec) network interfaces are becoming more common. Their actual advantage can be as much as a factor of twenty, because they typically use separate wires for incoming and outgoing data ("full-duplex"), while regular ethernet typically uses the same wires for both ("half-duplex"). It is still the case, however, that the sustained throughput for a real network connection is rarely as fast as one-third of the raw network data rate, as the arithmetic below suggests.
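That rule of thumb reduces to simple arithmetic; here it is as a short C sketch (the one-third fraction is the estimate from the text, not a measured value):

    #include <stdio.h>

    int main(void) {
        double raw_mbits[] = {10.0, 100.0}; /* regular and high-speed ethernet */
        for (int i = 0; i < 2; i++) {
            double raw_MBps = raw_mbits[i] / 8.0;  /* 8 bits per byte */
            printf("%3.0f Mbit/s: raw %5.2f MB/s, sustained under %4.2f MB/s\n",
                   raw_mbits[i], raw_MBps, raw_MBps / 3.0);
        }
        return 0;
    }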
A sector is the smallest amount of information that the disk drive can transfer at once; typical sector sizes are 256, 512, 1024, or 2048 bytes. The operating system software can be designed so that the smallest amount of information transferred in normal file activity (the "block" size) is any fraction or multiple of the disk's sector size; usually, however, the two sizes are in ratios of small integers, most often simply equal. An operating system will typically support only certain sector sizes for the disk drives used with it.
Video displays are characterized by the number of picture elements ("pixels") horizontally and vertically, by the number of different colors each pixel can display, and by the number of times per second that the display is refreshed. In some situations, the limiting characteristic will be the rate at which the displayed information can be changed. The original VGA display specification provides 640 pixels horizontally and 480 vertically. This is, respectively, slightly more and slightly less than a standard American ("NTSC") broadcast television signal (535 each) and much better than the signal that can be recorded and played back on a home VCR (about 250 horizontally and 535 vertically). NTSC video signals, including both broadcast signals and those played back from a VCR, include only about 100 pixels of color information per row (each of them correspondingly wide): the brightness can change about three times as suddenly as the color. For television viewing from normal distances, this coincides with the capabilities of human vision.
The standard VGA display provides for up to 8 bits of color information, that is, up to 256 different colors at any one time. IBM and compatible PCs usually operate with those 256 colors selected by storing color values into a color data table within the display interface electronics. Different versions of VGA displays permit those color values to have different numbers of bits, and hence to be chosen from a "palette" of thousands or millions of different colors. At any one time, only 256 different colors can be displayed, but which 256, from among the larger available palette, can be changed under program control by storing different values into the color data table. The various forms of "Super VGA" provide more pixels horizontally, more pixels vertically, more simultaneous colors, a larger palette, or some combination of those features.
All standard Macintosh color displays operate with a fixed palette, so that at all times the display will have the same 16 colors, or the same 256 colors, etc., and all Macintosh models set for a particular number of colors will have exactly the same colors available.
Human vision is capable of discerning about 40 different shades of intensity for light of any given color. If allowed to adapt to very dim light levels, the chemical reactions used by the retina change (it takes about fifteen or twenty minutes to become fully "dark adapted"). In that case, the sensitivity of the eyes is about one million times what it is in bright sunlight, but still only about 40 different shades of intensity can be distinguished. Five bits provide thirty-two light levels and six bits provide sixty-four; allowing for separate control of each of the red, green, and blue intensities, we can see that at most 18 bits are required to produce a color image that the human eye will see as indistinguishable from perfection. Since computer systems are routinely constructed to deal with 8-bit bytes, the most common high-fidelity color displays use 16 bits ("thousands" of colors) or 24 bits ("millions" of colors).
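The bit arithmetic can be checked with a short C sketch (the 40-shade figure is the estimate from the paragraph above):

    #include <stdio.h>

    int main(void) {
        int shades = 40;  /* distinguishable intensity levels, per the text */
        int bits = 0;
        while ((1 << bits) < shades)
            bits++;       /* smallest number of bits that covers 40 levels */
        printf("%d bits per channel, %d bits for red+green+blue\n",
               bits, 3 * bits);  /* prints: 6 bits per channel, 18 bits total */
        return 0;
    }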
The human eye's ability to discern detail is such that text and images printed with 600 pixels per inch are visibly superior to ("sharper" or "crisper" than) those printed with 300 pixels per inch. Some printers (particularly ink-jet models) are capable of more pixels per inch in one direction than in the other. For the most part, the human eye and brain will perceive the image quality based on the worse dimension. There is some limited benefit, with diminishing returns, in perceived image quality from improving only one dimension. It is rarely worth the cost to improve one dimension beyond the point where it has twice as many pixels as the other.
Video displays usually provide between 75 and 100 pixels per inch. Therefore, a printer at 300 to 600 pixels per inch provides roughly 9 to 64 times as many pixels per square inch as a video display does (see the sketch below). This is another one of the reasons why on-screen viewing is often less satisfactory than reading from printed output.
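The pixels-per-square-inch arithmetic, as a small C sketch pairing the best display case with the slower printer and the worst display case with the faster printer:

    #include <stdio.h>

    int main(void) {
        int printer_dpi[] = {300, 600};
        int display_ppi[] = {100, 75};  /* best and worst display cases */
        for (int i = 0; i < 2; i++) {
            double ratio = (double)(printer_dpi[i] * printer_dpi[i]) /
                           (display_ppi[i] * display_ppi[i]);
            printf("%d dpi printer vs %d ppi display: %.0fx per square inch\n",
                   printer_dpi[i], display_ppi[i], ratio);  /* 9x and 64x */
        }
        return 0;
    }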
The amount of information required to represent an image (when stored on disk, sent through a network, or sent to a printer), is the product of the number of pixels horizontally, times the number of pixels vertically, times the number of bytes required for each pixel. For example, a 2-inch by 3-inch photograph, displayed with 256 colors at 75 pixels per inch in both dimensions, will contain 33,750 bytes, but a 4-inch by 6-inch photograph, also displayed with 256 colors at 75 pixels per inch in both dimensions, being twice as large in each dimension will contain four times as many bytes: 135,000. Using "thousands of colors" (16-bit) requires twice as many bytes, and using "millions of colors" (24-bit) requires three times as many bytes.
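The same calculation as a small C sketch (image_bytes is an invented helper name; the figures match the examples above):

    #include <stdio.h>

    /* Bytes needed for an uncompressed image, per the formula above. */
    static long image_bytes(double inches_wide, double inches_high,
                            int ppi, int bytes_per_pixel) {
        long pixels_wide = (long)(inches_wide * ppi);
        long pixels_high = (long)(inches_high * ppi);
        return pixels_wide * pixels_high * bytes_per_pixel;
    }

    int main(void) {
        printf("%ld\n", image_bytes(2, 3, 75, 1)); /*  33750: 256 colors        */
        printf("%ld\n", image_bytes(4, 6, 75, 1)); /* 135000: four times as many */
        printf("%ld\n", image_bytes(4, 6, 75, 3)); /* 405000: millions of colors */
        return 0;
    }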
We can see why a lot of effort has been put into inventing image "compression" techniques (such as the GIF and JPEG algorithms used by most Web browsers, including Netscape and Internet Explorer) that permit an acceptable rendition of an image to be stored and transmitted using far fewer bytes. The basic concept is simple: it only takes a few bytes to say mathematically the equivalent of "this whole region here is pure blue," no matter how big the region is. Most graphic images contain regions of uniform or gradually changing colors, so that mathematical descriptions can be quite terse compared to simply listing out the strengths of the red, the green, and the blue components of each pixel within the region.
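A minimal C sketch of the underlying idea, using simple run-length encoding; this is far cruder than GIF or JPEG, but it shows why uniform regions compress so well:

    #include <stdio.h>

    /* Encode a row of pixel values as (count, value) pairs. */
    static int rle_encode(const unsigned char *pixels, int n, unsigned char *out) {
        int used = 0;
        for (int i = 0; i < n; ) {
            int run = 1;
            while (i + run < n && pixels[i + run] == pixels[i] && run < 255)
                run++;                         /* measure one uniform region */
            out[used++] = (unsigned char)run;  /* how many pixels ...        */
            out[used++] = pixels[i];           /* ... of this one value      */
            i += run;
        }
        return used;  /* bytes in the compressed form */
    }

    int main(void) {
        unsigned char row[300], out[600];
        for (int i = 0; i < 300; i++)
            row[i] = (i < 200) ? 17 : 200;     /* two uniform "color" regions */
        printf("300 bytes -> %d bytes\n", rle_encode(row, 300, out));
        return 0;                              /* prints: 300 bytes -> 4 bytes */
    }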
For any given task, system performance is usually limited by one primary "bottleneck." In some situations, a secondary bottleneck would impose a limit only slightly better than the primary bottleneck, so that very little improvement can be obtained by changes that just remove the primary bottleneck.
Conceptually, there are two kinds of bottlenecks:
In general the only reliable benchmark for predicting system performance with your workload is testing performed with your software. Even that can be difficult if the system in question needs to perform its tasks when being used by hundreds of different people doing a variety of different things. Typically, such tests are performed by writing special software to run on other machines attached to the network, permitting them to emulate network-connected users, with random delays between "keystrokes," simulating an interactive load.
A secondary difficulty with the use of MIPS to predict performance is that in many situations the speed of the CPU is not the limiting factor on system performance. We go into more detail in other sections, but other possible bottlenecks include the following:
MIPS is useful only when the two machines being compared are of very similar design: same machine language, same types of input/output circuitry, same types of peripheral devices, etc., and are expected to perform very similar work.
System performance enhancements typically address one or more bottlenecks.
An alternative technology, based on telephone central office design, is known as a "crossbar switch." With such a switch, there are multiple paths, so that any two devices on the system interconnect can exchange data without that activity blocking an exchange of data between any two other devices. Such switches are expensive, and have been used primarily in large-scale time-shared or server systems. It is possible that over the next few years they will become more common, as speed increases in other areas leave the system interconnect as the remaining bottleneck.
The original IBM PC's CPU operated at 4.8 MHz (208 ns/cycle), with RAM that responded in 150 ns, so that the CPU never had to wait an extra cycle. A modern CPU may operate at 400 MHz (2.50 ns/cycle), faster by a factor of at least 80. That modern CPU will, however, be connected to RAM that responds in 10 ns, faster by just a factor of 15. Thus, the modern CPU will routinely have to wait until the fourth or fifth cycle before the information is available from RAM. There is every reason to expect that these trends will continue: CPUs will get faster by leaps and bounds, and RAM will get faster too, but much more gradually. Therefore, we can see that memory cache technology is already (and will continue to be) vital to achieving optimum performance from personal computers.
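The wait-state arithmetic written out as a short C sketch (the clock and RAM figures are the ones quoted above):

    #include <stdio.h>

    int main(void) {
        double cycle_ns_1981 = 208.0, ram_ns_1981 = 150.0;
        double cycle_ns_now  = 2.5,   ram_ns_now  = 10.0;

        /* 1981: RAM answers within a single cycle, so the CPU never waits. */
        printf("original PC: %.1f cycles per RAM access\n",
               ram_ns_1981 / cycle_ns_1981);   /* 0.7 */

        /* Today: the CPU idles until the fourth cycle for uncached data. */
        printf("modern CPU:  %.1f cycles per RAM access\n",
               ram_ns_now / cycle_ns_now);     /* 4.0 */
        return 0;
    }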
Modern CPU chips usually have some cache designed into the chip itself; this is known as "primary cache," or "level 1 cache." Some CPU circuit boards include cache memory; this is known as "secondary cache," or "level 2 cache." The expensive part of the cache is the very high speed RAM. Often the CPU will be designed to include the circuitry that keeps track of the data, the cache controller, but with an empty socket for the cache RAM to be added as an optional upgrade.
An even more expensive technique is to install the L2 cache on a dedicated bus, operating at much higher speeds than the system bus. In the most extreme case, such a "backside cache" will operate at the same speed as the CPU itself, so that there are no wait states for data retrieved from the L2 cache. The expense occurs because the L2 cache RAM needs to be very fast and because the CPU has to be constructed with interface circuitry to drive both the backside cache bus and the regular system bus. This technique is used in Compaq (formerly Digital) Alpha systems, and in Apple Macintosh PowerPC G3 and G4 systems.
There are diminishing returns for increased cache sizes, but with a typical system design the cheapest performance boost is likely to be the installation of a 256 KB or 512 KB secondary cache. The Macintosh operating system running on PowerPC chips benefits particularly from larger cache sizes, because so much Macintosh software is still designed for the original Motorola 68000-series CPU, and runs on the PowerPC by being interpreted by a software emulator of the 68000. Those systems will benefit from having a cache that is large enough to contain both the entire emulator and also a significant portion of the application code.
The speed advantage provided by a cache will depend on the software being used and the type of work done. The cache provides more benefit if the same portion of RAM is repeatedly referenced for a while, then another portion, and so on. The cache provides more benefit if the references to a given address are mostly reads, with relatively few writes, because any modified value has to be written "all the way back out to RAM," and that still takes the full 10 - 50 ns. Some brave system designers use caches that pretend to the CPU that they are done writing to memory as soon as the value is stored in cache, so that the CPU can go on to its next step while the memory circuits complete the process of storing the new value into RAM.
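A toy C sketch of the two write policies just described, built around a single invented "cache line" structure (real cache controllers are hardware, of course, not software):

    #include <stdio.h>

    struct cache_line { int value; int dirty; };  /* one toy cache entry */

    static int ram_value = 0;  /* stands in for slow main RAM */

    /* Write-through: every store goes all the way out to RAM (slow but safe). */
    static void write_through(struct cache_line *c, int v) {
        c->value = v;
        ram_value = v;  /* the CPU waits the full RAM write time here */
    }

    /* Write-back: store only into the cache; RAM is brought up to date later. */
    static void write_back(struct cache_line *c, int v) {
        c->value = v;
        c->dirty = 1;   /* remember that RAM is now out of date */
    }

    /* The deferred RAM update, done when the line must be reused. */
    static void flush(struct cache_line *c) {
        if (c->dirty) { ram_value = c->value; c->dirty = 0; }
    }

    int main(void) {
        struct cache_line line = {0, 0};
        write_through(&line, 7);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=7  ram=7  */
        write_back(&line, 42);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=42 ram=7  */
        flush(&line);
        printf("cache=%d ram=%d\n", line.value, ram_value); /* cache=42 ram=42 */
        return 0;
    }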
In general, at any given cost level it is possible to design the electronics so that RISC CPUs can execute so many more instructions per second that they will outperform CISC designs of comparable technology. This includes the effects of the greater ease with which compilers can be written for the simpler RISC architectures. Even though, in principle, a CISC design would require fewer instructions to accomplish the same task, in practice, reliable compilers for CISC machines rarely actually generate machine language code that takes full advantage of those more powerful instructions.
Vector computers, such as the traditional Cray supercomputers, perform the same instruction on multiple items in an overlapping sequence. For example, 16 values stored in consecutive memory locations can be loaded into 16 internal CPU registers (for later arithmetic operations) by a single instruction, with the various stages of execution overlapping so that the time to complete loading the 16 values is much less than 16 times the time that would be required to complete the loading of one value.
Superscalar computers, such as the Digital Alpha AXP, can perform two or more different instructions, on different data, at the same time.
Getting efficient use out of as many as a dozen CPUs, even in an SMP configuration, is still an experimental art. There will be "overhead" to perform the operating system instructions required to coordinate among the multiple CPUs. One example of that overhead is known as "cache coherency": if each CPU has a cache, and one writes a changed value of the information at a particular address, the other CPU's caches may contain different values, so that inconsistent results would be obtained in later operations. Dedicated electronics and operating system routines must cooperate to prevent such problems.
In general, for an interactive multi-user load, the "overhead" of coordinating among the multiple CPUs will consume at least 5% of the total CPU capacity of a two-processor system. Most operating systems were not designed with SMP in mind and exhibit higher overhead than this. Furthermore, as each additional CPU is added, a smaller and smaller percentage of that added capacity is available for real work. The size of the overhead, and the way it grows with CPU count, are unforgiving measures of the quality of an SMP operating system.
The first point implies that the backups should be done with removable-media drives (floppy disks, cartridge disks, or tapes). The second point implies that personal computers should be provided with high-capacity, high-speed drives for backups (unless an adequately fast network is available and an operations staff is employed performing centralized backups).