Quantitative Methods I: Psy
501 |
Using Excel for Data Analysis and Graphs The instructions below demonstrate the use of Excel to produce descriptive statistics and graphs that are then incorporated into an APA-style data report. The data for this demonstration are 200 adult human heights. Nonetheless, the steps in Exel for computing descriptive statistics, creating tables, and creating figures are the same.
Tabular and Graphical Displays of Data
Frequency tables. Data are grouped into convenient intervals with upper and lower real limits. Frequencies and cumulative frequencies are computed for each interval. Table 1 Frequency Table for Adult Heights (N=200) in Two-Inch Intervals
Clicking OK produces all the data you need except for the cumulative frequency, which is easily obtained by adding the
current bin frequency to the cumulative frequency above it.
Polygons. Beginning with the first interval with a frequency of 0 and going to the last with a frequency of zero, frequencies per
interval are plotted as points and the points are connected with straight lines. Midpoints are again used on the x-axis.
Cumulative histograms and polygons. Cumulative plots are used to make relative rankings more obvious. Frequently a dual y-axis will be used with one showing cumulative frequencies and the other showing percentile ranks. Put upper limits to intervals on the abscissa, not the midpoints. When using Excel, you have the option to produce a dual axis graph that shows both a histogram and a cumulative frequency polygon for the same data. Choose the "cumulative frequency" option in the dialog box that opens up when you select Tools/Data Analysis/Histogram. The graphical methods above are all classical methods of showing distributional information. John Tukey in 1977 introduced some newer methods that are now catching on. One of the most popular is the box plot. (He also created the stem-and-leaf plot which was introduced as a "table" earlier but obviously has graphical properties.) Box Plot: Box and whiskers plot was the original name. Need 75th percentile (upper hinge), 25th percentile (lower hinge), median, upper fence, and lower fence. Fences are invisible limits in the sense that you compute them but do not graph them. There are different conventions for computing fences. The ones we will use are computed relative to the upper and lower "hinges" (i.e., P75 and P25). The lower fence is at a distance of 1.5*(75th percentile pt.-25th percentile pt) below the lower hinge. Upper fence is at a distance of 1.5*(75th percentile pt. - 25th percentile pt.) above the upper hinge. Draw lines (whiskers) to the last data point within the fences, then use individual markers (such as the ◦'s used here, though commonly asterices are used) for the remaining points which are termed outliers. Whiskers cover 99.3% of values in a normal distribution. [z=.674 is at 75th, -674 at 25th, interquartile range is 1.348. Whiskers extend to last values inside the range +2.696 and -2.696, an interval that includes 99.3% of values.) Any values outside of whiskers are plotted as asterices and are labeled outliers. In its current version, Excel does not have a box plot charts option (but many statistical packages do). But you can still use Excel to create a graph that is pretty much like what you want. There are add-ins for Excel that produce these charts (along with a host of other statistical procedures not included in the native version of Excel), but they tend to be quite expensive. I don't recommend them. Spend your money on SPSS, Systat, SAS, or Sigma Plot instead. To develop a box plot in Excel, compute the following statistics in the order given in a column of a worksheet: P25, Min,P50,Max,P75. Open the Chart wizard using Insert/Chart. Choose Line graph, data in rows, and Finish. This will produce a graph with five data points lined up, each with its own unique marker. Now click on the graph to open up the dialog box for formatting the data series. Choose options and check the boxes for high-low bars and up-down bars. You may also want to adjust the gap size. This produces a graph like the one below, which can be further edited to eliminate unwanted markers.
|