Using Standard Deviation for Rate Analysis

By Mike DeChristopher, K1KAA


04/21/08 - page still live, more added soon...

Standard deviation measures the spread of a set of values, whether that set is a random variable, a multiset of values, a population, or a probability distribution. In amateur radio contesting, we can use it as an additional variable to analyze rate. It is best used to show rate over time and to compare it against a pre-set goal. Most logging programs will allow you to set hourly rate goals, which makes this process easier, and I highly recommend doing so if you are going to be serious about post-contest analysis. Before getting too deep into the subject, however, we should look at the basics...

Variance, in the sense of Standard Deviation (StDev), is the average of the squared differences between each data value and the mean value of the set.

{4,8}
Mean=6
Deviation={-2,2}
Deviation squared={4,4}
Variance/average of deviations squared=4

Fundamentally simple - like most statistics. Now, how do we figure StDev? A simple example to illustrate...

Two numbers: 4 and 8
Avg.= ( 4 + 8 ) / 2 = 6
Deviation= ( 4 - 6 ) = -2, ( 8 - 6 ) = 2
Now we square each deviation, which turns negative values positive and amplifies larger deviations...
( -2 )^2 = 4 , 2^2 = 4
Sum of squares=8
Above divided by count of values= 8 / 2 = 4
SqRoot of the quotient= 2
StDev= 2
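
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name population_stdev is my own choice for illustration, and it divides by the count of values (the "population" form), which is what the worked example does.

```python
import math
import statistics

def population_stdev(values):
    """Population standard deviation, following the steps of the worked example."""
    mean = sum(values) / len(values)            # average of the values
    deviations = [v - mean for v in values]     # distance of each value from the mean
    squares = [d ** 2 for d in deviations]      # squaring removes the sign
    variance = sum(squares) / len(values)       # divide by the count of values
    return math.sqrt(variance)                  # square root of the quotient

values = [4, 8]
print(population_stdev(values))     # 2.0
print(statistics.pstdev(values))    # the standard library's population StDev
```

The standard library's statistics.pstdev performs the same calculation, so the hand-rolled function is only there to mirror the steps one by one.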

One can also find the StDev of a random variable, where the expected value and the variance of that variable are used to produce the result. That is not much use, at least in the practical sense, to us contesters. What StDev is most useful for is evaluating how close to the goal you came, time and time again. In plain English, StDev can measure by how much you missed your target rate for any given hour, whether above (which is desirable) or below (which indicates slow periods, or your over-optimism).

In relation to a 48-hour contest...

Any single hour (1) = 1 / 48
Any single minute (2) = 1 / 2,880
Any single second (3) = 1 / 172,800

The final two units are probably not much use. In univariate measurements one could, for example, use unit (3) times the count of unit (2), but that would be taking the long road to something achieved in a simpler fashion. In general, we would measure a single hour (1/48 of a 48-hour contest) against a pre-set goal, or against the other 47 values to show the StDev over the length of the contest (this is my preferred method).

The way I analyze logs after the contests here is simple but concrete. Rate analysis is only one of the statistical functions we "crop", but it is one of the most telling. I take the average, median, and mode of the actual data, and compare them with the average, median, and mode of the goal presets. This means adding, dividing, and multiplying 48 different values, so a statistical calculator is an advantage. Keep in mind, however, that many of the statistics functions featured on algebraic calculators (and their graphing calculator relatives) are regressions and hypothesis tests, such as Logistic and SinReg, ANOVA, 2-SampFTest, and 2-SampTTest (in the case of the TI-84 Plus). None of these will do you much good.
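
The comparison of average, median, and mode can be sketched in Python as follows. The hourly rates are hypothetical values made up for illustration (a real log would supply 48 per list), and the names rate_act, rate_exp, and summarize are my own.

```python
import statistics

# Hypothetical hourly rates for illustration only; a real log would have 48 values.
rate_act = [200, 150, 150, 300, 250, 150]
rate_exp = [300, 200, 200, 300, 200, 200]

def summarize(rates):
    """Return the average, median, and mode of a list of hourly rates."""
    return {
        "average": statistics.mean(rates),
        "median": statistics.median(rates),
        "mode": statistics.mode(rates),
    }

actual = summarize(rate_act)
goal = summarize(rate_exp)
for name in ("average", "median", "mode"):
    diff = actual[name] - goal[name]
    print(f"{name}: actual={actual[name]} goal={goal[name]} diff={diff}")
```

A negative diff means the actual figure fell short of the goal figure for that measure.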

The chart below shows rate by hour over the duration of a 48-hour contest. The actual rate is "rateact"; the expected rate is "rateexp", the pre-determined goal for that hour (or the string of goals for all hours). Standard deviation can be used to demonstrate the difference between the two. I usually measure from the polynomial trend lines, both of which are represented below, to prevent the data from "squaring off" as the raw values do in the chart.

Sum(rateact)=8700
Count(rateact)=48
Average(rateact)=181.25

The same calculations are performed for rateexp, but are not shown here, to keep the demonstration basic.

Now we need to find the deviation for each hour, or the cumulative deviation for the entire contest (which also doesn't do us much good). Instead of working through 48 different data sets here, let's convey the point by using hour 40. In that hour, the actual rate was 200, but the expected rate was 300 (all numbers were kept in hundreds for this theoretical contest). So...

( 200 + 300 ) / 2 = 250
( 200 - 250 ) = -50
( 300 - 250 ) = 50
( -50 )^2 = 2500
( 50 )^2 = 2500
2500 + 2500 = 5000
5000 / 2 = 2500
Sqrt of 2500 = 50
StDev (hour 40) = 50
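
The hour 40 arithmetic above can be wrapped in a small Python function so it can be reused for any hour. This is a sketch only; the name pair_stdev is my own and treats the actual and expected rates as a two-value set, exactly as the steps above do.

```python
import math

def pair_stdev(actual, expected):
    """Population standard deviation of the two-value set {actual, expected}."""
    mean = (actual + expected) / 2
    variance = ((actual - mean) ** 2 + (expected - mean) ** 2) / 2
    return math.sqrt(variance)

stdev_40 = pair_stdev(200, 300)
print(stdev_40)              # 50.0

# For any two values this works out to half the absolute difference:
print(abs(200 - 300) / 2)    # 50.0
```

Because the result is never negative, it does not say by itself whether the hour was above or below goal; keep the sign of (actual - expected) alongside it if that matters to you.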

From the above example, we see that our standard deviation from our expected rate, or our goal, was 50. This example was simple, of course, but it can be applied to multiple hours. We could place the entire contest in a data set of hours, represented as {1,2,3,4...48}, with separate sets for rateexp and rateact, to generate a final "overall" standard deviation, which is the most useful figure.
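
Extending the single-hour calculation to several hours might look like the Python sketch below. The rates are hypothetical, and the way the "overall" figure is combined here (root mean square of the per-hour values) is an assumption on my part, one reasonable reading among several.

```python
import statistics

# Hypothetical rates for a handful of hours; a real contest would supply 48 per list.
rate_act = [200, 100, 300, 200]
rate_exp = [300, 200, 300, 100]

# Per-hour figure: population StDev of each {actual, expected} pair.
per_hour = [statistics.pstdev([a, e]) for a, e in zip(rate_act, rate_exp)]

# One possible "overall" figure (an assumption): root mean square of the per-hour values.
overall = (sum(s ** 2 for s in per_hour) / len(per_hour)) ** 0.5

# Spread of each series on its own, for comparison.
spread_act = statistics.pstdev(rate_act)
spread_exp = statistics.pstdev(rate_exp)

print(per_hour)
print(overall, spread_act, spread_exp)
```

The per-hour list shows which hours strayed furthest from goal, while the single overall number is handy for comparing one contest against another.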

If you plan to do serious log analysis, the easiest medium to use is Microsoft Excel with a statistics plugin. Bring the "currentcontest" Cabrillo file into Excel (the program should be able to open it, and later versions will automatically make it a data list with filter tabs at the column headers). Go to Add-Ins, click the stat plugin, mouse over Descriptive Stats, and click univariate statistics. The next menu that comes up may have a different format, but the standard deviation checkbox will most likely be under the "summary" tab. This menu will also allow you to generate the probability of data falling in the top 5%, bottom 5%, or any other segment you would like (I tend to calculate the top 10% only).
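
If a spreadsheet is not at hand, the hourly rates can also be pulled out of a Cabrillo file with a short script. The Python sketch below assumes the usual Cabrillo QSO line layout (frequency, mode, date, UTC time, then the exchange); the sample lines and callsigns are made up for illustration, and hourly_rates is a hypothetical helper name.

```python
from collections import Counter

# Made-up sample lines in Cabrillo QSO format (freq, mode, date, UTC time, ...).
sample_log = """\
START-OF-LOG: 3.0
QSO: 14025 CW 2008-03-29 0003 N0CALL 599 05 AA1AAA 599 05
QSO: 14025 CW 2008-03-29 0041 N0CALL 599 05 AA2BBB 599 04
QSO:  7010 CW 2008-03-29 0115 N0CALL 599 05 AA3CCC 599 14
END-OF-LOG:
"""

def hourly_rates(cabrillo_text):
    """Count QSOs per (date, hour) from the QSO: lines of a Cabrillo log."""
    counts = Counter()
    for line in cabrillo_text.splitlines():
        if not line.startswith("QSO:"):
            continue                           # skip header and footer lines
        fields = line.split()
        date, time = fields[3], fields[4]      # e.g. "2008-03-29", "0041"
        counts[(date, time[:2])] += 1          # first two digits of the time are the hour
    return counts

rates = hourly_rates(sample_log)
for (date, hour), n in sorted(rates.items()):
    print(date, hour, n)
```

The resulting counts per hour are the "rateact" values that the earlier calculations work from.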

Here are some fast figures for your own calculations...

Count=48
Sum=1176
Average=24.5

Count=24
Sum=300
Average=12.5
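
These figures match the count, sum, and average of the hour numbers 1 through 48 and 1 through 24 (that reading is my inference from the numbers). A short Python sketch, with a function name of my own choosing, reproduces them for any contest length:

```python
def hour_index_figures(count):
    """Count, sum, and average of the hour numbers 1..count."""
    total = count * (count + 1) // 2    # sum of the integers 1 through count
    return count, total, total / count

print(hour_index_figures(48))   # (48, 1176, 24.5)
print(hour_index_figures(24))   # (24, 300, 12.5)
```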

