The challenge for this month’s Exercise is to calculate the standard deviation for a given set of data points. It’s a “whole population calculation” because all the data points are present. The trick is to follow the equation, transforming it from cryptic math mumbo-jumbo into C code.
Figure 1 shows the equation I used, which follows these steps:
- Calculate the mean for the values, what I would normally call the “average.” Tally up all the data points and divide by the number of data points.
- Subtract the mean from each element in the data set and square the result. Total these squares and divide by the number of data points to obtain their mean. The result is the variance.
- Obtain the square root of the variance to get the standard deviation.
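For reference, the three steps above correspond to the textbook form of the whole population standard deviation. The symbols here ($N$ for the number of data points, $x_i$ for each value, $\mu$ for the mean, $\sigma$ for the standard deviation) are conventional labels I'm assuming; the layout may differ from Figure 1.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```

The expression under the square root is the variance.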
The sample code skeleton provided in the Exercise post listed the values[] array as the data set. Variable items is set to the number of items in the data set:
items = sizeof(values)/sizeof(int);
These two variables are passed to the stddev() function where they’re manipulated to calculate and return the standard deviation.
Here’s my solution, which follows the steps outlined above:
2024_05-Exercise.c
#include <stdio.h>
#include <math.h>

/* whole population calculation */
double stddev(int v[],int items)
{
    int x,total;
    double mean,variance;

    /* calculate sum */
    total = 0;
    for( x=0; x<items; x++ )
        total += v[x];

    /* calculate the mean (average) */
    mean = (double)total/items;

    /* calculate deviations */
    total = 0;
    for( x=0; x<items; x++ )
        total += (v[x]-mean)*(v[x]-mean);
    variance = (double)total/items;

    return( sqrt(variance) );
}

int main()
{
    int values[] = { 10, 12, 23, 23, 16, 23, 21, 16 };
    int x,items;

    /* output the array's values */
    items = sizeof(values)/sizeof(int);
    printf("Values:");
    for( x=0; x<items; x++ )
    {
        printf(" %2d",values[x]);
    }
    putchar('\n');

    printf("The standard deviation is %.4f\n",
            stddev(values,items)
          );

    return 0;
}
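The first for loop in the stddev() function obtains the total of all values in the data set, v[]. The mean is calculated:
mean = (double)total/items;
The double cast is required because total and items are both int variables. Without it, the division is performed on integers and the fractional part is discarded before the result is assigned to mean.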
The next for loop calculates the deviation: Variable total tallies the squares of the difference between each value v[x] and the mean:
total += (v[x]-mean)*(v[x]-mean);
The multiplication expression squares the values, which is my preferred method over using the pow() function.
Variable variance is assigned the value of the total of the squares divided by the number of items.
Finally, the return statement returns the square root of variable variance, which is the whole population standard deviation value. Remember to link in the math library to assist with the sqrt() function.
Here’s a sample run:
Values: 10 12 23 23 16 23 21 16
The standard deviation is 4.8990
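The result can be checked by hand. This worked calculation uses the same eight values; $\mu$, $\sigma^2$, and $\sigma$ are conventional symbols I'm assuming for the mean, variance, and standard deviation.

```latex
\begin{align*}
\mu &= \frac{10+12+23+23+16+23+21+16}{8} = \frac{144}{8} = 18 \\
\sum_{i=1}^{8}(x_i-\mu)^2 &= 64+36+25+25+4+25+9+4 = 192 \\
\sigma^2 &= \frac{192}{8} = 24 \\
\sigma &= \sqrt{24} \approx 4.8990
\end{align*}
```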
I checked my solution’s result with the standard deviation calculated by ChatGPT on the same data set and received the same value. So I think I’m good. I hope your solution also met with success and that you didn’t use ChatGPT to write it for you.
Here’s my contribution. I checked the answer with ChatGPT. Actually I lied. I checked it with LibreOffice Calc.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
double stddevpop(double data[], double size);
int main()
{
    double data[] = { 10, 12, 23, 23, 16, 23, 21, 16 };
    double sd = stddevpop(data, 8);
    printf("%f\n", sd);
    return EXIT_SUCCESS;
}
double stddevpop(double data[], double size)
{
    double sum = 0.0;
    double sumsquares = 0.0;
    for (int i = 0; i < size; i++)
    {
        sum += data[i];
        sumsquares += pow(data[i], 2);
    }
    double mean = sum / size;
    double meanofsquares = sumsquares / size;
    double squareofmean = pow(mean, 2);
    double variance = meanofsquares - squareofmean;
    double sd = sqrt(variance);
    // The five lines above could be replaced with this.
    // Horrible isn’t it?
    // double sd = sqrt((sumsquares / size) - (pow((sum / size), 2)));
    return sd;
}
I like the horrible!
Have you ever written about why it’s necessary to use -lm when you #include ? I’ve always wondered but never bothered to find out.
The above comment was supposed to say #include math.h.
It depends on the platform. In Windows, the math library is either part of the standard library or it’s linked in automatically. In Linux, it must be linked in manually. I don’t recall for macOS, though I don’t remember using -lm specifically.
I just had a reader from Australia who had this same problem with -lm under Linux.
It would seem that the math routines are common enough that they should be in the standard library, but I don’t know why they aren’t.
In C in a Nutshell math.h is listed as one of the 29 standard headers. I haven’t used Windows for over 10 years and I can’t remember what the situation was. I used Eclipse hooked up to gcc, and before that Borland Turbo C and can’t remember doing anything specific for math.h but it was a long time ago.