Elementary Statistics – Linear Transformations

 

Original Data

Suppose a class contains 16 students. Recently, the class took an exam; the raw scores and a dotplot of the scores are shown below.

 

Exam scores (in points)

80  80  80  80  82  82  78  78  79  79  79  81  81  81  77  83

 

 

For these scores the descriptive statistics are shown below.

 

Descriptive Statistics: exam score (in points)

Variable      Mean  StDev  Minimum      Q1  Median      Q3  Maximum

exam score  80.000  1.633   77.000  79.000  80.000  81.000   83.000
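The summary values in the table can be reproduced with a few lines of code. The sketch below uses Python's standard statistics module, which is an assumption made for illustration and not necessarily the software that produced the table above. Its default "exclusive" quartile method agrees with the table for these data, but other quartile conventions can give slightly different values on other data sets.

```python
import statistics

# Exam scores from the handout (16 students)
scores = [80, 80, 80, 80, 82, 82, 78, 78, 79, 79, 79, 81, 81, 81, 77, 83]

mean = statistics.mean(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)
q1, median, q3 = statistics.quantiles(scores, n=4)  # quartiles, default "exclusive" method

print(f"Mean {mean:.3f}  StDev {stdev:.3f}  Min {min(scores)}  "
      f"Q1 {q1:.3f}  Median {median:.3f}  Q3 {q3:.3f}  Max {max(scores)}")
```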

 

 

Additive Transformation

Now suppose the professor decides to add 5 points to each of the scores. The equation for this particular linear transformation is $y = x + 5$, where $x$ is an original score and $y$ is the corresponding transformed score. The dotplot of the new observations is shown below (along with the graph of the original data).

 

 

Note the distribution of scores has simply shifted up, but there’s no change in the spread of the data. This additive constant affects measures of location (minimum, median, quartiles, maximum, and mean), but does not affect measures of spread (standard deviation, interquartile range, range).

 


The descriptive statistics for the transformed data are shown below.

 

Descriptive Statistics: exam score plus 5 (in points)

Variable            Mean  StDev  Minimum      Q1  Median      Q3  Maximum

exam score plus   85.000  1.633   82.000  84.000  85.000  86.000   88.000

 

Note the measures of location all increased by 5, but the measures of spread stayed the same.
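The comparison can also be checked numerically. The following is a minimal sketch, again assuming Python's statistics module; the helper name describe is hypothetical and introduced only for this illustration. It prints each measure before and after adding 5, together with the change.

```python
import statistics

scores = [80, 80, 80, 80, 82, 82, 78, 78, 79, 79, 79, 81, 81, 81, 77, 83]
plus5 = [x + 5 for x in scores]  # additive transformation: add 5 to every score

def describe(data):
    """Return measures of location and spread for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "stdev": statistics.stdev(data),
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }

before, after = describe(scores), describe(plus5)
for name in before:
    change = after[name] - before[name]
    print(f"{name:>6}: {before[name]:8.3f} -> {after[name]:8.3f}  (change {change:+.3f})")
```

The location measures should each change by 5, while the standard deviation, IQR, and range should show a change of zero.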

 

 

Multiplicative Transformation

Reconsider the original data. Suppose the professor decides to increase each score by 20%; that is, each score is multiplied by 1.2. The equation for this particular linear transformation is $y = 1.2x$, where $x$ is an original score and $y$ is the corresponding transformed score. The dotplot of the new observations is shown below (along with the graph of the original data).

 

 

Note the distribution of scores has shifted up, and the transformed data are slightly more spread out. This multiplicative constant affects both measures of location (minimum, median, quartiles, maximum, and mean) and measures of spread (standard deviation, interquartile range, range).

 

The descriptive statistics for the transformed data are shown below.

 

Descriptive Statistics: exam score times 1.2 (in points)

Variable            Mean  StDev  Minimum      Q1  Median      Q3  Maximum

exam score times  96.000  1.960   92.400  94.800  96.000  97.200   99.600

 

Note both the measures of location and the measures of spread are multiplied by 1.2.
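One way to see this is to divide each statistic of the transformed data by the same statistic of the original data. The sketch below does that under the same assumptions as before (Python's statistics module; the helper name measures is hypothetical). Every ratio should come out as 1.2, up to floating-point rounding.

```python
import statistics

scores = [80, 80, 80, 80, 82, 82, 78, 78, 79, 79, 79, 81, 81, 81, 77, 83]
times_1_2 = [1.2 * x for x in scores]  # multiplicative transformation: multiply every score by 1.2

def measures(data):
    """Return selected measures of location and spread."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(data),
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }

old, new = measures(scores), measures(times_1_2)
ratios = {name: new[name] / old[name] for name in old}
for name, ratio in ratios.items():
    print(f"{name:>6}: {old[name]:7.3f} -> {new[name]:7.3f}  (ratio {ratio:.3f})")
```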

 

 


General Rules on How Transformations Affect Measures of Location and Spread

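Adding a constant, a, to every score adds this constant to the mean, median, and quartiles, but leaves the standard deviation and IQR unchanged. That is, if $y = x + a$, then $\bar{y} = \bar{x} + a$ and $\text{median}_y = \text{median}_x + a$, while $s_y = s_x$ and $\text{IQR}_y = \text{IQR}_x$.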

 

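Multiplying every score by a constant, b > 0, multiplies the mean, median, quartiles, standard deviation, and IQR by that constant. That is, if $y = bx$, then $\bar{y} = b\bar{x}$, $\text{median}_y = b \cdot \text{median}_x$, $s_y = b\,s_x$, and $\text{IQR}_y = b \cdot \text{IQR}_x$.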

 

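Multiplying every score by a constant, b > 0, and then adding a constant, a, to every score (that is, $y = bx + a$)

·         multiplies measures of location (mean, median, quartiles) by b and then adds a to each of the measures: $\bar{y} = b\bar{x} + a$ and $\text{median}_y = b \cdot \text{median}_x + a$

·         multiplies measures of spread (standard deviation, IQR) by b: $s_y = b\,s_x$ and $\text{IQR}_y = b \cdot \text{IQR}_x$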
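For the mean and the standard deviation, these rules follow directly from the definitions. The short derivation below is a sketch that assumes the sample standard deviation (divisor $n - 1$) and writes each transformed score as $y_i = b x_i + a$; this subscript notation is introduced here for illustration.

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} (b x_i + a) = b\,\bar{x} + a, \\
s_y^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2
       = \frac{1}{n-1}\sum_{i=1}^{n} \bigl(b x_i + a - b\bar{x} - a\bigr)^2
       = b^2 s_x^2, \\
s_y &= |b|\, s_x = b\, s_x \quad \text{for } b > 0.
\end{align*}
```

The additive constant cancels inside every deviation, which is why it never reaches a measure of spread. The median and quartiles obey the same location rule because a transformation with b > 0 preserves the order of the scores.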

 

 

Finding a Transformation

Again, reconsider the original data. Suppose the professor wants a mean score of 85 points and a standard deviation of 2 points. Write the transformation as $y = bx + a$, where $x$ is an original score and $y$ is the transformed score. Since the standard deviation is only affected by multiplication, the change we want in s defines what b is: $b = s_y / s_x = 2 / 1.633 \approx 1.225$. Now we can determine a, based on the change we want in the mean (since the mean is affected by both additive and multiplicative constants): $a = \bar{y} - b\bar{x} = 85 - 1.225(80) = 85 - 98 = -13$. Therefore, we must multiply each score by 1.225 and then subtract 13 from each score. We can represent this transformation as $y = 1.225x - 13$. (Verify that this transformation gives the mean and standard deviation that we desire.)
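The suggested verification can be carried out as follows. This is a minimal sketch assuming Python's statistics module; the helper name check is hypothetical. It computes b and a without intermediate rounding and also tests the rounded constants 1.225 and -13 used in the text.

```python
import statistics

scores = [80, 80, 80, 80, 82, 82, 78, 78, 79, 79, 79, 81, 81, 81, 77, 83]
target_mean, target_sd = 85, 2

x_bar = statistics.mean(scores)  # 80
s_x = statistics.stdev(scores)   # about 1.633

b = target_sd / s_x              # the desired spread fixes the multiplier
a = target_mean - b * x_bar      # the desired mean then fixes the additive constant

def check(b, a):
    """Apply y = b*x + a to every score and return (mean, standard deviation)."""
    y = [b * x + a for x in scores]
    return statistics.mean(y), statistics.stdev(y)

exact_mean, exact_sd = check(b, a)
rounded_mean, rounded_sd = check(1.225, -13)  # rounded constants from the text

print(f"unrounded: b = {b:.4f}, a = {a:.4f} -> mean {exact_mean:.3f}, sd {exact_sd:.3f}")
print(f"rounded:   b = 1.225, a = -13 -> mean {rounded_mean:.3f}, sd {rounded_sd:.3f}")
```

Without rounding, b is about 1.2247 and a is about -12.98. Rounding b to 1.225 before solving for a gives the value -13 used in the text, and both versions give the desired mean and standard deviation to three decimal places.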

 

Recall it only makes sense to use the standard deviation as a measure of spread if the mean is a good measure of center (since the standard deviation measures spread around the mean). In some cases (say, with skewed distributions), it’s better to use the median and the interquartile range (IQR) as measures of center and spread. We can also find appropriate linear transformations based on desired changes in the median and the IQR.

 

Suppose the professor wants a median score of 82 points and an IQR of 1 point. The median and IQR of the original data are 80 points and 81 – 79 = 2 points, respectively. As before, write the transformation as $y = bx + a$. Since the IQR is only affected by multiplication, the change we want in the IQR defines what b is: $b = \text{IQR}_y / \text{IQR}_x = 1/2 = 0.5$. Now we can determine a, based on the change we want in the median (since the median is affected by both additive and multiplicative constants): $a = \text{median}_y - b \cdot \text{median}_x = 82 - 0.5(80) = 82 - 40 = 42$. Therefore, we must multiply each score by 0.5 and then add 42 to each score. We can represent this transformation as $y = 0.5x + 42$. (Verify that this transformation gives the median and IQR that we desire.)
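Both examples in this section use the same two steps: the spread determines b, then the center determines a. The sketch below wraps those steps in a small function and applies it to the median and IQR. The function name linear_transformation and its parameter names are hypothetical, and the quartiles again come from Python's statistics module with its default method.

```python
import statistics

def linear_transformation(center, spread, target_center, target_spread):
    """Return (b, a) so that y = b*x + a has the target center and spread.

    Assumes spread > 0 and target_spread > 0, so that b > 0.
    """
    b = target_spread / spread     # spread is affected only by multiplication
    a = target_center - b * center # center is affected by both constants
    return b, a

scores = [80, 80, 80, 80, 82, 82, 78, 78, 79, 79, 79, 81, 81, 81, 77, 83]
q1, median, q3 = statistics.quantiles(scores, n=4)

b, a = linear_transformation(median, q3 - q1, target_center=82, target_spread=1)

# Verify: transform every score and recompute the median and IQR.
new_scores = [b * x + a for x in scores]
new_q1, new_median, new_q3 = statistics.quantiles(new_scores, n=4)
print(f"y = {b}x + {a}: median {new_median}, IQR {new_q3 - new_q1}")
```

Passing the mean and standard deviation instead of the median and IQR would give the transformation for the earlier example.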