The linear regression problem is the problem of finding the equation of the line y = a x + b that does the best job of matching a set of (x,y) data points.
For example, the picture below shows a set of data points and the regression line that is the best linear fit to those data points.
[Figure: a set of data points with the best-fit regression line drawn through them]
The following process is used to compute the equation y = a x + b of the regression line for a set of data points with x values x1, x2, ... , xn and y values y1, y2, ... , yn.
Compute the average, x̄, of the x values and the average, ȳ, of the y values.
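As a rough illustration of where these averages lead, the sketch below assumes the standard least-squares formulas: the slope a is the sum of (xi - x̄)(yi - ȳ) divided by the sum of (xi - x̄)², and the intercept is b = ȳ - a x̄. The class name LeastSquaresSketch and the method fit are hypothetical names chosen for this example. The assignment itself asks you to organize the computation into separate methods, so treat this only as a way to check your understanding of the arithmetic.

```java
public class LeastSquaresSketch {
    // Hypothetical helper (not one of the assignment's required methods).
    // Returns {a, b} for the line y = a x + b, assuming the standard
    // least-squares formulas.
    public static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("x and y must be non-empty and the same length");
        }
        int n = x.length;

        // Averages of the x values and of the y values.
        double xSum = 0.0, ySum = 0.0;
        for (int i = 0; i < n; i++) {
            xSum += x[i];
            ySum += y[i];
        }
        double xBar = xSum / n;
        double yBar = ySum / n;

        // Sums of products of the deviations from the averages.
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xBar;
            double dy = y[i] - yBar;
            sxy += dx * dy;
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are equal, so the slope is undefined");
        }

        double a = sxy / sxx;       // slope
        double b = yBar - a * xBar; // intercept
        return new double[] { a, b };
    }

    public static void main(String[] args) {
        // Four points that lie exactly on y = 2 x + 1.
        double[] line = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("y = " + line[0] + " x + " + line[1]);
    }
}
```

For the four sample points in main, which lie exactly on y = 2 x + 1, the sketch recovers a slope of 2.0 and an intercept of 1.0.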





Write a program that reads a set of x data values from a file named "xData.txt" and a set of y data values from a file named "yData.txt" and uses those data values to compute and print the equation of the regression line that best fits the data points.
To read the data from the data files you will need the following two methods.
/* Open the file with the given name and count how many
   numbers it contains. */
public static int countNumbersInFile(String fileName) {
    int count = 0;
    try {
        File inFile = new File(fileName);
        Scanner input = new Scanner(inFile);
        while (input.hasNextDouble()) {
            input.nextDouble(); // read and discard; only the count matters
            count++;
        }
        input.close();
    }
    catch (Exception ex) {
        System.out.println("Unable to open the file " + fileName + ".");
    }
    return count;
}
/* Fill the array A with numbers read from the file with the
   given fileName. The array A should be sized to match the
   number of doubles in the file. */
public static void readNumbersFromFile(double[] A, String fileName) {
    try {
        File inFile = new File(fileName);
        Scanner input = new Scanner(inFile);
        for (int n = 0; n < A.length; n++) {
            A[n] = input.nextDouble();
        }
        input.close();
    }
    catch (Exception ex) {
        System.out.println("Unable to open the file " + fileName + ".");
    }
}
To use these methods you will also need to place the import statements
import java.util.Scanner;
import java.io.File;
at the top of your source code file.
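The two methods are meant to be used together: count the numbers first, allocate an array of that size, then fill it. The sketch below shows one way that might look. The class name ReadDataSketch and the helper loadData are hypothetical, and the two reading methods are repeated in condensed form only so that the example compiles on its own.

```java
import java.io.File;
import java.util.Scanner;

public class ReadDataSketch {
    // Condensed copy of countNumbersInFile; returns 0 if the file cannot be read.
    public static int countNumbersInFile(String fileName) {
        int count = 0;
        try (Scanner input = new Scanner(new File(fileName))) {
            while (input.hasNextDouble()) {
                input.nextDouble();
                count++;
            }
        } catch (Exception ex) {
            System.out.println("Unable to open the file " + fileName + ".");
        }
        return count;
    }

    // Condensed copy of readNumbersFromFile; fills A from the file.
    public static void readNumbersFromFile(double[] A, String fileName) {
        try (Scanner input = new Scanner(new File(fileName))) {
            for (int n = 0; n < A.length; n++) {
                A[n] = input.nextDouble();
            }
        } catch (Exception ex) {
            System.out.println("Unable to open the file " + fileName + ".");
        }
    }

    // Hypothetical helper: size the array from the count, then fill it.
    public static double[] loadData(String fileName) {
        double[] data = new double[countNumbersInFile(fileName)];
        readNumbersFromFile(data, fileName);
        return data;
    }

    public static void main(String[] args) {
        double[] x = loadData("xData.txt");
        double[] y = loadData("yData.txt");
        if (x.length != y.length) {
            System.out.println("The two files hold different numbers of values.");
            return;
        }
        System.out.println("Read " + x.length + " data points.");
    }
}
```

Checking that the two arrays have the same length before going further is a design choice of this sketch, not a stated requirement of the assignment.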
In addition to these methods, you will also need to write the code for the following methods.
// Compute the average of the numbers in the array A
public static double average(double[] A)

// Return the array that results when you subtract the number v
// from every number in A
public static double[] subtract(double[] A,double v)

// Compute the dot product of the numbers in arrays A and B
public static double dotProduct(double[] A,double[] B)

// Compute and print the equation of the regression line for the
// data points in the arrays x and y
public static void printRegression(double[] x,double[] y)
Here is some data you can use to test your program: x-data and y-data. For the data points stored in these two files your program should compute these values:




The equation of the regression line for this data is
y = 0.124862 x - 41.4304
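The sample equation shows six significant digits and writes a negative intercept as a subtraction. One possible way to print an equation in that style is sketched below. The class EquationFormatSketch and the method formatLine are hypothetical, and the choice of six significant digits is an assumption inferred from the sample output rather than a stated requirement.

```java
import java.util.Locale;

public class EquationFormatSketch {
    // Hypothetical helper: renders y = a x + b with six significant digits,
    // writing a negative intercept as "- |b|" rather than "+ -|b|".
    public static String formatLine(double a, double b) {
        String sign = (b < 0) ? "-" : "+";
        // Locale.ROOT keeps the decimal separator a period on every machine.
        return String.format(Locale.ROOT, "y = %.6g x %s %.6g", a, sign, Math.abs(b));
    }

    public static void main(String[] args) {
        System.out.println(formatLine(0.124862, -41.4304));
    }
}
```

With the sample values the sketch prints y = 0.124862 x - 41.4304. Note that the %g conversion pads with trailing zeros, so a slope of exactly 2 would appear as 2.00000.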