Linear Regression (Coding)


For all coding interview questions, you should ask clarifying questions such as:

  • What would be the input to my function?

  • Should I handle erroneous inputs?

  • Do I need to think about the size of the input (can it fit into memory)

  • Can I use external libraries like NumPy? (since for many ML models this would be easier than coding them from scratch)

For this question, the easiest solution is to use the closed-form solution as it is easy to code once the input can fit into memory. First, let us remind ourselves of the equation:


def train_linear_reg(X, y):
=, X)), X.T), y)
return weights

If we assume that X is a  (r x c) matrix and y is a (r x 1) matrix:

Time Complexity: 

  1. The transpose of will take O(r ⋅ c) time [result is a (c x r) matrix]

  2. The multiplication of this transposed matrix with X will take O(r ⋅ c²) [result is a (c x c) matrix]

  3. Matrix inversion would take O(c³)  [result is a (c x c) matrix]

  4. X transposed multiplied by y will take O(r ⋅ c) time [result is a (c x 1) matrix]

  5. The final multiplication (c x c) ⋅ (c x 1) will take O(c²) [result is a (c x 1) matrix]

  6. Thus the time complexity is O(r⋅c + r⋅c² + c³ + r⋅c + c²) we can reduce to O(r⋅c² + c³)


Space Complexity: transpose X will have a space complexity of (c ⋅ r)(r ⋅ c) which breaks down to O(c²) and X has a space complexity of O(r ⋅ c). Thus, the final space complexity is O(r ⋅ c + c²).

Follow Up 1:

More than likely the interviewer would like you to end up coding the gradient descent approach. So in this follow up question we would code it. Note: if the candidate first tried to code the gradient descent approach the interviewer might ask if they can code the closed form equation.

def train_linear_reg_gd(X, y, learning_rate, max_epochs):
        num_datapoints, num
_features = X.shape

        # set weights and bias to 0
        weights = np.zeros(shape=(num_features, 1))
= 0

        for i in range(max_epochs):
            # Calculate simple linear combination y = mx + c or y = X * w + b
            y_predict =, weights) + bias # O(r*c)

            # Use mean squared error to calculate the loss and then
            # get the average over all datapoints to get the cost

            cost = (1 / num_datapoints) * np.sum((y_predict - y)**2# O(r)


            # Calculate gradients
            # 1st - gradient with respect to weights

            grad_weights = (1 / num_datapoints) *, (y_predict - y)) # O(c⋅r)
            # 2st - gradient with respect to bias
            grad_bias = (1 / num_datapoints) * np.sum((y_predict - y)) # O(r)
            # update weights and bias
            weights -= learning_rate * grad_weights
-= learning_rate * grad_bias

        return weights, bias

Time Complexity: Assume e is the maximum epochs. The largest inner for loop time complexity is (r⋅c) and we do this e times so the time complexity is O(e⋅r⋅c)


Space Complexity:  We created weights which have a space complexity of O(c⋅1). Then we produce y_predict which has a space complexity of O(r⋅1) and grad_weights with O(c⋅1). Thus, the final space complexity is O(2c + r).