##### Linear Regression (Coding)

For all coding interview questions, you should ask clarifying questions such as:

• What would be the input to my function?

• Should I handle erroneous inputs?

• Do I need to think about the size of the input (i.e., can it fit into memory)?

• Can I use external libraries like NumPy? (since for many ML models this would be easier than coding them from scratch)

For this question, the simplest approach is the closed-form solution, which is easy to code as long as the input fits into memory. First, let us remind ourselves of the equation:

w = (XᵀX)⁻¹Xᵀy

```python
import numpy as np

def train_linear_reg(X, y):
    # Closed-form (normal equation) solution: w = (XᵀX)⁻¹ Xᵀ y
    weights = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)
    return weights
```
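As a quick sanity check (this example is not part of the original solution; the synthetic data and values are purely illustrative), the closed-form weights can be compared against NumPy's built-in least-squares solver:

```python
# Hypothetical synthetic data: r = 100 datapoints, c = 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
y = np.dot(X, true_w) + rng.normal(scale=0.01, size=(100, 1))

w_closed = train_linear_reg(X, y)                  # closed-form solution above
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # NumPy's least-squares solver

print(np.allclose(w_closed, w_lstsq, atol=1e-6))   # expected: True
```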

If we assume that X is an (r x c) matrix (r datapoints, c features) and y is an (r x 1) matrix:

Time Complexity:

1. The transpose of X will take O(r ⋅ c) time [result is a (c x r) matrix]

2. The multiplication of this transposed matrix with X will take O(r ⋅ c²) [result is a (c x c) matrix]

3. Matrix inversion would take O(c³)  [result is a (c x c) matrix]

4. X transposed multiplied by y will take O(r ⋅ c) time [result is a (c x 1) matrix]

5. The final multiplication (c x c) ⋅ (c x 1) will take O(c²) [result is a (c x 1) matrix]

6. Thus the time complexity is O(r⋅c + r⋅c² + c³ + r⋅c + c²), which we can reduce to O(r⋅c² + c³)
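To make the intermediate shapes listed above concrete, here is a small sketch (the sizes r = 5 and c = 3 and the variable names are illustrative assumptions, not from the original):

```python
r, c = 5, 3
X_demo = np.random.rand(r, c)
y_demo = np.random.rand(r, 1)

print(X_demo.T.shape)                                 # (c, r) -> (3, 5)
print(np.dot(X_demo.T, X_demo).shape)                 # (c, c) -> (3, 3)
print(np.linalg.inv(np.dot(X_demo.T, X_demo)).shape)  # (c, c) -> (3, 3)
print(np.dot(X_demo.T, y_demo).shape)                 # (c, 1) -> (3, 1)
print(train_linear_reg(X_demo, y_demo).shape)         # (c, 1) -> (3, 1)
```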

Space Complexity: the transpose Xᵀ is a (c x r) matrix, so it takes O(r ⋅ c) space; multiplying the (c x r) and (r x c) matrices gives a (c x c) result, which takes O(c²) space (as do its inverse and the other (c x c) intermediates), and X itself has a space complexity of O(r ⋅ c). Thus, the final space complexity is O(r ⋅ c + c²).

More than likely the interviewer would like you to end up coding the gradient descent approach, so in this follow-up question we code it. Note: if the candidate first tried to code the gradient descent approach, the interviewer might ask whether they can code the closed-form equation.
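As a brief reminder of what the code below implements (the notation here is my own, not from the original): with cost $J = \frac{1}{r}\sum_{i=1}^{r}(\hat{y}_i - y_i)^2$, each epoch moves the parameters a small step against the gradient,

$$w \leftarrow w - \eta\,\frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial J}{\partial b},$$

where $\eta$ is the learning rate (constant factors from differentiating the square are absorbed into the learning rate in the implementation).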

```python
def train_linear_reg_gd(X, y, learning_rate, max_epochs):
    num_datapoints, num_features = X.shape

    # set weights and bias to 0
    weights = np.zeros(shape=(num_features, 1))
    bias = 0

    for i in range(max_epochs):
        # Calculate simple linear combination y = mx + c or y = X * w + b
        y_predict = np.dot(X, weights) + bias  # O(r*c)

        # Use mean squared error to calculate the loss and then
        # get the average over all datapoints to get the cost
        cost = (1 / num_datapoints) * np.sum((y_predict - y)**2)  # O(r)
        print(cost)

        # 1st - gradient with respect to weights
        grad_weights = (1 / num_datapoints) * np.dot(X.T, (y_predict - y))  # O(c⋅r)
        # 2nd - gradient with respect to bias
        grad_bias = (1 / num_datapoints) * np.sum((y_predict - y))  # O(r)

        # update weights and bias
        weights = weights - learning_rate * grad_weights
        bias = bias - learning_rate * grad_bias

    return weights, bias
```
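As an illustrative usage (the hyperparameter values here are assumptions, not prescribed above), reusing the synthetic X and y from the earlier sanity check:

```python
# Note: the function prints the cost once per epoch as written
w_gd, b_gd = train_linear_reg_gd(X, y, learning_rate=0.1, max_epochs=1000)
print(w_gd, b_gd)  # weights should approach the closed-form solution; bias ≈ 0 for this data
```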