Simply answering the question you are asked in an ML interview at a top company will not, on its own, get you the offer.
Suppose you get a call from a recruiter at your dream company, where you have applied for the ML Engineer role. You set a date and start preparing with an ML study guide like this one. On the day of the interview, you are able to answer all the questions and are confident that you will move on to the onsite stage. However, you get a call from the recruiter saying that they have decided not to go forward. How can this be?
It is not enough to answer the question, because the interviewer wants to see that you understand the topic deeply. I like to think of it as similar to software engineering coding interviews: if you are asked to search for an element in a sorted list, a correct but inefficient answer is to scan the list linearly, whereas a much better answer is to use binary search. Similarly, if you are asked whether logistic or linear regression is better for a classification task, a correct but poor answer is simply to say logistic regression. A better answer adds the reasons why a classification task violates the assumptions of linear regression. This may seem like overkill, but when you are interviewing at top companies you can be sure that every candidate who makes it onsite knows that logistic regression is used for classification. Not many will go further and explain why.
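To make the "explain why" part concrete, here is a minimal sketch of one commonly cited reason: a straight line fitted to 0/1 labels is unbounded, so its outputs cannot be read as probabilities. The toy data and the helper `fit_line` are assumptions made up for illustration, not part of any real interview question.

```python
# Toy data (hypothetical): one feature, binary labels (0 = negative, 1 = positive).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]

# Collect every prediction that falls outside the valid probability range [0, 1].
out_of_range = [round(p, 2) for p in predictions if p < 0 or p > 1]
print("Predictions outside [0, 1]:", out_of_range)
```

Logistic regression avoids this by passing the linear score through a sigmoid, which keeps every output strictly between 0 and 1. Mentioning a point like this is the kind of expansion that separates the two answers.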
For theoretical ML questions, you will typically be asked a main question, and the interviewer will have 1–4 follow-up questions based on it. As discussed above, it is not enough to simply state the answer; you should expand on it. Expanding on your answer can take many forms, such as stating the pros and cons, discussing alternative models or algorithms, writing an equation, or even suggesting how to productionize your model. These additional responses are what take you from an average candidate to a “strong hire” candidate.
Let us see a couple of examples below to better understand how we should approach answering ML Interview questions.
Example Of A Linear Regression Question And Answers
Question: Suppose you are the ML expert among a team of healthcare professionals. The project they are working on is to determine the life expectancy of patients. This would help them in downstream tasks. One of these healthcare professionals suggests that you can use Linear Regression. Is this approach appropriate here?
Candidate 1: Yes, Linear Regression can be used since the output is a real number (life expectancy of patients).
Candidate 2: Linear Regression would be appropriate since we are predicting a continuous value. To confirm that it is really appropriate, the data should satisfy these 4 assumptions:
1. Linearity: the relationship between the independent variables and the dependent variable is linear.
2. Homoscedasticity: the residuals (errors) have constant variance.
3. Independence: the observations are independent of each other, and the independent variables are not highly correlated with one another.
4. Normality: for any fixed value of the independent variables, the dependent variable is normally distributed.
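If the interviewer asks how you would check these assumptions in practice, a rough residual-based sketch like the one below can help. It assumes NumPy is available and uses synthetic single-feature data invented for illustration; the four summary numbers are informal diagnostics, not formal statistical tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not real patient data):
# a linear signal plus independent Gaussian noise with constant variance.
x = np.linspace(0.0, 10.0, 200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=x.size)

# Fit y = slope * x + intercept by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Linearity: a quadratic fitted to the residuals should have almost no curvature.
curvature = np.polyfit(x, residuals, 2)[0]

# Homoscedasticity: residual variance should be similar for low and high x.
half = x.size // 2
variance_ratio = residuals[half:].var(ddof=1) / residuals[:half].var(ddof=1)

# Independence of errors: the Durbin-Watson statistic is close to 2
# when consecutive residuals are uncorrelated.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Normality: residuals from Gaussian noise should have skewness near 0.
standardized = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(standardized ** 3)

print(f"curvature of residuals: {curvature:.4f}")
print(f"variance ratio (high x / low x): {variance_ratio:.2f}")
print(f"Durbin-Watson: {durbin_watson:.2f}")
print(f"skewness of residuals: {skewness:.2f}")
```

On real data you would usually also plot the residuals against the fitted values, since a plot often reveals patterns that a single summary number hides.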
One thing to keep in mind is that most companies rate candidates on a scale such as 1–5. To get hired, you want the interviewer to see you as a strong candidate (a rating of 4 or 5), and to do this you need to answer questions thoroughly rather than at the surface level. This is a skill you need to learn. In the above example, candidate 1 would probably get a rating of 2 or 3, while candidate 2 would get a 5. Both candidates answered the question, but the second showed a much stronger understanding of Linear Regression. Let us take a look at a follow-up to the above question:
Question: Suppose the features you get access to are the year a person was born, BMI, the country at birth, units of alcohol consumed per week and nationality at birth. There is a problem with these features. What is it?
Candidate 1: The problem is that there exists collinearity between country at birth and nationality at birth. This means that they are highly correlated.
Candidate 2: The problem is that there exists collinearity between country at birth and nationality at birth. This means that they are highly correlated: country at birth can predict nationality at birth and vice versa. This becomes a problem because we lose interpretability, as we cannot distinguish between the individual effects of the two collinear variables, and because it violates the assumption that the independent variables are not highly correlated with one another (listed above under independence). We can identify collinearity by using the Variance Inflation Factor (VIF). VIF gives a score to each independent variable, and this score indicates how well that variable is explained by the other independent variables. A score above 5 is usually considered to indicate collinearity. We can generally solve collinearity by either removing one of the features or linearly combining both features. Since both are categorical variables, removing country at birth or nationality at birth would be the best option.
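The VIF that candidate 2 mentions can be sketched in a few lines. The version below assumes NumPy and computes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. The function name `variance_inflation_factors` and the data are hypothetical: the two categorical features are represented by made-up 0/1 indicators for a single country that agree on 98% of rows.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are samples, columns are features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept.
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


rng = np.random.default_rng(0)
n = 200

# Hypothetical data: two unrelated numeric features...
bmi = rng.normal(25.0, 4.0, size=n)
alcohol_units = rng.normal(8.0, 3.0, size=n)
# ...and two indicators for "country A" that differ in only 4 of 200 rows.
born_in_a = np.array([1.0] * 100 + [0.0] * 100)
national_of_a = born_in_a.copy()
flipped = [0, 1, 100, 101]
national_of_a[flipped] = 1.0 - national_of_a[flipped]

X = np.column_stack([bmi, alcohol_units, born_in_a, national_of_a])
vifs = variance_inflation_factors(X)

names = ["bmi", "alcohol_units", "born_in_a", "national_of_a"]
for name, vif in zip(names, vifs):
    flag = "collinear" if vif > 5 else "ok"
    print(f"{name}: VIF = {vif:.2f} ({flag})")
```

In this sketch only the two country indicators exceed the threshold of 5, which matches the remedy in the answer: drop one of the pair. In practice, statsmodels provides a ready-made `variance_inflation_factor` function for the same calculation.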
When you are asked what the problem could be, think about answering in two parts:
1. Identify the issue.
2. Provide ways to solve the problem (this second part shows that you are a strong candidate).
Again, candidate 1 answered the question but did not show expert knowledge of the topic, and would probably get a rating of 2 or 3. Candidate 2 showed mastery of the topic (collinearity) and would get a 5. By answering questions in this way, you show the interviewer that you are a strong candidate, and you will rank higher than other candidates interviewing for the role.