Recent quant/QR/QD interview questions

To help you guys prepare, I’ve collected some recent quant interview questions people have been asked at various companies:

  1. You’re training a multiple linear regression model. To tune it, you resample the training data with added Gaussian noise to help regularize your parameters. Assuming your initial training set is large enough, how does adding the noise affect your parameters’ confidence intervals? (DRW)
  2. A post office has 3 employees helping people ship packages. Suppose the time between people arriving at the 1st, 2nd, and 3rd employees’ desks is random and follows an exponential distribution with lambda = 3, 4, and 8 minutes, respectively. The 1st, 2nd, and 3rd employees have probabilities of 1/4, 1/6, and 1/9 of referring a customer to the supervisor. Let M/N = average time in minutes between customers referred to the supervisor, where the greatest common divisor gcd(M, N) = 1. M + N = ? (DRW)
  3. 3% of a country’s population has a disease. The NIH has developed a test for the disease: the test has a 98% “true positive” rate (the probability a person will test positive given they have the disease). It also has a 4% “false positive” rate (the probability a person will test positive given that they don’t have the disease). If you simultaneously take the test twice and it comes out with two positive results, what is the probability you have the disease, assuming the two tests are independent? (DRW)
  4. You have imprisoned 10 pirates and you randomly put a red or blue hat on each pirate. They can see other people’s hats but not their own. You make each pirate guess what color hat he’s wearing. If everyone gets it right, they’re free to go but if at least one person gets it wrong, they’re sentenced to die. How should the pirates guess to maximise the probability of getting released? (Two Sigma)
  5. Joint probability: f_{X,Y}(x, y) = \frac{3}{2} x^2 + 2xy, 0 <= x <= 1, 0 <= y <= 1. P(Y <= X) = ? (Akuna)
  6. A and B are two 7x7 matrices with rank(A) = 4 and rank(B) = 5. What’s the smallest value rank(AB) can be? (Akuna)
  7. E(e^{tX}) = \frac{1}{1-t^2}. Var(X) = ? (Akuna)
  8. What to do if a model works very well on training but poorly on testing? (Citadel)
  9. If an additional feature gets added to the data, how will PCA results change? (Citadel)
  10. For PCA, what’s the difference between doing eigenvalue decomposition on the correlation matrix vs the covariance matrix? How does eigenvalue decomposition help you get the principal components? (Goldman Sachs)

This is awesome, thanks so much! I need any extra practice I can get

Man these are hard

A few pointers for the brainteasers and probability questions here:
(maybe we should think about a feature that lets the reader hide such comments in the future)

for 2: a probability question. Condition the time on both which employee referred the last customer to the supervisor and which one is referring the current customer; you will get \sum_{i,j \in \{1, 2, 3\}} \lambda_j P(R_i) P(R_j) (there are 9 terms in the sum, and we should enable LaTeX here at some point :slight_smile:), with P(R_i) being the probability of being referred to the supervisor by employee i. I’ll let you do the calculation. Interestingly, you can actually get every moment using this conditioning; there is nothing special about the first one.
for 3: looks like a routine application of Bayes’ theorem. Do you want me to go into the details?
for 4: one of the classics. The first pirate needs to encode what he sees: he says red if he sees an even number of red hats and blue if he sees an odd number. This enables the other 9 to answer correctly, while he gets his own color right only 50% of the time. Overall, the probability of getting released with such a strategy is 50%. Please let me know if you think you can do better.
for 5: pure computation. An easy way to do it is to work out the marginal of X and the conditional distribution of Y given X (well, actually its CDF), which is relatively straightforward given the joint distribution.
for 6: I am thinking rank(A) + rank(B) - n = 4 + 5 - 7 = 2. I don’t know how to prove it yet. Update: I can prove it now :slight_smile:
for 7: differentiate the MGF once for the first moment and twice for the second, evaluating at t = 0 each time, and there you get your variance.
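
For anyone who wants to check their numbers on 2, 3 and 4, here is a minimal Python sketch using exact fractions. It rests on assumptions that are not in the original questions: for 2, it reads 3, 4 and 8 as mean inter-arrival times in minutes (so the rates are 1/3, 1/4 and 1/8 per minute) and it uses Poisson thinning and superposition, which is a different route from the conditioning described above; for 4, it brute-forces a simultaneous-guess variant in which every pirate assumes the total number of red hats is even. All variable and function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Question 2. ASSUMPTION: 3, 4, 8 are MEAN gaps in minutes (rates 1/3, 1/4, 1/8).
mean_gaps = [Fraction(3), Fraction(4), Fraction(8)]
refer_probs = [Fraction(1, 4), Fraction(1, 6), Fraction(1, 9)]

# Thinning: desk i produces referrals at rate p_i / mean_gap_i.
# Superposition: the merged referral stream has the sum of those rates.
referral_rate = sum(p / m for p, m in zip(refer_probs, mean_gaps))
mean_gap_between_referrals = 1 / referral_rate
m_plus_n = (mean_gap_between_referrals.numerator
            + mean_gap_between_referrals.denominator)

# Question 3. Bayes' theorem with two conditionally independent positive tests.
prior = Fraction(3, 100)   # P(disease)
tpr = Fraction(98, 100)    # P(positive | disease)
fpr = Fraction(4, 100)     # P(positive | no disease)
posterior = prior * tpr**2 / (prior * tpr**2 + (1 - prior) * fpr**2)


# Question 4. ASSUMPTION: everyone guesses simultaneously, and each pirate
# guesses the color that would make the total number of red hats even.
def all_correct_count(n):
    """Count hat assignments (1 = red, 0 = blue) where every pirate is right."""
    wins = 0
    for hats in product((0, 1), repeat=n):
        total_red = sum(hats)
        # Pirate i sees total_red - hats[i] red hats on the others.
        guesses = [(total_red - h) % 2 for h in hats]
        wins += all(g == h for g, h in zip(guesses, hats))
    return wins


release_probability = Fraction(all_correct_count(10), 2**10)

print(mean_gap_between_referrals, m_plus_n)  # 36/5 41 (under the stated assumption)
print(posterior)                             # 7203/7591, about 0.949
print(release_probability)                   # 1/2
```

If lambda in question 2 is meant as a rate rather than a mean, the same two lines of thinning and superposition still apply, just with `p * rate` in place of `p / m`.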

The other questions seem more related to pure statistics knowledge, which I have to refresh. Let me know if you want me to look into them. Hope that helps :slight_smile:
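
For 5 and 7, the computations hinted at above can be written out as follows. This is only a worked sketch of the direct route (integrating the joint density over the region y <= x, and differentiating the MGF at t = 0), with M(t) used here as shorthand for E(e^{tX}):

```latex
% Question 5: integrate the joint density over the region 0 <= y <= x <= 1
P(Y \le X) = \int_0^1 \int_0^x \left( \tfrac{3}{2} x^2 + 2xy \right) dy \, dx
           = \int_0^1 \left( \tfrac{3}{2} x^3 + x^3 \right) dx
           = \int_0^1 \tfrac{5}{2} x^3 \, dx
           = \tfrac{5}{8}

% Question 7: moments from the MGF M(t) = 1 / (1 - t^2)
M'(t)  = \frac{2t}{(1 - t^2)^2}, \qquad M'(0) = E(X) = 0
M''(t) = \frac{2}{(1 - t^2)^2} + \frac{8 t^2}{(1 - t^2)^3}, \qquad M''(0) = E(X^2) = 2
\mathrm{Var}(X) = E(X^2) - E(X)^2 = 2 - 0 = 2
```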


Thanks for answering these! I’m actually preparing for some PM and DS roles and would really appreciate the extra stats help. Any advice on where I should look to get answers to stats questions like the ones @financeGuru posted?

I highly recommend An Introduction to Statistical Learning by James et al. and The Elements of Statistical Learning by Hastie et al. They both provide really good conceptual explanations and enough math for DS/PM interviews.

I second that. As luck would have it, the second edition of An Introduction to Statistical Learning came out about a dozen days ago. Both books are freely available (PDF versions) on their official websites. Check out the Unsupervised Learning chapter to learn more about PCA (section 12.2 of ISLR), which seems to come up quite frequently, and also have a look at section 3.3.3 of ISLR; you will thank me later.
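
Since PCA keeps coming up, here is a small, hedged Python sketch of the covariance vs correlation point from question 10, using only the standard library. It assumes a toy two-variable case with made-up numbers (variances 100 and 1, correlation 0.5), and the helper `first_pc_angle_deg` is a hypothetical name; it relies on the closed-form principal-axis angle of a symmetric 2x2 matrix rather than a general eigendecomposition.

```python
import math


def first_pc_angle_deg(var_x, var_y, cov_xy):
    """Angle in degrees (from the x-axis) of the leading eigenvector of the
    symmetric 2x2 matrix [[var_x, cov_xy], [cov_xy, var_y]]."""
    return math.degrees(0.5 * math.atan2(2 * cov_xy, var_x - var_y))


# Made-up toy numbers: x has a much larger scale than y.
var_x, var_y, corr = 100.0, 1.0, 0.5
cov_xy = corr * math.sqrt(var_x * var_y)

# PCA on the covariance matrix: the first component hugs the high-variance axis.
cov_angle = first_pc_angle_deg(var_x, var_y, cov_xy)

# PCA on the correlation matrix: both variables are rescaled to unit variance,
# so the first component sits on the diagonal.
corr_angle = first_pc_angle_deg(1.0, 1.0, corr)

print(f"covariance PC1 angle:  {cov_angle:.2f} degrees")
print(f"correlation PC1 angle: {corr_angle:.2f} degrees")
```

The takeaway is that the covariance version is dominated by whichever feature has the largest scale, while the correlation version is equivalent to standardizing the features first.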