Beginner Machine Learning Interview Questions with Answers
Basics of ML
1. What is Machine Learning?
Machine Learning is a subset of AI that enables systems to automatically learn and improve from experience (data) without being explicitly programmed.
2. What are the different types of Machine Learning?
- Supervised Learning → Uses labeled data (e.g., regression, classification).
- Unsupervised Learning → Works on unlabeled data (e.g., clustering, dimensionality reduction).
- Reinforcement Learning → Learns via rewards and penalties through trial and error.
3. Difference between AI, ML, and Deep Learning?
- AI → Broad concept of machines simulating human intelligence.
- ML → Subset of AI; learning patterns from data.
- Deep Learning → Subset of ML; uses neural networks with many layers.
4. What is Overfitting?
When a model learns the noise in the training data, so it performs well on the training set but poorly on unseen data.
5. How do you prevent Overfitting?
- Cross-validation
- Regularization (L1, L2)
- Dropout (in neural nets)
- Early stopping
- More training data
6. What is Underfitting?
When a model is too simple to capture patterns in the data, performing poorly on both training and test sets.
7. Difference between Bias and Variance?
- Bias → Error due to overly simplistic assumptions.
- Variance → Error due to sensitivity to small fluctuations in the training data.
8. What is Bias-Variance Tradeoff?
Balancing bias (underfitting) and variance (overfitting) to achieve the best generalization.
9. What are Training, Validation, and Test datasets?
- Training → Used to train the model.
- Validation → Used for hyperparameter tuning.
- Test → Used to evaluate final model performance.
10. What is Cross-Validation?
A technique where the dataset is split into k folds; the model is trained on k-1 folds and tested on the remaining fold. The process repeats k times, and the scores are averaged for a more reliable estimate of generalization performance.
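As a sketch, k-fold splitting can be written in a few lines of plain Python. The `model_score` callback here is a hypothetical stand-in for "train on the train folds, score on the held-out fold":

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (nearly) equal folds."""
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cross_validate(data, k, model_score):
    """Average the score over k train/test splits."""
    scores = []
    folds = k_fold_indices(len(data), k)
    for i, test_idx in enumerate(folds):
        # train on all folds except fold i
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(model_score(train_idx, test_idx))
    return sum(scores) / len(scores)
```

In practice you would use a library implementation (e.g., scikit-learn's `KFold`), which also handles shuffling and stratification.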
Data & Features
11. What is Normalization vs Standardization?
- Normalization → Scale data to [0, 1].
- Standardization → Scale data to mean 0 and standard deviation 1.
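A minimal NumPy sketch of the two scalings:

```python
import numpy as np

def normalize(x):
    """Min-max normalization: rescale values to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Z-score standardization: mean 0, standard deviation 1."""
    return (x - x.mean()) / x.std()
```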
12. What is Regularization?
A technique to reduce overfitting by adding a penalty term to the loss function.
13. Difference between L1 and L2 Regularization?
- L1 (Lasso) → Adds absolute values of coefficients → performs feature selection.
- L2 (Ridge) → Adds squared values of coefficients → shrinks coefficients but keeps all of them.
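The difference shows up in the shrinkage step each penalty induces on a coefficient. This is a simplified illustration of a single proximal step, not a full solver, but it makes clear why L1 can zero coefficients out (feature selection) while L2 only scales them down:

```python
import numpy as np

def ridge_shrink(w, lam):
    """Proximal step for the L2 penalty: multiplicative shrinkage toward zero."""
    return w / (1.0 + lam)

def lasso_shrink(w, lam):
    """Soft-thresholding, the proximal step for the L1 penalty:
    small coefficients are set exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
```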
14. What is Curse of Dimensionality?
As dimensions (features) increase, data becomes sparse, making distance-based algorithms (like k-NN) ineffective.
15. What is Dimensionality Reduction?
Reducing features while preserving important information. Example: PCA, t-SNE.
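As a minimal sketch, PCA can be computed from the SVD of the centered data (illustrative, not optimized):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top n_components principal directions
    (the orthogonal directions of maximum variance)."""
    Xc = X - X.mean(axis=0)                  # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # scores in the reduced space
```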
16. What is a Confusion Matrix?
A table that describes classification performance:
- TP (True Positive)
- TN (True Negative)
- FP (False Positive)
- FN (False Negative)
17. What is Precision, Recall, and F1 Score?
- Precision = TP / (TP + FP) → How many predicted positives are correct.
- Recall = TP / (TP + FN) → How many actual positives are correctly predicted.
- F1 Score = Harmonic mean of Precision and Recall.
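These formulas translate directly into code:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```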
Intermediate Machine Learning Interview Questions with Answers
Algorithms & Models
18. Explain Linear Regression.
A supervised algorithm that models the relationship between the dependent variable (Y) and independent variables (X) using a straight line.
19. What assumptions does Linear Regression make?
- Linearity
- Independence of errors
- Homoscedasticity (equal variance)
- Normal distribution of errors
- No multicollinearity
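A closed-form least-squares fit of such a line, sketched with NumPy:

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (closed form via lstsq)."""
    A = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b
```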
20. What is Logistic Regression?
A classification algorithm that predicts probabilities using the logistic (sigmoid) function.
21. Difference between Linear and Logistic Regression?
- Linear → continuous output
- Logistic → probability → classification
22. Explain Decision Trees.
A tree-like model of decisions based on features, using splitting criteria (Gini, entropy).
23. What are Entropy and Information Gain?
- Entropy → measure of impurity.
- Information Gain → reduction in entropy after a split.
24. What is Gini Impurity?
The probability of misclassifying a randomly chosen sample if it were labeled according to the class distribution at that node.
25. Explain Random Forest.
An ensemble of decision trees using bagging + feature randomness → reduces overfitting.
26. What is Bagging vs Boosting?
- Bagging → trains multiple weak learners independently, then aggregates their results.
- Boosting → trains learners sequentially, each focusing on the previous ones' errors.
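A toy contrast between the two strategies, using the training mean as a stand-in weak learner (purely illustrative; real ensembles use trees):

```python
import numpy as np

rng = np.random.default_rng(42)

def bagging_predict(y_train, n_models=10):
    """Bagging: average the predictions of models trained independently
    on bootstrap resamples (sampling with replacement)."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), size=len(y_train))
        preds.append(y_train[idx].mean())     # 'model' = mean of the resample
    return float(np.mean(preds))

def boosting_predict(y_train, n_rounds=10, lr=0.5):
    """Boosting: each round fits the residual error left by the
    running prediction, so later learners focus on earlier mistakes."""
    pred = 0.0
    for _ in range(n_rounds):
        residual = y_train - pred
        pred += lr * residual.mean()          # weak learner on the residuals
    return pred
```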
27. Explain Gradient Boosting.
Sequential boosting where each new model corrects the residuals of previous models.
28. What is AdaBoost?
Assigns weights to samples; misclassified points get higher weights in the next iteration.
29. What is XGBoost?
Optimized gradient boosting with regularization, missing-value handling, and parallelism.
30. What is LightGBM?
A gradient boosting framework optimized for speed and large datasets (uses leaf-wise tree growth).
31. Explain SVM.
Classifies by finding the hyperplane that maximizes the margin between classes.
32. What is the Kernel Trick in SVM?
Implicitly transforms data into higher dimensions to make it linearly separable.
33. What are k-Nearest Neighbors (k-NN)?
An instance-based algorithm that classifies a point by the majority vote of its k nearest neighbors.
34. What is Naive Bayes?
A probabilistic classifier based on Bayes’ theorem, assuming feature independence.
35. What is PCA?
A dimensionality reduction technique projecting data onto orthogonal principal components that maximize variance.
36. What is Clustering?
Unsupervised grouping of similar data points. Examples: k-Means, Hierarchical, DBSCAN.
37. Explain k-Means.
Partitions data into k clusters by minimizing within-cluster variance.
38. Limitations of k-Means?
- Requires predefining k
- Sensitive to outliers
- Works poorly with non-spherical clusters
39. What is Hierarchical Clustering?
Builds nested clusters using agglomerative (bottom-up) or divisive (top-down) approaches.
40. What is DBSCAN?
A density-based clustering algorithm, good for arbitrarily shaped clusters and noise handling.
41. What are Recommender Systems?
- Content-based → recommends items similar to past liked items.
- Collaborative filtering → based on user-item interactions.
42. What is a Hidden Markov Model?
A statistical model where states are hidden but outputs are observable, used in sequence modeling (speech, NLP).
Model Evaluation
43. What is ROC curve?
A graph of True Positive Rate vs False Positive Rate at different classification thresholds.
44. What is AUC score?
The area under the ROC curve → measures overall classification performance.
45. What is Log Loss?
A loss function for classification based on probability outputs; it heavily penalizes confident wrong predictions.
46. Precision vs Recall?
- Precision = correctness of positive predictions
- Recall = coverage of actual positives
47. Micro vs Macro vs Weighted F1?
- Micro → global average over all samples
- Macro → unweighted average over classes
- Weighted → average weighted by class support
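Computed from per-class confusion counts, the three averages look like this (illustrative sketch):

```python
def f1(tp, fp, fn):
    """F1 from counts, returning 0.0 for degenerate denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_weighted_f1(per_class):
    """per_class: list of (tp, fp, fn, support) tuples, one per class."""
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro = f1(tp, fp, fn)                       # pool counts globally
    f1s = [f1(c[0], c[1], c[2]) for c in per_class]
    macro = sum(f1s) / len(f1s)                  # every class counts equally
    total = sum(c[3] for c in per_class)
    weighted = sum(s * c[3] for s, c in zip(f1s, per_class)) / total
    return micro, macro, weighted
```

Note how a small class (low support) drags the macro score down but barely moves the micro or weighted scores.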
48. What is Bootstrapping in ML?
Sampling with replacement to create multiple datasets for ensemble learning.
49. Grid Search vs Random Search?
- Grid → exhaustive search over a parameter grid
- Random → random sampling of hyperparameter combinations
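Both strategies can be sketched in plain Python, with `score` standing in for a hypothetical objective to maximize (e.g., validation accuracy):

```python
import itertools
import random

def grid_search(param_grid, score):
    """Exhaustively score every combination; return (best_params, best_score)."""
    keys = list(param_grid)
    best = None
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best

def random_search(param_grid, score, n_iter=5, seed=0):
    """Score only n_iter randomly sampled combinations."""
    rng = random.Random(seed)
    keys = list(param_grid)
    best = None
    for _ in range(n_iter):
        params = {k: rng.choice(param_grid[k]) for k in keys}
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best
```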
50. What is Hyperparameter Tuning?
Optimizing parameters not learned during training (e.g., learning rate, tree depth).
Advanced Machine Learning Interview Questions with Answers
Deep Learning
51. What is a Neural Network?
A network of interconnected layers of neurons that learn feature representations.
52. What are Activation Functions?
Functions that introduce non-linearity. Examples: ReLU, Sigmoid, Tanh.
53. ReLU, Sigmoid, Tanh differences?
- ReLU → fast, reduces the vanishing gradient problem
- Sigmoid → outputs between 0 and 1 (probabilities)
- Tanh → outputs between –1 and 1
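The three activations in NumPy:

```python
import numpy as np

def relu(x):
    """max(0, x): zero for negatives, identity for positives."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes any real input into (-1, 1), centered at zero."""
    return np.tanh(x)
```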
54. What is Vanishing Gradient Problem?
Gradients shrink in deep networks, slowing learning (common with sigmoid/tanh).
55. Exploding Gradient Problem?
Gradients grow uncontrollably → unstable training.
56. What is Dropout?
Regularization by randomly dropping neurons during training.
57. What is Batch Normalization?
Normalizes the inputs of each layer, which speeds up training and improves stability.
58. What is CNN?
Convolutional Neural Network, used in image recognition. Uses convolution + pooling layers.
59. What is Pooling?
Reduces spatial dimensions (e.g., max pooling, average pooling).
60. Explain RNN.
Recurrent Neural Network, used for sequence data; it remembers previous inputs.
61. What are LSTMs and GRUs?
Variants of RNNs designed to handle long-term dependencies via gates.
62. What is Attention Mechanism?
Dynamically focuses on the important parts of input sequences.
63. What is Transformer architecture?
Uses self-attention and parallelization → the basis of GPT, BERT, and other LLMs.
64. What are GANs?
Generative Adversarial Networks: a Generator and a Discriminator compete to create realistic data.
65. What is Reinforcement Learning?
An agent learns by interacting with an environment using rewards/penalties.
66. What is Q-Learning?
A value-based RL algorithm that learns the optimal action-value function.
67. What is Deep Q-Network (DQN)?
Q-learning with deep neural networks to approximate Q-values.
Advanced Topics
68. What is Explainable AI (XAI)?
Techniques that make ML decisions interpretable.
69. What are SHAP and LIME?
- SHAP → Shapley values for feature contribution.
- LIME → Local interpretable model-agnostic explanations.
70. What is Federated Learning?
Training ML models across decentralized devices without sharing raw data.
71. Online Learning vs Batch Learning?
- Online → updates the model incrementally
- Batch → trains on the full dataset at once
72. Semi-Supervised Learning?
Uses both labeled and unlabeled data.
73. Active Learning?
The algorithm queries the most informative samples for labeling.
74. Meta-Learning?
“Learning to learn” → models that generalize quickly to new tasks.
75. Few-Shot and Zero-Shot Learning?
- Few-shot → learns from very few examples.
- Zero-shot → solves tasks without any training examples.
76. Anomaly Detection?
Identifying rare patterns that deviate from expected behavior.
77. Imbalanced Dataset Problem? Solutions?
Oversampling (SMOTE), undersampling, weighted loss functions.
78. What is SMOTE?
Synthetic Minority Over-sampling Technique → generates new synthetic minority-class samples.
79. Ensemble Learning?
Combining multiple models to improve performance.
80. Explain Stacking.
A meta-learner combines the predictions of multiple base models.
ML in Practice
81. How do you deploy an ML model?
Export the model → wrap it with an API → deploy via cloud or container.
82. What is MLOps?
A set of practices to automate the ML lifecycle (CI/CD, monitoring, retraining).
83. Concept Drift vs Data Drift?
- Concept drift → the target distribution changes
- Data drift → the feature distribution changes
84. What is Feature Engineering?
Transforming raw data into meaningful features.
85. Feature Selection vs PCA?
- Selection → choosing the most relevant existing features.
- PCA → projecting data into a new feature space.
86. How to handle missing values?
- Imputation (mean, median, mode, KNN)
- Dropping missing rows/columns
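A simple mean-imputation sketch with NumPy (in practice, pandas `fillna` or scikit-learn's `SimpleImputer` are the usual tools):

```python
import numpy as np

def impute_mean(x):
    """Replace NaNs in a 1-D array with the mean of the observed values."""
    x = x.astype(float).copy()
    mask = np.isnan(x)
    x[mask] = x[~mask].mean()
    return x
```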
87. What is a Pipeline in ML?
Sequential steps (preprocessing → training → evaluation) automated as one unit.
88. Challenges in real-world ML?
- Data quality issues
- Model interpretability
- Deployment & scaling
- Fairness & bias
89. How to choose the right algorithm?
Based on data type, problem type, and the interpretability vs accuracy trade-off.
90. How do you evaluate regression models?
Metrics: RMSE, MAE, R² score.
91. How do you evaluate classification models?
Metrics: Accuracy, Precision, Recall, F1, ROC-AUC.
92. Offline vs Online evaluation?
- Offline → testing on historical data
- Online → live testing (A/B testing)
93. How to scale ML models for big data?
- Distributed training (Spark, Hadoop)
- Parallelization
- Model compression
95. What is Model Drift and how do you detect it?
Model drift happens when a model’s performance degrades over time due to changing data distributions.
- Types: Data drift (input features change), Concept drift (the target relationship changes).
- Detection: Monitor metrics, statistical tests (KL divergence, PSI).
96. What is A/B Testing in ML?
An online evaluation method where two versions of a model (A = control, B = new) are tested on different user groups to compare performance.
97. What is Model Interpretability and why is it important?
The ability to understand how a model makes predictions.
- Importance: Trust, debugging, regulatory compliance (e.g., healthcare, finance).
- Techniques: Feature importance, SHAP, LIME.
98. What is Confounding in Causal Inference?
When correlation is mistaken for causation due to hidden factors (confounders).
Solution: Randomized controlled trials, instrumental variables, propensity score matching.
99. What is a Hyperparameter vs a Parameter in ML?
- Parameter → learned during training (e.g., regression weights, neural network weights).
- Hyperparameter → set before training (e.g., learning rate, number of trees, k in k-NN).
100. What is Cold Start Problem in Recommender Systems?
Difficulty in recommending items when user/item has no historical data.
Solutions: Content-based filtering, hybrid recommenders, demographic data.