
Lung Cancer Prediction with Gene Expressions
This project developed a machine learning pipeline that classifies lung cancer from gene expression data using as few genes as possible. Starting from more than 51,000 genes, I used Mutual Information for feature selection and a Random Forest classifier, and found that a model restricted to just 2 key genes achieved perfect classification accuracy.
Python, Pandas, Scikit-learn, Random Forest, t-SNE, Jupyter
Role: Data Analyst & ML Researcher
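The approach described above, ranking genes by Mutual Information and then training a Random Forest on the top few, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, and the helper `select_top_genes`, the planted gene indices, the 70/30 split, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split


def select_top_genes(X, y, k, random_state=0):
    """Return the column indices of the k features with the highest
    estimated mutual information with the labels y (hypothetical helper)."""
    scores = mutual_info_classif(X, y, random_state=random_state)
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for an expression matrix (samples x genes).
# The real dataset had over 51,000 genes; 500 keeps the sketch fast.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_genes))

# Plant two informative "genes" (arbitrary indices, assumed for the demo).
X[:, 10] += 5.0 * y
X[:, 42] -= 5.0 * y

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Rank genes on the training split only, so the test split does not
# influence which features are chosen.
top_genes = select_top_genes(X_train, y_train, k=2)

# Train and evaluate a Random Forest restricted to the selected genes.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train[:, top_genes], y_train)
accuracy = model.score(X_test[:, top_genes], y_test)

print("Selected gene indices:", sorted(top_genes.tolist()))
print("Test accuracy:", accuracy)
```

Selecting features inside the training split is a design choice in this sketch; whether the original pipeline did the same is not stated in the summary.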