02. All My Projects

Lung Cancer Prediction with Gene Expressions

Built a machine learning pipeline to classify lung cancer as accurately as possible from a minimal set of genes. Starting with over 51,000 genes, I used Mutual Information for feature selection and trained a Random Forest model, and found that a model using only 2 key genes could achieve perfect classification accuracy.

Python, Pandas, Scikit-learn, Random Forest, t-SNE, Jupyter

Role: Data Analyst & ML Researcher
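
As a rough illustration of the lung cancer workflow above, the sketch below ranks features by mutual information and trains a Random Forest on the top few. It is not the project's actual code: the data is synthetic, the function name `top_k_genes_accuracy` is hypothetical, and the sample counts, gene indices, and 70/30 split are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split


def top_k_genes_accuracy(X, y, k=2, seed=0):
    """Keep the k features with the highest mutual information and
    score a Random Forest on a held-out split (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Rank genes on the training split only, so the test split stays unseen.
    mi = mutual_info_classif(X_train, y_train, random_state=seed)
    top = np.argsort(mi)[::-1][:k]

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train[:, top], y_train)
    return top, model.score(X_test[:, top], y_test)


# Synthetic stand-in for an expression matrix (assumption, not real data):
# 200 samples, 500 "genes", of which only columns 10 and 42 carry signal.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_genes))
X[:, 10] += 4.0 * y
X[:, 42] -= 4.0 * y

selected, accuracy = top_k_genes_accuracy(X, y, k=2)
print("selected gene indices:", sorted(int(i) for i in selected))
print("held-out accuracy:", accuracy)
```

Ranking the genes inside the training split, rather than on the full dataset, is a deliberate choice here: it keeps the held-out score from being inflated by feature selection that has already seen the test samples.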

MIMIC-IV Data Extraction with BigQuery

Executed complex SQL queries in Google BigQuery to perform an ETL (Extract, Transform, Load) process on the large-scale MIMIC-IV clinical database. The goal was to build a clean, analysis-ready dataset for studying patient ICU stays and outcomes.

SQL, Google BigQuery, Google Cloud Platform, ETL, Data Warehousing

Role: Data Analyst
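
The extract, transform, load pattern described for the MIMIC-IV project can be sketched on a small scale as follows. This is a minimal sketch, not the project's queries: it uses Python's built-in `sqlite3` as a local stand-in for BigQuery, the tables `icustays` and `admissions` are simplified toy versions with made-up rows, and the output table `icu_analysis` and the function `build_icu_dataset` are hypothetical names.

```python
import sqlite3


def build_icu_dataset(conn):
    """Join ICU stays to admissions and derive length of stay in days
    (hypothetical helper; SQLite date functions, not BigQuery syntax)."""
    conn.execute("DROP TABLE IF EXISTS icu_analysis")
    conn.execute(
        """
        CREATE TABLE icu_analysis AS
        SELECT
            i.stay_id,
            i.subject_id,
            ROUND(julianday(i.outtime) - julianday(i.intime), 2) AS icu_los_days,
            a.hospital_expire_flag AS died_in_hospital
        FROM icustays AS i
        JOIN admissions AS a ON a.hadm_id = i.hadm_id
        WHERE i.outtime IS NOT NULL  -- drop stays with no recorded discharge
        """
    )
    return conn.execute(
        "SELECT stay_id, subject_id, icu_los_days, died_in_hospital "
        "FROM icu_analysis ORDER BY stay_id"
    ).fetchall()


# Toy source tables with invented rows (assumption, not real patient data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE admissions (
        hadm_id INTEGER PRIMARY KEY,
        subject_id INTEGER,
        hospital_expire_flag INTEGER
    );
    CREATE TABLE icustays (
        stay_id INTEGER PRIMARY KEY,
        subject_id INTEGER,
        hadm_id INTEGER,
        intime TEXT,
        outtime TEXT
    );
    INSERT INTO admissions VALUES (100, 1, 0), (101, 2, 1);
    INSERT INTO icustays VALUES
        (9001, 1, 100, '2020-01-01 00:00:00', '2020-01-03 12:00:00'),
        (9002, 2, 101, '2020-02-10 06:00:00', '2020-02-11 06:00:00'),
        (9003, 2, 101, '2020-02-12 00:00:00', NULL);
    """
)

rows = build_icu_dataset(conn)
for row in rows:
    print(row)
```

In BigQuery the same shape of query would use its own timestamp functions and fully qualified dataset names; the join, the derived length-of-stay column, and the filter on incomplete stays are the parts this sketch is meant to show.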