Technologies: Python, Jupyter Notebooks
Skills & Methodologies: Data Preprocessing, Imputation, K-Fold Cross-Validation, Logistic Regression, Linear/Quadratic Discriminant Analysis, Random Forests, Support Vector Machines, K-Nearest Neighbors, Naive Bayes
GitHub Repository
Using data collected from weather stations across Australia, my group attempted to predict whether it would rain the next day. The data was first preprocessed with pandas to convert data types and align naming conventions, among other cleanup steps. Imputation techniques were then used to fill in missing entries. Finally, a variety of classification models were tested, with logistic regression performing best at 85% accuracy. A complete summary of the results can be found in the PowerPoint presentation in the associated GitHub repository.
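
The workflow described above (preprocess, impute, then evaluate a model with k-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the group's actual code: the column names, the median imputation strategy, the feature scaling step, and the choice of 5 folds are all assumptions made for the example, and the accuracy it prints has no relation to the 85% reported for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather data (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "humidity_3pm": rng.uniform(10, 100, n),
    "pressure_3pm": rng.normal(1015, 7, n),
    "rain_today": rng.choice(["Yes", "No"], n),
})
# Tie the label loosely to humidity so there is a signal to learn.
noisy_humidity = df["humidity_3pm"] + rng.normal(0, 15, n)
df["rain_tomorrow"] = np.where(noisy_humidity > 70, "Yes", "No")
# Blank out roughly 10% of one column to mimic missing station readings.
df.loc[rng.random(n) < 0.1, "pressure_3pm"] = np.nan

# Preprocessing: convert Yes/No strings to 0/1 integers.
for col in ["rain_today", "rain_tomorrow"]:
    df[col] = df[col].map({"No": 0, "Yes": 1}).astype(int)

X = df.drop(columns="rain_tomorrow")
y = df["rain_tomorrow"]

# Imputation and scaling live inside the pipeline, so they are fit
# on the training folds only and never see the held-out fold.
model = make_pipeline(
    SimpleImputer(strategy="median"),  # assumed strategy
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

# 5-fold stratified cross-validation (fold count is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean CV accuracy: {mean_accuracy:.3f}")
```

Swapping `LogisticRegression` for another scikit-learn classifier (for example `RandomForestClassifier` or `KNeighborsClassifier`) in the same pipeline is one way the listed models could be compared on equal footing.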