Projects using Python, Pandas, SQL, R, Data Visualization Libraries, and Machine Learning Models
Click a project title link below to see the Jupyter Notebook with detailed Python code.
- Uses the HR turnover dataset from Kaggle: https://www.kaggle.com/giripujar/hr-analytics
- Read the data into a pandas DataFrame, then reviewed the fields and analyzed the data with pivot tables.
- Converted the two string fields to numeric: one with a look-up table and the other with the pandas get_dummies function.
- Created training and testing datasets and ran Logistic Regression, SVC, and Random Forest machine learning models for prediction.
- Random Forest was the best prediction model for this data, with a score of 99.
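
The encoding and modeling steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names (`satisfaction`, `salary`, `department`, `left`), the look-up values, and the split settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the HR data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "satisfaction": rng.random(n),
    "salary": rng.choice(["low", "medium", "high"], size=n),
    "department": rng.choice(["sales", "support", "it"], size=n),
})
# Toy target: less satisfied employees are marked as having left.
df["left"] = (df["satisfaction"] < 0.4).astype(int)

# Ordered string field: replace values through a look-up table.
salary_lookup = {"low": 0, "medium": 1, "high": 2}
df["salary"] = df["salary"].map(salary_lookup)

# Unordered string field: expand into one 0/1 column per value.
df = pd.get_dummies(df, columns=["department"], dtype=int)

# Hold out 25% of the rows for testing.
X = df.drop(columns="left")
y = df["left"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit each model and record its accuracy on the held-out rows.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svc": SVC(),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

A look-up table suits a field whose values have a natural order, while one-hot columns avoid implying an order that does not exist. Which field received which treatment in the notebook is not stated here, so the pairing above is a guess.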

- Uses the Pitchfork album review SQLite database from Kaggle: https://www.kaggle.com/nolanbconaway/pitchfork-data
- The multi-table SQLite database is read into a DataFrame with INNER JOIN statements in Python.
- Exploratory data analysis is performed to analyze key variables and relationships.
- A pivot table is created with the genres of albums reviewed by year.
- A DataFrame is created from the pivot table and converted to a “regular” DataFrame for charting.
- A Matplotlib line chart is created showing reviewed album genres by year.
- A DataFrame is created with the top 4 record labels for visualization.
- A Plotly Express sunburst chart is created to display Pitchfork’s top record labels and the music genres for each one.
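
The join, pivot, and flatten steps above might look something like the sketch below. It builds a tiny in-memory SQLite database so that it runs on its own; the table names (`reviews`, `genres`), column names (`reviewid`, `pub_year`, `genre`), and rows are assumptions for illustration and may not match the real database schema.

```python
import sqlite3

import pandas as pd

# Tiny in-memory stand-in for the review database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (reviewid INTEGER PRIMARY KEY, pub_year INTEGER);
CREATE TABLE genres (reviewid INTEGER, genre TEXT);
INSERT INTO reviews VALUES (1, 2015), (2, 2015), (3, 2016), (4, 2016);
INSERT INTO genres VALUES (1, 'rock'), (2, 'rap'), (3, 'rock'), (4, 'rock');
""")

# Read the joined tables straight into a DataFrame.
query = """
SELECT r.reviewid, r.pub_year, g.genre
FROM reviews AS r
INNER JOIN genres AS g ON g.reviewid = r.reviewid
"""
reviews = pd.read_sql_query(query, conn)
conn.close()

# Wide pivot table: one row per year, one column per genre, review counts.
pivot = reviews.pivot_table(
    index="pub_year",
    columns="genre",
    values="reviewid",
    aggfunc="count",
    fill_value=0,
)

# Flatten back to one row per (year, genre) pair for charting.
flat = pivot.reset_index().melt(
    id_vars="pub_year", var_name="genre", value_name="n_reviews"
)
print(flat)
```

The long `flat` layout is the shape that most plotting functions expect when one column selects the series and another supplies the values.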

- Class project created as a student at the NoVA Data School Data Science Boot Camp.
- Read in the data with R code, then reviewed the fields and analyzed the data with pivot tables.
- Created training and testing datasets and ran a Linear Regression model for prediction.

- Uses the Wildfire SQL database from Kaggle: https://www.kaggle.com/rtatman/188-million-us-wildfires
- Exploratory data analysis is performed to analyze key variables in US wildfires and their relationships.
- Domain knowledge is applied to assign a category to each type of wildfire cause.
- A Plotly Express treemap chart is added to show the categories and causes of wildfires.
- A logistic model is created to predict the category of a wildfire from the other variables.
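
Assigning a category to each wildfire cause can be done with a plain dictionary look-up, as in the sketch below. The cause labels, the category names, and the grouping itself are illustrative assumptions, not the mapping used in the notebook.

```python
import pandas as pd

# Small stand-in for the wildfire table; the column name is hypothetical.
fires = pd.DataFrame({
    "cause": [
        "Lightning", "Arson", "Campfire",
        "Debris Burning", "Lightning", "Powerline",
    ]
})

# Assumed grouping of causes into broader categories.
cause_to_category = {
    "Lightning": "Natural",
    "Arson": "Malicious",
    "Campfire": "Accidental",
    "Debris Burning": "Accidental",
    "Powerline": "Infrastructure",
}

# Causes missing from the mapping fall back to "Other".
fires["category"] = fires["cause"].map(cause_to_category).fillna("Other")

# Count fires per (category, cause) pair.
counts = (
    fires.groupby(["category", "cause"])
    .size()
    .reset_index(name="n_fires")
)
print(counts)
```

A table in this shape, with one row per category and cause plus a count, should be usable for a hierarchical chart, for example with Plotly Express `px.treemap(counts, path=["category", "cause"], values="n_fires")`.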
