Projects
Flight Delay Patterns - R
- Explored factors associated with flight delay patterns from 2012 through 2016.
- Processed 2.8 GB of data covering 30 million flights.
- Created a GitHub website to display project findings, including a report, a video, and interactive data visuals built with Plotly and ggplot2.
Examining the Impact of the New York Advantage Program on the NYC Homeless Population - Python
- Evaluated the effect of the Advantage program on the size of the NYC homeless population by constructing a linear regression model.
- Manipulated data and built data visualizations to help understand patterns of change in the NYC homeless population and how those patterns relate to other relevant variables (population growth, policy changes, seasonal changes, etc.).
- Report is available here
Using Statistical Learning to Analyze Airbnb Listings - R
- Built a predictive model of rental price; the model was selected on performance from several linear and nonlinear regression methods (ridge, lasso, PLS, splines, and GAM).
- Classified listings as having high or low review scores, selecting the model from among logistic regression, LDA, QDA, tree-based methods, and SVM.
- Report is available here
Biostatistical Methods - R
- Identified variables associated with hospital length of stay (LoS) and constructed a predictive model using multiple linear regression.
- Ran rigorous model diagnostics (influential points, collinearity, interactions, etc.) and model validation (cross-validation, bootstrap).
- Report is available here, and repo is available here
Comparing the Effects of LAGB and RYGB and Examining Associations with Weight Loss - SAS
- Built a model to identify measured variables associated with weight loss using multiple linear regression.
- Compared the effects of the obesity treatments over time using longitudinal data analysis.
- Created descriptive statistics for relevant variables.
- Post is available here
Homework
Data Science
- Shiny Dashboard with NYC restaurant inspection data: Created an interactive Shiny dashboard to visualize NYC restaurant grades and violation descriptions by borough and cuisine type.
Statistical Learning
- Linear Regression & Nonlinear Regression: Fitted linear regression models using least squares, ridge, lasso, and partial least squares. Fitted nonlinear regression models using splines and generalized additive models. Report is available here
- Classification: Classified market returns (positive or negative) for the S&P 500 stock index using logistic regression, linear discriminant analysis, quadratic discriminant analysis, and a K-nearest-neighbors classifier. Report is available here