Bobby W. Lindsey
Data Science predicting the future

This post contains links to Data Science-related material that I want to keep track of. As such, it will be updated over time.


Principal Components Analysis (PCA for short) is a technique used to reduce the dimensionality of a data set. There are a few helpful ways (at least for the math-literate) to explain what PCA does:
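One way to make this concrete is a small sketch that reduces 2-D points to a single dimension. It is an illustration only, not the approach taken in the full post: the function name `pca_2d_to_1d` and the sample points are made up, and it relies on the closed-form eigen-decomposition of a 2x2 covariance matrix, so it only handles two input dimensions and assumes the data is not all one repeated point.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Hypothetical helper for illustration; assumes at least two points
    that are not all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Step 1: center the data so the cloud sits on the origin.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    largest, smallest = half_trace + spread, half_trace - spread

    # Step 4: unit vector along the direction of largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Step 5: one coordinate per point, measured along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained = largest / (largest + smallest)
    return direction, scores, explained


# Made-up data that roughly follows the line y = x.
sample = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]
direction, scores, explained = pca_2d_to_1d(sample)
print(direction, explained)
```

The `explained` value is the share of total variance kept by the single retained component. When it is close to 1, very little information was lost by dropping the second dimension.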


Hypothesis testing is one of the most important tools in the sciences. It allows you to investigate a thing you’re interested in and tells you how surprised you should be about the results. It’s the detective that tells you whether you should continue investigating your theory or divert efforts elsewhere. Does that diet pill you’re taking actually work? How much sleep do you really need? Does that HR-mandated team-building exercise really help strengthen your relationship with your coworkers? (Spoiler alert: it doesn’t.)
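To make "how surprised you should be" concrete, here is a minimal sketch of a permutation test, which is just one of many hypothesis tests and not necessarily the one the full post uses. The function name `permutation_test` and the weight-change numbers are invented for illustration. The idea: if the diet pill did nothing, the group labels would be arbitrary, so we shuffle them many times and count how often chance alone produces a gap at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of label shufflings whose
    mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up weight changes in kg (negative means weight lost).
pill = [-3, -2, -4, -1, -3, -2]
placebo = [-1, 0, -2, 1, -1, 0]
print(permutation_test(pill, placebo))
```

A small p-value says the observed gap would be surprising if the pill had no effect. It does not say how large or how useful the effect is.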


Ok, we’re not dummies here, but dummy variables can be a tricky business for some. As a data scientist, you’re going to be working with them a lot, because models like k-nearest neighbors or ordinary least squares regression expect every input to be numeric. But what if your data set has some categorical variables? Introducing dummy variables.
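As a rough sketch of the idea, the snippet below turns one categorical column into 0/1 indicator columns using plain Python. The helper name `make_dummies`, the `is_` column prefix, and the color data are all hypothetical. Dropping the first category by default is an assumption made here because, in a regression with an intercept, keeping every indicator makes the columns perfectly collinear.

```python
def make_dummies(values, drop_first=True):
    """Encode a list of category labels as rows of 0/1 indicators.

    Hypothetical helper for illustration. With drop_first=True the
    alphabetically first category becomes the baseline: it is the row
    where every indicator is 0.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]


colors = ["red", "green", "blue", "green"]
for row in make_dummies(colors):
    print(row)
# {'is_green': 0, 'is_red': 1}
# {'is_green': 1, 'is_red': 0}
# {'is_green': 0, 'is_red': 0}
# {'is_green': 1, 'is_red': 0}
```

Here "blue" is the baseline, so a blue row is encoded as all zeros. For a distance-based model such as k-nearest neighbors, passing `drop_first=False` keeps one indicator per category, which treats all categories symmetrically.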