Getting Started with Statistics for Data Science
15 Feb 2016
Learning statistics can be a daunting journey for aspiring data scientists who do not come from a quantitative field. Whether you are a computer science undergrad, a developer seeking a career change or an MBA graduate, the statistical part of data science often seems to be the most intimidating one. As a business school graduate, it was for me.
Statistics is a serious discipline; some people spend their lives studying it. As an aspiring data scientist, how should you approach learning stats? What do you need to know? What’s the best way to learn about stats? Here’s how you should go about this.
You can get tremendous value from understanding simple statistical concepts. In many data science projects, you don’t need advanced stats knowledge to draw meaningful conclusions. For this reason, you should focus on learning the basics of statistics, applying them to your work and expanding from there.
The two main branches of statistics that you need to know are descriptive statistics and inferential statistics. You can get a ton of value by understanding those properly.
Descriptive statistics quantitatively describe a collection of information by summarizing the observed data. Unlike inferential statistics, they do not deduce facts about the greater population; they only describe the collected data set.
You have surely interacted with those statistics in the past. Some common measurements in descriptive statistics gauge the central tendency (mean, median, mode…) and others the variability (standard deviation…) of the data set.
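To make these measures concrete, here is a minimal sketch using Python's built-in statistics module. The page_views numbers are made up for illustration; they are not from any real data set.

```python
import statistics

# Hypothetical sample: daily page views over one week (made-up numbers).
page_views = [120, 135, 120, 150, 410, 128, 131]

# Central tendency
mean_views = statistics.mean(page_views)      # arithmetic average
median_views = statistics.median(page_views)  # middle value once sorted
mode_views = statistics.mode(page_views)      # most frequent value

# Variability
stdev_views = statistics.stdev(page_views)    # sample standard deviation

print("mean:", round(mean_views, 1))
print("median:", median_views)
print("mode:", mode_views)
print("standard deviation:", round(stdev_views, 1))
```

Notice how the single unusual day (410 views) pulls the mean well above the median. Looking at several descriptive statistics together gives a more honest picture than relying on one number.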
Inferential statistics enable us to infer properties of a population based on a sample data set. They use the sample to form conclusions beyond the collected data.
In practical data science, inferential statistics are heavily used when comparing conversion rates, analyzing an experiment such as an A/B test, etc.
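As an illustration of what comparing conversion rates can look like, here is a rough sketch of a two-proportion z-test, one common way (among several) to analyze an A/B test. The function name two_proportion_z_test and the visitor and conversion counts are hypothetical, chosen only for this example, and the test assumes samples large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: number of conversions in each variant
    n_a, n_b: number of visitors in each variant
    Returns the z statistic and the two-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a difference at least this extreme if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 5000 visitors,
# variant B converts 250 of 5000 visitors (made-up numbers).
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print("z statistic:", round(z, 2))
print("p-value:", round(p_value, 4))
```

A small p-value suggests the difference observed in the sample is unlikely to be due to chance alone, which is exactly the kind of conclusion "beyond the collected data" that inferential statistics are meant to support.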
For me, online classes worked like a charm to learn the basics:
These classes are interactive and include exercises and videos. I find them a very good way to get started in this field. They will give you just enough knowledge to start getting comfortable with statistics.
On a general note, I recommend the book Naked Statistics: Stripping the Dread from the Data. This book by Charles Wheelan covers, among other topics, descriptive and inferential statistics and provides a good overview of each field. It demystifies statistics through some very concrete and cheerful examples.
Build from there
Remember, the best way to learn these concepts is by applying your knowledge to concrete examples. Once you have started to integrate those concepts into your analyses, I recommend you pick up a statistics textbook, such as All of Statistics, and deepen your knowledge.