SQL for Data Analysis
08 Mar 2016
I’ve had a few discussions over the past months with people wanting to get into the field of data analysis. One of the questions I hear most often is: “What programming language should I learn to get my first data analysis job? Should I learn R or Python?” My answer: neither, for now. Learn SQL first.
SQL is the common denominator of data analysis. It’s a special-purpose programming language designed to interact with databases, and it is ubiquitous: as a data analyst, you will inevitably work with an SQL interface on a regular basis.
SQL is a simple language that gets the job done. You can learn its basics in a few hours, and those concepts alone let you wrangle your data effectively, whether it’s a few rows or millions of records.
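To give a sense of how far those basics go, here is a minimal sketch using Python’s built-in `sqlite3` module. The `orders` table and its values are made up for illustration. A single `SELECT` with `GROUP BY` and `ORDER BY` answers a question like “which customer spent the most, and across how many orders?”

```python
import sqlite3


def revenue_by_customer():
    # In-memory database: nothing is written to disk.
    conn = sqlite3.connect(":memory:")

    # Hypothetical orders table, for illustration only.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
    )

    # One query: count and total the orders per customer, biggest spender first.
    rows = conn.execute(
        """
        SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in revenue_by_customer():
        print(row)
```

The same query works unchanged whether the table holds four rows or four million; only the time it takes changes.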
With the rise of SQL-based business intelligence tools such as Looker, Periscope and Mode Analytics, SQL is now, more than ever, an indispensable tool for a data analyst. SQL also dominates the big data ecosystem: Hadoop-based engines, Redshift and other massively parallel processing (MPP) data warehouses all offer SQL interfaces. As a data analyst, you have no excuse not to master it.
Get your hands dirty
As with many skills, the best way to learn SQL is by getting your hands dirty. To get started, I’d suggest working through Learn SQL The Hard Way. While the book is not complete yet, it is a good hands-on reference for beginners. It helps you get set up with a local version of SQLite and teaches you the basic commands to create, retrieve, update and delete data.
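Create, retrieve, update and delete map onto four SQL statements: `INSERT`, `SELECT`, `UPDATE` and `DELETE`. The sketch below runs them through Python’s built-in `sqlite3` module so it is self-contained. The `people` table and its rows are hypothetical. In the `sqlite3` command-line shell you would type the same SQL, with literal values in place of the `?` placeholders.

```python
import sqlite3


def crud_demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table for illustration.
    cur.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )

    # Create: insert two rows.
    cur.executemany(
        "INSERT INTO people (name, age) VALUES (?, ?)",
        [("Alice", 36), ("Bob", 41)],
    )

    # Update: Alice has a birthday.
    cur.execute("UPDATE people SET age = age + 1 WHERE name = ?", ("Alice",))

    # Delete: remove Bob.
    cur.execute("DELETE FROM people WHERE name = ?", ("Bob",))
    conn.commit()

    # Retrieve: read back whatever is left.
    rows = cur.execute("SELECT name, age FROM people ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(crud_demo())
```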
Once you are more familiar with SQL, load one of the many great public datasets into your local SQLite instance. Pick a dataset that interests you, formulate some questions about it and analyze it. Then publish the results of your analysis, including your code, on GitHub. This lets you practice your analysis and SQL skills while building your own portfolio. Even if it’s not perfect, it’s great to start putting some work out in the wild.
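As a rough sketch of that workflow, the example below loads a CSV into SQLite and answers one question with a query. The tiny bike-trip dataset, its column names and the question are all made up for illustration. An in-memory string stands in for a downloaded file, and with a real dataset you would open the file instead. Declaring column types in `CREATE TABLE` lets SQLite convert the CSV text `"12"` into the integer `12` on insert.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a downloaded public dataset.
# With a real file: open("your_dataset.csv", newline="") instead of io.StringIO.
SAMPLE_CSV = """station,duration_min
North,12
South,30
North,18
East,9
South,20
"""


def load_trips(conn, csv_file):
    # Explicit column types give the columns numeric affinity where needed.
    conn.execute("CREATE TABLE trips (station TEXT, duration_min INTEGER)")
    reader = csv.DictReader(csv_file)
    # Named placeholders are filled from each CSV row's dict.
    conn.executemany(
        "INSERT INTO trips (station, duration_min) VALUES (:station, :duration_min)",
        reader,
    )
    conn.commit()


def average_duration_by_station():
    conn = sqlite3.connect(":memory:")
    load_trips(conn, io.StringIO(SAMPLE_CSV))

    # Question: which stations have the longest trips on average?
    rows = conn.execute(
        """
        SELECT station, COUNT(*) AS n_trips, AVG(duration_min) AS avg_min
        FROM trips
        GROUP BY station
        ORDER BY avg_min DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in average_duration_by_station():
        print(row)
```

If you prefer the `sqlite3` command-line shell, its `.mode csv` and `.import` dot-commands handle the loading step for you, and the query stays the same.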