At its core, data science is not impossible to learn. However, it's the learning process that gets people stuck. People don't know what they need to learn. They hit the search engines, which only adds to the confusion. This article will help you take a bird's-eye view of the field of data science.
High Level Overview of the Data Science Process
Let's start with a list of the steps that many data science projects may follow:
This process is an oversimplification, to say the least. Each stage has its own quirks associated with it. But, for the purposes of this discussion, the oversimplification will suffice.
What Are the Two Words?
Exploring the steps above can serve as a guide to what to learn. Data science is a combination of statistical analysis, programming, subject matter expertise, and above all, problem solving.
However, mastering the concepts will require the two words. These two words are simple, but
The two words that will help you are:
Repetition and Usage
Let's Start with Repetition
Learning is repetitive. This means that when you go through a training module, don't be afraid to go through it again, and then again. Unless you have the learning abilities of Albert Einstein, you aren't going to grasp most machine learning concepts on the first round.
Next Is Usage
You need to use what you learn. Period. If you don't have a job in data science where you can use it on a daily basis, invent ways to use it. Run algorithms on every single data set you get your hands on from this day forward. Do this, even if you don't understand the output yet. Familiarity with the models is going to help you big time.
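As one possible way to practice this, the sketch below fits a straight line to a tiny data set using only Python's standard library. The numbers are made up for illustration and the function name fit_line is an assumption, not something from a real library. The point is the habit of running a model and looking at the output.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the sum of paired deviations divided by the sum of
    # squared x deviations (covariance over variance).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up example data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Swap in any two numeric columns you come across and run it again. Even if the slope and intercept mean little to you at first, seeing them repeatedly is part of building familiarity.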
Join groups and read industry mumbo jumbo. In the beginning, most of it will seem like gibberish. But, the repetition part of the equation will eventually help you know exactly what is being said.
No Substitute For Learning
The two words that I believe will help you learn data science won't replace the brute force learning that is required. You need to learn programming, data munging/cleansing, statistics, visualization, and other such concepts. Repetition is important, but you need to actually get started learning the techniques and concepts for it to work.
Usage is useless (say that 5 times fast!) if you don't have a basic grasp of the concepts to begin with. You need to take the time to learn the core concepts and then keep working with them until they become second nature.
Hidden Within the Two Words
I get that there are a bunch of concepts hidden in the two words I am describing. For instance, with repetition, you need to repeat the right techniques. You need to know what techniques matter and when to use them.
Also, there should be a third word added into this equation. That word is Question. You need to ask questions until your brain explodes. You need to find the right forums, talk to the right people, etc. You need to apply what you learn (usage) and then form questions when something doesn't make sense. Then you need to ask.
I want to keep this article as simple as possible. Therefore, I am sticking with my two-word paradigm. Feel free to insert as many words in between as you like.
Getting Yourself Ready for Repetition and Usage
How do you prepare yourself so that you are at the point of applying these two words? You need a plan. I have a template of a plan that you can use to get you started.
Feel free to alter this template to your needs. Or come up with something completely different. However, do come up with a plan of your own. That is crucial to your success!
Plan in Detail
Subscribe to Job Listing Websites
Initially, you aren't going to look for a job. You are simply going to learn what requirements employers are looking for from candidates. Add any requirements to your journal (next) as potential subjects to learn.
Create a Journal
At the start of your data science quest, many of the terms will be foreign to you. It will be like learning a new language. If you don't record the terms when you see them, you won't remember where you came across them. This will slow down your learning.
When you see a term that seems important, write it down in your journal. Feel free to keep an electronic journal. Just make sure you use it and refer to it.
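If you go the electronic route, even a few lines of Python can serve as a journal. The sketch below is one hypothetical layout: the function names add_term and look_up, the JSON file format, and the idea of storing a note plus a source for each term are all assumptions made for illustration.

```python
import json
from pathlib import Path


def add_term(journal_path, term, note, source):
    """Record a term, a short note, and where it was seen."""
    path = Path(journal_path)
    # Load the existing journal, or start an empty one.
    journal = json.loads(path.read_text()) if path.exists() else {}
    # Store terms in lowercase so lookups are case-insensitive.
    journal[term.lower()] = {"note": note, "source": source}
    path.write_text(json.dumps(journal, indent=2, sort_keys=True))
    return journal


def look_up(journal_path, term):
    """Return the saved entry for a term, or None if it is not recorded."""
    path = Path(journal_path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(term.lower())
```

For example, calling add_term("journal.json", "Overfitting", "model memorizes the training data", "job listing") saves an entry, and look_up("journal.json", "overfitting") retrieves it later. A paper notebook or a spreadsheet works just as well. What matters is that you record where you saw each term.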
Data Science Learning Websites
Data science is one of the hottest fields right now. It's so hot, in fact, that new training sites are appearing with increasing frequency. This is both good and bad. It's good because you have more options, and it's bad because, well, you have more options!
Readers of this website know I love the R language. It wasn't an easy migration, as I come from a Java/C/C# background. R is completely different from these languages, and with good reason: it is meant for statisticians. However, it is with great sadness that I state that the popularity of the R language is fading in data analysis circles. Python is winning all the battles against R, and is likely to be declared the winner of the war of programming languages.
Having said this, it's time for you to learn Python, if you haven't already embarked on that journey. There's just no getting around it at this point. The industry has spoken and Python is the victor!
The good news is Python is quite easy to learn. If you already know languages such as Java and C, then Python will take little effort for you to learn. If you have no programming languages under your belt, Python is one of the better languages to start with, as it is easy to pick up quickly. Now, if R was your first language, then learning Python is still possible, but you need to shift your mindset away from how R works. There are some similarities here and there, but it is a bit of a relearning process.
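To give a feel for why Python is considered approachable, here is a small sketch that reads a few rows of CSV text and averages one column, using only the standard library. The data, the column names, and the variable names are invented for illustration.

```python
import csv
import io

# Made-up CSV text standing in for a file on disk.
raw = """city,temp_c
Oslo,4.5
Cairo,28.0
Lima,19.5
"""

# Parse each line into a dictionary keyed by the header row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Convert the temperature column from text to numbers and average it.
temps = [float(row["temp_c"]) for row in rows]
average = sum(temps) / len(temps)

print(f"{len(rows)} rows, average temperature {average:.1f} C")
```

There are no type declarations or class definitions to get in the way, and the code reads close to the description of the task. That is a large part of what makes the language friendly to newcomers.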
Now that you know that learning Python is mandatory for data science, there's more good news. You have great resources available to get you started, and they're free. WooHoo!
Start with Kaggle. The website offers a free mini course in Python that will teach you much of what you need to know to get started with the language. You won't become an expert, but you will be capable of understanding many of the tutorials you find after you complete the course.
The next course I highly recommend is the Experiments With Data course from Analytics Vidhya. It goes through how to explore data including hypothesis generation and modeling. It's a great introductory course.
For both Kaggle and Analytics Vidhya, you will need to sign up for access to these courses.
The following is another good resource, one where you don't even need to sign up. It's called the Hitchhiker's Guide to Python.
One more thing...
Don't pay for a Python course. It's not necessary. I have seen websites charging hundreds of dollars for an intro to Python course. My thinking is these folks are smoking some whacky weed. They are catering to those who don't know any better. Seriously, do a search on YouTube if you have to. There are plenty of tutorials there that can get you up to speed.
Setting Your Expectations
There are many facets to data science. Don't expect to grasp the concepts immediately and don't beat yourself up when you don't. Another, stronger word that you may want to apply is persistence. It's like repetition, but it will come in handy when you let that repetition slip on occasion.
Take it slow, but be consistent (yet another word). It's better to work fifteen or twenty minutes per day than to hold a marathon session one day and then forget about it for weeks.