Are you considering a career in data science? It is one of the hottest fields, and it is still growing. Even if you aren’t looking for a full-time career in the area, you may find it helps you glean insight from your data and systems. Whatever the reason, finding the right resources is essential. This article points you to resources that can help you become a data scientist.
As you embark on your research into learning about the field, you’ll find the choices are overwhelming. Every day that passes, it seems another resource emerges and declares itself the quintessential resource for learning and prospering with data science. While there are several great resources, the majority are part of the hype cycle we are currently experiencing.
Data science is an evolving field, and no resource could be considered complete. Still, it is worth having a few resources to get started. Of course, many will argue that I left out this resource or that one. But Rome was not built in a day, and neither is a list of data science resources.
My other goal for this article is to introduce inexpensive and even free resources to get started. No one wants to spend thousands of dollars on learning a field only to realize that it isn’t what they were hoping for.
Oddly, with data science resources, it’s not always about getting what you pay for. I have seen courses offered for introductory Python or R programming that were charging more than $500.
The creators of these resources are out of touch. You can find quality resources that teach these introductory topics and are either free or cost a minimal amount. Many of these low-cost options will teach you what you need to know to get started in coding.
You have to ask what it is about these high-end course offerings that would justify the cost. The course creators are trying to take advantage of those who are misinformed. If you are one of those course creators, feel free to comment below about what you are offering with these intro-level courses that justifies the huge cost you’re charging.
Here is what I would suggest. Take a free Python course using one of the options listed below. Then, if you happen to find the syllabus of one of the high-end intro Python courses, compare what they are teaching to the free course. You have nothing to lose by taking the free course.
The moral of the story is to make sure you know what you are paying for.
Many data science resources focus much of their effort on teaching programming. While this is an important component of data science, someone studying the field must be well rounded. A data scientist needs to know statistics, visualization, presentation, data cleansing, and the business domain.
Learning programming and statistics is an excellent first step in your data science career. These are foundational skills. The resources in this guide will only focus on these topics. I will follow up with next-step resources in another article.
While it seems like a tall order to learn all the necessary skills for data science, the structure of the field is still forming. As the industry matures, you’ll see specializations in each of the concentrations. However, all data scientists will need to have a base understanding of each of them.
The prevalent languages in data science are Python and R. Python is overtaking R as the language of choice because it is easier to learn. Both languages have extensive support, and each has a community competing for new learners.
If you are wondering which to learn, R or Python, the correct answer is to learn both. That may be easier said than done, but doing so accomplishes two objectives. The first is that you have access to more opportunities.
Companies often require specific languages in their job postings. I believe they go overboard with this, because if you are skilled in a few languages, you can pick up others easily. However, that is the way of the world. It is a constraint we need to deal with.
Another reason to learn both languages is that it will help you during your training. If you know R but find a lot of training modules in Python, you will likely continue your search to find R tutorials. While the number of training modules increases frequently, you’ll waste less time when you know both languages. The basics of a language aren’t the issue. The gaps show up when you get into more advanced concepts.
Kaggle is a phenomenal resource that offers competitions to data science professionals. They use real-world problems submitted by companies, and these companies reward the best solution with prizes, often significant cash awards. Some of the contests are meant for practice and won’t earn you prizes. But, many will. Kaggle is a worthy resource to bookmark.
Kaggle also helps beginners with free courses. At the current time, it is offering an intro to Python course (free). Upon completion, you can continue on your coursework. Current course offerings:
Udemy offers both paid and free courses. The website has a large selection of courses on many topics, including data science.
The website used to include a filter when searching for free courses, but it appears to be gone. You can still find free courses with a Google search as follows:
free python courses udemy.com
If you want to find other courses offered for free, replace the “python” in the above search with the topic you want to learn. It’s not perfect, but you can usually find something of interest.
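If you run this kind of search often, you can build the search link with a few lines of Python. This is only a small sketch: the function name `free_course_search_url` is my own invention, and it assumes the standard Google search URL format with a `q` parameter.

```python
from urllib.parse import urlencode


def free_course_search_url(topic, site="udemy.com"):
    """Build a Google search URL for free courses on a topic.

    Hypothetical helper for illustration; it mirrors the search
    phrase used above with the topic swapped in.
    """
    query = f"free {topic} courses {site}"
    # urlencode escapes the query so spaces and symbols are URL-safe.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(free_course_search_url("python"))
print(free_course_search_url("statistics"))
```

Paste the printed link into your browser, or change the topic to whatever you want to learn.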
Udemy frequently runs promotions that bring the price down significantly, to roughly $10-$15 per course. If you find a course that you’d like to take that isn’t free, consider waiting for a promotional deal. They happen several times per year, often within weeks of one another.
It can be frustrating, as it’s a waiting game for the courses to become discounted, but it does happen frequently. When dealing with beginning-level data science, there are plenty of alternatives to Udemy.
When dealing with open-source computer languages such as R and Python, there are bound to be resources for learning from the language creators. This is true with both R and Python.
This website falls in the category of Massive Open Online Courses (MOOCs). It was initially a cooperative effort with Google and edX. Many universities and large corporations (Microsoft, for example) have joined the effort to offer training and MicroMasters-style programs. Many of the courses can be audited for free, but you must pay if you want a verified certificate.
The interface is a bit cumbersome when first starting out, but the website gives tutorials on how to use it. It is interactive and often includes a discussion board for questions and comments. It won’t take too long to get used to the interface, however.
There are several courses offered in the data science field on edX. You can learn both R and Python (as well as other courses) from this resource. This resource provides Capstone projects which help you to reinforce your learning from other courses.
If you are learning Python, having a set of recipes available to reference is incredibly useful. It’s excellent for every Python coder but indispensable for beginners. You can even download an IPython notebook for Jupyter. It comes with detailed explanations, which makes this resource even more valuable. It does require signing up to access, however.
While statistics is a necessary part of data science, you don’t need advanced knowledge for this skill to be useful. You’ll need to understand probability, descriptive statistics, regression, and the fundamentals of hypothesis testing. You should also have a good understanding of Bayesian concepts.
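To give you a feel for how basic these concepts can be in practice, here is a minimal sketch of descriptive statistics and a simple linear regression using only Python’s standard library. The study-hours data is made up for illustration, and the regression is the textbook least-squares formula written out by hand rather than any particular library’s method.

```python
from statistics import mean, median, stdev

# Made-up example data: hours studied and the resulting test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Descriptive statistics summarize a single variable.
print("mean score:", mean(scores))
print("median score:", median(scores))
print("sample standard deviation:", round(stdev(scores), 2))

# Simple linear regression by least squares: score = intercept + slope * hours.
x_bar, y_bar = mean(hours), mean(scores)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"each extra hour adds about {slope:.1f} points")
print(f"predicted score for 6 hours: {intercept + slope * 6:.1f}")
```

In real projects you would likely reach for a library, but working through the arithmetic once makes the library output much easier to interpret.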
Too often, you’ll read that you need to be an expert in statistics. That’s not true. It seems like everyone wants data scientists to be experts in every aspect of the field.
Obviously, the more statistics you know, the better. As long as you further your studies, you should not let your lack of “being an expert” in statistics hold you back from applying for jobs. You will need the basics, however, and that is what you’ll find with the following resources.
If you have never heard of Khan Academy, now would be a good time to see what they have to offer. This resource is geared towards students, and it offers math and statistics classes. If you want to learn or brush up on linear algebra, which is helpful in data science, it provides free courses for that as well.
At some point during your learning, you’ll need to come to grips with hypothesis testing. It is a core concept in data science and machine learning. You may have dreaded learning it in an intro statistics class in high school or college. You likely forgot about it as soon as you were done with the class. I found a great YouTube video that explains the concept for those that want to learn it or need a refresher.
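If you learn better from code than from formulas, a permutation test is one approachable way to see the idea behind hypothesis testing. The sketch below is my own illustration with made-up numbers, not the method from the video. It asks: if the group labels were meaningless, how often would random shuffling produce a difference in means at least as large as the one observed?

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Made-up example: minutes spent on an old and a new version of a page.
old_page = [12.1, 11.8, 12.5, 13.0, 11.9, 12.3, 12.7, 12.0]
new_page = [13.2, 13.8, 12.9, 14.1, 13.5, 13.0, 13.9, 13.4]

p_value = permutation_test(old_page, new_page)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means a difference this large would rarely appear by chance alone, which is the same logic behind the t-tests you may remember from class.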
This is not a structured tutorial in the real sense of the word. It is more of a reference to several other websites with explanations on how those websites can help you learn statistics. It also explains why you need to learn the concepts mentioned.
The resources in this article are by no means exhaustive. The field is dynamic, and you’ll find new resources popping up almost daily. As mentioned previously, there is more to data science than learning a language or two and applying a few statistical concepts. Machine learning, artificial intelligence, presentation skills, business domain expertise, data analysis, and data cleansing are all skills that will be needed and should be considered the next steps on your journey to data science mastery.
James is a data science writer who has several years' experience in writing and technology. He helps others who are trying to break into technology fields like data science. If this is something you've been trying to do, you've come to the right place. You'll find resources to help you accomplish this.