If I see another tutorial using the Iris dataset, I'm going to jump off a bridge. Not literally, of course. But I am not interested in learning about the properties of flowers. Some people are, and that's great! Who knows? Maybe someday I'll be put in a situation where I need to learn about flowers!
For now, though, flowers are not part of my raison d'être. Instead, I love baseball stats, which define one of my use cases. If you want to learn data science, it's time to find your use cases.
If you aren't familiar with the Iris dataset, it's a common dataset about the properties of flowers that data science instructors often use for examples.
Formally, use cases involve actors and processes. A use case determines which actors (people or other processes) will have a use for processes defined (or to be defined). It's the reason for creating or using a process.
Use cases can be defined for any industry, including data science. They help you determine the who, what, why, where, and how for your data science efforts. When you create your analysis, you'll know the reasons for doing so when guided by use cases.
It's difficult to learn a foreign language by studying only words and phrases. You can spend years on the grammar and memorization aspects of a language. But until you are immersed in a situation where you start using the language, you won't advance. This is a use case. You determine that you need to plunge yourself into a situation where you are forced to use the language.
People learning languages tend to travel to destinations where those languages are spoken. Traveling defines their use cases for learning languages. It helps them break through the barrier between the academic and the useful. It's what makes all the hard work of learning worthwhile.
The concept is no different with data science. If you are struggling to find a job because you lack experience, create your own opportunities. The internet is filled with resources, including data, stories, forums, and websites to help you with your goals.
Start with your interests. I mentioned above that I have no interest in learning about the intricacies of flowers. Therefore, studying the Iris dataset is much like watching paint dry for me. You may feel differently.
Learning about topics other than flowers is useful for me. I already mentioned baseball. However, I also wanted to learn more about the Covid-19 crisis the world is going through. This is another use case that I became interested in learning about.
On a sheet of paper, list the topics you are interested in. These can be topics you know a lot about or topics that you want to learn more about. Start with the topics you know and explore possible datasets online.
For the initial exploration, don't worry too much about the quality of data. While it is important, you want to focus on what is available first. Then, you can refine your strategy to determine which datasets are authentic.
Be sure to record your findings so that you won't have to go through this exercise every time you want to gain insight into your topics. You'll want a method of retrieving these data sources easily.
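One way to record your findings is a small catalog that you can save and search later. The sketch below is only an illustration: the field names, the helper functions (`add_source`, `find_sources`, `save_catalog`, `load_catalog`), and the sample entries are all hypothetical placeholders, and a spreadsheet or notes file would serve just as well.

```python
import csv
import io

# Hypothetical catalog layout; rename or extend the fields to suit your topics.
FIELDS = ["topic", "name", "location", "notes"]

def add_source(catalog, topic, name, location, notes=""):
    """Append one data source record to the in-memory catalog."""
    catalog.append({"topic": topic, "name": name, "location": location, "notes": notes})

def find_sources(catalog, topic):
    """Return every recorded source for a topic (case-insensitive match)."""
    return [row for row in catalog if row["topic"].lower() == topic.lower()]

def save_catalog(catalog, handle):
    """Write the catalog as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(catalog)

def load_catalog(handle):
    """Read a catalog back from CSV as a list of dictionaries."""
    return list(csv.DictReader(handle))

# Placeholder entries; the locations are deliberately left unfilled.
catalog = []
add_source(catalog, "baseball", "season batting stats", "<url or file path>", "quality not yet checked")
add_source(catalog, "covid-19", "daily case counts", "<url or file path>", "compare against a second source")

# Round-trip through CSV (an in-memory buffer stands in for a real file).
buffer = io.StringIO()
save_catalog(catalog, buffer)
buffer.seek(0)
restored = load_catalog(buffer)
print(find_sources(restored, "Baseball"))
```

The notes column is a convenient place to record what you still need to verify about a source once you move past the initial exploration.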
Of course, when working on data science projects for others, we won't get the opportunity to work only on projects that interest us. I may even be put on a project where a client wants to learn more about flowers.
Even here, though, flowers are not my use case. The use case in this situation is that of my client. My use case is serving my client. And I would put forth full effort in learning about flowers because that is what the client requires.
You'll need to find datasets. You can start with internal data that you have gathered for your
Use cases seem great in theory, but how can they be used to find a job? The use cases themselves probably won't do much for a data science career. However, when you document how you have solved problems with the use cases you've defined for yourself, you will show potential employers that you did more than take a few data science courses.
Data science projects usually follow a step-by-step process flow, which often includes exploratory analysis, data cleansing/munging, model building and tuning (including adjusting hyperparameters), and explanatory analysis.
Many organizations will have a framework that defines the steps that work for their situations. However, most will follow the above steps to some degree.
Many of the processes may be iterative. For instance, during the exploratory analysis stage, more data collection may be needed to effectively answer the questions posed by the problem domain. It's not uncommon to have multiple iterations at several of the stages.
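The stages described above can be sketched as a chain of small functions. This is a minimal illustration, not a prescribed framework: the rows are made-up numbers, the function names are hypothetical, and a simple least-squares line stands in for whatever model your use case actually calls for.

```python
from statistics import mean

# Made-up rows for illustration only: (at_bats, hits); None marks a missing value.
raw_rows = [(100, 27), (250, 70), (None, 40), (400, 118), (320, None), (500, 140)]

def explore(rows):
    """Exploratory analysis: count the rows and how many have missing values."""
    missing = sum(1 for row in rows if None in row)
    return {"rows": len(rows), "rows_with_missing": missing}

def clean(rows):
    """Data cleansing: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]

def fit_line(rows):
    """Model building: ordinary least squares for hits = slope * at_bats + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    intercept = y_bar - slope * x_bar
    return slope, intercept

def explain(slope, intercept):
    """Explanatory analysis: state the fitted relationship in plain words."""
    return f"Each extra at-bat is associated with about {slope:.3f} more hits (intercept {intercept:.1f})."

summary = explore(raw_rows)
cleaned = clean(raw_rows)
slope, intercept = fit_line(cleaned)
print(summary)
print(explain(slope, intercept))
```

In practice you would loop back through these steps. If `explore` shows too many rows with missing values, for example, that is a signal to collect more data before modeling.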
When you learn this process and apply it to your use cases, you'll be several steps ahead of others who go into the field with just an academic background. You'll have taken initiative, and you won't need as much training.
Let problem solving become second nature. Anyone can be taught to create, run, and tweak models. When you know how to interpret them and present them, you will elevate your status among your peers.
Do you need a multiple regression model or a classification model? Perhaps you should use a random forest. The type of algorithm you choose will depend on the use cases you define. There isn't a one-size-fits-all approach to data science models.
Many will argue that XGBoost can fit the bill of a one-size-fits-all solution. It can do a lot, but it still pays to know about the models you use. Try to push past black box models and understand all the underlying algorithms, at least at a high level.
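The regression-versus-classification question often comes down to what you are trying to predict. The helper below, `suggest_model_family`, is a hypothetical rule of thumb and nothing more. It assumes numeric targets call for regression and everything else for classification, so it would misjudge class labels that happen to be coded as numbers.

```python
def suggest_model_family(target_values):
    """Suggest a broad model family from the values of the target column.

    Rule of thumb only (an assumption of this sketch): numeric targets
    suggest regression, anything else suggests classification.
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no usable values")
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    return "regression" if numeric else "classification"

# Made-up targets for illustration.
print(suggest_model_family([0.301, 0.274, 0.255]))   # batting averages
print(suggest_model_family(["win", "loss", "win"]))  # game outcomes
```

A check like this only narrows the field. Choosing between, say, a linear model and a random forest still depends on your data and on what you need to explain.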
The field of data science is vast. Trying to find help when exploring your use cases can be a challenge. I wrote a while back about a trick people can use. The trick remains viable. Essentially, you'll want to view current job listings in data science to see what skills companies are asking for.
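If you save the text of a few listings, a short script can tally which skills come up most often. The sketch below uses made-up listing snippets and a hypothetical skill list. Substitute text from real listings and the skills that relate to your own use cases.

```python
import re
from collections import Counter

# Hypothetical skill list and made-up listing snippets, for illustration only.
SKILLS = ["python", "sql", "r", "xgboost", "random forest"]

listings = [
    "Seeking analyst with Python and SQL experience; XGBoost a plus.",
    "Data scientist: R or Python, strong SQL, random forest modeling.",
    "Junior role, SQL required.",
]

def count_skills(texts, skills):
    """Count how many listings mention each skill as a whole word or phrase."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for skill in skills:
            # Word boundaries keep "r" from matching inside words like "role".
            if re.search(r"\b" + re.escape(skill) + r"\b", lowered):
                counts[skill] += 1
    return counts

for skill, n in count_skills(listings, SKILLS).most_common():
    print(skill, n)
```

Each listing is counted at most once per skill, so the totals show how many employers ask for a skill, not how often the word is repeated.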
Keep up to date with the industry by joining forums and groups. Focus on the area of data science that fits closest to your use cases. For instance, it's nice to learn about artificial intelligence, but if nothing in your use cases will use it, consider revisiting the topic later.
When you read books or take courses, make sure the information is current. Learning old methods and material could hinder your progress and make you stumble during your interviews.
Some use cases will present themselves as unintended consequences of others you may have defined for yourself. These consequences may include customers who use your products in ways other than how they were designed. For instance, did you know that minoxidil, the active ingredient in Rogaine, was originally developed to treat high blood pressure?
If you still struggle to find use cases, I have good news for you. Search for organizations that are looking for volunteers. Offer to provide data services to them in exchange for a testimonial when the job is done. You'll learn a tremendous amount about how to structure a data project. You'll also be giving back to your community. Some of these organizations may even hire you to continue working for them.
James is a data science writer who has several years' experience in writing and technology. He helps others who are trying to break into technology fields like data science. If this is something you've been trying to do, you've come to the right place. You'll find resources to help you accomplish this.