Evaluating Machine Learning Models

Machine learning models are not magic elixirs, despite what you might hear to the contrary. They can help extract insight from data, but they need to be evaluated to ensure they are doing the best job possible with the data provided.

One of the trickiest aspects of machine learning is knowing when a model is doing what it is supposed to do. Loading and cleaning data, and then training the models, soon become routine. But those models won't be much use unless you evaluate them effectively.

In a report from O'Reilly Books Online, author Alice Zheng writes of the evaluation process, "It's fundamental, and it's also really hard." She goes on to state that you need to know ahead of time what questions you want answered.

Learn more about why Alice believes you need to consider the evaluation before doing anything else.

Machine Learning Workflow


Like anything else, machine learning is a process and thus has a certain workflow associated with it. This workflow helps guide data scientists (both new and experienced) so that the proper steps are executed in the right order. The report from Zheng covers this workflow with a nice diagram.

It's All About the Metrics

The metrics are the "meat and potatoes" of the evaluation process, and understanding them is crucial. Although Zheng doesn't state this, knowing which metrics to use requires data scientists to know about the business. That is one of the reasons why business knowledge is sometimes listed as a requirement for becoming a data scientist.

However, subject matter expertise is often lost among the technical components that comprise the job. Author Robert de Graaf feels this needs to change. In a Medium article about subject matter expertise, he states that data scientists need to move beyond the data and get to the heart of what the customer needs. Further, he states, "[a customer centered approach] requires a change in mindset from being someone who discovers ‘what the data is saying’ to being someone who improves their customer’s life."

[Source: Graaf, Robert de. “Subject Matter Expertise - Forgotten In Data Science Education.” Medium, The Startup, 8 July 2019, medium.com/swlh/the-user-centred-data-science-revolution-9f41ceccd15e.]

Recently, Medium.com put much of its content behind a paywall, and the above resource is affected. However, the company has made a backdoor available via Twitter for readers who have a Twitter account.


You can read about how to get around the Medium.com paywall from the post that I wrote, How to Blast Through Medium's Paywall.

More Information on Evaluation of Machine Learning Models

Zheng's report is intended to give you a quick overview of evaluating models in machine learning. It is well-written and covers the basics. You can read more of the report by signing up for the trial. O'Reilly is a leader in publishing technology and finance books. I have been a member for several years, and I must say it would be difficult to keep up with technology and finance (both my specialties) without this invaluable resource.

For those who don't want to pursue a subscription with O'Reilly, no hard feelings. It's all good, and there are plenty of other resources to investigate. One such resource, which has become another go-to for me, is Analytics Vidhya.

Evaluation seems like a mysterious concept. However, when you break it down, you'll see that it is about trying to reduce the errors that a model produces. In most cases, the error for a single prediction is the value you predicted minus the actual value. An article from Analytics Vidhya discusses this error evaluation, and it is worth your time to read:

Srivastava, Tavish. “11 Important Model Evaluation Error Metrics Everyone Should Know.” Analytics Vidhya, 6 Aug. 2019, www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/.
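
To make the "predicted minus actual" idea concrete, here is a minimal Python sketch. The sample numbers and the function names are made up for this illustration. Mean absolute error and root mean squared error are shown only as two common ways to boil the individual errors down to a single number; they are not necessarily the right metrics for your problem.

```python
import math


def prediction_errors(predicted, actual):
    """Error for each example: the predicted value minus the actual value."""
    return [p - a for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted, actual):
    """Average size of the errors, ignoring their sign."""
    errors = prediction_errors(predicted, actual)
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_squared_error(predicted, actual):
    """Square root of the average squared error; large misses count more."""
    errors = prediction_errors(predicted, actual)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
predicted = [3.0, 5.0, 2.0, 7.0]
actual = [2.0, 5.0, 4.0, 8.0]

errors = prediction_errors(predicted, actual)
mae = mean_absolute_error(predicted, actual)
rmse = root_mean_squared_error(predicted, actual)

print(errors)          # [1.0, 0.0, -2.0, -1.0]
print(mae)             # 1.0
print(round(rmse, 2))  # 1.22
```

Notice that the raw errors can be positive or negative, which is why they are usually made positive (absolute value or squaring) before being averaged. Otherwise, an overestimate and an underestimate could cancel each other out.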

Coming to Grips with the Confusion Matrix

People are confused by the concept of a confusion matrix, mainly because they are confused by why confusion is used as a term for the concept. No matter how the concept got its name (depends on who you ask), a confusion matrix is something that data scientists need to know. And yes, you will use it!

Data School

I found good coverage on the topic from a website called Data School. It explains it well and reading through the comments and questions is helpful, too.

People get confused by the false parts of the confusion matrix (I know I did!). What does false positive mean? What about false negative? Does a false positive mean you predicted false when compared to the actual value? Or does it mean the actual value was false, but you predicted correctly? What finally helped me come to grips with it was to always treat the scenario from the perspective of the prediction. End of story!

Therefore, a false positive means you predicted a positive result, but the actual result was negative. You'll also hear this referred to as a Type I error. Most people like to think in terms of diseases. If a doctor predicts someone has a disease, but the person doesn't really have it, this is a false positive. Again, it's from the perspective of the predicted outcome.

A false negative means you predicted a negative result, but the actual result was positive. This is called a Type II error. In the disease scenario, a false negative would mean the doctor predicted a person does not have a disease, when, in fact, they do have it. In this scenario, this would be worse than the Type I error from the patient's perspective.

People also get confused as to which error is Type I and which is Type II. I found a really good explanation of it here:

The first way is to re-write False Negative and False Positive. False Positive is a Type I error because False Positive = False True and that only has one F. False Negative is a Type II error because False Negative = False False so thus there are two F’s making it a Type II. (Kudos to Riley Dallas for this method!)

Ragan, Allison. “Taking the Confusion Out of Confusion Matrices.” Medium, Towards Data Science, 11 Oct. 2018, towardsdatascience.com/taking-the-confusion-out-of-confusion-matrices-c1ce054b3d3e.

The good news is you won't have to worry about getting confused with the true positives and the true negatives. If you predict something as positive and it turns out to be positive, that is a true positive. Reverse that for true negatives: if you predict negative and the actual result is negative, that is a true negative.
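
If it helps to see the four outcomes side by side, here is a minimal Python sketch that counts them for a handful of predictions. The function name and the sample data are invented for this example (think of True as "has the disease"). It is meant only to illustrate the definitions above, not to replace the tools your machine learning library provides.

```python
def confusion_counts(predicted, actual):
    """Count the four confusion matrix outcomes, named from the prediction's side."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1  # predicted positive, actually positive
        elif p and not a:
            counts["FP"] += 1  # predicted positive, actually negative (Type I error)
        elif not p and a:
            counts["FN"] += 1  # predicted negative, actually positive (Type II error)
        else:
            counts["TN"] += 1  # predicted negative, actually negative
    return counts


# Hypothetical diagnoses: True means "has the disease".
predicted = [True, True, False, False, True, False]
actual = [True, False, True, False, True, False]

counts = confusion_counts(predicted, actual)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

In every branch, the second word (positive or negative) comes from what was predicted, and the first word (true or false) says whether that prediction matched the actual value.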


Don't overlook learning about evaluating machine learning models. It may seem tricky at first, but in the long run, it will help you produce better models.


About the Author James

James is a data science writer who has several years' experience in writing and technology. He helps others who are trying to break into the technology field like data science. If this is something you've been trying to do, you've come to the right place. You'll find resources to help you accomplish this.
