The Start of Baseball Season 2019

The secret about the start of the 2019 baseball season is that it has already started. That's right! The season officially began on March 20, 2019. Only two teams played that day (and the next): the Seattle Mariners and the Oakland A's, who played two consecutive games in Japan.

I think the reason for the early start was the retirement of Mariners legend Ichiro Suzuki. March 21 was his last game.

For all the other teams, the season begins on (or after) March 28, 2019.

What Does Baseball Have to Do with Data Science?

Statistics are inherent in baseball, and data is central to analyzing the game. The two have gone hand in hand since the early days of baseball, although we have far more data to work with today.

I picked up a copy of Out of the Park Baseball, which is a baseball simulation program. My goal is to track the teams this season using the simulator to see how well it does. Apparently, the program was used to accurately predict the World Series in the past.

Want Your Own Copy of the Latest Version of Out of the Park Baseball? I am authorized to sell this software and you can obtain your copy here. If you purchase through this website, I will receive a commission. The price is not affected by this arrangement and it helps in a small way to keep this website rolling. Besides, it's a fun game to play!

I wanted to do this last year, but my biggest problem was getting accurate data into the simulator. Prior to OOTP 20, real-time stats were not updated, so any simulation would have required me to update the stats for every player on every team. Let's just say that is a gargantuan task, and that's putting it mildly.

However, OOTP 20 now has an option to update live stats. This makes it possible to run pregame simulations and see how well the program does. Even if I miss a week here and there, it won't matter, as the data will be up to date.

As an aside, I did run the simulator for the two games between the Mariners and A's before they were played. The simulator predicted the Mariners would win both. Unfortunately, I didn't get to write about it before the games took place, so interpret that any way you like. Remember, this isn't about measuring my skill at predicting winning teams; it's about measuring the accuracy of the AI engine in the software. I have no reason to fudge numbers or lie about results.

March 28 Predictions

For the purposes of this discussion, the team that wins the most games out of 5000 simulations is the one I declare the predicted winner. This may not be the best measure, since a team can win a game by a single run. I also provide a strength indicator, which is simply the runs scored divided by the runs allowed over the 5000 simulations.
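The two measures can be illustrated with a short Python sketch. OOTP's own output format isn't covered here, so this assumes, hypothetically, that each simulated game has been exported as a (team A runs, team B runs) pair; the function name summarize_matchup is also made up for illustration. It further assumes the strength indicator is reported from the predicted winner's side, which is consistent with every indicator in the predictions being above 1.

```python
from typing import List, Tuple


def summarize_matchup(
    results: List[Tuple[int, int]], team_a: str, team_b: str
) -> Tuple[str, float]:
    """Return (predicted_winner, strength_indicator) for one matchup.

    results holds one (team_a_runs, team_b_runs) pair per simulated game.
    This data layout is a hypothetical export format, not OOTP's own.
    """
    wins_a = sum(1 for a, b in results if a > b)
    wins_b = sum(1 for a, b in results if b > a)
    runs_a = sum(a for a, _ in results)
    runs_b = sum(b for _, b in results)

    # Predicted winner: the team that won more simulated games
    # (an exact tie in wins goes to team_a in this sketch).
    if wins_a >= wins_b:
        winner, scored, allowed = team_a, runs_a, runs_b
    else:
        winner, scored, allowed = team_b, runs_b, runs_a

    # Strength indicator: the winner's runs scored divided by runs allowed.
    return winner, scored / allowed


# Tiny made-up sample: four simulated games, Yankees runs listed first.
games = [(5, 3), (2, 4), (7, 1), (3, 2)]
print(summarize_matchup(games, "New York Yankees", "Baltimore Orioles"))
# ('New York Yankees', 1.7)
```

In the made-up sample, the Yankees win three of four games and outscore the Orioles 17 to 10, giving a strength indicator of 1.7.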

Why 5000 Simulations?

No team plays the same opponent 5000 times in a season, let alone without either team's stats being updated after each game. The scenario is unrealistic. You could almost think of it as one massive game lasting 45,000 innings (5000 games x 9 innings each), which is equally unlikely. However, the law of large numbers can help here: the more simulations you run, the closer the average result gets to its expected value, whatever that value happens to be. For instance, if you toss a coin 100 times, you may get 60 heads and 40 tails. But if you toss it a million times, the split will be very close to 50% heads and 50% tails.
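The coin-toss example is easy to try yourself. This minimal Python sketch uses the standard library's random module (the seed 2019 is arbitrary) to show the fraction of heads settling toward 0.5 as the number of tosses grows. It illustrates the law of large numbers in general; it is not a model of OOTP's simulation engine.

```python
import random


def heads_fraction(n_tosses: int, rng: random.Random) -> float:
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2019)  # fixed seed so the run is reproducible
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} tosses: {heads_fraction(n, rng):.4f} heads")
```

The exact fractions depend on the seed, but the spread around 0.5 shrinks as the number of tosses grows.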

Whatever the true win probability between two teams is, the simulated estimate gets closer to it with more simulated games. Since running 5000 games doesn't take much time (15 seconds, give or take), this seems like a good number: it will either produce a strong favorite or show that two teams are close to evenly matched. As you'll see from the predictions, some teams are strongly favored, while others are nearly even.
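A rough way to quantify "closer with more games" is the standard error of a win percentage, sqrt(p(1 - p) / n). The sketch below is a back-of-the-envelope calculation, not something OOTP reports: it assumes each simulated game is independent with the same win probability p, and uses the usual normal approximation with z = 1.96 for a 95% margin.

```python
import math


def win_pct_margin(p: float, n_games: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a win percentage estimated from
    n_games independent simulated games (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n_games)


for n in (100, 1000, 5000):
    print(f"n = {n:>4}: +/- {win_pct_margin(0.5, n):.1%}")
```

At p = 0.5, the margin is about +/- 9.8 percentage points for 100 games, 3.1 for 1000, and 1.4 for 5000. So with 5000 simulations, a split like 51% to 49% is still within the noise, while a split like 60% to 40% clearly is not.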

Predictions that are close to 50/50 won't provide much insight from a predictive perspective. When the actual games are played, an incorrect prediction in such a matchup can be attributed to randomness, much like a coin toss. For teams that should win handily, however, we should reasonably expect the predictions to play out.

We can't judge the process on just one day of games, though. It's highly likely that strong teams will have off days. Have you ever flipped a coin and had it come up heads twice in a row? My guess is that this has happened often in your life (it has in mine); for a fair coin, the chance of two heads in a row is 25%, and the chance of two matching results (either heads or tails) is 50%. We need to measure the accuracy of the simulation module over many days of games, which a season should provide.
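The coin-flip reasoning can be made exact with the binomial distribution. This sketch (standard library only) computes the chance that a predictor who is right with probability p on each game gets at least k of n games right. It assumes games are independent and, for pure guessing, p = 0.5, ignoring factors such as home-field advantage.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that a predictor who is
    right with probability p on each game gets at least k of n games right."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Two flips of a fair coin: both heads.
print(prob_at_least(2, 2))  # 0.25
# One day of 15 games, guessing at random: at least 8 right.
print(prob_at_least(8, 15))  # 0.5
# A longer stretch of 150 games: at least 90 right (60%) by pure guessing.
print(prob_at_least(90, 150))
```

The middle case shows why one day is not enough: a pure guesser gets 8 or more of 15 games right exactly half the time. Over 150 games, by contrast, a 60% hit rate is very unlikely to come from guessing alone, which is why a full season is a much better test.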

One caveat: I am new to OOTP. I bought it last year (OOTP 19) but didn't do much with it. Because AI, stats, and data are so relevant to baseball, my goal is to use it extensively this year, and I'm happy to document what I learn about the game in this series.

For the predictions, I used the number of games won as the deciding factor and the runs scored divided by the runs allowed as a strength indicator. An alternative would be to declare as the winner whichever team scored the most runs over the 5000 simulations (again, as if it were one game lasting 45,000 innings). In most cases, however, the team that wins more games and the team that scores more runs coincide. On rare occasions in the simulations they don't, which is likely attributable to low-scoring games.
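The rare disagreement between the two criteria is easy to construct. This hypothetical Python sketch assumes each simulated game is recorded as a (team A runs, team B runs) pair (an illustrative layout, not OOTP's export format) and checks whether the team with more wins also scored more total runs.

```python
from typing import List, Tuple


def leader(x: int, y: int) -> str:
    """Return 'A', 'B', or 'tie' depending on which value is larger."""
    if x > y:
        return "A"
    if y > x:
        return "B"
    return "tie"


def winners_agree(results: List[Tuple[int, int]]) -> bool:
    """True if the team with more simulated wins also scored more runs."""
    wins_a = sum(1 for a, b in results if a > b)
    wins_b = sum(1 for a, b in results if b > a)
    runs_a = sum(a for a, _ in results)
    runs_b = sum(b for _, b in results)
    return leader(wins_a, wins_b) == leader(runs_a, runs_b)


# Team A wins two one-run games; team B wins one blowout.
print(winners_agree([(3, 2), (3, 2), (1, 10)]))  # False
```

Here team A wins more games (2 to 1) while team B scores more runs (14 to 7), the kind of split that close, low-scoring wins can produce.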

Here are the predictions given by the simulation module:

Baltimore Orioles vs. New York Yankees (Home)

Predicted Winner: New York Yankees
Strength Indicator: 1.42

New York Mets vs. Washington Nationals (Home)

Predicted Winner: Washington Nationals
Strength Indicator: 1.06

St. Louis Cardinals vs. Milwaukee Brewers (Home)

Predicted Winner: St. Louis Cardinals
Strength Indicator: 1.08

Atlanta Braves vs. Philadelphia Phillies (Home)

Predicted Winner: Atlanta Braves
Strength Indicator: 1.06

Detroit Tigers vs. Toronto Blue Jays (Home)

Predicted Winner: Toronto Blue Jays
Strength Indicator: 1.14

Houston Astros vs. Tampa Bay Rays (Home)

Predicted Winner: Houston Astros
Strength Indicator: 1.15

Chicago Cubs vs. Texas Rangers (Home)

Predicted Winner: Chicago Cubs
Strength Indicator: 1.26

Los Angeles Angels vs. Oakland Athletics (Home)

Predicted Winner: Los Angeles Angels
Strength Indicator: 1.04

Pittsburgh Pirates vs. Cincinnati Reds (Home)

Predicted Winner: Cincinnati Reds
Strength Indicator: 1.07

Colorado Rockies vs. Miami Marlins (Home)

Predicted Winner: Colorado Rockies
Strength Indicator: 1.46

Cleveland Indians vs. Minnesota Twins (Home)

Predicted Winner: Cleveland Indians
Strength Indicator: 1.15

San Francisco Giants vs. San Diego Padres (Home)

Predicted Winner: San Diego Padres
Strength Indicator: 1.06

Arizona Diamondbacks vs. Los Angeles Dodgers (Home)

Predicted Winner: Los Angeles Dodgers
Strength Indicator: 1.14

Chicago White Sox vs. Kansas City Royals (Home)

Predicted Winner: Kansas City Royals
Strength Indicator: 1.15

Boston Red Sox vs. Seattle Mariners (Home)

Predicted Winner: Boston Red Sox
Strength Indicator: 1.46

Disclaimer: The above predictions are for information purposes only. Please do not misconstrue this as any advice on which teams to choose for any purposes, including monetary gain.

Actual Results

Baltimore Orioles vs. New York Yankees
Predicted Winner: New York Yankees
Actual Winner: New York Yankees
Predicted Correctly

New York Mets vs. Washington Nationals
Predicted Winner: Washington Nationals
Actual Winner: New York Mets
Predicted Incorrectly

St. Louis Cardinals vs. Milwaukee Brewers
Predicted Winner: St. Louis Cardinals
Actual Winner: Milwaukee Brewers
Predicted Incorrectly

Atlanta Braves vs. Philadelphia Phillies
Predicted Winner: Atlanta Braves
Actual Winner: Philadelphia Phillies
Predicted Incorrectly

Detroit Tigers vs. Toronto Blue Jays
Predicted Winner: Toronto Blue Jays
Actual Winner: Detroit Tigers
Predicted Incorrectly

Houston Astros vs. Tampa Bay Rays
Predicted Winner: Houston Astros
Actual Winner: Houston Astros
Predicted Correctly

Chicago Cubs vs. Texas Rangers
Predicted Winner: Chicago Cubs
Actual Winner: Chicago Cubs
Predicted Correctly

Los Angeles Angels vs. Oakland Athletics
Predicted Winner: Los Angeles Angels
Actual Winner: Oakland Athletics
Predicted Incorrectly

Pittsburgh Pirates vs. Cincinnati Reds
Predicted Winner: Cincinnati Reds
Actual Winner: Cincinnati Reds
Predicted Correctly

Colorado Rockies vs. Miami Marlins
Predicted Winner: Colorado Rockies
Actual Winner: Colorado Rockies
Predicted Correctly

Cleveland Indians vs. Minnesota Twins
Predicted Winner: Cleveland Indians
Actual Winner: Minnesota Twins
Predicted Incorrectly

San Francisco Giants vs. San Diego Padres
Predicted Winner: San Diego Padres
Actual Winner: San Diego Padres
Predicted Correctly

Arizona Diamondbacks vs. Los Angeles Dodgers
Predicted Winner: Los Angeles Dodgers
Actual Winner: Los Angeles Dodgers
Predicted Correctly

Chicago White Sox vs. Kansas City Royals
Predicted Winner: Kansas City Royals
Actual Winner: Kansas City Royals
Predicted Correctly

Boston Red Sox vs. Seattle Mariners
Predicted Winner: Boston Red Sox
Actual Winner: Seattle Mariners
Predicted Incorrectly (the game was still being played as of this writing)
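The day's tally can be checked with a few lines of Python. The predicted and actual winners below are copied from the results list; the Boston vs. Seattle game is flagged as not final because it was still in progress when the results were recorded. With only one day of games, the sample is far too small to say anything yet about the simulator's accuracy.

```python
# (predicted winner, reported winner, game final?) for each March 28 game,
# copied from the results list. Boston vs. Seattle was still in progress.
games = [
    ("New York Yankees", "New York Yankees", True),
    ("Washington Nationals", "New York Mets", True),
    ("St. Louis Cardinals", "Milwaukee Brewers", True),
    ("Atlanta Braves", "Philadelphia Phillies", True),
    ("Toronto Blue Jays", "Detroit Tigers", True),
    ("Houston Astros", "Houston Astros", True),
    ("Chicago Cubs", "Chicago Cubs", True),
    ("Los Angeles Angels", "Oakland Athletics", True),
    ("Cincinnati Reds", "Cincinnati Reds", True),
    ("Colorado Rockies", "Colorado Rockies", True),
    ("Cleveland Indians", "Minnesota Twins", True),
    ("San Diego Padres", "San Diego Padres", True),
    ("Los Angeles Dodgers", "Los Angeles Dodgers", True),
    ("Kansas City Royals", "Kansas City Royals", True),
    ("Boston Red Sox", "Seattle Mariners", False),
]


def hit_rate(rows, finals_only=False):
    """Return (correct, total) over the selected games."""
    selected = [r for r in rows if r[2] or not finals_only]
    correct = sum(pred == actual for pred, actual, _ in selected)
    return correct, len(selected)


for label, finals_only in (("All games", False), ("Final games only", True)):
    correct, total = hit_rate(games, finals_only)
    print(f"{label}: {correct} of {total} correct ({correct / total:.0%})")
# All games: 8 of 15 correct (53%)
# Final games only: 8 of 14 correct (57%)
```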

About the Author James

James is a data science writer with several years' experience in writing and technology. He helps people who are trying to break into technology fields such as data science. If this is something you've been trying to do, you've come to the right place: you'll find resources here to help you accomplish it.
