The secret about the start of the 2019 baseball season is that it has already started. That's right! The season officially began on March 20, 2019. Only two teams played that day and the next: the Seattle Mariners and the Oakland A's. They played two consecutive games in Japan.
I think the reason for the early start was the retirement of Mariners legend Ichiro Suzuki. March 21st was his last game.
For all the other teams, the season begins on (or after) March 28, 2019.
Statistics are inherent in baseball, and data is a huge factor in analyzing the game. Both have been widely used since the early days of baseball, although we have much more data to work with today.
I picked up a copy of Out of the Park Baseball, which is a baseball simulation program. My goal is to track the teams this season using the simulator to see how well it does. Apparently, the program was used to accurately predict the World Series in the past.
Want Your Own Copy of the Latest Version of Out of the Park Baseball? I am authorized to sell this software and you can obtain your copy here. If you purchase through this website, I will receive a commission. The price is not affected by this arrangement and it helps in a small way to keep this website rolling. Besides, it's a fun game to play!
I wanted to do this last year, but the biggest problem I had was getting accurate data into the simulator. Prior to OOTP 20, the program did not update real-time stats. Any simulation would have required me to update the stats for all players and all teams by hand. Let's just say that is a gargantuan task, and that's putting it mildly.
However, OOTP 20 now has the option of updating live stats. This makes it possible to run pregame simulations to see how well the program does. Even if I miss a week here and there, it won't matter as the data will be up-to-date.
As an aside, I did run the simulator for the two games between the Mariners and the A's before they played. The simulator predicted the Mariners to win both. Unfortunately, I didn't get to write about it before the games were played. You can interpret that any way you like. Remember, this isn't about measuring my skill in predicting winning teams. It's about measuring the accuracy of the AI engine in the software. I have no reason to fudge numbers or lie about results.
For the purposes of this discussion, the team that wins the most games out of 5000 simulations is the one I am declaring the predicted winner. This may not be the best measure, since a team can win a game by a single run. I also provide a strength indicator, which is simply Runs Scored / Runs Allowed over the 5000 simulations.
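As a rough sketch of this bookkeeping, the Python below tallies a list of simulated final scores into a predicted winner and a strength indicator. The function name summarize, the tuple layout, and the sample scores are all made up for illustration; they are not OOTP output or its export format.

```python
from collections import Counter


def summarize(results, team_a, team_b):
    """Tally simulated games given as (runs_a, runs_b) tuples.

    Hypothetical helper: the tuple layout is an assumption for this
    sketch, not an OOTP export format.
    """
    wins = Counter()
    runs_a = runs_b = 0
    for a, b in results:
        runs_a += a
        runs_b += b
        # Baseball games do not end in ties, so one side always wins.
        wins[team_a if a > b else team_b] += 1
    winner = team_a if wins[team_a] >= wins[team_b] else team_b
    # Strength indicator from team_a's side: runs scored / runs allowed.
    strength_a = runs_a / runs_b if runs_b else float("inf")
    return {"winner": winner, "wins": dict(wins), "strength_a": strength_a}


# Five made-up games; a real run would hold 5000 of these.
sample = [(3, 2), (1, 0), (2, 9), (5, 4), (4, 3)]
summary = summarize(sample, "Mariners", "A's")
print(summary)
```

In this made-up sample the first team wins four of five games yet is outscored 15 to 18, which is exactly the case where games won and the runs ratio point in different directions.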
Why 5000 Simulations?
No team plays the same opponent 5000 times in one season, let alone without either team's stats being updated after each game. The scenario is unrealistic. You could almost think of it as one massive game of 45,000 innings (5000 games x 9 innings each), which is equally unlikely. However, the law of large numbers can help here. It basically says that the more simulations you run, the closer the average result gets to its expected value, whatever that number happens to be. For instance, in 100 coin tosses you may well get 60 heads and 40 tails. But if you ran a million coin tosses, the split would be very close to 50% heads and 50% tails.
Whatever the true win rate is for a matchup, the simulated win rate will get closer to it with more runs. Since running 5000 games doesn't take much time (15 seconds give or take), this seems like a good number that will either produce a strong winner or show two teams that are close to evenly matched. As you'll see from the predictions, some teams are favored strongly, while others are even.
Predictions that are close to 50/50 won't provide much insight from a predictive perspective. When the actual games are played, a wrong prediction in such a matchup can be attributed to randomness, much like a coin toss. For the teams that should win handily, however, we should reasonably expect those predictions to play out.
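The coin toss example is easy to try for yourself. Here is a minimal Python sketch that flips a simulated fair coin 100, 5000, and a million times and prints the fraction of heads each time. The function name and the seed value are my own choices for the illustration.

```python
import random


def heads_fraction(n_tosses, seed):
    """Return the fraction of heads in n_tosses simulated fair coin flips."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


for n in (100, 5000, 1_000_000):
    print(n, heads_fraction(n, seed=2019))
```

The exact numbers depend on the seed, but the fraction for a million tosses should sit much closer to 0.5 than the fraction for 100 tosses usually does.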
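To put a rough number on how much the run count matters, the sketch below uses the standard normal approximation for a proportion to estimate a 95% margin of error on a simulated win rate. It assumes each simulated game is independent of the others, which is my assumption and not something I know about how OOTP works internally. The function name is made up for the illustration.

```python
import math


def margin_of_error(win_rate, n_sims, z=1.96):
    """Approximate 95% margin of error for a simulated win rate.

    Uses the normal approximation to the binomial, which assumes the
    simulated games are independent of each other.
    """
    return z * math.sqrt(win_rate * (1 - win_rate) / n_sims)


for n in (100, 1000, 5000):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under that assumption, an evenly matched pair carries a margin of roughly 1.4 percentage points at 5000 simulations, compared with almost 10 points at 100 simulations.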
We can't judge the process on just one day of games, though. Even strong teams are likely to have off days. Have you ever flipped a coin twice and had it come up heads both times? My guess is that this has happened often in your life (it has in mine); there is a 25% chance of it on any two flips, and a 50% chance of getting the same side twice, whether heads or tails. We need to measure the accuracy of the simulation module over many days of games, which a season should provide.
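A quick way to check streak odds like these is to list every outcome of two flips and count. The Python sketch below does that, and then applies the same idea to a favored team. The 60% win probability is a hypothetical figure I picked for the example, not a simulator result, and the calculation assumes the games are independent.

```python
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT

# Two heads in a row: one outcome out of four.
p_two_heads = sum(o == ("H", "H") for o in outcomes) / len(outcomes)

# The same side twice, heads or tails: two outcomes out of four.
p_same_side = sum(o[0] == o[1] for o in outcomes) / len(outcomes)


def p_losing_streak(win_prob, games):
    """Chance a team loses `games` straight, assuming independent games."""
    return (1 - win_prob) ** games


print(p_two_heads, p_same_side, p_losing_streak(0.6, 2))
```

Even a team given a 60% chance in each game would drop two straight about 16% of the time, which is why a single day or two of results says little about the simulator.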
One caveat: I am new to the game of OOTP. I bought it last year (OOTP 19) and didn't do much with it. As I feel that the topic of AI, stats, and data is quite relevant to baseball, my goal is to use it extensively this year. What I learn about the game, I will be happy to document in this series.
For the predictions, I used the number of games won as the deciding factor, and the runs scored divided by the runs allowed as a strength indicator. It would also be possible to declare whichever team scored the most runs over the 5000 simulations the winner (again, as if it were one game of 45,000 innings). In most cases, the winner based on games won and the higher-scoring team coincide. On rare occasions in the simulation they don't, which is likely attributable to low-scoring games.
Here are the predictions given by the simulation module:
Disclaimer: The above predictions are for information purposes only. Please do not misconstrue this as any advice on which teams to choose for any purposes, including monetary gain.
James is a data science writer who has several years' experience in writing and technology. He helps others who are trying to break into the technology field like data science. If this is something you've been trying to do, you've come to the right place. You'll find resources to help you accomplish this.