Math Vacation: August 2024

Wednesday, August 21, 2024

Baserunning Runs WAR Adjustment Proposal

(Image: https://www.iconfinder.com/uberux)

Below is a paper presented at the 2024 Sabermetrics, Scouting, and the Science of Baseball seminar at Illinois Tech, August 24-25.

“Baserunning Runs WAR Adjustment Proposal”

Samuel Rees

Naperville North High School

August 24, 2024

Introduction and Abstract

Before the 2023 MLB season, major changes to the rules and standards of the game shifted the value of baserunning. With the implementation of a pitch clock, pitchers and hitters have less time to recover and reset between pitches and at-bats. Pitchers are also limited in how many times they can attempt to pick off a runner. A pitcher can attempt a pickoff twice per plate appearance with no penalty, but if he tries a third time and does not record an out, the runner automatically advances a base. The equipment standards have also changed. Larger bases shorten the distance between bases and thus encourage runners to attempt more steals. All of these changes led to shorter games and increased run production in 2023 relative to the year prior. However, these changes have also affected how much value a player produces through baserunning. Therefore, the calculation of Baserunning Runs Value within Wins Above Replacement (WAR) needs an adjustment to account for these rule changes.

Starting with the 2018 season, which featured 2,474 stolen bases, and continuing through 2022 (excluding 2020), the league-wide stolen base total decreased by about 40 per year. Had this trend continued without the rule changes, 2023 would have seen roughly 2,437 stolen bases. That figure pales in comparison to the actual 2023 total of 3,503 (1).

I will now present data, acquired from FanGraphs, that show changes in stolen base production and other factors to illustrate why the calculation needs to be updated. According to FanGraphs, Weighted Stolen Base Runs (wSB), a component of the WAR calculation, is computed in part from a set of seasonal constants. These constants are recalculated every season, going back to 1871, to account for different run environments and other factors.

However, the constant for stolen bases (runSB) has remained fixed at the same value, 0.2 runs, since the 1871 season. It has not been changed to account for the new rules, even though those rules have affected how players steal bases and how much modern baserunning affects the game (2).

To calculate runs added from stolen bases, I use the FanGraphs formula, taken directly from their website. The formula combines two events: stolen bases and times caught stealing. It compares each player's stolen base runs created per opportunity with the league's stolen base runs created per opportunity. Its constants are derived from historical league data. Its events are each stolen base and caught stealing.

(SB x runSB) + (CS x runCS) - (lgwSB x (1B + BB + HBP - IBB)) = Stolen Base Runs

In this formula, SB represents stolen bases and runSB represents the run value of a stolen base. CS represents caught stealing events and runCS represents the run value of a caught stealing. runCS is negative, because getting caught costs runs. The formula also contains lgwSB, the league-average stolen base runs created per opportunity, with opportunities measured as 1B + BB + HBP - IBB. The calculation for lgwSB is:

lgwSB = ((SB x runSB) + (CS x runCS)) / (1B + BB + HBP - IBB)
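As a check on the arithmetic, the two formulas can be written as a minimal Python sketch. It follows FanGraphs' sign convention, in which runCS is negative. The function names (opportunities, league_wsb, player_wsb) are hypothetical and not taken from any FanGraphs code. The usage lines plug in the 2023 league and Lindor inputs quoted in the comparison that follows.

```python
# Sketch of the FanGraphs-style weighted stolen base runs (wSB) formulas.
# Assumed sign convention (as on FanGraphs): run_cs is negative.


def opportunities(singles: int, walks: int, hbp: int, ibb: int) -> int:
    """Stolen base opportunities: 1B + BB + HBP - IBB."""
    return singles + walks + hbp - ibb


def league_wsb(sb, cs, singles, walks, hbp, ibb, run_sb, run_cs):
    """League stolen base runs created per opportunity (lgwSB), from league totals."""
    return (sb * run_sb + cs * run_cs) / opportunities(singles, walks, hbp, ibb)


def player_wsb(sb, cs, singles, walks, hbp, ibb, run_sb, run_cs, lg_wsb):
    """A player's stolen base runs relative to the league rate (wSB)."""
    return (sb * run_sb + cs * run_cs
            - lg_wsb * opportunities(singles, walks, hbp, ibb))


# 2023 inputs as quoted in the worked example (constants copied from the text).
lg_2023 = league_wsb(3503, 866, 26031, 15819, 2112, 474, 0.2, -0.422)
lindor_2023 = player_wsb(31, 4, 87, 66, 12, 1, 0.2, -0.425, round(lg_2023, 5))
print(f"lgwSB 2023 = {lg_2023:.5f}, Lindor wSB = {lindor_2023:.4f}")
```

Rounding lgwSB to five decimals before using it mirrors the hand calculation. It reproduces the 0.00771 and 3.2356 figures for 2023.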

In the lgwSB formula, SB, CS, 1B, BB, HBP, and IBB are league-wide totals. In the Stolen Base Runs formula, they are the player's own counting stats. With these formulas, we can compare Stolen Base Runs Value for two players from two different seasons.

2023 Francisco Lindor vs. 2019 Jarrod Dyson

2023 Francisco Lindor and 2019 Jarrod Dyson had nearly identical stolen base success rates in their respective seasons (88.6% for Lindor and 88.2% for Dyson). Both seasons were played in the same league within the last five years (3). For these equations, we first calculate the lgwSB value for each season:

2023 lgwSB: ((3503 x 0.2) + (866 x -0.422)) / (26,031 + 15,819 + 2112 - 474) = 0.00771

2019 lgwSB: ((2280 x 0.2) + (832 x -0.435)) / (25,947 + 15,895 + 1984 - 753) = 0.00218

I then use each season's value to calculate each player's stolen base run values:

2023 Francisco Lindor: (0.2 x 31) + (-0.425 x 4) - (0.00771 x (87 + 66 + 12 - 1)) = 3.2356

2019 Jarrod Dyson: (0.2 x 30) + (-0.426 x 4) - (0.00218 x (72 + 47 + 1 - 0)) = 4.0344

The calculated difference of 0.7988 between Lindor's and Dyson's Stolen Base Run Values reflects both the rule changes and other adjustment factors, such as the league run environment.

This difference demonstrates the need to separate players' stolen base run values into two eras: Pre-Pitch Clock and Post-Pitch Clock. As shown above, the runs Lindor created from stolen bases in 2023 are worth less than the runs Dyson created from steals in 2019. This holds even though their stolen base and caught stealing totals are nearly identical.

I propose that the stolen base constant for seasons from 2023 onward be lowered. This would reflect the change in run environment and distinguish the changing difficulty of stealing bases between the pre- and post-pitch clock eras.

Data Used

To make a numerical adjustment to baserunning Wins Above Replacement and how it is calculated, I needed to analyze how much of an advantage baserunners have now, given the new rule environment.

First, consider the distance between bases. The bases increased in size from 15 inches square to 18 inches square. Consequently, the distances from first to second base and from second to third base decreased by 4.5 inches. The distances from home plate to first base and from third base to home plate decreased by 3 inches. With the bases slightly closer together, a baserunner has less ground to cover before reaching the next base on a steal attempt. Additionally, the larger bases can reduce over-sliding and lead to more positive outcomes for baserunners.

The larger bases also make it easier for runners to avoid a tag, as there is now more area for them to reach safely during a tag play (4).

Next, consider pickoff attempts. In 2022, pitchers averaged 6.07 pickoff attempts per game. In 2023, that number fell to 3.94 attempts per game (through the first six weeks of the season). Fewer pickoff attempts are allowed, and runners can time their steal attempts to the pitch clock. Together, these changes made it easier for runners to advance a base.

Finally, consider the pitch clock. Pitchers cannot look over at runners as often or as long as they previously could. This shifts the mental tactics used in pickoffs and makes it easier for runners to fool pitchers. In 2022, the average time between pitches with runners on base was 23.1 seconds. In 2023, this number dropped to 19.0 seconds (5). A 4.1-second difference may seem insignificant. However, one or two extra seconds could decide whether a pitcher can attempt a pickoff successfully, so it must be considered. With the runners-on pitch clock shortened to 18 seconds for the 2024 season, we can expect these times to decrease even further.

Explanation of Data

For the reasons outlined above, the constant for stolen base events should be updated to reflect the new rules. Because of the shorter distance between bases, the decrease in pickoff attempts, and the implementation of the pitch clock, stolen bases need to be valued differently. Furthermore, the increase in successful stolen bases shows a disparity between how stolen bases are calculated and how they are valued in today's game.

Explanation of Value of Outs

The cost of an out on the basepaths is typically about twice the value of a stolen base (6). Getting caught stealing removes the runner from the basepaths and uses up one of a team's 27 available outs. Because of this gap between the cost of an out and the value of an advanced base, teams would probably rather keep a runner where he is than risk an out on the basepaths. However, under the new rules, sending a runner carries less risk because of all of the baserunning advantages.
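The "about twice" comparison can be made concrete with a small sketch. It uses the 2023 constants quoted in the worked example (runSB = 0.2, runCS = -0.422) under FanGraphs' sign convention. The function names are hypothetical. The break-even rate is the success rate at which a runner's steal attempts net zero runs from these two constants alone. It deliberately ignores the league-average lgwSB term in the full formula.

```python
# Compares the run cost of a caught stealing with the run value of a steal.
# Constants are the 2023 values quoted in the worked example (run_cs negative).
RUN_SB = 0.2
RUN_CS = -0.422


def cs_to_sb_ratio(run_sb: float, run_cs: float) -> float:
    """How many stolen bases' worth of runs one caught stealing erases."""
    return -run_cs / run_sb


def break_even_rate(run_sb: float, run_cs: float) -> float:
    """Success rate p at which p * run_sb + (1 - p) * run_cs equals zero."""
    return -run_cs / (run_sb - run_cs)


print(f"ratio = {cs_to_sb_ratio(RUN_SB, RUN_CS):.2f}")                     # ratio = 2.11
print(f"break-even success rate = {break_even_rate(RUN_SB, RUN_CS):.1%}")  # 67.8%
```

Lowering runSB, as the proposal suggests, would raise this break-even rate. Each successful steal would then need to be backed by a higher success rate to add value.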

Due to these rule changes, pitchers now have fewer ways to put runners out on the basepaths, and baserunners have more advantages than before 2023. With fewer ways to put runners out, a runner is less likely to be thrown out. As a result, the value of a stolen base relative to an out is not the same as it was in the years before 2023.

Special thanks to my analytics advisor Connor Binnig, MSC, University of Chicago.

(1) 2022 & 2023 Team Statistics, Baseball Reference

(2) Seasonal Constants, FanGraphs

(3) Francisco Lindor 2023 Statistics, Baseball Reference. Jarrod Dyson 2019 Statistics, Baseball Reference.

(4) Basepath measurements with new bigger bases, MLB.com

(5) Statcast Pitch Tempo Leaderboard, Baseball Savant

(6) Seasonal Constants, FanGraphs

Saturday, August 17, 2024

A375119 Contribution to the OEIS

This is my 21st published sequence in the OEIS. I have a few other sequences planned based on Kruskal counts (see post: https://jamesmacmath.blogspot.com/2024/08/kruskal-count-with-prime-omega.html).

A375119

Begin A060403 with n instead of 1; a(n) is the position in the new sequence at which it generates the same numbers as A060403 or a(n)=0 if it doesn't.

1, 4, 2, 1, 3, 3, 6, 1, 2, 2, 5, 5, 1, 5, 5, 6, 4, 4, 10, 4, 1, 4, 5, 5, 3, 3, 9, 9, 9, 1, 3, 9, 4, 4, 2, 1, 8, 8, 8, 2, 8, 5, 3, 3, 1, 2, 27, 7, 7, 4, 7, 5, 2, 1, 3, 3, 26, 6, 6, 4, 6, 26, 1, 2, 3, 3, 25, 5, 5, 25, 5, 25, 1, 2, 3, 3, 24, 4, 4, 3, 4, 24, 113

	OFFSET	`1,2`
	COMMENTS	`The indices of the matching entries of A060403 and this sequence do not necessarily have to be the same (see Examples).`
	LINKS	`James C. McMahon, Table of n, a(n) for n = 1..1000` `Wikipedia, Kruskal count`
	EXAMPLE	`Using () to indicate the point at which the new sequence generates the same numbers as A060403:` `A060403: 1, 4, 8, 13, 21, 30, 36, 45... a(1)=1` `Start=2: 2, 5, 9, (13), 21, 30, 36, 45... a(2)=4` `Start=3: 3, (8), 13, 21, 30, 36, 45... a(3)=2` `Start=4: (4), 8, 13, 21, 30, 36, 45... a(4)=1`
	MATHEMATICA	`oneseq=NestList[#+Length[Select[Characters[IntegerName[#, "Words"]], LetterQ ]]&, 1, 200] (* oneseq is A060403 *); seq={}; Do[ i=1; s=n; While[!MemberQ[oneseq, s], s=s+Length[Select[Characters[IntegerName[s, "Words"]], LetterQ ]]; i++]; AppendTo[seq, i], {n, 83}]; seq`
	CROSSREFS	`Cf. A060403.` `Sequence in context: A271310 A071406 A337063 * A010311 A346972 A326485` `Adjacent sequences: A375115 A375116 A375118 * A375122 A375123 A375124`
	KEYWORD	`nonn,base,new`
	AUTHOR	`James C. McMahon, Jul 30 2024`
	STATUS	`approved`
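For readers without Mathematica, here is a Python sketch of the same computation. It assumes letter counts use American English number names without "and" (for example, 105 is "one hundred five"). That is our understanding of what IntegerName[n, "Words"] returns, but we have not verified it beyond small n. The helper names are hypothetical, and the sketch handles n below one billion. Unlike the Mathematica loop, the walk returns None instead of looping forever if it runs past the computed prefix of A060403.

```python
from typing import List, Optional

# Only letters are counted, matching the LetterQ filter in the Mathematica code.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def _name_below_thousand(n: int) -> str:
    """Letters of the English name of 0 <= n < 1000 ('' for 0), without 'and'."""
    name = ""
    if n >= 100:
        name += ONES[n // 100] + "hundred"
        n %= 100
    if n >= 20:
        name += TENS[n // 10] + ONES[n % 10]
    else:
        name += ONES[n]
    return name


def letter_count(n: int) -> int:
    """Number of letters in the American English name of n, 1 <= n < 10**9."""
    if not 1 <= n < 10**9:
        raise ValueError("n must be between 1 and 999,999,999")
    name = ""
    for value, scale in ((10**6, "million"), (10**3, "thousand")):
        if n >= value:
            name += _name_below_thousand(n // value) + scale
            n %= value
    return len(name + _name_below_thousand(n))


def a060403(terms: int) -> List[int]:
    """A060403: start at 1 and repeatedly add the letter count of the current term."""
    seq = [1]
    while len(seq) < terms:
        seq.append(seq[-1] + letter_count(seq[-1]))
    return seq


def a375119(n: int, base: List[int]) -> Optional[int]:
    """Term number at which the walk started at n first lands on a term of base.

    Returns None once the walk passes the last computed term of base, where
    the answer cannot be decided from this prefix."""
    members = set(base)
    position, s = 1, n
    while s not in members:
        if s > base[-1]:
            return None
        s += letter_count(s)
        position += 1
    return position


base = a060403(200)
print(base[:8])                                  # [1, 4, 8, 13, 21, 30, 36, 45]
print([a375119(n, base) for n in range(1, 12)])  # [1, 4, 2, 1, 3, 3, 6, 1, 2, 2, 5]
```

Start=2 walks 2, 5, 9, 13, so a(2)=4, matching the example in the entry.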

The Look and Say Sequence, A005150

The Look and Say Sequence begins 1, 11, 21, 1211, 111221, 312211...

Each term is a description of the prior term. For example, the second term, 11, is read as one 1 and describes the previous term, 1. The fifth term, 111221, is read as one 1, one 2, and two 1s and describes the prior term, 1211.
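The "read aloud" rule translates directly into code. Each run of identical digits becomes its length followed by the digit. Below is a minimal Python sketch using the standard library's itertools.groupby. The function names are our own.

```python
from itertools import groupby


def look_and_say(term: str) -> str:
    """Describe a term: each run of equal digits becomes '<run length><digit>'."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def look_and_say_terms(count: int, start: str = "1") -> list:
    """First `count` terms of the look-and-say sequence beginning at `start`."""
    terms = [start]
    while len(terms) < count:
        terms.append(look_and_say(terms[-1]))
    return terms


print(look_and_say_terms(6))  # ['1', '11', '21', '1211', '111221', '312211']
```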

The sequence is often attributed to the mathematician John Conway. However, according to the On-Line Encyclopedia of Integer Sequences (OEIS), the sequence's first mention dates back to the 1977 International Mathematical Olympiad in Belgrade, Yugoslavia. In the OEIS, the sequence is A005150.

Recently, Scientific American wrote about the sequence in its puzzle section: https://www.scientificamerican.com/game/math-puzzle-next-sequence/

Friday, August 16, 2024

Prime Number Magnet

(Image: https://opensource.org/license/MIT)

A prior post showed that every prime number greater than 3 can be expressed as 6n +/- 1. A natural follow-up question is whether every multiple of 6 is adjacent to a prime. The short answer is no, but one must check every multiple of 6 up to 120 before finding the first multiple of 6 that is not adjacent to a prime number.

Examples:

1 x 6 = 6 is adjacent to primes 5 and 7

2 x 6 = 12 is adjacent to primes 11 and 13

3 x 6 = 18 is adjacent to primes 17 and 19

4 x 6 = 24 is adjacent to prime 23

However, 20 x 6 = 120 is adjacent only to 119 (composite, 7 x 17) and 121 (composite, 11 x 11)

As with many patterns of integers, it is always worth checking the On-Line Encyclopedia of Integer Sequences. There we find that the sequence {120,144,186,204,216,246,288,300...}, multiples of 6 that are not a prime number +/- 1, is sequence A259826.

Note: the first entry, 120, falls within the first prime gap larger than 8. This is the gap of 14 between 113 and 127.
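The search described above is easy to reproduce. A multiple of 6 qualifies when neither of its neighbors, 6k - 1 and 6k + 1, is prime. Below is a minimal Python sketch using plain trial division, which is adequate for numbers this small. The function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division by 2 and odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def multiples_of_six_without_prime_neighbor(limit: int) -> list:
    """Multiples of 6 up to `limit` for which neither 6k - 1 nor 6k + 1 is prime."""
    return [m for m in range(6, limit + 1, 6)
            if not is_prime(m - 1) and not is_prime(m + 1)]


print(multiples_of_six_without_prime_neighbor(300))
# [120, 144, 186, 204, 216, 246, 288, 300]
```

The output agrees with the opening terms of A259826 quoted above.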

Saturday, August 10, 2024

The Mystery Calculator

(Image: https://www.iconfinder.com/ibobicon)

Below is a link to the "mystery calculator." The user first chooses how many cards to use, from four to seven. Each card displays several different numbers. Next, the user picks a secret number that appears on any of the cards. Finally, the user selects all the cards that display that number. The "calculator" then determines the number the user chose.

https://eddmann.com/mystery-calculator-clojurescript/

How does this work? Consider the option in which five cards are displayed. The numbers 1 through 31 are shown on the five cards, and most of them appear on more than one card. When the user selects the cards showing their number, each card gets a yes (it has my number) or a no (it doesn't). For five cards there are 2^5 = 32 possible combinations of yes and no answers. One of these is the all-no combination, which no number from 1 to 31 produces. That leaves exactly 31 combinations, so each number displayed on the cards has its own unique combination. For example, the number 1 is shown only on the first card, while 31 is shown on all five cards.
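One standard way to build such cards is by binary digits. Card k lists every number whose k-th binary digit is 1, and the secret number is recovered by adding the smallest number on each selected card. Below is a minimal Python sketch of that construction. It is an assumption that the linked page builds its cards this way; it may order or lay them out differently. The function names are hypothetical.

```python
def make_cards(num_cards: int) -> list:
    """Card k holds every number from 1 to 2**num_cards - 1 with bit k set."""
    largest = 2 ** num_cards - 1
    return [[n for n in range(1, largest + 1) if (n >> k) & 1]
            for k in range(num_cards)]


def reveal(cards: list, answers: list) -> int:
    """Add up the first (smallest) number on every card the user said yes to."""
    return sum(card[0] for card, yes in zip(cards, answers) if yes)


cards = make_cards(5)
secret = 13  # 13 = 8 + 4 + 1, so it appears on the first, third and fourth cards
answers = [secret in card for card in cards]
print(answers, reveal(cards, answers))  # [True, False, True, True, False] 13
```

The first number on card k is 2**k (1, 2, 4, 8, 16 for five cards). Each yes/no pattern is therefore just the secret number written in binary.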

Sunday, August 4, 2024

Kruskal Count with Prime Omega

A Kruskal count (or a Dynkin-Kruskal count) is a sequence of entries in which each entry is based on a property of the prior term. For examples of a Kruskal count, see the post about magic tricks that are based on this concept. Also see Wikipedia: Dynkin-Kruskal Count.

For this post, a series of Kruskal counts will be developed using the prime omega function of the prior entry. Prime omega, sometimes referred to as Big Omega, is the number of prime factors a number has, with multiplicity. Prime omega of 2 is 1 since it has one prime factor. Prime omega of 12 is 3 since it has the prime factors 2*2*3.

For this series of sequences, the first term is designated by a(1)=m, and the formula for each subsequent term is a(n)=a(n-1)+primeomega(a(n-1)). The base sequence of the series has m=2 and the sequence is: {2,3,4,6,8,11,12,15,17,18,21,23,24,28,31,32,37,38,40,44,47…}.

For m=3, the sequence becomes {3,4,6,8,11,12,…}. From the first term of the m=3 sequence and the second term of the base (m=2) sequence onward, the two sequences have the same terms. Likewise, the m=4 sequence matches the base sequence beginning with its first term.

For m=5, the sequence becomes {5,6,8,11,12…}. In this case the terms are the same as the base sequence beginning with its second term (6).

It is conjectured that for every m, the sequence eventually matches up with the base sequence; this has been tested for m up to 30,000. Therefore, to document the entire series, not all terms of each sequence need to be listed. One just needs to note the point at which each m>2 sequence begins to match up with the base sequence. For example, for m=29,052, the matching of terms doesn't begin until the 32nd term of the sequence. This is the maximum among all the sequences up to m=30,000. To fully document the m=29,052 sequence, one would only need to list its first 31 terms, since subsequent terms can be found in the base sequence.

These matching points can be listed as a new sequence and its first 86 terms are:

{1,1,1,2,1,2,1,2,2,1,1,5,4,1,3,1,1,3,2,1,2,1,1,6,2,5,1,5,4,1,1,3,3,2,2,1,1,5,1,4,3,2,1,2,2,1,1,3,2,2,5,1,1,4,2,3,1,2,1,3,2,7,1,7,6,6,5,5,1,4,3,1,1,4,1,2,3,1,1,2,9,9,8,1,8,1}. Note that the first three terms of the base sequence are 2, 3, 4. The sequence above therefore begins 1, 1, 1, because for m=2, m=3, and m=4, the first term is already in the base sequence. For other starting numbers, one must go further before matching occurs. For example, for m=13, the sequence begins to match up with the base sequence at the 5th term.
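The matching positions can also be checked with a short Python sketch alongside the Mathematica programs in this post. It computes Big Omega by trial division, builds a prefix of the base sequence, and walks each m-sequence until it lands on a base term. The function names are hypothetical. The walk returns None if it passes the last computed base term. Reaching large m such as 29,052 would need a longer base prefix than the 1,000 terms used here.

```python
from typing import List, Optional


def big_omega(n: int) -> int:
    """Number of prime factors of n counted with multiplicity (n >= 2)."""
    count, divisor = 0, 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            n //= divisor
            count += 1
        divisor += 1
    if n > 1:  # whatever remains is itself a prime factor
        count += 1
    return count


def omega_walk(start: int, terms: int) -> List[int]:
    """a(1) = start, a(n) = a(n-1) + big_omega(a(n-1))."""
    seq = [start]
    while len(seq) < terms:
        seq.append(seq[-1] + big_omega(seq[-1]))
    return seq


def matching_position(m: int, base: List[int]) -> Optional[int]:
    """Term number at which the m-sequence first lands on a base-sequence term."""
    members = set(base)
    position, s = 1, m
    while s not in members:
        if s > base[-1]:
            return None  # undecided with this prefix of the base sequence
        s += big_omega(s)
        position += 1
    return position


base = omega_walk(2, 1000)
print(base[:21])
print([matching_position(m, base) for m in range(2, 17)])
# [1, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 5, 4, 1, 3]
```

For m=13 the walk is 13, 14, 16, 20, 23, and 23 is in the base sequence, giving the 5th term as stated above.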

The Mathematica program to produce the base sequence is (producing 10,001 terms, the starting value plus 10,000 iterations):

pseq1=NestList[#+PrimeOmega[#]&,2,10000]

This sequence is found in the On-Line Encyclopedia of Integer Sequences (OEIS): A160649.

The following Mathematica program produces the sequence of matching positions. Each entry is the term at which the sequence for m first matches up with the base sequence, starting with m=2 (the base sequence itself) and continuing through higher m's:

pseq={}; Do[ i=1; s=n; While[!MemberQ[pseq1, s], s=s+PrimeOmega[s]; i++]; AppendTo[pseq, i], {n,2, 30000}];pseq

(Image: histogram of the distribution of the terms of this sequence)

This program produces the first 29,999 terms (note: its first entry corresponds to n=2). As noted above, the sequence begins with {1,1,1,2,1,2,1,2,2,1,1,5,4,1,3…} and is not found in the OEIS. However, the author plans to submit it as a proposed sequence.

Update 8/18/2024: A375508 is currently a draft in progress.