The PAT (point after touchdown) kick has become as ceremonial a play as the ceremonial first pitch in baseball! It is not a stretch to argue that people new to American football think that a touchdown is worth 7 points instead of 6. For these newcomers, it might also come as a surprise when they first see the rare scenario in which a team tries for a two-point conversion after a touchdown. Once they realize what is going on, and if they are analytical thinkers, they will be even more confused by the fact that teams do not attempt this play more often!
For an analytical thinker, the decision between the PAT kick and the two-point conversion comes down to which provides the higher expected number of points. To calculate this, one needs two things: the success rate $p_{kick}$ of the PAT kick and the success rate $p_{2pt}$ of the two-point conversion play. Since a kick is worth 1 point and a conversion is worth 2, the expected points are $p_{kick}$ and $2 p_{2pt}$ respectively. Then if $2 p_{2pt} > p_{kick}$, the two-point conversion is preferable! So everything comes down to calculating the success rates $p_{kick}$ and $p_{2pt}$. Here is where data come in handy! The NFL API provides information for every drive, and hence one can obtain all the touchdown plays and the PAT plays (attempts, successes, failures), essentially allowing us to compute $p_{kick}$ and $p_{2pt}$.
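The decision rule described above can be written down in a few lines. This is only a minimal sketch: the function names are hypothetical, and the rates in the usage lines are illustrative placeholders rather than values from the data.

```python
def expected_points(success_rate: float, points: int) -> float:
    """Expected points of a play worth `points` that succeeds with `success_rate`."""
    return points * success_rate


def prefer_two_point(p_kick: float, p_two: float) -> bool:
    """True when the two-point try has the higher expected points."""
    return expected_points(p_two, 2) > expected_points(p_kick, 1)


def break_even_two_point_rate(p_kick: float) -> float:
    """Two-point success rate at which both options have equal expected points."""
    return p_kick / 2


# Illustrative placeholder rates (assumptions, not measured values).
print(prefer_two_point(p_kick=0.99, p_two=0.45))
print(prefer_two_point(p_kick=0.94, p_two=0.50))
print(break_even_two_point_rate(0.94))
```

The break-even function makes the intuition explicit: a two-point try only needs to succeed about half as often as the kick to be worth the same number of points on average.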
I collected information for the past 7 NFL seasons, during which 9021 touchdowns were scored in total. Of these, a whopping 8561 were followed by a PAT kick, with 8245 of them being successful (this ratio is the kick success rate $p_{kick}$). In contrast, out of the 460 two-point conversion attempts, 235 were successful, leading to $p_{2pt} \approx 0.51$. Therefore the expected number of points from a kick is 0.981, while the expected number of points from the two-point conversion is larger and equal to 1.02. Is it statistically larger though? Calculating the 95% confidence intervals for the expected points, we get [0.97, 0.983] for the PAT kick and [0.95, 1.11] for the two-point conversion play. Hence, statistically we cannot say with much confidence that the two play-call scenarios differ in expected points.
Nevertheless, we also need to recognize that, given the small effect size to be detected (if any), the sample size at hand (and in particular the small number of two-point conversion attempts that we can work with) limits the statistical power of the comparison. In brief, the statistical power captures the probability that our test can detect an effect, if the latter exists. Computing the statistical power of our test (using the pwr package in R), we obtain a power of less than 0.6 (even at a significance level of 0.1). A typical value for a test to be deemed “powerful” is 0.8, and therefore the overlapping confidence intervals might be just an artifact of an underpowered test!
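For readers who want to reproduce this kind of calculation, here is a rough sketch in Python using only the standard library. It is not the computation used for this post: it assumes a normal-approximation (Wald) interval for the success rate and a two-sided z-test on the difference in expected points, and the function names are hypothetical. Because the method differs from the one behind the numbers quoted above (which relied on the pwr package in R), the results need not match them exactly.

```python
from math import sqrt
from statistics import NormalDist


def expected_points_ci(successes, attempts, points, level=0.95):
    """Wald (normal-approximation) interval for the expected points per attempt."""
    p = successes / attempts
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / attempts)
    return points * (p - half_width), points * (p + half_width)


def power_of_comparison(p_kick, n_kick, p_two, n_two, alpha=0.1):
    """Approximate power of a two-sided z-test on 2*p_two - p_kick.

    Assumption: the observed rates are treated as the true rates.
    """
    delta = 2 * p_two - p_kick
    # The two-point estimate is scaled by 2, so its variance is scaled by 4.
    se = sqrt(4 * p_two * (1 - p_two) / n_two + p_kick * (1 - p_kick) / n_kick)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    normal = NormalDist()
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Two-point counts from the text; 0.981 is the quoted expected points per kick,
# which equals the kick success rate because a kick is worth one point.
low, high = expected_points_ci(235, 460, points=2)
power = power_of_comparison(0.981, 8561, 235 / 460, 460, alpha=0.1)
print(f"two-point expected points, 95% interval: [{low:.3f}, {high:.3f}]")
print(f"approximate power at alpha = 0.1: {power:.2f}")
```

Under these assumptions the power comes out well below the conventional 0.8 threshold, which is in line with the point made above: the bottleneck is the small number of two-point attempts.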
However, here is where things get interesting. Last year the NFL introduced a new rule for the PAT kick in an effort to make the extra point kick a little more interesting (if this is even possible). In particular, the extra point kick is now taken from the 15-yard line (instead of the 2-yard line). This had a significant impact on the success rate of the extra point kick. In the following figure we can see the success rate for the PAT kick during the last 7 NFL seasons.
As we can see, the rule change led to a lower kick success rate $p_{kick}$, which in turn lowers the expected points from that play. In fact, the 95% confidence interval for the expected points after the rule change is [0.92, 0.954], which still slightly overlaps with the interval for the expected points from a two-point play, while at the 0.1 significance level the two intervals do not overlap.
Overall, the conclusion that we can draw from this analysis is that NFL coaches need to attempt more two-point conversions in order to allow us to perform statistical comparisons that are not underpowered and hence can lead us to more robust conclusions. Joking aside, it should be clear that the fact that the dominant decision after a touchdown is a kick is not justified by the expected points! A coach who maximizes expected points would attempt two-point conversions more often (great start, Mike Tomlin and the Steelers!). Of course, this does not mean that the decision coaches take is wrong; it just means that coaches do not act rationally from the angle of point maximization. Coaches might instead want to minimize the variance of the point outcome of the play call, in which case the team should go for the extra point kick (even though the high variance of the two-point conversion might be an artifact of the small sample size). Another possibility is that coaches try to maximize the risk-adjusted return, that is, the gain divided by the variance. Obviously there are also special cases where the extra point kick is clearly the optimal strategy (e.g., a touchdown ties the game with 2 seconds left on the clock and the extra point kick will practically win the game). Overall, though, I would say that statistically the two-point conversion should be the default plan and the extra point kick should be part of situational football!
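To make the variance and risk-adjusted-return arguments concrete, here is a small sketch. It assumes each attempt is a Bernoulli trial (success with a fixed probability, worth a fixed number of points), which is a modeling assumption and not something estimated in this post; the function name is hypothetical. Note that this is the variance of the per-play point outcome, which is a different quantity from the sampling uncertainty of the estimated success rate.

```python
def play_stats(success_rate: float, points: int):
    """Mean, variance and mean/variance ratio of the points from one attempt.

    Assumption: the attempt is a Bernoulli trial worth `points` on success.
    """
    mean = points * success_rate
    variance = points ** 2 * success_rate * (1 - success_rate)
    return mean, variance, mean / variance


# 0.981 is the quoted expected points per kick (equal to the kick success rate);
# 235/460 is the two-point success rate from the counts in the text.
kick_mean, kick_var, kick_ratio = play_stats(0.981, 1)
two_mean, two_var, two_ratio = play_stats(235 / 460, 2)

print(f"kick:      mean={kick_mean:.3f} variance={kick_var:.3f} ratio={kick_ratio:.1f}")
print(f"two-point: mean={two_mean:.3f} variance={two_var:.3f} ratio={two_ratio:.1f}")
```

Under this simple model the two-point try has the higher mean but a far larger variance, so a coach optimizing the gain-to-variance ratio would still favor the kick.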
One parting thought is that an artificial way to increase the sample size for estimating the two-point conversion success rate is to analyze the regular plays from the 2-yard line. However, after thinking about it a little more, this approach might not be the best, since regular plays from the 2-yard line might not even be equivalent to one another (e.g., do teams behave the same on a 1st and goal and a 3rd and goal from the 2-yard line? Even trickier is the case where a team just needs to reach the 1-yard line to get 3 more attempts). Therefore, I decided not to use these plays in the analysis.
Counterfactual question: What would the play-calling look like if the two-point conversion rule were changed to a three-point conversion? To complicate things even more, what if you could choose between (a) an extra point kick from the 15-yard line, (b) a two-point conversion from the 2-yard line and (c) a three-point conversion from the 5-yard line?
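One way to start exploring this question is to compare the expected points of each option and to ask how often the hypothetical three-point try would have to succeed to be worth it. The sketch below does exactly that. All success rates in it are illustrative assumptions (in particular, nothing in the data tells us the success rate of a three-point try from the 5-yard line), and the function names are hypothetical.

```python
def best_option(options: dict) -> str:
    """Name of the option with the highest expected points.

    `options` maps a name to a (points, success_rate) pair.
    """
    return max(options, key=lambda name: options[name][0] * options[name][1])


def break_even_rate(points: int, best_alternative_expected_points: float) -> float:
    """Success rate at which a play worth `points` matches the best alternative."""
    return best_alternative_expected_points / points


# Illustrative rates only: these are assumptions, not measured values.
options = {
    "kick (15-yard line)": (1, 0.94),
    "two-point (2-yard line)": (2, 0.51),
    "three-point (5-yard line)": (3, 0.30),
}

choice = best_option(options)
needed = break_even_rate(3, 2 * 0.51)
print(choice)
print(f"three-point try must succeed more than {needed:.0%} of the time")
```

With these placeholder numbers the three-point try would need to succeed roughly a third of the time to beat the two-point conversion on expected points alone.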