Dear 100 Hour Board,
True or false: the relationship between the number of marathons/marathon-like things in Utah and the number of unwed and thus sexually frustrated YSAs in the state is statistically significant.
Please define the relationship to the best of your ability.
Extra thumbs for inclusion of lovely charts and/or diagrams.
-I know you saw what I did there
Dear I don't know what you're talking about,
I wasn't sure from your question whether marathons or singles was supposed to be the independent variable, so I went with the combination that would yield easier-to-read regression numbers. Thus, the hypothesis is that marathons and marathon-like things cause singleness in Utah. Any number of explanations might provide a viable theoretical framework to explain why, including but not limited to:
- Singles in Utah are too busy running marathons, so they don't have time to date.
- People at marathons stink because they just ran 26 miles. No one wants to marry a stinky person, so singles who meet at marathons are less likely to get married.
- Marathons edge out all other forms of recreation. People have to be crazy to want to run 26 miles. Not-crazy people don't want to run in the marathons, but there aren't any other options. So not-crazy people stay at home playing video games, and not-crazy singles never meet each other. All marriages in the state of Utah are between crazy marathon runners.
- Where there are more marathons, people run so much that their libido drops off the charts and they no longer feel a need to get married because all that passion is channeled into their running. (The opposite of sexual frustration, if you will.)
I decided to approach this question using data from each of Utah's 29 counties. Your definitions were a little loose: do widowed and divorced people count as single? What are "marathon-like things," exactly? For the purposes of this study, I defined single as "never married" and used the US Census website to get that information for each county. I defined marathon-like things as "any running event that pops up on RunningintheUSA.com's search feature for Utah during 2014." That includes 5k and 10k runs, triathlons, walks, relay races, and stair climbs. I also gathered data on some control variables, like county population, proportion of population between the ages of 20 and 24, and the number of married, divorced, widowed, and separated adults. That all came from the census as well.
I couldn't find numbers on how many of the single adults in the state are members of the Church, which I assume you wanted because you said "YSAs." Nor were there any data on age distribution among singles. Therefore, these data cover all never-married adults, including people who are not members of the Church and people older than 31, not just members between 18 and 31.
A scatter plot of the number of marathons against the number of single adults looks like this:
Obviously there is a pretty strong relationship, and it appears to be curvilinear. A simple scatter plot, however, cannot control for other variables. So I ran a couple of regressions, and here are the results:
As you can see, the relationship is statistically significant only when no other variables are controlled for. When I include the other data, the p-value climbs to about 0.49, nowhere near the usual 0.05 cutoff. Given that the scatter plot indicated that the relationship was curved, not linear, I created a logged singles variable and ran a regression with that as the dependent variable. The relationship still doesn't come up as statistically significant. Which is good, because I've forgotten how to interpret logged coefficients.
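For anyone curious what "controlling for" another variable does mechanically, here is a minimal Python sketch. The six counties and all of their numbers are made up for illustration (they are not the census or race data described above), and population stands in for the full set of controls. It compares the slope of singles on races alone with the slope left over after population has been stripped out of both variables, which gives the same races coefficient as a multiple regression that includes population.

```python
# Toy numbers for six hypothetical counties (NOT the real Utah data).
population = [1, 2, 3, 4, 5, 6]               # tens of thousands of residents
races = [2, 1, 4, 3, 6, 5]                    # running events listed for the year
singles = [310, 590, 905, 1195, 1510, 1790]   # never-married adults


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing the part a straight line in x explains."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# No controls: singles regressed on races alone.
simple_slope = slope(races, singles)

# Controlling for population: remove population from both variables,
# then regress the leftovers on each other.
partial_slope = slope(residuals(population, races),
                      residuals(population, singles))

print(f"races alone:            {simple_slope:.1f} singles per race")
print(f"controlling population: {partial_slope:.1f} singles per race")
```

In this toy example most of the apparent effect of races disappears once population is accounted for, because bigger counties simply have more of everything. As for the logged version: when the dependent variable is log(singles), a coefficient b on races reads as roughly a 100 × (e^b − 1) percent change in singles for each additional race.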
As a visual representation of just how insignificant the relationship is, here is a graph of the 95% confidence interval:
Basically, the red line is the predicted relationship between marathons and singles. But we would want to be super-confident that the relationship actually exists, right? I think 95% confident sounds like a good threshold. We can be 95% confident that the slope of the relationship falls somewhere between the green line and the yellow line. That range includes a perfectly flat line, meaning that we can't even be sure whether the relationship is positive or negative. Statistically significant? I don't think so.
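The "does the interval include zero" check can be sketched in a few lines of Python. The eight data points below are invented for illustration and are not the Utah numbers, and this is a plain one-variable regression rather than the model with controls. The critical value 2.447 is the two-sided 95% value of Student's t for 6 degrees of freedom, hard-coded from a t table, so it only fits this toy sample.

```python
import math

# Toy numbers for eight hypothetical counties (NOT the real Utah data).
races = [1, 2, 3, 4, 5, 6, 7, 8]
singles = [5, 1, 6, 2, 7, 3, 6, 4]   # thousands of never-married adults

n = len(races)
mean_x = sum(races) / n
mean_y = sum(singles) / n
sxx = sum((x - mean_x) ** 2 for x in races)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(races, singles))

# The fitted line (the "red line").
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the slope, from the residual variance with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(races, singles))
std_err = math.sqrt(sse / (n - 2) / sxx)

# Two-sided 95% critical value of Student's t with 6 degrees of freedom.
T_CRIT_DF6 = 2.447

# Lower and upper bounds on the slope (the "green" and "yellow" lines).
low = slope - T_CRIT_DF6 * std_err
high = slope + T_CRIT_DF6 * std_err
includes_zero = low <= 0 <= high

print(f"slope = {slope:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
print("interval includes zero:", includes_zero)
```

When the lower bound is negative and the upper bound is positive, as it is for these toy numbers, the data are consistent with an upward slope, a downward slope, or no slope at all.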
P.S. I've compiled the data in an Excel spreadsheet for your delectation and delight, just in case you want to double-check my numbers. Or redo the whole regression, since I've forgotten most of what I learned in statistics.