So whenever I see a house poll here I always check it against 538’s house forecasting model, in hopes that Nate is underestimating our chances of keeping the house. Lately, I have been noticing some fairly sharp divergences between the poll numbers and his calls. He’s got KY-6 at a 56% chance of a GOP takeover. The GOP candidate just released an internal showing himself down 7. Nate’s got us with a 47% chance of keeping PA-8. Today’s poll shows us down 14 there. Nate shows a 68% chance of a takeover in MI-7. A Rossman Group poll shows the race within the margin of error.
Anyone with two brain cells to rub together can figure out half a dozen reasons this isn’t a knock on Nate Silver, who, frankly speaking, is the single baddest motherf*cker on the planet. These include: the polls I just named were all disclosed after the model was released; structural factors may have more to do with the outcome than the polls; 538’s model is designed to estimate the probability of a certain number of seats, not the likelihood of each individual outcome.
But it does get me thinking: if all we knew were the poll numbers, how would things look in the house? So humor my ghetto-ass statistical talents a moment and come scratch around on my cocktail napkin…
To get a feel for the predictive power of polling about a month outside of a midterm election, I examined 46 races polled between Tuesday, October 3, 2006 and Thursday, October 5, 2006, as reported here: http://www.realclearpolitics.c…
I excluded Foley/Mahoney (because of name-on-ballot hijinks, and what I can only imagine would be a fairly massive Foley-related Dinkins effect), and all three-way races. I didn’t exclude internals and partisans (since that is a lot of the contemporaneous house data), and I simply didn’t have enough data from this period of 2006 to limit the review to house races. Some of the included polls were Reuters/Zogby (though I don’t think this was part of Zogby’s experimental non-euclidean polling phase). I also heard a rumor that it isn’t October 3, 2010 yet, but this was as far back as this particular website reported historical data, so, you know, welcome to my cocktail napkin. The margins of error in the polls were generally in the neighborhood of 5%.
Anyway, I calculated the difference between the polling numbers from this little window in 2006 and the ultimate outcomes, averaging the polls if a race was polled twice. I probably got some math wrong, but I found the following potentially helpful facts:
The average distance (in either direction) between a candidate’s polling lead in this period and the outcome of the election was 6 points (6.3, actually, but Lord Holy-F*cking Jesus God Almighty, you do not want to know the abominations I have perpetrated against the concept of significant figures in this process).
Candidates with leads between 1% and 6% at this point (and by “lead” I mean “lead in the polls conducted on these three days”), won nine out of sixteen races. Candidates with leads in excess of 6 points won 87.5% of their races (though they were only 6 out of 9 if the lead was between 7 and 12 points).
The races were just as likely to move toward the candidate leading in the polls as toward the one trailing.
72% of the races “moved” less than 6% in either direction.
91% of the races “moved” less than 12% in either direction.
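For anyone who wants to redo the napkin math, here is a minimal Python sketch of the bookkeeping behind the numbers above: average a race's polls, compare that to the final margin, and tally. The rows in `polls` are made up for illustration and are not the 2006 data. The function name `summarize` and the convention of quoting both margins from the same candidate's side (so a negative number means that candidate trails or lost) are assumptions of the sketch, not anything the original spreadsheet necessarily did.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, NOT the 2006 data:
# (race, poll margin, final margin), both in points and both quoted
# from the same candidate's side. Negative means that candidate trails/lost.
polls = [
    ("Race A", 4.0, 7.0),
    ("Race A", 6.0, 7.0),   # polled twice in the window, so the leads get averaged
    ("Race B", 10.0, -1.0),
    ("Race C", 2.0, -3.0),
    ("Race D", 15.0, 11.0),
]

def summarize(rows):
    """Average each race's polls, then compare against the final margin."""
    leads = defaultdict(list)
    finals = {}
    for race, poll_margin, final_margin in rows:
        leads[race].append(poll_margin)
        finals[race] = final_margin

    avg_lead = {race: mean(values) for race, values in leads.items()}
    # "Movement" is how far the final margin landed from the polled margin.
    moves = [abs(finals[race] - avg_lead[race]) for race in avg_lead]
    n = len(moves)

    return {
        "mean_abs_move": mean(moves),
        "share_under_6": sum(m < 6 for m in moves) / n,
        "share_under_12": sum(m < 12 for m in moves) / n,
        # The poll leader won if polled and final margins share a sign.
        "leader_win_rate": sum(avg_lead[r] * finals[r] > 0 for r in avg_lead) / n,
    }

if __name__ == "__main__":
    for name, value in summarize(polls).items():
        print(f"{name}: {value:.3f}")
```

Bucketing the win rate by size of lead (1 to 6, 7 to 12, and so on) would be one more pass over `avg_lead`, left off here to keep the napkin small.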
Anyway, so I’m all ears on what to do with all this, but it seems to me that it would be worthwhile to take a look at the 68 races 538 ranks as “lean, even, or possible takeover” and see how many have been polled within spitting distance of the right-ish-now-ish era (a concept operationally defined by the cocktail napkin). Then I’d like to know how many are within 6 and 12 points right now, and would probably call anything within 6 a toss-up, and keep anything within 12 on the board.
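The 6-and-12 rule of thumb is simple enough to write down. This is a rough Python sketch, assuming "within 6" and "within 12" include the endpoints; the function name `classify` and the "off the board" label are made up for the example. The two margins are the single polls mentioned at the top of this post, quoted from our side, and are only there to show the rule in action.

```python
def classify(margin):
    """Sort a race by the size of its current poll margin, in points.

    Assumption: the cutoffs are inclusive, so a 6-point race still
    counts as a toss-up and a 12-point race stays on the board.
    """
    gap = abs(margin)
    if gap <= 6:
        return "toss-up"
    if gap <= 12:
        return "on the board"
    return "off the board"

# Single polls cited earlier in the post, from our side of the ledger.
current_margins = {
    "KY-6": 7,    # GOP internal has their own candidate down 7
    "PA-8": -14,  # today's poll has us down 14
}

if __name__ == "__main__":
    for race, margin in current_margins.items():
        print(race, classify(margin))
```

A real pass would average whatever recent polls exist for each of the 68 races before classifying, the same way the 2006 numbers were averaged.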
And that’s probably part 2. I guess I have till October 3. Unless anyone wants to pitch in…
(Please keep comments in the spirit of the cocktail napkin. My cat took over the keyboard, and he’s not that great with margins of error. Meow.)