Multivariate analysis of Wisconsin polling data

(This is cross-posted at Daily Kos)

A couple of weeks ago, Kos/PPP polled all the Wisconsin Republicans up for recall and found some very interesting results. However, he did not poll the Democrats up for recall or the upcoming statewide Supreme Court race. To fill that gap (although I'm no Poblano), I used multivariate regression to model the Wisconsin polling data with information from each district.

Fortunately, the dynamics of the race were much simpler than Clinton vs. Obama 2008, and in the end I found that the polling results could be described by only two variables (which is very nice: we only had eight data points, so a two-variable model is less likely to be overfitted):

1. Obama: The percentage of the vote Obama received in 2008 (Courtesy of SSP)

2. Incumbency: The number of years the person has been in office (for instance, someone elected in 2008 would have an Incumbency of 2 years). Numbers also from SSP.

I also experimented with other variables, which I discarded in the end because they were not statistically significant:

3. Barrett: The percentage of the vote Tom Barrett received in 2010. (Thanks to the Journal-Sentinel)

4. Scandal: A 1/0 value describing the unique circumstances of the aptly named Randy Hopper, and perhaps Mr. Prosser as well.

5. Kerry: Percentage of the vote Kerry received in 2004

I also decided to use percentages rather than margins, because percentages gave the better correlation of the two.

In the end, my two-variable model describes very accurately (within ±1.5 points) the percentage of voters who would commit to voting for a Democrat in a hypothetical election this year; the spreadsheet is included below.



(The main prediction is highlighted in red. There are other columns to the right which include the additional variables that did not turn out to be significant.)

In short, each Senator's vulnerability depends mostly on Obama's 2008 performance in his or her district, along with a small bonus from incumbency (about 0.3 points per year in office). Thus, Hopper is quite vulnerable simply from being a freshman (the scandal had not yet affected his poll numbers at that point), while Alberta Darling has built up goodwill from being in office for 18 years.
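A fit of this kind can be sketched in pure Python as an ordinary least-squares regression on the two variables, solving the normal equations directly so no libraries are needed. This is an illustration, not the spreadsheet behind the numbers above: the function names are hypothetical and the eight poll results are not reproduced here. If the model matches the description above, the incumbency coefficient on the Democratic percentage would be expected to come out negative, at roughly 0.3 points per year.

```python
def fit_two_variable_ols(obama, incumbency, dem_pct):
    """Fit dem_pct ~ b0 + b1*obama + b2*incumbency by ordinary least squares.

    Builds the 3x3 normal equations (X^T X) b = X^T y and solves them with
    Gaussian elimination (partial pivoting) plus back-substitution.
    """
    rows = [(1.0, o, i) for o, i in zip(obama, incumbency)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    xty = [sum(r[a] * y for r, y in zip(rows, dem_pct)) for a in range(3)]
    m = [xtx[a] + [xty[a]] for a in range(3)]  # augmented matrix
    for col in range(3):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    b = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        b[r] = (m[r][3] - sum(m[r][c] * b[c] for c in range(r + 1, 3))) / m[r][r]
    return b  # [intercept, obama coefficient, incumbency coefficient]


def predict(b, obama, incumbency):
    """Predicted Democratic percentage for one district."""
    return b[0] + b[1] * obama + b[2] * incumbency
```

With only eight districts, three fitted parameters leave five degrees of freedom, which is why keeping the model to two predictors matters.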

Applying this model to the three Democrats who are considered semi-vulnerable, we find they are mostly safe. The only one who's really vulnerable is Mr. Holperin, who won his first election in 2008 by 2.5 points and represents a seat Obama won by single digits. Note that I give the Democrats negative incumbency so that it gives a bonus to the Democratic numbers (rather than a penalty), and since the model's percentages leave room for undecided voters, anything at 48% or above is probably leaning D.
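The sign convention for Democratic incumbents can be made concrete with a small sketch. It assumes the fitted model has the form D% = b0 + b_obama·Obama + b_inc·Incumbency with b_inc negative (each year a Republican incumbent has served lowers the Democratic share), so entering a Democrat's years as a negative number turns the same coefficient into a Democratic bonus. The function names are hypothetical, and apart from the roughly 0.3-point-per-year incumbency effect and the 48% rule of thumb, any coefficient values plugged in are placeholders, not the fitted ones.

```python
def signed_incumbency(years_in_office, party):
    """Republican incumbents keep positive years; Democratic incumbents get
    negative years, so a negative incumbency coefficient becomes a D bonus."""
    return years_in_office if party == "R" else -years_in_office


def predicted_dem_share(b0, b_obama, b_inc, obama_pct, years, party):
    """Predicted committed Democratic share (percent) for one seat."""
    return b0 + b_obama * obama_pct + b_inc * signed_incumbency(years, party)


def lean(dem_share, threshold=48.0):
    """Undecideds are excluded from the committed share, so ~48% or more
    probably leans Democratic; below that the call is not clear."""
    return "leans D" if dem_share >= threshold else "not clearly D"
```

For example, with placeholder values b0 = 0 and b_obama = 1, a 10-year incumbent in a 50% Obama seat comes out at 47% if Republican and 53% if Democratic.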

Applying the model to Justice Prosser, who is elected by the State of Wisconsin as a whole, suggests that the seat is probably leaning slightly D at this point, but I would put a much larger margin of error on this estimate: the race is still developing, and a Supreme Court race is very different from a Senate one.

Comparing ways of rating congresspeople

There are a variety of ways to rate congresspeople. I will cover several, but I'll spend most of my time on the method I think is best. It's seriously geeky, but I give a nongeeky summary and then links to the geeky parts.

Many organizations rank congresspeople; the Almanac of American Politics includes rankings from many of them. Each of these organizations looks at votes on its particular issues and sees how each congressperson votes (for its position or against it). I am not going to talk more about these individual organizations.

I will discuss three ways of ranking or rating congresspeople, used by a) National Journal, b) Progressive Punch, and c) Keith Poole and his associates. I think the last is the best.

National Journal does the following for the House, and something similar for the Senate:

House members are assigned separate scores for their roll-call votes on key economic, social and foreign-policy issues during 2008. The members are rated in each of the three issue categories on both liberal and conservative scales, with the scores on each scale given as percentiles. An economic score of 78 on the liberal scale, for example, means that the member was more liberal than 78 percent of his or her House colleagues on the key votes in that issue area during 2008. A blank in any cell in the table below means that the member missed more than half the rated votes in an issue area. Composite scores are an average of the six issue-based scores. Members with the same composite scores are tied in rank. (C) indicates a conservative score; (L) indicates a liberal score.

If you sort on “composite”, you'll see one issue: there are a lot of ties. The top 12 representatives are all tied. In the Senate there are fewer ties, but how does Bernie Sanders end up tied for 13th most liberal, with almost the same rating as Clinton?
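A short sketch shows why a composite built this way produces ties. It assumes, per the quoted description, that each member's composite is the plain average of the six issue-based scores and that members with equal composites share a rank (competition ranking: 1, 1, 3, ...). The member names and scores are hypothetical, not National Journal data. Because the scores come from a limited set of key votes, members who voted identically on every rated vote get identical scores and therefore the same rank.

```python
from statistics import mean


def composite(scores):
    """Average of the six issue-based scores, per the quoted description."""
    return mean(scores)


def competition_ranks(composites):
    """Rank members from highest composite down; equal composites share a
    rank and the next distinct score skips ahead (1, 1, 3, ...)."""
    ordered = sorted(composites.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    for position, (member, score) in enumerate(ordered, start=1):
        previous_member, previous_score = ordered[position - 2] if position > 1 else (None, None)
        if position > 1 and score == previous_score:
            ranks[member] = ranks[previous_member]  # tie: reuse the rank above
        else:
            ranks[member] = position
    return ranks
```

Two hypothetical members whose six scores average to the same value (say 95) both rank 1st, and the next member ranks 3rd.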

The details of how they rated the congresspeople are for subscribers only, but they do have this snippet:

A panel of National Journal editors and reporters initially compiled a list of 167 key congressional roll-call votes for 2008 — 79 votes for the Senate and 88 for the House — and classified them as relating to economic, …

So it seems like they averaged a bunch of votes.

Progressive Punch rates people on the percentage of their votes cast on the progressive side, and it offers ranks based on all votes, crucial votes, and votes on particular issues. It is kept up to date, which is a major plus. The approach has both advantages and disadvantages. According to their methods, the three most progressive senators are Roland Burris, Kirsten Gillibrand, and Edward Kaufman. Huh? Well, all three have 100% ratings. Even among senators who have served for a while there are anomalies: is Sherrod Brown really as liberal as Bernie Sanders? One problem is revealed when we see that Ted Kennedy has a very low rating for 2009-10: they don't deal properly with missed votes. If we look at “Crucial Votes” for “lifetime”, Jack Reed is rated as the most progressive senator among those who have been in the Senate for at least one full session.
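The missed-vote problem is easy to see in a small sketch. It assumes (a reading of the symptom above, not Progressive Punch's documented formula) that a score is the share of rated votes cast on the progressive side, and compares counting an absence as a non-progressive vote with simply dropping absences. The function name and vote records are hypothetical.

```python
from typing import Optional, Sequence


def progressive_score(record: Sequence[Optional[bool]], count_missed_as_against: bool) -> float:
    """Percentage of rated votes cast on the progressive side.

    Each entry is True (voted progressive), False (voted against), or
    None (missed the vote).
    """
    if count_missed_as_against:
        considered = list(record)  # absences count in the denominator
    else:
        considered = [v for v in record if v is not None]  # absences dropped
    if not considered:
        raise ValueError("no votes to score")
    return 100.0 * sum(1 for v in considered if v is True) / len(considered)
```

A senator who voted progressive on all 8 votes he attended but missed 12 others scores 40% when absences count against him and 100% when they are dropped. Dropping absences fixes the Kennedy-style distortion but exposes the other weakness: with only a handful of votes, a perfect 100% is easy to reach, as with Burris, Gillibrand, and Kaufman.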

The way they came up with scores is summarized here. Briefly, they first identified a few “hardcore progressives” in the Senate and the House.  The ‘overall’ ratings are based on votes in which a majority of those progressives voted against a majority of the Republicans.  The problem here is that all votes are weighted equally, and this isn’t right (see below).  
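The vote-selection rule just described can be sketched as follows. Assumptions: each roll call is a mapping from legislator to "yea"/"nay" (absences omitted), the "hardcore progressive" list and the Republican list are supplied by hand, a majority means more yeas than nays among those voting, and a tie within a group counts as no majority. The function names and legislators are hypothetical.

```python
from typing import Dict, Iterable, Optional


def majority_position(votes: Dict[str, str], members: Iterable[str]) -> Optional[str]:
    """'yea' or 'nay' if most of the listed members who voted took it, else None."""
    cast = [votes[m] for m in members if m in votes]
    yeas, nays = cast.count("yea"), cast.count("nay")
    if yeas > nays:
        return "yea"
    if nays > yeas:
        return "nay"
    return None


def qualifying_vote(votes: Dict[str, str], progressives: Iterable[str],
                    republicans: Iterable[str]) -> Optional[str]:
    """Return the progressive position if the progressive majority opposed
    the Republican majority; otherwise None (the vote is not rated)."""
    prog = majority_position(votes, progressives)
    rep = majority_position(votes, republicans)
    if prog is not None and rep is not None and prog != rep:
        return prog
    return None
```

An overall score would then be the share of qualifying votes on which a legislator took the progressive position, with every qualifying vote weighted the same; that equal weighting is the problem noted above.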


The crucial votes are a subset of those, specifically:

The votes used to calculate the scores in the “Crucial Votes ’09-’10” column are a subset of the overall votes that qualify according to the Progressive Punch algorithm described above. They show the impact that even a small number of Democrats have when they defect from the progressive position. These are votes where EITHER progressives lost OR where the progressive victory was narrow and could have been changed by a small group of Democrats voting differently.
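One way to operationalize the quoted "crucial" test, as a sketch: a vote counts if the progressive side lost, or if it won but a small bloc of Democrats switching sides would have reversed the result. The bloc size (`swing_bloc`) is a hypothetical parameter, since the quote does not give Progressive Punch's actual threshold; simple-majority passage is assumed, and a tie is treated as a progressive loss.

```python
def is_crucial(progressive_side_votes: int, other_side_votes: int, swing_bloc: int = 5) -> bool:
    """True if progressives lost, or won narrowly enough that `swing_bloc`
    Democrats switching from the progressive side would have reversed it.

    Assumes simple-majority rules; a tie counts as a progressive loss.
    """
    if progressive_side_votes <= other_side_votes:
        return True  # progressives lost (or tied)
    # Each switcher moves one vote from the winning side to the losing side,
    # shrinking the margin by 2.
    margin = progressive_side_votes - other_side_votes
    return margin <= 2 * swing_bloc
```

The yes/no output is the "crucial"/"noncrucial" split; the margin the function computes internally is the underlying continuous quantity.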

This is better, but it's not as good as more sophisticated methods.

Why not? Well, the good people at Progressive Punch recognize the problem: Not all votes are equal, even among those that are ideological.  Some are easy wins, some are lost by a lot.  But they dichotomize this into “crucial” and “noncrucial” when there is really a continuum.

The site is great for looking into past votes of congresspeople, and it's great that they keep it up to date, but there is a better method.

That is the method used by the people at voteview. The software and methods are the best, but it's not the most user-friendly site in the world. They describe two methods of rating congresspeople: NOMINATE and Optimal Classification (OC). Both use every vote and place legislators in a way that maximizes the ability to predict how they will vote. Both work really well: Optimal Classification works a bit better but takes more computer time; NOMINATE (if I understand it correctly) allows placement of issues as well as politicians. With a single number for each congressperson, you can correctly predict about 95% of their votes.

One question is whether a single dimension (liberal to conservative) is enough to accurately classify people.  For most periods in American history, it is.  In the 1960s, a second dimension (racial attitudes) added a lot to the accuracy, but, right now, one dimension does very well.  You can see how OC works in one dimension.  It predicts 95% of the vote correctly.  Note that the things that look like fancy script L (or the old sign for pound) are supposed to be less than or equal to signs.
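The "predicts 95% of the vote" figure is a classification rate. A minimal sketch of how a one-dimensional cutting-point model is scored follows, assuming each legislator has a position on a single liberal-conservative line and each roll call has a cutting point plus a polarity saying which side votes yea. The names and data are hypothetical; this is not voteview's code.

```python
from typing import Dict, List, Tuple

# A roll call: (cutting point, yea side, recorded votes). yea_side is "left"
# if legislators left of the cut are predicted to vote yea, else "right".
RollCall = Tuple[float, str, Dict[str, bool]]


def predict_yea(position: float, cut: float, yea_side: str) -> bool:
    """Predicted vote for one legislator on one roll call."""
    return position < cut if yea_side == "left" else position > cut


def classification_rate(positions: Dict[str, float], roll_calls: List[RollCall]) -> float:
    """Fraction of recorded (non-missing) votes the 1-D model predicts correctly."""
    correct = total = 0
    for cut, yea_side, votes in roll_calls:
        for member, voted_yea in votes.items():
            total += 1
            correct += predict_yea(positions[member], cut, yea_side) == voted_yea
    return correct / total
```

With four legislators at positions 1 through 4 and three roll calls, one of which has a single legislator voting against his side of the cut, the rate is 11/12.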

I am not going to duplicate the example in that link, but I’ll try to explain it a bit more (you might want to open it in another window).  The diamonds are legislators, the spades are ‘cutting points’ for nine votes, each with a different number of “ayes” and “nays”.  The Ace of Spades is a vote with only one “aye”, the two of spades has two “ayes” and so on.  Now, we attempt (first iteration) to place legislators correctly per the votes.  That gives the diagram listed after 2.  Then we re-order the cutpoints, as shown in step 3, and repeat the process.  
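The alternating steps can be sketched in a few dozen lines. This is a simplified one-dimensional version in the spirit of the walkthrough, not Poole's Optimal Classification code: it treats positions as a rank ordering, finds the best cutting point and polarity for each vote given that ordering (step A), then moves each legislator to the gap between cutting points where he makes the fewest errors (step B), and repeats until the ordering stops changing. All function names are hypothetical, and like any such alternating search it can stop at a local optimum.

```python
from typing import Dict, List, Tuple

Votes = Dict[str, bool]   # legislator -> True if they voted yea (absences omitted)
Cut = Tuple[float, bool]  # (cutting point, True if the yea side is to the left)


def wrong(pos: float, cut: float, yea_left: bool, voted_yea: bool) -> bool:
    """True if the cutting-point model mispredicts this vote."""
    return (pos < cut) != (voted_yea == yea_left)


def best_cut(positions: Dict[str, float], votes: Votes) -> Cut:
    """Step A: legislators fixed; choose the cut and polarity with fewest errors."""
    xs = sorted(positions[m] for m in votes)
    candidates = [xs[0] - 0.5] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 0.5]
    best = None
    for cut in candidates:
        for yea_left in (True, False):
            e = sum(wrong(positions[m], cut, yea_left, v) for m, v in votes.items())
            if best is None or e < best[0]:
                best = (e, cut, yea_left)
    return best[1], best[2]


def best_position(member: str, cuts: List[Cut], roll_calls: List[Votes]) -> float:
    """Step B: cuts fixed; place one legislator in the gap with fewest errors."""
    edges = sorted({c for c, _ in cuts})
    slots = [edges[0] - 1.0] + [(a + b) / 2 for a, b in zip(edges, edges[1:])] + [edges[-1] + 1.0]

    def cost(pos: float) -> int:
        return sum(wrong(pos, c, left, votes[member])
                   for (c, left), votes in zip(cuts, roll_calls) if member in votes)

    return min(slots, key=cost)


def to_ranks(raw: Dict[str, float], previous: Dict[str, float]) -> Dict[str, float]:
    """Turn placements back into a rank order 0, 1, 2, ..., breaking ties by the old order."""
    order = sorted(raw, key=lambda m: (raw[m], previous[m]))
    return {m: float(i) for i, m in enumerate(order)}


def classify_1d(start: Dict[str, float], roll_calls: List[Votes], iterations: int = 20):
    """Alternate steps A and B until the legislator ordering stops changing."""
    positions = to_ranks(start, start)
    for _ in range(iterations):
        cuts = [best_cut(positions, votes) for votes in roll_calls]
        placed = {m: best_position(m, cuts, roll_calls) for m in positions}
        new_positions = to_ranks(placed, positions)
        if new_positions == positions:
            break
        positions = new_positions
    cuts = [best_cut(positions, votes) for votes in roll_calls]
    return positions, cuts
```

Starting five hypothetical legislators in a slightly wrong order and feeding in four perfectly one-dimensional votes (the first k legislators vote yea, for k = 1 to 4, echoing the Ace, two, three, and so on of spades), the procedure recovers the correct ordering within two passes.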

(end geekiness)

How do these methods compare?  I am not going to compare all the senators and reps, simply because I can’t figure out an easy way to copy the data into a spreadsheet.  But let’s take 5 well-known Senators from the 110th Senate:  Feingold, Schumer, Bayh, Specter and Coburn.

Senator     OC rank               PP lifetime rank   NJ 2008 composite rank
Feingold    most liberal          20th               37th
Schumer     16th most liberal     16th               7th
Bayh        51st most liberal     45th               51st
Specter     56th most liberal     59th               53rd
Coburn      101st most liberal    71st               92nd

(There are 102 ranks in OC because some senators were replaced during the Congress; e.g., Wyoming has Enzi, Barrasso, and Thomas.) I couldn't find Progressive Punch ratings for the 110th Congress, so I used lifetime ratings.

Which do you think is most accurate?
