(crossposted on dailyKos)
There are lots of ways of trying to figure out which congressmen are vulnerable. Today, I’ll look at a few statistical ones, based on logistic regression.
Don’t run away; just go below the fold.
Statistical background: Regression is a set of techniques that can be used when you have one dependent variable (DV) and one or more independent variables (IVs). The DV is thought, in some way (we’ll leave that vague), to depend on the others. If the DV is continuous, the most popular technique is ordinary least squares (OLS) regression – it’s so popular that if you just say ‘regression’ people will assume that’s what you mean. When the DV is categorical, OLS regression won’t work (if you want to know why, ask!). The most common technique there is called *logistic regression*. One of the things that any regression produces is a set of predicted values and residuals. In logistic regression, the predicted values are probabilities, and the residuals are differences between the probability and either 1 or 0. (Technical aside – yeah, I know there’s ordinal and multinomial, but let’s keep it simple, OK?)
Let’s put that into context. If you want to model the probability that a district will elect a Democrat, then the predicted value is the probability of them electing a Democrat. If they *do* elect one, then the residual is 1 – the probability. If they elect a Republican, then the size of the residual is the probability itself. So, one way of looking at vulnerability is to look for Republicans who have large residuals – that is, the district seems likely to elect a Democrat.
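For the curious, here is a minimal sketch of that idea in plain Python. Everything in it is an assumption made for illustration: the twelve districts are invented (the `HYP` labels are not real districts), PVI is coded as a signed number with positive meaning a Democratic lean, and the fit uses a hand-rolled Newton-Raphson loop, which is not necessarily how the models in this post were fitted.

```python
import math

# Hypothetical toy data: (label, signed PVI, party of the current representative).
# Positive PVI = Democratic lean (D+5 -> 5, R+9 -> -9). These are NOT real districts.
districts = [
    ("HYP01", -15, "R"), ("HYP02", -12, "R"), ("HYP03", -9, "R"),
    ("HYP04", -6, "R"), ("HYP05", -4, "D"), ("HYP06", -2, "R"),
    ("HYP07", 1, "R"), ("HYP08", 3, "D"), ("HYP09", 5, "R"),
    ("HYP10", 8, "D"), ("HYP11", 11, "D"), ("HYP12", 16, "D"),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix X'WX, with W = p(1-p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [pvi for _, pvi, _ in districts]
y = [1 if party == "D" else 0 for _, _, party in districts]  # DV: 1 = Democrat holds the seat
b0, b1 = fit_logistic(x, y)

# Predicted probability of a Democrat, and residual = outcome - probability
predicted = {label: sigmoid(b0 + b1 * pvi) for label, pvi, _ in districts}
residuals = {label: yi - predicted[label] for (label, _, _), yi in zip(districts, y)}

# "Vulnerable" here: a Republican holds the seat but the model leans Democratic
flagged = [label for label, _, party in districts
           if party == "R" and predicted[label] > 0.5]
print(flagged)
```

The 0.5 cutoff is also an assumption; ranking Republican-held seats by predicted probability would use the same numbers without committing to a threshold.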
For our first model, we’ll use Cook PVI as the IV. Cook PVI is basically a measure of how the district voted in 2000 and 2004 presidential elections.
Not surprisingly, there’s a strong relationship between Cook PVI and congresperson’s party: The mean Cook PVI in Republican represented districts was R + 9; in those represented by Democrats, it was D + 11.
There are 25 districts where the model predicts a Democrat, but there really is a Republican:
Now, let’s look at models of demographics:
If we model race (%Black, %Latino and %Other Race, leaving out %White to avoid collinearity), we get the not surprising result that increases in any of these make the district more likely to elect a Democrat.
Based on this model, there are 70 vulnerable Republicans:
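Why leave out %White? The four percentages add up to 100 in every district, so once the model has an intercept, the fourth column carries no new information. A small sketch of that, using invented percentages (the rows below are hypothetical, not real districts) and a hand-written exact rank calculation:

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Hypothetical (%Black, %Latino, %Other) for six made-up districts
race = [(10, 5, 3), (40, 10, 5), (5, 30, 8), (20, 20, 20), (2, 3, 1), (15, 45, 4)]

# Design matrix with an intercept and all four race columns (%White = 100 - the rest)
with_white = [[1, b, l, o, 100 - b - l - o] for b, l, o in race]
# Same thing with %White dropped
without_white = [[1, b, l, o] for b, l, o in race]

rank_with_white = matrix_rank(with_white)        # 5 columns
rank_without_white = matrix_rank(without_white)  # 4 columns
print(rank_with_white, rank_without_white)
```

Both matrices have the same rank even though one has an extra column, which is the collinearity problem: with all four percentages in, the coefficients cannot be pinned down uniquely.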
TX02 AL01 AL03 AZ01 CA03 CA21 CA24 CA26 CA41 CA42 CA44 CA45 CA48 CA49 CA50
CA52 CT04 DEAL GA01 GA10 IL06 NC08 NJ07 NM01 NM02 NV03 NY13 OH01 OH12 OK05
TX03 TX07 TX10 TX26 TX31 TX32 VA01 VA05 VA10 VA11 WA08 SC04 AL02 FL21 GA07
MS03 NJ02 OK04 TX24 VA02 AKAL CA19 CA22 CA25 CA46 FL18 FL25 GA08 LA04 LA05
LA06 LA07 MS01 OK01 SC01 SC02 TX01 TX06 TX14 VA04
Next, I looked at income and urban-ness, and, again not surprisingly, districts that are higher income are more likely to be Republican, and those that are more Urban are more likely to be Democratic. Based on this model there are 79 vulnerable Republican districts:
AZ02 AZ03 CA03 CA21 CA26 CA41 CA44 CA45 CA49 CA50 CA52 FL01 FL07 FL08
FL09 FL10 FL12 FL13 FL14 FL15 FL24 IL06 KS04 LA01 MI11 NJ03 NJ04 NM01 NM02
NV03 NY13 OH01 OH12 OH15 OK05 PA15 PA18 TX03 TX07 TX13 TX26 TX31 TX32 WA04
WI01 SC04 FL06 FL21 NE02 NJ02 OH03 TX11 TX24 UT03 VA02 AZ06 CA02 CA19 CA22
CA25 CA46 CO05 FL04 FL18 FL25 LA06 LA07 NV02 OH08 OK01 SC01 TN02 TX02 TX06 TX12 TX19 UT01 WA05
Finally, let’s put it all into one model. This model worked somewhat better, and identified 20 hyper-vulnerable Republicans.
AL03 AZ01 CT04 DEAL FL10 KY05 MI07 NM01 NV03 NY13 NY23 PA03 PA15 WA08 NJ02
PA06 IA04 MI04 MI06 OH06
Who are the most vulnerable according to the combined model?
Rick Renzi (AZ-01) is the most vulnerable, but really illustrates a weakness of the model: I had to lump all ‘Other races’ together. AZ-01 has the highest proportion of Native Americans of any district (22.1%), and this is a somewhat different minority group.
Michael Castle (DE-AL). Delaware gave Kerry 53% and Gore 55%. It has a reasonably large Black population (18.9%), and a moderate median income ($47,000). And Castle has a Kossack opponent! Possum (Jerry Northington) is running. Read more here and show him some love and money at the Act Blue site.
Another approach is to look for people who show up on all three partial models:
There are 5 on all three lists:
Heather Wilson (NM01). Ms. Wilson is going for the Senate, and will probably give up her seat (NM will be very busy!) There are a bunch of people running, and I don’t know who to support. Read a little here
Jon Porter (NV03) won in 2006 by 4,000 votes out of 200,000 cast, despite outspending his opponent 2-1. This year, he has at least two opponents, with others considering running.
Vito Fossella (NY13). The only Republican rep in NYC (my hometown!). It would be great to get him gone. This district gave 55% to Bush in 2004, but 52% to Gore in 2000 (plus 3% to Nader). Fossella has won easily, but has had only token opposition (in 2006, his opponent raised just over $100,000; Fossella raised $1.6 million). But that opponent (Stephen Harrison) is running again. You can see more and give more here.
Steve Chabot (OH01). Chabot won 52-48 in 2006. This district gave Bush narrow victories in both 2000 and 2004, but it has a substantial Black population (27.4%), and quite a few people in poverty (13.9%). His opponent this time is Steve Driehaus.
Pat Tiberi (OH12). Tiberi won fairly easily in 2006, and this district went narrowly for Bush in both 2000 and 2004. But it also has a substantial Black population (21.7%) and is mostly urban (88.1%).
and
Frank LoBiondo (NJ02). LoBiondo has won easily in the past, although his last two opponents raised almost no money. This district went narrowly for Bush in 2004, but gave Gore 54% in 2000 (plus 3% to Nader). It has a fair number of both Blacks (13.8%) and Latinos (10.3%) and is 79% urban.
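The cross-list check is just a set intersection. Here is a sketch using the two lists that are printed in full in this post (the race model and the income/urban model). The Cook PVI list isn’t reproduced above, so this only covers two of the three models and will return more districts than the final short list.

```python
# District codes copied from the race-model list in this post
race_list = """
TX02 AL01 AL03 AZ01 CA03 CA21 CA24 CA26 CA41 CA42 CA44 CA45 CA48 CA49 CA50
CA52 CT04 DEAL GA01 GA10 IL06 NC08 NJ07 NM01 NM02 NV03 NY13 OH01 OH12 OK05
TX03 TX07 TX10 TX26 TX31 TX32 VA01 VA05 VA10 VA11 WA08 SC04 AL02 FL21 GA07
MS03 NJ02 OK04 TX24 VA02 AKAL CA19 CA22 CA25 CA46 FL18 FL25 GA08 LA04 LA05
LA06 LA07 MS01 OK01 SC01 SC02 TX01 TX06 TX14 VA04
""".split()

# District codes copied from the income/urban-model list in this post
income_urban_list = """
AZ02 AZ03 CA03 CA21 CA26 CA41 CA44 CA45 CA49 CA50 CA52 FL01 FL07 FL08
FL09 FL10 FL12 FL13 FL14 FL15 FL24 IL06 KS04 LA01 MI11 NJ03 NJ04 NM01 NM02
NV03 NY13 OH01 OH12 OH15 OK05 PA15 PA18 TX03 TX07 TX13 TX26 TX31 TX32 WA04
WI01 SC04 FL06 FL21 NE02 NJ02 OH03 TX11 TX24 UT03 VA02 AZ06 CA02 CA19 CA22
CA25 CA46 CO05 FL04 FL18 FL25 LA06 LA07 NV02 OH08 OK01 SC01 TN02 TX02 TX06 TX12 TX19 UT01 WA05
""".split()

# Districts flagged by both demographic models
on_both = sorted(set(race_list) & set(income_urban_list))
print(on_both)
```

Intersecting `on_both` with the Cook PVI list, if you have it, would narrow this down to the districts flagged by all three models.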
I’m a statistician – I wasn’t sure how much explanation to put in here.
Ideally, we’d have approval ratings and re-elect/someone-new numbers for all GOP incumbents, because, in all likelihood, Mike Castle (DE-AL) scores pretty well there. Many of us Delaware progressives complain about Delaware’s culture of incumbency and how Castle is a spineless phony whose “moderate” image is largely a product of spin and the local media love-fest for him. Delaware seems to prefer aristocracy to “partisan politics” (AKA taking a real stand on anything).
I’m behind Northington 100% – but I’m not very optimistic. I think we’d need someone with higher profile to compete here.
What’s the source of your demographic inputs to the models? I’m a resident of OH-12, and it is one of the only districts in Ohio with a population increase since the last redistricting. That increase has come almost entirely in the very white, very suburban (and overwhelmingly GOP) Delaware County portion of the district. If you’re using 2000 Census numbers as estimates, you’ll be over-weighting Dem-friendly demographic characteristics. If you’re using more recent estimates broken down by house district… are they publicly available?
Missed votes are a great way to predict retirements or vacancies. The seven GOP House members who have missed the most votes this session will not be returning: Charlie Norwood, Jo Ann Davis, Barbara Cubin, Bobby Jindal, Duncan Hunter, Tom Tancredo, and Denny Hastert. Number 8 on the list is Ron Paul, who may or may not be back (missed 25.6% of his votes). Number 9 is the retiring Ray LaHood. The next four on this list are Don Young (AK, missed 17.8%), Cathy McMorris Rodgers (WA-5, 16.6%, gave birth), Steve Buyer (IN-4, bad knee, 16.0%), and Sam Johnson (15.3%). Bet Johnson and Young ride into the sunset. Buyer, too, if we are lucky.