I recently finished reading The Signal and the Noise by prediction guru and stats wizard Nate Silver (here’s the book on Amazon, and here’s Silver’s FiveThirtyEight blog-cum-website). Silver is well known for his extremely accurate predictions and commentary on US elections, but his knowledge of and interest in issues related to prediction range far and wide. His book deals with many subtleties associated with prediction: it goes quite deep into the statistical issues without pulling any punches, yet remains broadly accessible to readers.
Silver does not discuss anything as radical as open borders, and in general does not discuss normative questions at all, preferring to stick to his area of expertise: the accuracy and precision of predictions and forecasts, and the problems associated with making good ones. Nonetheless, my guess after reading Silver’s book is that he would be extremely skeptical of any claims regarding the effects of open borders, which are way “out of sample.” In particular, I’m guessing Silver would be unimpressed with claims that open borders would double world GDP. At any rate, reading Silver makes me more skeptical of claims about the effects of open borders made with allegedly high confidence. If you believe in Knightian uncertainty as a concept, you may well take the view that the uncertainty associated with open borders is Knightian in nature, and that most attempts at quantifying its impact are flawed. This might also explain why, even though there is a broad consensus among economists supporting somewhat more open borders, few economists commit to going all the way to open borders. My co-blogger Nathan noted this explicitly in a comment on another blog post.
Even in areas where we are looking at “out of sample” predictions, however, all is not lost. One idea that Silver returns to throughout his book is that one should keep and use every piece of data. Judging the effects of open borders might be very difficult, and we may end up with a huge range (i.e., low precision). But we can still use some data points. The type of question that somebody like Silver, starting from the outside view, would ask is: “Of the people making predictions regarding the effects of changes in migration policy regimes, who has the better prediction track record?” Or: “Of the various methods used to predict the effects of changes in migration policy regimes, which have the better track record?” Ideally, what we’d need to make this kind of judgment is the following (a rough scoring sketch follows the list):
- A large number of data points,
- all of which have outcomes that can be agreed upon clearly,
- with information about what prediction each side made prior to the event, and
- with information about what the outcome was.
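To make the scoring idea concrete, here is a minimal sketch in Python, with entirely made-up predictions and outcomes, using the Brier score (the mean squared difference between a stated probability and the realized 0/1 outcome). It is an illustration of the method, not anyone’s actual track record:

```python
# Minimal sketch (hypothetical data): comparing two forecasters'
# probabilistic track records with the Brier score. Lower is better;
# always saying 50% scores 0.25 on any data.

def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

outcomes   = [1, 0, 1, 1, 0, 1, 0, 0]                    # what actually happened
pundit     = [0.9, 0.8, 0.9, 0.2, 0.7, 0.95, 0.6, 0.1]   # confident, point-like calls
poll_based = [0.7, 0.3, 0.8, 0.6, 0.4, 0.75, 0.35, 0.2]  # hedged, poll-driven estimates

print("pundit:    ", round(brier_score(pundit, outcomes), 3))      # ~0.270
print("poll-based:", round(brier_score(poll_based, outcomes), 3))  # ~0.096
```

In this invented data the hedged, poll-based forecaster scores better than the overconfident pundit; with real prediction records, the same computation would let us compare people or methods.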
Weather prediction is one such example. There are a large number of data points (daily maximum and minimum temperatures and precipitation statistics in many cities over half a century). The final value of each data point is broadly agreed upon, though there are measurement error issues. The values predicted by organizations such as the National Weather Service and the Weather Channel are also available. All the conditions for an analysis are therefore met, and Silver mentions one such analysis in his book. It finds that both the National Weather Service and the Weather Channel are fairly accurate, but that the Weather Channel (deliberately, it turns out) inflates the probability of precipitation on days when that probability is extremely low. This phenomenon is now known as wet bias.
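As a rough sketch of what such a calibration analysis involves (the data below are invented, not Silver’s): group forecasts by their stated probability and compare each group’s stated probability with the observed frequency of rain. A wet bias shows up as low-probability forecasts under which it rains even less often than stated:

```python
# Hypothetical sketch of a calibration check for precipitation forecasts.
# Each pair is (stated probability of rain, 1 if it rained else 0);
# the data are invented purely to illustrate the method.
from collections import defaultdict

records = [(0.2, 0), (0.2, 0), (0.2, 0), (0.2, 0), (0.2, 0),
           (0.5, 1), (0.5, 0), (0.5, 1), (0.5, 0),
           (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 0)]

by_forecast = defaultdict(list)
for stated, rained in records:
    by_forecast[stated].append(rained)

for stated in sorted(by_forecast):
    days = by_forecast[stated]
    print(f"stated {stated:.0%}: rained {sum(days) / len(days):.0%} "
          f"of the time (n={len(days)})")
# A calibrated forecaster's observed frequencies track the stated ones;
# here the "20%" days never saw rain at all -- the wet-bias pattern.
```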
Predictions in the political and economic realm don’t fare as well. There are a reasonably large number of data points regarding the outcomes of various electoral races, and they satisfy the necessary conditions (lots of data points, clear outcomes, information about each side’s predictions, and information about the outcomes), allowing us to get a sense of the quality of political predictions. The data isn’t as extensive as for weather, but it is still substantial. Silver finds that while predictions that relied on statistically valid polling techniques tended to do well, predictions made by political pundits on television didn’t. He finds a similarly disappointing story when it comes to economic forecasting. He is also critical of people who make predictions and forecasts without specifying the margin of error or the distribution, giving only a point estimate. In this discussion, Silver alludes to Tetlock’s study of prediction records and his distinction between “foxes” and “hedgehogs” (see here for an article co-authored by Tetlock that summarizes the idea).
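Silver’s complaint about bare point estimates is checkable in the same spirit: a forecast that comes with a margin of error can be scored on whether its intervals cover reality as often as claimed, while a bare point estimate cannot. A minimal sketch with invented numbers:

```python
# Hypothetical sketch: checking interval coverage for forecasts that
# come with a margin of error. Each tuple is
# (point forecast, stated 90% interval half-width, actual outcome).
forecasts = [(2.0, 1.5, 1.1), (3.0, 1.0, 4.5), (1.5, 2.0, 0.2),
             (2.5, 1.2, 2.9), (0.5, 1.8, -0.6), (3.5, 1.0, 3.2)]

covered = sum(1 for point, half_width, actual in forecasts
              if abs(actual - point) <= half_width)
print(f"stated 90% intervals covered {covered}/{len(forecasts)} outcomes")
# If a forecaster's "90%" intervals cover far fewer than 90% of outcomes,
# the forecaster is overconfident -- a failure that is invisible if only
# the point estimate is reported.
```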
When two sides are debating an issue and relying heavily on empirical claims about the future to make their respective cases, you’d naturally be curious about the prediction records of the two sides with respect to past predictions. There are two additional complications over and above the obvious measurement difficulties that apply particularly to political debates such as migration policy debates:
- The specific people engaging in the debate are usually different each time. Most pro-immigration groups and people around today weren’t there when the Immigration and Nationality Act of 1965 was passed. The same is true of the anti-immigration groups and people. Given this complication, each side can happily claim allegiance to the correct claims made historically by their side, and disown the incorrect claims as having been made by others they don’t support. This can be partly overcome by trying to come up with objective metrics of just how similar arguments offered today are to the failed arguments of the past, but there are many then-versus-now “outs” to deflect claims of objective similarity between the present and the past.
- Relatedly, it can be argued that proponents of an argument weren’t making it because they actually believed it; rather, they were just trying to rally public support to their cause, knowing that they would need to lie to (or at any rate, exaggerate their case to) a public that did not share their normative views. (I discussed incentives to lie about immigration enforcement in an earlier post.)
Although these two difficulties present a challenge, there is probably much to be gained from a retrospective analysis of past changes in migration regimes and the predictions made by various people during those changes. Significant changes are better because (a) more people are likely to make explicit predictions about the effects of significant changes, and (b) the larger effect size makes it easier to determine what actually happened. Unfortunately, significant changes are also fewer in number, so we do not have the “large number of data points” that would allow for good calibration of the accuracy of predictions. But we’ve just got to deal with that uncertainty. It’s better than completely ignoring the past.
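To see why the small number of data points matters, here is a back-of-the-envelope sketch with assumed numbers: with only a handful of regime changes, the uncertainty around any estimated “hit rate” for a prediction method is enormous.

```python
# Rough illustration of why few data points make calibration hard:
# approximate 95% confidence interval half-width for an estimated hit
# rate, using the normal approximation to the binomial (crude at small
# n, but good enough to show the scale of the problem). Numbers are
# purely illustrative.
import math

def ci_half_width(hit_rate, n):
    """Half-width of an approximate 95% CI for a proportion."""
    return 1.96 * math.sqrt(hit_rate * (1 - hit_rate) / n)

for n in (5, 20, 500):
    hw = ci_half_width(0.7, n)   # suppose 70% of predictions were right
    print(f"n={n}: 70% hit rate, roughly +/- {hw:.0%}")
# With n=5 the interval spans most of the probability scale; only with
# hundreds of comparable episodes does the estimate become sharp.
```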
Relatedly, looking at migration regime changes sufficiently far back in the past also gives us some idea of the more long term effects of the changes. BK, one of the skeptics of open borders in our comments, has argued that the benefits of migration are front-loaded, while the costs take decades to unfold (see for instance here and here). Evaluating such concerns would require us to look at the long-term effects of past migration regime changes.
My co-blogger Chris Hendrix plans to begin a series that looks at various instances of open borders becoming more closed, along with the predictions and rationales offered at the time (expect to read Chris’s introductory post soon!). Later, one of us (perhaps Chris again, perhaps I, or perhaps one of our other bloggers) will look at instances of immigration liberalization and the predictions and arguments accompanying and opposing them. I’m particularly interested in the Immigration and Nationality Act of 1965 in the United States and Enoch Powell’s Rivers of Blood speech of 1968 in the UK. The historical analysis will hopefully help us better calibrate the accuracy of predictions and forecasts about changes to migration regimes, hence better enabling us to evaluate, from the outside view, the plausibility of claims such as “double world GDP” or the end of poverty.
I look forward to learning from this series. But take care in how you find the predictions to avoid selection bias. “Dumbest predictions of my political opponents” lists need to be offset against each other, and miss what the mainstream prediction was.
Chris’s current plan is to look at the actual legislative arguments, as well as arguments found in reports cited in the legislative arguments, made in the run-up to the passage of the Chinese Exclusion Act and similar acts.
That is something to always worry about, and it’s nice to know there will be skeptical readers like you to help keep us on the straight and narrow, BK! 😀
If I in particular seem to just be cherry-picking the worst, I do encourage you and any other reader to call me on it. If you have particular sources to consider that I’ve missed, that’s also a nice bonus.
I think that prediction records are a better judge of the accuracy of an algorithm than of a group. The only time they are useful for judging a group is when the group can be expected to use a consistent method or algorithm.
Excellent point, Mike. However, if the algorithm is either proprietary or implicit (relying on intuition) rather than being based on a clearly and publicly articulated list of rules, then it is harder to know if different people are using the “same” algorithm in different situations. In that case, evaluating based on the person or group rather than the algorithm may be better.
An exciting project, though probably very time consuming.
I won’t say much for now, except to say that (as you may already know!) there’s lots of good material on the messy details of (1) inferring predictions from stated views, and the messy details of (2) testing those predictions against real-world data, in Expert Political Judgment and in this symposium on the book. And, another update on the subject from Tetlock (more recent than the Cato Unbound discussion you linked to) is at edge.org.
Also, an example of one methodology may be helpful. In How We’re Predicting AI, Armstrong & Sotala explain their “prediction extraction” method in some detail (e.g. in the opening paragraphs of section 4.2).
Best of luck!
Thanks for the thoughts, Luke. I think we are initially planning to do the analysis at a fairly rough level that should suffice for putting out blog posts and getting feedback from people in the comments (though Chris should know more about the exact game plan). We might later refine this to a more quantitative approach if we think there is sufficient promise in doing so.
Great post, and thanks for the pointer to Nate Silver’s book, which, thanks to the miracle of the Kindle, I will read soon. Here are my quick thoughts, which might not be very well thought out, as these were the thoughts coming to mind as I was reading your post. The one thing I think economists get unfairly criticized for is their forecasting record, and I think the everyday man’s version of this is: “How well do economists forecast the direction of exchange rates, interest rates, etc.?” Unfortunately, economists don’t do a good job at this, as you point out in the post. But even here, the story is more subtle than most people think. The forecasting performance of economists is quite good over longer horizons: they don’t do so well over a month-to-one-year horizon but do quite well over the long run. See for instance: http://www.voxeu.org/article/short-run-ignorance-long-run-prescience-forecasting-exchange-rates.
I also think that economists get “systemic or structural change” predictions right. For instance, I think a lot of the economists working in international finance/international economics got their predictions on the likely economic effects of a single-currency European Union “right” (I think Paul Krugman has written voluminously on this). Similarly (I don’t have the references now but will look for them), a lot of South African economists did not predict doom and gloom following the end of apartheid. I suspect a similar storyline for the unification of Germany: no sensible economist predicted doom and gloom. I suspect the same with the collapse of the USSR. I think economists tend to get these systemic or structural change predictions (and also the long-term ones) “right” because they work from certain uncontroversial premises. In the case of the EU, the premise might have been moral hazard: since the motivation for the EU is essentially political and not economic, highly indebted countries (or countries with questionable fiscal records) are likely to be bailed out (to keep the union intact, whatever the costs of doing so). Creditors, knowing this, will not pay much attention to due diligence when lending to EU countries. The result is that the level of reckless lending in the EU rises, resulting in the current state of affairs.
I tend to think that predicting the consequences of open borders falls in the “structural or systemic” camp, building on a set of uncontroversial premises (institutions are persistent, and so on). Perhaps economists might get the short-run predictions “wrong,” but certainly the long-run predictions will not be far off.
“I suspect the same with the collapse of the USSR”
But Russia did do badly after the collapse of the USSR:
http://en.wikipedia.org/wiki/Post-Soviet_states#Economy
And while the USSR existed, textbooks were consistently over-optimistic about its growth:
http://econlog.econlib.org/archives/2009/12/why_were_americ.html
Weren’t postwar economists severely wrong about the effectiveness of development aid and in predicting fast catch-up of poor countries generally?
And have they been able to prospectively predict which poor countries would catch up? Or which rich countries would stagnate or instead keep growing?
Some economists talk about “The Mystery of Economic Growth”, but predictions about future productivity are key for open borders:
http://www.ceriba.org.uk/pub/CERIBA/ChiaraCriscuolo/TheMysteryofEconomicGrowth_web.pdf
Comment from Carl Shulman, conveyed by email and posted with permission:
http://lesswrong.com/lw/gta/selfassessment_in_expert_ai_predictions/
Surveys of experts are better than what gets into the media, since strong views make people think it more worthwhile to tell the world their predictions. I would look for past versions of things like this:
http://www.igmchicago.org/igm-economic-experts-panel/poll-results?SurveyID=SV_0JtSLKwzqNSfrAF