Open Data: This is a very big deal

It appears that the World Bank, at long last, is really going to make a huge portion of its data publicly available.  The New York Times has a story that outlines some of the trials and tribulations that brought us to this point, some of which will probably seem arcane to the development outsider.  However, as a development researcher/practitioner hybrid, I cannot tell you how exciting or important this is – the Bank is sitting on a giant pile of interesting data.  Not all of it is going to be high quality (a lot of data from the Global South is not – see chapter 9 of Delivering Development or a parallel discussion in Charles Kenny’s Getting Better).  But until very recently the data you could easily access from the Bank was worthy of a lower-division undergraduate project – and getting to the really interesting stuff was brutally difficult.  The new datasets are more detailed and comprehensive, but they are still not everything the Bank has.  Andy Sumner has been trying to get at the Bank’s core data to refine and test his ideas about the New Bottom Billion (which you should all be reading, by the way), with little success because of security requirements.
I really like a quote, at the end of the NY Times piece, from Bitange Ndemo, Kenya’s permanent secretary for information.  When asked if there would be resistance to public dissemination of government data, he argued that transparency was inevitable because:

Information is valuable, he says, and people will find a way to get it: “This is one of those things, like mobile phones and the Internet, that you cannot control.”

On explanation in development research

I was at a talk today where folks from Michigan State were presenting research and policy recommendations to guide the Feed the Future initiative.  I greatly appreciate this sort of presentation – it is good to get real research in the building, and to see USAID staff who have so little time turn out in large numbers to engage.  Once again, folks, it’s not that people in the agencies aren’t interested or don’t care, it’s a question of time and access.
In the course of one of the presentations, however, I saw a moment of “explanation” for observed behavior that nicely captures a larger issue that has been eating at me as the randomized controlled trials for development (RCT4D) movement gains speed . . . there isn’t a lot of explanation there.  There is really interesting data, rigorously collected, but explanation is another thing entirely.
The presenter put up a slide that showed a wide dispersion of prices around the average price farmers received for their maize crops in a single market area (near where I happen to do work in Malawi).  Nothing too shocking there, as this happens in Malawi, and indeed in many places.  However, from a policy and programming perspective, it’s important to know that the average price is NOT the same thing as what a given household is taking home.  But then the presenter explained this dispersion by noting (in passing) that some farmers were more price-savvy than others.
This is a problem for two reasons:
1) There is no evidence at all to support this claim, either in his data or in the data I have from an independent research project nearby.
2) This offhand explanation has serious policy ramifications.
This explanation is a gross oversimplification of what is actually going on here – in Mulanje (near the Luchenza market area analyzed in the presentation), price information is very well communicated in villages.  Thus, while some farmers might indeed be more savvy than others, the prices they are able to get are communicated throughout the village, distributing that information.  So the dispersion of prices is the product of other factors.  Desperation selling is probably part of the issue (another offhand explanation offered later in the presentation).  However, what we really need, if we want a rigorous understanding of the causes of this dispersion and how to address it, is a serious effort to grasp the social component of agriculture in this area – how gender roles, for example, shape household power dynamics, farm roles, and the prices people will sell at (this is a social consideration that exceeds explanation via markets), or how social networks connect particular farmers to particular purchasers in a manner that facilitates or inhibits price maximization at market.  These considerations are both the causes of the phenomenon that the presenter described and the points of leverage on which policy might act to actually change outcomes.  If farmers aren’t “price savvy”, this suggests the need for a very different sort of intervention than what would be needed to address gendered patterns of agricultural strategy tied to long-standing gender roles and expectations.
This is a microcosm of what I am seeing in the RCT4D world right now – really rigorous data collection, followed by really thin interpretations of the data.  It is not enough to just point out interesting patterns, and then start throwing explanations out there – we must turn from rigorous quantitative identification of significant patterns of behavior to the qualitative exploration of the causes of those patterns and their endurance over time.  I’ve been wrestling with these issues in Ghana for more than a decade now, an effort that has most recently led me to a complete reconceptualization of livelihoods (shifting from understanding livelihoods as a means of addressing material conditions to a means of governing behaviors through particular ways of addressing material conditions – the article is in review at Development and Change).  However, the empirical tests of this approach (with admittedly tiny-n samples in Ghana, and very preliminary looks at the Malawi data) suggest that I have a better explanatory resolution for observed behaviors than is possible through existing livelihoods approaches (which would end up dismissing a lot of choices as illogical or the products of incomplete information) – and therefore I have a better foundation for policy recommendations than is available without this careful consideration of the social.
See, for example, this article I wrote on how we approach gender in development (also a good overview of the current state of gender and development, if I do say so myself).  I empirically demonstrate that a serious consideration of how gender is constructed in particular places has large material consequences for whose experiences we can understand, and therefore for the sorts of interventions we might program to address particular challenges.  We need more rigorous wrestling with “the social” if we are going to learn anything meaningful from our data.  Period.
In summary, explanation is hard.  Harder, in many ways, than rigorous data collection.  Until we start spending at least as much effort on the explanation side as we do on the collection side, we will not really change much of anything in development.

Why do we insist on working at the national level again?

The BBC has posted an interesting map of Nigeria that captures the spatiality of politics, ethnicity, wealth, health, literacy and oil.  There are significant problems with this map.  The underlying data has fairly large error bars that are not acknowledged, and the presentation of the data is somewhat problematic; for example, the ethnic “areas” in the country are represented only by the majority group, hiding the heterogeneity of these areas, and other data is aggregated at the state level, blurring heterogeneous voting patterns, incomes, literacy rates and health situations.  I really wish that those who create this sort of thing would do a better job of addressing some of these issues, and of pointing out the issues they cannot address, to help the reader better evaluate the data.
But even with all of these caveats, this map is a striking illustration of the problems with using national-level statistics to guide development policy and programs.  Look at the distributions of wealth, health and literacy in the country – error bars or no, this data clearly demonstrates that national measures of wealth cannot guide useful economic policy, national measures of literacy might obscure regional or ethnic patterns of educational neglect, and national vaccination statistics tell us nothing about the regional variations in disease ecology and healthcare delivery that shape health outcomes in this country.
This is not to say that states don’t matter – they matter a lot.  However, when we use national-scale data for just about anything, we are making very bad assumptions about the homogeneity of the situation in that country . . . and we are probably missing key opportunities and challenges we should be addressing in our work.