Entries tagged with “rct4d”.

Raj Shah has announced his departure from USAID. Honestly, this surprises nobody at the Agency, or anyone in the development world who’s been paying attention. If anything, folks are surprised he is still around – it is well-known (or at least well-gossiped) that he had been looking for the door, at any number of opportunities, since at least the spring of 2012. There are plenty of reviews of Shah’s tenure posted around the web, and I will not rehash them. While I have plenty of opinions of the various initiatives that Shah oversaw/claims credit for (and these are not always the same, by the way), gauging what did and did not work under a particular administrator is usually a question for history, and it will take a bit of space and time before anyone should feel comfortable offering a full review of this administrator’s work.

I will say that I hope much of what Shah pushed for under USAID Forward, especially the rebuilding of the technical capacity of USAID staff, the emphasis on local procurement, and the strengthening of evaluation, becomes entrenched at the agency. Technical capacity is critical – not because USAID is ever going to implement its own work. That would require staffing the Agency at something like three or four times current levels, and nobody is ever going to approve that. Instead, it is critical for better monitoring and evaluating the work of the Agency’s implementing partners. In my time at USAID, I saw implementer work and reports that ran the gamut from “truly outstanding” to “dumpster fire”. The problem is that there are many cases where work that falls on the dumpster fire end of the spectrum is accepted because Agency staff lack the technical expertise to recognize the hot mess they’ve been handed. This is going to be less of a problem going forward, as long as the Agency continues to staff up on the technical side.

Local procurement is huge for both the humanitarian assistance and development missions of USAID. For example, there is plenty of evidence supporting the cost/time effectiveness of procuring emergency food aid in or near regions of food crisis. Further, mandates that push more USAID funding to local organizations and implementers will create incentives to truly build local capacity to manage these funds and design/implement projects, as it will be difficult for prime contractors to meet target indicators and other goals without high-capacity local partners.

A strong evaluation policy will be huge for the Agency…if it ever really comes to pass. While I have seen real signs of Agency staff struggling with how to meaningfully evaluate the impact of their programs, the overall state of evaluation at the Agency remains in flux. The Evaluation Policy was never really implementable – for example, it seems nobody actually considered who would do the evaluations. USAID staff generally lack the time and/or expertise to conduct these evaluations, and the usual implementing partners suffer from a material conflict of interest – very often, they would have to evaluate programs and projects implemented by their competitors…even projects where they had lost the bid to a competitor. Further, the organizations I have seen/interacted with that focus on evaluation remain preoccupied with quantitative approaches to evaluation that, while perhaps drawing on Shah’s interest in the now-fading RCT craze in development, really cannot identify or measure the sorts of causal processes that connect development interventions and outcomes. Finally, despite the nice words to the contrary, the culture at USAID remains intolerant of project failure, and the Agency’s leadership never mounted, with the White House or Congress, the strong defense of this culture change needed to create the space for a new understanding of evaluation, nor did it ever really convey a message of culture change that USAID staff found convincing across the board. There are some groups/offices at USAID (for example, in the ever-growing Global Development Lab) where this culture is fully in bloom, but these are small offices with small budgets. Most everyone else remains mired in very old thinking on evaluation.

At least from an incrementalist perspective, entrenching and building on these aspects of USAID Forward would be a major accomplishment for Shah’s successor. Whoever comes next will not simply run out the clock of the Obama Administration – there are two years left. I therefore expect the administration to appoint an administrator (rather than promote a career USAID staff caretaker with no political mandate) to the position. In a perfect world, this would be a person who understands development as a discipline, but also has the government and implementing experience to understand how development thought intersects with development practice in the real world. Someone with a real understanding of development and humanitarian assistance as a body of thought and practice with a long history that can be learned from and built upon would be able to parse the critical parts of USAID Forward from the fluff, could prevent the design and implementation of projects that merely repeat the efforts (and often failures) of decades ago, and could perhaps reverse the disturbing trend at USAID to view development challenges as technical challenges akin to those informed by X-Prizes – a trend that has shoved the social aspects of development to the back seat at the Agency. At the same time, someone with implementing and government experience would understand what is possible within the current structure, thus understanding where incremental victories might push the Agency in important and productive directions that move toward the achievement of more ideal, long-term goals.

There are very, very few people out there who meet these criteria. Steve Radelet does, and he served as the Chief Economist at USAID while I was there, but I have no idea if he is interested or, more importantly, if anyone is interested in him. Much the pity if not. More likely, the administration is going to go with the relatively new Deputy Administrator Alfonso Lenhardt. Looking at his background, he has already been vetted by the Senate for his current position, has foreign service experience and time in various implementer-oriented positions, and is well-positioned to avoid a long confirmation process: his years as a lobbyist and his time as Senate Sergeant-at-Arms likely give him deep networks on both sides of the aisle. In his background, I see no evidence of a long engagement with development as a discipline, and I wonder how reform-minded a former Senior Vice President for Government Relations at an implementer can be. I do not know Deputy Administrator Lenhardt at all, and so I cannot speak to where he might fall on any or all of the issues above. According to Devex, he says his goal is to “improve management processes and institutionalize the reforms and initiatives that Shah’s administration has put in place.” I have no objection to either of these goals – they are both important. But what this means in practice, should Lenhardt be promoted, is an open question that will have great impact on the future direction of the Agency.

Five and a half years ago, at the end of the spring semester of 2009, I sat down and over the course of 30 days drafted my book Delivering Development. The book was, for me, many things: an effort to impose a sort of narrative on the work I’d been doing for 12 years in Ghana and other parts of Africa; an effort to escape the increasingly claustrophobic confines of academic writing and debates; and an effort to exorcise the growing frustration and isolation I felt as an academic working on international development in a changing climate, but without a meaningful network into any development donors. Most importantly, however, it was a 90,000-word scream at the field that could be summarized in three sentences:

  1. Most of the time, we have no idea what the global poor are doing or why they are doing it.
  2. Because of this, most of our projects are designed for what we think is going on, which rarely aligns with reality.
  3. This is why so many development projects fail, and if we keep doing this, the consequences will be dire.

The book had a generous reception, received very fair (if sometimes a bit harsh) reviews, and actually sold a decent number of copies (at least by the standards of the modern publishing industry, which was in full collapse by the time the book appeared in January 2011). Maybe most gratifying, I heard from a lot of people who read the book and who heard the message, or for whom the book articulated concerns they had felt in their jobs.

This is not to say the book is without flaws. For example, the second half of the book, the part addressing the implications of being wrong about the global poor, was weaker than the first – and this is very clear to me now, as the former employee of a development donor. Were I writing the book now, I would do practically nothing to the first half, but I would revise several parts of the second half (and the very dated scenarios chapter really needs revision at this point, anyway). But, five and a half years after I drafted it, I can still say one thing clearly.


Well, I was right about point #1 above, anyway. The newest World Development Report from the World Bank has empirically demonstrated what was so clear to me and many others, and what I think I did a very nice job of illustrating in Delivering Development: most people engaged in the modern development industry have very little understanding of the lives and thought processes of the global poor, the very people that industry is meant to serve. Chapter 10 is perfectly titled: “The biases of development professionals.” All credit to the authors of the report for finally turning the analytic lens on development itself, as it would have been all too easy to simply talk about the global poor through the lens of perception and bias. And when the report turns to development professionals’ perceptions…for the love of God. Just look at the findings on page 188. No, wait, let me show you some here:

[Figure: chart from the WDR (page 188) comparing development professionals’ predictions of how many of their poorest beneficiaries would say “what happens in the future depends on me” with the share who actually said it, across three settings.]


For those who are chart-challenged, let me walk you through this. In three settings, the survey asked development professionals what percentage of their poorest beneficiaries – the bottom third – thought “what happens in the future depends on me.” The professionals assumed very few people would say this. Except that a huge number of very poor people said this, in all settings. In short, the development professionals were totally wrong about what these people thought, which means they don’t understand their mindsets, motivations, etc. Holy crap, folks. This isn’t a near miss. This is I-have-no-idea-what-I-am-talking-about stuff here. These are the error bars on the initial ideas that lead to projects and programs at development donors.

The WDR frames these findings in pretty stark terms (page 180):

Perhaps the most pressing concern is whether development professionals understand the circumstances in which the beneficiaries of their policies actually live and the beliefs and attitudes that shape their lives.

And their proposed solution is equally pointed (page 190):

For project and program design, development professionals should “eat their own dog food”: that is, they should try to experience firsthand the programs and projects they design.

Yes. Or failing that, they should really start either reading the work of people who can provide that experience for them, or start funding the people who can generate the data that allows for this experience (metaphorically).

On one hand, I am thrilled to see this point in mainstream development conversation. On the other…I said this five years ago, and not that many people cared. Now the World Bank says it…or maybe more to the point, the World Bank says it in terms of behavioral economics, and everyone gets excited. Well, my feelings on this are pretty clear:

  1. Just putting this in terms of behavioral economics is actually putting the argument out there in the least threatening manner possible, as it is still an argument from economics that preserves that disciplinary perspective’s position of superiority in development.
  2. The things that behavioral economics has been “discovering” about the global poor are things that anthropology, geography, sociology, and social history have been saying for decades. Further, its analyses generally lack explanatory rigor or anything resembling external validity – see my posts here, here, and here.

Also, the WDR never makes a case for why we should care that we are probably misunderstanding/misrepresenting the global poor. As a result, this just reads as an extended “oopsie!” piece that need not be seriously addressed as long as we look a little sheepish – then we can get back to work. But getting this stuff wrong is really, really important – this was the central point of the second half of Delivering Development (a point that Duncan Green unfortunately missed in his review). We can design projects that not only fail to make things better, but actually make things much worse: we can kill people by accident. We can gum up the global environment, which is not only going to hurt some distant, abstract global poor person – it will hit those in the richest countries, too. We can screw up the global economy, another entity that knows few borders and over which nobody has complete control. This is not “oopsie!” This is a disaster that requires serious attention and redress.

So, good first step World Bank, but not far enough. Delivering Development still goes a lot further than you are willing to now. Delivering Development goes much further than behavioral development economics has gone, or really can go. Time to catch up to the real nature of this problem, and the real challenges it presents. Time to catch up to things I was writing five years ago, before it’s too late.

I have a confession. For a long time now I have found myself befuddled by those who claim to have identified the causes behind observed outcomes in social research via the quantitative analysis of (relatively) large datasets (see posts here, here, and here). For a while, I thought I was seeing the all-too-common confusion of correlation and causation…except that a lot of smart, talented people seemed to be confusing correlation with causation. This struck me as unlikely.

Then, the other day in seminar (I was covering for a colleague in our department’s “Contemporary Approaches to Geography” graduate seminar, discussing the long history of environmental determinism within and beyond the discipline), I found myself in a similar discussion related to explanation…and I think I figured out what has been going on.  The remote sensing and GIS students in the course, all of whom are extraordinarily well-trained in quantitative methods, got to thinking about how to determine if, in fact, the environment was “causing” a particular behavior*. In the course of this discussion, I realized that what they meant by “cause” was simple (I will now oversimplify): when you can rule out/control for the influence of all other possible factors, you can say that factor X caused event Y to happen.  Indeed, this does establish a causal link.  So, I finally get what everyone was saying when they said that, via well-constructed regressions, etc., one can establish causality.

So it turns out I was wrong…sort of. You see, I wasn’t really worried about causality…I was worried about explanation. My point was that the information you would get from a quantitative exercise designed to establish causal relationships isn’t enough to support rigorous project and program design. Just because you know that the construction of a borehole in a village caused girl-child school attendance to increase in that village doesn’t mean you know HOW the borehole caused this change in school attendance to happen.  If you cannot rigorously explain this relationship, you don’t understand the mechanism by which the borehole caused the change in attendance, and therefore you don’t really understand the relationship. In the “more pure” biophysical sciences**, this isn’t that much of a problem because there are known rules that particles, molecules, compounds, and energy obey, and therefore under controlled conditions one can often infer from the set of possible actors and actions defined by these rules what the causal mechanism is.

But when we study people it is never that simple.  The very act of observing people’s behaviors causes shifts in that behavior, making observation at best a partial account of events. Interview data are limited by the willingness of the interviewee to talk, and the appropriateness of the questions being asked – many times I’ve had to return to an interviewee to ask a question that became evident later, and said “why didn’t you tell me this before?”  (to which they answer, quite rightly, with something to the effect of “you didn’t ask”).  The causes of observed human behavior are staggeringly complex when we get down to the real scales at which decisions are made – the community, household/family, and individual. Decisions may vary by time of the year, or time of day, and by the combination of gender, age, ethnicity, religion, and any other social markers that the group/individual chooses to mobilize at that time.  In short, just because we see borehole construction cause increases in girl-child school attendance over and over in several places, or even the same place, doesn’t mean that the explanatory mechanism between the borehole and attendance is the same at all times.

Understanding that X caused Y is lovely, but in development it is only a small fraction of the battle.  Without understanding how access to a new borehole resulted in increased girl-child school attendance, we cannot scale up borehole construction in the context of education programming and expect to see the same results.  Further, if we do such a scale-up, and don’t get the same results, we won’t have any idea why.  So there is causality (X caused Y to happen) and there are causal mechanisms (X caused Y to happen via Z – where Z is likely a complex, locally/temporally specific alignment of factors).
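To make the gap between causality and mechanism concrete, here is a toy simulation of the borehole example. This is my own illustration with invented numbers; it does not come from any actual study. Two simulated villages show the same measured effect of a borehole on girl-child school attendance, but the effect runs through entirely different mechanisms, so an analysis that stops at “X caused Y” cannot tell us which story to scale.

```python
# Toy illustration (hypothetical numbers): two data-generating processes
# yield the same average effect of a borehole on girl-child school
# attendance, so an experiment "finds" the same causal effect in both
# villages even though the mechanism -- and thus what would scale -- differs.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000  # girls per simulated arm

def village_a(borehole):
    # Mechanism Z1: the borehole frees up hours spent hauling water,
    # and the time saved is what drives attendance.
    hours_hauling = rng.normal(3.0 - (2.0 if borehole else 0.0), 0.5, n)
    attend_prob = np.clip(0.9 - 0.15 * hours_hauling, 0.0, 1.0)
    return rng.random(n) < attend_prob

def village_b(borehole):
    # Mechanism Z2: cleaner water means fewer sick days, and health --
    # not time use -- is what drives attendance.
    sick_days = rng.poisson(6.0 - (4.0 if borehole else 0.0), n)
    attend_prob = np.clip(0.9 - 0.075 * sick_days, 0.0, 1.0)
    return rng.random(n) < attend_prob

for name, village in [("A (time mechanism)", village_a),
                      ("B (health mechanism)", village_b)]:
    effect = village(True).mean() - village(False).mean()
    print(f"Village {name}: borehole effect on attendance = {effect:+.2f}")

# Both print roughly +0.30: "the borehole caused attendance to rise" is true
# in both villages, but a program scaled on the time-saving story would
# misfire where the health mechanism is doing the work, and vice versa.
```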

Unfortunately, when I look at much quantitative development research, especially in development economics, I see a lot of causality, but very little work on the causal mechanisms that get us to explanation. There is a lot of story time, “that pivot from the quantitative finding to the speculative explanation.” In short, we might be programming development and aid dollars based upon evidence, but much of the time that evidence only gets us part of the way to what we need to know to really inform program and project design.

This problem is avoidable – it does not represent the limits of our ability to understand the world. There is one obvious way to get at those mechanisms – serious, qualitative fieldwork. We need to be building research and policy teams where ethnographers and other qualitative social scientists learn to respect the methods and findings of their quantitative brethren such that they can target qualitative methods at illuminating the mechanisms driving robust causal relationships. At the same time, the quantitative researchers on these teams will have to accept that they have only partially explained what we need to know when they have established causality through their methods, and that qualitative research can carry their findings into the realm of implementation.

The bad news for everyone…for this to happen, you are going to have to pick your heads up out of your (sub)disciplinary foxholes and start reading across disciplines in your area of interest.  Everyone talks a good game about this, but when you read what keeps getting published, it is clear that cross-reading is not happening.  Seriously, the number of times I have seen people in one field touting their “new discoveries” about human behavior that are already common conversation in other disciplines is embarrassing…or at least it should be to the authors. But right now there is no shame in this sort of thing, because most folks (including peer reviewers) don’t read outside their disciplines, and therefore have no idea how absurd these claims of discovery really are. As a result, development studies gives away its natural interdisciplinary advantage and returns to the problematic structure of academic knowledge and incentives, which not only enables but indeed promotes narrowly disciplinary reading and writing.

Development donors, I need a favor. I need you to put a little research money on the table to learn about whatever it is you want to learn about. But when you do, I want you to demand it be published in a multidisciplinary development-focused journal.  In fact, please start doing this for all of your research-related money. People will still pursue your money, as the shrinking pool of research dollars is driving academia into your arms. Administrators like grant and contract money, and so many academics are now being rewarded for bringing in grants and contracts from non-traditional sources (this is your carrot). Because you hold the carrot, you can draw people in and then use “the stick” inherent in the terms of the grant/contract to demand cross-disciplinary publishing that might start to leverage change in academia. You all hold the purse, so you can call the tune…




*Spoiler alert: you can’t.  Well, you probably can if 1) you pin the behavior you want to explain down to something extraordinarily narrow, 2) can limit the environmental effect in question to a single independent biophysical process (good luck with that), and 3) limit your effort to a few people in a single place. But at that point, the whole reason for understanding the environmental determinant of that behavior starts to go out the window, as it would clearly not be generalizable beyond the study. Trust me, geography has been beating its head against this particular wall for a century or more, and we’ve buried the idea.  Learn from our mistakes.


**by “more pure” I am thinking about those branches of physics, chemistry, and biology in which lab conditions can control for many factors. As soon as you get into field sciences, or start asking bigger questions, complexity sets in and things like causality get muddied in the manner I discuss above…just ask an ecologist.

Alright, last post I laid out an institutional problem with M&E in development – the conflict of interest between achieving results to protect one’s budget and staff, and the need to learn why things do/do not work to improve our effectiveness.  This post takes on a problem in the second part of that equation – assuming we all agree that we need to know why things do/do not work, how do we go about doing it?

As long-time readers of this blog (a small, but dedicated, fanbase) know, I have some issues with over-focusing on quantitative data and approaches for M&E.  I’ve made this clear in various reactions to the RCT craze (see here, here, here and here). Because I framed my reactions in terms of RCTs, I think some folks think I have an “RCT issue.”  In fact, I have a wider concern – the emerging aggressive push for quantifiable data above all else as new, more rigorous implementation policies come into effect.  The RCT is a manifestation of this push, but really is a reflection of a current fad in the wider field.  My concern is that the quantification of results, while valuable in certain ways, cannot get us to causation – it gets us to really, really rigorously established correlations between intervention and effect in a particular place and time (thoughtful users of RCTs know this).  This alone is not generalizable – we need to know how and why that result occurred in that place, to understand the underlying processes that might make that result replicable (or not) in the future, or under different conditions.

As of right now, the M&E world is not doing a very good job of identifying how and why things happen.  What tends to happen after rigorous correlation is established is what a number of economists call “story time”, where explanation (as opposed to analysis) suddenly goes completely non-rigorous, with researchers “supposing” that the measured result was caused by social/political/cultural factor X or Y, without any follow on research to figure out if in fact X or Y even makes sense in that context, let alone whether or not X or Y actually was causal.  This is where I fear various institutional pushes for rigorous evaluation might fall down.  Simply put, you can measure impact quantitatively – no doubt about it.  But you will not be able to rigorously say why that impact occurred unless someone gets in there and gets seriously qualitative and experiential, working with the community/household/what have you to understand the processes by which the measured outcome occurred.  Without understanding these processes, we won’t have learned what makes these projects and programs scalable (or what prevents them from being scaled) – all we will know is that it worked/did not work in a particular place at a particular time.

So, we don’t need to get rid of quantitative evaluation.  We just need to build a strong complementary set of qualitative tools to help interpret that quantitative data.  So the next question to you, my readers: how are we going to build in the space, time, and funding for this sort of complementary work? I find most development institutions to be very skeptical as soon as you say the word “qualitative”…mostly because it sounds “too much like research” and not enough like implementation. Any ideas on how to overcome this perception gap?

(One interesting opportunity exists in climate change – a number of projects are currently piloting new M&E approaches, as evaluating the impacts of climate change programming requires very long time horizons.  In at least one M&E effort I know of, there is talk of running both quantitative and qualitative project evaluations to see what each method can and cannot answer, and how they might fit together.  Such a demonstration might catalyze further efforts…but this outcome is years away.)

So, how do we fix the way we think about development to address the challenges of global environmental change?  Well, there are myriad answers, but in this post I propose two – we have to find ways of evaluating the impact of our current projects such that those lessons are applicable to other projects that are implemented in different places and at various points in the future . . . and we have to better evaluate just where things will be in the future as we think about the desired outcomes of development interventions.

To achieve the first of these two is relatively easy, at least conceptually: we need to fully link up the RCT4D crowd with the qualitative research/social theory crowd.  We need teams of people that can bring the randomista obsession with sampling frames and serious statistical tools – in other words, a deep appreciation for rigor in data collection – and connect it to the qualitative social theoretical emphasis on understanding causality by interrogating underlying social process – in other words, a deep appreciation for rigor in data interpretation.  Such teams work to cover the weaknesses of their members, and could bring us new and very exciting insights into development interventions and social process.

Of course, everyone says we need mixed methodologies in development (and a lot of other fields of inquiry), but we rarely see projects that take this on in a serious way.  In part, this is because very few people are trained in mixed methods – they are either very good at qualitative methods and interpretation, or very good at sampling and quantitative data analysis.  Typically, when a team gets together with these different skills, one set of skills or the other predominates (in policy circles, quant wins every time).  To see truly mixed methodologies, this cannot happen – as soon as one trumps the other, the value of the mixing declines precipitously.

For example, you need qualitative researchers to frame the initial RCT – an RCT framed around implicit, unacknowledged assumptions about society is unlikely to “work” – or to capture the various ways in which an intervention works.  At the same time, the randomista skill of setting up a sampling frame and obtaining meaningful large-scale data sets requires attention to how one frames the question, and where the RCT is to be run . . . which impose important constraints on the otherwise unfettered framings of social process coming from the qualitative side, framings that might not really be testable in a manner that can be widely understood by the policy community.  Then you need to loop back to the qualitative folks to interpret the results of the initial RCT – to move past whether or not something worked to the consideration of the various ways in which it did and did not work, and a careful consideration of WHY it worked.  Finally, these interpretations can be framed and tested by the quantitative members of the team, starting an iterative interpretive process that blends qualitative and quantitative analysis and interpretation to rigorously deepen our understanding of how development works (or does not work).

The process I have just described will require teams of grownups with enough self-confidence to accept criticism and to revise their ideas and interpretations in the face of evidence of varying sorts.  As soon as one side of this mixed method team starts denigrating the other, or the concerns of one side start trumping those of the other, the value of this mixing drops off – qualitative team members become fig leaves for “story time” analyses, or quantitative researchers become fig leaves for weak sampling strategies or overreaching interpretations of the data.  This can be done, but it will require team leaders with special skill sets – with experience in both worlds, and respect for both types of research.  There are not many of these around, but they are around.

Where are these people now?  Well, interestingly the answer to this question leads me to the second answer for how development might better answer the challenges of global environmental change: development needs to better link itself with the global environmental change community.  Despite titles that might suggest otherwise (UNEP’s Fourth Global Environment Outlook was titled Environment for Development), there is relatively little interplay between these communities right now.  Sure, development folks say the right things about sustainability and climate change these days, but they are rarely engaging the community that has been addressing these and many other challenges for decades.  At the same time, the global environmental change community has a weak connection to development, making their claims about the future human impacts of things like climate change often wildly inaccurate, as they assume current conditions will persist into the future (or they assume equally unrealistic improvements in future human conditions).

Development needs to hang out with the scenario builders of the global environmental change community to better understand the world we are trying to influence twenty years hence – the spot to which we are delivering the pass, to take up a metaphor from an earlier post on this topic.  We need to get with the biophysical scientists who can tell us about the challenges and opportunities they expect to see two or more decades hence.  And we need to find the various teams that are already integrating biophysical scientists and social scientists to address these challenges – the leaders already have to speak quant and qual, science and humanities, to succeed at their current jobs.  The members of these teams have already started to learn to respect their colleagues’ skills, and to better explain what they know to colleagues who may not come at the world with the same framings, data or interpretations.  They are not perfect, by any stretch (I voice some of my concerns in Delivering Development), but they are great models to go on.

Meanwhile, several of my colleagues and I are working on training a new generation of interdisciplinary scholars with this skill set.  All of my current Ph.D. students have taken courses in qualitative methods, and have conducted qualitative fieldwork . . . but they also have taken courses on statistics and biogeographic modeling.  They will not be statisticians or modelers, but now they know what those tools can and cannot do – and therefore how they can engage with them.  The first of this crew are finishing their degrees soon . . . the future is now.  And that gives me reason to be realistically optimistic about things . . .

OK, ok, you say: I get it, global environmental change matters to development/aid/relief.  But aside from thinking about project-specific intersections between the environment and development/aid/relief, what sort of overarching challenges does global environmental change pose to the development community?  Simply put, I think that the inevitability of various forms of environmental change (a level of climate change cannot be stopped now, certain fisheries are probably beyond recovery, etc.) over the next 50 or so years forces the field of development to start thinking very differently about the design and evaluation of policies, programs, and projects . . . and this, in turn, calls into question the value of things like randomized control trials for development.

In aid/development we tend to be oriented to relatively short funding windows in which we are supposed to accomplish particular tasks (which we measure through output indicators, like the number of judges trained) that, ideally, change the world in some constructive manner (outcome indicators, like a better-functioning judicial system).  Outputs are easier to deliver and measure than outcomes, and they tend to operate on much shorter timescales – which makes them perfect for end-of-project reporting even though they often have little bearing on the achievement of the desired outcomes that motivated the project in the first place (does training X judges actually result in a better-functioning judicial system?  What if the judges were not the problem?).  While there is a serious push in the development community to move past outputs to outcomes (which I generally see as a very positive trend), I do not see a serious conversation about the different timescales on which these two sorts of indicators operate.  Outputs are very short-term.  Outcomes can take generations.  Obviously this presents significant practical challenges to those who do development work and must justify their expenditures on an annual basis.

This has tremendous implications, I think, for development practice in the here and now – especially in development research.  For example, I think this pressure to move to outcomes but deliver them on the same timescale as outputs has contributed to the popularity of the randomized control trials for development (RCT4D) movement.  RCT4D work gathers data in a very rigorous manner, and subjects it to interesting forms of quantitative analysis to determine the impact of a particular intervention on a particular population.  As my colleague Marc Bellemare says, RCTs establish “whether something works, not how it works.”

The vast majority of RCT4D studies are conducted across a few months to years, directly after the project is implemented.  Thus, the results seem to move past outputs to impacts without forcing everyone to wait a very long time to see how things played out.  This, to me, is both a strength and a weakness of the approach . . . though I never hear anyone talking about it as a weakness.  The RCT4D approach seems to suggest that the evaluation of project outcomes can be effectively done almost immediately, without need for long-term follow-up.  This sense implicitly rests on the forms of interpretation and explanation that undergird the RCT4D approach – basically, what I see as an appallingly thin approach to the interpretation of otherwise interesting and rigorously gathered data. My sense of this interpretation is best captured by Andrew Gelman’s (quoting Fung) use of the term “story time”, which he defines as a “pivot from the quantitative finding to the speculative explanation.” Many practitioners of RCT4D seem to think that story time is unavoidable . . . which to me reflects a deep ignorance of the concerns for rigor and validity that have existed in the qualitative research community for decades.  Feel free to check the methods section of any of my empirically-based articles (e.g., here and here): they address who I interviewed, why I interviewed them, how I developed interview questions, and how I knew that my sample size had grown large enough to feel confident that it was representative of the various phenomena I was trying to understand.  Toward the end of my most recent work in Ghana, I even ran focus groups where I offered my interpretations of what was going on back to various sets of community members, and worked with them to strengthen what I had right and correct what I had wrong.  As a result, I have what I believe is a rigorous, highly nuanced understanding of the social causes of the livelihoods decisions and outcomes that I can measure in various ways, qualitative and quantitative, but I do not have a “story time” moment in there.

The point here is that “story time”, as a form of explanation, rests on uncritical assumptions about the motivations for human behavior that can make particular decisions or behaviors appear intelligible but leave the door open for significant misinterpretations of events on the ground.  Further, the very framing of what “works” in the RCT4D approach is externally defined by the person doing the evaluation/designing the project, and is rarely revised in the face of field realities . . . principally because when a particular intervention does not achieve some externally-defined outcome, it is deemed “not to have worked.”  That really tends to shut down continued exploration of alternative outcomes that “worked” in perhaps unpredictable ways for unexpected beneficiaries.  In short, the RCT4D approach tends to reinforce the idea that development is really about delivering apolitical, technical interventions to people to address particular material needs.

The challenge global environmental change poses to the RCT4D randomista crowd is that of the “through ball” metaphor I raised in my previous post.  Simply put, identifying “what works” without rigorously establishing why it worked is broadly useful if you make two pretty gigantic assumptions: First, you have to assume that the causal factors that led to something “working” are aspects of universal biophysical and social processes that are translatable across contexts.  If this is not true, an RCT only gives you what works for a particular group of people in a particular place . . . which is not really that much more useful than just going and reading good qualitative ethnographies.  If RCTs are nothing more than highly quantified case studies, they suffer from the same problem as ethnography – they are hard to aggregate into anything meaningful at a broader scale.  And yes, there are really rigorous qualitative ethnographies out there . . .
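To illustrate what “highly quantified case study” means in practice, here is another quick sketch of my own, with made-up numbers: suppose the intervention only raises yields for households with secure land tenure, a factor the trial neither measures nor reports. Two perfectly executed trials then report very different “what works” numbers, and only the unmeasured mechanism explains why.

```python
# Hypothetical sketch of the external-validity problem: the intervention
# "works" only for households with secure land tenure, a moderator the
# trial does not measure. Sites differ in how common that moderator is.
import numpy as np

rng = np.random.default_rng(1)

def run_trial(share_with_tenure, n=4_000):
    tenure = rng.random(n) < share_with_tenure   # unobserved local factor
    treated = rng.random(n) < 0.5                # random assignment
    # Yields rise by 0.5 units, but only where tenure lets the household
    # act on the new option; everything else is noise.
    outcome = rng.normal(0.0, 1.0, n) + 0.5 * (treated & tenure)
    return outcome[treated].mean() - outcome[~treated].mean()

print(f"Trial site (80% secure tenure): estimated effect = {run_trial(0.80):+.2f}")
print(f"New site   (15% secure tenure): estimated effect = {run_trial(0.15):+.2f}")

# Both numbers are internally valid answers to "did it work here?", but they
# do not aggregate into a general lesson -- only the unmeasured tenure
# mechanism explains why the effect travels so poorly between sites.
```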

Second, you have to assume that the current context of the trial is going to hold pretty much constant going forward.  Except, of course, global environmental change more or less chucks that idea for the entire planet.  In part, this is because global environmental change portends large, inevitable biophysical changes in the world.  Just because something works for improving rain-fed agricultural outputs today does not mean that the same intervention will work when the enabling environmental conditions, such as rainfall and temperature, change over the next few decades.  More importantly, though, these biophysical changes will play out in particular social contexts to create particular impacts on populations, who will in turn develop efforts to address those impacts. Simply put, when we introduce a new crop today and it is taken up and boosts yields, we know that it “worked” by the usual standards of agricultural development and extension.  But the take-up of new crops is not a function of agricultural ecology – there are many things that will grow in many places, but various social factors ranging from the historical (what crops were introduced via colonialism) to gender (who grows what crops and why) are what lead to particular farm compositions.  For example, while tree crops (oil palm, coconut, various citrus, acacia for charcoal) are common on farms around the villages in which I have worked in Ghana, almost none of these trees are found on women’s farms.  The reasons for this are complex, and link land tenure, gender roles, and household power relations into livelihoods strategies that balance material needs with social imperatives (for extended discussions, see here and here, or read my book).

Unless we know why that crop was taken up, we cannot understand if the conditions of success now will exist in the future . . . we cannot tell if what we are doing will have a durable impact.  Thus, under the most reliable current scenario for climate change in my Ghanaian research context, we might expect the gradual decline in annual precipitation, and the loss of the minor rainy season, to make tree crops (which tend to be quite resilient in the face of fluctuating precipitation) more and more attractive.  However, tree crops challenge the local communal land tenure system by taking land out of clan-level recirculation, and allowing women to plant them would further challenge land tenure by granting them direct control over access to land (which they currently lack).  Altering the land tenure system would, without question, set off a cascade of unpredictable social changes that would be seen in everything from gender roles to the composition of farms.  There is no way to be sure that any development intervention that is appropriate to the current context will be even functional in that future context.  Yet any intervention we put into place today should be helping to catalyze long-term changes . . .

Simply put: Global environmental change makes clear the limitations of our current thinking on aid/development (of which RCT4D is merely symptomatic).   Just like RCTs, our general framing of development does not move us any closer to understanding the long-term impact of our interventions.  Further, the results of RCTs are not generalizable past the local context (which most good randomistas already know), limiting their ability to help us transform how we do development.  In a world of global environmental change, our current approaches to development just replicate our existing challenges: they don’t really tell us if what we are doing will be of any lasting benefit, or even teach us general lessons about how to deliver short-term benefits in a rigorous manner.


Next up: The Final Chapter – Fixing It

Marc Bellemare’s blog pointed me to an interesting paper by Pascaline Dupas and Jonathan Robinson titled “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.”  It is an interesting paper, taking a page from the RCT4D literature to test some different tools for savings in four Kenyan villages.  I’m not going to wade into the details of the paper or its findings here (they find some tools to be more effective than others at promoting savings for health expenditures), because they are not what really caught me about this paper.  Instead, what struck me was the absence of a serious consideration of “the social” in the framing of the questions asked and the results.  Dupas and Robinson expected three features to impact health savings: adequate storage facilities/technology, the ability to earmark funds, and the level of social commitment of the participant.  The social context of savings (or, more accurately, of barriers to savings) is treated in what I must say is a terribly dismissive way [emphases are mine]:

a secure storage technology can enable individuals to avoid carrying loose cash on their person and thus allow people to keep some physical distance between themselves and their money. This may make it easier to resist temptations, to borrow the terminology in Banerjee and Mullainathan (2010), or unplanned expenditures, as many of our respondents call them. While these unplanned expenditures include luxury items such as treats, another important category among such unplanned expenditures are transfers to others.

A storage technology can increase the mental costs associated with unplanned expenditures, thereby reducing such expenditures. Indeed, if people use the storage technology to save towards a specific goal, such as a health goal in our study, people may consider the money saved as unavailable for purposes other than the specific goal – this is what Thaler (1990) coined mental accounting. By enabling such mental accounting, a designated storage place may give people the strength to resist frivolous expenditures as well as pressure to share with others, including their spouse.

I have seen many cases of unplanned expenditures to others in my fieldwork.  Indeed, my village-based field crews in Ghana used to ask for payment on as infrequent a basis as possible to avoid exactly these sorts of expenditures.  They would plan for large needed purchases, work until they had earned enough for that purchase, then take payment and immediately make the purchase, making their income illiquid before family members could call upon them and ask for loans or handouts.

However, the phrasing of Dupas and Robinson strikes the anthropologist/ geographer in me as dismissive.  These expenses are seen as “frivolous”, things that should be “resisted”.  The authors never consider the social context of these expenditures – why people agree to make them in the first place.  There seems to be an implicit assumption here that people don’t know how to manage their money without the introduction of new tools, and that is not at all what I have seen (albeit in contexts other than Kenya).  Instead, I saw these expenditures as part of a much larger web of social relations that implicates everything from social status to gender roles – in this context, the choice to give out money instead of saving it made much more sense.

In short, it seems to me that Dupas and Robinson are treating these savings technologies as apolitical, purely technical interventions.  However, introducing new forms of savings also intervenes in social relations at scales ranging from the household to the extended family to the community.  Thus, the uptake of these forms of savings will be greatly affected by contextual factors that seem to have been ignored here.  Further, the durability of the behavioral changes documented in this study might be much better predicted and understood – from my perspective, the declining use of these technologies over the 33-month scope of the project was completely predictable (the decline, that is, not the size of the decline).  Just because a new technology enables savings that might result in a greater standard of living for the individual or household does not mean that the technology will be seen as desirable – instead, that standard of living must also work within existing social roles and relations if these new behaviors are to endure.  Therefore, we cannot really explain the declining use of these technologies over time . . . yet development is, to me, about catalyzing enduring change.  While this study shows that the introduction of these technologies has at least a short-term transformative effect on savings behavior, I’m not convinced this study does much to advance our understanding of how to catalyze changes that will endure.

Charles Kenny’s* book Getting Better has received quite a bit of attention in recent months, at least in part because Bill Gates decided to review it in the Wall Street Journal (up until that point, I thought I had a chance of outranking Charles on Amazon, but Gates’ positive review buried that hope).  The reviews that I have seen (for example here, here and here) cast the book as a counterweight to the literature of failure that surrounds development, and indeed Getting Better is just that.  It’s hard to write an optimistic book about a project as difficult as development without coming off as glib, especially when it is all too easy to write another treatise that critiques development in a less than constructive way.  It’s a challenge akin to that facing the popular musician – it’s really, really hard to convey joy in a way that moves the listener (I’m convinced this ability is the basis of Bjork’s career), but fairly easy to go hide in the basement for a few weeks, pick up a nice pallor, tune everything a step down, put on a t-shirt one size too small and whine about the girlfriend/boyfriend that left you.

Much of the critical literature on development raises important challenges to development practice and thought, but does so in a manner that makes addressing those challenges very difficult (if not intentionally impossible).  For example, deep (and important) criticisms of development anchored in poststructural understandings of discourse, meaning and power (for example, Escobar’s Encountering Development and Ferguson’s The Anti-Politics Machine) emerged in the early and mid-1990s, but their critical power was not tied in any way to a next step . . . which eventually undermined the critical project.  It also served to isolate academic development studies from the world of development practice in many ways, as even those working in development who were open to these criticisms could find no way forward from them.  Tearing something down is a lot easier than building something new from the rubble.

While Getting Better does not reconstruct development, its realistically grounded optimism provides what I see as a potential foundation for a productive rethinking of efforts to help the global poor.  Kenny chooses to begin from a realistic grounding, where Chapters 2 and 3 of the book present us with the bad news (global incomes are diverging) and the worse news (nobody is really sure how to raise growth rates).  But, Kenny answers these challenges in three chapters that illustrate ways in which things have been improving over the past several decades, from sticking a fork in the often-overused idea of poverty traps to the recognition that quality of life measures appear to be converging globally.  This is more than a counterweight to the literature of failure – this book is a counterweight to the literature of development that all-too-blindly worships growth as its engine.  In this book, Kenny clearly argues that growth-centric approaches to development don’t seem to be having the intended results, and growth itself is extraordinarily difficult to stimulate . . . and despite these facts, things are improving in many, many places around the world.   This opens the door to question the directionality of causality in the development and growth relationship: is growth the cause of development, or its effect?

Here, I am pushing Kenny’s argument beyond its overtly stated purpose in the book. Kenny doesn’t overtly take on a core issue at the heart of development-as-growth: can we really guarantee 3% growth per year for everyone forever?  But at the same time, he illustrates that development is occurring in contexts where there is little or no growth, suggesting that we can delink the goal of development from the impossibility of endless growth.  If ever there were a reason to be an optimist about the potential for development, this delinking is it.

I feel a great kinship with this book, in its realistic optimism.  I also like the lurking sense of development as a catalyst for change, as opposed to a tool or process by which we obtain predictable results from known interventions.  I did find Getting Better’s explanations for social change to rest a bit too heavily on a simplistic diffusion of ideas, a rather exogenous explanation of change that was largely abandoned by anthropology and geography back in the structure-functionalism of the 1940s and 50s.  The book does not really dig into “the social” in general.  For example, Kenny’s discussion of randomized control trials for development (RCT4D), like the RCT4D literature itself, is preoccupied with “what works” without really diving into an exploration of why the things that worked played out so well.  To be fair to Kenny, his discussion was not focused on explanation, but on illustrating that some things that we do in development do indeed make things better in some measurable way.  I also know that he understands that “what works” is context specific . . . as indeed is the very definition of “works.”  However, why these things work and how people define success is critical to understanding if they are just anecdotes of success in a sea of failure, or replicable findings that can help us to better address the needs of the global poor.  In short, without an exploration of social process, it is not clear from these examples and this discussion that things are really getting better.

An analogy to illustrate my point – while we have very good data on rainfall over the past several decades in many parts of West Africa that illustrate a clear downward trend in overall precipitation, and some worrying shifts in the rainy seasons (at least in Ghana), we do not yet have a strong handle on the particular climate dynamics that are producing these trends.  As a result, we cannot say for certain that the trend of the past few decades will continue into the future – because we do not understand the underlying mechanics, all we can do is say that it seems likely, given the past few decades, that this trend will continue into the future.  This problem suggests a need to dig into such areas as atmospheric physics, ocean circulation, and land cover change to try to identify the underlying drivers of these observed changes to better understand the future pathways of this trend.  In Getting Better (and indeed in the larger RCT4D literature), we have a lot of trends (things that work), but little by way of underlying causes that might help us to understand why these things worked, whether they will work elsewhere, or if they will work in the same places in the future.

In the end, I think Getting Better is an important counterweight to both the literature of failure and a narrowly framed idea of development-as-growth.  My minor grumbles amount to a wish that this counterweight was heavier.  It is most certainly worth reading, and it is my hope that its readers will take the book as a hopeful launching point for further explorations of how we might actually achieve an end to global poverty.


*Full disclosure: I know Charles, and have had coffee with him in his office discussing his book and mine.  If you think that somehow that has swayed my reading of Getting Better, well, factor that into your interpretation of my review.

Well, the response to part one was great – really good comments, and a few great response posts.  I appreciate the efforts of some of my economist colleagues/friends to clarify the terminology and purpose behind RCTs.  All of this has been very productive for me – and hopefully for others engaged in this conversation.

First, a caveat: On the blog I tend to write quickly and with minimal editing – so I get a bit fast and loose at times – well, faster and looser than I intend.  So, to this end, I did not mean to suggest that nobody was doing rigorous work in development research – in fact, the rest of my post clearly set out to refute that idea, at least in the qualitative sphere.  But I see how Marc Bellemare might have read me that way.  What I should have said was that there has always been work, both in research and implementation, where rigorous data collection and analysis were lacking.  In fact, there is quite a lot of this work.  I think we can all agree this is true . . . and I should have been clearer.

I have also learned that what qualitative social scientists/social theorists mean by theory, and what economists mean by theory, seem to be two different things.  Lee defined theory as “formal mathematical modeling” in a comment on part 1 of this series of posts, which is emphatically not what a social theorist might mean.  When I say theory, I am talking about a conjectural framing of a social totality such that complex causality can at least be contained, if not fully explained.  This framing should have reference to some sort of empirical evidence, and therefore should be testable and refinable over time – perhaps through various sorts of ethnographic work, perhaps through formal mathematical modeling of the propositions at hand (I do a bit of both, actually).  In other words, what I mean by theory (and what I focus on in my work) is the establishment of a causal architecture for observed social outcomes.  I am all about the “why it worked” part of research, and far less about the “if it worked” questions – perhaps mostly because I have researched unintended “development interventions” (i.e. unplanned road construction, the establishment of a forest reserve that alters livelihoods resource access, etc.) that did not have a clear goal, a clear “it worked!” moment to identify.  All I have been looking at are outcomes of particular events, and trying to establish the causes of those outcomes.  Obviously, this can be translated to an RCT environment because we could control for the intervention and expected outcome, and then use my approaches to get at the “why did it work/not work” issues.

It has been very interesting to see the economists weigh in on what RCTs really do – they establish, as Marc puts it, "whether something works, not in how it works." (See also Grant's great comment on the first post.) I don't think I would get much argument if I noted that without causal mechanisms, we can't be sure why "what worked" actually worked, or whether the causes of "what worked" are in any way generalizable or transportable. We might have some idea, but I would have low confidence in any research that ended at this point. This, of course, is why Marc, Lee, Ruth, Grant and any number of other folks see a need for collaboration between quant and qual – so that we can get the right people, with the right tools, looking at different aspects of a development intervention to rigorously establish the existence of an impact, and then establish an equally rigorous understanding of the causal processes by which that impact came to pass. Nothing terribly new here, I think. Except, of course, for my continued claim that the qualitative work I do see associated with RCT work is mostly awful, tending toward bad journalism (see my discussion of bad journalism and bad qualitative work in the first post).
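To make the "whether, not why" point concrete, here is a minimal sketch in Python, using entirely simulated data and a made-up behavioral channel (none of this is drawn from any study discussed here): under randomization, a simple difference in means recovers the average effect of a hypothetical intervention, while saying nothing about the mechanism that produced it.

```python
# A toy RCT with invented data: the difference in means answers "did it work",
# but is silent on "why it worked".
import numpy as np

rng = np.random.default_rng(42)
n = 2000
treated = rng.integers(0, 2, n)                 # random assignment to treatment/control

# Hypothetical data-generating process: the treatment works only through an
# unobserved behavioral channel (purely illustrative).
channel = treated * rng.binomial(1, 0.6, n)     # e.g., whether people change a practice
outcome = 10 + 5 * channel + rng.normal(0, 2, n)

ate_hat = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"estimated average effect: {ate_hat:.2f}")   # roughly 3.0 (= 5 * 0.6)

# Nothing in this estimate tells us that the effect runs through `channel`;
# that is the "why" question the surrounding discussion is about.
```

The estimate answers the "if it worked" question; everything about the channel has to come from somewhere else, which is precisely the role I am arguing rigorous qualitative work needs to play.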

But this discussion misses a much larger point about epistemology – what I intended to write in this second part of the series all along. I do not see the dichotomy between measuring "if something works" and establishing "why something worked" as analytically valid. Simply put, without some (at least hypothetical) framing of causality, we cannot rigorously frame research questions around either question. How can you know if something worked if you are not sure how it was supposed to work in the first place? Qualitative research provides the interpretive framework for the data collected via RCT4D efforts – a necessary framework if we want RCT4D work to be rigorous. By separating qualitative work from the quant-oriented RCT work, we are assuming that we can somehow pull data collection apart from the framing of the research question. We cannot – nobody is completely inductive, which means we all work from some sort of framing of causality. The danger comes when we don't acknowledge this simple point – in most RCT4D work, those framings are implicit and completely uninterrogated by the practitioners. Even where they come to the fore (Duflo's three I's), they are not interrogated – they are assumed as framings for the rest of the analysis.

If we don't have causal mechanisms, we cannot rigorously frame research questions to see if something is working – we are, as Marc says, "like the drunk looking for his car keys under the street lamp when he knows he lost them elsewhere, because the only place he can actually see is under the street lamp." Only I would argue we are the drunk looking for his keys under the street lamp with no idea whether they are there at all.

In short, I’m not beating up on RCT4D, nor am I advocating for more conversation – no, I am arguing that we need integration, teams with quant and qual skills that frame the research questions together, that develop tests together, that interpret the data together.  This is the only way we will come to really understand the impact of our interventions, and how to more productively frame future efforts.  Of course, I can say this because I already work in a mixed-methods world where my projects integrate the skills of GIScientists, land use modelers, climate modelers, biogeographers and qualitative social scientists – in short, I have a degree of comfort with this sort of collaboration.  So, who wants to start putting together some seriously collaborative, integrated evaluations?

Those following this blog (or my twitter feed) know that I have some issues with RCT4D work. I'm actually working on a serious treatment of the issues I see in this work (i.e., a journal article), but I am not above crowdsourcing some of my ideas to see how people respond. Also, as many of my readers know, I have a propensity for really long posts. I'm going to try to avoid that here by breaking this topic into two parts. So, this is part 1 of 2.

To me, RCT4D work is interesting because of its emphasis on rigorous data collection – certainly, this has long been a problem in development research, and I have no doubt that the data being gathered are valid. However, part of the reason I feel confident in these data is that, as I noted in an earlier post, they replicate findings from the qualitative literature . . . findings that are, in many cases, long established on the basis of rigorously gathered, verifiable data. More on that in part 2 of this series.

One of the things that worries me about the RCT4D movement is the (at least implicit, often overt) suggestion that other forms of development data collection lack rigor and validity.  However, in the qualitative realm we spend a lot of time thinking about rigor and validity, and how we might achieve both – and there are tools we use to this end, ranging from discursive analysis to cross-checking interviews with focus groups and other forms of data.  Certainly, these are different means of establishing rigor and validity, but they are still there.

Without rigor and validity, qualitative research falls into bad journalism. As I see it, good journalism captures a story or an important issue, and illustrates that issue through examples. These examples are not meant to rigorously explain the issue at hand, but to clarify it or ground it for the reader. When journalists attempt to move to explanation via these same few examples (as columnists like Kristof and Friedman far too often do), they start making unsubstantiated claims that generally fall apart under scrutiny. People mistake this sort of work for qualitative social science all the time, but it is not. Certainly there is some really bad social science out there that slips from illustration to explanation in just the manner I have described, but this is hardly the majority of the work found in the literature. Instead, rigorous qualitative social science recognizes the need to gather valid data, and therefore requires conducting dozens, if not hundreds, of interviews to establish understandings of the events and processes at hand.

This understanding of qualitative research stands in stark contrast to what is in evidence in the RCT4D movement. For all of the effort devoted to data collection under these efforts, there is stunningly little time and energy devoted to explaining the patterns seen in the data. In short, RCT4D often reverts to bad journalism when it comes time for explanation. Patterns gleaned from meticulously gathered data are explained in an offhand manner. For example, in her (otherwise quite well-done) presentation to USAID yesterday, Esther Duflo suggested that some problematic development outcomes could be explained by a combination of "the three I's": ideology, ignorance and inertia. This is a boggling oversimplification of why people do what they do – ideology is basically nondiagnostic (you need to define and interrogate it before you can do anything about it), and ignorance and inertia are (probably unintentionally) deeply patronizing assumptions about people living in the Global South that have been disproven time and again (my own work in Ghana has demonstrated that people operate with really fine-grained information about incomes and gender roles, and know exactly what they are doing when they act in ways that limit their household incomes – see here, here and here). Development has claimed to be overcoming ignorance and inertia since . . . well, since we called it colonialism. Sorry, but that's the truth.

Worse, this offhand approach to explanation is often "validated" through reference to a single qualitative case that may or may not be representative of the situation at hand – horribly ironic for an approach that is trying to move development research past the anecdotal. This is not merely external observation – I have heard from people working inside J-PAL projects that the overall program puts little effort into serious qualitative work, and has little understanding of what rigor and validity might mean in the context of qualitative methods or explanation. In short, the bulk of the explanation for the interesting patterns of behavior that emerge from these studies rests on uninterrogated assumptions about human behavior that do not hold up to empirical reality. What RCT4D has identified are patterns, not explanations – explanation requires a contextual understanding of the social.

Coming soon: Part 2 – Qualitative research and the interpretation of empirical data