Entries tagged with “rct4d”.


I have a confession. For a long time now I have found myself befuddled by those who claim to have identified the causes behind observed outcomes in social research via the quantitative analysis of (relatively) large datasets (see posts here, here, and here).  For a while, I thought I was seeing the all-too-common confusion of correlation and causation…except that a lot of smart, talented people seemed to be confusing correlation with causation.  This struck me as unlikely.

Then, the other day in seminar (I was covering for a colleague in our department’s “Contemporary Approaches to Geography” graduate seminar, discussing the long history of environmental determinism within and beyond the discipline), I found myself in a similar discussion related to explanation…and I think I figured out what has been going on.  The remote sensing and GIS students in the course, all of whom are extraordinarily well-trained in quantitative methods, got to thinking about how to determine if, in fact, the environment was “causing” a particular behavior*. In the course of this discussion, I realized that what they meant by “cause” was simple (I will now oversimplify): when you can rule out/control for the influence of all other possible factors, you can say that factor X caused event Y to happen.  Indeed, this does establish a causal link.  So, I finally get what everyone was saying when they said that, via well-constructed regressions, etc., one can establish causality.

So it turns out I was wrong…sort of. You see, I wasn’t really worried about causality…I was worried about explanation. My point was that the information you would get from a quantitative exercise designed to establish causal relationships isn’t enough to support rigorous project and program design. Just because you know that the construction of a borehole in a village caused girl-child school attendance to increase in that village doesn’t mean you know HOW the borehole caused this change in school attendance to happen.  If you cannot rigorously explain this relationship, you don’t understand the mechanism by which the borehole caused the change in attendance, and therefore you don’t really understand the relationship. In the “more pure” biophysical sciences**, this isn’t that much of a problem because there are known rules that particles, molecules, compounds, and energy obey, and therefore under controlled conditions one can often infer from the set of possible actors and actions defined by these rules what the causal mechanism is.

But when we study people it is never that simple.  The very act of observing people’s behaviors causes shifts in that behavior, making observation at best a partial account of events.  Interview data are limited by the willingness of the interviewee to talk, and the appropriateness of the questions being asked – many times I’ve had to return to an interviewee with a question that became evident later, asking “why didn’t you tell me this before?” (to which they answer, quite rightly, with something to the effect of “you didn’t ask”).  The causes of observed human behavior are staggeringly complex when we get down to the real scales at which decisions are made – the community, household/family, and individual.  Decisions may vary by time of the year, or time of day, and by the combination of gender, age, ethnicity, religion, and any other social markers that the group/individual chooses to mobilize at that time.  In short, just because we see borehole construction cause increases in girl-child school attendance over and over in several places, or even the same place, doesn’t mean that the explanatory mechanism between the borehole and attendance is the same at all times.

Understanding that X caused Y is lovely, but in development it is only a small fraction of the battle.  Without understanding how access to a new borehole resulted in increased girl-child school attendance, we cannot scale up borehole construction in the context of education programming and expect to see the same results.  Further, if we do such a scale-up, and don’t get the same results, we won’t have any idea why.  So there is causality (X caused Y to happen) and there are causal mechanisms (X caused Y to happen via Z – where Z is likely a complex, locally/temporally specific alignment of factors).
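The distinction is easy to see in a toy simulation. Below is a rough sketch (Python, with entirely made-up numbers, not data from any actual borehole study) of two hypothetical sets of villages in which a borehole raises girl-child school attendance by the same average amount, but through different mechanisms Z: freed-up water-fetching time in one case, improved health in the other. A simple comparison of means recovers the same “effect” in both cases and says nothing about which mechanism was at work:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # households per scenario (illustrative)

def simulate(mechanism):
    """Simulate weekly school attendance (hours) with (treat=1) and
    without (treat=0) a borehole. Both mechanisms are calibrated to
    produce the same average effect (+3 hours), via different paths."""
    treat = rng.integers(0, 2, n)
    if mechanism == "time":
        # Mechanism Z1: the borehole cuts water-fetching time,
        # and freed-up time goes to school.
        hours_fetching = 10 - 6 * treat + rng.normal(0, 1, n)
        attendance = 25 - 0.5 * hours_fetching + rng.normal(0, 2, n)
    else:
        # Mechanism Z2: cleaner water means fewer sick days,
        # and healthier girls attend more.
        sick_days = 4 - 3 * treat + rng.normal(0, 0.5, n)
        attendance = 24 - 1.0 * sick_days + rng.normal(0, 2, n)
    return treat, attendance

for mech in ("time", "health"):
    treat, att = simulate(mech)
    ate = att[treat == 1].mean() - att[treat == 0].mean()
    print(f"{mech:6s} mechanism: estimated effect = {ate:+.2f} hours/week")
```

Both runs report an effect of roughly +3 hours per week. An evaluation that stops at the causal estimate cannot distinguish the two worlds, yet scaling up in the context of education programming would look very different depending on which mechanism is actually operating.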

Unfortunately, when I look at much quantitative development research, especially in development economics, I see a lot of causality, but very little work on causal mechanisms that get us to explanation.  There is a lot of story time, “that pivot from the quantitative finding to the speculative explanation.”  In short, we might be programming development and aid dollars based upon evidence, but much of the time that evidence only gets us part of the way to what we really need to know to really inform program and project design.

This problem is avoidable – it does not represent the limits of our ability to understand the world. There is one obvious way to get at those mechanisms – serious, qualitative fieldwork.  We need to be building research and policy teams where ethnographers and other qualitative social scientists learn to respect the methods and findings of their quantitative brethren such that they can target qualitative methods at illuminating the mechanisms driving robust causal relationships. At the same time, the quantitative researchers on these teams will have to accept that they have only partially explained what we need to know when they have established causality through their methods, and that qualitative research can carry their findings into the realm of implementation.

The bad news for everyone…for this to happen, you are going to have to pick your heads up out of your (sub)disciplinary foxholes and start reading across disciplines in your area of interest.  Everyone talks a good game about this, but when you read what keeps getting published, it is clear that cross-reading is not happening.  Seriously, the number of times I have seen people in one field touting their “new discoveries” about human behavior that are already common conversation in other disciplines is embarrassing…or at least it should be to the authors. But right now there is no shame in this sort of thing, because most folks (including peer reviewers) don’t read outside their disciplines, and therefore have no idea how absurd these claims of discovery really are. As a result, development studies gives away its natural interdisciplinary advantage and returns to the problematic structure of academic knowledge and incentives, which not only enables, but indeed promotes, narrowly disciplinary reading and writing.

Development donors, I need a favor. I need you to put a little research money on the table to learn about whatever it is you want to learn about. But when you do, I want you to demand it be published in a multidisciplinary development-focused journal.  In fact, please start doing this for all of your research-related money. People will still pursue your money, as the shrinking pool of research dollars is driving academia into your arms. Administrators like grant and contract money, and so many academics are now being rewarded for bringing in grants and contracts from non-traditional sources (this is your carrot). Because you hold the carrot, you can draw people in and then use “the stick” inherent in the terms of the grant/contract to demand cross-disciplinary publishing that might start to leverage change in academia. You all hold the purse, so you can call the tune…

 

 

 

*Spoiler alert: you can’t.  Well, you probably can if 1) you pin the behavior you want to explain down to something extraordinarily narrow, 2) you can limit the environmental effect in question to a single independent biophysical process (good luck with that), and 3) you limit your effort to a few people in a single place.  But at that point, the whole reason for understanding the environmental determinant of that behavior starts to go out the window, as it would clearly not be generalizable beyond the study.  Trust me, geography has been beating its head against this particular wall for a century or more, and we’ve buried the idea.  Learn from our mistakes.

 

**by “more pure” I am thinking about those branches of physics, chemistry, and biology in which lab conditions can control for many factors.  As soon as you get into field sciences, or start asking bigger questions, complexity sets in and things like causality get muddied in the manner I discuss above…just ask an ecologist.

Alright, last post I laid out an institutional problem with M&E in development – the conflict of interest between achieving results to protect one’s budget and staff, and the need to learn why things do/do not work to improve our effectiveness.  This post takes on a problem in the second part of that equation – assuming we all agree that we need to know why things do/do not work, how do we go about doing it?

As long-time readers of this blog (a small, but dedicated, fanbase) know, I have some issues with over-focusing on quantitative data and approaches for M&E.  I’ve made this clear in various reactions to the RCT craze (see here, here, here, and here).  Because I framed my reactions in terms of RCTs, I think some folks think I have an “RCT issue.”  In fact, I have a wider concern – the emerging aggressive push for quantifiable data above all else as new, more rigorous implementation policies come into effect.  The RCT is a manifestation of this push, but really is a reflection of a current fad in the wider field.  My concern is that the quantification of results, while valuable in certain ways, cannot get us to causation – it gets us to really, really rigorously established correlations between intervention and effect in a particular place and time (thoughtful users of RCTs know this).  This alone is not generalizable – we need to know how and why that result occurred in that place, to understand the underlying processes that might make that result replicable (or not) in the future, or under different conditions.

As of right now, the M&E world is not doing a very good job of identifying how and why things happen.  What tends to happen after rigorous correlation is established is what a number of economists call “story time”, where explanation (as opposed to analysis) suddenly goes completely non-rigorous, with researchers “supposing” that the measured result was caused by social/political/cultural factor X or Y, without any follow-on research to figure out if in fact X or Y even makes sense in that context, let alone whether or not X or Y actually was causal.  This is where I fear various institutional pushes for rigorous evaluation might fall down.  Simply put, you can measure impact quantitatively – no doubt about it.  But you will not be able to rigorously say why that impact occurred unless someone gets in there and gets seriously qualitative and experiential, working with the community/household/what have you to understand the processes by which the measured outcome occurred.  Without understanding these processes, we won’t have learned what makes these projects and programs scalable (or what prevents them from being scaled) – all we will know is that it worked/did not work in a particular place at a particular time.

So, we don’t need to get rid of quantitative evaluation.  We just need to build a strong complementary set of qualitative tools to help interpret that quantitative data.  So the next question to you, my readers: how are we going to build in the space, time, and funding for this sort of complementary work?  I find most development institutions to be very skeptical as soon as you say the word “qualitative”…mostly because it sounds “too much like research” and not enough like implementation.  Any ideas on how to overcome this perception gap?

(One interesting opportunity exists in climate change – a number of projects are currently piloting new M&E approaches, as evaluating the impacts of climate change programming requires very long time horizons.  In at least one M&E effort I know of, there is talk of running both quantitative and qualitative project evaluations to see what each method can and cannot answer, and how they might fit together.  Such a demonstration might catalyze further efforts…but this outcome is years away.)

So, how do we fix the way we think about development to address the challenges of global environmental change?  Well, there are myriad answers, but in this post I propose two – we have to find ways of evaluating the impact of our current projects such that those lessons are applicable to other projects that are implemented in different places and at various points in the future . . . and we have to better evaluate just where things will be in the future as we think about the desired outcomes of development interventions.

Achieving the first of these is relatively easy, at least conceptually: we need to fully link up the RCT4D crowd with the qualitative research/social theory crowd.  We need teams of people that can bring the randomista obsession with sampling frames and serious statistical tools – in other words, a deep appreciation for rigor in data collection – and connect it to the qualitative social theoretical emphasis on understanding causality by interrogating underlying social process – in other words, a deep appreciation for rigor in data interpretation.  Such teams work to cover the weaknesses of their members, and could bring us new and very exciting insights into development interventions and social process.

Of course, everyone says we need mixed methodologies in development (and a lot of other fields of inquiry), but we rarely see projects that take this on in a serious way.  In part, this is because very few people are trained in mixed methods – they are either very good at qualitative methods and interpretation, or very good at sampling and quantitative data analysis.  Typically, when a team gets together with these different skills, one set of skills or the other predominates (in policy circles, quant wins every time).  To see truly mixed methodologies, this cannot happen – as soon as one trumps the other, the value of the mixing declines precipitously.

For example, you need qualitative researchers to frame the initial RCT – an RCT framed around implicit, unacknowledged assumptions about society is unlikely to “work” – or to capture the various ways in which an intervention works.  At the same time, the randomista skill of setting up a sampling frame and obtaining meaningful large-scale data sets requires attention to how one frames the question, and where the RCT is to be run . . . considerations that impose important constraints on the otherwise unfettered framings of social process coming from the qualitative side, framings that might not really be testable in a manner that can be widely understood by the policy community.  Then you need to loop back to the qualitative folks to interpret the results of the initial RCT – to move past whether or not something worked to the consideration of the various ways in which it did and did not work, and a careful consideration of WHY it worked.  Finally, these interpretations can be framed and tested by the quantitative members of the team, starting an iterative interpretive process that blends qualitative and quantitative analysis and interpretation to rigorously deepen our understanding of how development works (or does not work).

The process I have just described will require teams of grownups with enough self-confidence to accept criticism and to revise their ideas and interpretations in the face of evidence of varying sorts.  As soon as one side of this mixed method team starts denigrating the other, or the concerns of one side start trumping those of the other, the value of this mixing drops off – qualitative team members become fig leaves for “story time” analyses, or quantitative researchers become fig leaves for weak sampling strategies or overreaching interpretations of the data.  This can be done, but it will require team leaders with special skill sets – with experience in both worlds, and respect for both types of research.  There are not many of these around, but they are around.

Where are these people now?  Well, interestingly, this question leads me to the second way development might better address the challenges of global environmental change: development needs to better link itself with the global environmental change community.  Despite titles that might suggest otherwise (UNEP’s Fourth Global Environment Outlook was titled Environment for Development), there is relatively little interplay between these communities right now.  Sure, development folks say the right things about sustainability and climate change these days, but they are rarely engaging the community that has been addressing these and many other challenges for decades.  At the same time, the global environmental change community has a weak connection to development, making their claims about the future human impacts of things like climate change often wildly inaccurate, as they assume current conditions will persist into the future (or they assume equally unrealistic improvements in future human conditions).

Development needs to hang out with the scenario builders of the global environmental change community to better understand the world we are trying to influence twenty years hence – the spot to which we are delivering the pass, to take up a metaphor from an earlier post on this topic.  We need to get with the biophysical scientists who can tell us about the challenges and opportunities they expect to see two or more decades hence.  And we need to find the various teams that are already integrating biophysical scientists and social scientists to address these challenges – the leaders already have to speak quant and qual, science and humanities, to succeed at their current jobs.  The members of these teams have already started to learn to respect their colleagues’ skills, and to better explain what they know to colleagues who may not come at the world with the same framings, data or interpretations.  They are not perfect, by any stretch (I voice some of my concerns in Delivering Development), but they are great models to go on.

Meanwhile, several of my colleagues and I are working on training a new generation of interdisciplinary scholars with this skill set.  All of my current Ph.D. students have taken courses in qualitative methods, and have conducted qualitative fieldwork . . . but they also have taken courses on statistics and biogeographic modeling.  They will not be statisticians or modelers, but now they know what those tools can and cannot do – and therefore how they can engage with them.  The first of this crew are finishing their degrees soon . . . the future is now.  And that gives me reason to be realistically optimistic about things . . .



OK, ok, you say: I get it, global environmental change matters to development/aid/relief.  But aside from thinking about project-specific intersections between the environment and development/aid/relief, what sort of overarching challenges does global environmental change pose to the development community?  Simply put, I think that the inevitability of various forms of environmental change (a level of climate change cannot be stopped now, certain fisheries are probably beyond recovery, etc.) over the next 50 or so years forces the field of development to start thinking very differently about the design and evaluation of policies, programs, and projects . . . and this, in turn, calls into question the value of things like randomized control trials for development.

In aid/development we tend to be oriented to relatively short funding windows in which we are supposed to accomplish particular tasks (which we measure through output indicators, like the number of judges trained) that, ideally, change the world in some constructive manner (outcome indicators, like a better-functioning judicial system).  Outputs are easier to deliver and measure than outcomes, and they tend to operate on much shorter timescales – which makes them perfect for end-of-project reporting even though they often have little bearing on the achievement of the desired outcomes that motivated the project in the first place (does training X judges actually result in a better functioning judicial system?  What if the judges were not the problem?).  While there is a serious push in the development community to move past outputs to outcomes (which I generally see as a very positive trend), I do not see a serious conversation about the different timescales on which these two sorts of indicators operate.  Outputs are very short-term.  Outcomes can take generations.  Obviously this presents significant practical challenges to those who do development work and must justify their expenditures on an annual basis.

This has tremendous implications, I think, for development practice in the here and now – especially in development research.  For example, I think this pressure to move to outcomes but deliver them on the same timescale as outputs has contributed to the popularity of the randomized control trials for development (RCT4D) movement.  RCT4D work gathers data in a very rigorous manner, and subjects it to interesting forms of quantitative analysis to determine the impact of a particular intervention on a particular population.  As my colleague Marc Bellemare says, RCTs establish “whether something works, not how it works.”
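In its simplest form, the quantitative core of such a study is just a comparison of mean outcomes across randomly assigned groups, plus a check that the difference is unlikely to be chance. Here is a minimal sketch (Python, using simulated data with illustrative numbers, not drawn from any actual trial) of that logic, estimating an average treatment effect and testing it with a permutation test:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000  # households per arm (illustrative)

# Simulated endline outcomes for control and treatment groups;
# a true average effect of +3 is built into the treatment draw.
control = rng.normal(50.0, 10.0, n)
treatment = rng.normal(53.0, 10.0, n)

# "Whether it works": the estimated average treatment effect (ATE).
effect = treatment.mean() - control.mean()

# Permutation test: randomly relabel treatment status many times to
# ask how often chance alone produces a difference this large.
pooled = np.concatenate([treatment, control])
null = np.empty(1000)
for i in range(1000):
    rng.shuffle(pooled)
    null[i] = pooled[:n].mean() - pooled[n:].mean()
p_value = (np.abs(null) >= abs(effect)).mean()

print(f"estimated effect: {effect:+.2f} (permutation p = {p_value:.3f})")
```

The estimate and p-value tell us, rigorously, that the intervention moved the outcome in this sample. They are silent on the process by which it did so, which is precisely the gap the rest of this post is concerned with.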

The vast majority of RCT4D studies are conducted across a few months to years, directly after the project is implemented.  Thus, the results seem to move past outputs to impacts without forcing everyone to wait a very long time to see how things played out.  This, to me, is both a strength and a weakness of the approach . . . though I never hear anyone talking about it as a weakness.  The RCT4D approach seems to suggest that the evaluation of project outcomes can be effectively done almost immediately, without need for long-term follow-up.  That confidence implicitly rests on the forms of interpretation and explanation that undergird the RCT4D approach – basically, what I see as an appallingly thin approach to the interpretation of otherwise interesting and rigorously gathered data.  My sense of this interpretation is best captured by Andrew Gelman’s (quoting Fung) use of the term “story time”, which he defines as a “pivot from the quantitative finding to the speculative explanation.”  Many practitioners of RCT4D seem to think that story time is unavoidable . . . which to me reflects a deep ignorance of the concerns for rigor and validity that have existed in the qualitative research community for decades.  Feel free to check the methods section of any of my empirically-based articles (e.g. here and here): they address who I interviewed, why I interviewed them, how I developed interview questions, and how I knew that my sample size had grown large enough to feel confident that it was representative of the various phenomena I was trying to understand.  Toward the end of my most recent work in Ghana, I even ran focus groups where I offered my interpretations of what was going on back to various sets of community members, and worked with them to strengthen what I had right and correct what I had wrong.  As a result, I have what I believe is a rigorous, highly nuanced understanding of the social causes of the livelihoods decisions and outcomes that I can measure in various ways, qualitative and quantitative, but I do not have a “story time” moment in there.

The point here is that “story time”, as a form of explanation, rests on uncritical assumptions about the motivations for human behavior that can make particular decisions or behaviors appear intelligible but leave the door open for significant misinterpretations of events on the ground.  Further, the very framing of what “works” in the RCT4D approach is externally defined by the person doing the evaluation/designing the project, and is rarely revised in the face of field realities . . . principally because when a particular intervention does not achieve some externally-defined outcome, it is deemed “not to have worked.”  That really tends to shut down continued exploration of alternative outcomes that “worked” in perhaps unpredictable ways for unexpected beneficiaries.  In short, the RCT4D approach tends to reinforce the idea that development is really about delivering apolitical, technical interventions to people to address particular material needs.

The challenge global environmental change poses to the RCT4D randomista crowd is that of the “through ball” metaphor I raised in my previous post.  Simply put, identifying “what works” without rigorously establishing why it worked is only broadly useful if you make two pretty gigantic assumptions.  First, you have to assume that the causal factors that led to something “working” are aspects of universal biophysical and social processes that are translatable across contexts.  If this is not true, an RCT only gives you what works for a particular group of people in a particular place . . . which is not really that much more useful than just going and reading good qualitative ethnographies.  If RCTs are nothing more than highly quantified case studies, they suffer from the same problem as ethnography – they are hard to aggregate into anything meaningful at a broader scale.  And yes, there are really rigorous qualitative ethnographies out there . . .

Second, you have to assume that the current context of the trial is going to hold pretty much constant going forward.  Except, of course, global environmental change more or less chucks that idea for the entire planet.  In part, this is because global environmental change portends large, inevitable biophysical changes in the world.  Just because something works for improving rain-fed agricultural outputs today does not mean that the same intervention will work when the enabling environmental conditions, such as rainfall and temperature, change over the next few decades.  More importantly, though, these biophysical changes will play out in particular social contexts to create particular impacts on populations, who will in turn develop efforts to address those impacts. Simply put, when we introduce a new crop today and it is taken up and boosts yields, we know that it “worked” by the usual standards of agricultural development and extension.  But the take-up of new crops is not a function of agricultural ecology – there are many things that will grow in many places, but various social factors ranging from the historical (what crops were introduced via colonialism) to gender (who grows what crops and why) are what lead to particular farm compositions.  For example, while tree crops (oil palm, coconut, various citrus, acacia for charcoal) are common on farms around the villages in which I have worked in Ghana, almost none of these trees are found on women’s farms.  The reasons for this are complex, and link land tenure, gender roles, and household power relations into livelihoods strategies that balance material needs with social imperatives (for extended discussions, see here and here, or read my book).

Unless we know why that crop was taken up, we cannot understand if the conditions of success now will exist in the future . . . we cannot tell if what we are doing will have a durable impact.  Thus, under the most reliable current scenario for climate change in my Ghanaian research context, we might expect the gradual decline in annual precipitation, and the loss of the minor rainy season, to make tree crops (which tend to be quite resilient in the face of fluctuating precipitation) more and more attractive.  However, tree crops challenge the local communal land tenure system by taking land out of clan-level recirculation, and allowing women to plant them would further challenge land tenure by granting them direct control over access to land (which they currently lack).  Altering the land tenure system would, without question, set off a cascade of unpredictable social changes that would be seen in everything from gender roles to the composition of farms.  There is no way to be sure that any development intervention that is appropriate to the current context will be even functional in that future context.  Yet any intervention we put into place today should be helping to catalyze long-term changes . . .

Simply put: Global environmental change makes clear the limitations of our current thinking on aid/development (of which RCT4D is merely symptomatic).   Just like RCTs, our general framing of development does not move us any closer to understanding the long-term impact of our interventions.  Further, the results of RCTs are not generalizable past the local context (which most good randomistas already know), limiting their ability to help us transform how we do development.  In a world of global environmental change, our current approaches to development just replicate our existing challenges: they don’t really tell us if what we are doing will be of any lasting benefit, or even teach us general lessons about how to deliver short-term benefits in a rigorous manner.

 

Next up: The Final Chapter – Fixing It



Marc Bellemare’s blog pointed me to an interesting paper by Pascaline Dupas and Jonathan Robinson titled “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.”  The paper takes a page from the RCT4D literature to test some different tools for savings in four Kenyan villages.  I’m not going to wade into the details of the paper or its findings here (they find some tools to be more effective than others at promoting savings for health expenditures), because they are not what really caught me about this paper.  Instead, what struck me was the absence of a serious consideration of “the social” in the framing of the questions asked and the results.  The social context of savings (or, more accurately, of barriers to savings) is treated in what I must say is a terribly dismissive way; Dupas and Robinson expected three features to impact health savings: adequate storage facilities/technology, the ability to earmark funds, and the level of social commitment of the participant [emphases are mine]:

a secure storage technology can enable individuals to avoid carrying loose cash on their person and thus allow people to keep some physical distance between themselves and their money. This may make it easier to resist temptations, to borrow the terminology in Banerjee and Mullainathan (2010), or unplanned expenditures, as many of our respondents call them. While these unplanned expenditures include luxury items such as treats, another important category among such unplanned expenditures are transfers to others.

A storage technology can increase the mental costs associated with unplanned expenditures, thereby reducing such expenditures. Indeed, if people use the storage technology to save towards a specific goal, such as a health goal in our study, people may consider the money saved as unavailable for purposes other than the specific goal – this is what Thaler (1990) coined mental accounting. By enabling such mental accounting, a designated storage place may give people the strength to resist frivolous expenditures as well as pressure to share with others, including their spouse.

I have seen many cases of unplanned expenditures to others in my fieldwork.  Indeed, my village-based field crews in Ghana used to ask for payment on as infrequent a basis as possible to avoid exactly these sorts of expenditures.  They would plan for large needed purchases, work until they had earned enough for that purchase, then take payment and immediately make the purchase, making their income illiquid before family members could call upon them and ask for loans or handouts.

However, the phrasing of Dupas and Robinson strikes the anthropologist/geographer in me as dismissive. These expenses are seen as “frivolous”, things that should be “resisted”. The authors never consider the social context of these expenditures – why people agree to make them in the first place. There seems to be an implicit assumption here that people don’t know how to manage their money without the introduction of new tools, and that is not at all what I have seen (albeit in contexts other than Kenya). Instead, I saw these expenditures as part of a much larger web of social relations that implicates everything from social status to gender roles – in this context, the choice to give out money instead of saving it made much more sense.

In short, it seems to me that Dupas and Robinson are treating these savings technologies as apolitical, purely technical interventions. However, introducing new forms of savings also intervenes in social relations at scales ranging from the household to the extended family to the community. Thus, the uptake of these forms of savings will be greatly affected by contextual factors that seem to have been ignored here. With those factors in hand, the durability of the behavioral changes documented in this study might be much better predicted and understood – from my perspective, the declining use of these technologies over the 33-month scope of the project was completely predictable (the decline, that is, not its size). Just because a new technology enables savings that might result in a greater standard of living for the individual or household does not mean that the technology will be seen as desirable – instead, that standard of living must also work within existing social roles and relations if these new behaviors are to endure. Without that social context, we cannot really explain the declining use of these technologies over time . . . yet development is, to me, about catalyzing enduring change. While this study shows that the introduction of these technologies has at least a short-term transformative effect on savings behavior, I’m not convinced it does much to advance our understanding of how to catalyze changes that will endure.



Charles Kenny’s* book Getting Better has received quite a bit of attention in recent months, at least in part because Bill Gates decided to review it in the Wall Street Journal (up until that point, I thought I had a chance of outranking Charles on Amazon, but Gates’ positive review buried that hope).  The reviews that I have seen (for example here, here and here) cast the book as a counterweight to the literature of failure that surrounds development, and indeed Getting Better is just that.  It’s hard to write an optimistic book about a project as difficult as development without coming off as glib, especially when it is all too easy to write another treatise that critiques development in a less than constructive way.  It’s a challenge akin to that facing the popular musician – it’s really, really hard to convey joy in a way that moves the listener (I’m convinced this ability is the basis of Bjork’s career), but fairly easy to go hide in the basement for a few weeks, pick up a nice pallor, tune everything a step down, put on a t-shirt one size too small and whine about the girlfriend/boyfriend that left you.

Much of the critical literature on development raises important challenges to development practice and thought, but does so in a manner that makes addressing those challenges very difficult (if not intentionally impossible).  For example, deep (and important) criticisms of development anchored in poststructural understandings of discourse, meaning and power (for example, Escobar’s Encountering Development and Ferguson’s The Anti-Politics Machine) emerged in the early and mid-1990s, but their critical power was not tied in any way to a next step . . . which eventually undermined the critical project.  It also served to isolate academic development studies from the world of development practice in many ways, as even those working in development who were open to these criticisms could find no way forward from them.  Tearing something down is a lot easier than building something new from the rubble.

While Getting Better does not reconstruct development, its realistically grounded optimism provides what I see as a potential foundation for a productive rethinking of efforts to help the global poor.  Kenny chooses to begin from a realistic grounding, where Chapters 2 and 3 of the book present us with the bad news (global incomes are diverging) and the worse news (nobody is really sure how to raise growth rates).  But, Kenny answers these challenges in three chapters that illustrate ways in which things have been improving over the past several decades, from sticking a fork in the often-overused idea of poverty traps to the recognition that quality of life measures appear to be converging globally.  This is more than a counterweight to the literature of failure – this book is a counterweight to the literature of development that all-too-blindly worships growth as its engine.  In this book, Kenny clearly argues that growth-centric approaches to development don’t seem to be having the intended results, and growth itself is extraordinarily difficult to stimulate . . . and despite these facts, things are improving in many, many places around the world.   This opens the door to question the directionality of causality in the development and growth relationship: is growth the cause of development, or its effect?

Here, I am pushing Kenny’s argument beyond its overtly stated purpose in the book. Kenny doesn’t directly take on a core issue at the heart of development-as-growth: can we really guarantee 3% growth per year for everyone forever? But at the same time, he illustrates that development is occurring in contexts where there is little or no growth, suggesting that we can delink the goal of development from the impossibility of endless growth. If ever there were a reason to be an optimist about the potential for development, this delinking is it.

I feel a great kinship with this book, in its realistic optimism.  I also like the lurking sense of development as a catalyst for change, as opposed to a tool or process by which we obtain predictable results from known interventions.  I did find Getting Better’s explanations for social change to rest a bit too heavily on a simplistic diffusion of ideas, a rather exogenous explanation of change that was largely abandoned by anthropology and geography back in the structure-functionalism of the 1940s and 50s.  The book does not really dig into “the social” in general.  For example, Kenny’s discussion of randomized control trials for development (RCT4D), like the RCT4D literature itself, is preoccupied with “what works” without really diving into an exploration of why the things that worked played out so well.  To be fair to Kenny, his discussion was not focused on explanation, but on illustrating that some things that we do in development do indeed make things better in some measurable way.  I also know that he understands that “what works” is context specific . . . as indeed is the very definition of “works.”  However, why these things work and how people define success is critical to understanding if they are just anecdotes of success in a sea of failure, or replicable findings that can help us to better address the needs of the global poor.  In short, without an exploration of social process, it is not clear from these examples and this discussion that things are really getting better.

An analogy to illustrate my point – while we have very good data on rainfall over the past several decades in many parts of West Africa that illustrate a clear downward trend in overall precipitation, and some worrying shifts in the rainy seasons (at least in Ghana), we do not yet have a strong handle on the particular climate dynamics that are producing these trends.  As a result, we cannot say for certain that the trend of the past few decades will continue into the future – because we do not understand the underlying mechanics, all we can do is say that it seems likely, given the past few decades, that this trend will continue into the future.  This problem suggests a need to dig into such areas as atmospheric physics, ocean circulation, and land cover change to try to identify the underlying drivers of these observed changes to better understand the future pathways of this trend.  In Getting Better (and indeed in the larger RCT4D literature), we have a lot of trends (things that work), but little by way of underlying causes that might help us to understand why these things worked, whether they will work elsewhere, or if they will work in the same places in the future.

In the end, I think Getting Better is an important counterweight to both the literature of failure and a narrowly framed idea of development-as-growth.  My minor grumbles amount to a wish that this counterweight was heavier.  It is most certainly worth reading, and it is my hope that its readers will take the book as a hopeful launching point for further explorations of how we might actually achieve an end to global poverty.

 

*Full disclosure: I know Charles, and have had coffee with him in his office discussing his book and mine.  If you think that somehow that has swayed my reading of Getting Better, well, factor that into your interpretation of my review.


Well, the response to part one was great – really good comments, and a few great response posts.  I appreciate the efforts of some of my economist colleagues/friends to clarify the terminology and purpose behind RCTs.  All of this has been very productive for me – and hopefully for others engaged in this conversation.

First, a caveat: On the blog I tend to write quickly and with minimal editing – so I get a bit fast and loose at times – well, faster and looser than I intend.  So, to this end, I did not mean to suggest that nobody was doing rigorous work in development research – in fact, the rest of my post clearly set out to refute that idea, at least in the qualitative sphere.  But I see how Marc Bellemare might have read me that way.  What I should have said was that there has always been work, both in research and implementation, where rigorous data collection and analysis were lacking.  In fact, there is quite a lot of this work.  I think we can all agree this is true . . . and I should have been clearer.

I have also learned that what qualitative social scientists/social theorists mean by theory and what economists mean by theory seem to be two different things. Lee defined theory as “formal mathematical modeling” in a comment on part 1 of this series of posts, which is emphatically not what a social theorist might mean. When I say theory, I am talking about a conjectural framing of a social totality such that complex causality can at least be contained, if not fully explained. This framing should have reference to some sort of empirical evidence, and therefore should be testable and refinable over time – perhaps through various sorts of ethnographic work, perhaps through formal mathematical modeling of the propositions at hand (I do a bit of both, actually). In other words, what I mean by theory (and what I focus on in my work) is the establishment of a causal architecture for observed social outcomes. I am all about the “why it worked” part of research, and far less about the “if it worked” questions – perhaps mostly because I have researched unintended “development interventions” (i.e. unplanned road construction, the establishment of a forest reserve that alters access to livelihoods resources, etc.) that did not have a clear goal, a clear “it worked!” moment to identify. All I have been looking at are outcomes of particular events, trying to establish the causes of those outcomes. Obviously, this can be translated to an RCT environment, because we could control for the intervention and expected outcome, and then use my approaches to get at the “why did it work/not work” issues.

It has been very interesting to see the economists weigh in on what RCTs really do – they establish, as Marc puts it, “whether something works, not in how it works.” (See also Grant’s great comment on the first post.) I don’t think I would get a lot of argument if I noted that without causal mechanisms, we can’t be sure why “what worked” actually worked, or whether the causes of “what worked” are in any way generalizable or transportable. We might have some idea, but I would have low confidence in any research that ended at this point. This, of course, is why Marc, Lee, Ruth, Grant and any number of other folks see a need for collaboration between quant and qual – so that we can get the right people, with the right tools, looking at different aspects of a development intervention to rigorously establish the existence of an impact, and then to establish an equally rigorous understanding of the causal processes by which that impact came to pass. Nothing terribly new here, I think. Except, of course, for my continued claim that the qualitative work I do see associated with RCT work is mostly awful, tending toward bad journalism (see my discussion of bad journalism and bad qualitative work in the first post).

But this discussion misses a much larger point about epistemology – what I intended to write in this second part of the series all along. I do not see the dichotomy between measuring “if something works” and establishing “why something worked” as analytically valid. Simply put, without some (at least hypothetical) framing of causality, we cannot rigorously frame research questions around either question. How can you know if something worked, if you are not sure how it was supposed to work in the first place? Qualitative research provides the interpretive framework for the data collected via RCT4D efforts – a necessary framework if we want RCT4D work to be rigorous. By separating qualitative work from the quant-oriented RCT work, we are assuming that somehow we can pull data collection apart from the framing of the research question. We cannot – nobody is completely inductive, which means we all work from some sort of framing of causality. The danger is when we don’t acknowledge this simple point – under most RCT4D work, those framings are implicit and completely uninterrogated by the practitioners. Even where they come to the fore (Duflo’s “three I’s”), they are not interrogated – they are assumed as framings for the rest of the analysis.

If we don’t have causal mechanisms, we cannot rigorously frame research questions to see if something is working – we are, as Marc says, “like the drunk looking for his car keys under the street lamp when he knows he lost them elsewhere, because the only place he can actually see is under the street lamp.”  Only I would argue we are the drunk looking for his keys under a streetlamp, but he has no idea if they are there or not.

In short, I’m not beating up on RCT4D, nor am I advocating for more conversation – no, I am arguing that we need integration, teams with quant and qual skills that frame the research questions together, that develop tests together, that interpret the data together.  This is the only way we will come to really understand the impact of our interventions, and how to more productively frame future efforts.  Of course, I can say this because I already work in a mixed-methods world where my projects integrate the skills of GIScientists, land use modelers, climate modelers, biogeographers and qualitative social scientists – in short, I have a degree of comfort with this sort of collaboration.  So, who wants to start putting together some seriously collaborative, integrated evaluations?

Those following this blog (or my twitter feed) know that I have some issues with RCT4D work.  I’m actually working on a serious treatment of the issues I see in this work (i.e. journal article), but I am not above crowdsourcing some of my ideas to see how people respond.  Also, as many of my readers know, I have a propensity for really long posts.  I’m going to try to avoid that here by breaking this topic into two parts.  So, this is part 1 of 2.

To me, RCT4D work is interesting because of its emphasis on rigorous data collection – certainly, this has long been a problem in development research, and I have no doubt that the data they are gathering is valid. However, part of the reason I feel confident in this data is that, as I raised in an earlier post, it replicates findings from the qualitative literature . . . findings that are, in many cases, long-established with rigorously gathered, verifiable data. More on that in part 2 of this series.

One of the things that worries me about the RCT4D movement is the (at least implicit, often overt) suggestion that other forms of development data collection lack rigor and validity.  However, in the qualitative realm we spend a lot of time thinking about rigor and validity, and how we might achieve both – and there are tools we use to this end, ranging from discursive analysis to cross-checking interviews with focus groups and other forms of data.  Certainly, these are different means of establishing rigor and validity, but they are still there.

Without rigor and validity, qualitative research falls into bad journalism.  As I see it, good journalism captures a story or an important issue, and illustrates that issue through examples.  These examples are not meant to rigorously explain the issue at hand, but to clarify it or ground it for the reader.  When journalists attempt to move to explanation via these same few examples (as far too often columnists like Kristof and Friedman do), they start making unsubstantiated claims that generally fall apart under scrutiny.  People mistake this sort of work for qualitative social science all the time, but it is not.  Certainly there is some really bad social science out there that slips from illustration to explanation in just the manner I have described, but this is hardly the majority of the work found in the literature.  Instead, rigorous qualitative social science recognizes the need to gather valid data, and therefore requires conducting dozens, if not hundreds, of interviews to establish understandings of the events and processes at hand.

This understanding of qualitative research stands in stark contrast to what is in evidence in the RCT4D movement. For all of the effort devoted to data collection under these efforts, there is stunningly little time and energy devoted to explanation of the patterns seen in the data. In short, RCT4D often reverts to bad journalism when it comes time for explanation. Patterns gleaned from meticulously gathered data are explained in an offhand manner. For example, in her (otherwise quite well-done) presentation to USAID yesterday, Esther Duflo suggested that some problematic development outcomes could be explained by a combination of “three I’s”: ideology, ignorance and inertia. This is a boggling oversimplification of why people do what they do – ideology is basically nondiagnostic (you need to define and interrogate it before you can do anything about it), and ignorance and inertia are (probably unintentionally) deeply patronizing assumptions about people living in the Global South that have been disproven time and again (my own work in Ghana has demonstrated that people operate with really fine-grained information about incomes and gender roles, and know exactly what they are doing when they act in a manner that limits their household incomes – see here, here and here). Development has claimed to be overcoming ignorance and inertia since . . . well, since we called it colonialism. Sorry, but that’s the truth.

Worse, this offhand approach to explanation is often “validated” through reference to a single qualitative case that may or may not be representative of the situation at hand – this is horribly ironic for an approach that is trying to move development research past the anecdotal.  This is not merely external observation – I have heard from people working inside J-PAL projects that the overall program puts little effort into serious qualitative work, and has little understanding of what rigor and validity might mean in the context of qualitative methods or explanation.  In short, the bulk of explanation for these interesting patterns of behavior that emerges from these studies resorts to uninterrogated assumptions about human behavior that do not hold up to empirical reality.  What RCT4D has identified are patterns, not explanations – explanation requires a contextual understanding of the social.

Coming soon: Part 2 – Qualitative research and the interpretation of empirical data

I was at a talk today where folks from Michigan State were presenting research and policy recommendations to guide the Feed the Future initiative. I greatly appreciate this sort of presentation – it is good to get real research in the building, and to see USAID staff who have so little time turn out in large numbers to engage. Once again, folks, it’s not that people in the agencies aren’t interested or don’t care, it’s a question of time and access.

In the course of one of the presentations, however, I saw a moment of “explanation” for observed behavior that nicely captures a larger issue that has been eating at me as the randomized control trials for development (RCT4D) movement gains speed . . . there isn’t a lot of explanation going on. There is really interesting data, rigorously collected, but explanation is another thing entirely.

In the course of the presentation, the presenter put up a slide that showed a wide dispersion of prices around the average price received by farmers for their maize crops around a single market area (near where I happen to do work in Malawi).  Nothing too shocking there, as this happens in Malawi, and indeed in many places.  However, from a policy and programming perspective, it’s important to know that the average price is NOT the same thing as what a given household is taking home.  But then the presenter explained this dispersion by noting (in passing) that some farmers were more price-savvy than others.

Two problems with this explanation:

1) there is no evidence at all to support this claim, either in his data or in the data I have from an independent research project nearby, and

2) this offhand explanation has serious policy ramifications.

This explanation is a gross oversimplification of what is actually going on here – in Mulanje (near the Luchenza market area analyzed in the presentation), price information is very well communicated within villages. Thus, while some farmers might indeed be more savvy than others, the prices they are able to get are communicated throughout the village, thus distributing that information. So the dispersion of prices is the product of other factors. Certainly desperation selling is probably part of the issue (another offhand explanation offered later in the presentation). However, what we really need, if we want a rigorous understanding of the causes of this dispersion and how to address it, is a serious effort to grasp the social component of agriculture in this area – how gender roles, for example, shape household power dynamics, farm roles, and the prices at which people will sell (a social consideration that exceeds explanation via markets), or how social networks connect particular farmers to particular purchasers in a manner that facilitates or inhibits price maximization at market. These considerations are both causes of the phenomena the presenter described and the points of leverage on which policy might act to actually change outcomes. If farmers aren’t “price savvy”, this suggests the need for a very different sort of intervention than what would be needed to address gendered patterns of agricultural strategy tied to long-standing gender roles and expectations.

This is a microcosm of what I am seeing in the RCT4D world right now – really rigorous data collection, followed by really thin interpretations of the data. It is not enough to just point out interesting patterns and then start throwing explanations out there – we must turn from the rigorous quantitative identification of significant patterns of behavior to the qualitative exploration of the causes of those patterns and their endurance over time. I’ve been wrestling with these issues in Ghana for more than a decade now, an effort that has most recently led me to a complete reconceptualization of livelihoods (shifting from understanding livelihoods as a means of addressing material conditions to a means of governing behaviors through particular ways of addressing material conditions – the article is in review at Development and Change). However, the empirical tests of this approach (with admittedly tiny-n samples in Ghana, and very preliminary looks at the Malawi data) suggest that it offers better explanatory resolution for observed behaviors than existing livelihoods approaches (which would end up dismissing a lot of choices as illogical or the products of incomplete information) – and therefore a better foundation for policy recommendations than is available without this careful consideration of the social.

See, for example, this article I wrote on how we approach gender in development (also a good overview of the current state of gender and development, if I do say so myself).  I empirically demonstrate that a serious consideration of how gender is constructed in particular places has large material outcomes on whose experiences we can understand, and therefore the sorts of interventions we might program to address particular challenges.  We need more rigorous wrestling with “the social” if we are going to learn anything meaningful from our data.  Period.

In summary, explanation is hard.  Harder, in many ways, than rigorous data collection.  Until we start spending at least as much effort on the explanation side as we do on the collection side, we will not really change much of anything in development.