I’m a big fan of accountability when it comes to aid and development. We should be asking whether our interventions have impact, and identifying which interventions are effective means of addressing particular development challenges. Of course, this is a bit like arguing for clean air and clean water. Seriously, who’s going to argue for dirtier water or air? Who really argues for ineffective aid and development spending?

Nobody.

More often than not, discussions of accountability and impact serve only to inflate narrow differences in approach, emphasis, or opinion into full-on “good guys”/“bad guys” arguments, where the “bad guys” are somehow against evaluation, hostile to the effective use of aid dollars, and indeed actively out to hurt the global poor. This serves nothing but particular cults of personality and, in my opinion, squashes discussion of really important problems with the accountability/impact agenda in development. And there are major problems with this agenda as it is currently framed – around the belief that we have proven means of measuring what works and how, if only we would just apply those tools.

When we start from this as a foundation, the accountability discussion is narrowed to a rather tepid debate about the application of the right tools to select the right programs. If all we are really talking about are tools, any skepticism toward efforts to account for the impact of aid projects and dollars is easily labeled an exercise in obfuscation, a refusal to “learn what works,” or an example of organizations and individuals captured by their own intellectual inertia. In narrowing the debate to an argument about the willingness of individuals and organizations to apply these tools to their projects, we are closing off discussion of a critical problem in development: we don’t actually know exactly what we are trying to measure.

Look, you can (fairly easily) measure the intended impact of a given project or program if you set things up for monitoring and evaluation at the outset. Hell, with enough time and money, we can often piece together enough data to do a decent post-hoc evaluation. But both cases rest on two assumptions:

1)   The project correctly identified the challenge at hand, and the intervention actually addressed something foundational/central to the needs of the people it was meant to serve.

This is a pretty weak assumption. I filled up a book arguing that a lot of the things that we assume about life for the global poor are incorrect, and therefore that many of our fundamental assumptions about how to address the needs of the global poor are incorrect. And when much of what we do in development is based on assumptions about people we’ve never met and places we’ve never visited, it is likely that many projects which achieve their intended outcomes are actually doing relatively little for their target populations.

Bad news: this is pretty consistent with the findings of a really large academic literature on development. This is why HURDL focuses so heavily on implementing a research approach that defines the challenges of the population as part of its initial fieldwork, and then continually revisits and revises those challenges as it sorts out the distinct and differentiated vulnerabilities (for an explanation of those terms, see page one here or here) experienced by various segments of the population.

Simply evaluating a portfolio of projects in terms of their stated goals serves to close off the project cycle into an ever more hermetically-sealed, self-referential world in which the needs of the target population recede ever further from design, monitoring, and evaluation. Sure, by introducing that drought-tolerant strain of millet to the region, you helped create a stable source of household food that guards against the impact of climate variability. This project could record high levels of variety uptake, large numbers of farmers trained on the growth of that variety, and even improved annual yields during slight downturns in rain. By all normal project metrics, it would be a success. But if the biggest problem in the area is finding adequate water for household livestock, that millet crop isn’t much good, and may well fail in the first truly dry season because men cannot tend their fields when they have to migrate with their animals in search of water. Thus, the project achieved its goal of making agriculture more “climate smart,” but failed to actually address the main problem in the area. Project indicators will likely capture the first half of this scenario, and totally miss the second half (especially if that really dry year comes after the project cycle is over).

2)   The intended impact was the only impact of the intervention.

If all that we are evaluating is the achievement of the expected goals of a project, we fail to capture the wider set of impacts that any intervention into a complex system will produce. So, for example, an organization might install a borehole in a village in an effort to introduce safe drinking water and therefore lower rates of morbidity associated with water-borne illness. Because this is the goal of the project, monitoring and evaluation will center on identifying who uses the borehole, and their water-borne illness outcomes. And if this intervention fails to lower rates of water-borne illness among borehole users, perhaps because post-pump sanitation issues remain unresolved by this intervention, monitoring and evaluation efforts will likely grade the intervention a failure.

Sure, that new borehole might not have resulted in lowered morbidity from water-borne illness. But what if it radically reduced the amount of time women spent gathering water, time they now spend on their own economic activities and education…efforts that, in the long term, produced improved household sanitation practices that ended up achieving the original goal of the borehole in an indirect manner? In this case, is the borehole a failure? Well, in one sense, yes – it did not produce the intended outcome in the intended timeframe. But in another sense, it had a constructive impact on the community that, in the much longer term, produced the desired outcome in a manner that is no longer dependent on infrastructure. Calling that a failure is nonsensical.

Nearly every conversation I see about aid accountability and impact suffers from one or both of these problems. These are easy mistakes to make if we assume that we have 1) correctly identified the challenges we should address and 2) figured out how best to address those challenges. When these assumptions don’t hold up under scrutiny (which is often), we need to rethink what it means to be accountable with aid dollars, and how we identify the impact we do (or do not) have.

What am I getting at? I think we are at a point where we must reframe development interventions: away from known technical or social “fixes” for known problems, and toward catalysts for change that populations can build upon in locally appropriate, but often unpredictable, ways. The former framing of development is the technocrats’ dream, beautifully embodied in the (failing) Millennium Village Project – just the latest incarnation of the expert-led development critiqued in Mitchell’s Rule of Experts and Easterly’s The White Man’s Burden. The latter requires a radical embrace of complexity and uncertainty that I suspect Ben Ramalingam might support (I’m not sure how Owen Barder would feel about this). I think the real conversation in aid/development accountability and impact is about how to think about these concepts in the context of chaotic, complex systems.

Bill Gates, in his annual letter, makes a compelling argument for the need to better measure the effectiveness of aid. There is a nice, one-minute summary video here. This is becoming a louder and louder message in development and aid, pushed now by folks ranging from Raj Shah, the Administrator of USAID, to most everyone at the Center for Global Development. There are interesting debates going on about how to shift from a focus on outputs (we bought this much stuff for this many dollars) to a focus on impacts (the stuff we bought did the following good things in the world). Most of these discussions are technical, focused on indicators and methods. What is not discussed is the massively failure-averse institutional culture of development donors, and how that culture is driving most of these debates. As a result, I think that Gates squanders his bully pulpit by arguing that we should be working harder on evaluation. We all know that better evaluation would improve aid and development. Suggesting that this is even a serious debate in development requires a nearly nonexistent straw man who somehow thinks learning from our programs and projects is bad.

Like most everyone else in the field, I agree with the premise that better measurement (thought of very broadly, to include methods and data across the quantitative-to-qualitative spectrum) can create a learning environment from which we might make better decisions about aid and development. But none of this matters if all of the institutional pressures run against hearing bad news. Right now, donors simply cannot tolerate bad news, even in the name of learning. Certainly, there are lots of people within the donor agencies who are working hard on finding ways to better evaluate and learn from existing and past programs, but these folks are going to be limited in their impact as long as agencies such as USAID answer to legislators who seem ready to declare any misstep a waste of taxpayer money, and therefore a reason to cut the aid budget. In that environment, how can anyone talk about failure?

So, a modest proposal for Bill Gates. Bill (may I call you Bill?), please round up a bunch of venture capitalists. Not the nice, socially responsible ones (who could be dismissed as bleeding-heart lefties or something of the sort), but the real red-in-tooth-and-claw types. Bring them over to DC, parade out these enormously wealthy, successful (by economic standards, at least) people, and have them explain to Congress how they make their money. Have them explain how they got rich failing on eight investments out of ten, because the last two investments more than paid for the cost of the eight failures. Have them explain how failure is a key part of learning, of success, and how sometimes failure isn’t the fault of the investor or donor – sometimes it is just bad luck. Finally, see if anyone is interested in taking a back-of-the-envelope shot at calculating how much impact is lost to risk-averse programming at USAID (or any other donor, really). You can shame Congress, which might feel comfortable beating up on bureaucrats, but not so much on economically successful businesspeople. You could start to bring about the culture change needed to make serious evaluation a reality. The problem is not that people don’t understand the need for serious evaluation – I honestly don’t know anyone making that argument. The problem is creating a space in which that evaluation can happen. This is what you should be doing with your annual letter, and with the clout that your foundation carries.
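For what it’s worth, that back-of-the-envelope math is simple enough to sketch out. The numbers below are entirely made up (the success rates, costs, and impact figures are my assumptions, not anyone’s actual data), but they show why a portfolio in which most bets fail can still produce more expected impact than one that never risks failure:

```python
# Back-of-the-envelope sketch with made-up numbers (not real donor data):
# compare the expected impact of a risk-averse portfolio of "safe" projects
# with a VC-style portfolio in which most projects fail but a few pay off big.

def expected_impact(n_projects, success_rate, impact_per_success):
    """Expected total impact of a portfolio, in arbitrary 'impact units'."""
    return n_projects * success_rate * impact_per_success

# Hypothetical risk-averse portfolio: nearly every project "succeeds" against
# its modest, pre-set targets, but each success delivers only modest impact.
safe = expected_impact(n_projects=10, success_rate=0.95, impact_per_success=1.0)

# Hypothetical VC-style portfolio: eight of ten projects fail outright, but
# each success is transformative (here, ten times the impact of a safe win).
risky = expected_impact(n_projects=10, success_rate=0.2, impact_per_success=10.0)

print(f"Risk-averse portfolio, expected impact: {safe:.1f}")    # 9.5
print(f"High-risk portfolio, expected impact:   {risky:.1f}")   # 20.0
print(f"Impact forgone by avoiding risk:        {risky - safe:.1f}")  # 10.5
```

Plug in whatever numbers you like; the conclusion only flips if the rare successes stop being big enough to pay for all the failures.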

Failing that (or perhaps alongside it), lead by demonstration – create an environment in your foundation in which “failure” becomes a tag attached to anything from which we do not learn, instead of a tag attached to a project that does not meet preconceived targets or outcomes. Forget charter cities (no, really, forget them) and become the “charter donor” that shows what can be done when this culture is in place.

The evaluation agenda is getting stale, running aground on the rocky shores of institutional incentives. We need someone to pull it off the rocks.  Now.

So, the Center for Global Development, a non-partisan think tank focused on reducing poverty and making globalization work for the poor (a paraphrase of their mission statement, which can be found here), has issued a report that more or less says the quality and effectiveness of USAID’s aid are very low compared to those of other agencies.

Well, I’m not all that freaked out by this assessment, principally because it fails to ask important questions about development needs and development outcomes. In fact, the entire report is rigged – not intentionally, mind you, but, I suspect, out of a basic ignorance of the differences among the agencies being evaluated and an odd (mis)understanding of what development is.

For me, the most telling point in the report came right away, on pages 3 and 4:

Given these difficulties in relating aid to development impact on the ground, the scholarly literature on aid effectiveness has failed to convince or impress those who might otherwise spend more because aid works (as in Sachs 2005) or less because aid doesn’t work often enough (Easterly 2003).

Why did this set me off? Well, in my book I argue that the “poles” of Sachs and Easterly in the development literature are not poles at all – they operate from the same assumptions about how development and globalization work, and I just spent 90,000 words’ worth of a book laying out those assumptions and why they are often wrong. In short, this whole report is operating from within the development echo chamber from which this blog takes its name. But then they really set me off:

In donor countries especially, faced with daunting fiscal and debt problems, there is new and healthy emphasis on value for money and on maximizing the impact of their aid spending.

Folks, yesterday I posted about how the desire to get “value for our money” in development was putting all the wrong pressures on agencies . . . not because value is bad, but because that desire pushes development agencies to avoid risk (and its associated costs), which in turn chokes off innovation in their programs and policies. And here we have a report evaluating the quality of aid (their words) in terms of its cost-effectiveness. One of the report’s four pillars is the ability of agencies to maximize aid efficiency. This is nuts.

Again, it’s not that there should be no oversight of the funds or their uses, or that there should be no accountability for those uses. But to demand efficiency is to largely rule out high-risk efforts that could have huge returns but carry a significant risk of failure. Put another way, if this metric were applied to the Chilean mine rescue, it would score low for efficiency because the rescuers tried three methods at once and two failed. Of course, that overlooks the fact that they GOT THE MINERS OUT ALIVE. Same thing for development – give me an “inefficient” agency that can make transformative leaps forward in our understanding of how development works and how to improve the situation of the global poor over the “efficient” agency that never programs anything risky, and never makes those big leaps.
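To put some toy numbers on the mine-rescue point (the 50% per-method success chance and the unit costs below are my assumptions, chosen purely for illustration, not actual rescue figures): running three methods in parallel raises the odds of getting the miners out from 50% to 87.5%, yet looks markedly worse on a rescues-per-dollar style efficiency metric.

```python
# Toy illustration with assumed numbers: a cost-effectiveness metric
# (expected rescues per unit cost) penalizes the parallel, redundant strategy,
# even though that strategy gives the miners a far better chance of survival.

def p_rescue(p_each: float, attempts: int) -> float:
    """Chance that at least one of several independent methods succeeds."""
    return 1 - (1 - p_each) ** attempts

P_EACH = 0.5           # assumed chance that any single drilling method works
COST_PER_METHOD = 1.0  # arbitrary cost units per method attempted

for methods in (1, 3):
    chance = p_rescue(P_EACH, methods)
    cost = methods * COST_PER_METHOD
    print(f"{methods} method(s): P(miners rescued) = {chance:.3f}, "
          f"expected rescues per cost unit = {chance / cost:.3f}")

# 1 method(s): P(miners rescued) = 0.500, expected rescues per cost unit = 0.500
# 3 method(s): P(miners rescued) = 0.875, expected rescues per cost unit = 0.292
```

On that metric, the “efficient” choice is the single-method strategy, which (with these assumed numbers) leaves the miners underground half the time.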

Now, let’s look at the indicators – because they tell the same story. One of the indicators under efficiency is “Share of allocation to well-governed countries.” Think about the pressure that puts on an agency deciding where to set up its programming. What about all of the poor, suffering people in poorly-governed countries? Is USAID not supposed to send massive relief to Haiti after an earthquake because its government is not all we might hope? This indicator either misses the whole point of development as a holistic, collaborative process of social transformation, or it is a thinly-veiled excuse to start triaging countries now.

They should know better – Andrew Natsios is one of their fellows, and he has explained how these sorts of evaluation pressures choke an agency to death. Amusingly, they cite this work in the report . . . almost completely at random, on page 31, for a point that has no real bearing on that section of the text. I wonder what he thinks of this report . . .

In the end, USAID comes out 126th of the 130 agencies evaluated for “maximizing efficiency.” Thank heavens. It probably means we still have some space left to experiment and fail. Note that among the top 20% of donors, the highest scores went to the World Bank and UN agencies, arguably the groups that do the least direct programming on the ground – in other words, the “inefficiencies” of their work are captured elsewhere, when the policies and programs they set up for others to run begin to come apart. The same could be said of the Millennium Challenge Corporation here in the US, which also scored high. In short, the report rewards for their efficiency the agencies that don’t actually do all that much on the ground, while the agencies that have to deal with the uncertainties of real life get dinged for it.

And the Germans ended up ranking high, but hey, nothing goes together like Germans and efficiency.  That one’s for you, Daniel Esser.

What a mess of a report . . . and what a mess this will cause in the press, in Congress, etc.  For no good reason.