This post on language in international development evaluation is one way I’m trying to stay sane and productive since the end of USAID. (Still hurts to type that. The end of USAID. Painful and surreal.) Anyway, it’s not as long as my last post (sorry, not sorry) and, as you can see, it’s about the big L.


Language is the water we swim in

Without good language and translation skills there is no field work. There is only colonialist claptrap that serves the researchers’ needs and wastes the respondents’ time. In case it needs to be said, Don’t Do That!

So you’ve got a survey, or some interviews or focus groups. Let’s say you want to ask about democratic ideals. Or hand-washing. Or attitudes toward LGBTQ individuals. It’s in English, because it comes from Upscale University’s Highdolla Campus in Americatown. Where do you want to run this survey? If you want to run it in Americatown, okay, you might be nearly ready to run your survey.

But if, say, you’re shooting for Putumayo in Colombia, or Ulan-Ude in Siberia, you’ve only just begun.

Translating to another language is anything but straightforward. Across geographic and linguistic zones, you may need months (plural) to translate and test, back-translate, refine, test and refine again. You might still get answers you can’t explain.

Languages are not 1:1. It’s more like EE:salamander:√79:🤯

You also have to get it right for the range of people you want to query. Not everyone in Mongolia uses the same words or attaches the same meanings to them, just as not all French speakers sound like someone from Paris.

Issues around language are one great place to flex a vital evaluation muscle: humility.

How not to get lost in translation

Translation is an art and a science.

It’s also an audience, a moment, and an act of sense-making. Professional translators work their whole careers to refine and improve their work, in a milieu of nuance, variability, and subjectivity. They know the rules and the exceptions of the target language, as well as how to break the rules productively. Even if, say, you are interviewing teens in U.S. suburban school districts in 2024, knowing the slang they use can sharpen your questions and your reading of their responses.


Do you know what you’d need to know, to do a survey with suburban teens in Mexico City, or rural teens in Chiapas, or business executives in Monterrey? People may be able to answer your generic Spanish-language questions in these cases, but will they bother? How much attention they pay to your survey depends on how much attention you pay to them. This starts with good, patient, thoughtful translation, by professional translators who understand these differences.

[Here’s one of my recent face-palms. My team wanted to survey trainees in a worldwide program. They all spoke English well enough to participate in training in English, and to communicate with each other in chats. The survey wasn’t too long, though time and cost to translate to all the participants’ languages would have been substantial. Still, I knew it would have made them feel more welcome. But we did not take that extra step – we assumed their participation based on how engaged and active they were (in English) with the program. Result? Low survey response, lower than their habitual engagement would suggest. If I ever get another job, I won’t make that mistake again.]

Even big themes don’t necessarily translate.

This is true even when the theme is, strictly speaking, translatable. In East Africa, our interviewing team identified some respondents as living with disabilities – wheelchair, crutches, other physical clues that in Western cultures mean “living with a disability.” But when we asked them if they had a disability, these folks uniformly said no. They defined disability differently from our American and middle-class East African definitions. That doesn’t make either definition right or wrong, but it does make the Western one wrong for that context. We needed to understand what living with a disability means in that culture before our survey questions could make sense.

Answer categories often don’t translate either.

Have you ever done a survey where none of the options looks good, but you have to answer something, so you just pick one? If not, it’s probably because you’re answering surveys made by your peers, more or less. Imagine if a nomadic tribe from Sudan wrote a survey you had to answer. Imagine they ask you, Which animal is most important for your family’s economic and social status? Imagine, then, that the answer categories are things like oxen, camels, bulls… but your family’s economic status doesn’t rest on livestock. Instead, in your culture, showing off status means buying a tiny dog or a hairless cat. Writing answer categories has to be an iterative process with local users to get categories that fit.

Of course you can include an “Other” option, and ask them to tell you what that other is for them… but that’s more work that could be done more efficiently by briefing the survey writers on animals and status in the culture where the survey is being sent. It’s also more work for the respondent: looking over answers that don’t fit, not seeing how to answer, realizing they’ll have to type something in for you… each of these is a moment when a survey respondent might drop out. The less aligned your language and cultural content are with the people who are supposed to respond, the more likely they will instead just… bounce.


Scales – Likert and otherwise – are annoying in any language.

Asking if people “Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, or Strongly Disagree” is not only annoying (especially when repeated in full for two dozen questions). It can also give you answers that are of questionable value. The precise gradient those questions aim to elicit can be easily lost in translation, in unfamiliarity with this (over-)used Western technique, or in simple fatigue. Can’t you just ask people if they like something or not?

Similarly, we often see Western survey scales like None, A little, Some, Significantly, and Extremely – or similar variations. But what’s the difference between “Some” and “Significantly”? Or “Significantly” and “Extremely”? Now, do it in Haitian Kreyòl – or whatever language is not your native tongue. Then do it for forty-five minutes with some unsuspecting nice person who didn’t want to tell you No when you asked if they had time for a survey.


It’s a draining exercise for everyone. And it can yield unhelpful results.

At that point many respondents just want you off their doorstep, out of their market stall, or out of their school. Even your interviewers dread these questions. In many environments, researchers take advantage of the politeness of different cultures with long, unwieldy, untested surveys. They’re also looking to economize with under-experienced survey teams. How can this confluence yield anything but weak results? The results look like nice numbers lined up in an Excel spreadsheet, but the quality of attention from interviewer and respondent is questionable, and there are probably biases in those numbers that the team can’t see.

And the best/worst part of these scales? We often collapse them for purposes of reporting. That precision the survey writers wanted so much? Ignored. Better to use the simplest array of answers possible, if that’s how you’re going to report anyway. What’s a better answer scheme for the following question? See one suggested answer at the end of the post – but you may have even better ones! (Share ’em in the Comments – ha ha ha ha, that’s like asking you to fill in an “Other” survey answer!)

SAMPLE QUESTION AND ANSWER: “To what extent did you use the project materials in your job after your training?

  1. To no extent at all
  2. To a little extent
  3. To a medium-level extent
  4. To a larger than medium-level extent
  5. To a very great extent”

In this formulation, your field interviewers are often asked to clarify. Depending on the type of survey and the level of precision demanded by the design, interviewers may or may not be allowed to clarify or define – and if they do, they may do so differently across your team. You’re very likely to get imprecise data, and your respondents are likely to feel vaguely dissatisfied about hearing, and having to answer, this question.

Some survey specialists say you should always have an odd number of answer options so there’s a midpoint – so people can answer “medium-level” or its equivalent on any scale. I tend to prefer an even number, so respondents have to come down on one side or the other. Say, for example, you took out #3 or #4 from the list above: either one could be cut, and the question would get better answers. Plus, you could more easily divide the answers into two categories when you’re analyzing – the negatives at #1 and #2, and the remaining two, more positive options.

But if you’re going to do that anyway, why not shrink your list to the essentials from the start?
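For readers who handle their own data, that collapsing step is a one-liner in code. This is a purely hypothetical sketch – the answer codes and category labels are made up, not from any real survey – showing how an even, four-point scale folds cleanly into two reporting buckets with no midpoint to argue over:

```python
from collections import Counter

# Hypothetical answer codes from a four-point scale (mid-level option removed):
# 1 = not at all, 2 = a little, 3 = quite a bit, 4 = very much
responses = [1, 4, 2, 3, 3, 1, 4, 4, 2, 3]  # made-up data for illustration

def collapse(code: int) -> str:
    """Fold a 1-4 answer code into the two categories used for reporting."""
    return "negative" if code <= 2 else "positive"

summary = Counter(collapse(code) for code in responses)
print(summary)  # Counter({'positive': 6, 'negative': 4})
```

If the report is only ever going to say “most respondents were positive,” this is all the precision you needed – which is the argument for asking the simpler question in the first place.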

The more rigid the English, the more concerning the translation

Some of the best, most rigorous research – on which donors rely to make hefty decisions – is based on the longest, the most in-depth, the most demanding surveys, like household surveys done to test cash transfer programs. They can easily take an hour or two to complete with one respondent. These often share questions across projects and researchers, to make the research as comparable as possible. That means they’re also more constrained during translation. One English question we often ask is: “How many hours last week did you work for pay?” The translations of this question across contexts must remain true to the original meaning, which is based on how we in the West understand the theme. We train field teams to stick to the exact question order and wording, and to not define or reword the question, even if the respondent asks.

But there are as many ways to interpret this question as there are discriminatory labor markets. (And by that I mean, “A lot.”) People work for housing, for example, or they farm, for which they are paid only at harvest. They may be on-call as caretakers or housekeepers, working intermittently when summoned. How do we count those hours? A person may have worked none last week but only because it was a holiday – so, report nothing? Report the previous week? Challenges to that particular question are infinite, because of the creativity and variation in the ways people are exploited for their labor. We shoehorn answers into the translated-from-English categories and collapse the real-life conditions behind them. When we or others rely on these data to make decisions, like about mobile money program payments or free time for community projects, these blind spots are like invisible time bombs in the analyses.

What can evaluators do?

Humility and patience are good starting points. So is a diverse team whose local staff have enough influence to make decisions. Speaking frankly with evaluation commissioners on these topics is also essential, and should be done early. You need to establish timelines and budget availability for quality translation, back-translation, and piloting the instruments with people who are very like the target audiences.

Another “must” is brevity. Ask what you need to ask, but resist the urge to find out things you won’t use. Different methods have different time limits – more than twenty minutes on the phone is pretty challenging, for example, or more than an hour in an interview. But strive for even less – people’s time is precious. If people will answer survey questions on their own – on the web or SMS, or on paper – correct, simple language is crucial so they do not abandon the survey.

Recently my team sent out an online survey link to graduates of a program who took the program in English. Later, in phone and in-person interviews, we learned how many of these graduates actually didn’t speak English all that well. Our response rates from those sites reflected it. In one language group, zero people responded to the English-language survey, though most native English speakers responded. I won’t make that mistake again.

I think it goes without saying that using something like Google Translate or some AI tool is not good enough. Even a sole native speaker might not be good enough. Don’t assume that a local on your team can translate well enough. It really takes questioning the translation thoughtfully. For that you need someone trained to do so and/or a team of people willing to take the time for it.

The attention to language doesn’t end when you send researchers to the field. Those researchers are great resources – it’s important to have regular check-ins. The team needs to be able to talk about what’s working and what’s not, including around language. Unexpected answers may show that respondents are hearing a different question than the one you thought you were asking, for example. Or there are more “Other” responses than those ticking an answer category box. Or more people are declining to participate than you imagined. Ask yourself, and ask your team: What are the anomalies telling you about the survey and its language?

It’s often said that language is culture. So beliefs and religion, sport and leisure, history, trends, jokes and shared experiences are what turn “language” into “communication” – or even communion – with one another. There’s variety within countries and cultures, including or particularly those that are multilingual to begin with. All of this is a huge responsibility to face, as an outsider or team of outsiders, when addressing a community with your survey or interview questions.

Just one example

I speak a second language very well. I took part in an animated discussion a few months ago among four team members in their mother tongue, my second language. The debate was about the language of a straightforward, simple, twenty-minute phone survey. Nothing too personal or nuanced, and the audience was mostly educated business owners in the capital city. As much as I have loved and worked in this language for over twenty years, I knew my place was on the sidelines: only native speakers have the necessary “ear” for local variations and vocabulary.

The team worked and reworked the language of the survey, trying versions of questions on others in the office. They consulted dictionaries and sector specialists and, in at least one instance, a brother-in-law who teaches the language in a high school. After a few days, they had a compromise version. Each one of them would have changed some of their peers’ language, but they had found common ground through patience and thoughtfulness.

Still, after just one day of piloting the survey on the phone, they went back to the debate table. Based on what they heard from their pilot testers, they changed the wording of more than half the questions! This is normal and good – unless you haven’t built in time and budget for it.

And now just for fun

If you’ve read this far and you are interested in languages and translation, check out this novel from R. F. Kuang: Babel. It’s fantastic. It has a bit of the fantasy/sci-fi about it, but to my eyes it’s more about the magic of language in the face of empire.


Here’s a lovely quote from the book: “That’s just what translation is, I think. That’s all speaking is. Listening to the other and trying to see past your own biases to glimpse what they’re trying to say. Showing yourself to the world, and hoping someone else understands.” And, I would add, listening well enough to understand someone else.


Here’s one way you might rephrase and improve that question and answer scheme from above:

POSSIBLE REWORKING OF THE QUESTION AND ANSWER: “Did you use the materials from the project after the training?”

  1. Yes, very much
  2. Yes, but only a little
  3. No, not at all

Much easier to analyze, and you can still separate out 1 and 2 meaningfully.