How to Write a Data Collection Protocol (And Why Most Research Projects Should Have One)

A protocol is not a bureaucratic requirement. It is the document that saves a project when the unexpected happens in the field.

The supervisor gets a call on day three of fieldwork. One of the enumerators wants to know whether to interview a household head who was listed in the sampling frame but has been away from the home for two weeks and might not return before data collection ends. Do they wait? Interview a family member? Replace the household? Move on?

If the project has a data collection protocol, the answer is in it. If it does not, the supervisor makes a judgment call. That call might be right. Or it might introduce a systematic deviation from the sampling design that only becomes visible three months later when the data is being analyzed.

A data collection protocol exists to prevent that scenario. Here is what it should contain.

What a Data Collection Protocol Is

A data collection protocol is a written document that describes every operational aspect of how data will be collected on a project: who collects it, from whom, using what methods and tools, under what conditions data is acceptable, how quality is monitored, and what procedures apply when standard scenarios do not fit.

It is different from the survey instrument (which is the questionnaire or interview guide). It is different from the sampling framework (which describes how respondents are selected). It sits alongside both documents as the operational rulebook for fieldwork.

[Figure: blog cover contrasting two outcomes in research fieldwork: a judgment call made without a data collection protocol versus a resolved answer when one exists. Published by Projectbist.]

A protocol is the document your field team refers to when the research plan meets the reality of the field. Make it clear enough to be useful under pressure.

The Core Sections Every Protocol Needs

1. Study Overview and Purpose

A brief paragraph describing what the study is measuring, why it matters, and who commissioned it. This is not for the supervisor. It is for the junior enumerator who needs to understand why they are doing this work with enough context to handle unexpected situations with judgment.

2. Eligibility and Exclusion Criteria

Who qualifies for this study and who does not. Written with enough specificity to handle edge cases. For a household welfare survey targeting women aged 18 to 49: what happens if the eligible woman is away during the interview window? What if she is present but ill? What if the household has multiple eligible women? The protocol should answer all of these.
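One way to test whether eligibility rules are specific enough is to write them as code, because code cannot leave an edge case undecided. The sketch below uses the example above (women aged 18 to 49). The `Member` fields, the status labels, and the tie-break rule are all assumptions made for illustration; many surveys use a random within-household selection method such as a Kish grid instead.

```python
from dataclasses import dataclass

# Criteria from the example in the text: women aged 18 to 49.
MIN_AGE, MAX_AGE = 18, 49


@dataclass
class Member:
    name: str
    sex: str               # "F" or "M"
    age: int
    present: bool          # at home during the interview window
    able_to_respond: bool  # False if, for example, too ill to be interviewed


def select_respondent(members):
    """Return (member, status) for one household under the assumed rules."""
    eligible = [m for m in members
                if m.sex == "F" and MIN_AGE <= m.age <= MAX_AGE]
    if not eligible:
        return None, "no_eligible_member"
    available = [m for m in eligible if m.present and m.able_to_respond]
    if not available:
        # Assumed rule: eligible but away or ill means revisit, not replace.
        return None, "revisit_required"
    # Assumed tie-break for several eligible women: youngest first, then name.
    chosen = min(available, key=lambda m: (m.age, m.name))
    return chosen, "selected"


household = [
    Member("Amina", "F", 34, present=True, able_to_respond=True),
    Member("Bisi", "F", 22, present=True, able_to_respond=False),
    Member("Chidi", "M", 40, present=True, able_to_respond=True),
    Member("Dupe", "F", 52, present=True, able_to_respond=True),
]
respondent, status = select_respondent(household)
print(respondent.name, status)  # prints: Amina selected
```

Note that this sketch picks among the women who are available. A stricter design would select among all eligible women first and schedule a revisit if the selected woman is away or ill. Which rule applies is exactly the kind of decision the protocol should record.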

3. Respondent Identification and Contact Procedures

How enumerators find and approach respondents. In rural household surveys, this means specifying which household in a cluster is approached and by which method (random route, listing-based, GPS polygon). At what time of day should contact attempts be made? How many attempts before a household is replaced? These are not trivial decisions. Inconsistent contact procedures are one of the most common sources of undetected sampling bias.
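A contact rule can be written down as a small decision function. The following is a minimal sketch, assuming a limit of three attempts and three contact windows per day; both values and the action labels are hypothetical and should be replaced with whatever the study design specifies.

```python
MAX_ATTEMPTS = 3  # assumed limit before the replacement rule applies
TIME_SLOTS = ("morning", "afternoon", "evening")  # assumed contact windows


def next_contact_action(failed_slots):
    """Decide the next step from the time slots of earlier failed attempts."""
    if len(failed_slots) >= MAX_ATTEMPTS:
        return ("apply_replacement_rule", None)
    untried = [slot for slot in TIME_SLOTS if slot not in failed_slots]
    # Prefer a time of day not yet tried; if all were tried, start over.
    next_slot = untried[0] if untried else TIME_SLOTS[0]
    return ("revisit", next_slot)


print(next_contact_action(["morning"]))  # prints: ('revisit', 'afternoon')
```

Varying the time of day between attempts is the design choice here: repeating the same slot tends to miss the same people, such as those who work away from home during the day.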

4. Informed Consent Procedure

The exact process for obtaining consent before an interview begins. What the enumerator says, what the respondent is shown or given, how consent is recorded (signature, verbal agreement captured on audio, digital checkbox), and what happens if consent is refused. This section is particularly important for health and social research but applies to all studies involving human participants.

5. Data Collection Method and Tool

What form or instrument is used, what platform it runs on (paper, KoboToolbox, SurveyCTO, ODK), and what the enumerator does if the device fails during an interview. Where data is submitted and how often. What happens to partially completed interviews.

6. Quality Control Procedures

How data quality is monitored. Who reviews incoming data and at what intervals. What quality indicators trigger a flag (interview duration below minimum, GPS location outside the study area, logical inconsistency between responses). How errors are corrected. What the back-check procedure is and how many interviews per enumerator will be checked.
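The three example indicators above, plus a per-enumerator back-check draw, can be sketched in a few lines. Every threshold, field name, and coordinate below is an illustrative assumption, not a recommendation: the minimum duration, the bounding box standing in for the study area, the 10 percent back-check share, and the record layout would all come from the actual study.

```python
import random
from collections import defaultdict

# All thresholds below are illustrative assumptions.
MIN_DURATION_MIN = 20
STUDY_AREA = {"lat": (6.40, 6.70), "lon": (3.20, 3.60)}  # simple bounding box


def quality_flags(record):
    """Return the list of quality flags raised by one submitted interview."""
    flags = []
    if record["duration_min"] < MIN_DURATION_MIN:
        flags.append("short_duration")
    lat_lo, lat_hi = STUDY_AREA["lat"]
    lon_lo, lon_hi = STUDY_AREA["lon"]
    inside = lat_lo <= record["lat"] <= lat_hi and lon_lo <= record["lon"] <= lon_hi
    if not inside:
        flags.append("gps_outside_study_area")
    # Logical consistency: children cannot outnumber household members.
    if record["children_under_5"] > record["household_size"]:
        flags.append("inconsistent_household_counts")
    return flags


def backcheck_sample(records, share=0.10, seed=1):
    """Randomly pick interviews to back-check, at least one per enumerator."""
    by_enumerator = defaultdict(list)
    for r in records:
        by_enumerator[r["enumerator"]].append(r["id"])
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return {
        enum: sorted(rng.sample(ids, max(1, round(len(ids) * share))))
        for enum, ids in sorted(by_enumerator.items())
    }


records = [
    {"id": "I-001", "enumerator": "E1", "duration_min": 12,
     "lat": 6.52, "lon": 3.38, "household_size": 4, "children_under_5": 1},
    {"id": "I-002", "enumerator": "E1", "duration_min": 35,
     "lat": 7.10, "lon": 3.38, "household_size": 3, "children_under_5": 5},
    {"id": "I-003", "enumerator": "E2", "duration_min": 41,
     "lat": 6.55, "lon": 3.41, "household_size": 6, "children_under_5": 2},
]
for r in records:
    print(r["id"], quality_flags(r))
sample = backcheck_sample(records)
```

A flag is a prompt for review, not proof of an error. The protocol should also say who looks at flagged interviews and what correction steps follow.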

A protocol that does not cover what to do when things go wrong is a protocol for the ideal scenario, which almost never exists in fieldwork.

7. Substitution and Replacement Rules

The circumstances under which a respondent can be replaced, and with whom. This section prevents enumerators from making ad hoc substitution decisions that introduce systematic bias. Rules should be specific: if the primary respondent is not available after three contact attempts, go to the designated replacement household in the sampling frame. Do not substitute with a nearby convenience household.
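The example rule above translates directly into a lookup against replacements drawn in advance. This is a minimal sketch: the household IDs, the `REPLACEMENTS` table, and the function name are hypothetical, and the three-attempt threshold is the one used in the example rule.

```python
# Hypothetical extract of a sampling frame: each primary household has an
# ordered list of replacement households that were drawn in advance.
REPLACEMENTS = {
    "HH-014": ["HH-014-R1", "HH-014-R2"],
    "HH-027": ["HH-027-R1"],
}


def designated_replacement(primary_id, failed_attempts, already_used,
                           required_attempts=3):
    """Return the next unused designated replacement, or None if exhausted."""
    if failed_attempts < required_attempts:
        raise ValueError(
            f"{primary_id}: only {failed_attempts} attempts made, "
            f"{required_attempts} required before replacement"
        )
    for candidate in REPLACEMENTS.get(primary_id, []):
        if candidate not in already_used:
            return candidate
    # No designated replacement left: this is not the enumerator's decision.
    return None


print(designated_replacement("HH-014", 3, already_used={"HH-014-R1"}))
# prints: HH-014-R2
```

The function never returns a household that is not in the pre-drawn list, which is the point of the rule: an enumerator cannot choose a convenient neighbor. A `None` result means the list is exhausted and the case should go to the supervisor.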

8. Supervisor Escalation Procedures

Which situations the enumerator cannot resolve independently and must escalate to their supervisor, and which situations the supervisor in turn must escalate to the research team. Having this written down prevents important decisions from being made at the wrong level of authority in the field.
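An escalation section is essentially a table mapping situations to the lowest level allowed to decide them. The sketch below shows one possible shape for that table. The situation names and their assigned levels are invented for illustration, and the default for unlisted situations (send to the supervisor) is an assumption a real protocol should state explicitly.

```python
LEVELS = ("enumerator", "supervisor", "research_team")

# Hypothetical assignments: the lowest level allowed to decide each situation.
DECISION_LEVEL = {
    "respondent_asks_to_reschedule": "enumerator",
    "device_failure_mid_interview": "supervisor",
    "replacement_list_exhausted": "supervisor",
    "cluster_inaccessible": "research_team",
    "suspected_data_fabrication": "research_team",
}


def who_decides(situation):
    """Return the level that owns the decision; unlisted cases go upward."""
    return DECISION_LEVEL.get(situation, "supervisor")


def must_escalate(situation, role):
    """True if the given role is below the level that owns the decision."""
    return LEVELS.index(role) < LEVELS.index(who_decides(situation))


print(must_escalate("cluster_inaccessible", "supervisor"))  # prints: True
print(must_escalate("respondent_asks_to_reschedule", "enumerator"))  # prints: False
```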

Piloting the Protocol, Not Just the Instrument

Most pilot tests focus on the questionnaire. Fewer pilot the protocol. During a pilot, run through several of the substitution and edge case scenarios described in the protocol with your field team and observe whether the procedures are clear enough to apply consistently. The ambiguities that surface in a protocol pilot are far cheaper to fix than the biases they would otherwise introduce into the main dataset.