The Problem;

Assuming you have a data frame that has data collected as baseline, or have already started data collection and you change your data collection too - by adding new variables, for instance var100,var202 & var347, the following will happen;

The already collected data, lets call it dfA will have blanks on the newly added variables once merged with the newly collected data. When running you analysis, the outputs for var100,var202 & var347, variables will be NULL, thus give out the wrong numbers :(.

The Solution;

Based on the method you are using to collect your data, you can either;

  • Create a spreadsheet file with the variables that are missing data from dfA, or
  • Create a short survey that the team will use to go back to the field and fill in.

Whichever option you use, make sure that you extract all relevant metadata from dfA that will be used as the key identifier and put it in the above created survey, lets call it dfB.

Once dfB has been filled in, you can now use the power of R to fill in the blanks in dfA using the data collected in dfB.

#The required libraries
require(data.table)

#Load the libraries
library(data.table)  

We wiil now use join from data.table. Convert the “data.frame” to “data.table” (setDT(dfA), join on with dfA using “key” and assign (:=), the values in var100,var202 and var347 with i.var100, i.var202 and i.var347.

Just run the following code

setDT(dfA)[dfB, c('var100', 'var202','var347') := .(i.var100, i.var202,i.var347), on = "key"]

If you now View(dfA), the data will be updated.