Overview of IPUMS-DHS Contextual Variables

Contextual variables describe features of the physical and social environment of a small geographic area (a 5- or 10-kilometer radius) surrounding the location where a DHS respondent was interviewed. Contextual variables in IPUMS-DHS eliminate the need to create linking keys and merge data files. The contextual variables cover topics such as ecoregion, soil type, vegetation, precipitation, temperature, livelihood zone, population density, malaria, conflict, and agriculture.

The original DHS files include some variables that are implicitly contextual, such as the classification of the respondents' residence as urban versus rural. The IPUMS-DHS contextual variables allow researchers to study how a wider range of surrounding characteristics may influence health and well-being. For example, researchers have found that exposure to unusually hot days is correlated with higher rates of heart attack and low birthweights, while unusually high or low rainfall influences outmigration. Certain types of physical environments, livelihoods, and staple crops are more vulnerable to weather extremes and global warming, with implications for health and well-being.

Click HERE for a list of all IPUMS-DHS contextual variables.

Two Ways to Access Contextual Variables

Researchers may include contextual variables as part of their customized data file (extract), treating these variables as characteristics of the respondents' environment (just like urban or rural residence).

For samples not yet included in IPUMS-DHS, researchers may download a flat file containing environmental and contextual variables and link it to respondents in an original DHS data file, using sample and cluster number as the linking keys. A list of samples with data on contextual variables is available HERE.
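The linking step described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the column names (`sample`, `cluster`, `popdensity`, `caseid`, `age`) and all values are hypothetical stand-ins, so check the actual variable names in the downloaded flat file and the original DHS file before adapting it.

```python
import csv
import io

# Hypothetical key columns; the real files may use different names.
KEY_COLUMNS = ("sample", "cluster")


def index_by_key(rows, key_columns=KEY_COLUMNS):
    """Map (sample, cluster) -> contextual row, rejecting duplicate keys."""
    index = {}
    for row in rows:
        key = tuple(row[c].strip() for c in key_columns)
        if key in index:
            raise ValueError(f"duplicate cluster key: {key}")
        index[key] = row
    return index


def attach_context(respondents, context_rows, key_columns=KEY_COLUMNS):
    """Left-join cluster-level contextual variables onto respondent records."""
    index = index_by_key(context_rows, key_columns)
    merged = []
    for person in respondents:
        key = tuple(person[c].strip() for c in key_columns)
        context = index.get(key, {})  # no match: respondent kept without context
        extra = {k: v for k, v in context.items() if k not in key_columns}
        merged.append({**person, **extra})
    return merged


# In-memory stand-ins for the two files (made-up values).
context_csv = io.StringIO(
    "sample,cluster,popdensity\n"
    "1001,1,250.5\n"
    "1001,2,12.0\n"
)
respondent_csv = io.StringIO(
    "sample,cluster,caseid,age\n"
    "1001,1,A,24\n"
    "1001,1,B,31\n"
    "1001,2,C,19\n"
    "1001,3,D,40\n"
)

context_rows = list(csv.DictReader(context_csv))
respondents = list(csv.DictReader(respondent_csv))
merged = attach_context(respondents, context_rows)
```

Every respondent in the same cluster receives the same contextual values, and a respondent whose cluster has no contextual record (cluster 3 in the toy data) is kept without the extra columns, which makes unmatched clusters easy to spot.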

Additional contextual variables that link GPS cluster data to ancillary data are available from The DHS Program's Geospatial Covariates page.

Contextual variable computation in IPUMS-DHS

The statistics for all contextual variables are computed within a 5- or 10-kilometer buffer around DHS clusters. The main reason for using a buffer is to minimize the effects of DHS cluster displacement. Environmental statistics are also better calculated over the area surrounding individuals or communities than from the value at a single point location. The size of the buffer varies across variables: a 5-kilometer buffer was used for ecoregion, livelihood zone, population density, and soil data, and a 10-kilometer buffer for all other variables. For a given variable, the same buffer size was applied to every cluster, whether urban or rural, so that the data are consistent and comparable across individuals.

The creation of most IPUMS-DHS contextual variables is based on a common general methodology using Esri's ArcGIS software suite. Source data were acquired as raster or vector files. Source raster files with a resolution greater than 500 meters and vector files were converted to raster files with a resolution of 500 meters. The Focal Statistics tool was then used to update each pixel value within a 5- or 10-kilometer circular buffer around the sample cluster GPS location. For qualitative variables such as soil type, livelihood zone, or ecoregion, the predominant value within a 5-kilometer buffer was used to update each pixel value. For quantitative variables such as temperature, precipitation, population density, or malaria incidence, the mean, maximum, or sum was computed for all pixels within a 5- or 10-kilometer buffer. IPUMS-DHS staff then used the Extract Values to Points tool to assign the value of the intersecting pixel to each cluster location.
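The focal-statistics idea can be illustrated without ArcGIS. The following is a minimal pure-Python sketch, not the IPUMS-DHS workflow itself: it computes the circular-window mean or majority only for the cell containing the cluster, which gives the same value as running a focal statistic over every pixel and then extracting the pixel at the cluster point. The function names, the toy grids, and the 1-kilometer radius are illustrative assumptions.

```python
from collections import Counter

CELL_SIZE_M = 500  # raster resolution stated in the text


def window_values(grid, row, col, radius_m, cell_size_m=CELL_SIZE_M):
    """Values of cells whose centers lie within radius_m of cell (row, col)."""
    reach = int(radius_m // cell_size_m)
    values = []
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            dist_m = cell_size_m * ((r - row) ** 2 + (c - col) ** 2) ** 0.5
            # None marks a no-data cell and is skipped.
            if dist_m <= radius_m and grid[r][c] is not None:
                values.append(grid[r][c])
    return values


def focal_mean(grid, row, col, radius_m):
    """Mean of a quantitative raster within the circular window."""
    values = window_values(grid, row, col, radius_m)
    return sum(values) / len(values) if values else None


def focal_majority(grid, row, col, radius_m):
    """Predominant category of a qualitative raster within the window."""
    values = window_values(grid, row, col, radius_m)
    return Counter(values).most_common(1)[0][0] if values else None


# Toy 5x5 rasters (made-up values); the cluster falls in the center cell.
temperature = [[1.0] * 5 for _ in range(5)]
temperature[2][2] = 14.0
soil = [["A", "A", "B", "B", "B"] for _ in range(5)]

# A 1 km radius keeps the toy grid small; the text uses 5 or 10 km.
mean_temp = focal_mean(temperature, 2, 2, radius_m=1000)
main_soil = focal_majority(soil, 2, 2, radius_m=1000)
```

With a 1-kilometer radius on 500-meter cells, the window around the center cell holds 13 cells, so the single hot cell is averaged with 12 ordinary ones. Ties in the majority are broken arbitrarily in this sketch; how ArcGIS resolves them is not covered here.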

A different methodology was used for the conflict variables (battles, riots, and violence against civilians). The conflict data report the annual counts (i.e., number of days) of incidents for a given latitude/longitude coordinate. IPUMS-DHS staff converted the coordinates to a point layer, created a 10-kilometer buffer around each DHS sample cluster location, and then counted the number of conflict events falling within the buffer. If the buffer crossed an international boundary, only events occurring within the same country as the DHS sample were included in the count.
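The counting rule for conflict events can be sketched as follows. This is an approximation under stated assumptions rather than the method IPUMS-DHS staff ran in ArcGIS: it swaps the GIS buffer for a great-circle (haversine) distance test, and the record layout (`lat`, `lon`, `country`), the country codes, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_events(cluster, events, radius_km=10.0):
    """Count events within radius_km of the cluster, same country only."""
    return sum(
        1
        for e in events
        if e["country"] == cluster["country"]
        and haversine_km(cluster["lat"], cluster["lon"],
                         e["lat"], e["lon"]) <= radius_km
    )


# Made-up cluster and events; "AA" and "BB" are placeholder country codes.
cluster = {"lat": 0.0, "lon": 30.0, "country": "AA"}
events = [
    {"lat": 0.05, "lon": 30.0, "country": "AA"},  # about 5.6 km away: counted
    {"lat": 0.0, "lon": 30.0, "country": "AA"},   # at the cluster: counted
    {"lat": 0.0, "lon": 30.05, "country": "BB"},  # close, but across the border
    {"lat": 0.2, "lon": 30.0, "country": "AA"},   # about 22 km away: too far
]

n_events = count_events(cluster, events)
```

The country check comes first so that nearby events on the other side of an international boundary are dropped, mirroring the rule stated above.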

Flat files in .csv format are available for all contextual variables for all DHS samples with GPS data available before July 2018, including samples not yet available in IPUMS-DHS.

A note about the GPS cluster datasets

The Demographic and Health Surveys (DHS) Program provides GPS coordinates for clusters. Clusters are groupings of households that participated in the survey. The GPS readings are highly accurate, but are displaced to ensure respondent confidentiality. Urban clusters are displaced by 0 to 2 kilometers and rural clusters by 0 to 5 kilometers, with a further 1% of rural clusters displaced up to 10 kilometers. Clusters are not displaced across survey regions or national boundaries. The contextual variables calculated by IPUMS-DHS average the values within the radius of displacement. For details, see the documentation on cluster displacement. Users interested in performing spatial analysis with GPS cluster datasets may obtain DHS cluster shapefiles from The DHS Program website.

Source data citations:


Ecoregion

Olson, D. M., Dinerstein, E., Wikramanayake, E. D., Burgess, N. D., Powell, G. V. N., Underwood, E. C., D'Amico, J. A., Itoua, I., Strand, H. E., Morrison, J. C., Loucks, C. J., Allnutt, T. F., Ricketts, T. H., Kura, Y., Lamoreux, J. F., Wettengel, W. W., Hedao, P., Kassem, K. R. 2001. Terrestrial ecoregions of the world: a new map of life on Earth. BioScience 51(11):933-938.

Soil Type

Hengl, T., Mendes de Jesus, J., Heuvelink, G. B. M., Ruiperez Gonzalez, M., Kilibarda, M. et al. (2017) SoilGrids250m: global gridded soil information based on machine learning. PLoS ONE 12(2): e0169748. doi:10.1371/journal.pone.0169748.

Normalized Difference Vegetation Index (NDVI)

Didan, K. (2015). MOD13Q1 MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/modis/mod13q1.006


Precipitation

Funk, Chris, Pete Peterson, Martin Landsfeld, Diego Pedreros, James Verdin, Shraddhanand Shukla, Gregory Husak, James Rowland, Laura Harrison, Andrew Hoell & Joel Michaelsen. (2015). "The climate hazards infrared precipitation with stations - a new environmental record for monitoring extremes." Scientific Data 2, 150066. doi:10.1038/sdata.2015.66.


Temperature

Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-yr high-resolution global dataset of meteorological forcings for land surface modeling, J. Climate, 19 (13), 3088-3111. http://hydrology.princeton.edu/home.php

Livelihood Zone


Population Density

Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. Documentation for the Gridded Population of the World, Version 4 (GPWv4). Palisades NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4D50JX4


Malaria

Weiss DJ, Lucas TCD, Nguyen M, et al. Mapping the global prevalence, incidence, and mortality of Plasmodium falciparum, 2000-17: a spatial and temporal modelling study. Lancet 2019; published online June 19. DOI: 10.1016/S0140-6736(19)31097-9

Conflicts (Battles, Riots, Violence)

Raleigh, Clionadh, Andrew Linke, Havard Hegre, and Joakim Karlsen. (2010). "Introducing ACLED-Armed Conflict Location and Event Data." Journal of Peace Research 47(5): 651-660.

Cropland & Pastureland

Ramankutty, N., A.T. Evan, C. Monfreda, and J.A. Foley (2008), Farming the planet: 1. Geographic distribution of global agricultural lands in the year 2000. Global Biogeochemical Cycles 22, GB1003, doi:10.1029/2007GB002952.

Crops Harvested Area & Yield

Monfreda, C., N. Ramankutty, and J.A. Foley (2008). Farming the planet. Part 2: Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000. Global Biogeochemical Cycles 22, GB1022, doi:10.1029/2007GB002947.
