Frequently Asked Questions (FAQ)
Is IPUMS-DHS the best tool for my needs?
How do I start?
How much does it cost to use IPUMS-DHS?
When and how should I cite IPUMS-DHS?
I want the latest sample for my country. When will IPUMS-DHS make that available?
I used IPUMS-DHS to conduct research. Will you help me share my research?
Is IPUMS-DHS useful to me if I am not a registered DHS user?
Is there a preferred statistical package for using IPUMS-DHS?
Understanding the IPUMS-DHS website
What is meant by "Universe"?
What is an "extract"?
What are "integrated" variables?
Is there an IPUMS-DHS User Guide?
How long does it take to make a data extract?
IPUMS-DHS compared to other DHS data sources
How are IPUMS-DHS data different from DHS data?
How are weights in IPUMS-DHS different from the original DHS weights?
The DHS is already harmonized. What value does IPUMS-DHS add?
What's the difference between STATCompiler and IPUMS-DHS?
How do I combine my IPUMS-DHS file with original DHS data?
Where can I get the original DHS data?
I'm having trouble logging into IPUMS-DHS. What should I do?
I'm logged in, but IPUMS-DHS won't let me select certain samples. What's going on?
I can't find the variable I want.
Why can't I open my downloaded data file?
I'm having trouble using the IPUMS-DHS website. How can I get help?
I found a mistake in IPUMS-DHS or want to suggest an improvement. What should I do?
Using the variables page
Variables page menu
Variables page details
How does "sample selection" work on the IPUMS-DHS web site?
What does "Add to cart" mean?
Using the data extract system
Your data cart
Why are some variables in my data cart preselected?
What is "Type"?
Extract request page
Extract option: Describe your extract
Is IPUMS-DHS the best tool for my needs?
IPUMS-DHS is the ideal tool for people who want to work with DHS microdata (that is, data on individuals) using statistical software, such as SPSS, SAS, Stata, or R, and are interested in the DHS surveys that are currently included in IPUMS-DHS.
If you are just interested in seeing DHS descriptive statistics, such as the percentage of respondents using modern family planning methods in a certain year, ICF International provides extensive summaries of the DHS data in both country Final Reports and in their STATCompiler tool. These will be the best and easiest sources for you to use for this purpose.
Use IPUMS-DHS if you want to go further, for example, to test whether differences in the use of modern family planning over time or across countries are statistically significant, assess relationships across variables through correlations or multivariable analyses, or design your own figures or charts.
Many people find it useful to combine the different available tools. See "What's the difference between STATCompiler and IPUMS-DHS?" for more information.
How do I start?
Start by registering to use DHS data at The DHS Program website and reviewing the IPUMS-DHS User Guide.
Registration is simple, but it can take a couple of days for your approval to come through. If you are already registered as a DHS user, you can use your DHS Program login information (email and password) to log in to IPUMS-DHS.
You can browse the IPUMS-DHS website and see all the variable documentation without registering, but you will not be able to create your own dataset until your registration is approved.
For new users, the IPUMS-DHS User Guide provides step-by-step instructions on using the website and constructing a customized data file.
How much does it cost to use IPUMS-DHS?
Nothing. The IPUMS-DHS data and system are completely free. We are grateful to the Eunice Kennedy Shriver National Institute of Child Health and Human Development, USAID, ICF International, and the countries participating in the DHS surveys for providing the support and material to make this possible.
When and how should I cite IPUMS-DHS?
Elizabeth Heger Boyle, Miriam King, and Matthew Sobek. IPUMS-Demographic and Health Surveys: Version 4.0 [dataset]. Minnesota Population Center and ICF International, 2017. http://doi.org/10.18128/D080.V4.1. NICHD Grant Number R01HD069471.
Please always include a citation to IPUMS-DHS in your research document; do not just say "DHS data" without acknowledging IPUMS-DHS as a source. Continued funding for IPUMS-DHS depends on our showing that the data are widely used, through researchers' citations.
If you use a single survey or just a few surveys, you may also wish to add citation information for the specific survey(s) used.
I want the latest sample for my country. When will IPUMS-DHS make that available?
We get access to DHS samples at the same time as the general public. It takes some time to integrate new samples into IPUMS-DHS. Currently, we release samples within about three months after The DHS Program makes them available. We continue to introduce efficiencies, so the lag time is getting shorter all the time. Under our current grant proposal, we have promised to release all public DHS data for Africa, the Middle East, and South Asia over the next 5 years, and we are adding the latest samples for countries already in IPUMS-DHS as soon as we possibly can.
I used IPUMS-DHS to conduct research. Will you help me share my research?
Absolutely! If you use IPUMS-DHS data or documentation for an article, book, thesis, class or conference paper or poster, teaching materials, or other product, please enter the information into the IPUMS Bibliography where others can find it.
Is IPUMS-DHS useful to me if I am not a registered DHS user?
Yes! Some of the most important aspects of IPUMS-DHS are variable discovery and variable-specific documentation (that is, question wording, who was asked the question, etc.). All of this is available to anyone with Internet access. You can also, without logging in, limit the display to include only the samples that interest you.
IPUMS-DHS will require you to log in as an approved DHS user before you can download data.
Is there a preferred statistical package for using IPUMS-DHS?
No. IPUMS-DHS supports SPSS, SAS and Stata. Users will always have access to an ASCII file for use with R (R-consistent codebooks will be available in 2018). Users may also request a comma-delimited (CSV) file to read the data into Excel.
Understanding the IPUMS-DHS website
What is meant by "Universe"?
"Universe" appears in IPUMS-DHS in two ways. Most IPUMS-DHS variables have a "Not in Universe" or NIU code, which indicates which respondents were not asked that particular question. There is also a UNIVERSE tab on every variable's documentation page.
The UNIVERSE tab describes the individuals who were asked a question.
A variable's universe is affected, first, by the sample. In most DHS surveys, respondents are all women aged 15 to 49, and they are typically asked about their children under five years old, but this is not always the case. (For example, sometimes only ever-married women are included, and sometimes women are asked about children born in the last three or four years.) Furthermore, some question modules, such as the interpersonal violence questions, are only asked of a random subsample of respondents.
The universe is affected, second, by survey skip patterns, and IPUMS-DHS will tell you this, e.g., the universe is "Women age 15-49 who have ever heard of HIV/AIDS."
The universe is sometimes affected, third, by post-survey processing of the data collected. Data processing can produce universes different from what is suggested by the survey skip patterns. For example, sometimes but not always respondents are assigned a "no" response to one question based on their response to an earlier question in the survey. (For example, only women who gave birth in a medical facility are asked whether their last delivery was Caesarian, but women who gave birth elsewhere are assigned a "no" response to the Caesarian birth variable.)
IPUMS-DHS staff empirically check the universes for all variables and all samples, so the reported universes match what appears in the data.
Researchers will make false inferences if they fail to ensure that universes are the same when comparing variable responses across DHS surveys. This is a very common mistake for new DHS users! Fortunately, IPUMS-DHS makes it easy for researchers to identify universe differences and make adjustments accordingly.
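As a rough illustration of the point above, the sketch below drops "Not in Universe" cases before computing a share for each sample, so that both figures describe only the respondents who were actually asked the question. The records, the sample names, and the NIU code value of 8 are all hypothetical; the real NIU code for a variable is listed on its codes page, and the universes themselves still need to be compared on the UNIVERSE tab.

```python
# Hypothetical records: (sample, answer code). The NIU code 8 is an assumption
# made for illustration; check each variable's codes page for the real value.
NIU = 8

records = [
    ("sample_a", 1), ("sample_a", 0), ("sample_a", NIU), ("sample_a", 1),
    ("sample_b", 0), ("sample_b", NIU), ("sample_b", NIU), ("sample_b", 1),
]


def share_yes(rows, sample, niu_code=NIU):
    """Unweighted share answering 1 ("yes") among in-universe cases of one sample."""
    answers = [code for s, code in rows if s == sample and code != niu_code]
    if not answers:
        return None  # nobody in this sample was asked the question
    return sum(1 for code in answers if code == 1) / len(answers)


share_a = share_yes(records, "sample_a")
share_b = share_yes(records, "sample_b")
```

Leaving the NIU cases in the denominator would understate both shares, and by different amounts in each sample, which is the kind of false inference described above.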
What is an "extract"?
"Extract" is the IPUMS term for a tailored dataset, with samples and variables selected by the user. When accessing data at The DHS Program website, users must download files that include variables for every possible survey question (which can include upwards of 10,000 variables) and must then keep or drop variables and merge files to create the dataset they want to analyze. IPUMS-DHS eliminates the need to keep, drop, or merge. Many users find the smaller, tailored IPUMS-DHS data extracts easier to use than the larger files, particularly if they are using less expensive versions of statistical software packages.
What are "integrated" variables?
Integrated variables have the same variable names and codes used in every sample. This consistency may also be true for some standard variables in the original DHS files, but it is not true for a) standard variables with country-specific responses and b) country-specific, non-standard variables.
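To make the idea concrete, here is a minimal sketch of what "the same codes in every sample" means for a variable with country-specific responses. Every code, label, and sample name below is invented for illustration, and the lookup-table approach is only an assumption about how such a recode could be expressed, not a description of how IPUMS-DHS staff actually process the data.

```python
# All codes, labels, and sample names here are invented for illustration.
SAMPLE_CODEBOOKS = {
    "sample_a": {1: "Catholic", 2: "Muslim", 3: "Other"},
    "sample_b": {1: "Muslim", 2: "Catholic", 3: "Other"},
}

# One shared coding scheme applied to every sample (hypothetical values).
HARMONIZED_CODES = {"Catholic": 1000, "Muslim": 2000, "Other": 9000}


def harmonize(sample, original_code):
    """Map a sample-specific code to the shared code via its response label."""
    label = SAMPLE_CODEBOOKS[sample][original_code]
    return HARMONIZED_CODES[label]


code_a = harmonize("sample_a", 1)  # "Catholic" in sample_a
code_b = harmonize("sample_b", 2)  # "Catholic" in sample_b
```

The response itself is untouched; only the number attached to it changes, so the same answer carries the same code in both samples.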
Is there an IPUMS-DHS User Guide?
Yes. You can access it here.
How long does it take to make a data extract?
The time needed to make an extract differs depending on the number and size of samples requested and the load on the server. Extracts can take from a few minutes to an hour or more. The system sends an email when the extract is completed, so there is no need to stay active on the IPUMS-DHS site while the extract is being made.
IPUMS-DHS compared to other DHS data sources
How are IPUMS-DHS data different from DHS data?
The source material for IPUMS-DHS is the original DHS files, but in some ways the integrated variables in IPUMS-DHS look quite a bit different.
- We've added many useful variables, such as harmonized geography for each country (with identified regions making the same geographic footprint in each sample) and variables that bridge different variable names and ways of asking a question (for example, about literacy) across all years of the DHS. We've also added variables about the environmental and social context for samples with GPS cluster information, using data sources from outside the DHS.
- Weights downloaded from IPUMS-DHS do not need to be transformed (divided by one million); they can be applied directly to the data.
- In the variable harmonization process, we often create IPUMS-specific response codes for variables. The DHS survey responses are never changed, only the numbers assigned to them. We use new numbers for codes to impose consistency across all samples, for the variables with country-specific responses (such as the variable on the respondent's religion). Sometimes we change the codes to accommodate additional detail that is only included in some samples. Because of these coding differences, we recommend researchers compare across IPUMS-DHS samples rather than across IPUMS-DHS and original DHS files.
- We have assigned IPUMS-DHS variable names, and these names are displayed by default. For example, the IPUMS-DHS variable for whether a woman is working is called "currwork" in IPUMS-DHS; it is called V705 in the original DHS data files. Users have the option of using original DHS variable names by clicking a button at the top of the variable search page or any variable list page. Users can search on either IPUMS-DHS names or the original DHS variable names. For variables with consistent/standardized names in the DHS files, the DHS variable name is also shown in parentheses (e.g., CURRWORK (V705)) in the variable documentation.
- Most, but not all, of the DHS data are currently in IPUMS-DHS. It's our goal to include all DHS data within IPUMS, but we haven't met that goal yet. If there is a variable you desperately need that's not in IPUMS-DHS yet, let us know, and we will try to move it up on our priority list.
How are weights in IPUMS-DHS different from the original DHS weights?
In the original DHS data files, weights had to be divided by one million before they could be applied. This is not necessary with IPUMS-DHS. We have already done the transformation for you, so weights can be applied directly to the data without transformation.
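The difference can be sketched in a few lines. The weight values and the 0/1 indicator below are made up, and the function names are hypothetical; the only fact taken from the text is that original-file weights are divided by one million, while IPUMS-DHS weights are used as delivered.

```python
SCALE = 1_000_000  # original DHS weights are divided by one million before use


def rescale_original_weight(raw_weight):
    """Convert an original-file weight to a directly usable weight."""
    return raw_weight / SCALE


def weighted_share(values, weights):
    """Weighted mean of a 0/1 indicator."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical weights as they might appear in an original DHS file.
raw_weights = [1_250_000, 750_000, 1_000_000]

# After rescaling, they have the form in which IPUMS-DHS delivers weights.
ipums_style_weights = [rescale_original_weight(w) for w in raw_weights]

# Hypothetical 0/1 indicator, e.g. whether a respondent uses a modern method.
uses_method = [1, 0, 1]
share = weighted_share(uses_method, ipums_style_weights)
```

With weights downloaded from IPUMS-DHS, the rescaling step is simply skipped and the weights go straight into the weighted calculation.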
The DHS is already harmonized. What value does IPUMS-DHS add?
IPUMS-DHS makes using DHS data simpler and more efficient.
With IPUMS-DHS you can:
- Easily determine what questions were asked in each DHS survey.
- See the specific wording of the survey question, translated into English, associated with every variable in every survey.
- Learn the universe (who was asked the question) and see a discussion of possible comparability problems for each variable, with just a click.
- Immediately access information on how complex variables (like "unmet need") were constructed, and follow hyperlinks to the variables used to construct them.
- Download a single, fully integrated and harmonized dataset drawn from multiple surveys with a few simple clicks.
- Add a new sample or variable to your dataset just as easily.
- Skip merging the separate files for a single survey. Beyond integrating across samples, IPUMS-DHS staff have merged the data across DHS files by, for example, including household records with women's records, and mothers' and households' records with the records of children and births. Judging from questions in the DHS User Forum, merging files has been one of the biggest challenges for users of DHS data, so IPUMS-DHS removes the need for researchers to do this work.
- Compare information on fully-harmonized subnational regions (with each region keeping the same geographic footprint across samples) over time.
- Link DHS data to census data in IPUMS-International through a geographic linking key.
These are just a few of the many advantages of using IPUMS-DHS.
What's the difference between STATCompiler and IPUMS-DHS?
Both tools allow researchers to make comparisons across samples, but otherwise they serve very different functions. STATCompiler provides an overview of DHS data through aggregate, or summary, statistics. IPUMS-DHS provides users with the micro-data, that is, information from each individual survey respondent. The point of IPUMS-DHS is to facilitate multivariable analyses, particularly across time and countries, with DHS data.
IPUMS-DHS also provides detailed variable documentation, which can be a great complement to STATCompiler. By consulting the IPUMS-DHS documentation about variables used to construct measures included in STATCompiler, users can see how question wording and universes changed over time.
For example, if you were interested in seeing how attitudes toward HIV had changed in Mali over time, STATCompiler could tell you that the percentage of respondents who thought a female teacher who is HIV positive should be allowed to continue teaching rose from 44.7% in 2001 to 46.5% in 2006 and 64.7% in 2012-13. Perhaps you want to know if the wording of the question or the respondents answering the question changed over time. Without downloading any data, the IPUMS-DHS website can show you that in 2001, the STATCompiler percentage is based on the question:
But in 2006 and 2012, the percentages are based on the question:
After consulting IPUMS-DHS, you know to be careful about comparing the 2001 statistic to the other two years, because gender changed in the wording of the question.
Once you've downloaded IPUMS-DHS data, replicating the aggregate numbers in STATCompiler is a great way to make sure you are using the data correctly.
How do I combine my IPUMS-DHS file with original DHS data?
See our User Note, "Linking IPUMS-DHS Data to DHS Files."
Where can I get the original DHS data?
The source materials for IPUMS-DHS are the DHS household (HR) files, women's individual recode (IR) files, child recode (KR) files, and birth recode (BR) files distributed through The DHS Program. Researchers go through the same process to apply for access to the original files or the IPUMS-DHS version of the data. As noted above, the DHS data files include some variables not included in IPUMS-DHS. DHS variables may differ in their coding schemes from the IPUMS-DHS version of the same variable. To apply for access to the original DHS files, go here.
I'm having trouble logging into IPUMS-DHS. What should I do?
The DHS Program controls access to IPUMS-DHS data files. You must register with The DHS Program, indicating the country samples you wish to use. Once your registration is approved, you can use your DHS Program login information to log in to IPUMS-DHS.
If registering for DHS access does not solve the problem for you, please contact us so that we can help you gain access to the data. For individualized help, send an email message describing your problem in detail to email@example.com, and someone will get back to you soon.
By the way, your IPUMS Username and Password will not work for IPUMS-DHS; IPUMS-DHS is on a different registration system than the other IPUMS projects (such as IPUMS-International).
I'm logged in, but IPUMS-DHS won't let me select certain samples. What's going on?
Once you are logged in, you can only add samples to your Data Cart if you have registered to use them through The DHS Program website. The DHS Program provides access on a country-by-country basis. If you want access to additional samples, you will need to change your registration with The DHS Program, specifying which additional countries you need to analyze. If you just want to browse documentation but not actually download data, simply log out.
I can't find the variable I want.
IPUMS-DHS gives you the ability to browse variables by topic or to find a variable by searching for a keyword (e.g., unmet). Both options are part of the SELECT VARIABLES box on the top left of the variable selection page.
If you cannot find the variable you're looking for using either of those options, it may not yet be in IPUMS-DHS. IPUMS-DHS includes over 6000 DHS variables, but there are still a few variables from the original files that have not yet been included. Post a note to the IPUMS User Forum, and we can let you know if that is the case.
Why can't I open my downloaded data file?
The data produced by the extract system are gzipped (the file has a .gz extension). You must use a data compression utility to decompress the file before you can analyze it.
Detailed instructions for downloading and reading the data are available here.
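If you would rather decompress the file from a script than with a desktop utility, the following is one possible sketch using Python's standard gzip module. The file name "idhs_00001.csv.gz" is a hypothetical stand-in (the sketch creates a tiny file with that name so it can run on its own), and gunzip is a helper defined here, not part of IPUMS-DHS.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")  # drop the trailing .gz
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


# Stand-in for a downloaded extract; the name and contents are hypothetical.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "idhs_00001.csv.gz"
with gzip.open(sample, "wt", encoding="utf-8") as f:
    f.write("caseid,age\n1,25\n")

out_path = gunzip(sample)
text = out_path.read_text(encoding="utf-8")
```

For a real extract, call gunzip with the path of the file you downloaded and then open the decompressed file in your statistical package.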
I'm having trouble using the IPUMS-DHS website. How can I get help?
You might want to start with the User Guide. The HELP link at the top of the IPUMS-DHS home page will also bring you to a number of resources. Online video tutorials explain what IPUMS-DHS is, how the online documentation works, and how to create customized data files for analysis. User notes provide further details on how to download and unzip your customized data file and cover other topics, such as recommended child vaccination schedules. Training exercises give you practice in using the website and analyzing data. And the IPUMS User Forum allows you to post a question that will be answered by IPUMS staff or experienced users. If you need individualized help beyond these resources, send an email to firstname.lastname@example.org, with a full description of your problem.
I found a mistake in IPUMS-DHS or want to suggest an improvement. What should I do?
Please contact us right away! We make every effort to provide accurate information, but like everyone else, we make mistakes. And we love to get new ideas about how to make our website and data more useful. Please share your ideas for improvements or information about mistakes by emailing email@example.com, and that information will be passed along to the IPUMS-DHS team.
Using the variables page
Variables page menu
Use the left side of the menu to browse variables:
Topics: person variables by group
A-Z: integrated variables by first letter of the IPUMS-DHS variable name
Search: display only variables that contain specified text in particular fields
Use the buttons and links on the right side of the menu to:
Select Samples: limit the display of variable information to selected samples
View . . . Variables: toggle between displaying IPUMS-DHS alphabetic variable names or the original DHS variable names (when the latter are standard across surveys)
Options: alter how the variable list is displayed or get help for this page
Variables page details
The variables page allows you to browse integrated variables while limiting and controlling how the information is displayed.
The left side of the menu is for browsing the variables. The radio button on the right switches the variable menu between showing IPUMS-DHS variable names and the original DHS variable names (when standard).
When you "Select Samples," you limit the variable list to display only variables that are available in at least one of those samples. The effect of selecting samples also extends to all the variable descriptions and codes pages you can access through the variable system. Only information relevant to your selected samples will be displayed in any context while you browse the variables. You can change your sample selections at any point.
Selecting samples is a good practice when exploring IPUMS-DHS, because the amount of information can be unwieldy. Selecting samples also makes sense if you know you are only interested in a specific country or countries. On the other hand, sometimes you need to see everything to determine what kinds of research are possible using the database.
"Search" lets you specify search terms for specific fields of variable metadata. The system will return a list of variables that include any of the search terms you indicate.
The final choices are "Options" and "Help." The "Options" item brings up a screen that offers a number of choices regarding the display of the variable list. Each selection has a default choice.
Use short country codes / Use long country codes
Switch between the 2-letter country abbreviations and longer abbreviations. The long codes are the default. The IPUMS-DHS list of country codes is not the same as the DHS list of countries sampled.
View one group / View all groups together
Switch between viewing one variable group at a time and viewing all variable groups on one screen. Unless you have a limited number of samples selected, your browser may be slow to display all groups. The default view is one group at a time.
Show availability detail / Show availability summary
Switch between displaying the full sample-specific availability matrix, and a view that only displays the total number of samples that contain each variable. Both views only display or sum the samples that the user has selected in "Select samples." The default view is the detailed availability information.
Samples are displayed oldest to newest / Samples . . . newest to oldest
Display the samples columns indicating variable availability in chronological order or reverse chronological order. The default is oldest to newest.
The Variable List
As you browse the variables, they are displayed in a list containing a number of columns. The variable name links to the variable description, which will include detailed comparability discussions, universes, and survey text. The variable codes -- and their associated labels -- can be accessed directly using the "codes" links.
By default, all samples are selected for display. The country abbreviation and the sample year identify each sample at the top of every column. Hover over the country code with the mouse to see the full country name. If a variable is available in a given sample, an "X" is printed in that column.
In the column labeled "Add to cart," on the far left, each variable has a purple circle with a "+". Click a circle to add that variable to your data cart. Once clicked, the icon changes to a checked box, indicating that the variable is in your data cart. To remove the variable from your data cart, simply click the checkbox. Note: You will only be allowed to add variables to your data cart for countries whose data you have been approved to download.
How does "sample selection" work on the IPUMS-DHS web site?
When a user first enters the variable documentation system, all samples are selected by default. Every variable in the system will display on all relevant screens.
Users can filter the information displayed by selecting only the samples of interest to them. Only the variables available in one of the selected samples will appear in the variable lists. The variable descriptions and codes pages will also be filtered to display only the text and columns corresponding to the selected samples. Sample selections can be altered at any time in your session. Selections do not persist beyond the current session.
If you have previously been approved to download DHS data from a country included in IPUMS-DHS, and you log in to the website using your usual DHS email and password, the samples for which you are pre-approved to download data show up in green text. Other samples appear in grey text after login; to download data from those "greyed out" samples, you must apply for and receive access through The DHS Program.
When a user enters the extract system after selecting samples, those selections are carried into the data extract system.
What does "Add to cart" mean?
While browsing variables in the documentation system, you can select them to include in a data extract, sending them to your data cart (assuming you have been approved to download data from the sample(s) in question). You can deselect a variable by unchecking its box in the data cart. After you proceed to "create data extract," you can return to the variable list to make more selections.
Using the data extract system
Your data cart
You cannot create a data extract unless you are a registered user approved to download the sample (or samples) in question. If you are not registered, you must apply for access.
At the top right corner of the variables page is a summary of your data cart. This box displays the number of variables and samples you have selected. Clicking the purple circle next to a variable places it in your data cart. You can view your data cart at any time by clicking "View Cart." The "View Cart" link only becomes operative when you have selected a variable or sample.
The data cart lists the variables preselected by the extract system as well as any variables you selected while browsing the documentation. As with the variable selection page, you can remove variables from your extract in this step by clicking the checkbox next to the variable in the "Add to cart" column. If you chose a variable but subsequently altered your sample selections in such a way that the variable is no longer available, it is indicated by an "i" icon.
The data cart also includes links to codes pages and sample availability for the variables in your cart.
Buttons are provided to return to the variable list to make more selections or to alter your sample choices. If you return to the variable list, click on "View Cart" again to return to the data cart.
When you are satisfied with your data selections, click "Create data extract" to finalize your extract request.
Why are some variables in my data cart preselected?
Certain variables appear in your data cart even if you did not select them, and they are not included in the constantly updated count of variables in your data cart.
Unless you are absolutely certain you will not need one of these variables, we recommend that you not remove them from your data cart.
What is "Type"?
The "Type" column on the variables selection pages and in your data cart indicates the record type of the variable. All records in IPUMS-DHS are person records, so a "P" will always be shown.
Extract request page
When you click "Create data extract" in the Data Cart, you come to the Extract Request page. If you wish, you can simply hit the "Submit" button and create your data extract.
The page summarizes your data extract and provides options for modifying it. A link at the top expands to show the samples you selected. Click the appropriate links to go back to the variable browsing and sample selection pages to alter your choices. You return to the extract request page via the data cart, where you can review the availability matrix for selections and easily drop variables by unchecking them.
When you submit an extract, there will be a delay ranging from minutes to hours, depending on the size of the job. You do not need to wait on our site for the job to be completed. The system will send you an email when your extract is ready.
The definition of every extract will remain on the server indefinitely, but the data files are subject to deletion after three days. However, the screen where you download extracts has a feature that lets you revise old extracts. When you click on "revise," all your selections for that extract will be loaded into the system, after which you can edit or regenerate it. Note, however, that each successive data release can make it difficult to recreate old extracts, because codes might change.
Extract option: Describe your extract
You can describe your extract for future reference. The system will display the description on the page where you download your data extract.