Following other ANH researchers to find datasets of interest

Follow others’ research to generate questions and ideas

We usually think of a research project that analyses secondary data as starting with a dataset. However, it can also be useful to start at the other end of the cycle – communicating results. Following blogs related to agriculture, food systems, nutrition, and health is a great way to stay up to date on research and perspectives in the field. Look in the SCANR Research Guidance Communicating results section for a list of relevant blogs. One of my favorites is the IFPRI Blog, which features posts on research and development issues from researchers and partners at the International Food Policy Research Institute.

The impacts of the COVID-19 pandemic on food systems have been a hot topic for researchers in 2020 and 2021. On the IFPRI Blog, I found a piece by Seneshaw Tamru, Kalle Hirvonen, and Bart Minten entitled “Impacts of the COVID-19 crisis on vegetable value chains in Ethiopia” – part of an IFPRI blog series on the impacts of the COVID-19 pandemic on food and nutrition security, poverty, and development. The authors describe results from a recent survey assessing the effects of the pandemic on vegetable value chains in Ethiopia.

Find the associated papers and datasets

The blog mentions that their findings are based on phone surveys conducted in March and April 2020 with key stakeholders along the vegetable value chain in Ethiopia. However, the authors “intend to substantiate these findings using more representative surveys in the near future.” So, has this research team published papers and datasets related to their research? First, I look for other blogs about COVID-19 and Ethiopia on the IFPRI Blog. This research team has been busy – there are blogs about trends in food consumption, dairy value chains, and food and nutrition insecurity in Ethiopia during COVID-19.

Next, I look for publicly available datasets from this research project. In the SCANR Research Guidance Data repositories section, you will find a list of sources of publicly available data related to agriculture, food systems, nutrition, and health. IFPRI researchers often share their data through the IFPRI Dataverse. A quick scan of recently posted datasets turns up the “Vegetable Value Chain Survey in Ethiopia: Producer/Household Survey.”

SCANR use case 1 metadata — Source: https://doi.org/10.7910/DVN/Q55PL6

A dataset in a publicly available data repository is usually accompanied by metadata, or information that describes the data and helps future users understand how it was collected, who collected it, what it contains, and how it can be referenced. Look in the SCANR Research Guidance Cleaning data section for a discussion of metadata. In this case, the metadata includes a link to a related publication, a working paper entitled “Emerging Medium-Scale Tenant Farming, Gig Economies, and the COVID-19 Disruption.”

Download and explore the data

In the IFPRI Dataverse entry for this dataset, there is citation information for the dataset and other metadata, as well as links to download the microdata files associated with the dataset. The first two files associated with this dataset are the questionnaire and codebook. Download these first so that you can get to know the dataset. By looking through the questionnaire, I see that this is a household survey conducted in January and February 2020. It contains modules about the household roster, land ownership and cultivation, land tenure, crop production, weeding, harvesting, labor, sales, and more. The codebook translates this questionnaire into variables that you will find in the dataset. Use the codebook to find the variable names, labels, and types. Look in the SCANR Research Guidance Cleaning data section for a discussion of codebooks.

SCANR use case 1 microdata — Source: https://doi.org/10.7910/DVN/Q55PL6

Using the codebook and questionnaire, you can find the survey items and variables used to generate findings reported in the discussion paper. For example, Figure 4.2 compares tomato and onion yields for medium-scale and smallholder farmers. This likely comes from Part K, question 17 of the questionnaire: “What are the average yields of these different vegetables by you?” In the codebook, we can see that these survey items correspond to the variables ska1_6a through ska1_6e.

Figure 4.2 — Source: https://doi.org/10.2499/p15738coll2.133909

Download the corresponding microdata files to explore the data further or answer research questions of your own. On the SCANR Research Guidance Using data software page, you will find resources for learning to use common statistical software programs. The data files for this dataset are shared in DTA format, the data file format used by Stata. However, this file format can also be imported into other programs.
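If you do not have Stata, one possible way to open a DTA file is with Python and the pandas library. The sketch below is only an illustration under a few assumptions: pandas is my choice of tool rather than something the dataset requires, the helper names load_stata_with_labels and yield_columns are made up for this example, and the default prefix "ska1_6" is taken from the yield variables discussed above (ska1_6a through ska1_6e). Whether the downloaded file carries variable labels is also an assumption, so check the result against the codebook.

```python
import pandas as pd


def load_stata_with_labels(path):
    """Read a Stata .dta file.

    Returns a tuple: (DataFrame of the data, dict of variable name -> label).
    """
    # iterator=True gives a reader object that exposes the file's metadata
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()  # labels stored in the file, if any
        data = reader.read()               # the full dataset as a DataFrame
    return data, labels


def yield_columns(data, prefix="ska1_6"):
    """List the column names that start with the given prefix.

    The default prefix is an assumption based on the codebook variables
    ska1_6a through ska1_6e mentioned in the text.
    """
    return [name for name in data.columns if name.startswith(prefix)]
```

To try it, you might call load_stata_with_labels("producer_survey.dta"), where the file name is a placeholder for whichever DTA file you downloaded, and then pass the resulting DataFrame to yield_columns to see which yield variables are present before summarising them.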

Keep following the conversation

This publicly available dataset is the first round of data collection for the associated discussion paper, collected in February 2020. Since we can see that this research team has generated multiple blogs and research papers since early 2020 related to COVID-19, value chains, and food consumption in Ethiopia, we can expect that they will publish more research and datasets on these topics in the coming months. Keep an eye out for new blogs, papers, and datasets that will help you understand what has been happening in Ethiopia and help spark new research ideas!

This use case was prepared by Elena M. Martinez, PhD Candidate at the Friedman School of Nutrition Science and Policy at Tufts University. January 25, 2021.