Research Guidance

Many organizations share publicly-available data related to agriculture, food systems, nutrition, and health that researchers and policymakers can use to learn more about the world. How can we harness these datasets to answer research and policy questions? In this section, you will find resources and guidance for every stage in the research process.


 

More than ever before, there is a vast amount of data available for public use online. With the right tools and training, we can use these data to answer research questions within and across disciplines. For researchers interested in agriculture, food systems, nutrition, and health, there are a large number of data repositories, government and organizational websites, and more that offer free access to datasets to help us investigate questions in our field.

 

In the SCANR Research Guidance sections below, you will find resources and guidance for every stage in the process of finding and utilizing secondary data for your research, from finding the right dataset to sharing your results.

 

 

SCANR tips about publicly-available data

 

  • What is FAIR data?
    • FAIR data is findable, accessible, interoperable, and reusable. FAIR principles are about having future users in mind when you create and share data.
    • GARDIAN metrics for FAIR data: A set of metrics, developed by the GARDIAN initiative, for determining whether data is FAIR.
    • “What scientists need to know about FAIR data”: This article from Nature Index discusses and explains FAIR data principles.
  • What is interoperability?
    • Interoperability is the ability to merge datasets for new applications without losing their meaning. Interoperable data can be re-used and re-analyzed for multiple applications. Though not all datasets can be combined, interoperability can allow us to use existing and new data for more research and other applications.
    • Data interoperability: A practitioner’s guide to joining up data in the development sector: A guide to data interoperability created by the UN Statistics Division, the Global Partnership for Sustainable Development Data, and the Collaborative on SDG Data Interoperability.
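As a rough illustration of what merging datasets without losing their meaning can involve, the sketch below joins two small tables on a shared key (country code and year). The variable names and all values are invented for illustration and do not come from any real dataset. The sketch uses only the Python standard library.

```python
import csv
import io

# Hypothetical extracts from two sources. All values are made up.
# Both sources are assumed to identify observations by ISO3 code and year.
yields_csv = """iso3,year,cereal_yield_kg_ha
KEN,2019,1500
MWI,2019,1700
"""
diets_csv = """iso3,year,fruit_intake_g_day
KEN,2019,95
NPL,2019,80
"""

def read_keyed(text, key_fields):
    """Read CSV text into a dict keyed by the tuple of key_fields."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[field] for field in key_fields)
        if key in rows:
            # A repeated key means the key does not uniquely identify rows.
            raise ValueError(f"duplicate key {key}")
        rows[key] = row
    return rows

keys = ("iso3", "year")
yields = read_keyed(yields_csv, keys)
diets = read_keyed(diets_csv, keys)

# Keep only observations present in both sources (an inner join).
merged = [{**yields[k], **diets[k]} for k in sorted(yields.keys() & diets.keys())]

# Report observations that appear in only one source instead of dropping them silently.
unmatched = sorted(yields.keys() ^ diets.keys())
```

Checking for duplicate keys and listing unmatched observations are two simple ways to notice when datasets do not line up as expected.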

 

Types of data

 

In the Data repositories section, you will find links to a wide variety of publicly available data sources related to agriculture, food systems, nutrition, and health. When choosing the right dataset for your interests and research questions, you first need to understand the attributes of the data that are available.

 

  • Unit of measurement: Some datasets provide information about respondents at the individual or household level. For example, the Living Standards Measurement Study (LSMS) conducts household surveys in multiple countries, so each observation from these datasets represents a household or individual within the household. Other datasets provide national averages. For example, the World Bank World Development Indicators are a collection of country-level indicators related to development. Each observation is a national average of a development indicator, such as average cereal yield in kilograms per hectare in a given year.
  • Data collection method: Many of the datasets referenced in the Data repositories section provide data observed through surveys, environmental or biometric samples, or geospatial data collection. These data are from either observational studies – in which researchers observe subjects without assigning any treatment – or experimental studies – in which researchers assign subjects to experimental conditions and then observe outcomes. Some datasets provide data generated through modeling studies – in which researchers combine observed data with a set of assumptions to generate modeled estimates. For example, the data provided in the Global Dietary Database (GDD) are modeled estimates of global dietary intake.
  • Quantitative versus qualitative data: Quantitative data is collected by measuring things that can be expressed using numbers. Qualitative data is collected by observing and interviewing subjects, and often answers questions of how and why that are difficult to answer using quantitative data. Most of the datasets referenced in the Data repositories section provide quantitative data, though some repositories such as the Harvard Dataverse do contain some qualitative datasets. Qualitative data is less commonly found in publicly accessible repositories because it is more difficult to anonymize data to protect the privacy of respondents.
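To make the difference between household-level and country-level observations concrete, the sketch below collapses a few household records into one average per country. The records, country labels, and field names are invented for illustration. The mean is unweighted, which is a simplifying assumption: real survey datasets usually require sampling weights to produce national estimates.

```python
from collections import defaultdict
from statistics import mean

# Made-up household-level records: each observation is one household.
households = [
    {"country": "A", "household_id": 1, "household_size": 4},
    {"country": "A", "household_id": 2, "household_size": 6},
    {"country": "B", "household_id": 1, "household_size": 3},
]

# Group household values by country.
sizes_by_country = defaultdict(list)
for household in households:
    sizes_by_country[household["country"]].append(household["household_size"])

# Country-level dataset: each observation is now one country.
# Unweighted mean for illustration only; real surveys typically need weights.
country_means = {country: mean(sizes) for country, sizes in sizes_by_country.items()}
```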
Data repositories

Many organizations share publicly available data. Below you will find a list of data repositories that contain downloadable datasets related to agriculture, food systems, nutrition, and health, as well as tips for downloading data and assessing data quality. 

 

Data repositories

 

Importing data
 

Importing data into a statistical software program can be challenging. Below are resources to help you learn to import data into common statistical software programs. For general guidance on using data software, look in the Using data software section.
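As one minimal example of an import step, the sketch below reads comma-separated data in Python using only the standard library. The column names and values are invented, and an in-memory string stands in for a downloaded file so the example runs as written.

```python
import csv
import io

# Stand-in for a downloaded file. With a real file you would instead use
# open("your_file.csv", newline="", encoding="utf-8"), where the file name
# and encoding are assumptions to check against the dataset documentation.
raw = io.StringIO("household_id,region,age_head\n1,North,34\n2,South,51\n")

reader = csv.DictReader(raw)

# Inspect the column names before using them.
columns = reader.fieldnames

# csv returns every value as text, so convert numeric columns explicitly.
rows = [
    {
        "household_id": int(record["household_id"]),
        "region": record["region"],
        "age_head": int(record["age_head"]),
    }
    for record in reader
]
```

Looking at the column names and converting types explicitly makes it easier to catch import problems, such as a numeric column that was read as text.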

 

Identifying methods, metrics, and tools

 

The relationships between agriculture, food systems, nutrition, and health are complex, dynamic, and difficult to measure. Researchers in this field have developed a wide variety of innovative methods, metrics, and tools that others can use. The following resources provide syntheses and examples of available methods, metrics, and tools.

 

  • Data4Diets: A platform that helps researchers and others to identify and understand diet-related food security indicators. 
  • How to find measures and tests: A guide for finding existing measures in social science from existing reference databases.
  • Food Systems Dashboard: An online tool that displays over 150 indicators related to food systems from multiple data sources. Researchers and policymakers can identify and visualise relevant indicators and download the related data. 
  • IMMANA Evidence and Gap Map (EGM): An online portal that summarises research innovations over the past decade in methods, metrics, and tools related to food systems and agriculture-nutrition linkages. The interactive map and data portal allow users to explore the areas where methods, metrics, and tools do and do not exist. The creation of and findings from the EGM are summarized in a paper by Thalia Sparling et al. in Advances in Nutrition.
Incorporating cross-cutting themes

 

Cross-cutting themes are topics that play a role in research across a variety of subjects and disciplines. Cross-cutting themes that often play a role in research on agriculture, food systems, nutrition, and health are gender, equity, and climate change, among others.

 

 

Gender

 

 

Climate change

 

Using data software
 

There are many programs that can be used to analyse secondary data. These programs allow users to view and prepare data, perform analyses, create visualisations, and more. This page provides resources for learning to use several commonly used statistical software programs, as well as resources for spatial data analysis and database management.

 

Most researchers learn to use one or more data software programs, and mastering one of these programs requires considerable time and effort. When choosing which program to learn, you will want to consider your financial resources and institutional affiliations; the programs commonly used in your field; the programs used by your research collaborators; and your own research goals. Some of these programs are proprietary, so you will need to pay for access or gain access through your institution. Excel, SPSS, SAS, Stata, and MPlus are all proprietary programs (though Google Sheets is a free program that has some of the functionality of Excel). R and Python are free, open-source programs. If you are interested in analysis of geospatial data, you may want to learn to use ArcGIS or an open-source counterpart like QGIS. If you are managing a large amount of data, you may want to learn to use SQL and a relational database management system.

 

Statistical software

 

  • Excel
    • Coursera courses on Excel: Coursera offers a variety of courses in Excel on topics such as basics, data visualization, and data analysis.
    • edX courses on Excel: edX offers a variety of courses in Excel, including a course sequence on Excel and business analytics.
  • R
    • The Comprehensive R Archive Network (CRAN): R is an open-source statistical software that can be downloaded for free here.
    • Analyze Survey Data: A step-by-step guide for analyzing publicly available survey data using R.
    • Coursera courses on R: Coursera offers a variety of courses in R on topics such as data science, statistics, and programming.
    • CRAN Task Views: Guidance on which R packages can be used for tasks related to certain topics, such as clinical trials, econometrics, experimental design, graphics, optimization, and more.
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on R, including A Gentle Intro to R and Intermediate Statistics In R.
    • edX courses on R: edX offers a variety of courses in R, including course sequences on data science and statistics.
    • Quick-R: A website designed to teach experienced users of other statistical programming packages to use R.
    • R-bloggers: A blog aggregator that combines blog feeds from a set of participating R blogs. Users can view blogs on the website, receive daily digests, connect with other R users on their Facebook page, and contribute blogs about R. 
    • R Bootcamp: A set of R tutorials that each come with slides, handouts, and R code.
    • R for Data Science: A free online book that teaches best practices for using R, including data visualization, workflow, data transformation, exploratory data analysis, and more.
    • RStudio Cheatsheets: A set of references for learning and using R packages. Each cheat sheet addresses a specific topic related to R, such as data import, data transformation, and data visualization. Look at the bottom of the page for translations to Chinese, Dutch, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, Uzbek, and Vietnamese.
    • UCLA Institute for Digital Research & Education Statistical Consulting: A website that helps you to learn and use R through tutorials, class notes, data analysis examples, and annotated outputs.
  • MPlus
  • Python
    • Anaconda: The Python Anaconda Distribution, recommended for those using Python for data science, can be downloaded here. Anaconda is a data science toolkit that includes Python, Project Jupyter, a graphical user interface, and a package manager. The Anaconda Individual Edition for solo practitioners, students, and researchers is free. The website includes a user guide that helps users get started and use each part of the toolkit.
    • A Byte of Python: A free online book that provides a tutorial for Python beginners.
    • A Whirlwind Tour of Python: A free online book, written for users with some programming experience, that introduces the Python language.
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on Python, including Programming in Python.
    • edX courses on Python: edX offers a variety of courses in Python, including a course sequence on Python for data science.
    • Google’s Python Class: A set of resources, videos, and coding exercises designed to introduce Python to users with some coding experience.
    • Kaggle Courses: A set of self-guided courses on using Python for data science.
    • Programming with Python: An introduction to Python that focuses on data analysis.
    • The Python Tutorial: An official tutorial from Python that reviews the built-in capabilities of Python. 
  • SAS
  • SPSS
  • Stata

 

Software for spatial data analysis

 

Spatial data can be analysed using programs such as ArcGIS, a proprietary geographic information system for working with maps and other spatial data, and QGIS, a free, open-source platform for analysing geospatial data.

 

  • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on GIS, including Intro to GIS using ArcMap, Intro to GIS using QGIS, and Intro to Web Mapping using ArcGIS Online.
  • QGIS Tutorials and Tips: A set of tutorials for basic, intermediate, and advanced operations in QGIS and integrating Python and QGIS.
  • QGIS User Guide: Comprehensive documentation, user guides, and training manuals for using QGIS, a free, open-source platform for analyzing geospatial data.

 

Software for storing and managing data

 

A database is a dataset or multiple datasets stored on a computer. A relational database is a database that stores data in tables. Relational database management systems can be used to manage large, complex data tables. If you are managing a large amount of data, you may want to use a relational database management system to manage and query your data. Structured query language (SQL) is a programming language used in many relational database management systems.
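The sketch below gives a minimal picture of storing data in related tables and querying them with SQL. It uses Python's built-in sqlite3 module with an in-memory SQLite database. The table names, column names, and values are invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables: members link to households through household_id.
conn.execute(
    "CREATE TABLE households (household_id INTEGER PRIMARY KEY, region TEXT)"
)
conn.execute(
    "CREATE TABLE members ("
    "member_id INTEGER PRIMARY KEY, "
    "household_id INTEGER REFERENCES households(household_id), "
    "age INTEGER)"
)

# Made-up rows for illustration.
conn.executemany("INSERT INTO households VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)", [(1, 1, 34), (2, 1, 8), (3, 2, 51)]
)

# Join the tables and summarise members by region.
query = """
SELECT h.region, COUNT(*) AS n_members, AVG(m.age) AS mean_age
FROM members AS m
JOIN households AS h ON h.household_id = m.household_id
GROUP BY h.region
ORDER BY h.region
"""
results = conn.execute(query).fetchall()
conn.close()
```

Keeping household-level and member-level information in separate tables avoids repeating household details on every member row, and the join recombines them only when a query needs both.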

 

Relational database management systems
  • MySQL: An open-source SQL database owned by Oracle.
  • PostgreSQL: An open-source SQL database originally developed at the University of California at Berkeley.
  • Oracle DB: A proprietary SQL database owned by Oracle.
  • SQL Server: A proprietary SQL database owned by Microsoft. There is a free entry-level version available.
  • SQLite: An open-source SQL database that can store data locally.

 

Resources for using SQL and relational database management systems
  • Intro to Relational Databases: A free online course that provides an introduction to relational databases using Python and SQL.
  • Kaggle Courses: This set of self-guided courses on using Python includes mini-courses on basic and advanced SQL.
  • Learn SQL on Code Academy: A free online course on manipulating, querying, and aggregating data in SQL.
  • SQLBolt: A set of interactive lessons to help users learn to query data using SQL.
  • SQL Zoo: A set of exercises to help users learn to use SQL.

 

Cleaning data

 

Once you have found and imported your dataset, you will need to prepare it for analysis. This page contains resources and tips for using statistical software to clean secondary data.

 

Resources for data cleaning and management

 

  • Excel
  • Python
  • SPSS
    • SPSS Online Training Workshop: This set of tutorials, videos, and datasets includes tutorials on defining and modifying variables, merging data, transforming variables, and restructuring variables in SPSS.
  • Stata
    • Stata YouTube channel: The Stata YouTube channel includes hundreds of videos on data cleaning and management in Stata. Each short video addresses a specific topic, such as merging files, appending files, reshaping data, identifying and removing duplicates, and many more. 
  • R
    • R Bootcamp: This set of R tutorials includes tutorials on sorting, reshaping, and cleaning data.

 

General resources on data cleaning and management:

 

  • Best Practices for Data and Code Management: This guide created by Innovations for Poverty Action provides best practices for organizing folders and files; writing and organizing code; dealing with missing values; documenting data and code; and keeping data secure.
  • J-PAL Research Resources: This collection of resources from the Abdul Latif Jameel Poverty Action Lab (J-PAL) includes guidance on data cleaning and analysis.
 
SCANR tips on data cleaning

 

  • What is metadata?
    • Metadata is information that describes the data and helps future users understand how it was collected, who collected or created it, what it contains, and how it can be referenced. Metadata may include elements such as a dataset persistent ID (DOI), publication date, title, author(s), contact information, description, subject(s), keyword(s), topic classification(s), language, producing and distributing organization(s), production date and place, names of contributors, funding information, time period, data type, software, geographic location, and unit of analysis.
  • What is a codebook?  
    • The codebook tells you what questions were asked/answered, what variables were measured, and/or how the answers/results were recorded. Use the codebook to find your variables of interest and explore other aspects of the dataset. When cleaning data, refer to the codebook for the variable names, variable values, and value labels. For some datasets, the “codebook” might be a survey questionnaire, survey guide or protocol, or other data collection tool.
  • What if some data are missing?
    • First, figure out how missing data is coded in your dataset. Missing values may be represented by a blank, a period (i.e., a dot or full stop), a large number (e.g., 9999 or -9999), or some other symbol.
    • Figure out which values are plausible. Compare the values in the dataset to the values listed in the codebook. Values not listed in the codebook may reflect data entry errors. Then, check for biologically implausible values. These may reflect data entry errors (e.g., age = 200 years). If possible, contact the people or organization who collected the data to see if changes to the data collection tool or data entry protocols could help you to understand implausible values.
    • Check for skip patterns denoted in the questionnaire or other data collection tool or guide. For example, in a survey, some questions may be automatically skipped if a respondent gives a certain answer to an earlier question. These questions are not applicable to the respondent but should not be considered missing.
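The checks described above for missing data can be sketched in Python as follows. The missing-value codes, the plausible age range, and the question about months of breastfeeding are all assumptions made for illustration. In practice, take them from the codebook and questionnaire of your dataset.

```python
# Assumed missing-value codes; confirm these against the codebook.
MISSING_CODES = {"", ".", "9999", "-9999"}

# Assumed plausible range for age in years; adjust to your study population.
MIN_AGE, MAX_AGE = 0, 120

def clean_age(raw):
    """Return (value, status) for a raw age string from the dataset."""
    text = raw.strip()
    if text in MISSING_CODES:
        return None, "missing"
    age = int(text)  # raises ValueError for unexpected text, which is worth seeing
    if not MIN_AGE <= age <= MAX_AGE:
        # Likely a data entry error; flag it instead of silently keeping it.
        return None, "implausible"
    return age, "ok"

def clean_months_breastfed(raw, has_child):
    """Hypothetical skip pattern: the question is asked only if has_child is True."""
    if not has_child:
        # Skipped by design, so not applicable instead of missing.
        return None, "not_applicable"
    text = raw.strip()
    if text in MISSING_CODES:
        return None, "missing"
    return int(text), "ok"
```

For example, `clean_age("200")` is flagged as implausible, while a blank answer to the breastfeeding question is recorded as not applicable for a respondent without a child and as missing for a respondent with one.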

 

Analysing data

 

This page contains resources and tips for using statistical software to analyse secondary data and export results.

 

Resources on using statistical software for analysing data and exporting results

 

  • Excel
    • Descriptive Statistics Using Excel: A learning module that focuses on entering data, descriptive statistics, pivot tables, one-way ANOVA, and graphics in Excel.
    • RegressIt: A guide to an Excel add-in called RegressIt that performs linear regression and multivariate data analysis.
    • Top Excel Tips for Data Analysis: A set of tips for data cleaning, analysis, and visualization in Excel.
  • NVivo
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on NVivo, including Unlocking Qualitative Data with NVivo.
  • Python
  • R
    • Analyze Survey Data: A step-by-step guide for analyzing publicly available survey data using R.
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on R, including A Gentle Intro to R and Intermediate Statistics in R.
    • Microeconomics in R: A set of lecture notes on using Stata and R for microeconomics. Topics include OLS regression, functional form, heteroskedasticity, clustering, instrumental variables, panel data, binary response models, and more.
    • R Bootcamp: This set of R tutorials includes tutorials on regression and basic analytics, including OLS regression and regression diagnostics, as well as exporting your work in the form of image files and HTML notebooks and reports. 
  • SPSS
    • SPSS Online Training Workshop: This set of tutorials, videos, and datasets includes tutorials on descriptive statistics, t-tests, ANOVA, linear models, mixed models, cluster analysis, factor analysis, time series, and more in SPSS.
    • SPSS Tutorials: This set of tutorials includes data analysis, chi-square tests, ANOVA, and t-tests in SPSS.
  • Stata
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on Stata, including Stata Basic.
    • Guide to gologit2: A guide to using Stata’s gologit2 command for generalized ordered logit models for ordinal dependent variables.
    • Guide to oglm: A guide to using Stata’s oglm command for ordinal generalized linear models.
    • Stata YouTube channel: The Stata YouTube channel includes hundreds of videos on data analysis in Stata. Each short video addresses a specific topic, such as t-tests, regression models, and many more.
    • Microeconomics in Stata: A set of lecture notes on using Stata and R for microeconomics. Topics include OLS regression, functional form, heteroskedasticity, clustering, instrumental variables, panel data, binary response models, and more.
    • South Africa Distance Learning Project: A hands-on distance learning tool that teaches students how to analyze South African household survey data in Stata, with a focus on investigating policy issues.
    • Stata Web Books Regression with Stata: A set of materials to help users learn to perform regression analysis in Stata, including chapters on simple and multiple regression, regression diagnostics, regression with categorical predictors, and regression beyond OLS.
    • Survival Analysis with Stata: A set of lessons, materials, do-files, and datasets to help users learn survival analysis in Stata.
    • The 10 Commandments for Regression Tables: A guide to creating high quality regression tables written by Dr. Keith Head. This guide can be used with any statistical software but includes some Stata commands.

 

General resources on data analysis and study design

 

 

Visualising data

 

Data visualisations are an essential way to communicate your results, whether in a journal article, blog, Twitter post, or other outlet. There are many data visualisation tools available online. The resources in this section focus on data visualisation in statistical software and other programs commonly used by researchers.

 

Resources for effective data visualization

 

  • Color Brewer: An online tool for identifying color schemes for maps and visualisations.
  • From data to viz: An online tool to help users choose an appropriate visualisation method for their data, including examples created using R.
  • Introduction to Data Visualization: A comprehensive guide to data visualisation created by Duke University, including definitions, types, tools, and tutorials.
  • The 10 Commandments for Figures: A guide to creating high quality figures written by Dr. Keith Head.
  • Visualising data: A collection of resources for data visualisation, including programming visualisations, web-based visualisation tools, qualitative tools, mapping tools, and colour schemes.

 

Resources for using data visualization programmes

 

  • Excel
  • Python
  • R
    • Beautiful plotting in R: A guide to using the ggplot2 package in R for creating elegant graphics.
    • Comprehensive Guide to Data Visualization in R: A guide to basic and advanced data visualisation techniques in R.
    • RStudio Cheatsheets related to data visualisation: This website contains a set of quick references on different topics in RStudio. The Data Visualization Cheatsheet focuses on how to use the ggplot2 package in R and is translated into Chinese, Dutch, French, German, Japanese, Portuguese, Russian, Spanish, Turkish, and Vietnamese. Also look for the cheat sheet on the vtree package that explains how to visualise hierarchical subsets of data with variable trees.
    • R Bootcamp: This set of R tutorials includes a tutorial on data visualisation.
    • The R Graph Gallery: A collection of charts, maps, and diagrams with reproducible R code, mostly made using the tidyverse and ggplot2. This website also includes tools to help users learn to use base R, ggplot2, interactive charts, and R markdown.
  • Stata
    • Stata YouTube channel: The Stata YouTube channel includes hundreds of videos on data visualisation in Stata. Each short video addresses a specific topic, such as scatterplots, bar graphs, and many more. 
  • Tableau

 

Online data visualization tools

 

Many organizations and research groups have built online tools to create data visualisations from existing datasets. Some online data visualisation tools that may be of interest to ANH researchers include:

 

  • DHS Program STATcompiler: An online tool that allows users to create visualisations using data from the Demographic and Health Surveys (DHS). 
  • Food Systems Dashboard: An online tool that displays over 150 indicators related to food systems from multiple data sources. Researchers and policymakers can identify and visualise relevant indicators and download the related data. 
  • IHME Viz Hub: A set of data visualisations created by the Institute for Health Metrics and Evaluation (IHME) using data from their projects and publications.

 

Examples of effective data visualization

 

Looking at how other people – including researchers, policymakers, journalists, and more – have visualised data to communicate their results and messages can help you to think of how to communicate your own work. Here are some examples of data visualisations from a variety of sources:

 

 

Communicating results

 

When you finish your analysis, the challenge has just begun! This page contains resources for communicating the results of your research through scientific articles, blogs, presentations, and more.

 

Peer-reviewed journals and discussion paper series

 

  • How do I write a scientific paper: A compilation of tips on how to present the results of a study in a scientific paper, prepared by the Applied Ecology Research Group at the University of Canberra.
  • How to Write Applied Papers in Economics: This paper provides detailed guidance about how to prepare the title, abstract, introduction, structure, theoretical framework, data and statistics, empirical framework, results, discussion, and conclusions of an effective paper in applied economics written by Dr. Marc Bellemare.
  • How to Write the Introduction of Your Development Economics Paper:  A detailed guide for writing the introduction of a research paper written by Dr. David Evans.
  • Public Health Writing Guide: This guide from the Boston University School of Public Health provides detailed guidance for effective scientific writing and publishing.
  • The Introduction Formula: A brief guide written by Dr. Keith Head about the key components of the introduction of an academic paper.
  • The Conclusion Formula: A guide to writing the conclusion of a research paper written by Dr. Marc Bellemare.
  • Writing a Better Research Article: This article by Dr. Thomas Lang provides guidance on writing a clear, concise research article; what to include in each section of the article; and how to navigate the publication process.
 
Research blogs

 

  • How to write a blogpost from your journal article: A reflection on how and why to effectively communicate research findings through a blog.
  • Blogs of interest to researchers in agriculture, food systems, nutrition, and health
    • ANH Academy Blog: A blog that highlights research, policy, and events related to agriculture, food systems, nutrition, and health
    • Development Impact: A World Bank blog focused on news, methods, and insights about impact evaluation
    • Economics for Food and Nutrition Policy: A teaching blog written by Dr. William Masters
    • Food Systems Idea Exchange: A blog that showcases food systems research curated by the CGIAR Research Program on Agriculture for Nutrition and Health
    • Gender-Nutrition Idea Exchange: A blog that showcases research on connections between agriculture, nutrition, health, and gender curated by the CGIAR Research Program on Agriculture for Nutrition and Health
    • IFPRI Blogs: A set of blogs from the International Food Policy Research Institute focused on research and issues in international development
    • Food Politics: A blog written by Dr. Marion Nestle
    • Marc F. Bellemare: A blog focused on agricultural and applied economics written by Dr. Marc Bellemare
    • Planetary Health Alliance Blog: A blog curated by the Planetary Health Alliance
    • State of the Planet: A blog curated by the Columbia University Earth Institute
    • The Food Archive: A blog about food and nutrition science, politics, and culture
    • U.S. Food Policy: A public policy focused blog written by Dr. Parke Wilde

 

Presentations

 

 
Communicating to non-researchers