Using data software

There are many programs that can be used to analyse secondary data. These programs allow users to view and prepare data, perform analyses, create visualisations, and more. This page provides resources for learning to use several commonly used statistical software programs, as well as resources for spatial data analysis and database management.

 

Most researchers learn to use one or more data software programs, and mastering any one of them requires considerable time and effort. When choosing which program to learn, consider your financial resources and institutional affiliations; the programs commonly used in your field; the programs used by your research collaborators; and your own research goals. Some of these programs are proprietary, so you will need to pay for access or gain access through your institution. Excel, SPSS, SAS, Stata, and MPlus are all proprietary programs (though Google Sheets is a free program that has some of the functionality of Excel). R and Python are free and open source. If you are interested in analysing geospatial data, you may want to learn to use ArcGIS or an open-source counterpart like QGIS. If you are managing a large amount of data, you may want to learn to use SQL and a relational database management system.

 

Statistical software

 

  • Excel
    • Coursera courses on Excel: Coursera offers a variety of courses in Excel on topics such as basics, data visualization, and data analysis.
    • edX courses on Excel: edX offers a variety of courses in Excel, including a course sequence on Excel and business analytics.
  • R
    • The Comprehensive R Archive Network (CRAN): R is open-source statistical software that can be downloaded for free here.
    • Analyze Survey Data: A step-by-step guide for analyzing publicly available survey data using R.
    • Coursera courses on R: Coursera offers a variety of courses in R on topics such as data science, statistics, and programming.
    • CRAN Task Views: Guidance on which R packages can be used for tasks related to certain topics, such as clinical trials, econometrics, experimental design, graphics, optimization, and more.
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on R, including A Gentle Intro to R and Intermediate Statistics In R.
    • edX courses on R: edX offers a variety of courses in R, including course sequences on data science and statistics.
    • Quick-R: A website designed to teach experienced users of other statistical programming packages to use R.
    • R-bloggers: A blog aggregator that combines blog feeds from a set of participating R blogs. Users can view blog posts on the website, receive daily digests, connect with other R users on the R-bloggers Facebook page, and contribute blog posts about R.
    • R Bootcamp: A set of R tutorials that each come with slides, handouts, and R code.
    • R for Data Science: A free online book that teaches best practices for using R, including data visualization, workflow, data transformation, exploratory data analysis, and more.
    • RStudio Cheatsheets: A set of references for learning and using R packages. Each cheat sheet addresses a specific topic related to R, such as data import, data transformation, and data visualization. Look at the bottom of the page for translations to Chinese, Dutch, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, Uzbek, and Vietnamese.
    • UCLA Institute for Digital Research & Education Statistical Consulting: A website that helps you to learn and use R through tutorials, class notes, data analysis examples, and annotated outputs.
  • MPlus
  • Python
    • Anaconda: The Python Anaconda Distribution, recommended for those using Python for data science, can be downloaded here. Anaconda is a data science toolkit that includes Python, Project Jupyter, a graphical user interface, and a package manager. The Anaconda Individual Edition for solo practitioners, students, and researchers is free. The website includes a user guide that helps users get started and use each part of the toolkit.
    • A Byte of Python: A free online book that provides a tutorial for Python beginners.
    • A Whirlwind Tour of Python: A free online book that introduces the Python language to users with some programming experience.
    • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on Python, including Programming in Python.
    • edX courses on Python: edX offers a variety of courses in Python, including a course sequence on Python for data science.
    • Google’s Python Class: A set of resources, videos, and coding exercises designed to introduce Python to users with some coding experience.
    • Kaggle Courses: A set of self-guided courses on using Python for data science.
    • Programming with Python: An introduction to Python that focuses on data analysis.
    • The Python Tutorial: An official tutorial from Python that reviews the built-in capabilities of Python.
  • SAS
  • SPSS
  • Stata

 

Software for spatial data analysis

 

Spatial data can be analysed using programs such as ArcGIS, a proprietary geographic information system for working with maps and other spatial data, and QGIS, a free, open-source platform for analysing geospatial data.

 

  • DataLab at Tufts Workshops: The Tufts DataLab website features recordings of workshops on GIS, including Intro to GIS using ArcMap, Intro to GIS using QGIS, and Intro to Web Mapping using ArcGIS Online.
  • QGIS Tutorials and Tips: A set of tutorials for basic, intermediate, and advanced operations in QGIS, and for integrating Python with QGIS.
  • QGIS User Guide: Comprehensive documentation, user guides, and training manuals for using QGIS, a free, open-source platform for analyzing geospatial data.

 

Software for storing and managing data

 

A database is one or more datasets stored on a computer. A relational database is a database that stores data in tables, which can be linked to one another through shared key columns. Relational database management systems can be used to manage large, complex data tables. If you are managing a large amount of data, you may want to use a relational database management system to manage and query your data. Structured Query Language (SQL) is a programming language used in many relational database management systems.
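
As a minimal sketch of what storing data in a table and querying it with SQL looks like, the example below uses Python's built-in sqlite3 module with a temporary in-memory database. The table name, column names, and values are hypothetical and chosen only for illustration; other database systems follow the same pattern with some differences in syntax.

```python
import sqlite3

# Open a temporary in-memory SQLite database (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# Define a table. The table and column names are hypothetical.
conn.execute(
    "CREATE TABLE respondents (id INTEGER PRIMARY KEY, country TEXT, age INTEGER)"
)

# Insert a few illustrative rows using parameter placeholders.
conn.executemany(
    "INSERT INTO respondents (id, country, age) VALUES (?, ?, ?)",
    [(1, "Kenya", 34), (2, "India", 27), (3, "Kenya", 45)],
)

# Query the table: keep only the rows that match a condition.
rows = conn.execute(
    "SELECT id, age FROM respondents WHERE country = ? ORDER BY id",
    ("Kenya",),
).fetchall()

print(rows)
conn.close()
```

The same SELECT statement could be run from a database client or from R; Python is used here only as a convenient way to send SQL to the database.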

 

Relational database management systems
  • MySQL: An open-source SQL database owned by Oracle.
  • PostgreSQL: An open-source SQL database that originated at the University of California at Berkeley and is now maintained by a global community of developers.
  • Oracle DB: A proprietary SQL database owned by Oracle.
  • SQL Server: A proprietary SQL database owned by Microsoft. There is a free entry-level version available.
  • SQLite: An open-source SQL database that can store data locally.
  • CKAN: An open-source data management system designed for use by governments and businesses.

 

Resources for using SQL and relational database management systems
  • Intro to Relational Databases: A free online course that provides an introduction to relational databases using Python and SQL.
  • Kaggle Courses: This set of self-guided courses on using Python includes mini-courses on basic and advanced SQL.
  • Learn SQL on Codecademy: A free online course on manipulating, querying, and aggregating data in SQL.
  • SQLBolt: A set of interactive lessons to help users learn to query data using SQL.
  • SQL Zoo: A set of exercises to help users learn to use SQL.
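
The resources above cover querying and aggregating data across tables. As a rough illustration of that kind of task, the sketch below links two tables through a shared key and summarises one of them by group, again using Python's built-in sqlite3 module. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two related tables: each member row points to a household row.
# All names and values are hypothetical.
conn.executescript(
    """
    CREATE TABLE households (
        household_id INTEGER PRIMARY KEY,
        region TEXT
    );
    CREATE TABLE members (
        member_id INTEGER PRIMARY KEY,
        household_id INTEGER REFERENCES households(household_id),
        income REAL
    );
    INSERT INTO households VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO members VALUES
        (1, 1, 100.0), (2, 1, 200.0), (3, 2, 50.0), (4, 3, 300.0);
    """
)

# Join members to their households, then count members and
# average their income within each region.
summary = conn.execute(
    """
    SELECT h.region, COUNT(*) AS n_members, AVG(m.income) AS mean_income
    FROM members AS m
    JOIN households AS h ON h.household_id = m.household_id
    GROUP BY h.region
    ORDER BY h.region
    """
).fetchall()

for region, n_members, mean_income in summary:
    print(region, n_members, mean_income)
conn.close()
```

Keeping households and members in separate tables avoids repeating the region on every member row; the JOIN recombines them only when a query needs both.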

 

Last update: 19 August 2022