1 Introduction

1.1 What is GGIR?

GGIR is an R package to process multi-day raw accelerometer data for physical activity and sleep research. The term raw refers to data being expressed in m/s² or gravitational acceleration, as opposed to the previous generation of accelerometers, which stored data in accelerometer-brand-specific units. The signal processing includes automatic calibration, detection of sustained abnormally high values, detection of non-wear, and calculation of the average magnitude of dynamic acceleration based on a variety of metrics. Next, GGIR uses this information to describe the data per recording, per day of measurement, and (optionally) per segment of a day of measurement, including estimates of physical activity, inactivity and sleep. We published an overview paper of GGIR in 2019 (link).

This vignette provides a general introduction on how to use GGIR and interpret its output; additionally, you can find an introduction video and a mini-tutorial on YouTube. If you want to use your own algorithms for raw data, GGIR facilitates this with its external function embedding feature, documented in a separate vignette: Embedding external functions in GGIR. GGIR is increasingly being used by research groups across the world. A non-exhaustive overview of academic publications related to GGIR can be found here. R package GGIR would not have been possible without the support of the contributors listed in the author list at GGIR; specific code contributions over time since April 2016 (when GGIR development moved to GitHub) are shown here.

Cite GGIR:

When you use GGIR in publications, do not forget to cite it properly, as that makes your research more reproducible and gives credit to its developers. See the paragraph on Citing GGIR for details.

1.2 Contributing, Support, and Keeping up to date

How to contribute to the code?

The development version of GGIR can be found on GitHub, which is also where you will find guidance on how to contribute.

How can I get service and support?

GGIR is open-source software and does not come with service or support guarantees. However, as a user community you can help each other via the GGIR google group or the GitHub issue tracker. Please use these public platforms rather than private e-mails, so that other users can learn from the conversations.

If you need dedicated support with the use of GGIR or need someone to adapt GGIR to your needs then Vincent van Hees is available as independent consultant.

GGIR training We offer frequent online GGIR training courses, both standard training on GGIR (3 x 3 hours) and private training focused on your outstanding issues (3 x 2 hours). Check our dedicated training website for more details and the option to book your training. Do you have questions about the training or the booking process? Do not hesitate to contact us via: .

Change log

Our log of main changes to GGIR over time can be found here.

2 Setting up your work environment

2.1 Install R and RStudio

Download and install R

Download and install RStudio (optional, but recommended)

Install GGIR with its dependencies from CRAN. You can do this with one command from the R console:

install.packages("GGIR", dependencies = TRUE)

Alternatively, to install the latest development version with the latest bug fixes use instead:

install.packages("remotes")
remotes::install_github("wadpac/GGIR")

2.2 Prepare folder structure

  1. GGIR works with the following accelerometer brands and formats:
    • GENEActiv .bin
    • Axivity AX3 and AX6 .wav, .csv and .cwa
    • ActiGraph .csv and .gt3x (.gt3x only in the newer format generated with firmware versions above 2.5.0; serial numbers that start with “NEO” or “MRA” and have firmware version 2.5.0 or earlier use an older format of the .gt3x file). Note for ActiGraph users: if you want to work with .csv exports via the commercial ActiLife software, you have the option to export data with timestamps. Please do not do this, as it causes memory issues for GGIR. To cope with the absence of timestamps, GGIR will calculate them from the sample frequency and the start time and date as presented in the file header.
    • Movisens with data stored in folders.
    • Genea (an accelerometer that is not commercially available anymore, but which was used for some studies between 2007 and 2012) .bin and .csv
    • Any other accelerometer brand that generates csv output, see documentation for functions read.myacc.csv and argument rmc.noise in the GGIR function documentation (pdf).
  2. All accelerometer data that needs to be analysed should be stored in one folder, or subfolders of that folder.
  3. Give the folder an appropriate name, preferably with a reference to the study or project it is related to rather than just ‘data’, because the name of this folder will be used later on as an identifier of the dataset.
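To verify that your files are where GGIR expects them, you can list the contents of the study folder before starting; a minimal sketch, where "C:/mystudy/mydata" is an assumed example path:

```r
# Hypothetical example: list the accelerometer files GGIR will find.
# Replace the assumed path below with your own study folder.
datadir <- "C:/mystudy/mydata"
list.files(datadir, recursive = TRUE)  # also lists files inside subfolders
```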

2.3 GGIR shell function

GGIR comes with a large number of functions and optional settings (arguments) per function.

To ease interacting with GGIR there is one central function, named GGIR, to talk to all the other functions. In the past this function was called g.shell.GGIR, but we decided to shorten it to GGIR for convenience. You can still use g.shell.GGIR, because it has become a wrapper function around GGIR that passes on all arguments, thereby providing identical functionality.

In this paragraph we will guide you through the main arguments to GGIR that are relevant for 99% of research. First of all, it is important to understand that the GGIR package is structured in two ways.

Firstly, it has a computational structure of five parts which are applied sequentially to the data, with GGIR controlling each of these parts:

  • Part 1: Loads the data and stores derived features (aggregations) needed for the other parts. This is the time-consuming part. Once this is done, parts 2-5 can be run (or re-run with different parameters in parts 2-5) relatively quickly.
  • Part 2: Data quality analyses and low-level description of signal features per day and per file. At this point a day is defined from midnight to midnight.
  • Part 3: Estimation of sustained inactivity and sleep periods, needed for input to Part 4 for sleep detection
  • Part 4: Labels the sustained inactive periods detected in Part 3 as sleep, or daytime sustained inactivity, per night and per file
  • Part 5: Derives sleep and physical activity characteristics by re-using information derived in part 2, 3 and 4. Total time in intensity categories, the number of bouts, time spent in bouts and average acceleration (overall activity) is calculated.

The reason it is split up in parts is that this avoids having to re-do all analyses if you only want to make a small change in the more downstream parts. The specific order and content of the parts has grown for historical and computational reasons.
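In practice this means you can run the time-consuming part 1 once and then iterate on the downstream parts. A hedged sketch of that workflow, with the paths as assumed examples:

```r
library(GGIR)
# First run: part 1 only (the time-consuming loading and feature extraction).
GGIR(mode = 1,
     datadir = "C:/mystudy/mydata",   # assumed example path
     outputdir = "D:/myresults")
# Later runs: re-use the part 1 milestone data and only run parts 2-5,
# e.g. after changing a parameter that affects the downstream analyses.
GGIR(mode = 2:5,
     datadir = "C:/mystudy/mydata",
     outputdir = "D:/myresults",
     overwrite = FALSE)               # keep existing milestone output
```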

Secondly, the function arguments, which we will refer to as input parameters, are structured thematically, independently of the five parts they are used in:

  • params_rawdata: parameters related to handling the raw data such as resampling or calibrating
  • params_metrics: parameters related to aggregating the raw data to epoch level summary metrics
  • params_sleep: parameters related to sleep detection
  • params_physact: parameters related to physical activity
  • params_247: parameters related to 24/7 behaviours that do not fall into the typical sleep or physical activity research category.
  • params_output: parameters relating to how and whether output is stored.
  • params_general: general parameters not covered by any of the above categories

This structure was introduced in GGIR version 2.5-6 to make the GGIR code and documentation easier to navigate.

To see the parameters in each parameter category and their default values do:

library(GGIR)
print(load_params())

If you are only interested in one specific category like sleep:

library(GGIR)
print(load_params()$params_sleep)

If you are only interested in parameter “HASPT.algo” from the params_sleep object:

library(GGIR)
print(load_params()$params_sleep[["HASPT.algo"]])

Documentation for all arguments in the parameter objects can be found in the vignette: GGIR configuration parameters.

All of these arguments are accepted as arguments to function GGIR, because GGIR is a shell around all GGIR functionality. However, the params_ objects themselves cannot be provided as input to GGIR.
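Putting this together: you can look up a default with load_params() and then override it by passing the parameter as a regular argument to GGIR. A sketch, with the paths and the chosen algorithm as assumed examples:

```r
library(GGIR)
# Inspect the current default for the sleep period time detection algorithm.
print(load_params()$params_sleep[["HASPT.algo"]])
# Override it (like any other parameter) as a plain argument in the GGIR call;
# the params_ objects themselves are never passed to GGIR.
GGIR(datadir = "C:/mystudy/mydata",   # assumed example path
     outputdir = "D:/myresults",
     HASPT.algo = "HDCZA")            # assumed example value
```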

2.3.1 Key general arguments

You will probably never need to think about most of the arguments listed above, because many arguments are only included to facilitate methodological studies where researchers want to have control over every little detail. See the previous paragraph for links to the documentation and how to find the default value of each parameter.

The bare minimum input needed for GGIR is:

library(GGIR)
GGIR(datadir="C:/mystudy/mydata",
             outputdir="D:/myresults")

Argument datadir allows you to specify where you have stored your accelerometer data, and outputdir allows you to specify where you would like the output of the analyses to be stored. The two cannot be equal. If you copy-paste the above code to a new R script (file ending in .R) and Source it in R(Studio), then the dataset will be processed and the output will be stored in the specified output directory.

Below we have highlighted the key arguments you may want to be aware of. We are not giving a detailed explanation, please see the package manual for that.

  • mode - which parts of GGIR to run; GGIR is constructed in five parts.
  • overwrite - whether to overwrite previously produced milestone output. Between each GGIR part, GGIR stores milestone output to ease re-running parts of the pipeline.
  • idloc - tells GGIR where to find the participant ID (default: inside file header)
  • strategy - informs GGIR how to consider the design of the experiment.
    • If strategy is set to value 1, then check out arguments hrs.del.start and hrs.del.end.
    • If strategy is set to value 3, then check out arguments ndayswindow.
  • maxdur - maximum number of days you expect in a data file based on the study protocol.
  • desiredtz - time zone of the experiment.
  • chunksize - a way to tell GGIR to use less memory, which can be useful on machines with limited memory.
  • includedaycrit - tell GGIR how many hours of valid data per day (midnight-midnight) is acceptable.
  • includenightcrit - tell GGIR how many hours of valid data per night (noon-noon) is acceptable.
  • qwindow - argument to tell GGIR whether and how to segment the day for day-segment specific analysis.
  • mvpathreshold and boutcriter - acceleration threshold and bout criteria used for calculating time spent in MVPA (only used in GGIR part2).
  • epochvalues2csv - to export epoch-level magnitude of acceleration to a csv file (in addition to it already being stored as an RData file)
  • dayborder - to decide whether the edge of a day should be other than midnight.
  • iglevels - argument related to intensity gradient method proposed by A. Rowlands.
  • do.report - specify reports that need to be generated.
  • viewingwindow and visualreport - to create a visual report, this only works when all five parts of GGIR have successfully run.

2.3.4 Published cut-points and how to use them

This section has been rewritten and moved. Please, visit the vignette Published cut-points and how to use them in GGIR for more details on the cut-points available, how to use them, and some additional reflections on the use of cut-points in GGIR.

2.3.5 Example call

If you consider all the arguments above you may end up with a call to GGIR that looks as follows.

library(GGIR)
GGIR(
             mode=c(1,2,3,4,5),
             datadir="C:/mystudy/mydata",
             outputdir="D:/myresults",
             do.report=c(2,4,5),
             #=====================
             # Part 2
             #=====================
             strategy = 1,
             hrs.del.start = 0,          hrs.del.end = 0,
             maxdur = 9,                 includedaycrit = 16,
             qwindow=c(0,24),
             mvpathreshold =c(100),
             bout.metric = 6,
             excludefirstlast = FALSE,
             includenightcrit = 16,
             #=====================
             # Part 3 + 4
             #=====================
             def.noc.sleep = 1,
             outliers.only = TRUE,
             criterror = 4,
             do.visual = TRUE,
             #=====================
             # Part 5
             #=====================
             threshold.lig = c(30), threshold.mod = c(100),  threshold.vig = c(400),
             boutcriter = 0.8,      boutcriter.in = 0.9,     boutcriter.lig = 0.8,
             boutcriter.mvpa = 0.8, boutdur.in = c(1,10,30), boutdur.lig = c(1,10),
             boutdur.mvpa = c(1),
             includedaycrit.part5 = 2/3,
             #=====================
             # Visual report
             #=====================
             timewindow = c("WW"),
             visualreport=TRUE)

Once you have run GGIR, the output directory (outputdir) will be filled with milestone data and results.

2.3.6 Configuration file

Function GGIR stores all explicitly entered argument values, plus the default values for the arguments that are not explicitly provided, in a csv file named config.csv in the root of the output folder. The config.csv file is accepted as input to GGIR with argument configfile to replace the specification of all the arguments, except datadir and outputdir, see example below.

library(GGIR)
GGIR(datadir="C:/mystudy/mydata",
             outputdir="D:/myresults", configfile = "D:/myconfigfiles/config.csv")

The practical value of this is that it eases the replication of an analysis: instead of having to share your R script, sharing your config.csv file will be sufficient. Further, the config.csv file contributes to the reproducibility of your data analysis.

Note 1: When combining a configuration file with explicitly provided argument values, the explicitly provided argument values will overrule the argument values in the configuration file. Note 2: The config.csv file in the root of the output folder is overwritten every time you use GGIR. So, if you would like to add annotations to the file, e.g. in the fourth column, you will need to store it somewhere outside the output folder and explicitly point to it with the configfile argument.
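Since config.csv is a plain csv file, you can inspect or archive it with standard R tools; a sketch, where the path is an assumed example (GGIR names the results folder after your data folder):

```r
# Hypothetical example: read the configuration GGIR stored for a previous run.
# "D:/myresults/output_mydata/config.csv" is an assumed path.
cfg <- read.csv("D:/myresults/output_mydata/config.csv")
head(cfg)  # shows the stored argument names and their values
```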

3 Time for action: How to run your analysis?

3.1 From the R console on your own desktop/laptop

Create an R-script and put the GGIR call in it. Next, you can source the R-script with the source function in R:

source("pathtoscript/myshellscript.R")

or use the Source button in RStudio if you use RStudio.

3.2 In a cluster

GGIR by default supports multi-thread processing, which can be turned off by setting argument do.parallel = FALSE. If this is still not fast enough then we advise using GGIR on a computing cluster. The way we did it on a Sun Grid Engine cluster is shown below; please note that some of these commands are specific to the computing cluster you are working on. Also, you may actually want to use an R package like clustermq or snowfall, which avoids having to write bash scripts. Please consult your local cluster specialist to tailor this to your situation. In our case, we had three files for the SGE setting:

submit.sh

# Submit 707 jobs, each processing n files from the dataset.
for i in {1..707}; do
    n=1                          # number of files per job
    s=$(( (i - 1) * n + 1 ))     # index of the first file for this job
    e=$(( i * n ))               # index of the last file for this job
    qsub /home/nvhv/WORKING_DATA/bashscripts/run-mainscript.sh $s $e
done

run-mainscript.sh

#! /bin/bash
#$ -cwd -V
#$ -l h_vmem=12G
/usr/bin/R --vanilla --args f0=$1 f1=$2 < /home/nvhv/WORKING_DATA/test/myshellscript.R

myshellscript.R

options(echo = TRUE)
library(GGIR)  # load GGIR so the call below works when run non-interactively
args = commandArgs(TRUE)
if (length(args) > 0) {
  # each argument has the form "f0=1"; evaluating it creates the variable
  for (i in 1:length(args)) {
    eval(parse(text = args[[i]]))
  }
}
GGIR(f0 = f0, f1 = f1, ...)

You will need to replace the ... in the last line with the arguments you used for GGIR. Note that f0=f0, f1=f1 is essential for this to work: the values of f0 and f1 are passed on from the bash script and tell GGIR which range of files to process in each job.

Once this is all set up, you will need to call bash submit.sh from the command line.

With the help of computing clusters, GGIR has successfully been run on some of the world's largest accelerometer data sets, such as UK Biobank and the German NAKO study.

3.3 Processing time

The time to process a typical seven-day recording should be anywhere between 3 and 10 minutes, depending on the sample frequency of the recording, the sensor brand, the data format, the exact configuration of GGIR, and the specifications of your computer. If you observe processing times of 20 minutes or longer for a 7-day recording, you are probably being slowed down by other factors.

Some tips on how you may be able to address this:

  • Make sure the data you process is on the same machine as where GGIR is run. Processing data located somewhere else on a computer network can substantially slow software down.
  • Make sure your machine has 8GB or more RAM; using GGIR on old machines with only 4GB is known to be slow. However, total memory is not the only bottleneck: also consider the number of processes (threads) your CPU can run relative to the amount of memory. Ending up with 2GB per process seems a good target.
  • Avoid doing other computational activities with your machine while running GGIR. For example, if you use DropBox or OneDrive make sure they do not sync while you are running GGIR. When using GGIR to process large datasets it is probably best to not use the machine, but make sure the machine is configured not to fall asleep as that would terminate the analyses.
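The memory-related tips above map onto GGIR arguments mentioned earlier; a hedged sketch for a modest machine, with the paths as assumed examples:

```r
library(GGIR)
# Reduce memory pressure: load smaller data chunks and, if needed, disable
# parallel processing so only one process uses memory at a time.
GGIR(datadir = "C:/mystudy/mydata",   # assumed example path
     outputdir = "D:/myresults",
     chunksize = 0.5,                 # load data in half-sized chunks
     do.parallel = FALSE)             # single-threaded processing
```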

4 Inspecting the results

GGIR generates the following types of output:

  • csv-spreadsheets with all the variables you need for physical activity, sleep and circadian rhythm research
  • Pdfs with on each page a low-resolution plot of the data per file and quality indicators
  • R objects with milestone data
  • Pdfs with a visual summary of the physical activity and sleep patterns as identified (see example below)