First of all, if you already know what R is, please skip down the page to the Useful Packages section of this post, which lists R packages you can use, with some relevant code to get you started.
What is R?
The R website has an about page, which can be found here https://www.r-project.org/about.html, but the crux of it, as I read it, is that R is a large calculator with an extraordinary number of buttons for as many functions as you can imagine. A function is essentially a button: one that squares a number, for example, or evaluates a trigonometric function.
The beauty of R is that it is open source, which means it is free to use. Some of the functions that have been created would take a huge number of hours to write yourself but, fortunately, someone has already very kindly done this for you. I think the best way to thank these kind people is to use this technology wherever possible and, once you are familiar enough, begin to contribute.
A collection of these functions in R is called a package. CRAN, the main package repository, hosts many thousands of R packages with varying uses.
Some of the packages can be used to speed up processes, but please be aware that you can enter a time black hole through tinkering when there may be a simple solution. Your best friend will become Stack Overflow, where strangely named users answer your queries at a moment's notice.
In summary though, what R can help you do is the following:
- Load in data from multiple sources for inspection.
- Manipulate the data through various operations to understand general trends.
- Visualise data in a variety of graphical outputs.
- Model relationships, detect correlations and generate advanced statistical insights.
- Create prediction methods to assist in decision making.
- Automate data processing, report generation and interactive dashboards.
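As a rough illustration of several of these steps in one place, here is a minimal sketch using only base R and the mtcars data set that ships with it. The choice of data set, columns and model is mine for illustration and is not tied to any particular business problem.

```r
# Load: mtcars is a small example data set that ships with R.
data(mtcars)

# Inspect: first few rows and a summary of one column.
head(mtcars)
summary(mtcars$mpg)

# Manipulate: average fuel economy for each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Visualise: scatter plot of weight against fuel economy.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Model: straight-line fit of fuel economy on weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Predict: expected mpg for a hypothetical car weighing 3,000 lbs.
predicted_mpg <- predict(fit, newdata = data.frame(wt = 3))
print(predicted_mpg)
```

Each step here has a more powerful package-based equivalent, several of which are listed in the Useful Packages section.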
R is a significant upgrade from Excel and can help a business become data-driven. Don’t stop using Excel, however, as it has its own benefits, including quick analysis of datasets with its filtering tools. But if you’re looking to advance some of the computations and models that drive your decision making, learning more about R could take your business to the next level.
How to install R
To begin with, install R from https://www.r-project.org.
There is an introduction to the R environment here https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf that will help a new user get familiar with it. I would also recommend downloading the IDE (integrated development environment) RStudio from https://rstudio.com/products/rstudio/ after you have installed R. RStudio gives you more than the console to work with: it shows a script, the console, a view of the R environment and your file locations in one window. There is a list of tips on using RStudio here https://rstudio.com/resources/cheatsheets/.
Note: on macOS you will also need to install XQuartz (https://www.xquartz.org/), which provides the X11 window system that some R features rely on.
Writing your first pieces of code.
First, open RStudio. You should find the following screen:
Next, select a new script (File > New File > R Script):
Once opened, your screen should look like this:
Now type a simple command, 2 + 2, into the script and press Ctrl+R or Ctrl+Enter on Windows, or Cmd+Enter on macOS. You should see the following:
(You can also copy and paste the line of code into the console in the window below and select enter.)
This is your very first R script. To try out some of the other operations in R, I’ve run the following.
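A few calculator-style operations you might try are sketched below. These particular lines are my own illustrative picks, not necessarily the exact commands from the original session.

```r
2 + 2        # addition, prints [1] 4
7 - 3        # subtraction, prints [1] 4
6 * 4        # multiplication, prints [1] 24
10 / 4       # division, prints [1] 2.5
3^2          # exponentiation, prints [1] 9
sqrt(16)     # square root, prints [1] 4
10 %% 3      # remainder after division, prints [1] 1
sin(pi / 2)  # trigonometry (angles in radians), prints [1] 1
```

The [1] at the start of each result is just R telling you that the output starts at the first element of the answer.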
This is the basic introduction to how to use R as a calculator. The next stage is to look at how we can save variables to be used later.
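As a small taste of that next stage, here is a minimal sketch of saving values to variables with the assignment arrow. The variable names and numbers are made up for illustration.

```r
# Store values under names using the assignment operator <-
price <- 20
quantity <- 3

# Variables can be used anywhere a number could be used.
total <- price * quantity
total                      # prints [1] 60

# c() combines several values into one vector.
daily_sales <- c(2, 4, 6)
mean(daily_sales)          # prints [1] 4
```

Once a variable is created it appears in the Environment pane of RStudio, so you can keep track of what you have saved.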
Useful Packages
For more advanced users of R who are looking for package-specific help or interesting uses, I intend to create examples for the following packages, following the list provided by RStudio here (https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages).
To load data
DBI - The standard for communication between R and relational database management systems. Packages that connect R to databases depend on the DBI package.
odbc - Use any ODBC driver with the odbc package to connect R to your database. Note: RStudio professional products come with professional drivers for some of the most popular databases.
RMySQL, RPostgreSQL, RSQLite - If you’d like to read in data from a database, these packages are a good place to start. Choose the package that fits your type of database.
XLConnect, xlsx - These packages help you read and write Microsoft Excel files from R. You can also just export your spreadsheets from Excel as .csv’s.
foreign - Want to read a SAS data set into R? Or an SPSS data set? Foreign provides functions that help you load data files from other programs into R.
haven - Enables R to read and write data from SAS, SPSS, and Stata.
R can handle plain text files with no package required. Just use the functions read.csv, read.table, and read.fwf. If you have even more exotic data, consult the CRAN guide to data import and export. You can also open the help page for any function by typing a ? in front of its name. For example, ?read.csv opens the manual in the Help pane in the bottom right-hand corner of RStudio. This can help you decide whether you can use your data as it is or whether some manipulation is required.
For more information about using R with databases see db.rstudio.com.
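To show the plain text route in practice, here is a minimal sketch that writes a tiny CSV file to a temporary location and reads it back with read.csv. The file contents and column names are invented for illustration; with your own data you would pass the path to your file instead.

```r
# Create a small CSV file in a temporary location (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Ann,82",
             "2,Bob,75",
             "3,Cat,91"),
           csv_path)

# Read the file into a data frame; the first line is used as column names.
scores <- read.csv(csv_path)

# Inspect the structure and the first rows of what was loaded.
str(scores)
head(scores)
```

Checking the result with str straight after loading is a quick way to confirm that numbers were read as numbers and text as text.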
To manipulate data
tidyverse - An opinionated collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures. This collection includes all the packages in this section, plus many more for data import, tidying, and visualization, listed at tidyverse.org.
dplyr - Essential shortcuts for subsetting, summarizing, rearranging, and joining together data sets. dplyr is our go-to package for fast data manipulation.
tidyr - Tools for changing the layout of your data sets. Use the pivot_longer and pivot_wider functions (which supersede the older gather and spread) to convert your data into the tidy format, the layout R likes best.
stringr - Easy to learn tools for regular expressions and character strings.
lubridate - Tools that make working with dates and times easier.
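Here is a minimal sketch of the kind of subsetting and summarizing dplyr is used for. It assumes the dplyr package is installed (install.packages("dplyr")) and uses the built-in mtcars data; the filter threshold and column names in the summary are my own choices.

```r
library(dplyr)

# Keep the more powerful cars, then summarise fuel economy per cylinder count.
mpg_summary <- mtcars %>%
  filter(hp > 100) %>%                      # subset rows
  group_by(cyl) %>%                         # one group per cylinder count
  summarise(mean_mpg = mean(mpg),           # average mpg within each group
            cars = n()) %>%                 # number of cars in each group
  arrange(desc(mean_mpg))                   # most economical group first

print(mpg_summary)
```

Each step reads as a verb applied to the data, which is a large part of why these pipelines are easy to come back to later.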
To visualize data
ggplot2 - R’s famous package for making beautiful graphics. ggplot2 lets you use the grammar of graphics to build layered, customizable plots.
ggvis - Interactive, web based graphics built with the grammar of graphics.
rgl - Interactive 3D visualizations with R
htmlwidgets - A fast way to build interactive (javascript based) visualizations with R.
Packages that implement htmlwidgets include: - leaflet (maps) - dygraphs (time series) - DT (tables) - DiagrammeR (diagrams) - networkD3 (network graphs) - threejs (3D scatterplots and globes).
googleVis - Lets you use Google Chart tools to visualize data in R. Google Chart tools used to be called Gapminder, the graphing software Hans Rosling made famous in his TED talk.
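To give a feel for the layered approach of ggplot2, here is a minimal sketch. It assumes the ggplot2 package is installed and again uses the built-in mtcars data; the aesthetics and labels are illustrative choices.

```r
library(ggplot2)

# Start with the data and the mapping of columns to visual properties,
# then add layers one at a time with +.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                              # layer 1: the raw points
  geom_smooth(method = "lm", se = FALSE) +    # layer 2: a straight-line trend per group
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

# Printing the object draws the plot in the Plots pane of RStudio.
print(p)
```

Because the plot is an ordinary R object, it can be stored, modified with further layers, or saved to a file with ggsave.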
To model data
tidymodels - A collection of packages for modeling and machine learning using tidyverse principles. This collection includes rsample, parsnip, recipes, broom, and many other general and specialized packages listed at tidymodels.org.
car - car’s Anova function is popular for making type II and type III Anova tables.
mgcv - Generalized Additive Models
lme4/nlme - Linear and Non-linear mixed effects models
randomForest - Random forest methods from machine learning
multcomp - Tools for multiple comparison testing
vcd - Visualization tools and tests for categorical data
glmnet - Lasso and elastic-net regression methods with cross validation
survival - Tools for survival analysis
caret - Tools for training regression and classification models
To report results
shiny - Easily make interactive, web apps with R. A perfect way to explore data and share findings with non-programmers.
R Markdown - The perfect workflow for reproducible reporting. Write R code in your markdown reports. When you run render, R Markdown will replace the code with its results and then export your report as an HTML, PDF, or MS Word document, or an HTML or PDF slideshow. The result? Automated reporting. R Markdown is integrated straight into RStudio.
xtable - The xtable function takes an R object (like a data frame) and returns the LaTeX or HTML code you need to paste a pretty version of the object into your documents. Copy and paste, or pair up with R Markdown.
For Spatial data
sp, maptools - Tools for loading and using spatial data including shapefiles.
maps - Easy to use map polygons for plots.
ggmap - Download street maps straight from Google maps and use them as a background in your ggplots.
For Time Series and Financial data
zoo - Provides the most popular format for saving time series objects in R.
xts - Very flexible tools for manipulating time series data sets.
quantmod - Tools for downloading financial data, plotting common charts, and doing technical analysis.
To write high performance R code
Rcpp - Write R functions that call C++ code for lightning fast speed.
data.table - An alternative way to organize data sets for very, very fast operations. Useful for big data.
parallel - Use parallel processing in R to speed up your code or to crunch large data sets.
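As an example of the last of these, here is a minimal sketch using the parallel package, which is distributed with R itself. The choice of two workers and the toy squaring task are assumptions for illustration; real gains only appear when each task is slow enough to outweigh the cost of starting the workers.

```r
library(parallel)

# Start a small cluster of two worker processes (works on Windows, macOS and Linux).
cl <- makeCluster(2)

# Apply a function to each input, sharing the work across the workers.
squares <- parSapply(cl, 1:10, function(x) x^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(squares)
```

parSapply is the parallel counterpart of sapply, so existing sapply code can often be converted with very few changes.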
To work with the web
XML - Read and create XML documents with R
jsonlite - Read and create JSON data tables with R
httr - A set of useful tools for working with http connections
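For instance, a round trip between an R data frame and JSON text with jsonlite might look like the sketch below. It assumes the jsonlite package is installed, and the data frame is made up for illustration.

```r
library(jsonlite)

# A small data frame to convert (illustrative values only).
readings <- data.frame(id = 1:3,
                       label = c("a", "b", "c"),
                       stringsAsFactors = FALSE)

# Convert the data frame to JSON text: one JSON object per row.
json_text <- toJSON(readings)
print(json_text)

# Convert the JSON text back into a data frame.
round_trip <- fromJSON(json_text)
str(round_trip)
```

The same fromJSON call also accepts a file path or URL, which is how JSON returned by a web service is usually read in.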
To write your own R packages
devtools - An essential suite of tools for turning your code into an R package.
testthat - testthat provides an easy way to write unit tests for your code projects.
roxygen2 - A quick way to document your R packages. roxygen2 turns inline code comments into documentation pages and builds a package namespace.
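To show what those inline comments look like, here is a minimal sketch of a function documented in roxygen2 style. The function mpg_to_l100km is a hypothetical example of my own, not part of any package mentioned here. The #' lines are ordinary comments to R, so the code runs as-is; inside a package, roxygen2 would turn them into a help page.

```r
#' Convert fuel economy to fuel consumption
#'
#' @param mpg Numeric vector of fuel economy in US miles per gallon.
#' @return Numeric vector of fuel consumption in litres per 100 km.
#' @examples
#' mpg_to_l100km(30)
#' @export
mpg_to_l100km <- function(mpg) {
  # 235.215 is the conversion constant between US mpg and L/100 km.
  235.215 / mpg
}

mpg_to_l100km(c(20, 30, 40))
```

In a package project, running devtools::document() is the usual way to regenerate the help pages and namespace from these comments.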
You can also read about the entire package development process online in Hadley Wickham’s R Packages book.
Some of our work for Activinsights using R.
Activinsights manufacture a device called the GENEActiv which is the original wrist-worn, raw data accelerometer for objective behavioural measurement. The accelerometer watches lead the way for the next generation of affordable waveform output accelerometers. The watches are the perfect tool for analysing human behaviour, from studying the impact of physical activity on health and lifestyle to sports science and vehicle safety. The device is an ergonomic body-worn instrument:
waterproof,
robust to moderate impacts,
contains a precision real-time clock,
runs from a long-lasting, rechargeable battery,
has storage for 500 MB of binary data.
The package GENEAread provides data import functionality, giving researchers access to cutting-edge analytical tools from the R environment. Imported data can be summarized by a segmentation process which cuts the dataset into time periods of characteristically similar behaviour. The activities in each segment can be guessed by an rpart GENEA classification tree. A sample rpart GENEA classification tree, trainingFit, is provided with GENEAclassify. GENEAclassify also provides classification tools, allowing researchers to segment training data and create custom classification trees.
Our package GENEAsphere is for visualising the data from GENEAread and GENEAclassify and includes 3D sphere visualisations, posture (positionals) plots and hypnograms.
For more information on the Activinsights device GENEActiv please follow this link:
The package links can be found here:
GENEAclassify - https://cran.r-project.org/web/packages/GENEAclassify/index.html
For a closer, interactive play with some of the reports, please see our Shiny Application here:
I'll be breaking down these packages in the near future and discussing their applications for business use.