
Data Learning Center Statistical Guides


Common R Packages

Data Wrangling and Manipulation

| Package | Main use | Sample functions |
| --- | --- | --- |
| data.table | high-speed data wrangling | fread() – reads data from a flat file such as .csv or .tsv. |
| | | dcast() and melt() – reshape between long and wide formats. |
| | | merge() – combines data tables. |
| dplyr | data manipulation | mutate() – adds new variables using functions on existing variables. |
| | | select() – picks variables (columns) based on their names. |
| | | filter() – picks rows based on their values. |
| | | summarize() – reduces multiple values down to a single summary. |
| | | arrange() – changes the ordering of rows. |
| forcats | handling categorical variables | fct_reorder() – reorders the levels of a factor by another variable. |
| | | fct_relevel() – changes the order of factor levels manually. |
| lubridate | tools to work with date-time data | now() – returns the current time in the specified time zone. |
| magrittr | operators that make code easier to read | %>% – pipes the left-hand side value forward into the expression on the right-hand side of the operator. |
| | | %<>% – pipes the left-hand side through the expression and assigns the result back to it. |
| purrr | tools for working with vectors and functions | map() – applies a function to each element of a list or vector, replacing many ‘for’ loops with code that is both more succinct and easier to read. |
| readr | read data files (csv, tsv, etc.) in tidy format | read_csv() – reads .csv files and loads them as tibbles. |
| readxl | read Excel files in tidy format | read_excel() – reads a .xls or .xlsx file. |
| | | excel_sheets() – returns a vector of sheet names. |
| stringr | working with strings | str_replace() – replaces matching text in a string with new text. |
| | | str_extract() – extracts matching text from a string. |
| | | str_split() – splits strings into multiple strings. |
| tibble | enhanced data frames (tibbles) | tibble() – constructs a data frame with special behaviors, such as enhanced printing. |
| tidyr | data cleaning (creating tidy data) | pivot_longer() and pivot_wider() – convert between long and wide formats. |
| | | drop_na() – removes rows with missing values. |
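
As a rough illustration of how several of these functions chain together, the sketch below cleans, reshapes, and summarizes a small data set. The `plants` data and its column names are hypothetical and made up for this example. `group_by()` is a dplyr function that is not listed in the table.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical example data: plant heights (cm) measured in two weeks
plants <- tibble(
  plot      = c("A", "A", "B", "B"),
  treatment = c("control", "fertilizer", "control", "fertilizer"),
  week1     = c(10.2, 11.5, 9.8, NA),
  week2     = c(12.1, 14.0, 11.3, 13.7)
)

growth <- plants %>%
  drop_na() %>%                                  # drop the row with a missing week1 value
  pivot_longer(c(week1, week2),                  # wide to long: one row per plot, treatment, week
               names_to = "week",
               values_to = "height") %>%
  mutate(height_mm = height * 10) %>%            # new variable computed from an existing one
  filter(treatment == "fertilizer") %>%          # keep only the fertilizer rows
  group_by(week) %>%
  summarize(mean_height_mm = mean(height_mm)) %>%  # one summary value per week
  arrange(desc(mean_height_mm))                  # largest mean first

print(growth)
```

Each step takes a data frame and returns a data frame, which is why the pipe operator `%>%` can pass the result of one step directly into the next.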



Data Visualization

| Package | Main use | Sample functions |
| --- | --- | --- |
| ggplot2 | drawing figures | ggplot() – initializes a plot in a system for declaratively creating graphics, based on “The Grammar of Graphics.” |
| gridExtra | working with graphical objects (grobs) on a grid | arrangeGrob() – arranges multiple grobs on a page. |
| kableExtra | builds on the knitr package to construct complex and customizable tables | kable() (from knitr) – creates tables in LaTeX, HTML, Markdown, and reStructuredText. |
| xtable | formatting tables into LaTeX and HTML | xtable() – converts an R object into an xtable object that can be printed as LaTeX or HTML. |
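
A minimal sketch of the declarative style that ggplot2 uses, assuming the built-in `mtcars` data set that ships with R. The choice of variables and axis labels is only for illustration.

```r
library(ggplot2)

# Map data columns to aesthetics, then add layers with `+`
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                                  # one point per car
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

print(p)  # draws the figure on the active graphics device
```

The plot is built up as an object first and only drawn when it is printed, so layers can be added or changed before rendering.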



Statistical Analysis

| Package | Main use | Sample functions |
| --- | --- | --- |
| car | expands statistical toolset for regression and analysis of variance models | Anova() – calculates Type II or Type III sum-of-squares tables. |
| | | vif() – calculates variance inflation factors to assess multicollinearity. |
| caret | tools for predictive modeling | train() – fits predictive models across a set of tuning parameters, using resampling to evaluate each. |
| emmeans | tools for multiple comparison testing | emmeans() – calculates estimated marginal means for specified factors or factor combinations in a linear model and, optionally, comparisons or contrasts among them. |
| multcomp | tools for multiple comparison testing | glht() – general linear hypotheses and multiple comparisons for parametric models, including generalized linear models, linear mixed-effects models, and survival models. |
| Hmisc | useful functions for data analysis and statistics | Cs() – creates a character vector from unquoted names. |
| | | describe() – concise statistical description of a vector, matrix, data frame, or formula. |
| lme4 | linear and non-linear mixed-effects modeling | lmer() – fits linear mixed models. |
| | | glmer() – fits generalized linear mixed models. |
| nlme | linear and non-linear mixed-effects modeling | nlme() – fits non-linear mixed models. |
| rstatix | pipe-friendly framework for performing basic statistical tests | adjust_pvalue() – adds an adjusted p-value column to a data frame. |
| | | add_significance() – adds p-value significance symbols to a data frame. |
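
As one possible illustration of the car functions listed above, the sketch below fits an ordinary linear model to the built-in `mtcars` data and then requests a Type II table and variance inflation factors. The model formula is an arbitrary choice for this example, not a recommended analysis.

```r
library(car)

# Simple linear regression with three numeric predictors (illustrative only)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Type II sum-of-squares table for each term in the model
anova_tab <- Anova(fit, type = 2)
print(anova_tab)

# Variance inflation factors: one value per predictor;
# larger values suggest stronger multicollinearity
vif_vals <- vif(fit)
print(vif_vals)
```

With only numeric predictors, `vif()` returns a named numeric vector. When the model contains factors with more than two levels, it returns generalized variance inflation factors instead.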