Title: | Variable Table for Variable Documentation |
---|---|
Description: | Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette "vtable" for a package overview. |
Authors: | Nick Huntington-Klein [aut, cre] |
Maintainer: | Nick Huntington-Klein |
License: | MIT + file LICENSE |
Version: | 1.4.7 |
Built: | 2024-11-16 06:21:57 UTC |
Source: | https://github.com/nickch-k/vtable |
This function calculates the number of values in a vector that are NA.
countNA(x)
Argument | Description |
---|---|
x | A vector. |
This function is just shorthand for sum(is.na(x)), with a shorter name for reference in the summ option of vtable or sumtable.
x <- c(1, 1, NA, 2, 3, NA)
countNA(x)
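As a base-R sketch of the shorthand described above (count_na_sketch is a hypothetical name, not part of vtable), the count is simply the number of TRUE values returned by is.na(). Because is.na() is also TRUE for NaN, NaN values are counted as well.

```r
# Hypothetical stand-in for countNA(): counts the elements of x for which is.na() is TRUE
count_na_sketch <- function(x) sum(is.na(x))

x <- c(1, 1, NA, 2, 3, NA)
count_na_sketch(x)

# is.na() is TRUE for NaN as well as NA, so NaN values are included in the count
count_na_sketch(c(1, NaN, NA))
```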
This function takes a data frame or matrix with column names and outputs an HTML table version of that data frame.
dftoHTML( data, out = NA, file = NA, note = NA, note.align = "l", anchor = NA, col.width = NA, col.align = NA, row.names = FALSE, no.escape = NA )
Argument | Description |
---|---|
data | Data set; accepts any format with column names. |
out | Determines where the completed table is sent. Set to |
file | Saves the completed variable table file to HTML with this filepath. May be combined with any value of |
note | Table note to go after the last row of the table. |
note.align | Alignment of table note: l, r, or c. |
anchor | Character variable to be used to set an |
col.width | Vector of page-width percentages, on 0-100 scale, overriding default column widths in HTML table. Must have a number of elements equal to the number of columns in the resulting table. |
col.align | Vector of 'left', 'right', 'center', etc. to be used with the HTML table text-align attribute in each column. If you want to get tricky, you can add a |
row.names | Flag determining whether or not the row names should be included in the table. Defaults to FALSE. |
no.escape | Vector of column indices for which special characters should not be escaped (perhaps they include markup text of their own). |
This function is designed to feed HTML versions of variable tables to vtable(), sumtable(), and labeltable().
Multi-column cells are supported. Set the cell's contents to "content_MULTICOL_c_5", where "content" is the content of the cell, "c" is the cell's alignment (l, c, r), and 5 is the number of columns to span. Then fill in the cells that need to be deleted to make room with "DELETECELL".
If the first column of the first row begins with the text "HEADERROW", then that row will be placed above the column names.
if (interactive()) {
  df <- data.frame(var1 = 1:4, var2 = 5:8, var3 = c('A','B','C','D'),
                   var4 = as.factor(c('A','B','C','C')), var5 = c(TRUE,TRUE,FALSE,FALSE))
  dftoHTML(df, out = "browser")
}
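The MULTICOL and HEADERROW conventions above can be sketched as a plain data frame before it is handed to dftoHTML(). The helper multicol_cell() is hypothetical (not part of vtable) and only pastes together the documented "content_MULTICOL_align_span" string. The numbers are the means and standard deviations of var1 and var2 from the example above, and the dftoHTML() call runs only in an interactive session with vtable installed.

```r
# Hypothetical helper that builds a cell string in the documented MULTICOL syntax:
# "<content>_MULTICOL_<alignment>_<number of columns to span>"
multicol_cell <- function(content, align = c("c", "l", "r"), span) {
  align <- match.arg(align)
  paste0(content, "_MULTICOL_", align, "_", span)
}

# The first row starts with "HEADERROW", so it should be placed above the column names.
# Its second cell spans two columns, so the cell it covers is filled with "DELETECELL".
tab <- data.frame(
  Variable = c("HEADERROW", "var1", "var2"),
  Mean     = c(multicol_cell("Summary", "c", 2), "2.5", "6.5"),
  SD       = c("DELETECELL", "1.29", "1.29"),
  stringsAsFactors = FALSE
)

# Only render when vtable is installed and the session is interactive
if (interactive() && requireNamespace("vtable", quietly = TRUE)) {
  vtable::dftoHTML(tab, out = "browser")
}
```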
This function takes a data frame or matrix with column names and outputs a lightly-formatted LaTeX table version of that data frame.
dftoLaTeX( data, file = NA, fit.page = NA, frag = TRUE, title = NA, note = NA, note.align = "l", anchor = NA, align = NA, row.names = FALSE, no.escape = NA )
Argument | Description |
---|---|
data | Data set; accepts any format with column names. |
file | Saves the completed table to LaTeX with this filepath. |
fit.page | Uses a LaTeX resizebox to force the table to a certain width. Often |
frag | Set to TRUE to produce only the LaTeX table itself, or FALSE to produce a fully buildable LaTeX document. Defaults to TRUE. |
title | Character variable with the title of the table. |
note | Table note to go after the last row of the table. |
note.align | Set the alignment for the multi-column table note. Usually "l", but if you have a long note you might want to set it with "p". |
anchor | Character variable to be used to set a label tag for the table. |
align | Character variable with standard LaTeX formatting for alignment, for example |
row.names | Flag determining whether or not the row names should be included in the table. Defaults to FALSE. |
no.escape | Vector of column indices for which special characters should not be escaped (perhaps they include markup text of their own). |
This function is designed to feed LaTeX versions of variable tables to vtable(), sumtable(), and labeltable().
Multi-column cells are supported. Wrap the cell's contents in a multicolumn tag as normal, and then fill in any cells that need to be deleted to make room for the multi-column cell with "DELETECELL". The MULTICOL syntax of dftoHTML works too.
If the first column of the first row begins with the text "HEADERROW", then that row will be placed above the column names.
df <- data.frame(var1 = 1:4, var2 = 5:8, var3 = c('A','B','C','D'),
                 var4 = as.factor(c('A','B','C','C')), var5 = c(TRUE,TRUE,FALSE,FALSE))
dftoLaTeX(df, align = 'ccccc')
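For the LaTeX-native route described above, the spanning cell holds an ordinary \multicolumn command and the covered cell holds "DELETECELL". This is a sketch: the layout and numbers mirror var1 and var2 from the example above. Passing no.escape = 2 is an assumption on my part, based on that argument's description (columns whose contents carry their own markup); drop it if the backslashes come through correctly without it.

```r
# LaTeX-native spanning cell. In an R string literal each LaTeX backslash is doubled,
# so the stored string contains a single backslash.
header_cell <- "\\multicolumn{2}{c}{Summary}"

tab <- data.frame(
  Variable = c("HEADERROW", "var1", "var2"),
  Mean     = c(header_cell, "2.5", "6.5"),
  SD       = c("DELETECELL", "1.29", "1.29"),
  stringsAsFactors = FALSE
)

# Assumption: column 2 is exempted from escaping because it contains LaTeX markup
if (interactive() && requireNamespace("vtable", quietly = TRUE)) {
  print(vtable::dftoLaTeX(tab, align = "lcc", no.escape = 2))
}
```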
This function takes a set of options for the format() function and returns a function that itself calls format() with those settings.
formatfunc( percent = FALSE, prefix = "", suffix = "", scale = 1, digits = NULL, nsmall = 0L, big.mark = "", trim = TRUE, scientific = FALSE, ... )
Argument | Description |
---|---|
percent | Whether to apply percentage formatting. Set to |
prefix | A prefix to apply to the formatted number. For example, |
suffix | A suffix to apply to the formatted number. If specified alongside |
scale | A scalar value to be multiplied by all numbers prior to formatting. |
digits | Number of significant digits. |
nsmall | The minimum number of digits to the right of the decimal point. |
big.mark | A character to mark thousands places, for example producing "1,000" instead of "1000". |
trim | Whether numbers should be trimmed to their own size, rather than being right-justified to a common width. Unlike the actual |
scientific | Whether numbers should be encoded in scientific format. Unlike the actual |
... | Arguments to be passed to |
The returned function behaves like format(), with only these differences:
1. scientific is set to FALSE by default, and trim is set to TRUE.
2. Passing an NA value produces '' instead of 'NA'.
3. In addition to standard format() options, it also accepts a percent option to apply percentage formatting, and prefix and suffix options to apply prefixes or suffixes to formatted numbers.
4. It has an attribute 'big.mark' storing the 'big.mark' option chosen.
This is in the spirit of the label_ functions in the scales package, except that it uses format()'s focus on significant digits instead of fixed decimal places. That suits numbers that range across multiple orders of magnitude, which are common in sumtable() and vtable().
x <- c(1, 1000, .000235, 1298.255, NA)
my.formatting.func <- formatfunc(digits = 3, prefix = '$')
my.formatting.func(x)
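To make the "function that returns a formatting function" idea concrete, here is a hedged base-R sketch. make_formatter() is hypothetical and is not vtable's implementation. It reproduces the documented differences from format() (scientific = FALSE, trim = TRUE, NA becomes '', prefix/suffix/scale, a 'big.mark' attribute) and leaves out the percent option. One design choice of the sketch is that it formats each element separately, so every number keeps its own significant digits instead of sharing a common number of decimals.

```r
# Hypothetical sketch of a formatter factory in the spirit of formatfunc();
# NOT vtable's implementation, and the percent option is omitted.
make_formatter <- function(prefix = "", suffix = "", scale = 1, digits = NULL,
                           nsmall = 0L, big.mark = "") {
  f <- function(x) {
    # format each element on its own so each keeps its own significant digits
    vapply(x, function(v) {
      if (is.na(v)) return("")          # NA becomes '' rather than 'NA'
      paste0(prefix,
             format(v * scale, digits = digits, nsmall = nsmall,
                    big.mark = big.mark, trim = TRUE, scientific = FALSE),
             suffix)
    }, character(1))
  }
  attr(f, "big.mark") <- big.mark       # remember the big.mark choice
  f
}

dollars <- make_formatter(digits = 3, prefix = "$")
dollars(c(1, 1000, .000235, 1298.255, NA))
```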
This function takes in two variables of equal length, the first of which is a categorical variable, and performs a test of independence between them. It returns a character string with the results of that test for inclusion in a table.
independence.test( x, y, w = NA, factor.test = NA, numeric.test = NA, star.cutoffs = c(0.01, 0.05, 0.1), star.markers = c("***", "**", "*"), digits = 3, fixed.digits = FALSE, format = "{name}={stat}{stars}", opts = list() )
Argument | Description |
---|---|
x | A categorical variable. |
y | A variable to test for independence with |
w | A vector of weights to pass to the appropriate test. |
factor.test | Used when |
numeric.test | Used when |
star.cutoffs | A numeric vector indicating the p-value cutoffs to use for reporting significance stars. Defaults to c(0.01, 0.05, 0.1). |
star.markers | A character vector indicating the symbols to use to indicate significance cutoffs associated with |
digits | Number of digits after the decimal to round the test statistic and p-value to. |
fixed.digits | |
format | The way in which the four elements returned by (or calculated after) the test - |
opts | The options listed above, entered in named-list format. |
Partly to allow (and perhaps encourage) using this function in weird ways, and partly because it's not really expected to be used directly, input is not sanitized. Have fun!
data(mtcars)
independence.test(mtcars$cyl, mtcars$mpg)
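As a rough sketch of the kind of string described above, the code below tests a categorical x against a numeric y and lays the result out like the default format "{name}={stat}{stars}", using the default star.cutoffs and star.markers. The helpers stars_for() and anova_test_string() are hypothetical. The one-way ANOVA F-test is a swapped-in choice for illustration, and the tests vtable actually uses by default may differ.

```r
# Hypothetical: pick the marker for the strictest cutoff the p-value falls below
stars_for <- function(p, cutoffs = c(0.01, 0.05, 0.1), markers = c("***", "**", "*")) {
  hit <- which(p < cutoffs)              # cutoffs are in increasing order
  if (length(hit) == 0) "" else markers[hit[1]]
}

# Hypothetical: one-way ANOVA F-test of numeric y across the categories of x
anova_test_string <- function(x, y, digits = 3) {
  fit <- anova(lm(y ~ as.factor(x)))
  stat <- fit[["F value"]][1]
  p <- fit[["Pr(>F)"]][1]
  # mirrors the "{name}={stat}{stars}" layout of the default format argument
  paste0("F=", formatC(stat, format = "f", digits = digits), stars_for(p))
}

data(mtcars)
anova_test_string(mtcars$cyl, mtcars$mpg)
```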
This function takes a vector and checks if any information is lost by rounding to a certain number of digits.
is.round(x, digits = 0)
Argument | Description |
---|---|
x | A vector. |
digits | How many digits to round to. |
Returns TRUE if rounding x to the number of decimal places given by digits can be done without losing information.
is.round(1:5)
x <- c(1, 1.2, 1.23)
is.round(x)
is.round(x, digits = 2)
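A minimal base-R sketch of the check described above, assuming "no information lost" means rounding leaves every value unchanged. is_round_sketch() is hypothetical and is not vtable's code. It compares with a small relative tolerance rather than exact equality so that floating-point representation error (for example in 1.23) is not mistaken for lost information, and it ignores NA values.

```r
# Hypothetical sketch: does rounding x to `digits` decimal places change any value?
is_round_sketch <- function(x, digits = 0) {
  tol <- sqrt(.Machine$double.eps)       # absorbs floating-point representation error
  all(abs(x - round(x, digits)) <= tol * pmax(1, abs(x)), na.rm = TRUE)
}

x <- c(1, 1.2, 1.23)
is_round_sketch(1:5)
is_round_sketch(x)
is_round_sketch(x, digits = 2)
```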
This function outputs a descriptive table listing, for each value of a given variable, either the label of that value or all values of another variable associated with that value. The table is output either to the console or as an HTML file that can be viewed continuously while working with data.
labeltable( var, ..., out = NA, count = FALSE, percent = FALSE, file = NA, desc = NA, note = NA, note.align = NA, anchor = NA )
Argument | Description |
---|---|
var | A vector. Label table will show, for each of the values of this variable, its label (if labels can be found with |
... | As described above. If specified, will show the values of these variables, instead of the labels of var, even if labels can be found. |
out | Determines where the completed table is sent. Set to |
count | Set to |
percent | Set to |
file | Saves the completed variable table file to HTML with this filepath. May be combined with any value of |
desc | Description of variable (or labeling system) to be included with the table. |
note | Table note to go after the last row of the table. |
note.align | Set the alignment for the multi-column table note. Usually "l", but if you have a long note in LaTeX you might want to set it with "p". |
anchor | Character variable to be used to set an anchor link in HTML tables, or a label tag in LaTeX. |
Outputting the label table as a help file will make it easy to search through value labels, or to see the correspondence between the values of one variable and the values of another.
Labels that are not in the data will also be reported in the table.
if (interactive()) {
  # Input a single labelled variable to see a table relating values to labels.
  # Values not present in the data will be included in the table but moved to the end.
  library(sjlabelled)
  data(efc)
  labeltable(efc$e15relat)

  # Include multiple variables to see, for each value of the first variable,
  # each value of the others present in the data.
  data(efc)
  labeltable(efc$e15relat, efc$e16sex, efc$e42dep)

  # Commonly, the multi-variable version might be used to recover the original
  # values of encoded variables
  data(USJudgeRatings)
  USJudgeRatings$Judge <- row.names(USJudgeRatings)
  USJudgeRatings$JudgeID <- as.numeric(as.factor(USJudgeRatings$Judge))
  labeltable(USJudgeRatings$JudgeID, USJudgeRatings$Judge)
}
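The multi-variable mode of labeltable() pairs each value of the first variable with the values of another variable seen alongside it. Here is a hedged base-R sketch of that idea. values_by_key() is a hypothetical helper, not vtable's code, and unlike labeltable() it knows nothing about value labels, so it cannot report labels that are absent from the data.

```r
# Hypothetical sketch: for each distinct value of `key`, list the distinct
# values of `other` that occur alongside it.
values_by_key <- function(key, other) {
  keys <- sort(unique(key))              # sort() drops NA keys
  vals <- vapply(keys, function(k) {
    paste(unique(other[which(key == k)]), collapse = ", ")
  }, character(1))
  data.frame(key = keys, values = vals, stringsAsFactors = FALSE)
}

# Recover judge names from a numeric encoding, as in the example above
data(USJudgeRatings)
judges <- row.names(USJudgeRatings)
judge_id <- as.numeric(as.factor(judges))
head(values_by_key(judge_id, judges))
```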
This function calculates the number of values in a vector that are not NA.
notNA(x, big.mark = NULL, scientific = FALSE, ...)
Argument | Description |
---|---|
x | A vector. |
big.mark | Argument to pass to |
scientific | Argument to pass to |
... | Other arguments to pass to |
This function is just shorthand for sum(!is.na(x)), with a shorter name for reference in the summ option of vtable or sumtable.
If big.mark is specified, it will return a formatted string instead of a number, where the formatting is based on format(x, big.mark = big.mark, scientific = FALSE, ...).
x <- c(1, 1, NA, 2, 3, NA)
notNA(x)
notNA(1:10000, big.mark = ',')
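For reference, here are base-R equivalents of the two notNA() calls above, written as a sketch from the shorthand and big.mark behavior described in this section rather than from vtable's source. With a big.mark, the count comes back as a character string rather than a number.

```r
# Base-R equivalent of notNA(x): count the non-missing values
x <- c(1, 1, NA, 2, 3, NA)
n_present <- sum(!is.na(x))

# Equivalent of notNA(1:10000, big.mark = ','): the count as a formatted string
n_big <- format(sum(!is.na(1:10000)), big.mark = ",", scientific = FALSE)

n_present
n_big
```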
This function takes a vector and returns the number of unique values in that vector.
nuniq(x)
Argument | Description |
---|---|
x | A vector. |
This function is just shorthand for length(unique(x)), with a shorter name for reference in the summ option of vtable or sumtable.
x <- c(1, 1, 2, 3, 4, 4, 4)
nuniq(x)
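One consequence of the length(unique(x)) definition above is worth noting: unique() keeps NA as a value of its own, so missing data adds one to the count. This sketch uses only base R.

```r
# length(unique(x)) is the shorthand nuniq() stands for
x <- c(1, 1, 2, 3, 4, 4, 4)
length(unique(x))

# unique() keeps NA as one more distinct value, so missing data adds one to the count
length(unique(c(x, NA, NA)))
```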
This function calculates 100 percentiles of a vector and returns all of them.
pctile(x)
Argument | Description |
---|---|
x | A vector. |
This function is just shorthand for quantile(x, 1:100/100), with a shorter name for reference in the summ option of vtable or sumtable, and it works with the summ.names styling of sumtable.
x <- 1:500
pctile(x)[50]
quantile(x, .5)
median(x)
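This base-R sketch shows why the indexing in the example above works. quantile(x, 1:100/100) returns a vector of 100 values named "1%" through "100%", so element 50 is the 50th percentile (the median) and element 100 is the maximum.

```r
# quantile(x, 1:100/100) is the shorthand pctile() stands for
x <- 1:500
p <- quantile(x, 1:100/100)

names(p)[50]   # element 50 is the 50th percentile
p[50]
median(x)
```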
This function calculates the proportion of values in a vector that are NA.
propNA(x)
Argument | Description |
---|---|
x | A vector. |
This function is just shorthand for mean(is.na(x)), with a shorter name for reference in the summ option of vtable or sumtable.
x <- c(1, 1, NA, 2, 3, NA)
propNA(x)
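Because mean(is.na(x)) averages a logical vector, it equals the NA count divided by the length of x. One edge case follows from base R's mean(): for a zero-length vector the result is NaN rather than 0. This sketch uses only base R.

```r
# mean(is.na(x)) is the shorthand propNA() stands for
x <- c(1, 1, NA, 2, 3, NA)
mean(is.na(x))
sum(is.na(x)) / length(x)   # same value: NA count over length

# Edge case: the mean of an empty logical vector is NaN, not 0
mean(is.na(numeric(0)))
```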
This function outputs a summary statistics table to the console, to an HTML file that can be viewed continuously while working with data, or to a file for use elsewhere. st() is the same thing but requires fewer key presses to type.
sumtable( data, vars = NA, out = NA, file = NA, summ = NA, summ.names = NA, add.median = FALSE, group = NA, group.long = FALSE, group.test = FALSE, group.weights = NA, group.weights.sd.type = "frequency", col.breaks = NA, digits = 2, fixed.digits = FALSE, numformat = formatfunc(digits = digits, big.mark = ""), skip.format = c("notNA(x)", "propNA(x)", "countNA(x)", obs.function), factor.percent = TRUE, factor.counts = TRUE, factor.numeric = FALSE, logical.numeric = FALSE, logical.labels = c("No", "Yes"), labels = NA, title = "Summary Statistics", note = NA, anchor = NA, col.width = NA, col.align = NA, align = NA, note.align = "l", fit.page = "\\textwidth", simple.kable = FALSE, obs.function = NA, opts = list() )
st( data, vars = NA, out = NA, file = NA, summ = NA, summ.names = NA, add.median = FALSE, group = NA, group.long = FALSE, group.test = FALSE, group.weights = NA, group.weights.sd.type = "frequency", col.breaks = NA, digits = 2, fixed.digits = FALSE, numformat = formatfunc(digits = digits, big.mark = ""), skip.format = c("notNA(x)", "propNA(x)", "countNA(x)", obs.function), factor.percent = TRUE, factor.counts = TRUE, factor.numeric = FALSE, logical.numeric = FALSE, logical.labels = c("No", "Yes"), labels = NA, title = "Summary Statistics", note = NA, anchor = NA, col.width = NA, col.align = NA, align = NA, note.align = "l", fit.page = "\\textwidth", simple.kable = FALSE, obs.function = NA, opts = list() )
Argument | Description |
---|---|
data | Data set; accepts any format with column names. |
vars | Character vector of column names to include, in the order you'd like them included. Defaults to all numeric, factor, and logical variables, plus any character variables with six or fewer unique values. You can include strings that aren't columns in the data (including blanks); these will create rows that are blank except for the string (left-aligned), for spacers or subtitles. |
out | Determines where the completed table is sent. Set to |
file | Saves the completed summary table to a file with this filepath. May be combined with any value of |
summ | Character vector of summary statistics to include for numeric and logical variables, in the form |
summ.names | Character vector of names for the summary statistics included. If |
add.median | Adds |
group | Character variable with the name of a column in the data set that statistics are to be calculated over. Value labels will be used if found for numeric variables. Changes the default |
group.long | By default, if |
group.test | Set to |
group.weights | THIS OPTION DOES NOT AUTOMATICALLY WEIGHT ALL CALCULATIONS. This is mostly to be used with |
group.weights.sd.type | If |
col.breaks | Numeric vector indicating the variables (or number of elements of |
digits | Number of digits after the decimal place to report. Set to a single number for consistent digits, or a vector the same length as |
fixed.digits | Deprecated; currently only works if |
numformat | A function that takes a numeric input and produces labeled output, which you might construct using the |
skip.format | Set of functions in |
factor.percent | Set to |
factor.counts | Set to |
factor.numeric | By default, factor variable dummies basically ignore the |
logical.numeric | By default, logical variables are treated as factors with |
logical.labels | When turning logicals into factors, use these labels for |
labels | Variable labels. labels will accept four formats: (1) A vector of the same length as the number of variables in the data that will be included in the table (tricky to use if many are being dropped, also won't work for your |
title | Character variable with the title of the table. |
note | Table note to go after the last row of the table. Will follow significance star note if |
anchor | Character variable to be used to set an anchor link in HTML tables, or a label tag in LaTeX. |
col.width | Vector of page-width percentages, on 0-100 scale, overriding default column widths in an HTML table. Must have a number of elements equal to the number of columns in the resulting table. |
col.align | For HTML output, a character vector indicating the HTML |
align | For LaTeX output, string indicating the alignment of each column. Use standard LaTeX syntax (i.e. |
note.align | For LaTeX output, set the alignment for the multi-column table note. Usually "l", but if you have a long note in LaTeX you might want to set it with "p". |
fit.page | For LaTeX output, uses a resizebox to force the table to a certain width. Set to |
simple.kable | For |
obs.function | The function to use (and, potentially, format) to count the number of observations for the N column. This should take a vector and return a single number or string. Uses the same string formatting as |
opts | The same |
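The summ examples in this manual write each statistic as an R expression in x (for example summ = c('mean(x)','propNA(x)') in the vtable() examples). Here is a hedged base-R sketch of how such strings can be evaluated. The hypothetical helper eval_summ() parses each string and evaluates it with x bound to one variable. It is an illustration only, vtable's own parsing may differ, and the per-group call only loosely mirrors the group option.

```r
# Hypothetical sketch: evaluate summary-statistic strings written in terms of x
eval_summ <- function(x, summ) {
  # parse each string and evaluate it with x bound to the supplied vector;
  # names of the result are the statistic strings themselves
  vapply(summ, function(s) as.numeric(eval(parse(text = s), list(x = x))),
         numeric(1))
}

eval_summ(iris$Sepal.Length, c("mean(x)", "sd(x)", "median(x)"))

# One column of results per group, similar in spirit to the group option
sapply(split(iris$Sepal.Length, iris$Species), eval_summ,
       summ = c("mean(x)", "sd(x)"))
```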
There are many, many functions in R that will produce a summary statistics table for you. So why use sumtable()? sumtable() serves two main purposes:
(1) In the same spirit as vtable(), it makes it easy to view the summary statistics as you work, either in the Viewer pane or in a browser window.
(2) sumtable() is designed to have nice defaults and is not really intended for deep customization. It's got lots of options, sure, but they're only intended to go so far. So you can have a summary statistics table without much work.
Keeping with point (2), sumtable() is designed for use by people who want the kind of table that sumtable() produces, which is itself heavily influenced by the kinds of summary statistics tables you often see in economics papers. In that regard it is most similar to stargazer::stargazer(), except that it can handle tibbles, factor variables, and grouping, and can produce multicolumn tables; or to summarytools::dfSummary() or skimr::skim(), except that it is easier to export with nice formatting. If you want a lot of control over your summary statistics table, check out the packages gtsummary, arsenal, qwraps2, or Amisc, and about a million more.
If you would like to include a sumtable in an RMarkdown document, it should just work! If you leave the out option blank, it will default to a nicely-formatted knitr::kable(), although this will drop some formatting elements like multi-column cells (or use out="kable" to get an unformatted kable that you can format yourself). If you prefer the vtable package formatting, then use out="latex" if outputting to LaTeX or out="htmlreturn" for HTML, both with results="asis" in the code chunk. Alternatively, in HTML, you can use the file option to write the table to a file and use an <iframe> to include it.
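For the <iframe> route mentioned above, a small helper can build the embedding tag for a file written with the file option. iframe_tag() and the file name iris_summary.html are hypothetical choices for illustration. Printing the tag with cat() in a chunk set to results="asis" inserts it into the HTML output.

```r
# Hypothetical helper: build an <iframe> tag that embeds a saved HTML table
iframe_tag <- function(path, height = 400) {
  sprintf('<iframe src="%s" width="100%%" height="%d" style="border:none;"></iframe>',
          path, as.integer(height))
}

# Write the summary table to a file (only when vtable is available interactively)
if (interactive() && requireNamespace("vtable", quietly = TRUE)) {
  vtable::sumtable(iris, file = "iris_summary.html")
}

# In an R Markdown chunk with results = "asis", print the tag:
cat(iframe_tag("iris_summary.html"), "\n")
```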
# Examples are only run interactively because they open HTML pages in Viewer or a browser.
if (interactive()) {
  data(iris)

  # Sumtable handles both numeric and factor variables
  st(iris)

  # Output to LaTeX as well for easy integration
  # with RMarkdown, or \input{} into your LaTeX docs
  # (specify file too to save the result)
  st(iris, out = 'latex')

  # Summary statistics by group
  iris$SL.above.median <- iris$Sepal.Length > median(iris$Sepal.Length)
  st(iris, group = 'SL.above.median')

  # Add a group test, or report by-group in "long" format
  st(iris, group = 'SL.above.median', group.test = TRUE)
  st(iris, group = 'SL.above.median', group.long = TRUE)

  # Going all out! Adding variable labels with labels,
  # spacers and variable "category" titles with vars,
  # changing the presentation of the factor variable,
  # and putting the factor in its own column with col.breaks
  var.labs <- data.frame(var = c('SL.above.median','Sepal.Length',
                                 'Sepal.Width','Petal.Length',
                                 'Petal.Width'),
                         labels = c('Above-median Sepal Length','Sepal Length',
                                    'Sepal Width','Petal Length',
                                    'Petal Width'))
  st(iris, labels = var.labs,
     vars = c('Sepal Variables','SL.above.median','Sepal.Length','Sepal.Width',
              'Petal Variables','Petal.Length','Petal.Width',
              'Species'),
     factor.percent = FALSE,
     col.breaks = 7)

  # Format the results
  # replicate the data so there are enough observations to see the comma separators
  irisrep <- do.call('rbind', replicate(100, iris, simplify = FALSE))
  # Comma separator for thousands, including for N.
  st(irisrep, numformat = 'comma')
  # Dollar formatting for Sepal.Width, decimal (1.000,00) formatting for the rest
  st(iris, numformat = c('decimal','Sepal.Width' = '$'))
  # Custom formatting throughout; note the big.mark = ',' will also be picked up by N
  st(irisrep, numformat = formatfunc(digits = 2, nsmall = 2, big.mark = ','))
}
This function will output a descriptive variable table either to the console or as an HTML file that can be viewed continuously while working with data. vt() is the same thing but requires fewer key presses to type.
vtable( data, out = NA, file = NA, labels = NA, class = TRUE, values = TRUE, missing = FALSE, index = FALSE, factor.limit = 5, char.values = FALSE, data.title = NA, desc = NA, note = NA, note.align = "l", anchor = NA, col.width = NA, col.align = NA, align = NA, fit.page = NA, summ = NA, lush = FALSE, opts = list() )
vt( data, out = NA, file = NA, labels = NA, class = TRUE, values = TRUE, missing = FALSE, index = FALSE, factor.limit = 5, char.values = FALSE, data.title = NA, desc = NA, note = NA, note.align = "l", anchor = NA, col.width = NA, col.align = NA, align = NA, fit.page = NA, summ = NA, lush = FALSE, opts = list() )
Argument | Description |
---|---|
data | Data set; accepts any format with column names. If variable labels are set with the haven package, |
out | Determines where the completed table is sent. Set to |
file | Saves the completed variable table file to HTML or .tex with this filepath. May be combined with any value of |
labels | Variable labels. labels will accept three formats: (1) A vector of the same length as the number of variables in the data, in the same order as the variables in the data set, (2) A matrix or data frame with two columns and more than one row, where the first column contains variable names (in any order) and the second contains labels, or (3) A matrix or data frame where the column names (in any order) contain variable names and the first row contains labels. Setting the labels parameter will override any variable labels already in the data. Set to |
class | Set to |
values | Set to |
missing | Set to |
index | Set to |
factor.limit | Sets maximum number of factors that will be included if |
char.values | Set to |
data.title | Character variable with the title of the dataset. |
desc | Character variable offering a brief description of the dataset itself. This will by default include information on the number of observations and the number of columns. To remove this, set |
note | Table note to go after the last row of the table. |
note.align | Set the alignment for the multi-column table note. Usually "l", but if you have a long note in LaTeX you might want to set it with "p". |
anchor | Character variable to be used to set an anchor link in HTML tables, or a label tag in LaTeX. |
col.width | Vector of page-width percentages, on 0-100 scale, overriding default column widths in HTML table. Must have a number of elements equal to the number of columns in the resulting table. |
col.align | For HTML output, a character vector indicating the HTML |
align | For LaTeX output, string indicating the alignment of each column. Use standard LaTeX syntax (i.e. |
fit.page | For LaTeX output, uses a resizebox to force the table to a certain width. Set to |
summ | Character vector of summary statistics to include for numeric and logical variables, in the form |
lush | Set to |
opts | The same |
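The three labels formats described for vtable() can all be reduced to one named vector of labels. Here is a hedged sketch of that normalization. normalize_labels() is hypothetical, not vtable's internal code, and it applies the documented rule literally: a two-column table with more than one row is read as format (2), and anything else with dimensions as format (3).

```r
# Hypothetical sketch: turn any of the three documented label formats into a
# named character vector (names = variable names), in data-set order.
normalize_labels <- function(labels, var_names) {
  if (is.null(dim(labels))) {
    # Format (1): a plain vector, one label per variable, in data order
    stopifnot(length(labels) == length(var_names))
    return(setNames(as.character(labels), var_names))
  }
  labels <- as.data.frame(labels, stringsAsFactors = FALSE)
  if (ncol(labels) == 2 && nrow(labels) > 1) {
    # Format (2): first column holds variable names, second holds labels
    out <- setNames(as.character(labels[[2]]), as.character(labels[[1]]))
  } else {
    # Format (3): column names are variable names, first row holds labels
    out <- setNames(vapply(labels[1, ], as.character, character(1)), names(labels))
  }
  out[intersect(var_names, names(out))]
}

vars <- c("var1", "var2")
normalize_labels(data.frame(a = vars, b = c("Number 1", "Number 2")), vars)
normalize_labels(data.frame(var1 = "Number 1", var2 = "Number 2"), vars)
```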
Outputting the variable table as a help file will make it easy to search through variable names or labels, or to refer to information about the variables easily.
This function is in a similar spirit to promptData(), but focuses on variable documentation rather than dataset documentation.
If you would like to include a vtable in an RMarkdown document, it should just work! If you leave the out option blank, it will default to a nicely-formatted knitr::kable(), although this will drop some formatting elements like multi-column cells (or use out="kable" to get an unformatted kable that you can format yourself). If you prefer the vtable package formatting, then use out="latex" if outputting to LaTeX or out="htmlreturn" for HTML, both with results="asis" in the code chunk. Alternatively, in HTML, you can use the file option to write the table to a file and use an <iframe> to include it.
if (interactive()) {
  df <- data.frame(var1 = 1:4, var2 = 5:8, var3 = c('A','B','C','D'),
                   var4 = as.factor(c('A','B','C','C')), var5 = c(TRUE,TRUE,FALSE,FALSE))

  # Demonstrating different options:
  vtable(df, labels = c('Number 1','Number 2','Some Letters',
                        'Some Labels','You Good?'))
  vtable(subset(df, select = c(1,2,5)),
         labels = c('Number 1','Number 2','You Good?'), class = FALSE, values = FALSE)
  vtable(subset(df, select = c('var1','var4')),
         labels = c('Number 1','Some Labels'),
         factor.limit = 1, col.width = c(10,10,40,35))

  # Different methods of applying variable labels:
  labelsmethod2 <- data.frame(var1 = 'Number 1', var2 = 'Number 2',
                              var3 = 'Some Letters', var4 = 'Some Labels', var5 = 'You Good?')
  vtable(df, labels = labelsmethod2)
  labelsmethod3 <- data.frame(a = c("var1","var2","var3","var4","var5"),
                              b = c('Number 1','Number 2','Some Letters','Some Labels','You Good?'))
  vtable(df, labels = labelsmethod3)

  # Using value labels and pre-labeled data:
  library(sjlabelled)
  df <- set_label(df, c('Number 1','Number 2','Some Letters',
                        'Some Labels','You Good?'))
  df$var1 <- set_labels(df$var1, labels = c('A little','Some more',
                                            'Even more','A lot'))
  vtable(df)

  # efc is data with embedded variable and value labels from the sjlabelled package
  library(sjlabelled)
  data(efc)
  vtable(efc)

  # Displaying the values of a character vector
  data(USJudgeRatings)
  USJudgeRatings$Judge <- row.names(USJudgeRatings)
  vtable(USJudgeRatings, char.values = c('Judge'))

  # Adding summary statistics for variable mean and proportion of data that is missing.
  vtable(efc, summ = c('mean(x)','propNA(x)'))
}
This is a basic weighted standard deviation function, mainly for internal use with sumtable. For a more fully-fledged weighted SD function, see Hmisc::wtd.var, although it uses a slightly different degree-of-freedom correction.
weighted.sd(x, w, na.rm = TRUE, type = "frequency")
Argument | Description |
---|---|
x | A numeric vector. |
w | A vector of weights. Negative weights are not allowed. |
na.rm | Set to |
type | The type of weights to use. The default is |
x <- c(1, 1, 2, 3, 4, 4, 4)
w <- c(4, 1, 3, 7, 0, 2, 5)
weighted.sd(x, w)
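To show what "frequency" weights mean, here is a minimal sketch of a frequency-weighted standard deviation. weighted_sd_freq() is hypothetical, and vtable's weighted.sd() may differ in details such as the degree-of-freedom correction. With integer frequency weights, the formula sqrt(sum(w * (x - mu)^2) / (sum(w) - 1)) matches sd() computed on the data expanded with rep(x, w).

```r
# Hypothetical sketch of a frequency-weighted standard deviation
weighted_sd_freq <- function(x, w, na.rm = TRUE) {
  stopifnot(length(x) == length(w), all(w >= 0, na.rm = TRUE))
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)   # drop pairs with a missing value or weight
    x <- x[keep]
    w <- w[keep]
  }
  mu <- sum(w * x) / sum(w)          # weighted mean
  sqrt(sum(w * (x - mu)^2) / (sum(w) - 1))
}

x <- c(1, 1, 2, 3, 4, 4, 4)
w <- c(4, 1, 3, 7, 0, 2, 5)
weighted_sd_freq(x, w)
# With integer frequency weights this matches sd() on the expanded data
sd(rep(x, w))
```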