| aggregate {stats} | R Documentation |
Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
aggregate(x, ...)
## Default S3 method:
aggregate(x, ...)
## S3 method for class 'data.frame':
aggregate(x, by, FUN, ...)
## S3 method for class 'ts':
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1,
          ts.eps = getOption("ts.eps"), ...)
x |
an R object. |
by |
a list of grouping elements, each as long as the variables
in x. Names for the grouping variables are provided if
they are not given. The elements of the list will be coerced to
factors (if they are not already factors). |
FUN |
a function that computes the summary statistic; it must return a scalar and be applicable to every data subset. |
nfrequency |
new number of observations per unit of time; must
be a divisor of the frequency of x. |
ndeltat |
new fraction of the sampling period between
successive observations; must be a divisor of the sampling
interval of x. |
ts.eps |
tolerance used to decide if nfrequency is a
sub-multiple of the original frequency. |
... |
further arguments passed to or used by methods. |
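The naming rule for by can be illustrated with a small sketch. The data frame scores and the vector group are hypothetical toy objects made up for this illustration; they are not part of the original page.

```r
# Hypothetical toy data, invented for illustration only.
scores <- data.frame(score = c(1, 2, 3, 4, 5, 6))
group  <- c("a", "b", "a", "b", "a", "b")

# Unnamed grouping element: aggregate() supplies a name itself.
unnamed <- aggregate(scores, list(group), mean)

# Named grouping element: the supplied name is kept.
named <- aggregate(scores, list(Group = group), mean)

names(unnamed)  # "Group.1" "score"
names(named)    # "Group"   "score"
```

Note that group is a character vector here; it is coerced to a factor before the subsets are formed.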
aggregate is a generic function with methods for data frames
and time series.
The default method aggregate.default uses the time series
method if x is a time series, and otherwise coerces x
to a data frame and calls the data frame method.
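As a minimal sketch of the default method at work, the following passes a matrix, which has no aggregate method of its own. The matrix m and the grouping vector are hypothetical toy values.

```r
# Hypothetical toy matrix with named columns; not from the original page.
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b")))

# A matrix is not a time series, so it is coerced to a data frame
# and handled by the data frame method.
res <- aggregate(m, list(g = c("u", "v", "u")), sum)

class(res)  # "data.frame"
res
```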
aggregate.data.frame is the data frame method. If x
is not a data frame, it is coerced to one. Then, each of the
variables (columns) in x is split into subsets of cases
(rows) of identical combinations of the components of by, and
FUN is applied to each such subset with further arguments in
... passed to it.
(I.e., tapply(VAR, by, FUN, ..., simplify = FALSE) is done
for each variable VAR in x, conveniently wrapped into
one call to lapply().)
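The correspondence with tapply() described above can be checked by hand. This is only a sketch on hypothetical toy data (x and by are made up here); it compares results and makes no claim about how the method is implemented internally.

```r
# Hypothetical toy data, invented for illustration only.
x  <- data.frame(height = c(150, 160, 170, 180),
                 weight = c(50, 60, 70, 80))
by <- list(Sex = c("f", "m", "f", "m"))

# One tapply() call per variable (column), wrapped in lapply().
per_var <- lapply(x, function(VAR) tapply(VAR, by, mean, simplify = FALSE))

# The same group summaries, returned as a single data frame.
agg <- aggregate(x, by, mean)

unlist(per_var$height)  # group means of height, named by group
agg$height              # the same values as a data frame column
```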
Empty subsets are removed, and the result is reformatted into a data
frame containing the variables in by and x. The variables
arising from by contain the unique combinations of grouping
values used for determining the subsets, and those arising from
x contain the corresponding summary statistics for the subsets of the
respective variables in x.
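A short sketch of the removal of empty subsets, using hypothetical toy data in which one combination of grouping values never occurs:

```r
# Hypothetical toy data: the combination (G1 = "b", G2 = "y") never occurs.
values <- data.frame(v = c(1, 2, 3, 4))
g1 <- c("a", "a", "b", "b")
g2 <- c("x", "y", "x", "x")

res <- aggregate(values, list(G1 = g1, G2 = g2), sum)

res
nrow(res)  # 3: only the combinations that occur, not all 2 * 2 = 4
```

The result has one column per grouping variable followed by one column per variable of the data.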
aggregate.ts is the time series method. If x is not a
time series, it is coerced to one. Then, the variables in x
are split into appropriate blocks of length
frequency(x) / nfrequency, and FUN is applied to each
such block, with further (named) arguments in ... passed to
it. The result returned is a time series with frequency
nfrequency holding the aggregated values.
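The block length can be seen on a small series. The quarterly series q below is a hypothetical toy object with values 1 to 8, chosen so the block sums are easy to verify by hand.

```r
# Hypothetical quarterly series covering two full years (values 1..8).
q <- ts(1:8, start = c(2000, 1), frequency = 4)

# Annual aggregation: blocks of frequency(q) / nfrequency = 4 / 1 = 4 values.
annual_sum  <- aggregate(q, nfrequency = 1, FUN = sum)   # 10, 26
annual_mean <- aggregate(q, nfrequency = 1, FUN = mean)  # 2.5, 6.5

# Half-yearly aggregation: blocks of 4 / 2 = 2 values.
half_sum <- aggregate(q, nfrequency = 2, FUN = sum)      # 3, 7, 11, 15

frequency(annual_sum)  # 1
frequency(half_sum)    # 2
```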
Kurt Hornik
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to.
aggregate(state.x77, list(Region = state.region), mean)
## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
          list(Region = state.region,
               Cold = state.x77[, "Frost"] > 130),
          mean)
## (Note that no state in 'South' is THAT cold.)
## Compute the average annual approval ratings for American presidents.
aggregate(presidents, nfrequency = 1, FUN = mean)
## Give the summer less weight.
aggregate(presidents, nfrequency = 1, FUN = weighted.mean, w = c(1, 1, 0.5, 1))