Type: | Package |
Title: | High Dimensional Time Series Analysis Tools |
Version: | 1.0.5-1 |
Date: | 2025-01-25 |
Author: | Jinyuan Chang [aut], Jing He [aut], Chen Lin [aut, cre], Qiwei Yao [aut] |
Maintainer: | Chen Lin <linchen@smail.swufe.edu.cn> |
Description: | An implementation for high-dimensional time series analysis methods, including factor model for vector time series proposed by Lam and Yao (2012) <doi:10.1214/12-AOS970> and Chang, Guo and Yao (2015) <doi:10.1016/j.jeconom.2015.03.024>, martingale difference test proposed by Chang, Jiang and Shao (2023) <doi:10.1016/j.jeconom.2022.09.001>, principal component analysis for vector time series proposed by Chang, Guo and Yao (2018) <doi:10.1214/17-AOS1613>, cointegration analysis proposed by Zhang, Robinson and Yao (2019) <doi:10.1080/01621459.2018.1458620>, unit root test proposed by Chang, Cheng and Yao (2022) <doi:10.1093/biomet/asab034>, white noise test proposed by Chang, Yao and Zhou (2017) <doi:10.1093/biomet/asw066>, CP-decomposition for matrix time series proposed by Chang et al. (2023) <doi:10.1093/jrsssb/qkac011> and Chang et al. (2024) <doi:10.48550/arXiv.2410.05634>, and statistical inference for spectral density matrix proposed by Chang et al. (2022) <doi:10.48550/arXiv.2212.13686>. |
License: | GPL-3 |
Depends: | R (≥ 3.5.0) |
Imports: | stats, Rcpp, clime, sandwich, methods, MASS, geigen, jointDiag, vars, forecast |
LinkingTo: | RcppEigen, Rcpp |
Suggests: | knitr |
NeedsCompilation: | yes |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
URL: | https://github.com/Linc2021/HDTSA |
BugReports: | https://github.com/Linc2021/HDTSA/issues |
Repository: | CRAN |
Packaged: | 2025-01-27 08:05:22 UTC; linchen |
Date/Publication: | 2025-01-28 04:00:06 UTC |
HDTSA: High Dimensional Time Series Analysis Tools
Description
The purpose of HDTSA
is to address a range of high-dimensional time
series problems, which includes solutions to a series of statistical issues,
primarily comprising: Procedures for high-dimensional time series analysis
including factor analysis proposed by Lam and Yao (2012)
<doi:10.1214/12-AOS970> and Chang, Guo and Yao (2015)
<doi:10.1016/j.jeconom.2015.03.024>,martingale difference test proposed by
Chang, Jiang and Shao (2022) <doi:10.1016/j.jeconom.2022.09.001> in press,
principal component analysis proposed by Chang, Guo and Yao (2018)
<doi:10.1214/17-AOS1613>, identifying cointegration proposed by Zhang,
Robinson and Yao (2019) <doi:10.1080/01621459.2018.1458620>,
unit root test proposed by Chang, Cheng and Yao (2021)
<doi:10.1093/biomet/asab034>, white noise test proposed by Chang, Yao and
Zhou (2017) <doi:10.1093/biomet/asw066>, CP-decomposition for high-dimensional
matrix time series proposed by Chang, He, Yang and Yao (2023)
<doi:10.1093/jrsssb/qkac011> and Chang, Du, Huang and Yao (2024+), and
Statistical inference for high-dimensional spectral density matrix porposed
by Chang, Jiang, McElroy and Shao (2023) <doi:10.48550/arXiv.2212.13686>.
Author(s)
Chen lin, Jinyuan Chang, Qiwei Yao Maintainer: Chen lin<linchen@smail.swufe.edu.cn>
See Also
Useful links:
Estimating the matrix time series CP-factor model
Description
CP_MTS()
deals with the estimation of the CP-factor model for matrix time series:
{\bf{Y}}_t = {\bf A \bf X}_t{\bf B}' +
{\boldsymbol{\epsilon}}_t,
where {\bf X}_t = {\rm diag}(x_{t,1},\ldots,x_{t,d})
is a d \times d
unobservable diagonal matrix, {\boldsymbol{\epsilon}}_t
is a p \times q
matrix white noise, {\bf A}
and {\bf B}
are, respectively, p
\times d
and q \times d
unknown constant matrices with their columns being
unit vectors, and 1\leq d < \min(p,q)
is an unknown integer.
Let {\rm rank}(\mathbf{A}) = d_1
and {\rm rank}(\mathbf{B}) = d_2
with some unknown d_1,d_2\leq d
.
This function aims to estimate d, d_1, d_2
and the loading
matrices {\bf A}
and {\bf B}
using the methods proposed in Chang
et al. (2023) and Chang et al. (2024).
Usage
CP_MTS(
Y,
xi = NULL,
Rank = NULL,
lag.k = 20,
lag.ktilde = 10,
method = c("CP.Direct", "CP.Refined", "CP.Unified"),
thresh1 = FALSE,
thresh2 = FALSE,
thresh3 = FALSE,
delta1 = 2 * sqrt(log(dim(Y)[2] * dim(Y)[3])/dim(Y)[1]),
delta2 = delta1,
delta3 = delta1
)
Arguments
Y |
An |
xi |
An |
Rank |
A list containing the following components: |
lag.k |
The time lag
where |
lag.ktilde |
The time lag |
method |
A string indicating which CP-decomposition method is used. Available options include:
|
thresh1 |
Logical. If |
thresh2 |
Logical. If |
thresh3 |
Logical. If |
delta1 |
The value of the threshold level |
delta2 |
The value of the threshold level |
delta3 |
The value of the threshold level |
Details
All three CP-decomposition methods involve the estimation of the autocovariance of
{\bf Y}_t
and \xi_t
at lag k
, which is defined as follows:
\hat{\bf \Sigma}_{k} = T_{\delta_1}\{\hat{\boldsymbol{\Sigma}}_{\mathbf{Y},
\xi}(k)\}\ \ {\rm with}\ \ \hat{\boldsymbol{\Sigma}}_{\mathbf{Y}, \xi}(k) = \frac{1}{n-k}
\sum_{t=k+1}^n(\mathbf{Y}_t-\bar{\mathbf{Y}})(\xi_{t-k}-\bar{\xi})\,,
where \bar{\bf Y} = n^{-1}\sum_{t=1}^n {\bf Y}_t
, \bar{\xi}=n^{-1}\sum_{t=1}^n \xi_t
and T_{\delta_1}(\cdot)
is a threshold operator defined as
T_{\delta_1}({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta_1)\}
for any matrix
{\bf W}=(w_{i,j})
, with the threshold level \delta_1 \geq 0
and 1(\cdot)
representing the indicator function. Chang et al. (2023) and Chang et al. (2024) suggest to choose
\delta_1 = 0
when p, q
are fixed and \delta_1>0
when pq \gg n
.
The refined estimation method involves
\check{\bf \Sigma}_{k} =
T_{\delta_2}\{\hat{\mathbf{\Sigma}}_{\check{\mathbf{Y}}}(k)\}\ \ {\rm with}
\ \ \hat{\mathbf{\Sigma}}_{\check{\mathbf{Y}}}(k)=\frac{1}{n-k}
\sum_{t=k+1}^n(\mathbf{Y}_t-\bar{\mathbf{Y}}) \otimes {\rm vec}
(\mathbf{Y}_{t-k}-\bar{\mathbf{Y}})\,,
where T_{\delta_2}(\cdot)
is a threshold operator with the threshold level
\delta_2 \geq 0
, and {\rm vec}(\cdot)
is a vecterization operator
with {\rm vec}({\bf H})
being the (m_1m_2)\times 1
vector obtained by stacking
the columns of the m_1 \times m_2
matrix {\bf H}
. See Section 3.2.2 of Chang
et al. (2023) for details.
The unified estimation method involves
\vec{\bf \Sigma}_{k}=
T_{\delta_3}\{\hat{\boldsymbol{\Sigma}}_{\vec{\mathbf{Y}}}(k)\}
\ \ {\rm with}\ \ \hat{\boldsymbol{\Sigma}}_{\vec{\mathbf{Y}}}(k)=\frac{1}{n-k}
\sum_{t=k+1}^n{\rm vec}({\mathbf{Y}}_t-\bar{\mathbf{Y}})\{{\rm vec}
(\mathbf{Y}_{t-k}-\bar{\mathbf{Y}})\}'\,,
where T_{\delta_3}(\cdot)
is a threshold operator with the threshold level
\delta_3 \geq 0
. See Section 4.2 of Chang et al. (2024) for details.
Value
An object of class "mtscp"
, which contains the following
components:
A |
The estimated |
B |
The estimated |
f |
The estimated latent process |
Rank |
The estimated |
method |
A string indicating which CP-decomposition method is used. |
References
Chang, J., Du, Y., Huang, G., & Yao, Q. (2024). Identification and estimation for matrix time series CP-factor models. arXiv preprint. doi:10.48550/arXiv.2410.05634.
Chang, J., He, J., Yang, L., & Yao, Q. (2023). Modelling matrix time series via a tensor CP-decomposition. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85, 127–148. doi:10.1093/jrsssb/qkac011.
Examples
# Example 1.
p <- 10
q <- 10
n <- 400
d = d1 = d2 <- 3
## DGP.CP() generates simulated data for the example in Chang et al. (2024).
data <- DGP.CP(n, p, q, d, d1, d2)
Y <- data$Y
## d is unknown
res1 <- CP_MTS(Y, method = "CP.Direct")
res2 <- CP_MTS(Y, method = "CP.Refined")
res3 <- CP_MTS(Y, method = "CP.Unified")
## d is known
res4 <- CP_MTS(Y, Rank = list(d = 3), method = "CP.Direct")
res5 <- CP_MTS(Y, Rank = list(d = 3), method = "CP.Refined")
# Example 2.
p <- 10
q <- 10
n <- 400
d1 = d2 <- 2
d <-3
data <- DGP.CP(n, p, q, d, d1, d2)
Y1 <- data$Y
## d, d1 and d2 are unknown
res6 <- CP_MTS(Y1, method = "CP.Unified")
## d, d1 and d2 are known
res7 <- CP_MTS(Y1, Rank = list(d = 3, d1 = 2, d2 = 2), method = "CP.Unified")
Identifying the cointegration rank of nonstationary vector time series
Description
Coint()
deals with cointegration analysis for high-dimensional
vector time series proposed in Zhang, Robinson and Yao (2019). Consider the model:
{\bf y}_t = {\bf Ax}_t\,,
where {\bf A}
is a p \times p
unknown and invertible constant matrix,
{\bf x}_t = ({\bf x}'_{t,1}, {\bf x}'_{t,2})'
is a latent
p \times 1
process, {\bf x}_{t,2}
is an r \times 1
I(0)
process,
{\bf x}_{t,1}
is a process with nonstationary components, and no linear
combination of {\bf x}_{t,1}
is I(0)
. This function aims to estimate the
cointegration rank r
and the invertible constant matrix {\bf A}
.
Usage
Coint(
Y,
lag.k = 5,
type = c("acf", "urtest", "both"),
c0 = 0.3,
m = 20,
alpha = 0.01
)
Arguments
Y |
An |
lag.k |
The time lag
where |
type |
The method used to identify the cointegration rank. Available
options include: |
c0 |
The prescribed constant |
m |
The prescribed constant |
alpha |
The significance level |
Details
Write \hat{\bf x}_t=\hat{\bf A}'{\bf y}_t\equiv (\hat{x}_t^1,\ldots,\hat{x}_t^p)'
.
When type = "acf"
, Coint()
estimates r
by
\hat{r}=\sum_{i=1}^{p}1\bigg\{\frac{S_i(m)}{m}<c_0 \bigg\}
for some
constant c_0\in (0,1)
and some large constant m
, where
S_i(m)
is the sum of the sample autocorrelations of
\hat{x}^{i}_{t}
over lags 1 to m
,
which is specified in Section 2.3 of Zhang, Robinson and Yao (2019).
When type = "urtest"
, Coint()
estimates r
by unit root
tests. For i= 1,\ldots, p
, consider the null hypothesis
H_{0,i}:\hat{x}_t^{p-i+1} \sim I(0)\,.
The estimation procedure for
r
can be implemented as follows:
Step 1. Start with i=1
. Perform the unit root test proposed
in Chang, Cheng and Yao (2021) for H_{0,i}
.
Step 2. If the null hypothesis is not rejected at the significance
level \alpha
, increment i
by 1 and repeat Step 1. Otherwise, stop
the procedure and denote the value of i
at termination as i_0
.
The cointegration rank is then estimated as \hat{r}=i_0-1
.
Value
An object of class "coint"
, which contains the following
components:
A |
The estimated |
coint_rank |
The estimated cointegration rank |
lag.k |
The time lag used in function. |
method |
A string indicating which method is used to identify the cointegration rank. |
References
Chang, J., Cheng, G., & Yao, Q. (2022). Testing for unit roots based on sample autocovariances. Biometrika, 109, 543–550. doi:10.1093/biomet/asab034.
Zhang, R., Robinson, P., & Yao, Q. (2019). Identifying cointegration by eigenanalysis. Journal of the American Statistical Association, 114, 916–927. doi:10.1080/01621459.2018.1458620.
Examples
# Example 1 (Example 1 in Zhang, Robinson and Yao (2019))
## Generate yt
p <- 10
n <- 1000
r <- 3
d <- 1
X <- mat.or.vec(p, n)
X[1,] <- arima.sim(n-d, model = list(order=c(0, d, 0)))
for(i in 2:3)X[i,] <- rnorm(n)
for(i in 4:(r+1)) X[i, ] <- arima.sim(model = list(ar = 0.5), n)
for(i in (r+2):p) X[i, ] <- arima.sim(n = (n-d), model = list(order=c(1, d, 1), ar=0.6, ma=0.8))
M1 <- matrix(c(1, 1, 0, 1/2, 0, 1, 0, 1, 0), ncol = 3, byrow = TRUE)
A <- matrix(runif(p*p, -3, 3), ncol = p)
A[1:3,1:3] <- M1
Y <- t(A%*%X)
Coint(Y, type = "both")
Generating simulated data for the example in Chang et al. (2024)
Description
DGP.CP()
function generates simulated data following the
data generating process described in Section 7.1 of Chang et al. (2024).
Usage
DGP.CP(n, p, q, d, d1, d2)
Arguments
n |
Integer. The number of observations of the |
p |
Integer. The number of rows of |
q |
Integer. The number of columns of |
d |
Integer. The number of columns of the factor loading matrices |
d1 |
Integer. The rank of the |
d2 |
Integer. The rank of the |
Details
We generate
{\bf{Y}}_t = {\bf A \bf X}_t{\bf B}' + {\boldsymbol{\epsilon}}_t
for any t=1, \ldots, n
, where {\bf X}_t = {\rm diag}({\bf x}_t)
with {\bf x}_t = (x_{t,1},\ldots,x_{t,d})'
being a d \times 1
time series,
{\boldsymbol{\epsilon}}_t
is a p \times q
matrix white noise,
and {\bf A}
and {\bf B}
are, respectively, p\times d
and
q \times d
factor loading matrices. \bf A
, {\bf X}_t
, and \bf B
are generated based on the data generating process described in Section 7.1 of
Chang et al. (2024) and satisfy {\rm rank}({\bf A})=d_1
and
{\rm rank}({\bf B})=d_2
, 1 \le d_1, d_2 \le d
.
Value
A list containing the following components:
Y |
An |
A |
The |
B |
The |
X |
An |
References
Chang, J., Du, Y., Huang, G., & Yao, Q. (2024). Identification and estimation for matrix time series CP-factor models. arXiv preprint. doi:10.48550/arXiv.2410.05634.
See Also
Examples
p <- 10
q <- 10
n <- 400
d = d1 = d2 <- 3
data <- DGP.CP(n,p,q,d1,d2,d)
Y <- data$Y
## The first observation: Y_1
Y[1, , ]
Factor analysis for vector time series
Description
Factors()
deals with factor modeling for high-dimensional
time series proposed in Lam and Yao (2012):
{\bf y}_t = {\bf Ax}_t +
{\boldsymbol{\epsilon}}_t,
where {\bf x}_t
is an r \times 1
latent process with (unknown) r \leq p
, {\bf A}
is a p
\times r
unknown constant matrix, and {\boldsymbol{\epsilon}}_t
is a
vector white noise process. The number of factors r
and the factor
loadings {\bf A}
can be estimated in terms of an eigenanalysis for a
nonnegative definite matrix, and is therefore applicable when the dimension
of {\bf y}_t
is on the order of a few thousands. This function aims to
estimate the number of factors r
and the factor loading matrix
{\bf A}
.
Usage
Factors(
Y,
lag.k = 5,
thresh = FALSE,
delta = 2 * sqrt(log(ncol(Y))/nrow(Y)),
twostep = FALSE
)
Arguments
Y |
An |
lag.k |
The time lag
where |
thresh |
Logical. If |
delta |
The value of the threshold level |
twostep |
Logical. If |
Details
The threshold operator T_\delta(\cdot)
is defined as
T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\}
for any matrix
{\bf W}=(w_{i,j})
, with the threshold level \delta \geq 0
and 1(\cdot)
representing the indicator function. We recommend to choose
\delta=0
when p
is fixed and \delta>0
when p \gg n
.
Value
An object of class "factors"
, which contains the following
components:
factor_num |
The estimated number of factors
|
loading.mat |
The estimated |
X |
The |
lag.k |
The time lag used in function. |
References
Lam, C., & Yao, Q. (2012). Factor modelling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40, 694–726. doi:10.1214/12-AOS970.
Examples
# Example 1 (Example in Section 3.3 of lam and Yao 2012)
## Generate y_t
p <- 200
n <- 400
r <- 3
X <- mat.or.vec(n, r)
A <- matrix(runif(p*r, -1, 1), ncol=r)
x1 <- arima.sim(model=list(ar=c(0.6)), n=n)
x2 <- arima.sim(model=list(ar=c(-0.5)), n=n)
x3 <- arima.sim(model=list(ar=c(0.3)), n=n)
eps <- matrix(rnorm(n*p), p, n)
X <- t(cbind(x1, x2, x3))
Y <- A %*% X + eps
Y <- t(Y)
fac <- Factors(Y,lag.k=2)
r_hat <- fac$factor_num
loading_Mat <- fac$loading.mat
Fama-French 10*10 return series
Description
The portfolios are constructed by the intersections of 10 levels
of size, denoted by {\rm S}_{1}, \ldots, {\rm S}_{10}
, and 10 levels of the book
equity to market equity ratio (BE), denoted by {\rm BE}_1, \ldots,{\rm BE}_{10}
.
The dataset consists of monthly returns from January 1964 to
December 2021, which contains 69600 observations for 696 total months.
Usage
data(FamaFrench)
Format
A data frame with 696 rows and 102 columns. The first column
represents the month, and the second column named
MKT.RF
represents the monthly market returns. The rest of the columns
represent the return series for different sizes and BE-ratios.
Source
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
Factor analysis with observed regressors for vector time series
Description
HDSReg()
considers a multivariate time series model which
represents a high-dimensional vector process as a sum of three terms: a
linear regression of some observed regressors, a linear combination of some
latent and serially correlated factors, and a vector white noise:
{\bf
y}_t = {\bf Dz}_t + {\bf Ax}_t + {\boldsymbol {\epsilon}}_t,
where {\bf
y}_t
and {\bf z}_t
are, respectively, observable p\times 1
and
m \times 1
time series, {\bf x}_t
is an r \times 1
latent
factor process, {\boldsymbol{\epsilon}}_t
is a vector white noise process,
{\bf D}
is an unknown regression coefficient matrix, and
{\bf A}
is an unknown factor loading matrix. This procedure proposed in
Chang, Guo and Yao (2015) aims to estimate the regression coefficient
matrix {\bf D}
, the number of factors r
and the factor loading
matrix {\bf A}
.
Usage
HDSReg(
Y,
Z,
D = NULL,
lag.k = 5,
thresh = FALSE,
delta = 2 * sqrt(log(ncol(Y))/nrow(Y)),
twostep = FALSE
)
Arguments
Y |
An |
Z |
An |
D |
A |
lag.k |
The time lag
where |
thresh |
Logical. If |
delta |
The value of the threshold level |
twostep |
Logical. The same as the argument |
Details
The threshold operator T_\delta(\cdot)
is defined as
T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\}
for any matrix
{\bf W}=(w_{i,j})
, with the threshold level \delta \geq 0
and 1(\cdot)
representing the indicator function. We recommend to choose
\delta=0
when p
is fixed and \delta>0
when p \gg n
.
Value
An object of class "factors"
, which contains the following
components:
factor_num |
The estimated number of factors |
reg.coff.mat |
The estimated |
loading.mat |
The estimated |
X |
The |
lag.k |
The time lag used in function. |
References
Chang, J., Guo, B., & Yao, Q. (2015). High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. Journal of Econometrics, 189, 297–312. doi:10.1016/j.jeconom.2015.03.024.
See Also
Examples
# Example 1 (Example 1 in Chang, Guo and Yao (2015)).
## Generate xt
n <- 400
p <- 200
m <- 2
r <- 3
X <- mat.or.vec(n,r)
x1 <- arima.sim(model = list(ar = c(0.6)), n = n)
x2 <- arima.sim(model = list(ar = c(-0.5)), n = n)
x3 <- arima.sim(model = list(ar = c(0.3)), n = n)
X <- cbind(x1, x2, x3)
X <- t(X)
## Generate yt
Z <- mat.or.vec(m,n)
S1 <- matrix(c(5/8, 1/8, 1/8, 5/8), 2, 2)
Z[,1] <- c(rnorm(m))
for(i in c(2:n)){
Z[,i] <- S1%*%Z[, i-1] + c(rnorm(m))
}
D <- matrix(runif(p*m, -2, 2), ncol = m)
A <- matrix(runif(p*r, -2, 2), ncol = r)
eps <- mat.or.vec(n, p)
eps <- matrix(rnorm(n*p), p, n)
Y <- D %*% Z + A %*% X + eps
Y <- t(Y)
Z <- t(Z)
## D is known
res1 <- HDSReg(Y, Z, D, lag.k = 2)
## D is unknown
res2 <- HDSReg(Y, Z, lag.k = 2)
U.S. Industrial Production indices
Description
The dataset consists of 7 monthly U.S. Industrial Production indices, namely the total index, nonindustrial supplies, final products, manufacturing, materials, mining, and utilities, from January 1947 to December 2023 published by the U.S. Federal Reserve.
Usage
data(IPindices)
Format
A data frame with 924 rows and 8 variables:
- DATE
The observation date
- INDPRO
The total index
- IPB54000S
Nonindustrial supplies
- IPFINAL
Final products
- IPMANSICS
Manufacturing
- IPMAT
Materials
- IPMINE
Mining
- IPUTIL
Utilities
Source
https://fred.stlouisfed.org/release/tables?rid=13&eid=49670
Testing for martingale difference hypothesis in high dimension
Description
MartG_test()
implements a new test proposed in
Chang, Jiang and Shao (2023) for the following hypothesis testing problem:
H_0:\{{\bf y}_t\}_{t=1}^n\mathrm{\ is\ a\ MDS\ \ versus\ \ }H_1:
\{{\bf y}_t\}_{t=1}^n\mathrm{\ is\ not\ a\ MDS}\,,
where MDS is the abbreviation of "martingale difference sequence".
Usage
MartG_test(
Y,
lag.k = 2,
B = 1000,
type = c("Linear", "Quad"),
alpha = 0.05,
kernel.type = c("QS", "Par", "Bart")
)
Arguments
Y |
An |
lag.k |
The time lag |
B |
The number of bootstrap replications for generating multivariate normally distributed random vectors when calculating the critical value. The default is 1000. |
type |
The map used for constructing the test statistic.
Available options include: |
alpha |
The significance level of the test. The default is 0.05. |
kernel.type |
The option for choosing the symmetric kernel
used in the estimation of long-run covariance matrix.
Available options include: |
Details
Write {\bf x}= (x_1,\ldots,x_p)'
.
When type = "Linear"
, the linear identity map is defined
as \boldsymbol \phi({\bf x})={\bf x}
.
When type = "Quad"
,
\boldsymbol \phi({\bf x})=\{{\bf x}',({\bf x}^2)'\}'
includes both linear and quadratic terms, where
{\bf x}^2 = (x_1^2,\ldots,x_p^2)'
.
We can also choose \boldsymbol \phi({\bf x}) = \cos({\bf x})
to capture
certain type of nonlinear dependence, where
\cos({\bf x}) = (\cos x_1,\ldots,\cos x_p)'
.
See 'Examples'.
Value
An object of class "hdtstest"
, which contains the following
components:
statistic |
The test statistic of the test. |
p.value |
The p-value of the test. |
lag.k |
The time lag used in function. |
type |
The map used in function. |
kernel.type |
The kernel used in function. |
References
Chang, J., Jiang, Q., & Shao, X. (2023). Testing the martingale difference hypothesis in high dimension. Journal of Econometrics, 235, 972–1000. doi:10.1016/j.jeconom.2022.09.001.
Examples
# Example 1
n <- 200
p <- 10
X <- matrix(rnorm(n*p),n,p)
res <- MartG_test(X, type="Linear")
res <- MartG_test(X, type=cbind(X, X^2)) #the same as type = "Quad"
## map can also be defined as an expression in R.
res <- MartG_test(X, type=quote(cbind(X, X^2))) # expr using quote()
res <- MartG_test(X, type=substitute(cbind(X, X^2))) # expr using substitute()
res <- MartG_test(X, type=expression(cbind(X, X^2))) # expr using expression()
res <- MartG_test(X, type=parse(text="cbind(X, X^2)")) # expr using parse()
## map can also be defined as a function in R.
map_fun <- function(X) {X <- cbind(X, X^2); X}
res <- MartG_test(X, type=map_fun)
Pvalue <- res$p.value
rej <- res$reject
Principal component analysis for vector time series
Description
PCA_TS()
seeks for a contemporaneous linear
transformation for a multivariate time series such that the transformed
series is segmented into several lower-dimensional subseries:
{\bf
y}_t={\bf Ax}_t,
where {\bf x}_t
is an unobservable p \times 1
weakly stationary time series consisting of q\ (\geq 1)
both
contemporaneously and serially uncorrelated subseries. See Chang, Guo and
Yao (2018).
Usage
PCA_TS(
Y,
lag.k = 5,
opt = 1,
permutation = c("max", "fdr"),
thresh = FALSE,
delta = 2 * sqrt(log(ncol(Y))/nrow(Y)),
prewhiten = TRUE,
m = NULL,
beta,
control = list()
)
Arguments
Y |
An |
lag.k |
The time lag
where |
opt |
An option used to choose which method will be implemented to get a
consistent estimate |
permutation |
The method of permutation procedure to assign the
components of |
thresh |
Logical. If |
delta |
The value of the threshold level |
prewhiten |
Logical. If |
m |
A positive integer used in the permutation procedure [See (2.10) in Chang, Guo and Yao (2018)]. The default is 10. |
beta |
The error rate used in the permutation procedure[See (2.16) in
Chang, Guo and Yao (2018)] when |
control |
A list of control arguments. See ‘Details’. |
Details
The threshold operator T_\delta(\cdot)
is defined as
T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\}
for any matrix
{\bf W}=(w_{i,j})
, with the threshold level \delta \geq 0
and 1(\cdot)
representing the indicator function. We recommend to choose
\delta=0
when p
is fixed and \delta>0
when p \gg n
.
For large p
, since the sample covariance matrix may not be consistent,
we recommend to use the method proposed
in Cai, Liu and Luo (2011) to estimate the precision matrix
\hat{{\bf V}}^{-1}
(opt = 2
).
control
is a list of arguments passed to the function clime()
,
which contains the following components:
-
nlambda
: Number of values for program generated lambda. The default is 100. -
lambda.max
: Maximum value of program generated lambda. The default is 0.8. -
lambda.min
: Minimum value of program generated lambda. The default is10^{-4}
(n>p
) or10^{-2}
(n<p
). -
standardize
: Logical. Ifstandardize = TRUE
, the variables will be standardized to have mean zero and unit standard deviation. The default isFALSE
. -
linsolver
: An option used to choose which method should be employed. Available options include"primaldual"
(the default) and"simplex"
. Rule of thumb:"primaldual"
for largep
,"simplex"
for smallp
.
Value
An object of class "tspca"
, which contains the following
components:
B |
The |
X |
The |
NoGroups |
The number of groups. |
No_of_Members |
The number of members in each group. |
Groups |
The indices of the components of |
method |
A string indicating which permutation procedure is performed. |
References
Cai, T., Liu, W., & Luo, X. (2011). A constrained L1 minimization approach for sparse precision matrix estimation. Journal of the American Statistical Association, 106, 594–607. doi:10.1198/jasa.2011.tm10155.
Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46, 2094–2124. doi:10.1214/17-AOS1613.
Examples
# Example 1 (Example 1 in the supplementary material of Chang, Guo and Yao (2018)).
# p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components.
## Generate x_t
p <- 6;n <- 1500
X <- mat.or.vec(p,n)
x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2,1.3)),
n = n+2, sd = 1)
for(i in 1:3) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model = list(ar = c(0.8,-0.5),ma = c(1,0.8,1.8) ), n = n+1, sd = 1)
for(i in 4:5) X[i,] <- x[(i-3):(n+i-4)]
x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)), n = n, sd = 1)
X[6,] <- x
## Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol = p)
Y <- A%*%X
Y <- t(Y)
## permutation = "max" or permutation = "fdr"
res <- PCA_TS(Y, lag.k = 5,permutation = "max")
res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr", beta = 10^(-10))
Z <- res$X
# Example 2 (Example 2 in the supplementary material of Chang, Guo and Yao (2018)).
# p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components.
## Generate x_t
p <- 20;n <- 3000
X <- mat.or.vec(p,n)
x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2, 1.3)),
n.start = 500, n = n+5, sd = 1)
for(i in 1:6) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model = list(ar = c(-0.4, 0.5), ma = c(1, 0.8, 1.5, 1.8)),
n.start = 500, n = n+4, sd = 1)
for(i in 7:11) X[i,] <- x[(i-6):(n+i-7)]
x <- arima.sim(model = list(ar = c(0.85,-0.3), ma=c(1, 0.5, 1.2)),
n.start = 500, n = n+3,sd = 1)
for(i in 12:15) X[i,] <- x[(i-11):(n+i-12)]
x <- arima.sim(model = list(ar = c(0.8, -0.5),ma = c(1, 0.8, 1.8)),
n.start = 500, n = n+2,sd = 1)
for(i in 16:18) X[i,] <- x[(i-15):(n+i-16)]
x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)),
n.start = 500,n = n+1,sd = 1)
for(i in 19:20) X[i,] <- x[(i-18):(n+i-19)]
## Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol =p)
Y <- A%*%X
Y <- t(Y)
## permutation = "max" or permutation = "fdr"
res <- PCA_TS(Y, lag.k = 5,permutation = "max")
res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr",beta = 10^(-200))
Z <- res$X
The national QWI hires data
Description
The data on new hires at a national level are obtained from the Quarterly Workforce Indicators (QWI) of the Longitudinal Employer-Household Dynamics program at the U.S. Census Bureau (Abowd et al., 2009). The national QWI hires data covers a variable number of years, with some states providing time series going back to 1990 (e.g., Washington), and others (e.g., Massachusetts) only commencing at 2010. For each of 51 states (excluding D.C. but including Puerto Rico) there is a new hires time series for each county. Additional description of the data, along with its relevancy to labor economics, can be found in Hyatt and McElroy (2019).
Usage
data(QWIdata)
Format
A list with 51 elements. Every element contains a multivariate time series.
Source
https://qwiexplorer.ces.census.gov/
https://ledextract.ces.census.gov/qwi/all
References
Abowd, J. M., Stephens, B. E., Vilhuber, L., Andersson, F., McKinney, K. L., Roemer, M., and Woodcock, S. (2009). The LEHD infrastructure files and the creation of the quarterly workforce indicators. In Producer dynamics: New evidence from micro data, pages 149–230. University of Chicago Press. doi:10.7208/chicago/9780226172576.003.0006.
Hyatt, H. R. and McElroy, T. S. (2019). Labor reallocation, employment, and earnings: Vector autoregression evidence. Labour, 33, 463–487. doi:10.1111/labr.12153
Multiple testing with FDR control for spectral density matrix
Description
SpecMulTest()
implements a new multiple testing procedure proposed in
Chang et al. (2022) for the following Q
hypothesis testing problems:
H_{0,q}:f_{i,j}(\omega)=0\mathrm{\ for\ any\ }(i,j)\in\mathcal{I}^{(q)}\mathrm{\ and\ }
\omega\in\mathcal{J}^{(q)}\mathrm{\ \ versus\ \ }
H_{1,q}:H_{0,q}\mathrm{\ is\ not\ true}
for q=1,\dots,Q
.
Here, f_{i,j}(\omega)
represents the cross-spectral density between
x_{t,i}
and x_{t,j}
at frequency \omega
with x_{t,i}
being
the i
-th component of the p \times 1
times series {\bf x}_t
,
and \mathcal{I}^{(q)}
and \mathcal{J}^{(q)}
refer to
the set of index pairs and the set of frequencies associated with the
q
-th test, respectively.
Usage
SpecMulTest(Q, PVal, alpha = 0.05, seq_len = 0.01)
Arguments
Q |
The number of the hypothesis tests. |
PVal |
A vector of length |
alpha |
The prescribed level for the FDR control. The default is 0.05. |
seq_len |
The step size for generating a sequence from 0 to
|
Value
An object of class "hdtstest"
, which contains the following
component:
MultiTest |
A logical vector of length |
References
Chang, J., Jiang, Q., McElroy, T. S., & Shao, X. (2022). Statistical inference for high-dimensional spectral density matrix. arXiv preprint. doi:10.48550/arXiv.2212.13686.
See Also
Examples
# Example 1
## Generate xt
n <- 200
p <- 10
flag_c <- 0.8
B <- 1000
burn <- 1000
z.sim <- matrix(rnorm((n+burn)*p),p,n+burn)
phi.mat <- 0.4*diag(p)
x.sim <- phi.mat %*% z.sim[,(burn+1):(burn+n)]
x <- x.sim - rowMeans(x.sim)
Q <- 4
## Generate the sets Iq and Jq
ISET <- list()
ISET[[1]] <- matrix(c(1,2),ncol=2)
ISET[[2]] <- matrix(c(1,3),ncol=2)
ISET[[3]] <- matrix(c(1,4),ncol=2)
ISET[[4]] <- matrix(c(1,5),ncol=2)
JSET <- as.list(2*pi*seq(0,3)/4 - pi)
## Calculate Q p-values
PVal <- rep(NA,Q)
for (q in 1:Q) {
cross.indices <- ISET[[q]]
J.set <- JSET[[q]]
temp.q <- SpecTest(t(x), J.set, cross.indices, B, flag_c)
PVal[q] <- temp.q$p.value
}
res <- SpecMulTest(Q, PVal)
res
Global testing for spectral density matrix
Description
SpecTest()
implements a new global test proposed in
Chang et al. (2022) for the following hypothesis testing problem:
H_0:f_{i,j}(\omega)=0 \mathrm{\ for\ any\ }(i,j)\in \mathcal{I}\mathrm{\ and\ }
\omega \in \mathcal{J}\mathrm{\ \ versus\ \ }H_1:H_0\mathrm{\ is\ not\ true }\,,
where f_{i,j}(\omega)
represents the cross-spectral density between
x_{t,i}
and x_{t,j}
at frequency \omega
with x_{t,i}
being
the i
-th component of the p \times 1
times series {\bf x}_t
.
Here, \mathcal{I}
is the set of index pairs of interest, and
\mathcal{J}
is the set of frequencies.
Usage
SpecTest(X, J.set, cross.indices, B = 1000, flag_c = 0.8)
Arguments
X |
An |
J.set |
A vector representing the set |
cross.indices |
An |
B |
The number of bootstrap replications for generating multivariate normally distributed random vectors when calculating the critical value. The default is 2000. |
flag_c |
The bandwidth |
Value
An object of class "hdtstest"
, which contains the following
components:
Stat |
The test statistic of the test. |
pval |
The p-value of the test. |
cri95 |
The critical value of the test at the significance level 0.05. |
References
Chang, J., Jiang, Q., McElroy, T. S., & Shao, X. (2022). Statistical inference for high-dimensional spectral density matrix. arXiv preprint. doi:10.48550/arXiv.2212.13686.
See Also
Examples
# Example 1
## Generate xt
n <- 200
p <- 10
flag_c <- 0.8
B <- 1000
burn <- 1000
z.sim <- matrix(rnorm((n+burn)*p),p,n+burn)
phi.mat <- 0.4*diag(p)
x.sim <- phi.mat %*% z.sim[,(burn+1):(burn+n)]
x <- x.sim - rowMeans(x.sim)
## Generate the sets I and J
cross.indices <- matrix(c(1,2), ncol=2)
J.set <- 2*pi*seq(0,3)/4 - pi
res <- SpecTest(t(x), J.set, cross.indices, B, flag_c)
Stat <- res$statistic
Pvalue <- res$p.value
CriVal <- res$cri95
Testing for unit roots based on sample autocovariances
Description
This function implements the test proposed in Chang, Cheng and Yao (2022) for the following hypothesis testing problem:
H_0:Y_t \sim I(0)\ \ \mathrm{versus}\ \ H_1:Y_t \sim
I(d)\ \mathrm{for\ some\ integer\ }d \geq 1\,,
where Y_t
is
a univariate time series.
Usage
UR_test(Y, lagk.vec = NULL, con_vec = NULL, alpha = 0.05)
Arguments
Y |
A vector |
lagk.vec |
The time lag |
con_vec |
The constant |
alpha |
The significance level of the test. The default is 0.05. |
Value
An object of class "urtest"
, which contains the following
components:
statistic |
A |
reject |
An |
lag.vec |
The time lags used in function. |
References
Chang, J., Cheng, G., & Yao, Q. (2022). Testing for unit roots based on sample autocovariances. Biometrika, 109, 543–550. doi:10.1093/biomet/asab034.
Examples
# Example 1
## Generate yt
N <- 100
Y <-arima.sim(list(ar = c(0.9)), n = 2*N, sd = sqrt(1))
con_vec <- c(0.45, 0.55, 0.65)
lagk.vec <- c(0, 1, 2)
UR_test(Y, lagk.vec = lagk.vec, con_vec = con_vec, alpha = 0.05)
UR_test(Y, alpha = 0.05)
Testing for white noise hypothesis in high dimension
Description
WN_test()
implements the test proposed in Chang, Yao and Zhou
(2017) for the following hypothesis testing problem:
H_0:\{{\bf y}_t
\}_{t=1}^n\mathrm{\ is\ white\ noise\ \ versus\ \ }H_1:\{{\bf y}_t
\}_{t=1}^n\mathrm{\ is\ not\ white\ noise.}
Usage
WN_test(
Y,
lag.k = 2,
B = 1000,
kernel.type = c("QS", "Par", "Bart"),
pre = FALSE,
alpha = 0.05,
control.PCA = list()
)
Arguments
Y |
An |
lag.k |
The time lag |
B |
The number of bootstrap replications for generating multivariate normally distributed random vectors when calculating the critical value. The default is 1000. |
kernel.type |
The option for choosing the symmetric kernel used
in the estimation of long-run covariance matrix. Available options include:
|
pre |
Logical. If |
alpha |
The significance level of the test. The default is 0.05. |
control.PCA |
A list of control arguments passed to the function
|
Value
An object of class "hdtstest"
, which contains the following
components:
statistic |
The test statistic of the test. |
p.value |
The p-value of the test. |
lag.k |
The time lag used in function. |
kernel.type |
The kernel used in function. |
References
Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46, 2094–2124. doi:10.1214/17-AOS1613.
Chang, J., Yao, Q., & Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika, 104, 111–127. doi:10.1093/biomet/asw066.
See Also
Examples
#Example 1
## Generate xt
n <- 200
p <- 10
Y <- matrix(rnorm(n * p), n, p)
res <- WN_test(Y)
Pvalue <- res$p.value
rej <- res$reject
Make predictions from a "factors"
object
Description
This function makes predictions from a "factors"
object.
Usage
## S3 method for class 'factors'
predict(
object,
newdata = NULL,
n.ahead = 10,
control_ARIMA = list(),
control_VAR = list(),
...
)
Arguments
object |
An object of class |
newdata |
Optional. A new data matrix to predict from. |
n.ahead |
An integer specifying the number of steps ahead for prediction. |
control_ARIMA |
A list of arguments passed to the function
|
control_VAR |
A list of arguments passed to the function
|
... |
Currently not used. |
Details
Forecasting for {\bf y}_t
can be implemented in two steps:
Step 1. Get the h
-step ahead forecast of the \hat{r} \times 1
time series \hat{\bf x}_t
[See Factors
], denoted by
\hat{\bf x}_{n+h}
, using a VAR model
(if \hat{r} > 1
) or an ARIMA model (if \hat{r} = 1
). The orders
of VAR and ARIMA models are determined by AIC by default. Otherwise, they
can also be specified by users through the arguments control_VAR
and control_ARIMA
, respectively.
Step 2. The forecasted value for {\bf y}_t
is obtained by
\hat{\bf y}_{n+h}= \hat{\bf A}\hat{\bf x}_{n+h}
.
Value
ts_pred |
A matrix of predicted values. |
See Also
Examples
library(HDTSA)
data(FamaFrench, package = "HDTSA")
## Remove the market effects
reg <- lm(as.matrix(FamaFrench[, -c(1:2)]) ~ as.matrix(FamaFrench$MKT.RF))
Y_2d = reg$residuals
res_factors <- Factors(Y_2d, lag.k = 5)
pred_fac_Y <- predict(res_factors, n.ahead = 1)
Make predictions from a "mtscp"
object
Description
This function makes predictions from a "mtscp"
object.
Usage
## S3 method for class 'mtscp'
predict(
object,
newdata = NULL,
n.ahead = 10,
control_ARIMA = list(),
control_VAR = list(),
...
)
Arguments
object |
An object of class |
newdata |
Optional. A new data matrix to predict from. |
n.ahead |
An integer specifying the number of steps ahead for prediction. |
control_ARIMA |
A list of arguments passed to the function
|
control_VAR |
A list of arguments passed to the function
|
... |
Currently not used. |
Details
Forecasting for {\bf y}_t
can be implemented in two steps:
Step 1. Get the h
-step ahead forecast of the \hat{d} \times 1
time series \hat{\bf x}_t=(\hat{x}_{t,1},\ldots,\hat{x}_{t,\hat{d}})'
[See CP_MTS
], denoted by \hat{\bf x}_{n+h}
, using a VAR model
(if \hat{d} > 1
) or an ARIMA model
(if \hat{d} = 1
). The orders of VAR and ARIMA models are determined by
AIC by default. Otherwise, they can also be specified by users through the
arguments control_VAR
and control_ARIMA
, respectively.
Step 2. The forecasted value for {\bf Y}_t
is obtained by
\hat{\bf Y}_{n+h}= \hat{\bf A} \hat{\bf X}_{n+h} \hat{\bf B}'
with
\hat{\bf X}_{n+h} = {\rm diag}(\hat{\bf x}_{n+h})
.
Value
Y_pred |
A list of length |
See Also
Examples
library(HDTSA)
data(FamaFrench, package = "HDTSA")
## Remove the market effects
reg <- lm(as.matrix(FamaFrench[, -c(1:2)]) ~ as.matrix(FamaFrench$MKT.RF))
Y_2d = reg$residuals
## Rearrange Y_2d into a 3-dimensional array Y
Y = array(NA, dim = c(NROW(Y_2d), 10, 10))
for (tt in 1:NROW(Y_2d)) {
for (ii in 1:10) {
Y[tt, ii, ] <- Y_2d[tt, (1 + 10*(ii - 1)):(10 * ii)]
}
}
res_cp <- CP_MTS(Y, lag.k = 20, method = "CP.Refined")
pred_cp_Y <- predict(res_cp, n.ahead = 1)[[1]]
Make predictions from a "tspca"
object
Description
This function makes predictions from a "tspca"
object.
Usage
## S3 method for class 'tspca'
predict(
object,
newdata = NULL,
n.ahead = 10,
control_ARIMA = list(),
control_VAR = list(),
...
)
Arguments
object |
An object of class |
newdata |
Optional. A new data matrix to predict from. |
n.ahead |
An integer specifying the number of steps ahead for prediction. |
control_ARIMA |
A list of arguments passed to the function
|
control_VAR |
A list of arguments passed to the function
|
... |
Currently not used. |
Details
Having obtained \hat{\bf x}_t
using the PCA_TS
function, which is
segmented into q
uncorrelated subseries
\hat{\bf x}_t^{(1)},\ldots,\hat{\bf x}_t^{(q)}
, the forecasting for {\bf y}_t
can be performed in two steps:
Step 1. Get the h
-step ahead forecast \hat{\bf x}_{n+h}^{(j)}
(j=1,\ldots,q)
by using a VAR model (if the dimension of \hat{\bf x}_t^{(j)}
is larger than 1)
or an ARIMA model (if the dimension of \hat{\bf x}_t^{(j)}
is 1). The orders
of VAR and ARIMA models are determined by AIC by default. Otherwise, they
can also be specified by users through the arguments control_VAR
and control_ARIMA
, respectively.
Step 2. Let \hat{\bf x}_{n+h} = (\{\hat{\bf x}_{n+h}^{(1)}\}',\ldots,\{\hat{\bf x}_{n+h}^{(q)}\}')'
.
The forecasted value for {\bf y}_t
is obtained by
\hat{\bf y}_{n+h}= \hat{\bf B}^{-1}\hat{\bf x}_{n+h}
.
Value
Y_pred |
A matrix of predicted values. |
See Also
Examples
library(HDTSA)
data(FamaFrench, package = "HDTSA")
## Remove the market effects
reg <- lm(as.matrix(FamaFrench[, -c(1:2)]) ~ as.matrix(FamaFrench$MKT.RF))
Y_2d = reg$residuals
res_pca <- PCA_TS(Y_2d, lag.k = 5, thresh = TRUE)
pred_pca_Y <- predict(res_pca, n.ahead = 1)