Seminar on Probability and Statistics


Organizer(s) Nakahiro Yoshida, Teppei Ogihara, Yuta Koike


11:00-12:10   Room #052 (Graduate School of Math. Sci. Bldg.)
Xiaohui Chen (University of Illinois at Urbana–Champaign)
Gaussian and bootstrap approximations of high-dimensional U-statistics with applications and extensions (subject to change)

[ Abstract ]
We shall first discuss the Gaussian approximation of high-dimensional, non-degenerate U-statistics of order two under the supremum norm. We propose a two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution. Under mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective on the empirical, the randomly reweighted, and the multiplier bootstraps as randomly reweighted quadratic forms, all asymptotically valid and inferentially first-order equivalent in high dimensions.
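
For orientation, the objects named above can be written out as follows. This is standard textbook notation chosen here for illustration (the symbols h, g, theta, Z and rho_n and the scaling by sqrt(n) are assumptions, not taken from the talk), and it shows one common way to formalize a Gaussian approximation for the coordinatewise maximum.

```latex
% i.i.d. observations X_1, ..., X_n in R^p and a symmetric kernel h : R^p x R^p -> R^d
U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h(X_i, X_j),
\qquad
\theta = \mathbb{E}\, h(X_1, X_2).

% Hajek projection; non-degeneracy means the coordinates of g(X_1) have positive variance
g(x) = \mathbb{E}\, h(x, X_2) - \theta,
\qquad
U_n - \theta = \frac{2}{n} \sum_{i=1}^{n} g(X_i) + R_n .

% Gaussian approximation of the maximum, with Z \sim N(0, 4\,\mathrm{Cov}(g(X_1)))
\rho_n = \sup_{t \in \mathbb{R}}
\left|
\mathbb{P}\Big( \max_{1 \le k \le d} \sqrt{n}\,(U_{n,k} - \theta_k) \le t \Big)
-
\mathbb{P}\Big( \max_{1 \le k \le d} Z_k \le t \Big)
\right| .
```

Here R_n is the degenerate remainder of the Hoeffding decomposition, and a rate of convergence in this sense is a bound on rho_n that stays small even when d is much larger than n.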

The bootstrap methods are applied to statistical problems for high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data, and that these rates are less conservative than those obtained by the Bonferroni technique, which ignores the dependency in the underlying data distribution. We also show that, even for subgaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.
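
As a rough illustration of application (i), the sketch below writes the sample covariance matrix as an order-two U-statistic with kernel h(x, y) = (x - y)(x - y)^T / 2 and picks a hard threshold from a Gaussian multiplier bootstrap of the estimated Hájek projection terms. It is a simplified stand-in under stated assumptions, not the procedure analyzed in the talk: the function names, the level `alpha`, the number of bootstrap draws `n_boot`, and the choice to leave the diagonal unthresholded are all hypothetical.

```python
import numpy as np


def covariance_u_statistic(X):
    """Sample covariance written as an order-two U-statistic.

    Uses the kernel h(x, y) = (x - y)(x - y)^T / 2 and returns the
    U-statistic together with centered leave-one-out kernel averages,
    which serve as estimates of the Hajek projection terms.
    """
    n, p = X.shape
    diffs = X[:, None, :] - X[None, :, :]                # shape (n, n, p)
    H = 0.5 * np.einsum("ijk,ijl->ijkl", diffs, diffs)   # h(X_i, X_j), shape (n, n, p, p)
    U = H.sum(axis=(0, 1)) / (n * (n - 1))               # terms with i == j are zero
    g_hat = H.sum(axis=1) / (n - 1) - U                  # shape (n, p, p)
    return U, g_hat


def multiplier_bootstrap_threshold(g_hat, alpha=0.05, n_boot=500, rng=None):
    """(1 - alpha) bootstrap quantile of the max-norm fluctuation, on the scale of U."""
    rng = np.random.default_rng() if rng is None else rng
    n = g_hat.shape[0]
    E = rng.standard_normal((n_boot, n))                 # Gaussian multipliers
    boot = (2.0 / np.sqrt(n)) * np.einsum("bi,ikl->bkl", E, g_hat)
    max_stats = np.abs(boot).max(axis=(1, 2))            # one maximum per bootstrap draw
    return np.quantile(max_stats, 1.0 - alpha) / np.sqrt(n)


def hard_threshold(U, tau):
    """Zero out entries with magnitude at most tau, keeping the diagonal."""
    S = np.where(np.abs(U) > tau, U, 0.0)
    np.fill_diagonal(S, np.diag(U))
    return S


# Heavy-tailed toy data with independent coordinates (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_t(df=5, size=(60, 5))
U, g_hat = covariance_u_statistic(X)
tau = multiplier_bootstrap_threshold(g_hat, rng=rng)
S_thr = hard_threshold(U, tau)
```

The threshold `tau` is computed from the data rather than from a universal constant, which is the sense in which the tuning parameter is data-dependent. The dense (n, n, p, p) array keeps the sketch short but is only practical for small n and p.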

Time permitting, we will discuss some extensions to the infinite-dimensional setting (i.e., U-processes of increasing complexity) and to randomized inference via incomplete U-statistics, whose computational cost can be made independent of the order.
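
To make the last point concrete, here is a minimal sketch of an incomplete U-statistic: instead of averaging the kernel over all index tuples of a given order, it averages over a fixed number of randomly drawn tuples, so the number of kernel evaluations is set by the user rather than by the order. The sampling scheme (independent draws of tuples), the function names, and the parameter `n_tuples` are assumptions made for illustration and need not match the design discussed in the talk.

```python
import numpy as np


def incomplete_u_statistic(X, kernel, order, n_tuples, rng=None):
    """Average `kernel` over `n_tuples` randomly drawn index tuples.

    Each tuple holds `order` distinct observations. Tuples are drawn
    independently, so the same tuple may appear more than once.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    total = None
    for _ in range(n_tuples):
        idx = rng.choice(n, size=order, replace=False)   # distinct indices within a tuple
        value = np.asarray(kernel(X[idx]), dtype=float)
        total = value if total is None else total + value
    return total / n_tuples


def covariance_kernel(points):
    """Order-two kernel h(x, y) = (x - y)(x - y)^T / 2."""
    d = points[0] - points[1]
    return 0.5 * np.outer(d, d)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
# 500 kernel evaluations instead of the 19900 pairs a complete U-statistic would use.
V = incomplete_u_statistic(X, covariance_kernel, order=2, n_tuples=500, rng=rng)
```

The same function accepts kernels of higher order by changing `order`, and the loop length stays at `n_tuples`. The price is extra variance from the random choice of tuples.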