Solve Linear LHSC and Kernel LHSC
lhsc.Rd
Fit the LHSC in the input space and in a reproducing kernel Hilbert space. The solution path is computed at a grid of values of the tuning parameter lambda.
Arguments
- x
A numerical matrix with \(N\) rows and \(p\) columns for predictors.
- y
A vector of length \(N\) for binary responses. Each element of y is either -1 or 1.
- kern
A kernel function; see dots.
- lambda
A user-supplied lambda sequence.
- eps
The algorithm stops when \(|\beta^{old} - \beta^{new}|\) is less than eps. Default value is 1e-5.
- maxit
The maximum number of iterations allowed. Default is 1e5.
Details
The leaky hockey stick loss is \(V(u)=1-u\) if \(u \le 1\) and \(V(u)=-\log u\) if \(u > 1\). The value of \(\lambda\), i.e., lambda, is user-specified.
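The piecewise loss above is simple to write down directly. The following base-R sketch is only an illustration of the formula; the function name leaky_hockey_stick is not part of the package:

```r
# Leaky hockey stick loss: V(u) = 1 - u for u <= 1, -log(u) for u > 1
# (illustrative sketch; not a function exported by the lhsc package)
leaky_hockey_stick <- function(u) {
  ifelse(u <= 1, 1 - u, -log(u))
}

leaky_hockey_stick(c(0, 1, exp(1)))  # 1, 0, -1
```

Note the loss decreases without bound as \(u \to \infty\), which is what "negatively divergent" refers to in the reference below.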
In the linear case (kern is the inner product and N > p), lhsc fits a linear LHSC by minimizing the L2-penalized leaky hockey stick loss function,
$$L(\beta_0,\beta) := \frac{1}{N}\sum_{i=1}^N V(y_i(\beta_0 + X_i'\beta)) + \lambda \beta' \beta.$$
If a linear LHSC is requested when N < p, a kernel LHSC with the linear kernel is solved instead. In that case, the coefficient vector \(\beta\) can be recovered from \(\beta = X'\alpha\).
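The identity behind this reduction is that with the linear kernel \(K = XX'\), the two parameterizations produce the same fitted values: \(K\alpha = X(X'\alpha) = X\beta\). A quick base-R check (alpha here is an arbitrary placeholder, since the real coefficients come from the solver):

```r
set.seed(1)
N <- 5; p <- 8                       # the N < p case
X <- matrix(rnorm(N * p), N, p)
K <- X %*% t(X)                      # linear-kernel matrix, N x N
alpha <- rnorm(N)                    # placeholder kernel coefficients
beta <- drop(t(X) %*% alpha)         # recovered linear coefficients

# both parameterizations give identical fitted values
all.equal(drop(K %*% alpha), drop(X %*% beta))  # TRUE
```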
In the kernel case, lhsc fits a kernel LHSC by minimizing
$$L(\alpha_0,\alpha) := \frac{1}{N}\sum_{i=1}^N V(y_i(\alpha_0 + K_i' \alpha)) + \lambda \alpha' K \alpha,$$
where \(K\) is the kernel matrix and \(K_i\) is its \(i\)th row.
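For concreteness, the kernel objective can be evaluated directly from its definition. This is only a sketch; V and kernel_obj are illustrative names, not package functions:

```r
# leaky hockey stick loss, as defined in Details
V <- function(u) ifelse(u <= 1, 1 - u, -log(u))

# evaluate L(alpha0, alpha) for kernel matrix K, labels y in {-1, 1}
kernel_obj <- function(alpha0, alpha, K, y, lambda) {
  margins <- y * (alpha0 + drop(K %*% alpha))
  mean(V(margins)) + lambda * drop(crossprod(alpha, K %*% alpha))
}

set.seed(2)
N <- 6
X <- matrix(rnorm(N * 2), N, 2)
K <- X %*% t(X)                   # linear kernel, just for illustration
y <- rep(c(-1, 1), each = 3)
# at alpha0 = 0, alpha = 0 every margin is 0, so V = 1 and the
# penalty vanishes, giving an objective value of exactly 1
kernel_obj(0, rep(0, N), K, y, lambda = 0.1)  # 1
```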
Value
An object with S3 class lhsc.
- alpha
A matrix of LHSC coefficients at each lambda value. The dimension is (p+1) * length(lambda) in the linear case and (N+1) * length(lambda) in the kernel case.
- lambda
The lambda sequence.
- npass
The total number of FISTA iterations over all lambda values.
- jerr
Warnings and errors; 0 if none.
- info
A list including the parameters of the loss function, eps, maxit, kern, and wt if a weight vector was used.
- call
The call that produced this object.
Author
Oh-ran Kwon and Hui Zou
Maintainer: Oh-ran Kwon <kwon0085@umn.edu>
References
Kwon, O. and Zou, H. (2023+)
"Leaky Hockey Stick Loss: The First Negatively Divergent Margin-based Loss Function for Classification".
See also
predict.lhsc
, plot.lhsc
, and cv.lhsc
.
Examples
data(BUPA)
# standardize the predictors
BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
# a grid of tuning parameters
lambda = 10^(seq(3, -3, length.out=10))
# fit a linear LHSC
kern = vanilladot()
LHSC_linear = lhsc(BUPA$X, BUPA$y, kern,
                   lambda=lambda, eps=1e-5, maxit=1e5)
# fit a kernel LHSC using the Gaussian kernel
kern = rbfdot(sigma=1)
LHSC_Gaussian = lhsc(BUPA$X, BUPA$y, kern,
                     lambda=lambda, eps=1e-5, maxit=1e5)