Solve Linear LHSC and Kernel LHSC
lhsc.Rd
Fit the LHSC in the input space and in a reproducing kernel Hilbert space. The solution path is computed at a grid of values of the tuning parameter lambda.
Arguments
- x
A numerical matrix with \(N\) rows and \(p\) columns for predictors.
- y
A vector of length \(N\) for binary responses. Each element of y is either -1 or 1.
- kern
A kernel function; see dots.
- lambda
A user-supplied lambda sequence.
- eps
The algorithm stops when \(|\beta^{old} - \beta^{new}|\) is less than eps. Default value is 1e-5.
- maxit
The maximum number of iterations allowed. Default is 1e5.
Details
The leaky hockey stick loss is \(V(u)=1-u\) if \(u \le 1\) and \(V(u)=-\log u\) if \(u > 1\). The value of \(\lambda\), i.e., lambda, is user-specified.
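The piecewise loss above is simple to write down directly. The following base-R sketch is only an illustration of the formula; the function name leaky_hockey_stick is not part of the package:

```r
# Leaky hockey stick loss: V(u) = 1 - u for u <= 1, -log(u) for u > 1
# (illustrative sketch; not a function exported by the lhsc package)
leaky_hockey_stick <- function(u) {
  ifelse(u <= 1, 1 - u, -log(u))
}

leaky_hockey_stick(c(0, 1, exp(1)))  # 1, 0, -1
```

Note the loss decreases without bound as \(u \to \infty\), which is what "negatively divergent" refers to in the reference below.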
In the linear case (kern is the inner product and N > p), lhsc fits a linear LHSC by minimizing the L2-penalized leaky hockey stick loss function,
$$L(\beta_0,\beta) := \frac{1}{N}\sum_{i=1}^N V(y_i(\beta_0 + X_i'\beta)) + \lambda \beta' \beta.$$
If a linear LHSC is requested when N < p, a kernel LHSC with the linear kernel is solved instead. In that case, the coefficient vector \(\beta\) can be recovered from \(\beta = X'\alpha\).
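The identity behind this reduction is that with the linear kernel \(K = XX'\), the two parameterizations produce the same fitted values: \(K\alpha = X(X'\alpha) = X\beta\). A quick base-R check (alpha here is an arbitrary placeholder, since the real coefficients come from the solver):

```r
set.seed(1)
N <- 5; p <- 8                       # the N < p case
X <- matrix(rnorm(N * p), N, p)
K <- X %*% t(X)                      # linear-kernel matrix, N x N
alpha <- rnorm(N)                    # placeholder kernel coefficients
beta <- drop(t(X) %*% alpha)         # recovered linear coefficients

# both parameterizations give identical fitted values
all.equal(drop(K %*% alpha), drop(X %*% beta))  # TRUE
```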
In the kernel case, lhsc fits a kernel LHSC by minimizing
$$L(\alpha_0,\alpha) := \frac{1}{N}\sum_{i=1}^N V(y_i(\alpha_0 + K_i' \alpha)) + \lambda \alpha' K \alpha,$$
where \(K\) is the kernel matrix and \(K_i\) is its \(i\)th row.
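For concreteness, the kernel objective can be evaluated directly from its definition. This is only a sketch; V and kernel_obj are illustrative names, not package functions:

```r
# leaky hockey stick loss, as defined in Details
V <- function(u) ifelse(u <= 1, 1 - u, -log(u))

# evaluate L(alpha0, alpha) for kernel matrix K, labels y in {-1, 1}
kernel_obj <- function(alpha0, alpha, K, y, lambda) {
  margins <- y * (alpha0 + drop(K %*% alpha))
  mean(V(margins)) + lambda * drop(crossprod(alpha, K %*% alpha))
}

set.seed(2)
N <- 6
X <- matrix(rnorm(N * 2), N, 2)
K <- X %*% t(X)                   # linear kernel, just for illustration
y <- rep(c(-1, 1), each = 3)
# at alpha0 = 0, alpha = 0 every margin is 0, so V = 1 and the
# penalty vanishes, giving an objective value of exactly 1
kernel_obj(0, rep(0, N), K, y, lambda = 0.1)  # 1
```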
Value
An object with S3 class lhsc.
- alpha
A matrix of LHSC coefficients at each lambda value. The dimension is (p+1) * length(lambda) in the linear case and (N+1) * length(lambda) in the kernel case.
- lambda
The lambda sequence.
- npass
The total number of FISTA iterations over all lambda values.
- jerr
Warnings and errors; 0 if none.
- info
A list including the parameters of the loss function, eps, maxit, kern, and wt if a weight vector was used.
- call
The call that produced this object.
Author
Oh-ran Kwon and Hui Zou
Maintainer: Oh-ran Kwon <kwon0085@umn.edu>
References
Kwon, O. and Zou, H. (2023+)
"Leaky Hockey Stick Loss: The First Negatively Divergent Margin-based Loss Function for Classification".
See also
predict.lhsc
, plot.lhsc
, and cv.lhsc
.
Examples
data(BUPA)
# standardize the predictors
BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
# a grid of tuning parameters
lambda = 10^(seq(3, -3, length.out=10))
# fit a linear LHSC
kern = vanilladot()
LHSC_linear = lhsc(BUPA$X, BUPA$y, kern,
                   lambda=lambda, eps=1e-5, maxit=1e5)
# fit a kernel LHSC using the Gaussian kernel
kern = rbfdot(sigma=1)
LHSC_Gaussian = lhsc(BUPA$X, BUPA$y, kern,
                     lambda=lambda, eps=1e-5, maxit=1e5)