Title: | Kernel Semi-Parametric Models |
---|---|
Description: | To fit the kernel semi-parametric model and its extensions. It allows multiple kernels and unlimited interactions in the same model. Coefficients are estimated by maximizing a penalized log-likelihood; penalization terms and hyperparameters are estimated by minimizing leave-one-out error. It includes predictions with confidence/prediction intervals, statistical tests for the significance of each kernel, a procedure for variable selection and graphical tools for diagnostics and interpretation of covariate effects. Currently it is implemented for continuous dependent variables. The package is based on the paper of Liu et al. (2007), <doi:10.1111/j.1541-0420.2007.00799.x>. |
Authors: | Catherine Schramm [aut, cre], Aurelie Labbe [ctb], Celia M. T. Greenwood [ctb] |
Maintainer: | Catherine Schramm <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2024-11-24 03:54:28 UTC |
Source: | https://github.com/cran/KSPM |
Simple utility returning names of cases involved in a kernel semi parametric model.
## S3 method for class 'kspm'
case.names(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
a character vector.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, nobs.kspm, variable.names.kspm.
Returns linear and kernel coefficients for a model of class "kspm".
## S3 method for class 'kspm'
coef(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
Two matrices of coefficients.
linear |
A vector of coefficients for linear part. One row is one variable. |
kernel |
A matrix of coefficients for kernel part. One row is one subject, one column is one kernel part. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
kspm for fitting model.
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
coef(fit)
Computes confidence intervals for one or more parameters in the linear part of a fitted model of class "kspm".
## S3 method for class 'kspm'
confint(object, parm = NULL, level = 0.95, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
parm |
a vector of names specifying which parameters are to be given confidence intervals. If missing, all parameters are considered. |
level |
the confidence level required. By default 0.95. |
... |
additional optional argument (currently unused). |
For objects of class "kspm", the confidence interval is based on the Student's t distribution and the effective degrees of freedom of the model.
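As a rough illustration of a Student-based interval, the sketch below rebuilds the limits by hand for an ordinary lm fit, where the residual degrees of freedom are known exactly. It is not the package's internal code: the assumption is that, for a kspm fit, the effective degrees of freedom of the model take the place of `df` here.

```r
# Student-t confidence interval: estimate +/- t quantile * standard error.
# Base R only; an lm fit stands in for a kspm fit (assumption for illustration).
set.seed(1)
x <- 1:15
y <- 3 * x + rnorm(15, 0, 2)
fit <- lm(y ~ x)

level <- 0.95
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
df  <- df.residual(fit)               # role played by the effective degrees of freedom
q   <- qt(1 - (1 - level) / 2, df)    # upper quantile of the t distribution

ci <- cbind(lower = est - q * se, upper = est + q * se)
ci

# Same limits as the built-in method for lm objects
all.equal(unname(ci), unname(confint(fit, level = level)))
```

The two columns correspond to the lower and upper limits returned by confint.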
A matrix with columns giving lower and upper confidence limits for each parameter. These are labelled as (1 - level)/2 and 1 - (1 - level)/2 in percentage.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, summary.kspm.
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
confint(fit)
Computes Cook's distance for an object of class "kspm".
## S3 method for class 'kspm'
cooks.distance(model, ...)
model |
a model of class "kspm", usually, a result of a call to kspm. |
... |
further arguments passed to or from other methods (currently unused). |
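Cook's distance values ($D_i$) are computed as follows: $D_i = \frac{e_i^2 h_{ii}}{\hat{\sigma}^2 \, tr(H) \, (1 - h_{ii})^2}$,
where $e_i$ is the residual of subject $i$, $h_{ii}$ is the $i$-th diagonal element of the Hat matrix $H$, corresponding to the leverage associated with subject $i$, $\hat{\sigma}^2$ is the estimated residual variance and $tr(H)$ is the trace of the Hat matrix $H$.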
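The quantities named above (residuals, leverages and the trace of the Hat matrix) can be combined by hand. The sketch below does so for an ordinary linear model, assuming the usual form D_i = e_i^2 h_ii / (sigma^2 tr(H) (1 - h_ii)^2); for lm the trace of H equals the number of coefficients, so the result can be checked against stats::cooks.distance. It uses base R only and is not the package's implementation.

```r
# Cook's distance from residuals, leverages and tr(H), checked on an lm fit.
set.seed(1)
x <- 1:15
y <- 3 * x + rnorm(15, 0, 2)
fit <- lm(y ~ x)

X <- model.matrix(fit)
H <- X %*% solve(crossprod(X)) %*% t(X)     # hat matrix
e <- residuals(fit)                         # e_i
h <- diag(H)                                # h_ii, the leverages
trH <- sum(h)                               # tr(H)
sigma2 <- sum(e^2) / (length(y) - trH)      # residual variance estimate

D <- (e^2 * h) / (sigma2 * trH * (1 - h)^2)
all.equal(unname(D), unname(cooks.distance(fit)))
```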
A vector containing Cook's distance values.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, residuals.kspm, rstandard.kspm, plot.kspm.
A dataset containing the ratings and other attributes of 187 movies.
csm
A data frame with 187 rows and 13 variables:
year at which movies were projected on the screens
ratings
genre of the movie
gross income in USD
budget in USD
number of screens in USA
sequel
sentiment score
number of views of movie trailer on Youtube
number of likes of movie trailer on Youtube
number of dislikes of movie trailer on Youtube
number of comments of movie trailer on Youtube
aggregate actor followers on Twitter
https://archive.ics.uci.edu/ml/index.php
AHMED, Mehreen, JAHANGIR, Maham, AFZAL, Hammad, et al. Using Crowd-source based features from social media and Conventional features to predict the movies popularity. In : Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on. IEEE, 2015. p. 273-278.
derivatives is a function for "kspm" objects computing pointwise partial derivatives of the kernel part of the model with respect to each variable.
derivatives(object)
object |
an object of class "kspm", usually, a result of a call to kspm. |
Derivatives are not computed for interactions. If a variable is included in several kernels, the user may obtain the corresponding pointwise derivatives by summing the pointwise derivatives associated with each kernel.
an object of class 'derivatives'
derivmat |
a list of |
rawmat |
a |
scalemat |
scaled version of rawmat |
modelmat |
matrix of correspondence between variables and kernels |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Kim, Choongrak, Byeong U. Park, and Woochul Kim. "Influence diagnostics in semiparametric regression models." Statistics and probability letters 60.1 (2002): 49-58.
Returns the deviance of a fitted model object of class "kspm".
## S3 method for class 'kspm'
deviance(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
This function extracts the deviance of a model fitted using the kspm function. The returned deviance is the residual sum of squares (RSS).
The value of the deviance extracted from the object object.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
x <- 1:15
y <- 3*x + rnorm(15, 0, 2)
fit <- kspm(y, kernel = ~ Kernel(x, kernel.function = "linear"))
deviance(fit)
A dataset containing the energy consumption and other attributes during 22 days.
energy
A data frame with 504 rows and 7 variables:
energy consumption
date
temperature
pressure
humidity rate
hour (categorical)
hour (numerical)
https://iles-ponant-edf-sei.opendatasoft.com, https://www.infoclimat.fr
Computes the Akaike Information Criterion (AIC) for a kspm fit.
## S3 method for class 'kspm'
extractAIC(fit, scale = NULL, k = 2, correction = FALSE, ...)
fit |
fitted model, usually the result of kspm. |
scale |
option not available for kspm fit. |
k |
numeric specifying the 'weight' of the effective degrees of freedom (edf) part in the AIC formula. See details. |
correction |
boolean indicating if the corrected AIC should be computed instead of standard AIC, may be TRUE only if k = 2. |
... |
additional optional argument (currently unused). |
The criterion used is $AIC = n \ln(RSS) + k (n - edf)$, where $RSS$ is the residual sum of squares and $edf$ is the effective degree of freedom of the model. k = 2 corresponds to the traditional AIC; using k = log(n) provides the Bayesian Information Criterion (BIC) instead. For k = 2, the corrected Akaike's Information Criterion (AICc) is obtained by $AICc = AIC + \frac{2 (n - edf)(n - edf + 1)}{edf - 1}$.
extractAIC.kspm returns a numeric value corresponding to the AIC. Of note, the AIC obtained here differs by a constant from the AIC obtained with extractAIC applied to an lm object. To compare a kspm model with an lm model, it is preferable to refit the lm model using the kspm function with kernel = NULL and apply the extractAIC method to that fit.
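To see why the two AIC values differ only by a constant, the sketch below compares an RSS-based criterion with the value returned by stats::extractAIC for an lm fit. It assumes the kspm criterion has the form n * log(RSS) + k * (number of fitted parameters), which is an assumption made for illustration; the lm formula n * log(RSS / n) + k * p is the documented behaviour of stats::extractAIC.

```r
# Difference between an RSS-based AIC and stats::extractAIC on an lm fit.
set.seed(1)
n <- 15
x <- 1:n
y <- 3 * x + rnorm(n, 0, 2)
fit <- lm(y ~ x)

rss <- deviance(fit)           # residual sum of squares
p   <- n - df.residual(fit)    # number of fitted parameters
k   <- 2

aic_rss <- n * log(rss) + k * p         # assumed kspm-style criterion
aic_lm  <- extractAIC(fit, k = k)[2]    # n * log(rss / n) + k * p

aic_rss - aic_lm                         # equals n * log(n) whatever the model
all.equal(aic_rss - aic_lm, n * log(n))
```

Because the offset depends only on the sample size, model rankings are unaffected as long as every model is scored with the same function.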
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
stepKSPM for variable selection procedure based on AIC.
x <- 1:15
y <- 3*x + rnorm(15, 0, 2)
fit <- kspm(y, kernel = ~ Kernel(x, kernel.function = "linear"))
extractAIC(fit)
Returns fitted values for a model of class "kspm".
## S3 method for class 'kspm'
fitted(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
The vector of fitted values.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
kspm for fitting model, residuals.kspm, coef.kspm, nobs.kspm.
x <- 1:15
z <- runif(15, 1, 6)
y <- 3*x + z^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(z, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
fitted(fit)
for flexibility in summary method for an object of class "summary.kspm"
flexible.summary(object, method = "davies", acc = 1e-06, lim = 10000)
object |
an object of class "summary.kspm", usually, a result of a call to summary.kspm. |
method |
method to approximate the chi square distribution in p-value computation, default is 'davies', another possibility is 'imhof'. |
acc, lim |
see davies and imhof functions in CompQuadForm package. |
the description of the model, including coefficients for the linear part and if asked for, test(s) of variance components associated with kernel part.
Computes and returns the following summary statistics of the fitted kernel semi parametric model given in object:
residuals |
residuals |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error |
edf |
effective degrees of freedom |
r.squared |
|
adj.r.squared |
the above |
score.test |
a |
global.p.value |
p value from the score test for the global model. |
sample.size |
sample size (all: global sample size, inc: complete data sample size). |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
Schweiger, Regev, et al. "RL-SKAT: an exact and efficient score test for heritability and set tests." Genetics (2017): genetics 300395.
Li, Shaoyu, and Yuehua Cui. "Gene-centric gene-gene interaction: A model-based kernel machine method." The Annals of Applied Statistics 6.3 (2012): 1134-1161.
kspm for fitting model, predict.kspm for predictions, plot.kspm for diagnostics
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
summary.fit <- summary(fit)
flexible.summary(summary.fit, acc = 0.000001, lim = 1000)
internal function to compute model parameters
get.parameters(X = NULL, Y = NULL, kernelList = NULL, free.parameters = NULL, n = NULL, not.missing = NULL, compute.kernel = NULL)
X |
X matrix |
Y |
response matrix |
kernelList |
list of kernels |
free.parameters |
free parameters |
n |
number of samples |
not.missing |
number of non missing samples |
compute.kernel |
boolean indicating if kernel should be computed |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Returns hyper-parameters for a model of class "kspm".
hypercoef(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
A list of parameters.
lambda |
A vector of penalisation parameters. |
kernel |
A vector of tuning parameters. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
kspm for fitting model.
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
hypercoef(fit)
gives information about Kernel Semi parametric Model Fits
info.kspm(object, print = TRUE)
object |
an object of class "kspm", usually, a result of a call to kspm. |
print |
logical, if |
info.kspm returns a table of information in which each row corresponds to a kernel included in the model and the columns are:
type |
type of object used to define the kernel |
dim |
dimension of data used in the model |
type.predict |
type of object the user should provide in predict.kspm function |
dim.predict |
dimension of object the user should provide in predict.kspm function |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Create a kernel object, to use as variable in a model formula.
Kernel(x, kernel.function, scale = TRUE, rho = NULL, gamma = NULL, d = NULL)
x |
a formula, a vector or a matrix of variables grouped in the same kernel. It could also be a symmetric matrix representing the Gram matrix, associated with a kernel function, already computed by the user. |
kernel.function |
type of kernel. Possible values are |
scale |
boolean indicating if variables should be scaled before computing the kernel. |
rho, gamma, d |
kernel function hyperparameters. See details below. |
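To use inside the kspm() function. Given two $p$-dimensional vectors $x$ and $y$,
the Gaussian kernel is defined as $k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\rho}\right)$ where $\|x-y\|$ is the Euclidean distance between $x$ and $y$ and $\rho > 0$ is the bandwidth of the kernel,
the linear kernel is defined as $k(x,y) = x^T y$,
the polynomial kernel is defined as $k(x,y) = (\rho \, x^T y + \gamma)^d$ with $\rho > 0$, $d$ is the polynomial order. Of note, a linear kernel is a polynomial kernel with $\rho = d = 1$ and $\gamma = 0$,
the sigmoid kernel is defined as $k(x,y) = \tanh(\rho \, x^T y + \gamma)$ which is similar to the sigmoid function in logistic regression,
the inverse quadratic function defined as $k(x,y) = \frac{1}{\sqrt{\|x-y\|^2 + \gamma}}$ with $\gamma > 0$,
the equality kernel defined as $k(x,y) = 1$ if $x = y$ and $0$ otherwise.
Of note, Gaussian, inverse quadratic and equality kernels are measures of similarity resulting in a matrix containing 1 along the diagonal.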
A Kernel object including all parameters needed in computation of the model
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
These functions transform a matrix into a kernel matrix.
kernel.gaussian(x, rho = ncol(x))
kernel.linear(x)
kernel.polynomial(x, rho = 1, gamma = 0, d = 1)
kernel.sigmoid(x, rho = 1, gamma = 1)
kernel.inverse.quadratic(x, gamma = 1)
kernel.equality(x)
x |
a |
gamma, rho, d |
kernel hyperparameters (see details) |
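Given two $p$-dimensional vectors $x$ and $y$,
the Gaussian kernel is defined as $k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\rho}\right)$ where $\|x-y\|$ is the Euclidean distance between $x$ and $y$ and $\rho > 0$ is the bandwidth of the kernel,
the linear kernel is defined as $k(x,y) = x^T y$,
the polynomial kernel is defined as $k(x,y) = (\rho \, x^T y + \gamma)^d$ with $\rho > 0$, $d$ is the polynomial order. Of note, a linear kernel is a polynomial kernel with $\rho = d = 1$ and $\gamma = 0$,
the sigmoid kernel is defined as $k(x,y) = \tanh(\rho \, x^T y + \gamma)$ which is similar to the sigmoid function in logistic regression,
the inverse quadratic function defined as $k(x,y) = \frac{1}{\sqrt{\|x-y\|^2 + \gamma}}$ with $\gamma > 0$,
the equality kernel defined as $k(x,y) = 1$ if $x = y$ and $0$ otherwise.
Of note, Gaussian, inverse quadratic and equality kernels are measures of similarity resulting in a matrix containing 1 along the diagonal.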
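For readers who want to see what such a kernel matrix looks like, here is a minimal base R sketch of two of the kernels listed above. It assumes the usual forms exp(-||x - y||^2 / rho) for the Gaussian kernel and (rho * x'y + gamma)^d for the polynomial kernel; the helper names gaussian_kernel and polynomial_kernel are hypothetical and are not the package's functions, and no scaling of the variables is applied.

```r
# Kernel matrices built directly from the definitions (base R only).
set.seed(1)
Z <- matrix(rnorm(10), nrow = 5, ncol = 2)    # 5 samples (rows), 2 variables

# Hypothetical helper: Gaussian kernel with bandwidth rho
gaussian_kernel <- function(Z, rho) {
  D2 <- as.matrix(dist(Z))^2                  # squared Euclidean distances
  exp(-D2 / rho)
}

# Hypothetical helper: polynomial kernel of order d
polynomial_kernel <- function(Z, rho = 1, gamma = 0, d = 1) {
  (rho * tcrossprod(Z) + gamma)^d             # tcrossprod(Z) is Z %*% t(Z)
}

K_gauss <- gaussian_kernel(Z, rho = ncol(Z))
K_poly  <- polynomial_kernel(Z, rho = 1, gamma = 0, d = 2)
K_lin   <- polynomial_kernel(Z)               # rho = d = 1, gamma = 0: linear kernel

dim(K_gauss)                                  # one row and one column per sample
diag(K_gauss)                                 # similarity of each sample with itself
isSymmetric(K_gauss)
```

Each result is a square matrix with one row and one column per sample, and the Gaussian kernel has 1 along its diagonal, in line with the remark above.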
A matrix.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
internal method for listing all kernel parts included in the model
kernel.list(formula, data, names)
formula |
kernel part formula provided in the |
data |
data provided in the |
names |
row names of samples as they are evaluated in |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
This function transforms a matrix into a kernel matrix.
kernel.matrix(Z, whichkernel, rho = NULL, gamma = NULL, d = NULL)
Z |
a |
whichkernel |
kernel function |
gamma, rho, d |
kernel hyperparameters (see details) |
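Given an $n \times p$ matrix, this function returns an $n \times n$ matrix where each cell represents the similarity between two samples defined by two $p$-dimensional vectors $x$ and $y$,
the Gaussian kernel is defined as $k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\rho}\right)$ where $\|x-y\|$ is the Euclidean distance between $x$ and $y$ and $\rho > 0$ is the bandwidth of the kernel,
the linear kernel is defined as $k(x,y) = x^T y$,
the polynomial kernel is defined as $k(x,y) = (\rho \, x^T y + \gamma)^d$ with $\rho > 0$, $d$ is the polynomial order. Of note, a linear kernel is a polynomial kernel with $\rho = d = 1$ and $\gamma = 0$,
the sigmoid kernel is defined as $k(x,y) = \tanh(\rho \, x^T y + \gamma)$ which is similar to the sigmoid function in logistic regression,
the inverse quadratic function defined as $k(x,y) = \frac{1}{\sqrt{\|x-y\|^2 + \gamma}}$ with $\gamma > 0$,
the equality kernel defined as $k(x,y) = 1$ if $x = y$ and $0$ otherwise.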
A matrix.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kernel.gaussian, kernel.linear, kernel.polynomial, kernel.equality, kernel.sigmoid, kernel.inverse.quadratic.
internal methods
comb(x, ...)
check.integer(N)
asOneSidedFormula(object)
splitFormula(form, sep = "/")
computes.Kernel(x, ind, nameKernel, not.missing = NULL)
computes.Kernel.interaction(x, ind, nameKernel, not.missing = NULL)
computes.KernelALL(kernelList, not.missing = NULL)
renames.Kernel(object, names)
objects.Kernel(formula)
x |
list of objects |
... |
other arguments |
N |
numeric value |
object |
formula provided in the kernel part of |
form |
formula |
sep |
separator |
ind |
index value |
nameKernel |
name of kernel |
not.missing |
non missing values |
kernelList |
list of kernels |
names |
name of kernel |
formula |
formula |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm is used to fit kernel semi parametric models.
kspm(response, linear = NULL, kernel = NULL, data = NULL, level = 1, control = kspmControl())
response |
a character with the name of the response variable or a vector containing the outcome or a matrix with outcome in the first column. |
linear |
an optional object of class "formula": a symbolic description of the linear part of the model to be fitted or a vector or a matrix containing covariates included in the linear part of the model. Default is intercept only. The details of model specification are given under ‘Details’. |
kernel |
an object of class "formula": a symbolic description of the kernel part of the model to be fitted. If missing a linear model is fitted using lm function. The details of model specification are given under ‘Details’. |
data |
an optional data frame containing the variables in the model. If NULL (default), data are taken from the workspace. |
level |
printed information about the model (0: no information, 1: information about kernels included in the model (default)) |
control |
see kspmControl. |
The kernel semi parametric model refers to the following equation $Y_i = X_i\beta + h(Z_i) + e_i$ with $i = 1, \ldots, n$, where $n$ is the sample size, $Y$ is the univariate response, $X\beta$ is the linear part, $h(Z)$ is the kernel part and $e$ are the residuals. The linear part is defined using the linear argument by specifying the covariates $X$. It could be either a formula, a vector of length $n$ if only one variable is included in the linear part or an $n \times p$ design matrix containing the values of the $p$ covariates included in the linear part (columns), for each individual (rows). By default, an intercept is included. To remove the intercept term, use formula specification and add the term -1, as usual. The kernel part is defined using the kernel argument. It should be a formula of Kernel object(s). For a multiple kernel semi parametric model, Kernel objects are separated by the usual signs "+", "*" and ":" to specify addition and interaction between kernels. Specification formats of each Kernel object may be different. See Kernel for more information about their specification.
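A multiple kernel specification can be sketched as follows. This is an illustrative example rather than one taken from the package: the data frame dat and the object fit2 are made up, the response is passed by name together with the data argument, and the two kernels use the "linear" and "polynomial" kernel functions with fixed hyperparameters, as in the other examples of this manual. The block is skipped when the KSPM package is not installed.

```r
# Two additive kernels with different specifications (requires the KSPM package).
if (requireNamespace("KSPM", quietly = TRUE)) {
  library(KSPM)
  set.seed(1)
  dat <- data.frame(x  = 1:15,
                    z1 = runif(15, 1, 6),
                    z2 = rnorm(15, 1, 2))
  dat$y <- 3 * dat$x + dat$z1 + dat$z2^2 + rnorm(15, 0, 2)

  # "+" adds the two kernel parts; "*" or ":" would request interactions
  fit2 <- kspm("y", linear = ~ x,
               kernel = ~ Kernel(~ z1, kernel.function = "linear") +
                          Kernel(~ z2, kernel.function = "polynomial",
                                 d = 2, rho = 1, gamma = 0),
               data = dat)

  info.kspm(fit2)   # one row of information per kernel
}
```

Keeping each group of variables in its own Kernel object lets each part have its own kernel function and its own penalization parameter.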
kspm returns an object of class kspm.
An object of class kspm is a list containing the following components:
linear.coefficients |
matrix of coefficients associated with linear part, the number of coefficients is the number of terms included in linear part |
kernel.coefficients |
matrix of coefficients associated with kernel part, the number of rows is the sample size included in the analysis and the number of columns is the number of kernels included in the model |
lambda |
penalization parameter(s) |
fitted.values |
the fitted mean values |
residuals |
the residuals, that is response minus the fitted values |
sigma |
standard deviation of residuals |
Y |
vector of responses |
X |
design matrix for linear part |
K |
kernel matrices computed by the model |
n.total |
total sample size |
n |
sample size of the model (model is performed on complete data only) |
edf |
effective degree of freedom |
linear.formula |
formula corresponding to the linear part of the model |
kernel.info |
information about kernels included in the model such as matrices of covariates ( |
Hat |
The hat matrix |
L |
A matrix corresponding to |
XLX_inv |
A matrix corresponding to |
GinvM |
A list of matrix, each corresponding to a kernel and equaling |
control |
List of control parameters |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
Kim, Choongrak, Byeong U. Park, and Woochul Kim. "Influence diagnostics in semiparametric regression models." Statistics and probability letters 60.1 (2002): 49-58.
Oualkacha, Karim, et al. "Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness." Genetic epidemiology 37.4 (2013): 366-376.
summary.kspm for summary, predict.kspm for predictions, plot.kspm for diagnostics
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
summary(fit)
Allow the user to set some characteristics of the optimisation algorithm
kspmControl(interval.upper = NA, interval.lower = NA, trace = FALSE, optimize.tol = .Machine$double.eps^0.25, NP = NA, itermax = 500, CR = 0.5, F = 0.8, initialpop = NULL, storepopfrom = itermax + 1, storepopfreq = 1, p = 0.2, c = 0, reltol = sqrt(.Machine$double.eps), steptol = itermax, parallel = FALSE)
interval.upper |
integer or vector of initial maximum value(s) allowed for parameter(s) |
interval.lower |
integer or vector of initial minimum value(s) allowed for parameter(s) |
trace |
boolean. If TRUE parameters value at each iteration are displayed. |
optimize.tol |
|
NP |
if DEoptim function is used. See DEoptim.control |
itermax |
if DEoptim function is used. See DEoptim.control |
CR |
if DEoptim function is used. See DEoptim.control |
F |
if DEoptim function is used. See DEoptim.control |
initialpop |
if DEoptim function is used. See DEoptim.control |
storepopfrom |
if DEoptim function is used. See DEoptim.control |
storepopfreq |
if DEoptim function is used. See DEoptim.control |
p |
if DEoptim function is used. See DEoptim.control |
c |
if DEoptim function is used. See DEoptim.control |
reltol |
if DEoptim function is used. See DEoptim.control |
steptol |
if DEoptim function is used. See DEoptim.control |
parallel |
if DEoptim function is used. See DEoptim.control |
When only one hyperparameter should be estimated, the optimisation problem calls the optimize function from the stats package. Otherwise, it calls the DEoptim function from the DEoptim package. In both cases, the parameters are chosen within the initial interval defined by interval.lower and interval.upper.
search.parameters is an iterative algorithm estimating model parameters and returns the following components:
lambda |
tuning parameters for penalization. |
beta |
vector of coefficients associated with linear part of the model, the size being the number of variable in linear part (including an intercept term). |
alpha |
vector of coefficients associated with kernel part of the model, the size being the sample size. |
Ginv |
a matrix used in several calculations. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
get.parameters for computation of parameters at each iteration
Returns the log-likelihood value of the kernel semi parametric model represented by object evaluated at the estimated coefficients.
## S3 method for class 'kspm'
logLik(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
The function returns the log-likelihood computed as follows: $logLik = -\frac{1}{2} RSS$, where $RSS$ is the residual sum of squares.
logLik of kspm fit
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
kspm, extractAIC.kspm, deviance.kspm
x <- 1:15
y <- 3*x + rnorm(15, 0, 2)
fit <- kspm(y, kernel = ~ Kernel(x, kernel.function = "linear"))
logLik(fit)
internal function to optimize model for estimating hyperparameters based on LOOE
lossFunction.looe(param. = NULL, Y. = NULL, X. = NULL, kernelList. = NULL, n. = NULL, not.missing. = NULL, compute.kernel. = NULL, print.lambda. = FALSE)
param. |
initial parameter values. |
Y. |
response matrix. |
X. |
X matrix (linear part). |
kernelList. |
list of kernels (kernel part). |
n. |
number of samples. |
not.missing. |
number of non missing samples. |
compute.kernel. |
boolean. If TRUE, the kernel matrix is computed at each iteration. Should be TRUE when hyperparameters of kernel functions should be estimated by the model. |
print.lambda. |
boolean. If TRUE, values of tuning parameters (lambda) are printed at each iteration. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Extracts the number of observations used to estimate the model coefficients. This is principally intended to be used in computing BIC (see extractAIC.kspm).
## S3 method for class 'kspm'
nobs(object, ...)
object |
an object of class "kspm", usually, a result of a call to kspm. |
... |
additional optional argument (currently unused). |
A single number (integer).
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, extractAIC.kspm.
x <- 1:15
y <- 3*x + rnorm(15, 0, 2)
fit <- kspm(y, kernel = ~ Kernel(x, kernel.function = "linear"))
nobs(fit)
Plot of derivatives for kernel part of a kspm model.
## S3 method for class 'derivatives'
plot(x, subset = NULL, xlab = NULL, ylab = NULL, ...)
x |
an object of class "derivatives", usually, a result of a call to derivatives. |
subset |
if a subset of the plots is required, specify the names of the variable for which plot of derivatives is required. |
xlab |
x label |
ylab |
y label |
... |
further arguments passed to or from other methods. |
The X axis represents the raw data used as input in the kernel part of the model. The Y axis represents the pointwise derivative values, i.e. the derivatives of the fitted values with respect to the variable of interest.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Kim, Choongrak, Byeong U. Park, and Woochul Kim. "Influence diagnostics in semiparametric regression models." Statistics and probability letters 60.1 (2002): 49-58.
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
plot(derivatives(fit))
Five plots (selectable by which) are currently available: a plot of residuals against fitted values, a scale-location plot of $\sqrt{|residuals|}$ against fitted values, a normal Q-Q plot for residuals, a plot of Cook's distances versus row labels and a plot of residuals against leverages. By default, the first three and the fifth are provided.
## S3 method for class 'kspm'
plot(x, which = c(1:3, 5), cook.levels = c(0.5, 1), id.n = 3, labels.id = names(x$residuals), cex.id = 0.75, col.id = "blue", ...)
x |
an object of class "kspm", usually, a result of a call to kspm. |
which |
if a subset of the plots is required, specify a subset of the numbers 1:5. |
cook.levels |
levels of Cook's distance at which to draw contours. |
id.n |
number of points to be labelled in each plot, starting with the most extreme. |
labels.id |
vector of labels, from which the labels for extreme points will be chosen. NULL uses names associated to response specified in |
cex.id |
size of point labels. |
col.id |
color of point labels. |
... |
further arguments passed to or from other methods. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Kim, Choongrak, Byeong U. Park, and Woochul Kim. "Influence diagnostics in semiparametric regression models." Statistics and probability letters 60.1 (2002): 49-58.
kspm for fitting the model, summary.kspm for summary
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
plot(fit)
predict method for class "kspm".
## S3 method for class 'kspm'
predict(object, newdata.linear = NULL, newdata.kernel = NULL, interval = "none", level = 0.95, ...)
object |
an object of class "kspm", usually, a result of a call to |
newdata.linear |
should be a data frame or design matrix of variables used in the linear part |
newdata.kernel |
a list containing data frame or design matrix of variables used in each kernel part depending on the specification format of each kernel. When a kernel has been specified using |
interval |
type of interval calculation. If |
level |
confidence level. Default is |
... |
further arguments passed to or from other methods. |
predict.kspm produces predicted values. If a new dataset is not specified, it returns the fitted values from the original data (the complete data used in the model specification). If predict.kspm is applied to a new dataset, all variables used in the original model should be provided in the newdata.linear and newdata.kernel arguments, and only complete data may be provided. Setting interval specifies computation of confidence or prediction intervals at the specified level.
predict.kspm returns a vector of predictions, or a matrix containing the following components if interval is set:
fit |
predictions. |
lwr |
lower bound of confidence/prediction intervals. |
upr |
upper bound of confidence/prediction intervals. |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
predict(fit, interval = "confidence")
print method for class "kspm".
## S3 method for class 'kspm'
print(x, ...)

## S3 method for class 'summary.kspm'
print(x, ...)
x |
an object used to select a method. Usually, a result of a call to |
... |
additional optional argument (currently unused). |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, summary.kspm
Returns the vector of residuals for a model fit of class "kspm".
## S3 method for class 'kspm'
residuals(object, ...)
object |
an object of class "kspm", usually, a result of a call to |
... |
additional optional argument (currently unused). |
A vector of residuals. The vector length is the number of observations used in model coefficients estimation (see nobs.kspm).
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, nobs.kspm, rstandard.kspm.
x <- 1:15
y <- 3*x + rnorm(15, 0, 2)
fit <- kspm(y, kernel = ~ Kernel(x, kernel.function = "linear"))
residuals(fit)
Computes standardized residuals for an object of class "kspm".
## S3 method for class 'kspm'
rstandard(model, ...)
model |
a model of class "kspm", usually, a result of a call to |
... |
further arguments passed to or from other methods (currently unused). |
Standardized residuals $t_i$ are obtained by

$$t_i = \frac{e_i}{\hat{\sigma} \sqrt{1 - h_{ii}}}$$

where $e_i$ is the residual, $\hat{\sigma}$ is the estimated standard deviation of the errors and $h_{ii}$ is the leverage of subject $i$, i.e. the $i$-th diagonal element of the Hat matrix.
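The ingredients named above (residual, estimated error standard deviation, leverage) combine in the usual way: each residual is divided by sigma times the square root of one minus its leverage. The sketch below illustrates that arithmetic on an ordinary lm() fit from base R, not on a kspm fit; it assumes the kspm method follows the same definition, and the object names (fit_lm, r_manual) are illustrative only.

```r
# Illustration with base R's lm(), NOT a kspm model: standardized residuals
# computed by hand from residuals, sigma and leverages.
set.seed(1)
x <- 1:15
y <- 3 * x + rnorm(15, 0, 2)
fit_lm <- lm(y ~ x)

e <- residuals(fit_lm)   # raw residuals e_i
s <- sigma(fit_lm)       # estimated standard deviation of the errors
h <- hatvalues(fit_lm)   # leverages h_ii (diagonal of the hat matrix)

# Standardized residuals: e_i / (sigma * sqrt(1 - h_ii))
r_manual <- e / (s * sqrt(1 - h))

# Compare with the built-in lm method
all.equal(unname(r_manual), unname(rstandard(fit_lm)))
```

For a fitted kspm object, rstandard(fit) is the call documented here; the manual computation is only meant to make the formula concrete.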
a vector containing the standardized residuals.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, residuals.kspm, cooks.distance.kspm, plot.kspm.
Internal function that optimizes the model to estimate hyperparameters.
search.parameters(Y = NULL, X = NULL, kernelList = NULL, n = NULL, not.missing = NULL, compute.kernel = NULL, controlKspm = NULL)
Y |
response matrix |
X |
X matrix |
kernelList |
list of kernels |
n |
number of samples |
not.missing |
number of non-missing samples |
compute.kernel |
boolean kernel computation |
controlKspm |
control parameters |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Returns the residual standard deviation (sigma) for an object of class "kspm".
## S3 method for class 'kspm'
sigma(object, ...)
object |
an object of class "kspm", usually, a result of a call to |
... |
additional optional argument (currently unused). |
The value returned by the method is $\sqrt{\frac{RSS}{edf}}$ where $RSS$ is the residual sum of squares and $edf$ is the effective degree of freedom.
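As a rough illustration of this quantity, the sketch below computes the square root of the residual sum of squares over the degrees of freedom for an ordinary lm() fit from base R. This is an analogy rather than the kspm computation: for lm the denominator is the residual degrees of freedom (n minus the number of coefficients), whereas a kspm model uses its effective degrees of freedom. The names fit_lm, rss and s_manual are illustrative.

```r
# Illustration with base R's lm(), NOT a kspm model.
set.seed(1)
x <- 1:15
y <- 3 * x + rnorm(15, 0, 2)
fit_lm <- lm(y ~ x)

rss <- sum(residuals(fit_lm)^2)   # residual sum of squares
df_res <- df.residual(fit_lm)     # n - p for lm; kspm uses effective df instead

# Residual standard deviation: sqrt(RSS / df)
s_manual <- sqrt(rss / df_res)

# Compare with the built-in accessor
all.equal(s_manual, sigma(fit_lm))
```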
typically a number, the estimated standard deviation of the errors ("residual standard deviation")
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm for fitting model, summary.kspm, residuals.kspm, nobs.kspm, deviance.kspm.
Performs stepwise model selection for Kernel Semi Parametric Model by AIC or BIC.
stepKSPM(object, data = NULL, linear.lower = NULL, linear.upper = NULL, kernel.lower = NULL, kernel.upper = NULL, direction = "both", k = 2, kernel.param = "fixed", trace = TRUE)
object |
an object of class "kspm" with only one kernel. |
data |
data. |
linear.lower |
one side formula corresponding to the smallest set of variables that should be included in the linear part of the model. |
linear.upper |
one side formula corresponding to the largest set of variables that may be included in the linear part of the model. |
kernel.lower |
one side formula corresponding to the smallest set of variables that should be included in the kernel part of the model. |
kernel.upper |
one side formula corresponding to the largest set of variables that may be included in the kernel part of the model. |
direction |
the mode of stepwise search, can be one of "both" (default), "backward", or "forward". |
k |
type of information criteria used for the variable selection. If |
kernel.param |
define if hyperparameters should be fixed ( |
trace |
integer. If positive, information is printed during the running of stepKSPM. Larger values may give more information on the fitting process. |
This procedure may be applied to a kspm object defined with only one kernel part and for which a data frame including all variables was provided. Selection may be done on the linear part only, on the kernel part only, or on both at the same time. To perform selection on the linear (resp. kernel) part only, kernel.lower and kernel.upper (resp. linear.lower and linear.upper) should contain all the variables that should stay in the model for the kernel (resp. linear) part.
stepKSPM returns the selected model.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 4)
z3 <- rnorm(15, 6, 2)
z4 <- runif(15, -10, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
dfrm <- data.frame(x = x, z1 = z1, z2 = z2, z3 = z3, z4 = z4, y = y)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2 + z3 + z4, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0), data = dfrm)
stepKSPM(fit, k = 2, data = dfrm)
summary method for an object of class "kspm"
## S3 method for class 'kspm'
summary(object, kernel.test = "all", global.test = FALSE, ...)
object |
an object of class "kspm", usually, a result of a call to |
kernel.test |
vector of characters indicating for which kernel a test should be performed. Default is |
global.test |
logical, if |
... |
further arguments passed to or from other methods. |
the description of the model, including coefficients for the linear part and if asked for, test(s) of variance components associated with kernel part.
Computes and returns the following summary statistics of the fitted kernel semi-parametric model given in object:
residuals |
residuals |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error |
edf |
effective degrees of freedom |
r.squared |
|
adj.r.squared |
the above |
score.test |
a |
global.p.value |
p value from the score test for the global model. |
sample.size |
sample size (all: global sample size, inc: complete data sample size). |
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric regression of multidimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics, 63(4), 1079-1088.
Schweiger, Regev, et al. "RL-SKAT: an exact and efficient score test for heritability and set tests." Genetics (2017): genetics-300395.
Li, Shaoyu, and Yuehua Cui. "Gene-centric gene-gene interaction: A model-based kernel machine method." The Annals of Applied Statistics 6.3 (2012): 1134-1161.
kspm for fitting model, predict.kspm for predictions, plot.kspm for diagnostics
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3*x + (z1 + z2)^2 + rnorm(15, 0, 2)
fit <- kspm(y, linear = ~ x, kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial", d = 2, rho = 1, gamma = 0))
summary(fit)
Performs score tests for the kernel part of a kernel semi-parametric model.
test.1.kernel(object)
test.global.kernel(object)
test.k.kernel(object, kernel.name)
object |
an object of class "kspm" |
kernel.name |
character vector listing the names of the kernels for which the test should be performed |
p-values
Catherine Schramm, Aurelie Labbe, Celia Greenwood
Schweiger, Regev, et al. "RL-SKAT: an exact and efficient score test for heritability and set tests." Genetics (2017): genetics-300395.
Li, Shaoyu, and Yuehua Cui. "Gene-centric gene-gene interaction: A model-based kernel machine method." The Annals of Applied Statistics 6.3 (2012): 1134-1161.
Oualkacha, Karim, et al. "Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness." Genetic epidemiology 37.4 (2013): 366-376.
Ge, Tian, et al. "A kernel machine method for detecting effects of interaction between multidimensional variable sets: An imaging genetics application." Neuroimage 109 (2015): 505-514.
Simple utility returning names of variables involved in a kernel semi parametric model.
## S3 method for class 'kspm'
variable.names(object, ...)
object |
an object of class "kspm", usually, a result of a call to |
... |
additional optional argument (currently unused). |
a list of character vectors. The first element corresponds to the names of the variables included in the linear part of the model. Then, for each kernel, a vector containing the names of the variables included in that kernel part is provided.
Catherine Schramm, Aurelie Labbe, Celia Greenwood
kspm, summary.kspm, case.names.kspm.
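A minimal usage sketch, reusing the model from the other examples in this manual. It assumes the KSPM package is installed and loaded; the seed and the object name vars are illustrative additions. The names of the list elements are not documented here, so the sketch only inspects the overall structure.

```r
library(KSPM)  # assumed to be installed

set.seed(1)    # illustrative seed, only for reproducibility
x <- 1:15
z1 <- runif(15, 1, 6)
z2 <- rnorm(15, 1, 2)
y <- 3 * x + (z1 + z2)^2 + rnorm(15, 0, 2)

# Same model specification as in the other examples of this manual
fit <- kspm(y, linear = ~ x,
            kernel = ~ Kernel(~ z1 + z2, kernel.function = "polynomial",
                              d = 2, rho = 1, gamma = 0))

# Documented to return a list of character vectors:
# linear-part variables first, then one vector per kernel
vars <- variable.names(fit)
str(vars)
```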