# Semi-Confirmatory Structural Equation Modeling

In this example, we will show how to use lslx to conduct semi-confirmatory structural equation modeling. The example uses data PoliticalDemocracy in the package lavaan. Hence, lavaan must be installed.

## Model Specification

In the following specification, x1 - x3 and y1 - y8 is assumed to be measurements of 3 latent factors: ind60, dem60, and dem65.

model_sem <- "fix(1) * x1 + x2 + x3      <=: ind60
fix(1) * y1 + y2 + y3 + y4 <=: dem60
fix(1) * y5 + y6 + y7 + y8 <=: dem65
dem60 <= ind60
dem65 <= ind60 + dem60"

The operator <=: means that the RHS latent factors is defined by the LHS observed variables. In particular, the loadings are freely estimated. In this model, ind60 is measured by x1 - x3, dem60 is mainly measured by y1 - y4, and dem65 is mainly measured by y5 - y8. The operator <= means that the regression coefficients from the RHS variables to the LHS variables are freely estimated. In this model, dem60 is influenced by ind60 and dem65 is influenced by dem60 and ind60. Details of model syntax can be found in the section of Model Syntax via ?lslx.

## Object Initialization

lslx is written as an R6 class. Everytime we conduct analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named lslx_sem.

library(lslx)
lslx_sem <- lslx$new(model = model_sem, sample_cov = cov(lavaan::PoliticalDemocracy), sample_size = nrow(lavaan::PoliticalDemocracy)) An 'lslx' R6 class is initialized via 'sample_cov' argument. Response Variables: x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8 Latent Factors: ind60 dem60 dem65  Here, lslx is the object generator for lslx object and $new() is the build-in method of lslx to generate a new lslx object. The initialization of lslx requires users to specify a model for model specification (argument model) and a sample moments to be fitted (argument sample_cov and sample_size). The sample moment must contain all the observed variables specified in the given model.

## Model Respecification

After an lslx object is initialized, model can be respecified by $free_coefficient(), $fix_coefficient(), and $penalize_coefficient() methods. The following code sets y1<->y5, y2<->y4, y2<->y6, y3<->y7, y4<->y8, and y6<->y8 as penalized parameters. lslx_sem$penalize_coefficient(name = c("y1<->y5",
"y2<->y4",
"y2<->y6",
"y3<->y7",
"y4<->y8",
"y6<->y8"))
The relation y5<->y1 under g is set as PENALIZED with starting value = 0.
The relation y4<->y2 under g is set as PENALIZED with starting value = 0.
The relation y6<->y2 under g is set as PENALIZED with starting value = 0.
The relation y7<->y3 under g is set as PENALIZED with starting value = 0.
The relation y8<->y4 under g is set as PENALIZED with starting value = 0.
The relation y8<->y6 under g is set as PENALIZED with starting value = 0. 

To see more methods for respecifying model, please check the section of Set-Related Method via ?lslx.

## Model Fitting

After an lslx object is initialized, method $fit_lasso() can be used to fit the specified model into the given data with LASSO penalty function. lslx_sem$fit_lasso()
CONGRATS: Algorithm converges under EVERY specified penalty level.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100 

The $fit_lasso() requires users to specify the considered penalty levels (argument lambda_grid). In this example, the lambda grid is automatically initialized by default. Note that MCP with delta = Inf is equivalent to the LASSO penalty. All the fitting result will be stored in the fitting field of lslx_sem. ## Model Summarizing Unlike traditional SEM analysis, lslx fit the model into data under all the penalty levels considered. To summarize the fitting result, a selector to determine an optimal penalty level must be specified. Available selectors can be found in the section of Penalty Level Selection via ?lslx. The following code summarize the fitting result under the penalty level selected by adjusted Bayesian information criterion (ABIC). lslx_sem$summarize(selector = "abic")
General Information
number of observations                                 75
number of complete observations                        75
number of missing patterns                           none
number of groups                                        1
number of responses                                    11
number of factors                                       3
number of free coefficients                            36
number of penalized coefficients                        6

Numerical Conditions
selected lambda                                     0.058
selected delta                                       none
selected step                                        none
objective value                                     0.806
objective Hessian convexity                         0.027
number of iterations                               15.000
loss value                                          0.593
number of non-zero coefficients                    42.000
degrees of freedom                                 35.000
robust degrees of freedom                             NaN
scaling factor                                        NaN

Fit Indices
root mean square error of approximation (rmsea)     0.060
comparative fit index (cfi)                         0.986
non-normed fit index (nnfi)                         0.978
standardized root mean of residual (srmr)           0.049

Likelihood Ratio Test
statistic         df    p-value

Root Mean Square Error of Approximation Test
estimate      lower      upper

Coefficient Test (Std.Error = "observed_information")
type  estimate  std.error  z-value  P(>|z|)  lower  upper
x1<-ind60    fixed     1.000        -        -        -      -      -
x2<-ind60     free     2.181      0.139   15.679    0.000  1.908  2.454
x3<-ind60     free     1.819      0.152   11.945    0.000  1.520  2.117
y1<-dem60    fixed     1.000        -        -        -      -      -
y2<-dem60     free     1.318      0.181    7.278    0.000  0.963  1.673
y3<-dem60     free     1.062      0.148    7.155    0.000  0.771  1.353
y4<-dem60     free     1.302      0.149    8.763    0.000  1.011  1.593
y5<-dem65    fixed     1.000        -        -        -      -      -
y6<-dem65     free     1.223      0.169    7.250    0.000  0.892  1.553
y7<-dem65     free     1.293      0.159    8.123    0.000  0.981  1.605
y8<-dem65     free     1.301      0.162    8.023    0.000  0.983  1.619

Regression
type  estimate  std.error  z-value  P(>|z|)  lower  upper
dem60<-ind60     free     1.463      0.392    3.737    0.000  0.696  2.230
dem65<-ind60     free     0.537      0.225    2.385    0.017  0.096  0.978
dem65<-dem60     free     0.845      0.099    8.508    0.000  0.650  1.040

Covariance
type  estimate  std.error  z-value  P(>|z|)  lower  upper
y5<->y1      pen     0.534      0.331    1.614    0.107 -0.114  1.183
y4<->y2      pen     0.576      0.604    0.952    0.341 -0.609  1.760
y6<->y2      pen     1.311      0.607    2.160    0.031  0.122  2.501
y7<->y3      pen     0.300      0.590    0.508    0.611 -0.857  1.457
y8<->y4      pen     0.077      0.423    0.182    0.856 -0.753  0.906
y8<->y6      pen     0.896      0.489    1.832    0.067 -0.063  1.855

Variance
type  estimate  std.error  z-value  P(>|z|)  lower  upper
ind60<->ind60     free     0.454      0.088    5.169    0.000  0.282  0.627
dem60<->dem60     free     3.931      0.929    4.232    0.000  2.110  5.752
dem65<->dem65     free     0.158      0.213    0.743    0.458 -0.259  0.575
x1<->x1     free     0.083      0.020    4.137    0.000  0.044  0.122
x2<->x2     free     0.121      0.071    1.702    0.089 -0.018  0.260
x3<->x3     free     0.473      0.090    5.232    0.000  0.296  0.651
y1<->y1     free     1.947      0.425    4.582    0.000  1.114  2.779
y2<->y2     free     6.337      1.085    5.842    0.000  4.211  8.463
y3<->y3     free     5.134      0.933    5.500    0.000  3.305  6.964
y4<->y4     free     2.805      0.649    4.324    0.000  1.534  4.077
y5<->y5     free     2.373      0.454    5.231    0.000  1.484  3.263
y6<->y6     free     4.327      0.727    5.950    0.000  2.902  5.752
y7<->y7     free     3.393      0.681    4.980    0.000  2.058  4.729
y8<->y8     free     2.925      0.603    4.853    0.000  1.744  4.107

Intercept
type  estimate  std.error  z-value  P(>|z|)  lower  upper
x1<-1     free     0.000      0.085    0.000    1.000 -0.166  0.166
x2<-1     free     0.000      0.174    0.000    1.000 -0.342  0.342
x3<-1     free     0.000      0.162    0.000    1.000 -0.318  0.318
y1<-1     free     0.000      0.302    0.000    1.000 -0.592  0.592
y2<-1     free     0.000      0.445    0.000    1.000 -0.872  0.872
y3<-1     free     0.000      0.377    0.000    1.000 -0.739  0.739
y4<-1     free     0.000      0.385    0.000    1.000 -0.755  0.755
y5<-1     free     0.000      0.300    0.000    1.000 -0.589  0.589
y6<-1     free     0.000      0.381    0.000    1.000 -0.747  0.747
y7<-1     free     0.000      0.379    0.000    1.000 -0.742  0.742
y8<-1     free     0.000      0.372    0.000    1.000 -0.729  0.729

In this example, we can see that the PL estimate under the selected penalty level doesn’t contain any zero value, which indicates that all of the covariance of measurements are relevant. The $summarize() method also shows the result of significance tests for the coefficients. In lslx, the default standard errors are calculated based on sandwich formula whenever raw data is available. In this example, because raw data is not used for lslx object initialization, standard error is calculated by using observed Fisher information matrix. It may not be valid when the model is misspecified and the data are not normal. Also, it is generally invalid after choosing a penalty level. ## Objects Extraction In lslx, many quantities related to SEM can be extracted by extract-related method. For example, the coefficient estimate and its asymptotic variance can be obtained by lslx_sem$extract_coefficient(selector = "abic", type = "effective")
        x1<-1/g         x2<-1/g         x3<-1/g         y1<-1/g         y2<-1/g         y3<-1/g
0.0000          0.0000          0.0000          0.0000          0.0000          0.0000
y4<-1/g         y5<-1/g         y6<-1/g         y7<-1/g         y8<-1/g  dem60<-ind60/g
0.0000          0.0000          0.0000          0.0000          0.0000          1.4631
dem65<-ind60/g  dem65<-dem60/g     x2<-ind60/g     x3<-ind60/g     y2<-dem60/g     y3<-dem60/g
0.5368          0.8452          2.1809          1.8186          1.3182          1.0620
y4<-dem60/g     y6<-dem65/g     y7<-dem65/g     y8<-dem65/g ind60<->ind60/g dem60<->dem60/g
1.3018          1.2229          1.2933          1.3011          0.4544          3.9313
dem65<->dem65/g       x1<->x1/g       x2<->x2/g       x3<->x3/g       y1<->y1/g       y5<->y1/g
0.1579          0.0828          0.1209          0.4732          1.9467          0.5342
y2<->y2/g       y4<->y2/g       y6<->y2/g       y3<->y3/g       y7<->y3/g       y4<->y4/g
6.3369          0.5756          1.3112          5.1343          0.2998          2.8054
y8<->y4/g       y5<->y5/g       y6<->y6/g       y8<->y6/g       y7<->y7/g       y8<->y8/g
0.0769          2.3734          4.3267          0.8960          3.3933          2.9252 
diag(lslx_sem\$extract_coefficient_acov(selector = "abic", type = "effective"))
        x1<-1/g         x2<-1/g         x3<-1/g         y1<-1/g         y2<-1/g         y3<-1/g
0.00716         0.03043         0.02635         0.09134         0.19811         0.14220
y4<-1/g         y5<-1/g         y6<-1/g         y7<-1/g         y8<-1/g  dem60<-ind60/g
0.14822         0.09025         0.14533         0.14327         0.13822         0.15327
dem65<-ind60/g  dem65<-dem60/g     x2<-ind60/g     x3<-ind60/g     y2<-dem60/g     y3<-dem60/g
0.05066         0.00987         0.01935         0.02318         0.03280         0.02203
y4<-dem60/g     y6<-dem65/g     y7<-dem65/g     y8<-dem65/g ind60<->ind60/g dem60<->dem60/g
0.02207         0.02845         0.02535         0.02630         0.00773         0.86307
dem65<->dem65/g       x1<->x1/g       x2<->x2/g       x3<->x3/g       y1<->y1/g       y5<->y1/g
0.04524         0.00040         0.00504         0.00818         0.18047         0.10952
y2<->y2/g       y4<->y2/g       y6<->y2/g       y3<->y3/g       y7<->y3/g       y4<->y4/g
1.17651         0.36540         0.36838         0.87136         0.34834         0.42092
y8<->y4/g       y5<->y5/g       y6<->y6/g       y8<->y6/g       y7<->y7/g       y8<->y8/g
0.17914         0.20582         0.52871         0.23928         0.46422         0.36338 

Here, the type argument is used to specify which types of parameters are used to calculate related quantities. type = "effective" indicates that only freely estimated and penalized non-zero parameters are used. By default, type = "all"