10  Measurement Invariance/Equivalence

If you are familiar with human research, you may have seen examples of group comparisons. For example, to test the effectiveness of an antidepressant, we may want to compare men and women (since depression can differ by gender). However, are you sure that the instrument you are using (e.g., the Beck Depression Inventory) has the same structure for men and women?

Another example comes from cross-cultural research. We could measure differences in subjective well-being between countries and see whether some countries are happier than others. But how can you trust such results if you don't know whether the latent variable scale scores are comparable across countries?

One criterion for comparing scale scores is that the latent variable is understood and measured equivalently across groups/countries. This property is known as measurement invariance, or lack of bias (Svetina et al., 2020).

So, whenever you want to compare test scores between groups, it is advisable to run an invariance analysis on the scale first. You may have already come across claims that some countries or cultures have higher or lower IQ than others, made without measurement invariance ever being tested.

A test is invariant if participants belonging to different groups, who have the same score on the factor underlying the test, have on average the same score on an observed item (Lubke & Muthén, 2004).

A common statistical method for establishing evidence of measurement invariance is Multigroup Confirmatory Factor Analysis (CFA-MG). In CFA-MG, we fit a hierarchical series of nested models, imposing increasingly restrictive constraints on the parameters of interest across groups.

10.1 Defining Measurement Invariance/Equivalence

A general factor model is defined as

\[ \Sigma = \Lambda \Phi \Lambda' + \Theta \]

Where \(\Sigma\) represents the covariance matrix of the observed variables (or items); \(\Lambda\) is the matrix of factor loadings that expresses the degree of association between the vector of latent variables (with associated covariance matrix \(\Phi\)) and the observed variables; and \(\Theta\) represents the covariance matrix of the measurement errors of the observed variables.

Consider that \(v\) is the vector of intercepts (the mean structure). Then the means of the observed variables can be represented by \(E(Y)=E(v+\Lambda \eta + \epsilon)\), where \(\eta\) is the vector of latent variables. This model assumes that \(E(\epsilon)=0\) and \(E(\eta)=\kappa=0\), so that \(E(Y)=v\). The model generalizes to multiple populations by permitting a separate covariance matrix for each population or group, i.e., \(\Sigma^{(g)}\) with mean structure \(v^{(g)}\), \(g=1,...,G\).

To establish measurement invariance, if the null hypothesis \(H_0: \Sigma^{(1)}=\Sigma^{(2)}=...=\Sigma^{(G)}\) is rejected, a series of increasingly restrictive tests follows. I list them below.

  1. Configural invariance: tests whether the factorial structure is the same across groups (i.e., the number of factors, and whether the items load on the same factors). Thus, the number and pattern of parameters are assumed to be equal across groups, while the values of the parameters are allowed to differ within identification constraints. If configural invariance is not found, the items load on different factors for different groups.

  2. Metric invariance: tests whether the factor loadings of the items are equal across groups. The null hypothesis states that the pattern and value of the factor loadings are equivalent across groups, i.e., \(H_0: \Lambda^{(1)}=\Lambda^{(2)}=...=\Lambda^{(G)}\). If metric invariance is not found, one or more items of the instrument are being answered with bias by one or more groups, and inferences about group differences may therefore be biased. Although this level already speaks to response bias in the items, the next step is necessary before groups can be compared on an equal footing.

  3. Scalar invariance: in addition to equal factor loadings, tests whether the intercepts are equal across groups. Thus, the null hypothesis is \(H_0: \Lambda^{(1)}=\Lambda^{(2)}=...=\Lambda^{(G)}\) and \(v^{(1)}=v^{(2)}=...=v^{(G)}\). If scalar invariance is not found, differences between groups cannot be attributed to the latent trait; they may instead reflect non-equivalence of the instrument's measurement parameters.

  4. Strict invariance: implies that, in addition to loadings and intercepts, the residual variances are equal across groups. Residual variance is simply the item variance that is not associated with the latent variable being measured.

Some authors advocate strict invariance as a condition for comparing group means (Lubke & Dolan, 2003). In practice, this level of invariance is rarely achieved, and scalar invariance already supports comparisons of manifest (or latent) variable means on the latent variable of interest (Hancock, 1997; Svetina et al., 2020; Thompson & Green, 2006). Thus, most authors suggest establishing scalar invariance in order to conclude invariance (Svetina et al., 2020).
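
To make this hierarchy concrete, below is a minimal lavaan sketch for continuous indicators; the one-factor model, the data frame df, and the grouping variable group are placeholders, not part of the example developed later in this chapter.

library(lavaan)

mod <- "F1 =~ x1 + x2 + x3 + x4"

# Configural: same structure, parameters free in each group
fit.config <- cfa(mod, data = df, group = "group")

# Metric: equal loadings across groups
fit.metric <- cfa(mod, data = df, group = "group",
                  group.equal = "loadings")

# Scalar: equal loadings and intercepts
fit.scalar <- cfa(mod, data = df, group = "group",
                  group.equal = c("loadings", "intercepts"))

# Strict: equal loadings, intercepts, and residual variances
fit.strict <- cfa(mod, data = df, group = "group",
                  group.equal = c("loadings", "intercepts", "residuals"))

# Chi-square difference tests between adjacent levels
lavTestLRT(fit.config, fit.metric, fit.scalar, fit.strict)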

10.2 Measurement Invariance for Categorical Variables

Previously, I described the concept of invariance assuming that the distribution of the observed variables is multivariate normal. However, in Psychology, most survey items are binary or ordinal. If we apply the multivariate normal model to categorical variables, there can be consequences for parameter estimates, model fit, and cross-group comparisons (Beauducel & Herzberg, 2006; Lubke & Muthén, 2004; Muthén & Kaplan, 1985). To overcome this, we can use other estimators, such as the diagonally weighted least squares (DWLS) family of estimators. Categorical measurement invariance goes as follows (Svetina et al., 2020).

Imagine we have a \(p \times 1\) vector of observed variables \(Y\), which take ordered values 0, 1, 2, …, \(C\). For each observed variable \(Y_j\), with \(j=1,2,...,p\), it is assumed that there is an underlying continuous latent response variable \(Y^{*}_j\) whose value determines the observed category of \(Y_j\). \(Y^{*}_j\) is related to \(Y_j\) by a set of threshold parameters \(\tau_j=(\tau_{j0},\tau_{j1},...,\tau_{j,C+1})\), where \(\tau_{j0}=-\infty\) and \(\tau_{j,C+1}=\infty\). Thus, the probability that \(Y_j = c\) is:

\[ P(Y_j=c)=P(\tau_{jc} \leq Y^{*}_j < \tau_{j,c+1}) \]

for \(c=0,1,...,C\). The model for the vector of latent response variables is: \[ Y^*=v+\Lambda\eta+\epsilon \] where the factor loadings and residuals are defined as before, and \(v\) is a vector of latent intercept parameters. The mean and covariance structure of this model is the same as before: \(E(Y^*)=v\) and \(Cov(Y^*)=\Sigma^*=\Lambda\Phi\Lambda'+\Theta\), where \(v\) is assumed to be zero for identification purposes.
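
As a small worked example (with invented numbers), suppose an item has three categories (\(C = 2\)), \(Y^{*}_j\) is standard normal, and the finite thresholds are \(\tau_{j1}=-1\) and \(\tau_{j2}=0.5\). The category probabilities then follow directly from the normal CDF:

tau <- c(-Inf, -1, 0.5, Inf)  # tau_j0, tau_j1, tau_j2, tau_j3
# P(Y_j = c) = P(tau_jc <= Y*_j < tau_j,c+1) for a standard normal Y*_j
probs <- pnorm(tau[-1]) - pnorm(tau[-length(tau)])
round(probs, 3)
[1] 0.159 0.533 0.309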

In the typical scenario, the categorical factor model can be expanded to multiple groups by allowing distinct thresholds and covariance matrices for the latent response variables within each population, denoted \(\tau^{(k)}\) and \(\Sigma^{*(k)}\), where \(k\) ranges from 1 to \(K\) (with \(v^{(k)}=0\) for all \(k\)). In a similar vein, for ordinal data, assessments are made for “baseline” invariance, equal slopes, and equal slopes and thresholds, which mirror configural, metric, and scalar invariance, respectively. To ascertain the viability of these invariance assumptions, both overall and difference chi-square tests are employed.
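
In lavaan, requesting a DWLS-family estimator is a one-argument change; a minimal sketch (mod and df are placeholders). In fact, when ordered = TRUE is set, lavaan already defaults to WLSMV, i.e., DWLS point estimates with mean- and variance-adjusted test statistics.

# Treat all indicators as ordinal and use the WLSMV flavor of DWLS
fit <- cfa(mod, data = df, ordered = TRUE, estimator = "WLSMV")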

10.3 Measurement Invariance in R

To run a Multigroup Confirmatory Factor Analysis, we must first install the lavaan (Rosseel, 2012) and semTools (Jorgensen et al., 2022) packages for the analyses, and psych (Revelle, 2023) for the dataset.

 install.packages("lavaan")
 install.packages("semTools")
 install.packages("psych")

And tell R that we are going to use the functions of these packages:

library(lavaan) 
This is lavaan 0.6-19
lavaan is FREE software! Please report any bugs.
library(semTools)
 
###############################################################################
This is semTools 0.5-6
All users of R (or SEM) are invited to submit functions or ideas for functions.
###############################################################################
library(psych)

Attaching package: 'psych'

The following objects are masked from 'package:semTools':

    reliability, skew

The following object is masked from 'package:lavaan':

    cor2cov

To run the analyses, we will use the BFI database (Big Five Personality Factors Questionnaire) that already exists in the psych package. We will differentiate between genders (1=Male and 2=Female).

dat <- psych::bfi
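
Before fitting anything, it is worth checking the group sizes:

table(dat$gender)

  1    2 
919 1881 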

We will store the results of our models in an empty matrix called results, from which we will extract the chi-square, degrees of freedom, p-value, RMSEA, CFI, and TLI.

results <- matrix(NA, nrow = 3, ncol = 6)

Let’s do the analysis with four of the BFI factors (Openness is left out here). First, we specify the model.

mod.cat <- "Agre =~ A1 + A2 + A3 + A4 + A5
            Con =~  C1 + C2 + C3 + C4 + C5
            Extr =~ E1 + E2 + E3 + E4 + E5
            Neur =~ N1 + N2 + N3 + N4 + N5
            "

10.3.1 Configural Model

First, let’s build the base model (baseline model), where there are no equality constraints across the groups.

baseline <- measEq.syntax(configural.model = mod.cat,
                          data = dat,
                          ordered = TRUE,
                          parameterization = "delta",
                          ID.fac = "std.lv",
                          ID.cat = "Wu.Estabrook.2016",
                          group = "gender",
                          group.equal = "configural")

The measEq.syntax function from the semTools package automatically generates the lavaan syntax to perform a confirmatory factor analysis. As can be seen from the baseline model specification, the items are treated as ordinal, the delta parameterization is used, and the model is identified following Wu and Estabrook (2016).
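
If you want to inspect the generated lavaan syntax before fitting it, you can print it (the output is long, so it is omitted here):

cat(as.character(baseline))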

Let’s then fit the base model.

model.baseline <- as.character(baseline)

fit.baseline <- cfa(model.baseline, 
                    data = dat,
                    group = "gender",
                    ordered = TRUE)

Now let’s save the results in the matrix we created initially.

results[1,] <- round(
  data.matrix(fitmeasures(fit.baseline,
                          fit.measures = c("chisq.scaled",
                                           "df.scaled",
                                           "pvalue.scaled",
                                           "rmsea.scaled",
                                           "cfi.scaled",
                                           "tli.scaled"))),
  digits = 3)

10.3.2 Metric Invariance Model

prop4 <- measEq.syntax(configural.model = mod.cat,
                       data = dat,
                       ordered = TRUE,
                       parameterization = "delta",
                       ID.fac = "std.lv",
                       ID.cat = "Wu.Estabrook.2016",
                       group = "gender",
                       group.equal = c("loadings"))

model.prop4 <- as.character(prop4)

fit.prop4 <- cfa(model.prop4,
                 data = dat,
                 group = "gender",
                 ordered = TRUE)

results[2,] <- round(
  data.matrix(fitmeasures(fit.prop4,
                          fit.measures = c("chisq.scaled",
                                           "df.scaled",
                                           "pvalue.scaled",
                                           "rmsea.scaled",
                                           "cfi.scaled",
                                           "tli.scaled"))),
  digits = 3)

To examine the relative fit of the models and compare the chi-square statistics between the baseline model and the model with equality constraints on the loadings, we use the function lavTestLRT.

lavTestLRT(fit.baseline, fit.prop4)
              Df AIC BIC    Chisq Chisq diff Df diff Pr(>Chisq)
fit.baseline 328  NA  NA 3894.513         NA      NA         NA
fit.prop4    344  NA  NA 3946.388   27.50618      16  0.0361884

10.3.3 Scalar Invariance Model

prop7 <- measEq.syntax(configural.model = mod.cat,
                       data = dat,
                       ordered = TRUE,
                       parameterization = "delta",
                       ID.fac = "std.lv",
                       ID.cat = "Wu.Estabrook.2016",
                       group = "gender",
                       group.equal = c("thresholds", "loadings"))

model.prop7 <- as.character(prop7)

fit.prop7 <- cfa(model.prop7,
                 data = dat,
                 group = "gender",
                 ordered = TRUE
)


results[3,] <- round(
  data.matrix(fitmeasures(fit.prop7,
                          fit.measures = c("chisq.scaled",
                                           "df.scaled",
                                           "pvalue.scaled",
                                           "rmsea.scaled",
                                           "cfi.scaled",
                                           "tli.scaled"))),
  digits = 3)

colnames(results) <- c("chisq","df","pvalue","rmsea","cfi","tli")
rownames(results) <- c("baseline","loadings","thresholds+loadings")

Examining fit indices (results):

print(results)
                       chisq  df pvalue rmsea   cfi   tli
baseline            4111.951 328      0 0.096 0.867 0.846
loadings            3761.074 344      0 0.090 0.880 0.867
thresholds+loadings 4120.539 404      0 0.086 0.869 0.877

We notice that, overall, the fit indices remained adequate (RMSEA and TLI even improved) as the models became more restricted by imposing equality of loadings (prop4) and then equality of thresholds as well (prop7).

We can perform the chi-square difference test between the loading-invariance (prop4) and loading + threshold invariance (prop7) models to assess the feasibility of measurement invariance.

lavTestLRT(fit.prop4, fit.prop7)
           Df AIC BIC    Chisq Chisq diff Df diff Pr(>Chisq)
fit.prop4 344  NA  NA 3946.388         NA      NA         NA
fit.prop7 404  NA  NA 3952.985   20.73053      60  0.9999995

10.4 Data Interpretation

For interpretation, see Table 1 of Svetina et al. (2020), which summarizes several simulation studies and the recommended cutoff points. For example, if you have ordinal data, are comparing two groups with 150/150, 150/500, or 500/500 participants per group, and have 2 to 4 factors, you should test the levels of measurement invariance through the chi-square difference test with p < .05 between each level of invariance. If instead you have normally distributed data, are comparing two groups with an N of 150, 250, or 500 per group, and have a single factor, the CFI difference between adjacent invariance levels should be smaller than 0.005, and the RMSEA difference should be smaller than 0.010.
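
With the results matrix built above, these differences between successive invariance levels can be computed directly; a small sketch:

# Change in CFI and RMSEA from one invariance level to the next
apply(results[, c("cfi", "rmsea")], 2, diff)
# Rounded: CFI changes by 0.013 and -0.011; RMSEA by -0.006 and -0.004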

10.5 Comparing Confidence Intervals as a Means to Assess Measurement Invariance

A study by Cheung & Lau (2012) showed that even a few non-invariant items can lead to poor fit for the whole scale. Their idea is to compare confidence intervals (CIs) between item parameters, which allows us to exclude or treat the specific items that show non-invariance. In this approach, we conclude that the scale structure is the same if we achieve configural invariance (using Hu & Bentler’s (1999) recommendations, for example, i.e., CFI and TLI > .95 and RMSEA < .06). We then conclude the other types of invariance if the 95% confidence intervals (CIs) of the loading and intercept parameters overlap between groups (Cheung & Lau, 2012). The CIs should be estimated using bias-corrected bootstrap samples to assess metric and scalar invariance. For practical purposes, we will exclude non-invariant items.

10.6 R Code for Comparing CIs

First, create your data frame using the items of a single factor plus the grouping variable.

dat <- psych::bfi[, c(1:5, 26)]  # items A1-A5 and the gender column

We will need to run the first steps shown in the “Configural Model” section again.

all.results <- matrix(NA, nrow = 1, ncol = 6)

# Model to be tested
mod.cat <- "Agre =~ A1 + A2 + A3 + A4 + A5
            "

# Analysis
baseline <- measEq.syntax(configural.model = mod.cat,
                          data = dat,
                          ordered = TRUE,
                          parameterization = "delta",
                          ID.fac = "std.lv",
                          ID.cat = "Wu.Estabrook.2016",
                          group = "gender",
                          group.equal = "configural")

summary(baseline)
This lavaan model syntax specifies a CFA with 5 manifest indicators (5 of which are ordinal) of 1 common factor(s).

To identify the location and scale of each common factor, the factor means and variances were fixed to 0 and 1, respectively, unless equality constraints on measurement parameters allow them to be freed.

The location and scale of each latent item-response underlying 5 ordinal indicators were identified using the "delta" parameterization, and the identification constraints recommended by Wu & Estabrook (2016). For details, read:

    https://doi.org/10.1007/s11336-016-9506-0 

Pattern matrix indicating num(eric), ord(ered), and lat(ent) indicators per factor:

   Agre
A1 ord 
A2 ord 
A3 ord 
A4 ord 
A5 ord 

The following types of parameter were constrained to equality across groups:

    configural
# Have to specify as.character to submit to lavaan
model.baseline <- as.character(baseline)

# Fitting baseline model in lavaan via cfa function
fit.baseline <- cfa(model.baseline,
                    data = dat,
                    group = "gender",
                    estimator = "dwls"
)

# Obtaining results from baseline model
summary(fit.baseline)
lavaan 0.6-19 ended normally after 11 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:               Used       Total
    1                                              896         919
    2                                             1813        1881

Model Test User Model:
                                                      
  Test statistic                                72.218
  Degrees of freedom                                10
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    1                                           23.027
    2                                           49.191

Parameter Estimates:

  Parameterization                               Delta
  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model        Unstructured


Group 1 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Agre =~                                             
    A1      (l.1_)    0.373    0.022   17.313    0.000
    A2      (l.2_)   -0.700    0.022  -31.557    0.000
    A3      (l.3_)   -0.810    0.024  -33.698    0.000
    A4      (l.4_)   -0.515    0.022  -23.007    0.000
    A5      (l.5_)   -0.724    0.022  -32.481    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .A1      (n.1.)    0.000                           
   .A2      (n.2.)    0.000                           
   .A3      (n.3.)    0.000                           
   .A4      (n.4.)    0.000                           
   .A5      (n.5.)    0.000                           
    Agre    (a.1.)    0.000                           

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1|t1   (A1.1)   -0.780    0.047  -16.648    0.000
    A1|t2   (A1.2)    0.070    0.042    1.670    0.095
    A1|t3   (A1.3)    0.530    0.044   12.023    0.000
    A1|t4   (A1.4)    1.058    0.052   20.491    0.000
    A1|t5   (A1.5)    1.789    0.078   22.894    0.000
    A2|t1   (A2.1)   -1.895    0.085  -22.363    0.000
    A2|t2   (A2.2)   -1.285    0.057  -22.456    0.000
    A2|t3   (A2.3)   -0.938    0.049  -19.019    0.000
    A2|t4   (A2.4)   -0.189    0.042   -4.472    0.000
    A2|t5   (A2.5)    0.773    0.047   16.523    0.000
    A3|t1   (A3.1)   -1.803    0.079  -22.836    0.000
    A3|t2   (A3.2)   -1.167    0.054  -21.572    0.000
    A3|t3   (A3.3)   -0.765    0.047  -16.398    0.000
    A3|t4   (A3.4)   -0.087    0.042   -2.070    0.038
    A3|t5   (A3.5)    0.875    0.048   18.125    0.000
    A4|t1   (A4.1)   -1.621    0.070  -23.318    0.000
    A4|t2   (A4.2)   -1.058    0.052  -20.491    0.000
    A4|t3   (A4.3)   -0.710    0.046  -15.450    0.000
    A4|t4   (A4.4)   -0.149    0.042   -3.539    0.000
    A4|t5   (A4.5)    0.543    0.044   12.285    0.000
    A5|t1   (A5.1)   -1.879    0.084  -22.458    0.000
    A5|t2   (A5.2)   -1.184    0.055  -21.715    0.000
    A5|t3   (A5.3)   -0.735    0.046  -15.894    0.000
    A5|t4   (A5.4)   -0.115    0.042   -2.738    0.006
    A5|t5   (A5.5)    0.834    0.048   17.515    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Agre    (p.1_)    1.000                           
   .A1                0.861                           
   .A2                0.511                           
   .A3                0.343                           
   .A4                0.735                           
   .A5                0.476                           

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1                1.000                           
    A2                1.000                           
    A3                1.000                           
    A4                1.000                           
    A5                1.000                           


Group 2 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Agre =~                                             
    A1      (l.1_)    0.432    0.017   26.013    0.000
    A2      (l.2_)   -0.703    0.017  -40.564    0.000
    A3      (l.3_)   -0.810    0.019  -42.669    0.000
    A4      (l.4_)   -0.490    0.018  -27.383    0.000
    A5      (l.5_)   -0.636    0.016  -39.543    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .A1      (n.1.)    0.000                           
   .A2      (n.2.)    0.000                           
   .A3      (n.3.)    0.000                           
   .A4      (n.4.)    0.000                           
   .A5      (n.5.)    0.000                           
    Agre    (a.1.)    0.000                           

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1|t1   (A1.1)   -0.292    0.030   -9.776    0.000
    A1|t2   (A1.2)    0.453    0.031   14.805    0.000
    A1|t3   (A1.3)    0.855    0.034   25.356    0.000
    A1|t4   (A1.4)    1.335    0.041   32.338    0.000
    A1|t5   (A1.5)    1.954    0.062   31.290    0.000
    A2|t1   (A2.1)   -2.271    0.083  -27.360    0.000
    A2|t2   (A2.2)   -1.687    0.051  -33.022    0.000
    A2|t3   (A2.3)   -1.338    0.041  -32.362    0.000
    A2|t4   (A2.4)   -0.630    0.032  -19.904    0.000
    A2|t5   (A2.5)    0.358    0.030   11.877    0.000
    A3|t1   (A3.1)   -1.860    0.058  -32.092    0.000
    A3|t2   (A3.2)   -1.391    0.043  -32.695    0.000
    A3|t3   (A3.3)   -1.058    0.036  -29.149    0.000
    A3|t4   (A3.4)   -0.446    0.031  -14.620    0.000
    A3|t5   (A3.5)    0.494    0.031   16.054    0.000
    A4|t1   (A4.1)   -1.693    0.051  -33.003    0.000
    A4|t2   (A4.2)   -1.188    0.038  -30.938    0.000
    A4|t3   (A4.3)   -0.952    0.035  -27.327    0.000
    A4|t4   (A4.4)   -0.479    0.031  -15.592    0.000
    A4|t5   (A4.5)    0.093    0.029    3.169    0.002
    A5|t1   (A5.1)   -2.105    0.071  -29.620    0.000
    A5|t2   (A5.2)   -1.439    0.044  -32.929    0.000
    A5|t3   (A5.3)   -1.008    0.036  -28.334    0.000
    A5|t4   (A5.4)   -0.311    0.030  -10.384    0.000
    A5|t5   (A5.5)    0.617    0.032   19.540    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Agre    (p.1_)    1.000                           
   .A1                0.813                           
   .A2                0.505                           
   .A3                0.345                           
   .A4                0.759                           
   .A5                0.595                           

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1                1.000                           
    A2                1.000                           
    A3                1.000                           
    A4                1.000                           
    A5                1.000                           
# Extracting fit indices into the first row of all.results matrix
all.results[1,]<-round(
  data.matrix(
    fitmeasures(
      fit.baseline,
      fit.measures = c("chisq",
                       "df",
                       "pvalue",
                       "cfi",
                       "tli",
                       "rmsea"))),
  digits=3)
colnames(all.results) <- c("chisq",
                           "df",
                           "pvalue",
                           "cfi",
                           "tli",
                           "rmsea")

all.results
      chisq df pvalue   cfi   tli rmsea
[1,] 72.218 10      0 0.991 0.982 0.068

To assess metric invariance by comparing bootstrapped CIs, we refit the baseline model, now requesting bootstrap standard errors.

fit.metric_full <- cfa(model.baseline,
                  data = dat,
                  group = "gender",
                  estimator = "dwls",
                  bootstrap = 1000,
                  se = "boot",
                  parallel = "snow",
                  ncpus = parallel::detectCores() - 1,
                  iseed = 16102024
                  )
Warning: lavaan->lav_options_set():  
   information will be set to "expected" for estimator = "dwls"
summary(fit.metric_full)
lavaan 0.6-19 ended normally after 11 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:               Used       Total
    1                                              896         919
    2                                             1813        1881

Model Test User Model:
                                                      
  Test statistic                                72.218
  Degrees of freedom                                10
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    1                                           23.027
    2                                           49.191

Parameter Estimates:

  Parameterization                               Delta
  Standard errors                            Bootstrap
  Number of requested bootstrap draws             1000
  Number of successful bootstrap draws            1000


Group 1 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Agre =~                                             
    A1      (l.1_)    0.373    0.041    9.084    0.000
    A2      (l.2_)   -0.700    0.030  -23.634    0.000
    A3      (l.3_)   -0.810    0.027  -30.441    0.000
    A4      (l.4_)   -0.515    0.032  -16.129    0.000
    A5      (l.5_)   -0.724    0.028  -25.887    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .A1      (n.1.)    0.000                           
   .A2      (n.2.)    0.000                           
   .A3      (n.3.)    0.000                           
   .A4      (n.4.)    0.000                           
   .A5      (n.5.)    0.000                           
    Agre    (a.1.)    0.000                           

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1|t1   (A1.1)   -0.780    0.047  -16.642    0.000
    A1|t2   (A1.2)    0.070    0.042    1.679    0.093
    A1|t3   (A1.3)    0.530    0.044   12.099    0.000
    A1|t4   (A1.4)    1.058    0.052   20.274    0.000
    A1|t5   (A1.5)    1.789    0.079   22.591    0.000
    A2|t1   (A2.1)   -1.895    0.088  -21.580    0.000
    A2|t2   (A2.2)   -1.285    0.059  -21.882    0.000
    A2|t3   (A2.3)   -0.938    0.051  -18.342    0.000
    A2|t4   (A2.4)   -0.189    0.044   -4.245    0.000
    A2|t5   (A2.5)    0.773    0.050   15.531    0.000
    A3|t1   (A3.1)   -1.803    0.084  -21.574    0.000
    A3|t2   (A3.2)   -1.167    0.052  -22.478    0.000
    A3|t3   (A3.3)   -0.765    0.047  -16.394    0.000
    A3|t4   (A3.4)   -0.087    0.043   -2.004    0.045
    A3|t5   (A3.5)    0.875    0.048   18.212    0.000
    A4|t1   (A4.1)   -1.621    0.068  -23.874    0.000
    A4|t2   (A4.2)   -1.058    0.052  -20.508    0.000
    A4|t3   (A4.3)   -0.710    0.046  -15.342    0.000
    A4|t4   (A4.4)   -0.149    0.043   -3.447    0.001
    A4|t5   (A4.5)    0.543    0.045   12.191    0.000
    A5|t1   (A5.1)   -1.879    0.086  -21.765    0.000
    A5|t2   (A5.2)   -1.184    0.055  -21.700    0.000
    A5|t3   (A5.3)   -0.735    0.047  -15.799    0.000
    A5|t4   (A5.4)   -0.115    0.044   -2.603    0.009
    A5|t5   (A5.5)    0.834    0.048   17.238    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Agre    (p.1_)    1.000                           
   .A1                0.861                           
   .A2                0.511                           
   .A3                0.343                           
   .A4                0.735                           
   .A5                0.476                           

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1                1.000                           
    A2                1.000                           
    A3                1.000                           
    A4                1.000                           
    A5                1.000                           


Group 2 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Agre =~                                             
    A1      (l.1_)    0.432    0.030   14.546    0.000
    A2      (l.2_)   -0.703    0.022  -32.127    0.000
    A3      (l.3_)   -0.810    0.020  -41.297    0.000
    A4      (l.4_)   -0.490    0.027  -18.443    0.000
    A5      (l.5_)   -0.636    0.023  -27.268    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .A1      (n.1.)    0.000                           
   .A2      (n.2.)    0.000                           
   .A3      (n.3.)    0.000                           
   .A4      (n.4.)    0.000                           
   .A5      (n.5.)    0.000                           
    Agre    (a.1.)    0.000                           

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1|t1   (A1.1)   -0.292    0.031   -9.545    0.000
    A1|t2   (A1.2)    0.453    0.030   15.191    0.000
    A1|t3   (A1.3)    0.855    0.033   25.810    0.000
    A1|t4   (A1.4)    1.335    0.040   33.599    0.000
    A1|t5   (A1.5)    1.954    0.065   30.101    0.000
    A2|t1   (A2.1)   -2.271    0.088  -25.948    0.000
    A2|t2   (A2.2)   -1.687    0.051  -33.092    0.000
    A2|t3   (A2.3)   -1.338    0.040  -33.258    0.000
    A2|t4   (A2.4)   -0.630    0.031  -20.368    0.000
    A2|t5   (A2.5)    0.358    0.031   11.734    0.000
    A3|t1   (A3.1)   -1.860    0.057  -32.590    0.000
    A3|t2   (A3.2)   -1.391    0.041  -33.627    0.000
    A3|t3   (A3.3)   -1.058    0.036  -29.235    0.000
    A3|t4   (A3.4)   -0.446    0.030  -15.038    0.000
    A3|t5   (A3.5)    0.494    0.031   16.013    0.000
    A4|t1   (A4.1)   -1.693    0.050  -34.041    0.000
    A4|t2   (A4.2)   -1.188    0.038  -30.883    0.000
    A4|t3   (A4.3)   -0.952    0.034  -27.632    0.000
    A4|t4   (A4.4)   -0.479    0.030  -15.725    0.000
    A4|t5   (A4.5)    0.093    0.030    3.165    0.002
    A5|t1   (A5.1)   -2.105    0.073  -28.844    0.000
    A5|t2   (A5.2)   -1.439    0.044  -32.841    0.000
    A5|t3   (A5.3)   -1.008    0.034  -29.258    0.000
    A5|t4   (A5.4)   -0.311    0.029  -10.731    0.000
    A5|t5   (A5.5)    0.617    0.031   20.072    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Agre    (p.1_)    1.000                           
   .A1                0.813                           
   .A2                0.505                           
   .A3                0.345                           
   .A4                0.759                           
   .A5                0.595                           

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)
    A1                1.000                           
    A2                1.000                           
    A3                1.000                           
    A4                1.000                           
    A5                1.000                           
par <- parameterEstimates(fit.metric_full, boot.ci.type = "bca.simple", standardized = TRUE)
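
Before building helper functions around this table, it can help to peek at the columns we will use (parameter labels, group, estimates, and the bootstrap CI bounds):

head(par[, c("lhs", "op", "rhs", "group", "est", "ci.lower", "ci.upper")])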

Create the following functions to facilitate looking at CIs for Metric Invariance.

############### FUNCTIONS
subset_and_print <- function(rhs_value) {
  # Pull the factor-loading rows (op == "=~") for a given item from the
  # global parameter table 'par'
  inter <- subset(par, op == "=~" & rhs == rhs_value)
  return(inter)
}

show_confidence_intervals <- function(rng, pair_index) {
  interval1 <- c(rng[1], rng[2])
  interval2 <- c(rng[3], rng[4])
  
  print(paste("Confidence intervals for Item", pair_index, ":"))
  print(paste("Interval 1: [", interval1[1], ", ", interval1[2], "]", sep = ""))
  print(paste("Interval 2: [", interval2[1], ", ", interval2[2], "]", sep = ""))
}

# Function to check overlap between confidence intervals
overlap_TRUE_FALSE <- function(rng) {
  olap <- (rng[1] <= rng[4]) & (rng[2] >= rng[3])
  return(olap)
}

Now, let’s do the analysis of the CIs for a single factor in order to facilitate interpretation.

############### METRIC FOR Agreeableness
# Define the vector of a Factor
rhs_AGRE <- colnames(dat[,1:5]) 
PER_ol <- as.data.frame(do.call(rbind, lapply(rhs_AGRE, subset_and_print)))

# Initialize an empty list to store results
results <- list()

# Iterate over each pair of rows (group 1 and group 2 for the same item)
# and check for overlap; columns 11 and 12 of 'par' are ci.lower and ci.upper
for (i in 1:(nrow(PER_ol)/2)) {
  rng <- cbind(pmin(PER_ol[(2*i-1), 11], PER_ol[(2*i-1), 12]),
               pmax(PER_ol[(2*i-1), 11], PER_ol[(2*i-1), 12]),
               pmin(PER_ol[(2*i), 11], PER_ol[(2*i), 12]),
               pmax(PER_ol[(2*i), 11], PER_ol[(2*i), 12]))
  
  # Check for overlap and store the result
  results[[i]] <- overlap_TRUE_FALSE(rng)
  print(paste("Overlap for Item", i, ":", results[[i]]))
}
[1] "Overlap for Item 1 : TRUE"
[1] "Overlap for Item 2 : TRUE"
[1] "Overlap for Item 3 : TRUE"
[1] "Overlap for Item 4 : TRUE"
[1] "Overlap for Item 5 : TRUE"
# Iterate over each pair of variables and show confidence intervals
for (i in 1:(nrow(PER_ol)/2)) {
  rng <- cbind(pmin(PER_ol[(2*i-1), 11], PER_ol[(2*i-1), 12]),
               pmax(PER_ol[(2*i-1), 11], PER_ol[(2*i-1), 12]),
               pmin(PER_ol[(2*i), 11], PER_ol[(2*i), 12]),
               pmax(PER_ol[(2*i), 11], PER_ol[(2*i), 12]))
  
  # Show the confidence intervals
  show_confidence_intervals(round(rng, 3), i)
}
[1] "Confidence intervals for Item 1 :"
[1] "Interval 1: [0.287, 0.447]"
[1] "Interval 2: [0.373, 0.49]"
[1] "Confidence intervals for Item 2 :"
[1] "Interval 1: [-0.753, -0.637]"
[1] "Interval 2: [-0.742, -0.655]"
[1] "Confidence intervals for Item 3 :"
[1] "Interval 1: [-0.864, -0.76]"
[1] "Interval 2: [-0.848, -0.769]"
[1] "Confidence intervals for Item 4 :"
[1] "Interval 1: [-0.578, -0.448]"
[1] "Interval 2: [-0.539, -0.433]"
[1] "Confidence intervals for Item 5 :"
[1] "Interval 1: [-0.775, -0.667]"
[1] "Interval 2: [-0.68, -0.588]"

The results show that the CIs overlap for all five items, which means we have metric invariance.

Now let’s run a scalar invariance analysis. Save the following functions for faster interpretation.

# Function to subset the parameter estimates for a specific item and threshold
subset_and_print <- function(lhs_value, rhs_value) {
  inter <- subset(par, lhs == lhs_value & op == "|" & rhs == rhs_value)
  return(inter)
}

# Function to show confidence intervals between two groups
show_confidence_intervals <- function(rng, item_name, threshold) {
  interval1 <- c(rng[1], rng[2])  # Group 1
  interval2 <- c(rng[3], rng[4])  # Group 2
  
  print(paste("Confidence intervals for", item_name, "threshold", threshold, ":"))
  print(paste("Group 1: [", interval1[1], ", ", interval1[2], "]", sep = ""))
  print(paste("Group 2: [", interval2[1], ", ", interval2[2], "]", sep = ""))
}

# Main automation function to compare thresholds between the same item and threshold but across two groups
compare_all_thresholds <- function(param_estimates) {
  # Extract all unique items (lhs) and thresholds (rhs) in the parameter estimates
  item_names <- unique(param_estimates$lhs[param_estimates$op == "|"])
  threshold_values <- unique(param_estimates$rhs[param_estimates$op == "|"])
  
  # Iterate over each item (lhs) and threshold (rhs)
  for (item_name in item_names) {
    for (rhs_value in threshold_values) {
      # Subset parameter estimates for the current item and threshold across both groups
      inter <- subset_and_print(item_name, rhs_value)
      
      # Check if there are exactly two rows (representing two groups)
      if (nrow(inter) == 2) {
        # Extract the confidence intervals for both groups
        rng <- cbind(
          pmin(inter[1, "ci.lower"], inter[1, "ci.upper"]),  # Group 1
          pmax(inter[1, "ci.lower"], inter[1, "ci.upper"]),  # Group 1
          pmin(inter[2, "ci.lower"], inter[2, "ci.upper"]),  # Group 2
          pmax(inter[2, "ci.lower"], inter[2, "ci.upper"])   # Group 2
        )
        
        # Show the confidence intervals for the current item and threshold
        show_confidence_intervals(round(rng, 3), item_name, rhs_value)
        
        if((rng[1] <= rng[4]) & (rng[2] >= rng[3])){
          print(paste("TRUE overlap for item", item_name, "in threshold", rhs_value))
        }
        else{
          print(paste("FALSE overlap for item", item_name, "in threshold", rhs_value))
        }
        
      } else {
        print(paste("Not enough data to compare thresholds for item:", item_name, "and rhs:", rhs_value))
      }
    }
  }
}

Run the analysis by passing the parameter table to the interpretation function.

compare_all_thresholds(par)
[1] "Confidence intervals for A1 threshold t1 :"
[1] "Group 1: [-0.883, -0.696]"
[1] "Group 2: [-0.356, -0.234]"
[1] "FALSE overlap for item A1 in theshold t1"
[1] "Confidence intervals for A1 threshold t2 :"
[1] "Group 1: [-0.017, 0.146]"
[1] "Group 2: [0.389, 0.505]"
[1] "FALSE overlap for item A1 in theshold t2"
[1] "Confidence intervals for A1 threshold t3 :"
[1] "Group 1: [0.443, 0.613]"
[1] "Group 2: [0.785, 0.916]"
[1] "FALSE overlap for item A1 in theshold t3"
[1] "Confidence intervals for A1 threshold t4 :"
[1] "Group 1: [0.964, 1.167]"
[1] "Group 2: [1.258, 1.416]"
[1] "FALSE overlap for item A1 in theshold t4"
[1] "Confidence intervals for A1 threshold t5 :"
[1] "Group 1: [1.643, 1.951]"
[1] "Group 2: [1.83, 2.08]"
[1] "TRUE overlap for item A1 in theshold t5"
[1] "Confidence intervals for A2 threshold t1 :"
[1] "Group 1: [-2.1, -1.749]"
[1] "Group 2: [-2.477, -2.118]"
[1] "FALSE overlap for item A2 in theshold t1"
[1] "Confidence intervals for A2 threshold t2 :"
[1] "Group 1: [-1.41, -1.184]"
[1] "Group 2: [-1.794, -1.597]"
[1] "FALSE overlap for item A2 in theshold t2"
[1] "Confidence intervals for A2 threshold t3 :"
[1] "Group 1: [-1.038, -0.842]"
[1] "Group 2: [-1.428, -1.27]"
[1] "FALSE overlap for item A2 in theshold t3"
[1] "Confidence intervals for A2 threshold t4 :"
[1] "Group 1: [-0.284, -0.112]"
[1] "Group 2: [-0.693, -0.577]"
[1] "FALSE overlap for item A2 in theshold t4"
[1] "Confidence intervals for A2 threshold t5 :"
[1] "Group 1: [0.667, 0.867]"
[1] "Group 2: [0.294, 0.412]"
[1] "FALSE overlap for item A2 in theshold t5"
[1] "Confidence intervals for A3 threshold t1 :"
[1] "Group 1: [-2.008, -1.665]"
[1] "Group 2: [-1.982, -1.76]"
[1] "TRUE overlap for item A3 in theshold t1"
[1] "Confidence intervals for A3 threshold t2 :"
[1] "Group 1: [-1.273, -1.073]"
[1] "Group 2: [-1.48, -1.313]"
[1] "FALSE overlap for item A3 in theshold t2"
[1] "Confidence intervals for A3 threshold t3 :"
[1] "Group 1: [-0.867, -0.685]"
[1] "Group 2: [-1.128, -0.986]"
[1] "FALSE overlap for item A3 in theshold t3"
[1] "Confidence intervals for A3 threshold t4 :"
[1] "Group 1: [-0.171, 0]"
[1] "Group 2: [-0.508, -0.388]"
[1] "FALSE overlap for item A3 in theshold t4"
[1] "Confidence intervals for A3 threshold t5 :"
[1] "Group 1: [0.784, 0.964]"
[1] "Group 2: [0.436, 0.56]"
[1] "FALSE overlap for item A3 in theshold t5"
[1] "Confidence intervals for A4 threshold t1 :"
[1] "Group 1: [-1.762, -1.499]"
[1] "Group 2: [-1.787, -1.597]"
[1] "TRUE overlap for item A4 in theshold t1"
[1] "Confidence intervals for A4 threshold t2 :"
[1] "Group 1: [-1.161, -0.959]"
[1] "Group 2: [-1.267, -1.113]"
[1] "TRUE overlap for item A4 in theshold t2"
[1] "Confidence intervals for A4 threshold t3 :"
[1] "Group 1: [-0.811, -0.623]"
[1] "Group 2: [-1.022, -0.887]"
[1] "FALSE overlap for item A4 in theshold t3"
[1] "Confidence intervals for A4 threshold t4 :"
[1] "Group 1: [-0.24, -0.067]"
[1] "Group 2: [-0.543, -0.427]"
[1] "FALSE overlap for item A4 in theshold t4"
[1] "Confidence intervals for A4 threshold t5 :"
[1] "Group 1: [0.446, 0.626]"
[1] "Group 2: [0.034, 0.151]"
[1] "FALSE overlap for item A4 in theshold t5"
[1] "Confidence intervals for A5 threshold t1 :"
[1] "Group 1: [-2.075, -1.736]"
[1] "Group 2: [-2.289, -2.003]"
[1] "TRUE overlap for item A5 in theshold t1"
[1] "Confidence intervals for A5 threshold t2 :"
[1] "Group 1: [-1.298, -1.088]"
[1] "Group 2: [-1.531, -1.355]"
[1] "FALSE overlap for item A5 in theshold t2"
[1] "Confidence intervals for A5 threshold t3 :"
[1] "Group 1: [-0.838, -0.65]"
[1] "Group 2: [-1.08, -0.948]"
[1] "FALSE overlap for item A5 in theshold t3"
[1] "Confidence intervals for A5 threshold t4 :"
[1] "Group 1: [-0.206, -0.028]"
[1] "Group 2: [-0.368, -0.257]"
[1] "FALSE overlap for item A5 in theshold t4"
[1] "Confidence intervals for A5 threshold t5 :"
[1] "Group 1: [0.732, 0.925]"
[1] "Group 2: [0.556, 0.678]"
[1] "FALSE overlap for item A5 in theshold t5"

The function prints TRUE when there is an overlap and FALSE when there is not. We see that most of the thresholds do not overlap, which indicates that the Agreeableness scale in these data does not show scalar invariance across gender.
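
When only a few parameters are non-invariant, an alternative to dropping items is partial invariance: re-estimate the constrained model while freeing the offending parameters. Below is a minimal sketch using measEq.syntax's group.partial argument; freeing A1's first threshold is a hypothetical choice (the label follows lavaan's item | threshold convention), and in practice you would free the parameters flagged by the CI comparison above.

# Scalar-level syntax, but with A1's first threshold free across groups
prop.partial <- measEq.syntax(configural.model = mod.cat,
                              data = dat,
                              ordered = TRUE,
                              parameterization = "delta",
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              group = "gender",
                              group.equal = c("thresholds", "loadings"),
                              group.partial = "A1 | t1")

fit.partial <- cfa(as.character(prop.partial),
                   data = dat,
                   group = "gender",
                   ordered = TRUE)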

10.7 References

Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling: A Multidisciplinary Journal, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2

Cheung, G. W., & Lau, R. S. (2012). A direct comparison approach for testing measurement invariance. Organizational Research Methods, 15(2), 167–198. https://doi.org/10.1177/1094428111421987

Hancock, G. R. (1997). Structural equation modeling methods of hypothesis testing of latent variable means. Measurement and Evaluation in Counseling and Development, 30(2), 91–105. https://doi.org/10.1080/07481756.1997.12068926

Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2022). semTools: Useful tools for structural equation modeling. R package. Retrieved from https://CRAN.R-project.org/package=semTools

Lubke, G. H., & Muthén, B. O. (2004). Applying multigroup confirmatory factor models for continuous outcomes to Likert scale data complicates meaningful group comparisons. Structural Equation Modeling: A Multidisciplinary Journal, 11(4), 514–534. https://doi.org/10.1207/s15328007sem1104_2

Muthén, B., & Kaplan, D. (1985). A Comparison of Some Methodologies for the Factor Analysis of Non-Normal Likert Variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189. https://doi.org/10.1111/j.2044-8317.1985.tb00832.x

R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package. https://CRAN.R-project.org/package=psych

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 111–130. https://doi.org/10.1080/10705511.2019.1602776