The package fixest provides a family of functions to perform estimations with multiple fixed-effects. The two main functions are feols for linear models and feglm for generalized linear models. In addition, the function femlm performs direct maximum likelihood estimation, and feNmlm extends the latter to allow the inclusion of non-linear in parameters right-hand-sides. Finally, the functions fepois and fenegbin are aliases for Poisson and negative binomial fixed-effect estimations. Each of these functions supports any number of fixed-effects and is implemented with full fledged multi-threading in C++. Functions feols and feglm further support variables with varying slopes.

This package is currently (Feb. 2020) the fastest software available to perform fixed-effects estimations. See the project’s homepage for a set of benchmarks.

The standard-errors of the estimates can be easily and intuitively clustered (up to four-way).

The function etable allows to seamlessly export the results of multiple estimations into either a data.frame, or into a Latex table.

The main features of the package are illustrated in this vignette. The theory used to obtain the fixed-effects is based on Berge (2018), “Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm.” CREA Discussion Papers, 13.

# 1 Simple example using trade data

This example deals with international trade, which is a setup that usually requires performing estimations with many fixed-effects. We estimate a very simple gravity model in which we are interested in finding out the negative effect of geographic distance on trade. The sample data consists of European trade extracted from Eurostat. Let’s load the data contained in the package:

library(fixest)
data(trade)

This data is a sample of bilateral importations between EU15 countries from 2007 and 2016. The data is further broken down according to 20 product categories. Here is a sample of the data:

Destination Origin Product Year dist_km Euros
LU BE 1 2007 139.5719 2966697
BE LU 1 2007 139.5719 6755030
LU BE 2 2007 139.5719 57078782
BE LU 2 2007 139.5719 7117406
LU BE 3 2007 139.5719 17379821
BE LU 3 2007 139.5719 2622254

The dependent variable of the estimation will be the level of trade between two countries while the independent variable is the geographic distance between the two countries. To obtain the elasticity of geographic distance net of the effects of the four fixed-effects, we estimate the following:

$$E\left(Trade_{i,j,p,t}\right)=\gamma_{i}^{Exporter}\times\gamma_{j}^{Importer}\times\gamma_{p}^{Product}\times\gamma_{t}^{Year}\times Distance_{ij}^{\beta}$$,

where the subscripts $$i$$, $$j$$, $$p$$ and $$t$$ stand respectively for the exporting country, the importing country, the type of product and the year, and the $$\gamma_{v}^{c}$$ are fixed-effects for these groups. Here $$\beta$$ is the elasticity of interest.

Note that when you use the Poisson/Negative Binomial families, this relationship is in fact linear because the right hand side is exponentialized to avoid negative values for the Poisson parameter. This leads to the equivalent relation:1

$$E\left(Trade_{i,j,p,t}\right)=\exp\left(\gamma_{i}^{Exporter}+\gamma_{j}^{Importer}+\gamma_{p}^{Product}+\gamma_{t}^{Year}+\beta\times \ln Distance_{ij}\right)$$.

## 1.1 Estimation

The estimation of this model using a Poisson likelihood is as follows:

gravity_pois = fepois(Euros ~ log(dist_km) | Origin + Destination + Product + Year, trade)

The function fepois is actually an alias to the function feglm with family = poisson. The results can be shown directly with the print method:

print(gravity_pois)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15,  Destination: 15,  Product: 20,  Year: 10
#> Standard-errors: Clustered (Origin)
#>              Estimate Std. Error t value  Pr(>|t|)
#> log(dist_km) -1.52787   0.115678 -13.208 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11   Adj. Pseudo R2: 0.764032
#>            BIC:  1.405e+12     Squared Cor.: 0.612021

The print reports the coefficient estimates and standard-errors as well as some other information. Among the quality of fit information, the squared-correlation corresponds to the correlation between the dependent variable and the expected predictor; it reflects somehow the idea of R-square in OLS estimations. Note that the estimation is performed using parallel computing which you can control using the argument nthreads (see the “multi-threading” section for more details).

## 1.2 Clustering the standard-errors

To cluster the standard-errors, we can simply use the argument vcov of the summary method. Let’s say we want to cluster the standard-errors according to the first two fixed-effects (i.e. the Origin and Destination variables). Then we just have to do:

summary(gravity_pois, vcov = "twoway")
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15,  Destination: 15,  Product: 20,  Year: 10
#> Standard-errors: Clustered (Origin & Destination)
#>              Estimate Std. Error  t value  Pr(>|t|)
#> log(dist_km) -1.52787   0.130734 -11.6869 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11   Adj. Pseudo R2: 0.764032
#>            BIC:  1.405e+12     Squared Cor.: 0.612021

The clustering can be done on one, two, three or up to four variables. If the estimation includes fixed-effects, then by default the clustering will be done using these fixed-effects, in the original order. This is why the Origin and Destination variables were used for the two-way clustering in the previous example. If, instead, you wanted to perform one-way clustering on the Product variable, you need to provide it in a formula or use the argument cluster:

# Three ways to summon clustering on the Product variable
summary(gravity_pois, vcov = ~Product)
summary(gravity_pois, cluster = "Product")
summary(gravity_pois, cluster = ~Product)

Both produce the same results:

summary(gravity_pois, cluster = ~Product)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15,  Destination: 15,  Product: 20,  Year: 10
#> Standard-errors: Clustered (Product)
#>              Estimate Std. Error t value  Pr(>|t|)
#> log(dist_km) -1.52787   0.098294 -15.544 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11   Adj. Pseudo R2: 0.764032
#>            BIC:  1.405e+12     Squared Cor.: 0.612021

Note that you can always cluster the standard-errors, even when the estimation contained no fixed-effect:

gravity_simple = fepois(Euros ~ log(dist_km), trade)
# We use a formula to specify the variables used for two way clustering
# (note that the values of the variables are fetched directly in the original database)
summary(gravity_simple, ~Origin + Destination)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Standard-errors: Clustered (Origin & Destination)
#>              Estimate Std. Error  t value   Pr(>|t|)
#> (Intercept)  24.70889   1.124768 21.96798  < 2.2e-16 ***
#> log(dist_km) -1.02896   0.158022 -6.51145 7.4429e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -2.426e+12   Adj. Pseudo R2: 0.185023
#>            BIC:  4.852e+12     Squared Cor.: 0.055107

Finally, the standard-errors can also be computed at estimation time, you simply need to add the vcov argument:

fepois(Euros ~ log(dist_km), trade, vcov = ~Product)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Standard-errors: Clustered (Product)
#>              Estimate Std. Error  t value  Pr(>|t|)
#> (Intercept)  24.70889   0.330044  74.8654 < 2.2e-16 ***
#> log(dist_km) -1.02896   0.045954 -22.3909 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -2.426e+12   Adj. Pseudo R2: 0.185023
#>            BIC:  4.852e+12     Squared Cor.: 0.055107

## 1.3 Other standard-errors

Talking about standard-errors… there are more than clustered standard-errors that can be computed… and there are many ways to achieve the same thing… and many shortcuts to know. So before you leave don’t forget to have a look at the section describing how to use the vcov argument!

## 1.4 Other estimation functions

Now we estimate the same relationship by OLS. We need to put the left hand side in logarithm (since the right-hand-side is not exponentialized):

gravity_ols = feols(log(Euros) ~ log(dist_km) | Origin + Destination + Product + Year, trade)

Of course you can use different families in feglm, exactly as in glm.

To get the estimation for the fixed-effects Negative Binomial:

gravity_negbin = fenegbin(Euros ~ log(dist_km) | Origin + Destination + Product + Year, trade)

## 1.5 Viewing the results in R

Now let’s say that we want a compact overview of the results of several estimations. The best way is to use the function etable. This function summarizes the results of several fixest estimations into a data.frame. To see the fixed-effects results with the three different likelihoods, we just have to type:

etable(gravity_pois, gravity_negbin, gravity_ols,
vcov = "twoway", headers = c("Poisson", "Negative Binomial", "Gaussian"))
gravity_pois gravity_negbin gravity_ols
Poisson Negative Binomial Gaussian
log(dist_km) -1.528*** (0.1307) -1.711*** (0.1773) -2.170*** (0.1714)
Fixed-Effects: —————— —————— ——————
Origin Yes Yes Yes
Destination Yes Yes Yes
Product Yes Yes Yes
Year Yes Yes Yes
_______________ __________________ __________________ __________________
Family Poisson Neg. Bin. OLS
S.E.: Clustered by: Orig. & Dest. by: Orig. & Dest. by: Orig. & Dest.
Observations 38,325 38,325 38,325
Squared Cor. 0.61202 0.43760 0.70558
Pseudo R2 0.76403 0.03473 0.23640
BIC 1.4e+12 1,293,786.1 151,977.2
Over-dispersion 0.54877

We added the argument vcov="twoway" to cluster the standard-errors for all estimations. As can be seen this function gives an overview of the estimates and standard-errors, as well as some quality of fit measures. The argument headers is used to add information on each estimation column.

In the previous example, we directly added the estimation results as arguments of the function etable. But the function also accepts lists of estimations. Let’s give an example. Say you want to see the influence of the introduction of fixed-effects on the estimate of the elasticity of distance. You can do it with the following code where we use the argument fixef to include fixed-effects (instead of inserting them directly in the formula):

gravity_subfe = list()
all_FEs = c("Year", "Destination", "Origin")
for(i in 0:3){
gravity_subfe[[i+1]] = fepois(Euros ~ log(dist_km), trade, fixef = all_FEs[0:i])
}

The previous code performs 4 estimations with an increasing number of fixed-effects and store their results into the list named gravity_subfe. To show the results of all 4 estimations, it’s easy:

etable(gravity_subfe, cluster = ~Origin+Destination)
model 1 model 2 model 3 model 4
Dependent Var.: Euros Euros Euros Euros
(Intercept) 24.71*** (1.125)
log(dist_km) -1.029*** (0.1580) -1.029*** (0.1581) -1.226*** (0.2045) -1.518*** (0.1282)
Fixed-Effects: —————— —————— —————— ——————
Year No Yes Yes Yes
Destination No No Yes Yes
Origin No No No Yes
_______________ __________________ __________________ __________________ __________________
S.E.: Clustered by: Orig. & Dest. by: Orig. & Dest. by: Orig. & Dest. by: Orig. & Dest.
Observations 38,325 38,325 38,325 38,325
Squared Cor. 0.05511 0.05711 0.16420 0.38479
Pseudo R2 0.18502 0.18833 0.35826 0.59312
BIC 4.85e+12 4.83e+12 3.82e+12 2.42e+12

We have a view of the 4 estimations, all reporting two-way clustered standard-errors thanks to the use of the argument cluster.

## 1.6 Multiple estimations

Note that since version 0.8.0, multiple estimations can be performed at once without requiring loops. Let’s replicate the previous example using fixest stepwise functions:

res_multi = fepois(Euros ~ log(dist_km) | csw0(Year, Destination, Origin), trade)

The previous line of code performs 4 estimations. The function csw0 is the key here, it means: cumulative stepwise starting with the empty element. Starting with the empty element, each new estimation adds a new element in the csw0() function, quite like the previous loop. Then you can consider the results, here res_multi, as a list of results, although with specific methods to easily access each element.

Stepwise functions can be applied to the linear right-hand-side and to the fixed-effects, you can also have multiple dependent variables and perform split sample estimations with the argument split. All of this is detailed in the dedicated vignette: Multiple estimations.

## 1.7 Exporting the results to Latex

So far we have seen how to report the results of multiple estimations on the R console. Now, using the same function etable, we can also export the results to high quality Latex tables. We just need to provide the argument tex = TRUE:

# with two-way clustered SEs
etable(res_multi, cluster = ~Origin+Destination, tex = TRUE)
#> \begingroup
#> \centering
#> \begin{tabular}{lcccc}
#>    \tabularnewline \midrule \midrule
#>    Dependent Variable: & \multicolumn{4}{c}{Euros}\\
#>    Model:              & (1)                   & (2)                   & (3)                   & (4)\\
#>    \midrule
#>    \emph{Variables}\\
#>    (Intercept)         & 24.71$^{***}$         &                       &                       &   \\
#>                        & (1.125)               &                       &                       &   \\
#>    log(dist\_km)       & -1.029$^{***}$        & -1.029$^{***}$        & -1.226$^{***}$        & -1.518$^{***}$\\
#>                        & (0.1580)              & (0.1581)              & (0.2045)              & (0.1282)\\
#>    \midrule
#>    \emph{Fixed-effects}\\
#>    Year                &                       & Yes                   & Yes                   & Yes\\
#>    Destination         &                       &                       & Yes                   & Yes\\
#>    Origin              &                       &                       &                       & Yes\\
#>    \midrule
#>    \emph{Fit statistics}\\
#>    Observations        & 38,325                & 38,325                & 38,325                & 38,325\\
#>    Squared Correlation & 0.05511               & 0.05711               & 0.16420               & 0.38479\\
#>    Pseudo R$^2$        & 0.18502               & 0.18833               & 0.35826               & 0.59312\\
#>    BIC                 & $4.85\times 10^{12}$  & $4.83\times 10^{12}$  & $3.82\times 10^{12}$  & $2.42\times 10^{12}$\\
#>    \midrule \midrule
#>    \multicolumn{5}{l}{\emph{Clustered (Origin \& Destination) standard-errors in parentheses}}\\
#>    \multicolumn{5}{l}{\emph{Signif. Codes: ***: 0.01, **: 0.05, *: 0.1}}\\
#> \end{tabular}
#> \par\endgroup

The user can export the Latex table directly into a file (argument file), add a title (arg. title) and a label to the table (arg. label). Note that when the argument file is present, the Latex format becomes the default (i.e. tex = TRUE by default).

The coefficients can be renamed easily (arg. dict), some can be dropped (arg. drop) and they can be easily reordered with regular expressions (arg. order).

The significance codes can easily be changed (arg. signifCode) and all quality of fit information can be customized (argument fitstat). Among others, the number of fixed-effect per fixed-effect dimension can also be displayed using the argument fixef_sizes.

### 1.7.1 An elaborate example

Consider the following example of the exportation of two tables:

# we set the dictionary once and for all
myDict = c("log(dist_km)" = "$\\ln (Distance)$", "(Intercept)" = "Constant")
# 1st export: we change the signif code and drop the intercept
etable(res_multi, signifCode = c("a" = 0.01, "b" = 0.05),
drop = "Const", dict = myDict, file = "Estimation Tables.tex",
replace = TRUE, title = "First export -- normal Standard-errors")
# 2nd export: clustered S-E + distance as the first coefficient
etable(res_multi, cluster = ~Product, order = "Dist",
dict = myDict, file = "Estimation Tables.tex",
title = "Second export -- clustered standard-errors (on Product variable)")

In this example, two tables containing the results of the 4 estimations are directly exported to a Latex table into the file “Estimation Tables.tex”. First take notice (again) that we do not need to use the argument tex=TRUE since when the argument file is present, the Latex format becomes the default. The file is re-created in the first exportation thanks to the argument replace = TRUE.

To change the variable names in the Latex table, we use the argument dict. The variable myDict is the dictionary we use to rename the variables, it is simply a named vector. The original name of the variables correspond to the names of myDict while the new names of the variables are the values of this vector. Any variable that matches the names of myDict will be replaced by its value. Thus we do not care of the order of appearance of the variables in the estimation results.

In the first export, the coefficient of the intercept is dropped by using drop = "Const" (could be anything such that grepl(drop[1], "Constant") is TRUE). In the second, the coefficient of the distance is put before the intercept (which is kept) thanks to the argument order. Note that the actions performed by the arguments drop or order are performed after the renaming takes place with the argument dict.

Note that you can completely customize the style of the table by using the style and postprocessing arguments, please have a look at the dedicated vignette: Exporting estimation tables.

## 1.8 Extracting the fixed-effects coefficients

To obtain the fixed-effects of the estimation, the function fixef must be performed on the results. This function returns a list containing the fixed-effects coefficients for each dimension. The summary method helps to have a quick overview:

fixedEffects = fixef(gravity_pois)
summary(fixedEffects)
#> Fixed_effects coefficients
#>                         Origin Destination Product  Year
#> Number of fixed-effects     15          15      20    10
#> Number of references         0           1       1     1
#> Mean                      23.3        3.09  0.0129 0.157
#> Standard-deviation        1.28        1.11    1.36 0.113
#>
#> COEFFICIENTS:
#>   Origin:    AT    BE    DE    DK    ES
#>           22.51 23.56 24.71 23.44 24.97 ... 10 remaining
#> -----
#>   Destination:    AT    BE    DE    DK    ES
#>                2.436 2.696 4.323 2.451 4.043 ... 10 remaining
#> -----
#>   Product: 1     2      3     4      5
#>            0 1.414 0.6562 1.449 -1.521 ... 15 remaining
#> -----
#>   Year: 2007    2008     2009    2010  2011
#>            0 0.06912 0.005225 0.07331 0.163 ... 5 remaining

We can see that the fixed-effects are balanced across the dimensions. Indeed, apart from the first dimension, only one coefficient per fixed-effect needs to be set as reference (i.e. fixed to 0) to avoid collinearity across the different fixed-effects dimensions. This ensures that the fixed-effects coefficients can be compared within a given fixed-effect dimension. Had there be strictly more than one reference per fixed-effect dimension, their interpretation would have not been possible at all. If this was the case though, a warning message would have been prompted. Note that the mean values are meaningless per se, but give a reference points to which compare the fixed-effects within a dimension. Let’s look specifically at the Year fixed-effects:

base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5) est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base) est_iv #> TSLS estimation, Dep. Var.: y, Endo.: x_endo_1, x_endo_2, Instr.: x_inst_1, x_inst_2 #> Second stage: Dep. Var.: y #> Observations: 150 #> Standard-errors: IID #> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 1.831380 0.411435 4.45121 1.6844e-05 *** #> fit_x_endo_1 0.444982 0.022086 20.14744 < 2.2e-16 *** #> fit_x_endo_2 0.639916 0.307376 2.08186 3.9100e-02 * #> x1 0.565095 0.084715 6.67051 4.9180e-10 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 0.398842 Adj. R2: 0.761653 #> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF. #> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF. #> Wu-Hausman: stat = 6.79183, p = 0.001518, on 2 and 144 DoF. So we’ve just performed a two stage least squares estimation. The formula coming after the pipe, x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, describes the endogenous variables (on the left) and the instruments (on the right). By default, three statistics are displayed: the F-test from the first stage (weak instrument test), the Wu-Hausman endogeneity test and the overidentifying restrictions (Sargan) test. Note that the Sargan statistic appears only when relevant (i.e. when # instr. > # endo. vars., not the case here). You can use the fitstat command to summon other kind of tests, notably Wald tests on the first/second stages: fitstat(est_iv, ~ ivf1 + ivwald1 + ivf2 + ivwald2, cluster = "fe") #> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF. #> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF. #> Wald (1st stage), x_endo_1 : stat = 1,482.6 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe). #> Wald (1st stage), x_endo_2 : stat = 2.22157, p = 0.112092, on 2 and 146 DoF, VCOV: Clustered (fe). #> F-test (2nd stage): stat = 194.2 , p < 2.2e-16 , on 2 and 146 DoF. #> Wald (2nd stage): stat = 539,363.2 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe). As the Wald test relies on a given variance-covariance matrix, you can pass extra arguments to fitstat, as the argument cluster in the previous example, to specify which type of VCOV matrix is desired. Note that you can display the statistics that you wish when printing by changing the default print values: setFixest_print(fitstat = ~ . + ivwald2) est_iv #> TSLS estimation, Dep. Var.: y, Endo.: x_endo_1, x_endo_2, Instr.: x_inst_1, x_inst_2 #> Second stage: Dep. Var.: y #> Observations: 150 #> Standard-errors: IID #> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 1.831380 0.411435 4.45121 1.6844e-05 *** #> fit_x_endo_1 0.444982 0.022086 20.14744 < 2.2e-16 *** #> fit_x_endo_2 0.639916 0.307376 2.08186 3.9100e-02 * #> x1 0.565095 0.084715 6.67051 4.9180e-10 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 0.398842 Adj. R2: 0.761653 #> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF. #> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF. #> Wu-Hausman: stat = 6.79183, p = 0.001518, on 2 and 144 DoF. #> Wald (2nd stage): stat = 224.0 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: IID. In the previous code, fitstat = ~ . + ivwald2 means that we want to add the second stage Wald test to the existing printed statistics (represented here by the point). Now what about adding some fixed-effects? That’s of course possible, you need to add them after the first right-hand-side, as follows: est_iv_fe = feols(y ~ x1 | fe | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base) est_iv_fe #> TSLS estimation, Dep. Var.: y, Endo.: x_endo_1, x_endo_2, Instr.: x_inst_1, x_inst_2 #> Second stage: Dep. Var.: y #> Observations: 150 #> Fixed-effects: fe: 3 #> Standard-errors: Clustered (fe) #> Estimate Std. Error t value Pr(>|t|) #> fit_x_endo_1 0.666671 0.106558 6.25640 0.024608 * #> fit_x_endo_2 0.413839 0.177769 2.32796 0.145344 #> x1 0.451680 0.153375 2.94495 0.098553 . #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 0.327938 Adj. R2: 0.836626 #> Within R2: 0.585907 #> F-test (1st stage), x_endo_1: stat = 21.6 , p = 6.151e-9, on 2 and 146 DoF. #> F-test (1st stage), x_endo_2: stat = 4.78816, p = 0.00968 , on 2 and 146 DoF. #> Wu-Hausman: stat = 1.31408, p = 0.271968, on 2 and 142 DoF. #> Wald (2nd stage): stat = 19.6 , p = 2.941e-8, on 2 and 146 DoF, VCOV: Clustered (fe). To access the first stage(s), you can use the summary method: summary(est_iv_fe, stage = 1) #> IV: First stage: x_endo_1 #> TSLS estimation, Dep. Var.: x_endo_1, Endo.: x_endo_1, x_endo_2, Instr.: x_inst_1, x_inst_2 #> First stage: Dep. Var.: x_endo_1 #> Observations: 150 #> Fixed-effects: fe: 3 #> Standard-errors: Clustered (fe) #> Estimate Std. Error t value Pr(>|t|) #> x_inst_1 0.705992 0.485614 1.45381 0.28320 #> x_inst_2 0.202337 0.143302 1.41196 0.29346 #> x1 0.189320 0.135622 1.39594 0.29751 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 0.346509 Adj. R2: 0.959865 #> Within R2: 0.338407 #> F-test (1st stage): stat = 21.6, p = 6.151e-9, on 2 and 146 DoF. #> #> IV: First stage: x_endo_2 #> TSLS estimation, Dep. Var.: x_endo_2, Endo.: x_endo_1, x_endo_2, Instr.: x_inst_1, x_inst_2 #> First stage: Dep. Var.: x_endo_2 #> Observations: 150 #> Fixed-effects: fe: 3 #> Standard-errors: Clustered (fe) #> Estimate Std. Error t value Pr(>|t|) #> x_inst_1 -0.546745 0.081370 -6.71920 0.02144 * #> x_inst_2 0.183092 0.083446 2.19415 0.15946 #> x1 0.153198 0.089148 1.71847 0.22785 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 0.51208 Adj. R2: 0.040133 #> Within R2: 0.063919 #> F-test (1st stage): stat = 4.78816, p = 0.00968, on 2 and 146 DoF. When summary shall return more than one element, the object returned is not a regular fixest object but a fixest_multi object. These kind of objects are covered in the vignette: Multiple estimations. You can display the first and second stages in a table with etable: etable(summary(est_iv_fe, stage = 1:2), fitstat = ~ . + ivfall + ivwaldall.p) #> model 1 model 2 model 3 #> Dependent Var.: x_endo_1 x_endo_2 y #> #> x_inst_1 0.7060 (0.4856) -0.5467* (0.0814) #> x_inst_2 0.2023 (0.1433) 0.1831 (0.0834) #> x1 0.1893 (0.1356) 0.1532 (0.0891) 0.4517. (0.1534) #> x_endo_1 0.6667* (0.1066) #> x_endo_2 0.4138 (0.1778) #> Fixed-Effects: --------------- ----------------- ---------------- #> fe Yes Yes Yes #> _______________________ _______________ _________________ ________________ #> S.E.: Clustered by: fe by: fe by: fe #> Observations 150 150 150 #> R2 0.96121 0.07234 0.84211 #> Within R2 0.33841 0.06392 0.58591 #> F-test (IV only) 21.581 4.7882 8.3352 #> Wald (IV only), p-value 2.29e-9 1.32e-9 2.94e-8 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Here we use stage = 1:2 to get all first stage regressions followed by the second stage. Using stage = 2:1 would have done the opposite. Now some explanations regarding fitstat. The suffix all concerns IV only and means the following: if it’s a first stage regression, then the first-stage F-stat is displayed, otherwise it’s the second stage F-stat. The suffix .p is used in ivwaldall.p to access the p-value and not the statistic. Finally, you can permanently set which fit statistic to display in etable by using setFixest_etable, like for example setFixest_etable(fitstat = ~ . + ivfall + ivwaldall.p). # 4 Interaction terms Most R users will be familiar with the base expansion operators for creating model interaction terms, e.g. x1*x2, x1:x2, and x1/x2. These base operators all work with fixest models. However, the package also provides its own specialized syntax for creating interaction terms and combining variables. Relative to the base methods, these fixest methods offer significant performance gains and synergies with the package’s other functions. To balance performance and convenience, the exact syntax depends on whether the interaction involves fixed-effects or not. Here we walk through both categories, further providing examples of common use-cases. ## 4.1 Interactions involving fixed-effects There are two reasons why we would want to interact variables in the fixed-effects slot. First, we may simply wish to combine fixed-effects (e.g. firm × country effects). Second, we wish to allow for varying slopes (e.g. a time trend for each firm). Let us consider each in turn, using a lightly modified version of the iris dataset: # Our base data for this section base = iris names(base) = c("y", paste0("x", 1:3), "fe1") # Create another "fixed-effect" base$fe2 = rep(letters[1:5], 30)
head(base)
#>     y  x1  x2  x3    fe1 fe2
#> 1 5.1 3.5 1.4 0.2 setosa   a
#> 2 4.9 3.0 1.4 0.2 setosa   b
#> 3 4.7 3.2 1.3 0.2 setosa   c
#> 4 4.6 3.1 1.5 0.2 setosa   d
#> 5 5.0 3.6 1.4 0.2 setosa   e
#> 6 5.4 3.9 1.7 0.4 setosa   a

### 4.1.1 Combining several fixed-effects (fe1^fe2^fe3...)

Say we want to “combine” the two fixed-effect variables fe1 and fe2 to create a brand new fixed-effect variable. We can do it simply via fixest’s special ^ operator:

est_comb = feols(y ~ x1 | fe1^fe2, base)
est_comb
#> OLS estimation, Dep. Var.: y
#> Observations: 150
#> Fixed-effects: fe1^fe2: 15
#> Standard-errors: Clustered (fe1^fe2)
#>    Estimate Std. Error t value   Pr(>|t|)
#> x1 0.782815   0.119465 6.55267 1.2854e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.406785     Adj. R2: 0.729861
#>                  Within R2: 0.280234

The ^ operator does the same thing as paste0(species, "_", fe), but is more convenient and significantly faster for large datasets. You can still extract the fixed-effects the same way:

fixef(est_comb)[[1]]
#>     setosa_a     setosa_b     setosa_c     setosa_d     setosa_e versicolor_a
#>     2.443630     2.384084     2.164943     2.296256     2.323630     3.713320
#> versicolor_b versicolor_c versicolor_d versicolor_e  virginica_a  virginica_b
#>     3.800694     4.003367     3.745539     3.575086     4.513272     3.986351
#>  virginica_c  virginica_d  virginica_e
#>     4.423725     4.216804     4.159382

Note further that more than two fixed-effects can be combined in exactly the same manner (e.g. fe1^fe2^fe3), and the syntax carries over to multivariate clustering too (e.g. cluster = ~fe1^fe2^fe3).

### 4.1.2 Varying slopes (fe[x])

You can introduce variables with varying slopes directly into the fixed-effects part of the formula using square brackets ([]). Recall that varying slopes allow us to flexibly control for heterogeneous effects across groups. Common real-life examples could be the inclusion of time trends for each observational unit (e.g. country), or allowing for the effect of some control variable (e.g. income) to be moderated by a fixed-effect (e.g. gender). Here we demonstrate by continuing with our simple dataset.

head(base)
#>     y  x1  x2  x3    fe1 fe2
#> 1 5.1 3.5 1.4 0.2 setosa   a
#> 2 4.9 3.0 1.4 0.2 setosa   b
#> 3 4.7 3.2 1.3 0.2 setosa   c
#> 4 4.6 3.1 1.5 0.2 setosa   d
#> 5 5.0 3.6 1.4 0.2 setosa   e
#> 6 5.4 3.9 1.7 0.4 setosa   a

Say we want to estimate y as a function of x1, but controlling for x2. Moreover, we think that the slope coefficient of our x2 control variable should be allowed to vary by the fe1 fixed-effect variable. We can do this as follows:

est_vs = feols(y ~ x1 | fe1[x2], base)
est_vs
#> OLS estimation, Dep. Var.: y
#> Observations: 150
#> Fixed-effects: fe1: 3
#> Varying slopes: x2 (fe1: 3)
#> Standard-errors: Clustered (fe1)
#>    Estimate Std. Error t value Pr(>|t|)
#> x1 0.450006   0.156731  2.8712  0.10292
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.298706     Adj. R2: 0.863506
#>