Description

Factor analysis is a statistical technique that identifies latent variables (hidden factors) underlying the correlations observed in data. It analyses how variables relate to one another and finds a smaller number of unobserved factors that explain the patterns among many observed variables. Unlike PCA, which maximises total variance, factor analysis models the shared variance (the communalities, i.e. the variance that variables have in common) whilst separating out each variable's unique variance and measurement error. After the factors are extracted, rotation methods such as varimax (which keeps factors uncorrelated) or oblimin (which allows factors to correlate) make them easier to interpret by aligning each factor with a distinct group of variables.
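A minimal sketch of how this looks in practice, using the factor_analyzer package listed under Software Packages below. The data are synthetic and the variable names (q1..q6) are illustrative: six observed variables are generated from two hidden factors, which the analysis then recovers.

    # Illustrative sketch: six observed variables generated from two hidden
    # factors, then recovered with exploratory factor analysis.
    import numpy as np
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    rng = np.random.default_rng(0)
    n = 500
    latent = rng.normal(size=(n, 2))  # two hidden factors
    true_loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
                              [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
    X = latent @ true_loadings.T + rng.normal(scale=0.4, size=(n, 6))
    df = pd.DataFrame(X, columns=[f"q{i}" for i in range(1, 7)])

    # Extract two factors and apply a varimax (orthogonal) rotation.
    fa = FactorAnalyzer(n_factors=2, rotation="varimax")
    fa.fit(df)

    print(pd.DataFrame(fa.loadings_, index=df.columns).round(2))  # rotated loadings
    print(fa.get_communalities().round(2))  # shared variance per variable

A high loading on one factor and near-zero loadings on the others indicate which observed variables each latent factor drives; the communalities show how much of each variable's variance the factors explain.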

Example Use Cases

Explainability

Analysing customer satisfaction surveys to identify key drivers (e.g., 'service quality', 'product value', 'convenience') from dozens of individual questions, helping businesses focus improvement efforts.

Reducing dimensionality of financial indicators to identify underlying economic factors (e.g., 'growth', 'inflation', 'credit risk') for more interpretable risk models.

Transparency

Creating transparent feature groups for regulatory reporting by showing how multiple correlated features can be summarised into interpretable factors with clear business meaning.

Limitations

  • Assumes linear relationships between variables and multivariate normality of data.
  • Results can be abstract and require domain expertise to interpret meaningfully.
  • Sensitive to the choice of the number of factors and the rotation method, both of which can significantly affect interpretability (see the sketch after this list).
  • Requires sufficiently large sample sizes relative to the number of variables for stable results.
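As a hedged illustration of the last two points, the sketch below uses factor_analyzer's KMO and eigenvalue utilities on the synthetic df from the Description example. The eigenvalue-greater-than-one rule (Kaiser criterion) is one common heuristic for choosing the number of factors, not a definitive answer.

    # Sketch: checking sampling adequacy and choosing the number of factors.
    # Assumes `df` is the synthetic DataFrame from the Description example.
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.factor_analyzer import calculate_kmo

    # Kaiser-Meyer-Olkin test: values above ~0.6 suggest the correlations
    # are strong enough for a stable factor analysis.
    kmo_per_variable, kmo_total = calculate_kmo(df)
    print(f"Overall KMO: {kmo_total:.2f}")

    # Kaiser criterion: retain factors whose eigenvalue exceeds 1.
    fa = FactorAnalyzer(rotation=None)
    fa.fit(df)
    eigenvalues, _ = fa.get_eigenvalues()
    n_factors = int((eigenvalues > 1).sum())
    print(f"Eigenvalues: {eigenvalues.round(2)} -> keep {n_factors} factor(s)")

    # The rotation choice changes the loadings, and hence the interpretation.
    for rotation in ("varimax", "oblimin"):
        fa = FactorAnalyzer(n_factors=n_factors, rotation=rotation)
        fa.fit(df)
        print(rotation, "loadings:\n", fa.loadings_.round(2))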

Resources

Research Papers

Factor Analysis, Probabilistic Principal Component Analysis, Variational Inference, and Variational Autoencoder: Tutorial and Survey
Benyamin Ghojogh et al., Jan 4, 2021

This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and the Variational Autoencoder (VAE). These tightly related methods are dimensionality reduction and generative models. They assume that every data point is generated from, or caused by, a low-dimensional latent factor. By learning the parameters of the distribution of the latent space, the corresponding low-dimensional factors are found for the sake of dimensionality reduction. Because of their stochastic and generative behaviour, these models can also be used to generate new data points in the data space. In this paper, we first start with variational inference, where we derive the Evidence Lower Bound (ELBO) and Expectation Maximization (EM) for learning the parameters. Then, we introduce factor analysis, derive its joint and marginal distributions, and work out its EM steps. Probabilistic PCA is then explained as a special case of factor analysis, and its closed-form solutions are derived. Finally, the VAE is explained, where the encoder, decoder, and sampling from the latent space are introduced. Training the VAE using both EM and backpropagation is explained.

Software Packages

factor_analyzer
Dec 5, 2017

A Python module to perform exploratory & confirmatory factor analyses.

Tutorials

Factor Analysis in R Course | DataCamp
Jan 1, 2024
Confirmatory Factor Analysis Fundamentals | Towards Data Science
Rafael Valdece Sousa Bastos, Oct 9, 2021
