Correspondence analysis is a statistical method for the analysis of multidimensional data: a multivariate technique that analyzes patterns of association between qualitative variables.
Qualitative variables are variables whose values are not numbers but modalities (categories), for example gender, level of education, or marital status.
Because correspondence analysis works with qualitative variables, the object of the analysis is the contingency table, whose cells give the number of times (the counts) that a modality of one variable has been observed together with a modality of the other.
The main goal of correspondence analysis is to analyze the relationships between a set of qualitative variables observed on a collection of statistical units. This is done by identifying an "optimal" space, i.e. a space of low dimension that summarizes the structural information contained in the original data.
In essence, the analysis builds a series of latent variables (or factors), each a combination of the original variables, which express concepts that are not directly observable in reality but emerge from the measurement of a set of variables.
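As a rough illustration of how much of the structure a first latent factor can carry, the sketch below computes the total inertia of a contingency table and the largest principal inertia (eigenvalue). It assumes the usual formulation based on standardized residuals, which this text does not spell out. The table and all function names are hypothetical, and the power iteration is only one simple way to approximate an eigenvalue, not what FactoMineR does internally.

```python
import math

def standardized_residuals(table):
    """Return S with s_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j)."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    return [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))]
            for i in range(len(r))]

def total_inertia(S):
    """Sum of squared standardized residuals (chi-square statistic / n)."""
    return sum(s * s for row in S for s in row)

def first_principal_inertia(S, iterations=500):
    """Largest eigenvalue of S^T S, approximated by power iteration."""
    k = len(S[0])
    A = [[sum(row[a] * row[b] for row in S) for b in range(k)]
         for a in range(k)]
    v = [float(a + 1) for a in range(k)]         # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(A[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:   # A v vanished: no association (or an unlucky start)
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit vector v
    return sum(v[a] * A[a][b] * v[b] for a in range(k) for b in range(k))

# Hypothetical 3 x 3 contingency table (made-up counts).
table = [[20, 10, 5], [10, 20, 10], [5, 10, 20]]
S = standardized_residuals(table)
total = total_inertia(S)
lam1 = first_principal_inertia(S)
print(f"total inertia: {total:.4f}")
print(f"share explained by the first factor: {lam1 / total:.1%}")
```

The ratio printed at the end is the explained inertia of the first factor. A table with only two columns (or two rows) has a single factor, so there the ratio is 100%.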
In Correspondence Analysis the variables should not be independent: the modalities of one variable must be associated with the modalities of the other.
Before carrying out a correspondence analysis, it is therefore necessary to test the degree of interdependence between the variables considered because, if they are independent, it may not make sense to look for correspondences between them.
The test starts from the null hypothesis that the two variables are independent. The alternative hypothesis is that the two variables have some degree of interdependence.
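The text does not name the test. A common choice for two qualitative variables is Pearson's chi-square test of independence, and the sketch below assumes that choice. It computes the statistic and its degrees of freedom for a hypothetical 2 x 2 table and compares the result with the tabulated 5% critical value for one degree of freedom (3.841). The function name and the counts are illustrative only, and the code assumes no row or column total is zero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if the variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts for two variables with two modalities each.
table = [[20, 10], [10, 20]]
stat, dof = chi_square_statistic(table)

CRITICAL_5PCT_DF1 = 3.841  # tabulated chi-square critical value, 1 d.o.f.
reject_independence = dof == 1 and stat > CRITICAL_5PCT_DF1
print(f"chi-square = {stat:.3f}, d.o.f. = {dof}, reject H0: {reject_independence}")
```

For tables with more degrees of freedom the critical value has to be looked up for the corresponding distribution; the constant above is valid only for a 2 x 2 table.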
The contingency tables contain the joint frequencies of the modalities of the variables. Given two qualitative variables X and Y, the corresponding contingency table records how many times a given modality of X occurs together with a given modality of Y.
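A minimal sketch of how such a table can be built from raw observations, using only the Python standard library. The two variables, their modalities and the seven observations are made up for illustration, and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter

def contingency_table(xs, ys):
    """Cross-tabulate two equally long sequences of category labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    counts = Counter(zip(xs, ys))   # joint frequency of each (x, y) pair
    rows = sorted(set(xs))          # modalities of X
    cols = sorted(set(ys))          # modalities of Y
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical observations on seven statistical units.
education = ["primary", "secondary", "secondary", "tertiary",
             "primary", "tertiary", "secondary"]
marital = ["single", "married", "single", "married",
           "married", "married", "single"]

rows, cols, table = contingency_table(education, marital)
for label, counts_row in zip(rows, table):
    print(label, dict(zip(cols, counts_row)))
```

Each inner list of `table` is one row of the contingency table, and the counts sum to the number of observed units.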
Correspondence Analysis makes it possible to represent the phenomenon both in the space of the rows and in the space of the columns.
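The two representations can be illustrated with row and column profiles, that is, the counts of each row (or column) divided by its total. The sketch below assumes this standard definition of a profile; the table and the function names are hypothetical.

```python
def row_profiles(table):
    """Divide every cell by its row total."""
    return [[cell / sum(row) for cell in row] for row in table]

def column_profiles(table):
    """Divide every cell by its column total; one inner list per column."""
    transposed = [list(col) for col in zip(*table)]
    return row_profiles(transposed)

# Hypothetical 2 x 2 contingency table (made-up counts).
table = [[30, 10], [20, 40]]

rows_as_profiles = row_profiles(table)
cols_as_profiles = column_profiles(table)
print("row profiles:   ", rows_as_profiles)
print("column profiles:", cols_as_profiles)
```

Every profile sums to 1, so rows (or columns) with very different totals become directly comparable.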
Finally, the distances between the profiles (the rows or columns expressed as relative frequencies) are calculated to see whether the modalities are similar, i.e. whether their profiles resemble each other. Two distances can be used:
- The Euclidean distance favours large differences over small ones. It is calculated by taking the differences between the relative frequencies, squaring them and summing the results.
- The chi-square distance favours small differences because it takes the marginal frequencies into account. It is calculated by weighting each squared difference between relative frequencies by the inverse of the corresponding marginal (the column marginal when comparing row profiles, the row marginal when comparing column profiles).
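The two distances can be compared on a small example. The sketch below computes both between the two row profiles of a hypothetical 2 x 3 table, weighting by the inverse of the column masses in the chi-square case. The table and the function names are assumptions made for illustration.

```python
import math

def row_profiles(table):
    """Divide every cell by its row total."""
    return [[cell / sum(row) for cell in row] for row in table]

def column_masses(table):
    """Column totals divided by the grand total."""
    n = sum(sum(row) for row in table)
    return [sum(col) / n for col in zip(*table)]

def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chi_square_distance(p, q, masses):
    # Each squared difference is divided by the mass of its column.
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))

# Hypothetical table: the third column is three times as frequent
# as each of the other two.
table = [[10, 20, 70], [30, 20, 50]]
profiles = row_profiles(table)
masses = column_masses(table)

d_euclid = euclidean_distance(profiles[0], profiles[1])
d_chi2 = chi_square_distance(profiles[0], profiles[1], masses)
print(f"Euclidean: {d_euclid:.4f}, chi-square: {d_chi2:.4f}")
```

In this example the two profiles differ by the same amount (0.2) in the first and third columns, but the chi-square distance counts the difference in the rarer first column three times as heavily.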
To carry out a correspondence analysis in R, the FactoMineR package is available.
Keywords (meta tags): AC, qualitative variables, explained inertia, eigenvalues
Objectives/goals:
The aim of this module is to introduce and explain the Correspondence Analysis technique.
At the end of this module you will be able to:
- Know the logic of AC
- Know the requirements
- Conduct an AC
- Conduct an AC in R with the FactoMineR package
This training module presents the multidimensional analysis technique called Correspondence Analysis (AC).
Correspondence Analysis is a form of multidimensional scaling, which essentially builds a kind of spatial model that shows the associations between a set of categorical variables. If the set includes only two variables, the method is usually called Simple Correspondence Analysis (SCA). If the analysis involves more than two variables, it is usually called Multiple Correspondence Analysis (MCA). This module deals with simple correspondence analysis. Its objective is to reduce the dimensionality of the phenomenon under investigation while preserving the information it contains. The technique is applicable to phenomena measured with qualitative variables.
The last part of the module will be dedicated to the application of AC with the R software.
Van der Heijden, P. G. M. & de Leeuw, J. (1985). Correspondence analysis used complementary to loglinear analysis. Psychometrika, 50, pp. 429-447.
Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software, 25(1), pp. 1-18.
Mineo, A. M. (2003). Una Guida all'utilizzo dell'Ambiente Statistico R. http://cran.r-project.org/doc/contrib/Mineo-dispensaR.pdf.