APPLICATION AND PERFORMANCE ANALYSIS OF MACHINE LEARNING ALGORITHMS IN SOCIAL CLASS PREDICTION
DOI: https://doi.org/10.56238/arev7n7-061
Keywords: Social stratification, Social classes, Machine Learning
Abstract
This research used machine learning algorithms to predict social class from a set of individual characteristics: income, education, race, sex, age, geographic region and occupational status. The objective was to obtain the same social class categorization, based on an international reference classification, for two distinct data sets (IBGE's PNAD-C and Datafolha's survey "Opinion on the Coronavirus") in order to analyze Brazilian public opinion by social class. To this end, we trained and evaluated six machine learning algorithms (MLP Classifier, Random Forest, KNN, Logistic Regression, SVM and GaussianNB) on the annual PNAD-C database and then applied the best-performing model to the Datafolha database, both from 2021. Model selection was based on three validation metrics: accuracy, F1-score and area under the ROC curve. The best-performing model was Random Forest. Applying this model to the Datafolha database yielded a satisfactory correspondence with the original distribution of the PNAD-C features, especially for the highest-weighted variables: schooling, income and literature on social stratification, and provide new insights on the subject.
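The model-selection step can be illustrated with a minimal, dependency-free sketch of the three validation metrics. It assumes macro averaging for F1-score and a one-vs-rest macro average for the multiclass ROC AUC, and it ranks models by the plain mean of the three metrics. The abstract does not state how the metrics were averaged or combined, so these choices, the class labels "A"/"B"/"C", the model names and the toy predictions are all hypothetical.

```python
"""Hypothetical sketch: compare classifiers on accuracy, macro F1-score and
one-vs-rest macro ROC AUC, then pick the best one.

Assumption: models are ranked by the plain mean of the three metrics; this
is illustrative, not the authors' documented procedure."""
from statistics import mean


def accuracy(y_true, y_pred):
    # Share of predictions that match the true class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    # Unweighted mean of per-class F1 scores (F1 = 2TP / (2TP + FP + FN)).
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def binary_auc(labels, scores):
    # AUC as the probability that a random positive outranks a random
    # negative, counting ties as 0.5 (Mann-Whitney formulation).
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    # One-vs-rest AUC for each class, averaged without class weights.
    # proba[k][i] is the predicted probability of classes[i] for sample k.
    return mean(
        binary_auc([t == c for t in y_true], [row[i] for row in proba])
        for i, c in enumerate(classes)
    )


def select_best(y_true, results, classes):
    """results maps model name -> (predicted labels, class-probability rows).

    Returns the best model name and a dict of (accuracy, F1, AUC) triples."""
    table = {
        name: (
            accuracy(y_true, y_pred),
            macro_f1(y_true, y_pred, classes),
            macro_ovr_auc(y_true, proba, classes),
        )
        for name, (y_pred, proba) in results.items()
    }
    best = max(table, key=lambda name: mean(table[name]))
    return best, table


# Toy validation data: hypothetical class labels and model outputs.
classes = ["A", "B", "C"]
y_true = ["A", "B", "C", "A"]
results = {
    "model_1": (["A", "B", "C", "A"],
                [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]),
    "model_2": (["A", "C", "C", "B"],
                [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5],
                 [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]),
}
best, table = select_best(y_true, results, classes)
```

Here `model_2` scores 0.5 on accuracy, about 0.444 on macro F1-score and about 0.833 on AUC, so `model_1` (a perfect score on all three) is selected. In a scikit-learn workflow, `accuracy_score`, `f1_score(..., average="macro")` and `roc_auc_score(..., multi_class="ovr")` from `sklearn.metrics` compute the same quantities. Whether the study used macro or weighted averaging is not stated in the abstract.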