TEXT MINING: CLUSTERING APPLIED TO SCIENTIFIC ARTICLES IN CHEMISTRY, USING THE CASSIOPEIA MODEL
DOI:
https://doi.org/10.56238/levv15n42-070
Keywords:
Text Mining, Corpus, Chemistry, Clustering, Cassiopeia model
Abstract
Chemistry, by dedicating itself to understanding the submicroscopic nature of matter and its transformations, develops its own language and produces fundamental knowledge about nature. Because it is basic knowledge, Chemistry, along with the other natural sciences, has become part of the knowledge every citizen needs, whether to read and understand the natural world and the world transformed by human action, or to continue studies at a higher or technical level in other areas or professions. However, assimilating and handling the large volume of available information, and locating it quickly and accurately within the diverse range of existing documents, has become a great challenge. Text mining techniques can assist in this process by extracting information from textual data. Thus, the objective of this research is to relate Chemistry concepts by finding similar words in scientific articles in the field, which can demonstrate connections among concepts addressed in high school. To do so, the clustering technique was applied with the Cassiopeia model to a corpus of academic texts related to Chemistry. The research was carried out in the following stages: bibliographic survey; construction of the corpus; collection of the corpus; statistical analysis of the corpus; text mining; clustering; and, finally, analysis of the data from the generated clusters. The results showed that clustering the corpus revealed relationships between chemical concepts by finding similar words in the scientific articles that make up the corpus developed in this research, demonstrating connections among high school chemistry topics.
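The abstract does not describe the internals of the Cassiopeia model, so the following is only a generic sketch of the underlying idea: grouping documents that share vocabulary, then listing the words that link each group. It swaps in a plain bag-of-words representation, cosine similarity, and single-link agglomerative clustering. It is not the Cassiopeia model. The stopword list, the similarity threshold of 0.2, the helper names (`tokenize`, `cosine`, `cluster`, `top_terms`) and the toy sentences are all hypothetical illustrations, not data or code from the study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not from the study).
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "by", "with",
             "as", "are", "on", "between"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.2):
    """Single-link agglomerative clustering: repeatedly merge two clusters
    whenever some pair of their documents reaches the similarity threshold."""
    vectors = [Counter(tokenize(d)) for d in docs]
    clusters = [{i} for i in range(len(docs))]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if any(cosine(vectors[i], vectors[j]) >= threshold
                       for i in clusters[a] for j in clusters[b]):
                    clusters[a] |= clusters.pop(b)  # merge b into a
                    merged = True
                    break
            if merged:
                break  # restart scanning after every merge
    return clusters, vectors


def top_terms(cluster_ids, vectors, k=3):
    """Most frequent terms in a cluster: the shared words linking its documents."""
    total = Counter()
    for i in cluster_ids:
        total.update(vectors[i])
    return [term for term, _ in total.most_common(k)]


# Hypothetical toy corpus standing in for article texts.
docs = [
    "Covalent bonds share electrons between atoms.",
    "Ionic bonds transfer electrons between atoms.",
    "Reaction rate depends on temperature and concentration.",
    "Temperature increases the reaction rate of a reaction.",
]
groups, vecs = cluster(docs)
for g in groups:
    print(sorted(g), top_terms(g, vecs))
```

On this toy input the two sentences about chemical bonding form one cluster and the two about reaction kinetics form another, and each cluster's top terms are the words its sentences share. Single-link merging was chosen only because it is short to write; a real study would typically weight terms (for example with TF-IDF) and select the threshold or number of clusters empirically.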