Semantic Analysis of Group Values Structure Using Roget Thesaurus: Automated Algorithm
Keywords:
Semantic analysis, Roget's Thesaurus, Python script, vectorization of values, youth values
Abstract
The article proposes an approach to the vectorization and quantitative analysis of group values. To demonstrate the method, the value structure of a group of young people was analyzed for differences between women and men and for differences in the use of different parts of speech. The group's values were verbalized as free associations with “something most important in life”. The resulting array of words was converted into an array of semantic groups using Roget's Thesaurus. Pairwise comparison of the vectors of semantic group frequencies showed a high cosine similarity (0.9664) between the subgroups separated by gender. Testing the frequencies of individual semantic groups with the chi-square test identified the semantic groups in which the gender subgroups differed significantly. The frequency vectors obtained from the arrays of different parts of speech had low cosine similarity in all pairwise comparisons. Nouns were most frequently used to express life values related to cause-and-effect relationships (14% of semantic groups). Adjectives were most often used to express values carrying a sense of personal predilection (18% of semantic groups). Verbs were most often used to express values related to liking (14% of semantic groups). The developed automatic algorithm will be useful for quantitatively comparing values between different groups, as well as for calculating how consistent the values of a target group are with the declared values of commercial brands.
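The pipeline described above (words, then semantic groups, then frequency vectors, then cosine similarity and a per-group chi-square test) can be illustrated with a minimal Python sketch. This is not the authors' script. The mapping `TOY_GROUPS`, the group labels, the sample word lists, and all function names are hypothetical and used only for illustration; a real run would look words up in Roget's Thesaurus. The chi-square helper assumes a 2×2 table (group present or absent, by subgroup) with one degree of freedom and no continuity correction.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical toy mapping from word to semantic group (assumption);
# a real run would take the groups from Roget's Thesaurus instead.
TOY_GROUPS = {
    "family": "Affections",
    "love": "Affections",
    "friends": "Affections",
    "money": "Possession",
    "career": "Volition",
    "freedom": "Volition",
}


def group_frequencies(words, mapping):
    """Count how often each semantic group occurs among the words."""
    return Counter(mapping[w] for w in words if w in mapping)


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two sparse frequency vectors (Counters)."""
    groups = set(freq_a) | set(freq_b)
    dot = sum(freq_a[g] * freq_b[g] for g in groups)
    norm_a = sqrt(sum(v * v for v in freq_a.values()))
    norm_b = sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square for one semantic group in two subgroups.

    Returns (statistic, p_value) for a 2x2 table with 1 degree of
    freedom and no continuity correction (assumed setup).
    """
    table = [[count_a, total_a - count_a], [count_b, total_b - count_b]]
    n = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [count_a + count_b, n - count_a - count_b]
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Made-up sample associations for two subgroups (not data from the study).
women = ["family", "love", "career", "friends"]
men = ["family", "money", "career", "freedom"]

freq_w = group_frequencies(women, TOY_GROUPS)
freq_m = group_frequencies(men, TOY_GROUPS)
similarity = cosine_similarity(freq_w, freq_m)
chi2, p_value = chi_square_2x2(
    freq_w["Affections"], len(women), freq_m["Affections"], len(men)
)
print(f"cosine similarity: {similarity:.4f}")
print(f"Affections: chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Keeping the frequency vectors as sparse `Counter` objects means semantic groups absent from one subgroup simply contribute zero to the dot product. When many semantic groups are tested at once, the per-group p-values would usually also need a multiple-testing correction, which this sketch leaves out.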