A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses

arXiv:2605.23093v1. Abstract: Topic modeling in applied psychology increasingly spans two methodological traditions: probabilistic bag-of-words models and newer embedding-based approaches. Yet many evaluations of these methods rely on longer and cleaner benchmark corpora, leaving less guidance for short, open-ended survey responses. This paper compares Structural Topic Models (STM), a probabilistic topic model, and BERTopic, an embedding-based model, for analyzing open-ended survey responses. We evaluated three STM conditions and five BERTopic conditions, varying typographica
The proliferation of AI models, especially large language models, is driving continuous research into more effective and nuanced natural language processing techniques.
Improved topic modeling for short, open-ended text is critical for extracting actionable insights from qualitative data, impacting market research, social science, and product development.
This research offers clearer guidance on which topic-modeling approach (STM or BERTopic) is better suited to short, open-ended survey responses, which could improve qualitative analytics in fields that depend on such data.
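As a rough illustration of why short responses are a hard case for bag-of-words models, the sketch below builds a document-term count matrix and measures how many of its cells are zero. It is not the paper's pipeline and uses neither STM nor BERTopic; the helper names `bag_of_words` and `sparsity` and the three survey responses are hypothetical, invented for this example.

```python
from collections import Counter


def bag_of_words(responses):
    """Build a document-term count matrix from raw text responses."""
    # Naive whitespace tokenization; a real pipeline would do more cleaning.
    tokenized = [r.lower().split() for r in responses]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def sparsity(matrix):
    """Return the fraction of zero cells in the document-term matrix."""
    cells = [c for row in matrix for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)


# Hypothetical survey responses, invented for illustration only.
responses = [
    "too many meetings",
    "workload is too high",
    "meetings run long",
]

vocab, matrix = bag_of_words(responses)
print(vocab)
print(round(sparsity(matrix), 2))
```

Even in this tiny example most cells are zero, because each response contains only a few of the vocabulary's terms. Count-based models have little co-occurrence evidence to work with in that setting, which is one commonly cited motivation for trying embedding-based alternatives on short texts.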
- Applied Psychology Researchers
- Market Research Firms
- AI/ML Developers
- Survey Platforms
- Organizations relying on outdated NLP methods
More accurate and efficient analysis of qualitative feedback from surveys and open-ended questions becomes possible.
Businesses and policymakers gain deeper, data-driven understanding of public sentiment and customer needs, leading to more responsive strategies.
Enhanced feedback loops could accelerate product iteration and policy adjustments, fostering more user-centric development across industries.
Read at arXiv cs.CL