7. Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings // Information

Rakhimzhanov, D., Belginova, S., & Yedilkhan, D. Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings // Information. – 2025. – Vol. 16, No. 8. – P. 644. – DOI: 10.3390/info16080644.

Abstract:
The proliferation of digital public service platforms and the expansion of e-government initiatives have significantly increased the volume and diversity of citizen-generated feedback. This trend emphasizes the need for classification systems that are not only tailored to specific administrative domains but also robust to the linguistic, contextual, and structural variability inherent in user-submitted content. This study investigates the comparative effectiveness of large language models (LLMs) alongside instruction-tuned embedding models in the task of categorizing public transportation complaints. LLMs were tested using few-shot inference, in which classification is guided by a small set of in-context examples. Embedding models were assessed under three paradigms: label-only zero-shot classification, instruction-based classification, and supervised fine-tuning. Results indicate that fine-tuned embeddings can match or exceed the accuracy of LLMs, reaching up to 90 percent, while offering significant reductions in inference latency and computational overhead. E5 embeddings showed consistent generalization across unseen categories and input shifts, whereas BGE-M3 demonstrated measurable gains when adapted to task-specific distributions. Instruction-based classification produced lower accuracy for both models, highlighting the limitations of prompt conditioning in isolation. These findings position multilingual embedding models as a viable alternative to LLMs for classification at scale in data-intensive public sector environments.
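To make the "label-only zero-shot" paradigm mentioned in the abstract concrete, here is a minimal sketch of the general idea: embed the complaint and each label name, then pick the label with the highest cosine similarity. This is not the authors' implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model such as E5 or BGE-M3, and the label names and the sample complaint are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption, not E5 or BGE-M3):
    maps text to a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_zero_shot(complaint, labels):
    """Label-only zero-shot classification: no training examples, only
    the label names themselves are embedded and compared to the input."""
    complaint_vec = embed(complaint)
    scores = {label: cosine(complaint_vec, embed(label)) for label in labels}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical category names; the abstract does not list the real ones.
LABELS = ["bus delay", "driver behavior", "vehicle condition"]

predicted, scores = classify_zero_shot(
    "the bus had a long delay this morning", LABELS
)
print(predicted)
```

With a real multilingual embedding model, `embed` would return a dense vector and the same nearest-label rule would apply. Supervised fine-tuning, the paradigm the abstract reports as most accurate, would instead train the model on labeled complaints.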

Link / DOI: https://doi.org/10.3390/info16080644
