WELER: A COMPLEX METRIC FOR TEXT QUALITY ASSESSMENT

Authors

  • A. R. Dumyn Lviv Polytechnic National University, Lviv, Ukraine
  • N. B. Shakhovska Lviv Polytechnic National University, Lviv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2026-1-7

Keywords:

natural language processing, automatic speech recognition, text quality assessment, WER, CER, WELER

Abstract

Context. Assessing text quality is essential for reliable AI that processes language. In ASR, it reflects how faithfully speech becomes text; in OCR, how accurately images yield text; and in NLP, how correct and coherent outputs are.
Objective. The goal of this work is to create a complex metric for text quality assessment.
Method. Classic metrics WER and CER are narrow: they capture only lexical edits, weigh all changes equally, ignore context and semantics, and often skip punctuation and case, masking readability issues and error types. We propose WELER, a hybrid metric that blends weighted WER and CER with a semantic component based on contextual embeddings to measure meaning preservation. Weights can be set manually or learned (e.g., via PCA), adapting the metric to ASR, OCR, or NLP tasks. Key challenges include computational cost, choosing optimal weights through correlation with human judgments, and the need for high-quality reference data. WELER integrates accurate word- and character-level error counting, using Levenshtein distance as a basis, with semantic similarity methods based on contextual embeddings. It therefore accounts not only for what was recognized incorrectly, but also for how much each error affects the meaning and comprehensibility of the text. A key feature of WELER is its self-adjusting weights, which depend on the text category and adapt the metric to the specific requirements of different applications and domains, prioritizing the quality aspects most critical for a particular task.
Results. The proposed WELER metric offers an alternative solution in this direction. It integrates accurate word- and character-level error counting, using Levenshtein distance as a basis, with semantic similarity methods based on contextual embeddings.
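The abstract also notes that the component weights can be learned, e.g. via PCA. A hypothetical sketch of that step follows; the score matrix is invented for illustration, and the paper's actual learning procedure (correlation with human judgments) is not reproduced here. The idea shown: take per-utterance WER, CER, and semantic-distance scores and use the leading principal component, renormalized, as a weight vector.

```python
import numpy as np

# Invented per-utterance scores: columns are WER, CER, semantic distance.
scores = np.array([
    [0.10, 0.05, 0.08],
    [0.30, 0.22, 0.25],
    [0.05, 0.02, 0.10],
    [0.40, 0.35, 0.30],
    [0.15, 0.12, 0.05],
])

# Centre the columns and take the leading eigenvector of the covariance
# matrix, i.e. the direction of maximal variance across utterances.
centred = scores - scores.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh sorts eigenvalues ascending
pc1 = np.abs(eigvecs[:, -1])             # fix the eigenvector's sign ambiguity
weights = pc1 / pc1.sum()                # normalise so the weights sum to 1
print(weights)
```

The resulting weights emphasize whichever error component varies most across the corpus; in practice one would validate them against human quality judgments before adopting them.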
Conclusions. WELER, like all metrics based on reference data, relies on accurate and consistent human-verified transcriptions. Errors in the reference data can affect the accuracy of the assessment. Therefore, for complex metrics, the quality and representativeness of these data are especially important, since semantic and weighted errors are much more sensitive to the quality of the annotation than simple word counts.

Author Biographies

A. R. Dumyn, Lviv Polytechnic National University, Lviv

Post-graduate student

N. B. Shakhovska, Lviv Polytechnic National University, Lviv

Dr. Sc., Professor, Rector

References

Hamed I. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition, 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023 : proceedings. [Piscataway], IEEE, 2023, pp. 999–1005. DOI: 10.1109/SLT54892.2023.10023181

Measure and improve speech accuracy, Cloud Speech-to-Text Documentation. Available at: https://cloud.google.com/speech-to-text/docs/speech-accuracy (accessed: 22 July 2025).

Dumyn A., Fedushko S., Syerov Y. Review of Automatic Speech Recognition Systems for Ukrainian and English Language, Data-Centric Business and Applications : proceedings. Cham, Springer, 2024. (Lecture Notes on Data Engineering and Communications Technologies, Vol. 212).

Shakhovska N., Shvorob I. The method for detecting plagiarism in a collection of documents, 2015 Xth International Scientific and Technical Conference “Computer Sciences and Information Technologies” (CSIT), Lviv, Ukraine, 2015 : proceedings. [Piscataway], IEEE, 2015, pp. 142–145. DOI: 10.1109/STC-CSIT.2015.7325453

Sasindran Z., Yelchuri H., Prabhakar T. V., Rao S. A new hybrid evaluation metric for automatic speech recognition tasks, 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) : proceedings. [Piscataway], IEEE, 2023, pp. 1–7. DOI: 10.48550/arXiv.2211.01722

Kim S., Arora A., Le D., Yeh C.-F., Fuegen C., Kalinli O., Seltzer M. L. Semantic distance: A new metric for ASR performance analysis towards spoken language understanding, arXiv preprint arXiv:2104.02138, 2021. Link: https://arxiv.org/abs/2104.02138

Sasindran Z., Yelchuri H., Prabhakar T. V. SeMaScore: a new evaluation metric for automatic speech recognition tasks, arXiv preprint arXiv:2401.07506, 2024. Link: https://arxiv.org/abs/2401.07506

Phukon B., Zheng X., Hasegawa-Johnson M. Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches, arXiv preprint arXiv:2506.16528, 2025. Link: https://arxiv.org/abs/2506.16528

Zhang T., Kishore V., Wu F., Weinberger K. Q., Artzi Y. BERTScore: Evaluating text generation with BERT, arXiv preprint arXiv:1904.09675, 2019. Link: https://arxiv.org/abs/1904.09675

James J., Gopinath D. P. Advocating character error rate for multilingual ASR evaluation, arXiv preprint arXiv:2410.07400, 2024. Link: https://arxiv.org/abs/2410.07400

Van Schaik T., Pugh B. A field guide to automatic evaluation of LLM-generated summaries, Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, ACM, 2024, pp. 2832–2836.

Arockiya Jerson J., Preethi N. An analysis of Levenshtein distance using dynamic programming method, Proceedings of 3rd International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications (ICMISC 2022). Singapore, Springer Nature Singapore, 2023, pp. 525–532.

Greenacre M., Groenen P. J., Hastie T., d’Enza A. I., Markos A., Tuzhilina E. Principal component analysis, Nature Reviews Methods Primers, 2022, Vol. 2, № 1, Article 100.

Measuring the Accuracy of Automatic Speech Recognition Solutions, arXiv. Available at: https://arxiv.org/html/2408.16287v1 (accessed: 22 July 2025).

Hunt M. A. Word Errors and the Significance of Weighted Accuracy Measures, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1990.

Neudecker C., Baierer K., Gerber M., Clausner C., Antonacopoulos A., Pletschacher S. A survey of OCR evaluation tools and metrics, Proceedings of the 6th International Workshop on Historical Document Imaging and Processing. New York, ACM, 2021, pp. 13–18.

Dumyn A. R. Hibrydna metryka otsinky yakosti tekstu na osnovi kontekstnoho zvazhuvannya [A hybrid metric for text quality assessment based on contextual weighting], Tavriys’kyy naukovyy visnyk. Seriya: Tekhnichni nauky, 2025, № 4, ch. 1, pp. 85–93.

Published

2026-03-27

How to Cite

Dumyn, A. R., & Shakhovska, N. B. (2026). WELER: A COMPLEX METRIC FOR TEXT QUALITY ASSESSMENT. Radio Electronics, Computer Science, Control, (1), 67–79. https://doi.org/10.15588/1607-3274-2026-1-7

Issue

Section

Neuroinformatics and intelligent systems