WELER: A COMPLEX METRIC FOR TEXT QUALITY ASSESSMENT
DOI: https://doi.org/10.15588/1607-3274-2026-1-7
Keywords: natural language processing, automatic speech recognition, text quality assessment, WER, CER, WELER
Abstract
Context. Assessing text quality is essential for reliable AI that processes language. In ASR, it reflects how faithfully speech becomes text; in OCR, how accurately images yield text; and in NLP, how correct and coherent outputs are.
Objective. The goal of the work is to create a complex metric for text quality assessment.
Method. Classic metrics WER and CER are narrow: they capture only lexical edits, weigh all changes equally, ignore context and semantics, and often skip punctuation and case, masking readability issues and error types. We propose WELER, a hybrid metric that blends weighted WER and CER with a semantic component based on contextual embeddings to measure meaning preservation. Weights can be set manually or learned (e.g., via PCA), adapting the metric to ASR, OCR, or NLP tasks. Key challenges include computational cost, choosing optimal weights through correlation with human judgments, and the need for high-quality reference data. The proposed WELER metric integrates accurate word- and character-level error counting, using Levenshtein distance as a basis, with advanced semantic similarity methods based on contextual embeddings. This allows WELER to take into account not only what was recognized incorrectly, but also how much the error affects the meaning and intelligibility of the text. A key feature of WELER is its self-adjusting weights, which depend on the text category and allow the metric to be adapted to the specific requirements of different applications and domains, prioritizing the aspects of quality that are most critical for a particular task.
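To make the combination concrete, below is a minimal Python sketch of a WELER-style score. It is an illustration under stated assumptions, not the authors' exact formulation: the weight values w_wer, w_cer, and w_sem are hypothetical placeholders for weights that would be set manually or learned, and the semantic component is passed in as a pluggable similarity function over contextual embeddings.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance over two sequences
    # (word tokens for WER, characters for CER).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def weler(reference, hypothesis, semantic_sim, w_wer=0.4, w_cer=0.3, w_sem=0.3):
    # WER and CER: edit distance normalized by reference length.
    ref_words = reference.split()
    wer = levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)
    cer = levenshtein(reference, hypothesis) / max(len(reference), 1)
    # semantic_sim should return a similarity in [0, 1] derived from
    # contextual embeddings; (1 - similarity) turns it into a penalty
    # comparable to the error rates.
    sem_err = 1.0 - semantic_sim(reference, hypothesis)
    return w_wer * wer + w_cer * cer + w_sem * sem_err

For a quick test, a lexical stand-in such as difflib.SequenceMatcher(None, ref, hyp).ratio() can play the role of semantic_sim; in practice a sentence-embedding model (for example, cosine similarity over sentence-transformers encodings) would supply the contextual-embedding similarity the method actually calls for.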
Results. The proposed WELER metric offers an alternative solution in this direction, coupling accurate Levenshtein-based word- and character-level error counting with semantic similarity computed from contextual embeddings.
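As one illustration of the weight-learning option mentioned in the Method, the sketch below derives the three weights from the first principal component of per-text error vectors collected on a development set. The column order and the normalization are assumptions made for this example, not a procedure prescribed by the paper.

import numpy as np
from sklearn.decomposition import PCA

def pca_weights(error_matrix):
    # error_matrix: one row per evaluated text,
    # columns = (WER, CER, semantic error).
    pca = PCA(n_components=1)
    pca.fit(np.asarray(error_matrix, dtype=float))
    loadings = np.abs(pca.components_[0])  # contribution of each error type
    return loadings / loadings.sum()       # normalize so the weights sum to 1

The resulting weights could then be passed to a scoring function such as the weler sketch above, so the error types that vary most across the corpus carry the most influence.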
Conclusions. WELER, like all metrics based on reference data, relies on accurate and consistent human-verified transcriptions. Errors in the reference data can affect the accuracy of the assessment. Therefore, for complex metrics, the quality and representativeness of these data are especially important, since semantic and weighted errors are much more sensitive to the quality of the annotation than simple word counts.
Copyright (c) 2026 A. R. Dumyn, N. B. Shakhovska

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.