EXTRACTING PARALLEL PHRASES FROM ENGLISH-PUNJABI CORPORA

by Manpreet Singh Lehal
Softcover - 9786208225414
79,90 €
  • Free shipping
  • Note: Print on demand. Available in 2 days.
  • Delivery time after dispatch: approx. 1-2 days
  • Incl. VAT and shipping costs (within Germany)

Description

This study presents a novel approach to extracting parallel data from a comparable English-Punjabi corpus, addressing the scarcity of parallel corpora for this language pair. Unlike previous research, the approach focuses on creating high-precision parallel data using minimal resources. The data is drawn from diverse sources, including Wikipedia articles, TDIL's noisy parallel sentences, and Gyan Nidhi reports. The methodology consists of three phases: extracting and aligning documents, translating Punjabi texts into English using OpenNMT-py, and calculating content similarity with three measures (Euclidean distance, cosine similarity, and Jaccard similarity). Each measure is first run individually, and the results are then integrated to improve accuracy. By combining the scores of all three measures, the system achieves a precision of 93% and an accuracy of 86%. This integrated approach significantly enhances parallel data extraction for English-Punjabi corpora and holds potential for improving Statistical Machine Translation (SMT) models.
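
As a rough illustration of the third phase, the following minimal Python sketch scores a pair of English sentences (one original, one assumed to be the machine-translated Punjabi side) with the three measures named above and combines them. The description does not say how the book integrates the scores, so the bag-of-words representation, the conversion of Euclidean distance to a similarity, the simple averaging, the threshold of 0.6, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase whitespace tokenization (an assumed, deliberately simple choice)."""
    return sentence.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between the two bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_sq = sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between the two bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in vocab))


def combined_score(a, b):
    """Assumed integration: average the three scores on a common 0..1 scale."""
    # Distance is mapped to a similarity so that larger always means "more alike".
    euclid_sim = 1.0 / (1.0 + euclidean_distance(a, b))
    return (cosine_similarity(a, b) + jaccard_similarity(a, b) + euclid_sim) / 3.0


def is_parallel(a, b, threshold=0.6):
    """Hypothetical decision rule: accept the pair if the combined score is high enough."""
    return combined_score(a, b) >= threshold


if __name__ == "__main__":
    english = "the farmers harvest wheat in april"
    translated = "farmers harvest the wheat in april"  # stand-in for translated Punjabi
    print(round(combined_score(english, translated), 3), is_parallel(english, translated))
```

Averaging is only one way to integrate the measures. A voting rule or a weighting of the individual scores would fit the same description, and the threshold would in practice be tuned on held-out sentence pairs.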

AN INTEGRATED APPROACH

Details

Publisher LAP LAMBERT Academic Publishing
First published 25 October 2024
Dimensions 22 cm x 15 cm x 1.3 cm
Weight 322 grams
Format Softcover
ISBN-13 9786208225414
Pages 204

Keywords