MFSM: Chinese-English sentence alignment based on multi-feature self-attention mechanism fusion
Bilingual parallel corpora are a fundamental resource for statistics-based research in natural language processing. Chinese-English bilingual text contains cross alignments (sentence pairs whose order differs between the two texts) and empty alignments (sentences with no counterpart in the other language), both of which can easily degrade the quality of Chinese-English sentence alignment. We therefore propose a novel Chinese-English sentence alignment method based on multi-feature self-attention mechanism fusion.
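To make the two problem cases concrete, here is a small sketch. It assumes an alignment is stored as a list of (Chinese index, English index) links, with None marking a sentence that has no counterpart. The helper names `empty_links` and `crossing_pairs` are hypothetical and do not come from the paper. Two links cross when their order in the Chinese text disagrees with their order in the English text.

```python
from itertools import combinations


def empty_links(alignment):
    """Return links where one side has no counterpart (empty alignment)."""
    return [link for link in alignment if link[0] is None or link[1] is None]


def crossing_pairs(alignment):
    """Return pairs of links whose relative order differs between languages (cross alignment)."""
    full = [link for link in alignment if None not in link]
    crosses = []
    for (z1, e1), (z2, e2) in combinations(full, 2):
        # Opposite signs mean the two links are ordered differently on each side.
        if (z1 - z2) * (e1 - e2) < 0:
            crosses.append(((z1, e1), (z2, e2)))
    return crosses


# Toy indices for four Chinese and four English sentences (not real data).
alignment = [(0, 0), (1, 2), (2, 1), (3, None), (None, 3)]
print(empty_links(alignment))     # [(3, None), (None, 3)]
print(crossing_pairs(alignment))  # [((1, 2), (2, 1))]
```

A strictly monotonic aligner, one that only moves forward through both texts, cannot produce the crossing pair above, which is why the method needs a step that handles non-monotonic alignment.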
First, length features of the Chinese and English sentences are integrated into the GloVe word vectors. A bidirectional gated recurrent unit (BiGRU) then encodes these feature-enriched word vectors to obtain finer-grained local information about each sentence. Second, an interactive attention mechanism is introduced to extract global information across the bilingual sentence pair, ensuring that contextual semantic features are used effectively.
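The abstract does not give the feature or attention formulas, so the following is only a sketch under stated assumptions. The length feature is taken to be the sentence's token count divided by a hypothetical `max_len`, appended to every word vector. The BiGRU stage is omitted, with toy 2-d vectors standing in for its outputs. The interactive attention is written as scaled dot-product attention applied in both directions. All function names are hypothetical.

```python
import math


def fuse_length_feature(word_vectors, max_len=50):
    """Append a normalized sentence-length feature to each word vector.

    Assumption: the feature is the token count divided by a hypothetical
    max_len; the paper's exact length feature may differ.
    """
    length_feat = len(word_vectors) / max_len
    return [vec + [length_feat] for vec in word_vectors]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def interactive_attention(queries, keys):
    """Attend from each query vector over all key vectors of the other sentence.

    Uses scaled dot-product scores (an assumed, common choice) and returns
    one context vector per query: the attention-weighted sum of the keys.
    """
    dim = len(keys[0])
    scale = math.sqrt(dim)
    contexts = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        contexts.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return contexts


# Toy vectors standing in for GloVe embeddings / BiGRU states (not real data).
zh = fuse_length_feature([[1.0, 0.0], [0.0, 1.0]])
en = fuse_length_feature([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
zh_ctx = interactive_attention(zh, en)  # each Chinese token summarizes the English sentence
en_ctx = interactive_attention(en, zh)  # and each English token summarizes the Chinese one
```

Because each context vector is a weighted average over the whole other sentence, it carries sentence-pair (global) information, complementing the local information a recurrent encoder captures token by token.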
Finally, the Kuhn-Munkres (KM) algorithm is applied on top of a multi-layer perceptron, which lets the model handle non-monotonically aligned text and improves its generalization ability. Experiments show that the proposed method achieves an F-score above 90%, effectively improves the precision and recall of sentence alignment, and makes the construction of Chinese-English parallel corpora more efficient.
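The abstract does not describe how the KM step is wired to the MLP, so this is a sketch under assumptions. It supposes the MLP has already produced a matrix of pairwise alignment scores (toy numbers below, rows are Chinese sentences, columns English ones) and finds the maximum-score one-to-one matching with the textbook O(n³) Hungarian formulation of Kuhn-Munkres. A hypothetical `threshold` then drops low-scoring pairs so that some sentences can stay unaligned. The evaluation treats the "F-score" as the usual F1, the harmonic mean of precision and recall, which is also an assumption. All function names are hypothetical.

```python
def hungarian_min(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns {row: column}. Classic potential-based Kuhn-Munkres, 1-indexed internally.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (0 = none)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}


def km_align(scores, threshold=0.5):
    """Maximum-score one-to-one matching of Chinese rows to English columns.

    Pairs scoring below the (hypothetical) threshold are left unaligned.
    """
    transposed = len(scores) > len(scores[0])  # the solver needs rows <= columns
    mat = [list(col) for col in zip(*scores)] if transposed else scores
    cost = [[-s for s in row] for row in mat]  # maximize score = minimize its negative
    pairs = []
    for r, c in hungarian_min(cost).items():
        zh, en = (c, r) if transposed else (r, c)
        if scores[zh][en] >= threshold:
            pairs.append((zh, en))
    return sorted(pairs)


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 over sets of (zh, en) links."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy MLP scores: 4 Chinese sentences x 3 English sentences (not real data).
scores = [
    [0.9, 0.1, 0.2],
    [0.1, 0.2, 0.8],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.3],
]
pred = km_align(scores)
print(pred)  # [(0, 0), (1, 2), (2, 1)]
print(precision_recall_f1(pred, [(0, 0), (1, 2), (2, 1)]))  # (1.0, 1.0, 1.0)
```

The matching recovers the crossing pair (1, 2)/(2, 1) because KM optimizes the total score globally rather than scanning left to right, and the threshold leaves Chinese sentence 3 unaligned. In production code, `scipy.optimize.linear_sum_assignment` solves the same assignment problem.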