One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

Asai, Akari; Yu, Xinyan; Kasai, Jungo; Hajishirzi, Hannaneh

arXiv:2107.11976 (cs)

[Submitted on 26 Jul 2021 (v1), last revised 28 Oct 2021 (this version, v2)]

Abstract: We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific annotated data or knowledge sources. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open QA benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
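The retrieval idea in the abstract can be illustrated with a minimal sketch of generic dense retrieval: questions and passages live in one shared vector space, and passages are ranked by inner product with the question, whatever language they are written in. This is not CORA's implementation. The function names, passage IDs, and the 3-dimensional vectors are hypothetical stand-ins for the output of a trained multilingual encoder, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, the usual similarity score in dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    question_vec: Sequence[float],
    passages: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every passage against the question and return the k best."""
    scored = [(pid, dot(question_vec, vec)) for pid, vec in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings: in a real system these would come from a
# multilingual encoder, so passages in different languages are comparable.
passages = [
    ("en:Tokyo", [0.9, 0.1, 0.0]),
    ("ja:東京", [0.8, 0.2, 0.1]),
    ("es:Madrid", [0.1, 0.9, 0.2]),
]
question_vec = [1.0, 0.0, 0.1]  # hypothetical embedding of one question

top = retrieve_top_k(question_vec, passages, k=2)
for pid, score in top:
    print(pid, round(score, 2))
```

In this toy setup the two highest-scoring passages are in different languages, which is the property a cross-lingual retriever is meant to provide: ranking depends on content similarity in the shared space, not on the passage language. In a full pipeline of the kind the abstract describes, the retrieved passages would then be passed to a multilingual generator that produces the answer in the target language.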

Comments:	Published as a conference paper at NeurIPS 2021. Our code and trained model are publicly available at this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2107.11976 [cs.CL]
	(or arXiv:2107.11976v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2107.11976

Submission history

From: Akari Asai
[v1] Mon, 26 Jul 2021 06:02:54 UTC (4,694 KB)
[v2] Thu, 28 Oct 2021 00:11:20 UTC (3,690 KB)
