Main Article Content
Abstract
This study develops a passage retrieval model for an Indonesian-language Question Answering (QA) application in a specific domain, using BERT embeddings and a Faiss index. The model aims to improve the efficiency and accuracy of finding answers to user questions, focusing on a text corpus about Universitas Perjuangan Tasikmalaya. The evaluation used 80 questions covering various aspects of the information in the corpus. In the experiments, the model achieved an accuracy of 65%, with an average execution time of 0.23 seconds per question and a total of 18.8 seconds for all questions. After tuning several parameters, such as question paraphrasing and the maximum passage length in characters, accuracy rose to 72.5%. This study is expected to contribute to the development of Indonesian-language QA technology, particularly passage retrieval, and to serve as a basis for further research on improving the effectiveness and accuracy of domain-specific QA systems.
Index Terms: Passage Retrieval, Question Answering, BERT Embedding
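The retrieve-then-score loop described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `embed` is a lexical stand-in for a BERT sentence embedding, `FlatIndex` is a brute-force inner-product search playing the role that a Faiss index plays in the study, and the passages and questions are invented placeholders rather than the study's corpus.

```python
import math
import re
from collections import Counter

# Invented placeholder passages, NOT the study's corpus.
PASSAGES = [
    "Universitas Perjuangan Tasikmalaya is located in Tasikmalaya, West Java.",
    "The informatics program teaches programming, databases, and machine learning.",
    "New student registration opens every year before the odd semester.",
]

# Hypothetical (question, index of the gold passage) pairs.
QUESTIONS = [
    ("Where is Universitas Perjuangan Tasikmalaya located?", 0),
    ("What does the informatics program teach?", 1),
    ("When does student registration open?", 2),
    ("How do I enroll as a freshman?", 2),  # paraphrase with no shared words
]


def embed(text):
    """Stand-in for a BERT embedding: L2-normalized sparse word counts."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


class FlatIndex:
    """Brute-force inner-product index (assumed stand-in for a Faiss index)."""

    def __init__(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(vectors)

    def search(self, query_vec, k):
        """Return the k best (score, passage_id) pairs, highest score first."""
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in query_vec.items()), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:k]


def top1_accuracy(index, questions):
    """Fraction of questions whose top-ranked passage is the gold passage."""
    hits = sum(
        1 for question, gold in questions
        if index.search(embed(question), 1)[0][1] == gold
    )
    return hits / len(questions)


index = FlatIndex()
index.add([embed(p) for p in PASSAGES])
accuracy = top1_accuracy(index, QUESTIONS)
print(f"top-1 accuracy: {accuracy:.2f}")
```

The fourth question shares no words with its gold passage, so this lexical stand-in misses it; a learned embedding is meant to narrow exactly that gap. For scale, if accuracy is counted over all 80 evaluation questions, 65% corresponds to 52 correct answers and 72.5% to 58.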
Keywords
Article Details
This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.