Main Article Content

Abstract

This study develops a passage retrieval model for an Indonesian-language Question Answering (QA) application in a specific domain, using BERT embeddings and a Faiss index. The model aims to improve the efficiency and accuracy of finding answers to user questions, focusing on a text corpus about Universitas Perjuangan Tasikmalaya. The evaluation used 80 questions covering various aspects of the information in the corpus. The experiments show that the model reaches an accuracy of 65%, with an average execution time of 0.23 seconds per question and a total of 18.8 seconds for all questions. After fine-tuning several parameters, such as question paraphrasing and the maximum passage length in characters, accuracy rose to 72.5%. This study is expected to contribute to the development of Indonesian-language QA technology, particularly in passage retrieval, and to serve as a basis for further research on improving the effectiveness and accuracy of domain-specific QA systems.

Index Terms: Passage Retrieval, Question Answering, BERT Embedding
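The retrieve-then-evaluate loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. To stay runnable with only the standard library, it replaces the BERT encoder with a bag-of-words `embed` function and replaces Faiss with a hypothetical `FlatInnerProductIndex` class that performs exhaustive inner-product search. The passages, questions, and gold labels are invented toy data and do not come from the paper's corpus.

```python
from __future__ import annotations

import math
import time

# Invented toy passages; they are NOT taken from the paper's corpus.
PASSAGES = [
    "the library opens at eight on weekdays",
    "tuition fees are paid at the start of each semester",
    "the engineering faculty offers an informatics program",
]

# Vocabulary built from the passages: one vector dimension per distinct word.
VOCAB = {
    word: i
    for i, word in enumerate(sorted({w for p in PASSAGES for w in p.split()}))
}


def embed(text: str) -> list[float]:
    """Stand-in for a BERT embedding: L2-normalised bag-of-words vector."""
    vec = [0.0] * len(VOCAB)
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class FlatInnerProductIndex:
    """Exhaustive inner-product search over stored vectors (hypothetical helper)."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        self.vectors.extend(vectors)

    def search(self, query: list[float], k: int) -> list[tuple[float, int]]:
        scored = [
            (sum(q * v for q, v in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
        return scored[:k]


def evaluate(index: FlatInnerProductIndex, questions: list[tuple[str, int]]):
    """Return (top-1 accuracy, mean seconds per question)."""
    hits = 0
    start = time.perf_counter()
    for question, gold_id in questions:
        _, best_id = index.search(embed(question), k=1)[0]
        hits += best_id == gold_id
    elapsed = time.perf_counter() - start
    return hits / len(questions), elapsed / len(questions)


index = FlatInnerProductIndex()
index.add([embed(p) for p in PASSAGES])

# (question, id of the passage that contains the answer); also invented.
QUESTIONS = [
    ("when does the library open", 0),
    ("when are tuition fees paid", 1),
    ("which faculty offers informatics", 2),
    ("how much do students pay", 1),  # no word overlap, so this toy embedding misses
]

accuracy, mean_seconds = evaluate(index, QUESTIONS)
print(f"top-1 accuracy: {accuracy:.2%}, mean time: {mean_seconds:.6f} s")
```

In a real pipeline, `embed` would call a BERT encoder and the index could be a Faiss index such as `faiss.IndexFlatIP`; the abstract does not say which index type the authors used, so that choice is an assumption. The last toy question shares no words with its passage and is missed, which hints at why question phrasing can affect retrieval accuracy. On the reported figures themselves, 65% of 80 questions corresponds to 52 correct answers and 72.5% to 58.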

Keywords

Passage Retrieval, BERT Embedding, Question Answering

Article Details

How to Cite
[1] T. I. Ramadhan, A. Supriatman, and T. R. Kurniawan, “Passage Retrieval untuk Question Answering Bahasa Indonesia Menggunakan BERT dan FAISS,” Jurnal Algoritma, vol. 21, no. 2, pp. 156–163, Nov. 2024.
