Optimizing the Stemming of Non-formal Affixed Words in Indonesian with Levenshtein Distance

Rahardyan Bisma Setya Putra, Ema Utami, Suwanto Raharjo

Abstract


The Nazief & Andriani stemming algorithm has been developed further in terms of both speed and accuracy. One such development is the Non-formal Affix Algorithm, which improves accuracy on non-formal affixed words. As it has evolved, Indonesian has come to be used in two registers: formal and non-formal. Non-formal language is common in casual settings such as conversations and social media posts (Facebook, Twitter, Instagram, etc.). A stemming algorithm that can process non-formal words with affixes has already been proposed for obtaining the root words of casual conversations and social media posts. However, that algorithm cannot stem a non-formal word whose root is a slightly altered form of the formal root word. This study therefore modifies the Non-formal Affix Algorithm by adding Levenshtein Distance to increase stemming accuracy on non-formal words. On a set of 60 non-formal affixed words, the proposed algorithm achieves 96.6% accuracy, compared with 73.3% for the Non-formal Affix Algorithm. These results indicate that the Levenshtein Distance approach can increase stemming accuracy on non-formal affixed words.
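The abstract states only that Levenshtein Distance is added to the Non-formal Affix Algorithm; it does not give the exact procedure. The sketch below assumes one plausible arrangement, not necessarily the authors': after affixes are stripped, a residue that is not in the root dictionary is mapped to the closest dictionary root within a small edit-distance threshold. The affix lists, the tiny dictionary `ROOTS`, the threshold `max_distance=2`, the function names, and the example word "didapetin" (a colloquial di-...-in form whose residue "dapet" differs from the root "dapat" by one letter) are all hypothetical and chosen for illustration.

```python
# Hypothetical, simplified affix lists for illustration only; the actual
# Non-formal Affix Algorithm uses a much fuller rule set.
NONFORMAL_PREFIXES = ("nge", "di", "ke")
NONFORMAL_SUFFIXES = ("in", "an")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    # prev[j] = distance between the first i-1 characters of a and the first j of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def closest_root(candidate: str, roots: set, max_distance: int = 2):
    """Return the root nearest to candidate, or None if none is within max_distance."""
    if candidate in roots:
        return candidate
    best, best_dist = None, max_distance + 1
    for root in sorted(roots):  # sorted so that ties resolve deterministically
        d = levenshtein(candidate, root)
        if d < best_dist:
            best, best_dist = root, d
    return best


def stem(word: str, roots: set):
    """Strip at most one hypothetical prefix and suffix, then fall back to fuzzy matching."""
    if word in roots:
        return word
    w = word
    for p in NONFORMAL_PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= 3:  # keep a residue of 3+ letters
            w = w[len(p):]
            break
    for s in NONFORMAL_SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= 3:
            w = w[:-len(s)]
            break
    return closest_root(w, roots)


ROOTS = {"dapat", "kerja", "beli", "ingin"}  # tiny illustrative dictionary
print(stem("didapetin", ROOTS))  # dapat
```

Here "didapetin" loses "di" and "in", leaving "dapet"; an exact dictionary lookup fails, but "dapat" is one substitution away, so it is returned. Keeping the threshold small is a deliberate trade-off in this sketch: a larger value recovers more altered roots but also risks mapping a residue onto an unrelated word.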


DOI: http://dx.doi.org/10.30591/jpit.v3i2.877

This work is licensed under a Creative Commons Attribution 4.0 International License.

Editorial Team, JURNAL INFORMATIKA : JURNAL PENGEMBANGAN IT

D4 Informatics Engineering Study Program
Politeknik Harapan Bersama Tegal
Jl. Mataram No.09 Pesurungan Lor Kota Tegal

Tel. +62283 - 352000

Email: informatika.ejournal@poltektegal.ac.id

Copyright: JPIT (Jurnal Informatika: Jurnal Pengembangan IT), p-ISSN: 2477-5126 (print), e-ISSN: 2548-9356 (online)
