Implementation of the Naïve Bayes Algorithm for Classifying Twitter Content with Indications of Depression

Andre Budiman, Julio Christian Young, Alethea Suryadibrata

Abstract


Depression is a health problem that severely affects those who suffer from it. Many factors contribute to depression, including life experiences, work, and social life. In 2018, an estimated 6.1% of Indonesia's 267.7 million people experienced depression. This is strongly influenced by public stigma around mental illness and by low public awareness of the need to seek psychiatric consultation. With advances in technology, people now often express themselves through content on social media. In this study, data were collected from Twitter using keywords that indicate depressive disorder. A psychiatrist then labeled the dataset to determine whether each item was “indicated depression” or “not indicated”. Based on this dataset, predictive models were developed using Multinomial Naïve Bayes (MNB) and Complement Naïve Bayes (CNB) as classification methods and Term Frequency–Inverse Document Frequency (TF–IDF) as the feature extraction method. In the experiments, the combination of TF–IDF and MNB achieved an F-score of 91.30%, while the combination of TF–IDF and CNB achieved a performance score of 91.98%.
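The pipeline described in the abstract (TF–IDF features, an MNB or CNB classifier, evaluation by F-score) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say which library or settings were used. The toy English sentences, the label strings, the whitespace tokenizer, the smoothed IDF variant, and the Laplace smoothing value `alpha=1.0` are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed IDF per term, computed from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse {term: tf * idf} vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(vectors, labels, vocab, complement=False, alpha=1.0):
    """MNB estimates term weights from the class itself;
    CNB estimates them from all *other* classes (the complement)."""
    classes = sorted(set(labels))
    mass = {c: defaultdict(float) for c in classes}
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            mass[y][t] += w
    model = {"complement": complement, "classes": classes,
             "weights": {}, "prior": {}}
    for c in classes:
        sources = [k for k in classes if k != c] if complement else [c]
        counts = {t: sum(mass[k][t] for k in sources) for t in vocab}
        total = sum(counts.values()) + alpha * len(vocab)
        model["weights"][c] = {t: math.log((counts[t] + alpha) / total)
                               for t in vocab}
        model["prior"][c] = math.log(labels.count(c) / len(labels))
    return model

def predict(model, vec):
    scores = {}
    for c in model["classes"]:
        s = sum(w * model["weights"][c][t] for t, w in vec.items())
        # CNB prefers the class whose complement explains the document worst.
        scores[c] = -s if model["complement"] else model["prior"][c] + s
    return max(scores, key=scores.get)

def f_score(y_true, y_pred, positive):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data; the study's real dataset is Indonesian tweets.
train_texts = [
    "i feel hopeless and empty",
    "i am so tired of living",
    "feeling worthless and hopeless tonight",
    "great lunch with friends today",
    "excited for the football match tonight",
    "happy to see my friends today",
]
train_labels = ["indicated"] * 3 + ["not_indicated"] * 3
test_texts = ["i feel empty and hopeless", "lunch with friends was great"]
test_labels = ["indicated", "not_indicated"]

train_docs = [tokenize(t) for t in train_texts]
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
test_vecs = [tfidf(tokenize(t), idf) for t in test_texts]

mnb = train_nb(train_vecs, train_labels, set(idf), complement=False)
cnb = train_nb(train_vecs, train_labels, set(idf), complement=True)
mnb_pred = [predict(mnb, v) for v in test_vecs]
cnb_pred = [predict(cnb, v) for v in test_vecs]
print("MNB:", mnb_pred, f_score(test_labels, mnb_pred, "indicated"))
print("CNB:", cnb_pred, f_score(test_labels, cnb_pred, "indicated"))
```

The only difference between the two classifiers in this sketch is where the per-class term statistics come from, which is why CNB is often considered more robust when class sizes are imbalanced. Scores on a six-sentence toy set say nothing about the figures reported in the abstract.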

Keywords


Depression; Automatic Text Classification; Multinomial Naïve Bayes; Complement Naïve Bayes




DOI: https://doi.org/10.30591/jpit.v6i2.2419


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
