Implementation of a Plagiarism Detection System Text Based

Authors

  • Progresif Buulolo, Magister of Computer Science, Potensi Utama University
  • B. Herawan Hayadi, Magister of Computer Science, Potensi Utama University
  • Dedi Hartama, Magister of Computer Science, Potensi Utama University

Keywords:

Similarity Plagiarism, Cosine Similarity, Jaccard Similarity, N-Gram, Text Data

Abstract

Plagiarism, the use of another person's work without acknowledgment, is a serious challenge in the academic world. Scientific work, a common target of plagiarism, is increasingly shaped by information technology. This research implements a text-based plagiarism detection system that compares the similarity levels produced by the Cosine Similarity and Jaccard Similarity algorithms, applied together with winnowing, for N-gram values of 3, 5, and 7. Testing was carried out on 20 datasets using the Python programming language and its supporting libraries. The results show that Cosine Similarity is better at detecting similarities between texts. Accuracy analysis using a confusion matrix yields an accuracy of 50%. Across the different n-gram variations, Cosine Similarity achieves a total similarity performance of 15.89% with an average of 0.26%, while Jaccard Similarity achieves a total of 13.59% with an average of 0.23%. Although Cosine Similarity is more accurate than Jaccard Similarity, its stability does not reach 100%.
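The pipeline the abstract describes (n-grams, winnowing, then Cosine and Jaccard scoring, evaluated with a confusion matrix) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify preprocessing, hashing, or window size, so character-level n-grams, ASCII-only normalization, CRC32 hashing, a winnowing window of 4, and scoring on winnowed fingerprints are all assumptions, and every function name here is hypothetical.

```python
import math
import re
import zlib
from collections import Counter


def preprocess(text: str) -> str:
    # Assumption: lowercase and keep only ASCII letters and digits, so that
    # spacing and punctuation changes do not affect the n-grams.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def char_ngrams(text: str, n: int) -> list[str]:
    # Overlapping character n-grams (empty if the text is shorter than n).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def winnow(hashes: list[int], window: int) -> list[int]:
    # Winnowing: in each sliding window of `window` consecutive hashes keep the
    # minimum (the rightmost one on ties), recording each selected position once.
    if len(hashes) < window:
        return [min(hashes)] if hashes else []
    selected: list[int] = []
    last_pos = -1
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        pos = start + max(i for i, h in enumerate(chunk) if h == smallest)
        if pos != last_pos:
            selected.append(hashes[pos])
            last_pos = pos
    return selected


def fingerprints(text: str, n: int, window: int = 4) -> list[int]:
    # Hash each n-gram deterministically (CRC32 is an assumed choice), then winnow.
    grams = char_ngrams(preprocess(text), n)
    return winnow([zlib.crc32(g.encode("utf-8")) for g in grams], window)


def jaccard(a: list[int], b: list[int]) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of fingerprints.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: list[int], b: list[int]) -> float:
    # Dot product of fingerprint count vectors divided by the product of their norms.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    # Confusion-matrix accuracy: (TP + TN) / (TP + TN + FP + FN),
    # with 1 = plagiarized and 0 = original; the total equals the number of pairs.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return (tp + tn) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    original = "Plagiarism is a serious challenge in the academic world."
    suspect = "In the academic world, plagiarism is a serious challenge."
    for n in (3, 5, 7):
        fa, fb = fingerprints(original, n), fingerprints(suspect, n)
        print(f"n={n}: cosine={cosine(fa, fb):.3f} jaccard={jaccard(fa, fb):.3f}")
```

Jaccard treats fingerprints as a set, while Cosine also weights how often each fingerprint repeats, which is one reason the two measures can rank the same document pair differently. Turning a score into a plagiarized/original label for the confusion matrix would additionally require a decision threshold, which the abstract does not state.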

Published

2024-11-27

How to Cite

Implementation of a Plagiarism Detection System Text Based. (2024). ASTEEC Conference Proceeding: Computer Science, 1(1), 43-52. https://www.proceedings.asteec.com/index.php/acp-cs/article/view/7