Title: Transforming Message Detection
Authors: Ermakova, L.
Issue Date: 2011
Publisher: St. Petersburg University Press
Citation: Ermakova L. Transforming Message Detection / L. Ermakova // Web of Data: The joint RuSSIR/EDBT 2011 Summer School, August 15–19, 2011, Proceedings of the Fifth Russian Young Scientists Conference in Information Retrieval / B. Novikov, P. Braslavsky (Eds.). — St. Petersburg, 2011 — P. 15-29.
Abstract: Most existing spam filtering techniques suffer from serious disadvantages. Some produce many false positives. Others are suitable only for email filtering and cannot be used in instant messaging (IM) or social networks. Content-based methods therefore seem more efficient. One of them is based on signature retrieval; however, it is not resistant to changes in the message. Enhancements exist (e.g. checksums), but they are extremely time- and resource-consuming. The main objective of this research is therefore to develop a transforming message detection method. To this end, we compared spam in four languages: English, French, Russian and Italian. For each language, about 1000 messages were examined, including both spam and non-spam. A total of 135 quantitative features were extracted, almost all of which are language-independent. These features underlie the first step of the algorithm, which is based on a support vector machine. The next stage tests the obtained results using an N-gram approach. Special attention is paid to word distortion and text alteration. The results obtained indicate the efficiency of the suggested approach.
Keywords: SPAM
Conference name: V Russian Summer School in Information Retrieval (RuSSIR’2011)
V Российская летняя школа по информационному поиску (RuSSIR’2011)
EDBT Summer Schools
Conference date: 15.08.2011–19.08.2011
ISBN: 978-5-288-05225-5
Origin: RuSSIR/EDBT2011
Appears in Collections: Information Retrieval

Files in This Item:
File: RuSSIR_2011_02.pdf
Size: 427.68 kB
Format: Adobe PDF
