Smart Spam Filter 1.0
A spam filter using Machine Learning.
fast_single.py File Reference

Continuous Single Mail Processing Code.

Functions

def fast_single.predict (mail_file)
 Method to predict result of a single mail.

def fast_single.mail_features (mail)
 Method to find features of a single mail.

def fast_single.preprocessor (mail)
 Method to pre-process the mails.

def fast_single.find_payload (mail_body, all_words)
 Method to recursively find single part payloads.

def fast_single.split_payload (payload, all_words)
 Method to split the large payloads into smaller chunks.

def fast_single.get_words_plain (content, all_words)
 Method to get words out of plain text content.

def fast_single.get_words_html (content, all_words)
 Method to get words out of html content.
 

Variables

 fast_single.nlp = spacy.load("en_core_web_sm")
 
 fast_single.stopWords = spacy.lang.en.stop_words.STOP_WORDS
 
 fast_single.dictionary = json.load(dic)
 
int fast_single.dic_size = 3000
 
string fast_single.SPAM_DIR = "/var/mail/folder/spam"
 
 fast_single.ml_model = pickle.load(open('spamfilter.sav', 'rb'))
 
 fast_single.logfile_location = sys.argv[1]
 
 fast_single.logfile = open(logfile_location, "r")
 
 fast_single.logfile_ino = os.fstat(logfile.fileno()).st_ino
 
 fast_single.fil = open("spamfilter.log", "a")
 
 fast_single.mail = logfile.readline()
 
 fast_single.startTime = datetime.now()
 
 fast_single.result = predict(mail)
 
 fast_single.endTime = datetime.now()
 
 fast_single.processTime = endTime - startTime
 
 fast_single.new = open(logfile_location, "r")
 

Detailed Description

Continuous Single Mail Processing Code.

This code loads the modules, then continuously reads the addresses of mails from the log file and processes each mail in turn.
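The continuous reading described above can be sketched as follows. This is a minimal illustration, not the module's actual loop: the helper name `follow_once` is hypothetical, and the inode comparison is an assumption suggested by the `logfile_ino` and `new` variables listed above (it lets the reader notice that the log file was rotated and reopen it).

```python
import os


def follow_once(logfile, logfile_location):
    """Read all pending lines, then reopen the log if it was rotated.

    Hypothetical helper. Returns (addresses, logfile); the returned
    file object is a fresh one when the path now points to a new inode.
    """
    addresses = []
    line = logfile.readline()
    while line:
        address = line.strip()
        if address:
            addresses.append(address)
        line = logfile.readline()

    # A different inode at the same path means the file was replaced.
    try:
        current_ino = os.stat(logfile_location).st_ino
    except FileNotFoundError:
        # The new file has not been created yet; keep the old handle.
        return addresses, logfile

    if current_ino != os.fstat(logfile.fileno()).st_ino:
        logfile.close()
        logfile = open(logfile_location, "r")
    return addresses, logfile
```

A caller would invoke `follow_once` repeatedly (with a short sleep between calls) and pass each returned address to the prediction step.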

Author
Sudhanshu Dubey
Version
1.0
Date
3/7/2019
Parameters
logfile_location: The location of the log file.
Bug:
No known bugs

Function Documentation

◆ find_payload()

def fast_single.find_payload (mail_body, all_words)

Method to recursively find single part payloads.

Parameters
mail_body: The complete mail body
all_words: List of all words in the mail
Returns
Nothing
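The recursion described above might look like the sketch below. It assumes that `mail_body` is an `email.message.Message` object from the standard library, which this page does not confirm. The name `find_payload_sketch` is hypothetical, and splitting on whitespace stands in for whatever word extraction the real function delegates to.

```python
from email.message import Message


def find_payload_sketch(mail_body: Message, all_words: list) -> None:
    """Walk a message tree and collect words from every leaf part."""
    if mail_body.is_multipart():
        # A multipart payload is a list of sub-messages: recurse into each.
        for part in mail_body.get_payload():
            find_payload_sketch(part, all_words)
        return

    # Single-part payload: decode the transfer encoding into bytes.
    raw = mail_body.get_payload(decode=True)
    if raw is None:
        return
    charset = mail_body.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")
    # Simplification: plain whitespace split instead of real tokenization.
    all_words.extend(text.split())
```

Like the documented function, the sketch returns nothing and reports its result by appending to `all_words`.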

◆ get_words_html()

def fast_single.get_words_html (content, all_words)

Method to get words out of html content.

Parameters
content: The html content
all_words: List of all words in the mail
Returns
Nothing
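One way to get words out of html content is to strip the markup first and tokenize what remains. The sketch below uses only the standard library's `html.parser`; the real function may rely on a different HTML library, and the names `_TextCollector` and `get_words_html_sketch` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def get_words_html_sketch(content: str, all_words: list) -> None:
    """Append lowercase alphabetic words found in the visible HTML text."""
    collector = _TextCollector()
    collector.feed(content)
    collector.close()
    text = " ".join(collector.chunks)
    all_words.extend(word.lower() for word in re.findall(r"[A-Za-z]+", text))
```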

◆ get_words_plain()

def fast_single.get_words_plain (content, all_words)

Method to get words out of plain text content.

Parameters
content: Plain text content
all_words: List of all words in the mail
Returns
Nothing
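As a rough illustration of word extraction from plain text, the sketch below lowercases the content, keeps alphabetic tokens, and drops stop words. The module itself loads spaCy (`nlp`, `stopWords`), so the regular expression and the tiny `STOP_WORDS_SKETCH` set here are stand-ins chosen to keep the example dependency-free; both names are hypothetical.

```python
import re

# Tiny illustrative set; the module uses spaCy's STOP_WORDS instead.
STOP_WORDS_SKETCH = {"a", "an", "the", "is", "to", "and", "of"}


def get_words_plain_sketch(content, all_words, stop_words=STOP_WORDS_SKETCH):
    """Append lowercase alphabetic tokens that are not stop words."""
    for token in re.findall(r"[a-z]+", content.lower()):
        if len(token) > 1 and token not in stop_words:
            all_words.append(token)
```

For example, calling it on "The offer is FREE, claim it to-day!" appends `offer`, `free`, `claim`, `it` and `day` to the list.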

◆ mail_features()

def fast_single.mail_features (mail)

Method to find features of a single mail.

Parameters
mail: The address of the mail
Returns
features_matrix: The features of a single mail
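A features matrix for a single mail could be built as a one-row word-count vector, as in the sketch below. Two assumptions are made that this page does not confirm: that `dictionary` maps each known word to a column index, and that the matrix has `dic_size` columns. Unlike the documented function, which takes the address of the mail, this hypothetical `mail_features_sketch` takes the already extracted words so that it stays self-contained.

```python
def mail_features_sketch(all_words, dictionary, dic_size):
    """Return a 1 x dic_size matrix of word counts for one mail."""
    features_matrix = [[0] * dic_size]
    for word in all_words:
        index = dictionary.get(word)
        # Words missing from the dictionary contribute nothing.
        if index is not None and 0 <= index < dic_size:
            features_matrix[0][index] += 1
    return features_matrix
```

With `dictionary = {"free": 0, "offer": 1, "claim": 2}` and `dic_size = 4`, the words `["free", "offer", "free", "unknown"]` give `[[2, 1, 0, 0]]`.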

◆ predict()

def fast_single.predict (mail_file)

Method to predict result of a single mail.

Parameters
mail_file: The address of the mail
Returns
result: The binary classification result for the mail
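The prediction step can be pictured as passing the features matrix to the loaded model and reducing its output to a single binary value. The sketch below assumes the unpickled `ml_model` exposes a `predict(features_matrix)` method that returns one label per row, which is a common convention but is not confirmed here. `ThresholdModel` is a made-up stand-in, not the project's classifier, and `predict_sketch` is a hypothetical name.

```python
class ThresholdModel:
    """Stand-in model: labels a row 1 when its counts sum to 2 or more."""

    def predict(self, features_matrix):
        return [1 if sum(row) >= 2 else 0 for row in features_matrix]


def predict_sketch(features_matrix, model):
    """Return the model's label for the single row as a plain int."""
    return int(model.predict(features_matrix)[0])
```

Which of the two values denotes spam depends on how the real model was trained.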

◆ preprocessor()

def fast_single.preprocessor (mail)

Method to pre-process the mails.

Parameters
mail: The address of the mail
Returns
all_words: List of all words in mail

◆ split_payload()

def fast_single.split_payload (payload, all_words)

Method to split the large payloads into smaller chunks.

Parameters
payload: The complete payload
all_words: List of all words in the mail
Returns
Nothing
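Splitting a large payload might work along the lines of the sketch below, which cuts the text into pieces of at most `chunk_size` characters and prefers to cut at a space so that no word is broken. The chunk size, the return value, and the name `split_payload_sketch` are all assumptions; the documented function returns nothing and fills `all_words` instead. One plausible motivation for chunking is that spaCy rejects texts longer than `nlp.max_length`, but this page does not state the reason.

```python
def split_payload_sketch(payload, chunk_size=100000):
    """Split payload into chunks of at most chunk_size characters."""
    chunks = []
    start = 0
    while start < len(payload):
        end = min(start + chunk_size, len(payload))
        if end < len(payload):
            # Back up to the last space so a word is not cut in half.
            cut = payload.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunks.append(payload[start:end])
        start = end
    return chunks
```

Each chunk could then be handed to the plain text or html word extractor in turn.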