Smart Spam Filter  1.0
A spam filter using Machine Learning.
fast_mutiple.py File Reference

Mail Processing Code.

Functions

def fast_mutiple.multiple (mail_dir)
 Method to predict results for all mails individually.

def fast_mutiple.predict (mail_file)
 Method to predict result of a single mail.

def fast_mutiple.mail_features (mail)
 Method to find features of a single mail.

def fast_mutiple.preprocessor (mail)
 Method to pre-process the mails.

def fast_mutiple.find_payload (mail_body, all_words)
 Method to recursively find single part payloads.

def fast_mutiple.split_payload (payload, all_words)
 Method to split the large payloads into smaller chunks.

def fast_mutiple.get_words_plain (content, all_words)
 Method to get words out of plain text content.

def fast_mutiple.get_words_html (content, all_words)
 Method to get words out of html content.
 

Variables

 fast_mutiple.nlp = spacy.load("en_core_web_sm")
 
 fast_mutiple.stopWords = spacy.lang.en.stop_words.STOP_WORDS
 
 fast_mutiple.dictionary = json.load(dic)
 
int fast_mutiple.dic_size = 3000
 
 fast_mutiple.ml_model = pickle.load(open('spamfilter.sav', 'rb'))
 
 fast_mutiple.directory = sys.argv[1]
 

Detailed Description

Mail Processing Code.

This code loads the modules and processes all files at once.

Author
Sudhanshu Dubey
Version
1.0
Date
25/6/2019
Parameters
directory   The directory containing all the mails to be classified.
Bug:
No known bugs

Function Documentation

◆ find_payload()

def fast_mutiple.find_payload (mail_body, all_words)

Method to recursively find single part payloads.

Parameters
mail_body   The complete mail body
all_words   List of all words in the mail
Returns
Nothing
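
The recursion described above can be sketched as follows. This is a minimal illustration, assuming the mail has been parsed with Python's standard email package; the name collect_leaf_payloads and the (content type, text) tuples are hypothetical and are not taken from fast_mutiple.py. Like find_payload, the helper returns nothing and fills a caller-supplied list instead.

```python
import email
from email.message import Message


def collect_leaf_payloads(part: Message, found: list) -> None:
    """Append (content_type, text) for every single-part payload under `part`."""
    if part.is_multipart():
        # A multipart container holds a list of sub-messages; recurse into each.
        for sub_part in part.get_payload():
            collect_leaf_payloads(sub_part, found)
        return
    raw = part.get_payload(decode=True)  # bytes, or None if nothing to decode
    if raw is None:
        return
    charset = part.get_content_charset() or "utf-8"
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label in the mail: fall back to UTF-8.
        text = raw.decode("utf-8", errors="replace")
    found.append((part.get_content_type(), text))


# Hypothetical two-part mail used only for this illustration.
raw_mail = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello plain\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello html</p>\n"
    "--XYZ--\n"
)
leaves = []
collect_leaf_payloads(email.message_from_string(raw_mail), leaves)
```

Passing the accumulator down the recursion keeps a single list for the whole mail, which would explain why the documented function has no return value.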

◆ get_words_html()

def fast_mutiple.get_words_html (content, all_words)

Method to get words out of html content.

Parameters
content     The html content
all_words   List of all words in the mail
Returns
Nothing
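
One way to pull words out of HTML is sketched below using only the standard library's html.parser. The source does not say which HTML library fast_mutiple.py uses, so the class _TextExtractor, the function words_from_html, and the lowercase letters-only tokenization are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def words_from_html(content: str, all_words: list) -> None:
    """Extend `all_words` in place with lowercase words found in visible text."""
    extractor = _TextExtractor()
    extractor.feed(content)
    extractor.close()
    text = " ".join(extractor.chunks)
    all_words.extend(re.findall(r"[a-z]+", text.lower()))


html_words = []
words_from_html(
    "<html><style>p{color:red}</style><body><p>Buy <b>cheap</b> pills!</p>"
    "<script>var x=1;</script></body></html>",
    html_words,
)
```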

◆ get_words_plain()

def fast_mutiple.get_words_plain (content, all_words)

Method to get words out of plain text content.

Parameters
content     Plain text content
all_words   List of all words in the mail
Returns
Nothing
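
A rough sketch of plain-text word extraction follows. The module-level variables suggest the real code relies on spaCy (nlp) and its English stop-word list (stopWords); to stay self-contained, this example swaps in a regular-expression tokenizer and a tiny hand-written stop-word set, both of which are stand-ins rather than the author's method.

```python
import re

# Tiny stand-in for spacy.lang.en.stop_words.STOP_WORDS (illustration only).
STOP_WORDS = {"the", "a", "is", "to", "and", "of"}


def words_from_plain(content: str, all_words: list, stop_words=STOP_WORDS) -> None:
    """Extend `all_words` in place with lowercase, non-stop-word tokens."""
    for token in re.findall(r"[a-z]+", content.lower()):
        # Drop stop words and single letters, which carry little signal.
        if token not in stop_words and len(token) > 1:
            all_words.append(token)


plain_words = []
words_from_plain("The offer is FREE, claim it to-day!", plain_words)
```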

◆ mail_features()

def fast_mutiple.mail_features (mail)

Method to find features of a single mail.

Parameters
mail   The address of the mail
Returns
features_matrix: The features of a single mail
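
The feature extraction can be pictured as a word-count vector over the fixed dictionary. The sketch below assumes that dictionary is an ordered list of words and that each feature is the number of times the corresponding word occurs in the mail; the page does not document the actual structure of the JSON dictionary, so treat build_feature_vector as hypothetical.

```python
def build_feature_vector(all_words, dictionary, dic_size):
    """Return a 1 x dic_size matrix of per-word counts (assumed layout)."""
    # Map each dictionary word to its column index once, for O(1) lookups.
    index_of = {word: i for i, word in enumerate(dictionary[:dic_size])}
    features = [0] * dic_size
    for word in all_words:
        position = index_of.get(word)
        if position is not None:
            features[position] += 1
    # Wrapped in a list so it looks like a single-row matrix.
    return [features]


features_matrix = build_feature_vector(
    ["free", "win", "free", "hello"],  # words from one mail
    ["free", "money", "win"],          # toy dictionary
    3,                                 # stands in for dic_size = 3000
)
```

Words missing from the dictionary ("hello" here) are ignored, so every mail maps to a vector of the same length.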

◆ multiple()

def fast_mutiple.multiple (mail_dir)

Method to predict results for all mails individually.

Parameters
mail_dir   The directory containing mails
Returns
Nothing
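
A minimal sketch of the directory loop is shown below. It takes the per-mail classifier as a callable so the example runs without the trained model; classify_directory is a hypothetical name, and unlike multiple(), which is documented as returning nothing, this version returns a dictionary so the outcome can be inspected.

```python
import os


def classify_directory(mail_dir, predict):
    """Call `predict(path)` on every regular file in `mail_dir`."""
    results = {}
    for name in sorted(os.listdir(mail_dir)):
        path = os.path.join(mail_dir, name)
        # Skip sub-directories; only individual mail files are classified.
        if os.path.isfile(path):
            results[name] = predict(path)
    return results
```

Because the module reads directory from sys.argv[1], the real script presumably passes that value as mail_dir.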

◆ predict()

def fast_mutiple.predict (mail_file)

Method to predict result of a single mail.

Parameters
mail_file   The address of the mail
Returns
result: The binary classification result for the mail
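
The prediction step can be sketched as a single call on the loaded model. This assumes the object unpickled from spamfilter.sav exposes a scikit-learn style predict(rows) method, which the page does not confirm; KeywordModel is a stand-in so the example runs without the real model file, and predict_one is a hypothetical name.

```python
class KeywordModel:
    """Stand-in for the unpickled model (illustration only)."""

    def predict(self, rows):
        # Flag a row when its first feature is non-zero.
        return [1 if row[0] > 0 else 0 for row in rows]


def predict_one(features_matrix, model):
    """Return the binary label for a single-row feature matrix."""
    return int(model.predict(features_matrix)[0])


flagged = predict_one([[2, 0, 1]], KeywordModel())
clean = predict_one([[0, 3, 0]], KeywordModel())
```

Which of the two values denotes spam is not stated in this reference, so the sketch does not assign a meaning to them.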

◆ preprocessor()

def fast_mutiple.preprocessor (mail)

Method to pre-process the mails.

Parameters
mail   The address of the mail
Returns
all_words: List of all words in the mail
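
The overall flow from a mail file to a word list might look like the sketch below. It uses the standard email package and its walk() iterator in place of the module's own helper functions, handles only text/plain parts, and decodes everything as UTF-8; these simplifications and the name preprocess are assumptions for illustration.

```python
import email
import re


def preprocess(mail_path):
    """Parse the mail at `mail_path` and return its lowercase words."""
    all_words = []
    with open(mail_path, "rb") as handle:
        message = email.message_from_binary_file(handle)
    for part in message.walk():
        # walk() also yields multipart containers; keep plain-text leaves only.
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        text = raw.decode("utf-8", errors="replace")  # simplification
        all_words.extend(re.findall(r"[a-z]+", text.lower()))
    return all_words
```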

◆ split_payload()

def fast_mutiple.split_payload (payload, all_words)

Method to split the large payloads into smaller chunks.

Parameters
payload     The complete payload
all_words   List of all words in the mail
Returns
Nothing
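
Chunking a large payload can be sketched as below. One plausible motivation is that a spaCy pipeline rejects texts longer than nlp.max_length (1,000,000 characters by default), but the page does not state which limit the real code applies; split_into_chunks and its max_chars parameter are hypothetical.

```python
def split_into_chunks(payload: str, max_chars: int) -> list:
    """Split `payload` into pieces of at most `max_chars` characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(payload):
        end = min(start + max_chars, len(payload))
        if end < len(payload):
            # Back up to the last space so a word is not cut in half.
            cut = payload.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunks.append(payload[start:end])
        start = end
    return chunks


pieces = split_into_chunks("aaa bbb ccc ddd", 8)
```

Each piece can then be tokenized separately and its words appended to the shared all_words list. A run with no spaces is cut at the hard limit, so the loop always makes progress.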