Smart Spam Filter 1.0
A spam filter using Machine Learning.
partial_filter.py File Reference

Model Retraining Code.

Functions

def partial_filter.update_Dictionary (emails)
 Method to update the dictionary.

def partial_filter.extract_features (files)
 Method to extract features from all mails.

def partial_filter.mail_features (mail)
 Method to find features of a single mail.

def partial_filter.preprocessor (mail)
 Method to pre-process the mails.

def partial_filter.find_payload (mail_body, all_words)
 Method to recursively find single-part payloads.

def partial_filter.split_payload (payload, all_words)
 Method to split large payloads into smaller chunks.

def partial_filter.get_words_plain (content, all_words)
 Method to get words out of plain text content.

def partial_filter.get_words_html (content, all_words)
 Method to get words out of HTML content.
 

Variables

 partial_filter.nlp = spacy.load("en_core_web_sm")
 
 partial_filter.stopWords = spacy.lang.en.stop_words.STOP_WORDS
 
int partial_filter.dic_size = 3000
 
 partial_filter.dictionary = json.load(dic)
 
 partial_filter.ml_model = pickle.load(open('spamfilter.sav', 'rb'))
 
 partial_filter.directory = sys.argv[1]
 
 partial_filter.spam_status = sys.argv[2]
 
list partial_filter.emails = [os.path.join(directory, f) for f in os.listdir(directory)]
 
 partial_filter.no_of_emails = len(emails)
 
 partial_filter.new_dictionary = update_Dictionary(emails)
 
 partial_filter.new_features = extract_features(emails)
 
 partial_filter.new_train_labels = np.zeros(no_of_emails)
 

Detailed Description

Model Retraining Code.

This code loads the current model and dictionary and updates them based on the new mails.

Author
Sudhanshu Dubey
Version
1.0
Date
29/6/2019
Parameters
directory: The full path of the directory containing the retraining mails.
spam_status: 1 if the mails in the directory are spam, 0 if they are ham.
Bug:
No known bugs
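
The two command-line parameters could be turned into a mail list and a label vector roughly as follows. This is a minimal sketch, not the module's actual code: `collect_emails`, `make_labels`, and `main` are hypothetical names, and a plain list stands in for the NumPy array that `new_train_labels` holds.

```python
import os
import sys


def collect_emails(directory):
    """Return the full path of every entry in `directory` (sorted for a stable order)."""
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))


def make_labels(no_of_emails, spam_status):
    """Build one label per mail: 1 for spam, 0 for ham."""
    status = int(spam_status)  # command-line arguments arrive as strings
    if status not in (0, 1):
        raise ValueError("spam_status must be 0 (ham) or 1 (spam)")
    return [status] * no_of_emails


def main(argv):
    """Hypothetical entry point; `argv` mirrors sys.argv."""
    directory, spam_status = argv[1], argv[2]
    emails = collect_emails(directory)
    labels = make_labels(len(emails), spam_status)
    return emails, labels


# When run as a script, this would be called as: main(sys.argv)
```

Validating `spam_status` up front is a design choice of this sketch; it avoids silently training on labels other than 0 or 1.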

Function Documentation

◆ extract_features()

def partial_filter.extract_features (files)

Method to extract features from all mails.

Parameters
files: The list of paths to the mail files
Returns
features_matrix: A NumPy array containing the features of all mails
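
One common way to build such a matrix is to count, for each mail, how often every dictionary word occurs. The sketch below illustrates that idea under stated assumptions: it takes already-extracted word lists instead of file paths, uses nested lists instead of a NumPy array to stay dependency-free, and `build_feature_matrix` is a hypothetical name. The real feature definition may differ.

```python
def build_feature_matrix(word_lists, dictionary):
    """Return one row per mail and one column per dictionary word (word counts)."""
    # Map each dictionary word to its column index for constant-time lookup.
    column_of = {word: col for col, word in enumerate(dictionary)}
    matrix = [[0] * len(dictionary) for _ in word_lists]
    for row, words in zip(matrix, word_lists):
        for word in words:
            col = column_of.get(word)
            if col is not None:  # words outside the dictionary are ignored
                row[col] += 1
    return matrix


mails = [["free", "money", "free"], ["hello"]]
print(build_feature_matrix(mails, ["free", "hello", "money"]))
# prints [[2, 0, 1], [0, 1, 0]]
```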

◆ find_payload()

def partial_filter.find_payload (mail_body, all_words)

Method to recursively find single-part payloads.

Parameters
mail_body: The complete mail body
all_words: List of all words in the mail
Returns
Nothing
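
The recursion over a multipart mail can be sketched with the standard library's `email` package. This is an illustration only: `collect_single_parts` is a hypothetical name, and it records content types into a list passed by the caller, whereas the documented function fills `all_words`.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def collect_single_parts(message, parts):
    """Append (content type, payload) for every non-multipart part; returns None."""
    if message.is_multipart():
        # A multipart message's payload is a list of sub-messages: recurse into each.
        for sub_message in message.get_payload():
            collect_single_parts(sub_message, parts)
    else:
        parts.append((message.get_content_type(), message.get_payload()))


# Build a nested test mail: mixed -> (alternative -> plain, html), plain
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("hello plain", "plain"))
inner.attach(MIMEText("<p>hello html</p>", "html"))
outer = MIMEMultipart("mixed")
outer.attach(inner)
outer.attach(MIMEText("footer", "plain"))

found = []
collect_single_parts(outer, found)
print([content_type for content_type, _ in found])
# prints ['text/plain', 'text/html', 'text/plain']
```

Passing the accumulator in as an argument, as the documented signature does with `all_words`, lets every level of the recursion append to the same list without returning anything.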

◆ get_words_html()

def partial_filter.get_words_html (content, all_words)

Method to get words out of HTML content.

Parameters
content: The HTML content
all_words: List of all words in the mail
Returns
Nothing
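
Extracting words from HTML means dropping the markup first. The sketch below does so with the standard library's `html.parser`; the module itself may rely on a different HTML library, and `words_from_html` and `_TextCollector` are hypothetical names. Skipping `script` and `style` content and keeping only lowercase alphabetic tokens are assumptions of this example.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def words_from_html(content, all_words):
    """Append the lowercase words found in `content` to `all_words`; returns None."""
    parser = _TextCollector()
    parser.feed(content)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join(parser.chunks)
    all_words.extend(re.findall(r"[a-z]+", text.lower()))


words = []
words_from_html("<html><style>p{color:red}</style><p>Win <b>FREE</b> money!</p></html>", words)
print(words)
# prints ['win', 'free', 'money']
```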

◆ get_words_plain()

def partial_filter.get_words_plain (content, all_words)

Method to get words out of plain text content.

Parameters
content: Plain text content
all_words: List of all words in the mail
Returns
Nothing
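
A dependency-free sketch of plain-text word extraction with stop-word removal follows. The module loads spaCy's `STOP_WORDS` and an `nlp` pipeline; here a tiny hand-written stop-word set and a regular expression stand in for both, so the tokens will not match spaCy's output exactly. `words_from_plain` and `SAMPLE_STOP_WORDS` are hypothetical names.

```python
import re

# Hypothetical stand-in for spacy.lang.en.stop_words.STOP_WORDS
SAMPLE_STOP_WORDS = frozenset({"the", "a", "is", "to", "and", "of"})


def words_from_plain(content, all_words):
    """Append lowercase, non-stop-word tokens from `content` to `all_words`; returns None."""
    for token in re.findall(r"[a-z]+", content.lower()):
        # Drop stop words and single letters, which carry little signal.
        if token not in SAMPLE_STOP_WORDS and len(token) > 1:
            all_words.append(token)


words = []
words_from_plain("The prize is yours to claim", words)
print(words)
# prints ['prize', 'yours', 'claim']
```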

◆ mail_features()

def partial_filter.mail_features (mail)

Method to find features of a single mail.

Parameters
mail: The path of the mail file
Returns
features_matrix: The features of a single mail

◆ preprocessor()

def partial_filter.preprocessor (mail)

Method to pre-process the mails.

Parameters
mail: The path of the mail file
Returns
all_words: List of all words in the mail

◆ split_payload()

def partial_filter.split_payload (payload, all_words)

Method to split large payloads into smaller chunks.

Parameters
payload: The complete payload
all_words: List of all words in the mail
Returns
Nothing
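
Chunking is typically needed when a text-processing pipeline limits its input length. One way to split on word boundaries is sketched below; the limit of 100,000 characters, the name `split_into_chunks`, and the choice to return the chunks (the documented function returns nothing) are assumptions of this example.

```python
def split_into_chunks(payload, max_chars=100_000):
    """Split `payload` on whitespace into chunks of at most `max_chars` characters.

    A single word longer than `max_chars` is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for word in payload.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and size + extra > max_chars:
            chunks.append(" ".join(current))
            current, size = [word], len(word)
        else:
            current.append(word)
            size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("aa bb cc dd", max_chars=5))
# prints ['aa bb', 'cc dd']
```

Splitting at whitespace instead of at a fixed character offset keeps words intact, so no token is cut in half before word extraction.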

◆ update_Dictionary()

def partial_filter.update_Dictionary (emails)

Method to update the dictionary.

Parameters
emails: The list of paths to the mail files
Returns
new_dictionary: The updated dictionary containing the most common words
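
Keeping only the most common words after merging in new mails can be sketched with `collections.Counter`. The structure of the stored dictionary is not documented here, so this example assumes a word-to-count mapping and takes word lists instead of file paths; `update_word_counts` is a hypothetical name, and the default of 3000 mirrors `dic_size`.

```python
from collections import Counter


def update_word_counts(old_counts, new_word_lists, dic_size=3000):
    """Merge new words into the old counts and keep the `dic_size` most common."""
    counts = Counter(old_counts)
    for words in new_word_lists:
        counts.update(words)  # adds one per occurrence of each word
    return dict(counts.most_common(dic_size))


old = {"free": 2, "hello": 1}
new_mails = [["free", "win"], ["win", "win"]]
print(update_word_counts(old, new_mails, dic_size=2))
# prints {'free': 3, 'win': 3}
```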