Skip to content

kdent/spam_blocker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Steps

Training:

get list of message files

for each file:

- read email message to extract from address and subject line.
    (read_file.c, msg_content.c)
- tokenize the from address ignoring the address itself and the subject line
    o look for other features like camel caps, numbers for letters substitution
    (doc_features.c)
- store each unique token in the list of words
- create a matrix of all the words in the corpus


About

Spam classifier based on email headers only

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published