majamil16/gmail-spam

  1. Environment

    • Venv
    • Poetry
  2. Collecting data

    • Setting up Gmail
    • Setting up S3 (optional)
    • AWS Lambda code
  3. Visualizing / exploring data

  4. Models - spam vs. not spam

    • a) Bag-of-words + Multinomial Naive Bayes
    • b) LSTM
  5. ...

1. Environment

Venv

python -m venv ./venv

Poetry

poetry export -f requirements.txt --output requirements.txt

Collecting data

  1. Generate an app password for your Gmail account. I called mine "AWS Lambda", but you can call it whatever you want. To do this, click on your profile picture in Gmail > "Manage your Google Account" > Security tab > Signing in to Google > App Passwords, then create a password for Mail.

  2. Set up IMAP. In Gmail itself, click on the gear icon to open Settings. Then go to the "Forwarding and POP/IMAP" tab. Scroll down to IMAP Access and make sure IMAP is enabled.

  3. Create the DynamoDB table (use the provided script).

  4. Set up the Lambda layer (reference). Running generate_lambda_deployment_package.sh does all this for you. Then add the layer to the Lambda function: in the console, scroll down below the Cloud9 editor to Layers and attach the layer.

  • Make sure to increase the Lambda function's timeout to ~15 minutes.
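Once the app password exists and IMAP is enabled, reading the Spam folder can be sketched with Python's standard `imaplib` and `email` modules. This is a minimal illustration, not the repository's Lambda code: the function names `summarize` and `fetch_spam` are hypothetical, and the folder name `[Gmail]/Spam` is an assumption that holds for English-language accounts.

```python
import email
import imaplib
from email.header import decode_header, make_header


def summarize(raw_bytes):
    """Parse one raw RFC 822 message into a small dict of header fields."""
    msg = email.message_from_bytes(raw_bytes)
    # decode_header/make_header turn MIME encoded-words (=?utf-8?...?=) into text.
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    return {
        "from": msg.get("From", ""),
        "subject": subject,
        "date": msg.get("Date", ""),
    }


def fetch_spam(user, app_password, limit=10):
    """Fetch up to `limit` of the most recent messages from the Spam folder."""
    with imaplib.IMAP4_SSL("imap.gmail.com") as conn:
        conn.login(user, app_password)  # the app password, not the account password
        # Assumption: the Spam folder is named "[Gmail]/Spam" on this account.
        conn.select('"[Gmail]/Spam"', readonly=True)
        _, data = conn.search(None, "ALL")
        message_ids = data[0].split()[-limit:]
        results = []
        for message_id in message_ids:
            _, parts = conn.fetch(message_id, "(RFC822)")
            results.append(summarize(parts[0][1]))
        return results
```

Opening the mailbox with `readonly=True` keeps the fetch from marking messages as read, which matters if the same account is still used day to day.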

Data exploration

  • Note: 677 emails are labeled as 'spam' out of X emails total. Gmail only keeps spam for 30 days ("Messages that have been in Spam more than 30 days will be automatically deleted."), so for this exercise I'll also try framing it as an anomaly detection problem. In practice, over X% of email is spam (SOURCE?).

  • For a quick comparison: in the last 30 days, I have 677 spam emails but X 'real' inbox emails, and of the real inbox emails, Y are from mailing lists / coupon lists / stores.

Models

Multinomial Naive Bayes - spam vs. nonspam
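As a rough illustration of the bag-of-words + Multinomial Naive Bayes idea, here is a from-scratch sketch using only the standard library. It is not the repository's model (which may well use a library such as scikit-learn); the class name, the tokenizer, and the toy training sentences are all made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple bag of words)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        n_docs = sum(self.class_counts.values())
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up toy examples, not data from this repository.
train_texts = [
    "win a free prize now",
    "free money claim now",
    "meeting notes attached",
    "lunch tomorrow at noon",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNB().fit(train_texts, train_labels)
```

Working in log space avoids numeric underflow on long emails, and the smoothing term keeps a word that appears in only one class from driving the other class's probability to zero.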

LSTM - spam vs. nonspam

Other

TODO - decide on ways to explore the data (e.g. classifying spam by category, such as phishing vs. social engineering)
