IMDB Review Classification

Status: Completed
Tags: BERT, PyTorch, NLP

Overview

A machine learning project for sentiment analysis of IMDB movie reviews, classifying each review as positive or negative. It compares several approaches: a BERT-based classifier, a Support Vector Machine (SVM), and neural networks. The best model, BERT, reaches 87% accuracy, ahead of the SVM (83.5%) and the neural network (79.5% validation accuracy).

Technical Details

  • BERT-based classification with 87% accuracy
  • Support Vector Machine implementation achieving 83.5% accuracy
  • Neural network architecture with 79.5% validation accuracy
  • Comprehensive text preprocessing pipeline using NLTK
  • Dataset of 50,000 movie reviews
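A preprocessing pipeline of this kind typically lowercases the text, strips HTML remnants (IMDB reviews often contain `<br />` tags), tokenizes, and removes stop words. The sketch below uses only the Python standard library so that it runs without extra downloads. The regex tokenizer and the tiny stop-word set are illustrative stand-ins, not the project's actual NLTK configuration, and the function name `preprocess` is hypothetical.

```python
import html
import re

# Illustrative stop-word subset (assumption); NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "was", "it", "this", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")      # matches HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z']+")    # runs of lowercase letters and apostrophes


def preprocess(review: str) -> list[str]:
    """Clean one raw review and return its list of tokens."""
    text = html.unescape(review)     # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)     # replace HTML tags with a space
    text = text.lower()
    tokens = TOKEN_RE.findall(text)  # simple regex tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("This movie was GREAT!<br />I loved it &amp; the cast."))
```

In a real run, NLTK's tokenizer and stop-word corpus would likely replace `TOKEN_RE` and `STOP_WORDS`; the overall order of the steps would stay the same.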

Implementation Approaches

BERT Model

Implemented using Hugging Face's transformers library, the BERT model achieved the best performance with:

  • Precision: 0.88 for negative, 0.87 for positive reviews
  • Recall: 0.86 for negative, 0.88 for positive reviews
  • F1-Score: 0.87 across both classes
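As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the per-class figures above. The snippet below is a small worked calculation using only the reported numbers; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported for the BERT model.
reported = {
    "negative": (0.88, 0.86),
    "positive": (0.87, 0.88),
}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Both classes come out at 0.87 after rounding to two decimals, which matches the reported F1-score.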

Traditional Machine Learning

The classical approaches implemented include:

  • Support Vector Machine with PCA dimensionality reduction
  • Multi-layer Perceptron neural network
  • Logistic Regression with L1 regularization
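The SVM-with-dimensionality-reduction approach can be sketched with scikit-learn as a three-stage pipeline. Several choices here are assumptions rather than details from the project: TF-IDF features, a linear kernel, two components, and the toy data. `TruncatedSVD` is swapped in for PCA because it operates directly on the sparse matrices that TF-IDF produces.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data for illustration only; the project uses 50,000 IMDB reviews.
reviews = [
    "a wonderful film with a great cast",
    "brilliant acting and a moving story",
    "I loved every minute of this movie",
    "a dull plot and terrible acting",
    "boring, predictable, and far too long",
    "I hated this awful movie",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

model = make_pipeline(
    TfidfVectorizer(),                             # text -> sparse TF-IDF features
    TruncatedSVD(n_components=2, random_state=0),  # reduce dimensionality (PCA stand-in)
    SVC(kernel="linear"),                          # linear-kernel SVM (assumption)
)
model.fit(reviews, labels)
predictions = model.predict(reviews)
print(predictions)
```

On the full dataset the number of components would be far larger than two, and the model would be evaluated on a held-out split rather than on its own training data.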

Technical Capabilities

  • Text Processing: Advanced tokenization and preprocessing pipeline
  • Model Architecture: Custom LSTM network with attention mechanism
  • Inference System: Batch prediction support with model versioning
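Batch prediction generally means splitting the incoming reviews into fixed-size chunks and calling the model once per chunk. The following is a minimal, model-agnostic sketch: `batched`, `predict_in_batches`, and the keyword-based `dummy_predict` are hypothetical names introduced for illustration, and model versioning is not covered here.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(
    predict_fn: Callable[[Sequence[str]], Sequence[int]],
    reviews: Sequence[str],
    batch_size: int = 32,
) -> list[int]:
    """Run `predict_fn` on each batch and concatenate the results in order."""
    results: list[int] = []
    for batch in batched(reviews, batch_size):
        results.extend(predict_fn(batch))
    return results


def dummy_predict(batch: Sequence[str]) -> list[int]:
    """Stand-in for a real model: 1 if the review mentions 'good', else 0."""
    return [1 if "good" in text.lower() else 0 for text in batch]


reviews = ["Good fun", "Too long", "A good cast", "Weak ending", "Not for me"]
print(predict_in_batches(dummy_predict, reviews, batch_size=2))
```

Any callable that maps a batch of texts to a batch of labels, such as a wrapped BERT or scikit-learn model, could be passed in place of `dummy_predict`.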

Acknowledgments

Special thanks to the Aladdin Persson YouTube channel for guidance on BERT implementation, and to Kaggle for providing the IMDB dataset.
