IMDB Review Classification

Overview

A machine learning project for sentiment analysis of IMDB movie reviews that compares several approaches: BERT, a Support Vector Machine (SVM), and neural networks. The best model, BERT, reaches 87% accuracy in classifying reviews as positive or negative.

Technical Details

BERT-based classification with 87% accuracy
Support Vector Machine implementation achieving 83.5% accuracy
Neural network architecture with 79.5% validation accuracy
Comprehensive text preprocessing pipeline using NLTK
Dataset of 50,000 movie reviews
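
The preprocessing pipeline is not spelled out here, so the following is only a minimal sketch of what an NLTK-based cleaning step might look like. The function name `preprocess`, the tiny stopword set, and the choice of Porter stemming are illustrative assumptions, not the project's confirmed steps. It uses only NLTK components that need no extra data downloads.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Tiny illustrative stopword set (an assumption); a real pipeline would
# more likely use a fuller list such as NLTK's stopwords corpus.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "was", "this", "of", "to", "in"}

# IMDB reviews often contain HTML line breaks such as <br />.
HTML_TAG = re.compile(r"<[^>]+>")

tokenizer = RegexpTokenizer(r"[a-z]+")  # keep runs of lowercase letters only
stemmer = PorterStemmer()


def preprocess(review: str) -> list[str]:
    """Strip HTML tags, lowercase, tokenize, drop stopwords, and stem."""
    text = HTML_TAG.sub(" ", review).lower()
    tokens = tokenizer.tokenize(text)
    return [stemmer.stem(token) for token in tokens if token not in STOPWORDS]


tokens = preprocess("This movie was GREAT!<br /><br />The acting is superb.")
print(tokens)
```

The resulting token lists can be joined back into strings and passed to a vectorizer for the classical models. BERT normally uses its own subword tokenizer instead of a step like this.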

Implementation Approaches

BERT Model

The BERT model, implemented with Hugging Face's transformers library, achieved the best performance:

Precision: 0.88 for negative, 0.87 for positive reviews
Recall: 0.86 for negative, 0.88 for positive reviews
F1-Score: 0.87 across both classes
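
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the per-class figures reported above. The sketch below does that; the helper name `f1_score` and the dictionary layout are illustrative choices only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall as reported for the BERT model.
reported = {
    "negative": {"precision": 0.88, "recall": 0.86},
    "positive": {"precision": 0.87, "recall": 0.88},
}

f1_by_class = {
    label: f1_score(metrics["precision"], metrics["recall"])
    for label, metrics in reported.items()
}

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.2f}")
```

Both classes round to 0.87, which agrees with the reported F1-Score.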

Traditional Machine Learning

Implemented multiple classical approaches including:

Support Vector Machine with PCA dimensionality reduction
Multi-layer Perceptron neural network
Logistic Regression with L1 regularization
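
The SVM with PCA approach can be sketched with scikit-learn as follows. This is a toy illustration under assumptions, not the project's actual code: the TF-IDF features, the linear kernel, `n_components=2`, and the eight made-up reviews are all placeholders chosen so the example runs quickly.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array for PCA."""
    return matrix.toarray()


# Made-up toy reviews (1 = positive, 0 = negative), for illustration only.
texts = [
    "great film with a wonderful cast",
    "terrible plot and awful acting",
    "a wonderful and moving story",
    "awful pacing and a boring script",
    "great acting and a great story",
    "boring film with a terrible ending",
    "moving performances and a wonderful script",
    "awful film with a boring cast",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(),                              # text -> sparse TF-IDF features
    FunctionTransformer(to_dense, accept_sparse=True),
    PCA(n_components=2, random_state=0),            # dimensionality reduction
    SVC(kernel="linear"),                           # SVM classifier
)
model.fit(texts, labels)

predictions = model.predict(["great film", "terrible plot"])
print(predictions)
```

Densifying the feature matrix is only practical for small inputs. On the full 50,000-review dataset, limiting the vocabulary size or using `TruncatedSVD`, which works directly on sparse matrices, is a common alternative to plain PCA.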

Technical Capabilities

Text Processing: Advanced tokenization and preprocessing pipeline

Model Architecture: Custom LSTM network with attention mechanism

Inference System: Batch prediction support with model versioning

Acknowledgments

Special thanks to the Aladdin Persson YouTube channel for guidance on BERT implementation, and to Kaggle for providing the IMDB dataset.