
IMDB Review Classification
Overview
A machine learning project for sentiment analysis of IMDB movie reviews that compares several approaches, including BERT, a Support Vector Machine (SVM), and neural networks. The best model, BERT, reaches 87% accuracy in classifying reviews as positive or negative.
Technical Details
- BERT-based classification with 87% accuracy
- Support Vector Machine implementation achieving 83.5% accuracy
- Neural network architecture with 79.5% validation accuracy
- Comprehensive text preprocessing pipeline using NLTK
- Dataset of 50,000 movie reviews
Implementation Approaches
BERT Model
The BERT model, implemented with Hugging Face's transformers library, achieved the best performance of the approaches tried:
- Precision: 0.88 for negative, 0.87 for positive reviews
- Recall: 0.86 for negative, 0.88 for positive reviews
- F1-Score: 0.87 across both classes
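As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the per-class figures above. The sketch below uses only the values reported in this section; the helper name `f1_score` is illustrative and is not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) pairs as reported above.
reported = {
    "negative": (0.88, 0.86),
    "positive": (0.87, 0.88),
}

f1_by_class = {label: f1_score(p, r) for label, (p, r) in reported.items()}

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.2f}")
```

Both classes round to 0.87, which agrees with the reported F1-score.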
Traditional Machine Learning
Several classical approaches were also implemented:
- Support Vector Machine with PCA dimensionality reduction
- Multi-layer Perceptron neural network
- Logistic Regression with L1 regularization
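To illustrate what the L1 penalty in the last item does, here is a minimal pure-Python sketch of L1-regularized logistic regression trained by proximal gradient descent (a gradient step followed by soft-thresholding). It is not the project's implementation, which presumably relies on a library; the toy data, the function names, and the hyperparameters `lam`, `lr`, and `steps` are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 penalty: shrink the value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X, y, lam=0.2, lr=0.5, steps=500):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Residuals p_i - y_i of the current model.
        residuals = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for row, target in zip(X, y)
        ]
        grad_w = [
            sum(r * row[j] for r, row in zip(residuals, X)) / n
            for j in range(d)
        ]
        grad_b = sum(residuals) / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is left unpenalised
    return w, b


# Hypothetical toy data: feature 0 determines the label,
# feature 1 is only weakly related to it.
X = [[1, 1], [1, -1], [1, 1], [-1, 1], [-1, -1], [-1, -1]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_l1_logistic(X, y)
predictions = [
    int(sigmoid(sum(wj * xj for wj, xj in zip(weights, row)) + bias) >= 0.5)
    for row in X
]
print(weights, predictions)
```

With these settings the weight of the weakly related feature stays at exactly zero while the informative feature keeps a positive weight. That sparsity is the usual reason to choose L1 over L2 regularization for high-dimensional text features.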
Technical Capabilities
Text Processing
Tokenization and preprocessing pipeline built on NLTK
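The exact preprocessing steps are not listed here, so the following is only a sketch of a typical review-cleaning pass: strip HTML tags (raw IMDB reviews commonly contain `<br />` line breaks), lowercase, tokenize, and drop stop words. To stay dependency-free it uses regular expressions and a tiny hand-written stop-word set as stand-ins for NLTK's `word_tokenize` and `nltk.corpus.stopwords.words("english")`; the function name `preprocess` is hypothetical.

```python
import re

# Small illustrative stop-word set; a real pipeline would more likely use
# NLTK's English list (nltk.corpus.stopwords.words("english")).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "in", "but"}

HTML_TAG = re.compile(r"<[^>]+>")
TOKEN = re.compile(r"[a-z']+")


def preprocess(review: str) -> list[str]:
    """Strip HTML tags, lowercase, tokenize, and drop stop words."""
    text = HTML_TAG.sub(" ", review)
    tokens = [t.strip("'") for t in TOKEN.findall(text.lower())]
    return [t for t in tokens if t and t not in STOP_WORDS]


sample = "This movie was GREAT!<br /><br />The plot is thin, but it is fun."
tokens = preprocess(sample)
print(tokens)  # ['movie', 'great', 'plot', 'thin', 'fun']
```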
Model Architecture
Custom LSTM network with attention mechanism
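The form of attention is not specified here. One common choice, assumed in the sketch below, is dot-product attention pooling: each LSTM hidden state is scored against a learned query vector, the scores are normalized with a softmax, and the weighted sum of the states is passed to the classifier. The hidden states and the query vector are made-up values, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the states."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return context, weights


# Hypothetical LSTM outputs for a 3-token review (hidden size 2).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, -0.1]]
query = [1.0, 1.0]  # stands in for a learned parameter vector

context, weights = attention_pool(hidden_states, query)
print(weights, context)
```

The second time step has the highest score against the query, so it receives most of the attention weight and dominates the pooled context vector.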
Inference System
Batch prediction support with model versioning
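How batching and versioning are implemented is not described here. As a rough sketch under that caveat, one simple design keeps a registry that maps a version string to a prediction function and feeds it reviews in fixed-size batches. `ModelRegistry`, `batched`, and `keyword_model` are hypothetical names, and the keyword rule only stands in for a trained model.

```python
from typing import Callable, Iterator, Sequence

Model = Callable[[Sequence[str]], list[int]]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class ModelRegistry:
    """Maps version strings to prediction functions (hypothetical helper)."""

    def __init__(self) -> None:
        self._models: dict[str, Model] = {}

    def register(self, version: str, model: Model) -> None:
        self._models[version] = model

    def predict(self, version: str, reviews: Sequence[str], batch_size: int = 2) -> list[int]:
        model = self._models[version]
        labels: list[int] = []
        for batch in batched(reviews, batch_size):
            labels.extend(model(batch))
        return labels


# Stand-in "model": predicts positive (1) when the word "good" appears.
def keyword_model(batch: Sequence[str]) -> list[int]:
    return [int("good" in text.lower()) for text in batch]


registry = ModelRegistry()
registry.register("v1", keyword_model)
labels = registry.predict("v1", ["Good fun", "Dull", "So good", "Bad", "Fine"])
print(labels)  # [1, 0, 1, 0, 0]
```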
Acknowledgments
Special thanks to the Aladdin Persson YouTube channel for guidance on BERT implementation, and to Kaggle for providing the IMDB dataset.