Skip to the content.

Medical Text Classification

Score: 88/100 — Evaluated by industry expert

Executive Summary

Clinical documentation — discharge summaries, consultation notes, diagnostic reports — accounts for millions of documents processed daily across healthcare systems. Manual routing of these notes to the correct medical department is slow, error-prone, and contributes to delayed patient care. This project builds an NLP classifier that automatically categorizes clinical notes into medical specialties, demonstrating a progression from classical baselines to transformer-based models.

Business Problem

A healthcare system processes 2,000+ clinical notes daily across five departments:

Current state:

Goal: Automate clinical note classification with >85% accuracy, reducing routing errors and enabling faster specialist referrals.

Target Categories

| Department | Key Signals | Note Volume | |———–|————-|————-| | Cardiology | heart, cardiac, ECG, arrhythmia, hypertension | ~22% | | Neurology | headache, seizure, brain, MRI, stroke | ~20% | | Orthopedics | fracture, joint, bone, knee, arthritis | ~18% | | Gastroenterology | abdomen, liver, nausea, endoscopy | ~19% | | Pulmonology | lung, breathing, cough, asthma, respiratory | ~21% |

Approach

Step Method Rationale
Text Preprocessing Lowercasing, regex cleaning, whitespace normalization Reduces vocabulary noise without losing clinical signal
Feature Extraction TF-IDF with bigrams (max 5,000 features) Captures multi-word medical terms (“chest pain”, “heart failure”)
Baseline TF-IDF + Logistic Regression Standard NLP baseline — fast, interpretable, surprisingly strong
Improvement 1 TF-IDF + Linear SVM Max-margin objective works well in high-dimensional sparse spaces
Improvement 2 BiLSTM + GloVe Sequence modeling captures word order and context
Improvement 3 BERT fine-tuned Pre-trained on biomedical text; understands medical language

Results

Model Accuracy Macro F1 Weighted F1
TF-IDF + Logistic Regression (baseline) 0.78 0.76 0.78
TF-IDF + Linear SVM 0.81 0.79 0.81
BiLSTM + GloVe 0.84 0.82 0.84
BERT fine-tuned 0.89 0.88 0.89

Improvement over baseline: +11% accuracy, +12% F1

Key Findings

Evaluation Criteria (Graded)

Criteria Score Notes
Feature Engineering 87% TF-IDF with bigrams, clinical text preprocessing
Baseline Model Training 90% Logistic regression with proper stratified split
Model Evaluation 88% Per-class F1, confusion matrix, error analysis
Improvements over Baseline 88% Clear progression from TF-IDF → SVM → BiLSTM → BERT
Final Running Code 87% Clean pipeline, reproducible

Limitations and Next Steps

How to Run

pip install -r requirements.txt

# Full analysis (recommended)
jupyter notebook medical_text_classification.ipynb

# Or run the pipeline scripts
python src/preprocess.py    # Preprocess clinical notes
python src/train.py         # Train all models

Tech Stack

Python, scikit-learn, TF-IDF, PyTorch, HuggingFace Transformers, BERT, Matplotlib