Open to full-time opportunities · F-1 OPT · Available Now
Prabhath Vinay Vipparthi

Think in data.
Build with AI.

MS Data Science graduate from NJIT (May 2026). I build things that work — ML models, data pipelines, NLP systems — and I care deeply about understanding why they work.

📍 Harrison, NJ · NJIT · MS Data Science · 2026 · F-1 · OPT Pending · 8 Projects · 5 Certifications
8 Projects Built
351 Automated Test Cases
30k+ Code Samples Processed
1.8M Records in Hadoop Pipeline
6 ML Models Compared
5 Certifications Earned

About

Building systems that think and scale.

I'm Prabhath Vinay Vipparthi, a Master's in Data Science graduate from NJIT (May 2026). I work across the full stack of data — from raw pipelines and feature engineering to training models and putting them into production.

I've built NLP classification engines with human-in-the-loop workflows, implemented ML algorithms from scratch, trained LSTMs on clinical data, and processed 1.8 million records on distributed Hadoop clusters. I care about doing things properly: writing tests, understanding the math, and building systems that hold up.

I'm looking for a team where I can keep learning fast and contribute meaningful work from day one.

📋 Work Authorization
F-1 visa · Graduated May 2026 · OPT application submitted · Biometrics completed · EAD in process · STEM OPT eligible (24-month extension, up to 3 years of work authorization in total). No sponsorship required during the OPT period.
New Jersey Institute of Technology
Master of Science in Data Science
Computational Track · Ying Wu College of Computing
May 2026 · Newark, NJ
Graduated
Lendi Institute of Engineering & Technology
Bachelor of Technology
Electrical & Electronics Engineering
April 2023 · Andhra Pradesh, India
Contact
📍 Harrison, NJ, USA
📞 973-418-9427
✉️ vipparthi.prabhathvinay23@gmail.com
🔗 LinkedIn  ·  GitHub

Experience

Where I've done the work.

AI/ML Engineering Intern
NJIT – The Learning & Development Initiative (LDI)
May 2026
  • Engineered a production AI-assisted digital badge classification platform using FastAPI, React (Vite + Tailwind CSS), spaCy, and SQLite — operationalizing NJIT's institutional badge taxonomy through deterministic rule-based workflows with 100% classification accuracy across 20 real-world badges.
  • Designed a four-layer NLP pipeline — 130+ custom phrase patterns, 44 regex rules, Bloom's verb extraction, and metadata normalization — to extract classification signals from structured and unstructured badge data across three input formats (OBv3 JSON, guided forms, free text).
  • Built a deterministic classification engine with full rule traceability, 8-element plain-English explanations per output, governance logging, and human-in-the-loop override workflows requiring written justification — supporting auditable institutional decision-making.
  • Implemented automated validation and regression testing with 351 test cases (100% pass rate) covering classification logic, NLP extraction layers, taxonomy rule consistency, and all 7 end-to-end workflow scenarios.
View Repository ↗
BME Office Assistant
New Jersey Institute of Technology
May 2026
  • Developed and maintained a web-based study room booking system to streamline scheduling workflows and improve internal resource management.
  • Maintained departmental web systems, faculty information, and operational data consistency across platforms.
  • Assisted with workflow coordination, documentation management, and data organization for departmental operations.

Projects

Things I've built & explored.

02
Machine Learning · Healthcare
Heart Failure Prediction
Compared Random Forest, LSTM, and KNN on 918 patient records (12 clinical features) using 10-fold cross-validation. Computed 15+ metrics per fold, including accuracy, F1, TSS, HSS, Brier Score, and AUC. Random Forest performed best, with 86.8% accuracy, F1 0.883, and AUC 0.94, compared with accuracies of 84.2% for LSTM and 71.0% for KNN.
RF 86.8% Acc · AUC 0.94 · 15+ Metrics
Python · Random Forest · LSTM · TensorFlow · 10-Fold CV
View on GitHub ↗
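TSS (True Skill Statistic) and HSS (Heidke Skill Score) are less familiar than accuracy or F1, so a short sketch may help. Both can be computed from a binary confusion matrix. This is a minimal illustration under stated assumptions: the function names and the example counts are hypothetical and are not taken from the project repository or its results.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = recall minus false-positive rate; ranges from -1 to 1."""
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score(tp, fn, fp, tn):
    """HSS measures improvement over agreement expected by chance."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


# Hypothetical counts for a single validation fold (not project results).
tp, fn, fp, tn = 45, 5, 8, 34
tss = true_skill_statistic(tp, fn, fp, tn)
hss = heidke_skill_score(tp, fn, fp, tn)
print(f"TSS={tss:.3f}  HSS={hss:.3f}")
```

Both scores equal 1 for a perfect classifier and 0 for one that performs no better than chance, which makes them useful next to accuracy when classes are imbalanced.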
03
Machine Learning · Finance
Loan Approval Prediction
End-to-end ML pipeline on 20,000 loan applications with 36 features and severe class imbalance (76.1% rejected). Applied SMOTE on training data only to prevent leakage, compared 6 models — Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and ANN — with GridSearchCV tuning, and applied SHAP to surface top predictors: CreditScore, AnnualIncome, and DebtToIncomeRatio.
6 Models · SHAP · GridSearchCV
Python · SMOTE · SHAP · TensorFlow · GridSearchCV · Scikit-learn
View on GitHub ↗
04
Data Mining · Algorithms
Frequent Itemset Mining
Implemented and benchmarked Brute Force (from scratch), Apriori, and FP-Growth on 5 retail datasets — Amazon, BestBuy, Walmart, Target, Kroger — with configurable support and confidence thresholds.
3 Algorithms · 5 Datasets
Python · mlxtend · Apriori · FP-Growth
View on GitHub ↗
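The brute-force approach can be sketched as follows: enumerate every candidate itemset of size 1, 2, 3, and so on, count how many transactions contain it, and stop at the first size that yields no frequent itemset. This is only an illustration, not the repository's code. The function names and the toy baskets are hypothetical.

```python
from itertools import combinations


def brute_force_frequent_itemsets(transactions, min_support):
    """Return {itemset_tuple: support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            candidate_set = frozenset(candidate)
            count = sum(1 for basket in baskets if candidate_set <= basket)
            support = count / len(baskets)
            if support >= min_support:
                frequent[candidate] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    combined = tuple(sorted(set(antecedent) | set(consequent)))
    return frequent[combined] / frequent[tuple(sorted(antecedent))]


# Hypothetical toy baskets (not one of the project's datasets).
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"mouse"},
]
frequent = brute_force_frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence(frequent, ("laptop",), ("mouse",))
```

Brute force checks every combination, so its cost grows exponentially with the number of distinct items. That is why it serves as a baseline when benchmarking against Apriori and FP-Growth.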
05
AI · Knowledge Representation
Cluedo — AI Logical Deduction Agent
Complete Cluedo board game with an AI player powered by a custom KnowledgeBase. Uses process-of-elimination inference across refutation patterns and only makes an accusation when the solution is 100% certain. Supports 3–6 mixed human/AI players.
Full Deduction Engine · Mixed H/AI Play
Python · OOP · Knowledge Base · Logical Inference
View on GitHub ↗
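Process-of-elimination inference of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not the project's KnowledgeBase: its name, methods, and card lists are assumptions. It tracks which cards could still be in the solution and only returns an accusation once each category has exactly one candidate left.

```python
class DeductionSketch:
    """Tracks which cards may still be part of the hidden solution."""

    def __init__(self, suspects, weapons, rooms):
        self.candidates = {
            "suspect": set(suspects),
            "weapon": set(weapons),
            "room": set(rooms),
        }

    def card_seen(self, card):
        """A card seen in any player's hand cannot be in the solution."""
        for options in self.candidates.values():
            options.discard(card)

    def nobody_refuted(self, suggestion, own_hand):
        """If no opponent refutes, suggested cards we do not hold are the solution."""
        for category, card in suggestion.items():
            if card not in own_hand:
                self.candidates[category] = {card}

    def accusation(self):
        """Return the solution only when every category is certain, else None."""
        if all(len(options) == 1 for options in self.candidates.values()):
            return {cat: next(iter(opts)) for cat, opts in self.candidates.items()}
        return None


# Hypothetical reduced card set for illustration.
kb = DeductionSketch(
    suspects=["Scarlett", "Mustard", "Plum"],
    weapons=["Rope", "Knife"],
    rooms=["Hall", "Study"],
)
kb.card_seen("Scarlett")
kb.card_seen("Mustard")
kb.card_seen("Rope")
before = kb.accusation()  # room is still uncertain at this point
kb.nobody_refuted(
    {"suspect": "Plum", "weapon": "Knife", "room": "Study"},
    own_hand={"Scarlett"},
)
after = kb.accusation()
```

Returning `None` until every category is resolved mirrors the rule of accusing only when the solution is certain.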
06
Big Data · Distributed Systems
Amazon Reviews Big Data Analysis
Built a 4-node Hadoop cluster on AWS EC2 (1 NameNode + 3 DataNodes, Hadoop 2.6.5) with HDFS. Developed a MapReduce job in Java to parse a 1.2 GB TSV dataset of Amazon video game reviews and compute star-rating distribution across 1.78 million records. Optimized throughput by tuning HDFS block size from 64 MB to 128 MB.
1.78M Records · 1.2 GB Dataset
Hadoop 2.6.5 · MapReduce · Java · HDFS · AWS EC2
View on GitHub ↗
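The map and reduce phases of a star-rating count can be illustrated in a single process. The project's job was written in Java for Hadoop. The sketch below uses Python instead, purely to show the logic. The column index, function names, and sample rows are assumptions and should be checked against the actual dataset schema.

```python
from collections import Counter

# Assumption: zero-based position of the star_rating field in each TSV row.
STAR_RATING_COLUMN = 7


def map_line(line):
    """Map phase: emit (star_rating, 1) for one record; skip bad rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= STAR_RATING_COLUMN:
        return []
    rating = fields[STAR_RATING_COLUMN]
    if rating not in {"1", "2", "3", "4", "5"}:
        return []  # header row or malformed record
    return [(rating, 1)]


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each rating key."""
    totals = Counter()
    for rating, count in pairs:
        totals[rating] += count
    return dict(totals)


def rating_distribution(lines):
    """Run the map phase over all lines, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(pairs)


def make_row(rating):
    """Build a hypothetical TSV row with the rating in the assumed column."""
    return "\t".join(["US", "1", "R1", "B1", "10", "Title", "Video Games", rating, "0"])


sample_lines = [make_row("star_rating"), make_row("5"), make_row("5"), make_row("3")]
distribution = rating_distribution(sample_lines)
```

On a real cluster the framework groups the emitted pairs by key between the two phases. Here the `Counter` stands in for that shuffle step.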
07
Backend · DevOps · CI/CD
User Management System
FastAPI + PostgreSQL backend with JWT OAuth2 authentication, role-based access control (Admin, Manager, User), and profile completion tracking. Diagnosed and resolved 5 critical production bugs — DockerHub CI failures, unique constraint violations, routing 404s, nested transaction errors, and test mocking issues. Added 10 new edge-case tests (138 passing total) and built a full CI/CD pipeline with GitHub Actions and Docker.
138 Tests · JWT Auth · CI/CD
FastAPI · PostgreSQL · Docker · GitHub Actions · SQLAlchemy · Pytest
View on GitHub ↗
08
Big Data · Distributed Systems · Finance
Cryptocurrency Market Analysis on Hadoop
Built three distributed MapReduce jobs in Java to analyze 2 GB of historical OHLCV tick data across 100+ cryptocurrency pairs (Binance, Apr–Aug 2024) on a multi-node AWS EC2 cluster. Jobs surface volatility rankings, worst-performing assets by open-to-close change, and cumulative volume leaders with peak timestamps.
100+ Crypto Pairs · 2 GB HDFS Dataset
Java · Hadoop · MapReduce · HDFS · AWS EC2
View on GitHub ↗

Certifications

Validated by the best.

DeepLearning.AI · Stanford University
Supervised Machine Learning: Regression and Classification
Taught by Andrew Ng. Covers linear and logistic regression, gradient descent, and regularization.
February 2025
Verify ↗
Google Cloud · Coursera
Preparing for Google Cloud Certification: Cloud Network Engineer
4-course program: VPCs, hybrid connectivity, network services, security, and hands-on labs in Google Cloud architecture.
November 2022
Verify ↗
AWS Academy · EduSkills · AICTE
AWS Cloud Virtual Internship
10-week hands-on AWS cloud program covering core services, compute, storage, and deployment patterns. Supported by AWS Academy.
Oct – Dec 2021
Salesforce · SmartInternz · AICTE
Salesforce Administrator Virtual Internship
8-week program: Sales Cloud, Service Cloud, Flow Automation, Security, Dashboards. Earned 3 Superbadges including Security Specialist.
Aug – Oct 2022
Verify ↗
Palo Alto Networks · EduSkills · AICTE
Cybersecurity Virtual Internship
10-week Palo Alto Networks-supported program covering network security, threat prevention, and security operations fundamentals.
Mar – May 2022

Skills

My toolkit.

Core proficiencies
Languages
Python · SQL · R · TypeScript
ML Frameworks
PyTorch · Hugging Face · TensorFlow · Scikit-learn
NLP & LLMs
spaCy · vLLM · Transformers · Tree-sitter · SHAP
Data Engineering
Spark · Hadoop · MapReduce · HDFS · Pandas · NumPy
Cloud
AWS (EC2, S3) · Google Cloud · Docker · GitHub Actions
Backend & Databases
FastAPI · REST APIs · PostgreSQL · SQLite · MySQL
AI / Machine Learning
PyTorch · Hugging Face · TensorFlow · Scikit-learn · spaCy · NLP · vLLM · SHAP · SMOTE
Data Engineering
Spark · Hadoop · MapReduce · HDFS · Pandas · NumPy · ETL Pipelines
Backend · Cloud · DevOps
FastAPI · AWS · Google Cloud · Docker · REST APIs · GitHub Actions · React
Analytics & Tools
Tableau · Power BI · Matplotlib · Seaborn · Git · Linux · Jupyter

Let's connect

Open to full-time opportunities.

Data Scientist or AI/ML Engineering roles — let's talk about what we can build together.

Harrison, NJ · F-1 OPT · Available Now