Aditya Kumaran - Projects

PolitiQuack

PolitiQuack is a multifaceted interface that seeks to concentrate relevant, reliable data regarding the US 2020 Presidential Election. The features offered include a Fake News Detector, unique party-specific news and updates, FAQs and voter tips, as well as a customized state-wise representative contact map.

Check out our codebase, which includes more information about the project: https://github.com/Peter-Feng-32/PoliticalDashboard

Check out our Fake News Detection notebook: https://colab.research.google.com/drive/1rMUXdFfmRvFZb-CvWPSAvzNEUUhH05gb

Tech Defense

Tech Defense is a tower defense game built with Java and JavaFX, particularly using the FXML and FXGL libraries. Following the Agile Developmental Process, my team and I incorporated elements of procedural generation for random paths and multi-threading to run processes simultaneously.

Check out a demo video for our game: https://www.youtube.com/watch?v=4uMuu8ix6Vk

TaskPalette

Built a task to-do list dashboard using React, Javascript, and TypeScript. Individual tasks are assigned to a due date and can be associated with custom tags. Registered tasks can be sorted based on earliest due date, completed status, or both. HTML/CSS for style.

Check out my codebase: https://github.com/aditya-kumaran/fall2021-dev-takehome

NLP QA Classifier

Built, trained, and tested an NLP classifier to recognize intent of question-style input dataset and accordingly sort into classes. Used CountVectorizer from scikit-learn to vectorize individual words and map their frequency to classes. Modeled through LinearSVC. Analyzed through iris dataset and sklearn metrics. Further documentation available in pdf form in repository.

Accuracy according to sklearn and iris dataset = 75.54%

Check out my codebase: https://github.com/aditya-kumaran/vectorClassifier

MasterClassers

Created a custom website to share entertainment-centric movie reviews, classifying entries into a hierarchy of classes, different genres, and a score out of 10. Provided both spoiler-free and post-watch reviews for each entry. Added option for viewers to contact and request movies to be reviewed. Over 100 movies personally reviewed in the database, being published iteratively using a specialized Bash script.

Check out the website here: https://aditya-kumaran.github.io/MasterClassers/

Check out my codebase: https://github.com/aditya-kumaran/MasterClassers

EMADE

Used the Evolutionary Multi-objective Algorithm Design Engine to evaluate data sets using genetic algorithms and plot the pareto optimal individuals. Set up MySQL Server to run master process; further applied customized evaluation functions to individuals in a database, using Machine Learning algorithms as primitives. Worked with both the Modules and NLP sub-teams, specifically on visualization of pareto indivduals and autopsy of MOGP results relative to BiDAF ML models on the SQuAD dataset respectively. Interfaced with HPC through the PACE supercomputer for faster evolutionary runs and collaborative simulations.

Project files are confidential under student-honor NDA.

Sentiment Analysis of Hotel Reviews Dataset

Discerned the sentiment of hotel review text data with 515,000 reviews compiled from Booking.com concering 1493 luxury hotels across Europe, using various established NLP techniques. Performed data cleaning and pre-processing by removing stopwords, lemmatization, and using NLTK library’s Vader analysis. Feature selection added the original RelativeWordCount feature, and models used TF-IDF, Bag of Words, and Word2Vec embeddings for Multinomial Naive Bayes, RandomForest, SVM, and Logistic Regression to classify reviews according to positive or negative sentiment. Analyzed results with 6 metrics, performed hyperparameter tuning to fine-tune results.

Dataset: https://www.kaggle.com/datasets/jiashenliu/515k-hotel-reviews-data-in-europe

Project Report: https://github.gatech.edu/pages/jkim3663/cs4641-project.github.io/final.html

Good Words: Bad Words:

Project Video Summary: https://www.youtube.com/watch?v=OvUOEI-oxD4

Georgia Tech Student Government Association

Rebuilt legacy JacketPages application, used for submission of monetary bills and resolutions for the entire Georgia Tech student body; identified inoptimal features to be reimplemented and coordinated tasking. As Project Manager, recruited and managed 5-member team of undergraduate and graduate SWE developers. Worked with Directors to negotiate approval from Dean Stein to send survey emails to a student body listserv.

Deep Philharmonics - Music Generation

Created an Deep Learning music generation pipeline by tuning and re-training 4 open-source ML models to produce an original song given an input song (vocals and instrumental) and target voice sample. Fine-tuned Demucs for waveform separation, Spotify’s U-Net and attention-based BasicPitch for audio transcription, and Google’s transformer model for music generation. Calculated cross-correlation between original and output tracks using FFT convolutions as a similarity metric.

Project Report: https://drive.google.com/file/d/1R4FMGbaWnWtzGI37dMBMfv5RyDgHpGX2/view?usp=sharing

Generated samples can be found in the report (Evaluation section).

Dynamic Movie Recommendation Systems - Temporal GNNs

Implemented and evaluate Heterogeneous Graph Trans- formers (HGT) and Temporal Graph Network (TGN) models on the Movie- Lens100k dataset for temporal recommendations to support changing preferences and new users, performing necessary preprocessing steps. Propose new variant of temporal HGT to improve performance.

Project Report: https://drive.google.com/file/d/11sztp_fEgl3_SZQacRu1gy47rqio3y-M/view?usp=sharing

Check out my code: https://drive.google.com/drive/folders/15FVWOEq5oDMdwr_T4bk__HoQxbVutwQo?usp=sharing

Is Your Essay Ivy Material? - Essay Evaluation

Explored the ability of LLMs to classify college essays by university acceptance, trying to detect logical harmony in a generally abstracted task. Discovered that while the lack of a large data corpus does limit unsupervised clustering algorithms, there are valuable quantitative and qualitative motifs uncovered by LLMs on the supervised learning and in-context learning classification task respectively.

Examined the effectiveness of finetuning pretrained autoencoder models (BERT) with strong semantic understandings to identify commonalities between accepted essays, compared to baselines of RobertaForSequenceClassification, unsupervised clustering algorithms such as KMeans and Hierarchical Agglomerative Clustering, and a few-shot chain-of-thought prompting approach, which enables the language model to learn from a few gold-standard examples.

Project Report: https://drive.google.com/file/d/1tM0uKSgT5wxlhREY-FQz1DFWOft5UrgP/view?usp=sharing

Check out my code and data: https://drive.google.com/drive/folders/16EkzS_DfS1zUWHD1SNBaspUC4L9vTZnV?usp=drive_link

Check out my project video: https://www.youtube.com/watch?v=8VCB8Vb-ueg