Posts by Collection

portfolio

projects

Estimating 3D trajectory using feature-based sparse SLAM

A class project for CS 763 (Computer Vision), Spring '19, IIT Bombay

This project implements a feature-based sparse SLAM pipeline, called MonoSLAM, for recovering a camera’s 3D trajectory. Key steps include feature extraction, distance-based feature matching, essential matrix estimation, pose estimation, world coordinate computation, and visualization. Code link here
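As a rough illustration of the distance-based feature matching step, here is a minimal sketch in plain Python. It is not the project's code: the function names, the Euclidean metric, the cross-check, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest(query, candidates):
    """Return (index, distance) of the candidate closest to `query`."""
    best = min(range(len(candidates)), key=lambda k: euclidean(query, candidates[k]))
    return best, euclidean(query, candidates[best])


def match_features(desc_a, desc_b, max_distance=0.5):
    """Brute-force matching with a cross-check (hypothetical helper).

    A pair (i, j) is kept only if j is the nearest neighbour of i, i is the
    nearest neighbour of j, and their distance is at most `max_distance`.
    """
    if not desc_a or not desc_b:
        return []
    matches = []
    for i, descriptor in enumerate(desc_a):
        j, distance = nearest(descriptor, desc_b)
        i_back, _ = nearest(desc_b[j], desc_a)
        if i_back == i and distance <= max_distance:
            matches.append((i, j, distance))
    return matches


# Toy descriptors for two consecutive frames (made-up values).
frame1 = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frame2 = [[1.1, 0.9], [0.1, 0.0], [9.0, 9.0]]
matches = match_features(frame1, frame2)
```

The matched pairs would then feed essential matrix estimation; real descriptors are much higher dimensional, and the threshold would need tuning.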

Extraction of Facial Features from Speech

A class project for CS 753 (Automatic Speech Recognition), Fall '19, IIT Bombay

The project aimed to infer a person’s appearance from their voice using a deep neural network trained on YouTube videos. The network encodes speech into a face feature and then decodes it into a canonical face image.

Download/View here

Frequency based Adversarial Patch Localization

Work done as an Intern at SRI International, Summer '23

This study introduces a frequency-based adversarial patch detection method using SAM segmentation and SVM classification on image segments. Through independent analyses of DFT, FFT, and entropy, our approach proves effective in reliably identifying adversarial patches. View the code here
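To give a feel for the kind of frequency feature involved, the sketch below computes the spectral entropy of a small grayscale patch with a naive 2D DFT. It is an assumption-laden illustration, not the pipeline from the study: the function names, the choice to drop the DC term, and the near-zero cutoff are hypothetical, and the SAM segmentation and SVM stages are not shown.

```python
import cmath
import math


def dft2(patch):
    """Naive 2D discrete Fourier transform of a small rectangular patch."""
    rows, cols = len(patch), len(patch[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += patch[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def spectral_entropy(patch):
    """Shannon entropy (bits) of the normalised power spectrum, DC excluded."""
    power = [abs(c) ** 2 for row in dft2(patch) for c in row]
    power[0] = 0.0  # ignore the DC term so mean brightness does not dominate
    total = sum(power)
    if total < 1e-12:  # assumed cutoff: treat a (near-)flat patch as zero entropy
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


flat = [[1.0] * 4 for _ in range(4)]
impulse = [[0.0] * 4 for _ in range(4)]
impulse[1][2] = 1.0  # a single bright pixel spreads energy over all frequencies

flat_entropy = spectral_entropy(flat)
impulse_entropy = spectral_entropy(impulse)
```

A flat patch has no energy outside the DC term, while a single-pixel impulse spreads its energy evenly across frequencies. A per-segment value like this could be one input feature to a classifier.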

Active Contour Without Edges using Chan-Vese

A class project for CSE 577 (Medical Imaging), Spring '23, Stony Brook University

This project explores the Chan-Vese model, originally proposed in 2001 for two-phase segmentation of grayscale images. It highlights the model’s extension to 3-D images and to multi-phase segmentation, examines the forces that drive boundary movement, and demonstrates its efficacy through practical examples.

Download/View here
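The two-phase idea behind Chan-Vese can be sketched in a few lines. The version below is a deliberately simplified illustration, not the project's implementation: it drops the contour-length regulariser and the level-set evolution, so it reduces to alternating between region means and pixel reassignment. The function name and the toy image are assumptions.

```python
def chan_vese_two_phase(image, iterations=20):
    """Simplified two-phase, piecewise-constant segmentation.

    Alternates between (1) setting c1 and c2 to the mean intensity inside and
    outside the current region and (2) assigning each pixel to whichever mean
    fits it better. The length penalty of the full model is omitted.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    c1 = c2 = mean
    mask = [[p > mean for p in row] for row in image]  # initial region guess
    for _ in range(iterations):
        inside = [p for row, m_row in zip(image, mask) for p, m in zip(row, m_row) if m]
        outside = [p for row, m_row in zip(image, mask) for p, m in zip(row, m_row) if not m]
        if not inside or not outside:
            break  # degenerate case: one region is empty
        c1 = sum(inside) / len(inside)
        c2 = sum(outside) / len(outside)
        new_mask = [[(p - c1) ** 2 < (p - c2) ** 2 for p in row] for row in image]
        if new_mask == mask:
            break  # converged
        mask = new_mask
    return mask, c1, c2


# Toy 3x3 grayscale image with a bright region on the right (made-up values).
toy_image = [
    [10, 10, 200],
    [12, 11, 210],
    [9, 205, 198],
]
region, c_in, c_out = chan_vese_two_phase(toy_image)
```

In the full model, the length term smooths the boundary and makes the result robust to noise, which this sketch does not capture.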

Bypassing NSFW Gatekeepers

A class project for CSE 509 (Network Security), Fall '23, Stony Brook University

This project delves into the vulnerabilities of NSFW detectors on social media platforms. Using a systematic black-box attack methodology that leverages Grad-CAM-generated heatmaps, the study exposes weaknesses in existing detectors and offers insights into the robustness of content moderation systems.

Download/View here

research

Perceptual Feature-Guided Denoising for Speech Enhancement

Bachelor Thesis Project (BTP) under Prof. P Balamurugan (IIT Bombay)

This work proposes an end-to-end deep learning methodology for speech enhancement, employing a fully convolutional neural network (FCN) guided by perceptual feature losses to generate clean audio from noisy inputs. The approach trains the denoising network to preserve intricate details at multiple layers through a second network, FeatureLoss Net.

Download/View here

Shadow Removal using Diffusion Models

Done under Prof. Dimitris Samaras

This work explores the potential of DDPM for shadow removal tasks, where preserving hidden features is crucial. We built on an existing RePaint architecture by gradually passing shadow information into the reverse diffusion process.

Download/View here

Imitating Expert Behaviour with Optimal Transport Distances

Done as a research project under Prof. Michael Ryoo

In this report, we introduce an algorithm for reward annotation in offline Reinforcement Learning (RL) employing the Optimal Transport (OT) strategy based on Wasserstein distance. Leveraging OT, the algorithm calculates optimal alignments between unlabeled trajectories and expert demonstrations, treating the similarity measure as a reward label.

Download/View here
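The reward-labelling idea can be illustrated with a small entropic optimal transport (Sinkhorn) solver. This is a sketch under stated assumptions rather than the algorithm from the report: the function name, the Euclidean state cost, uniform marginals, and the `epsilon` and `iterations` values are all hypothetical, and the solver is not log-domain stabilised, so large costs or a very small `epsilon` can underflow.

```python
import math


def ot_rewards(trajectory, expert, epsilon=0.5, iterations=200):
    """Label each state of `trajectory` with a reward from an OT alignment.

    Computes an entropic-regularised transport plan between the trajectory's
    states and the expert's states, then uses the negative transported cost
    of each state as its reward.
    """
    n, m = len(trajectory), len(expert)
    cost = [
        [math.sqrt(sum((x - y) ** 2 for x, y in zip(s, e))) for e in expert]
        for s in trajectory
    ]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform marginals (assumption)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):  # Sinkhorn scaling updates
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return [-sum(plan[i][j] * cost[i][j] for j in range(m)) for i in range(n)]


# Toy 2D states (made-up values): one trajectory near the expert, one far away.
expert_demo = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
close_traj = [[0.0, 0.1], [1.0, 0.1], [2.0, 0.1]]
far_traj = [[0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]

close_rewards = ot_rewards(close_traj, expert_demo)
far_rewards = ot_rewards(far_traj, expert_demo)
```

Trajectories that stay close to the expert receive rewards nearer to zero, so the labels can be passed to any offline RL method in place of ground-truth rewards.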

Swin Transformer-Based Crack Detection

Submitted to IEEE TITS 2024 (In Review)

In this paper, we propose CrackSwinT, a novel crack detection approach, which employs the Shifted window Transformer (Swin-T) architecture, integrating Swin attention blocks and skip connections within encoders and decoders to capture crack details at multiple levels. Additionally, we present an enhanced Crack500 dataset with refined cracks.

Download/View here

CrackFusion: Crack Segmentation with Diffusion Models

Ongoing (To be submitted to ECCV 2024)

The paper introduces a novel approach using diffusion models to create accurate crack segmentation maps by leveraging original image data during reverse diffusion. A “RefineNet” model then ensures the generated maps at each timestep align topologically with actual crack structures.

talks

teaching