Research Work

CrackFusion: Crack Segmentation with Diffusion Models

Ongoing (To be submitted to ECCV 2024)

The paper introduces a novel approach using diffusion models to create accurate crack segmentation maps by leveraging original image data during reverse diffusion. A “RefineNet” model then ensures the generated maps at each timestep align topologically with actual crack structures.

Swin Transformer-Based Crack Detection

Submitted to IEEE TITS 2024 (In Review)

In this paper, we propose CrackSwinT, a novel crack detection approach that employs the Shifted window Transformer (Swin-T) architecture, integrating Swin attention blocks and skip connections within encoders and decoders to capture crack details at multiple levels. Additionally, we present an enhanced Crack500 dataset with refined crack annotations.

Download/View here

Imitating Expert Behaviour with Optimal Transport Distances

Done as a research project under Prof. Michael Ryoo

In this report, we introduce an algorithm for reward annotation in offline Reinforcement Learning (RL) employing the Optimal Transport (OT) strategy based on Wasserstein distance. Leveraging OT, the algorithm calculates optimal alignments between unlabeled trajectories and expert demonstrations, treating the similarity measure as a reward label.
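As a rough illustration of the idea, the sketch below labels each state of an unlabeled trajectory with the negative transport cost it incurs when aligned to an expert demonstration. It is a minimal sketch under stated assumptions, not the algorithm from the report: the Euclidean cost, the uniform weights over states, the entropy-regularized Sinkhorn solver, and the names `sinkhorn_plan` and `ot_rewards` are all illustrative choices.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Approximate OT plan between two uniform distributions (Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on trajectory states (assumption)
    b = np.full(m, 1.0 / m)  # uniform mass on expert states (assumption)
    # Scale the regularizer by the largest cost so the kernel does not underflow.
    scale = reg * max(cost.max(), 1e-8)
    K = np.exp(-cost / scale)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def ot_rewards(traj, expert, reg=0.05):
    """Per-step pseudo-rewards: negative transport cost paid by each trajectory state."""
    # Pairwise Euclidean distances between trajectory states and expert states.
    cost = np.linalg.norm(traj[:, None, :] - expert[None, :, :], axis=-1)
    plan = sinkhorn_plan(cost, reg)
    return -(plan * cost).sum(axis=1)


# Toy usage: a trajectory that matches the expert scores higher than a distant one.
expert = np.stack([np.linspace(0, 1, 5), np.linspace(0, 1, 5)], axis=1)
near_rewards = ot_rewards(expert.copy(), expert)
far_rewards = ot_rewards(expert + 5.0, expert)
print(near_rewards.sum() > far_rewards.sum())
```

The rewards are always non-positive, and their sum approximates the negative Wasserstein distance between the two trajectories, so trajectories closer to the expert receive higher labels.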

Download/View here

Shadow Removal using Diffusion Models

Done under Prof. Dimitris Samaras

This work explores the potential of denoising diffusion probabilistic models (DDPM) for shadow removal, where preserving hidden features is crucial. We built on the existing RePaint architecture by gradually passing shadow information into the reverse diffusion process.

Download/View here

Perceptual Feature-Guided Denoising for Speech Enhancement

Bachelor Thesis Project (BTP) under Prof. P Balamurugan (IIT Bombay)

This work proposes an end-to-end deep learning methodology for speech enhancement, employing a fully convolutional neural network (FCN) guided by perceptual feature losses to generate clean audio from noisy inputs. The denoising network is trained to preserve intricate details by matching activations at multiple layers of a second network, FeatureLoss Net.
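A perceptual feature loss of this kind can be sketched as follows. This is only an illustration under stated assumptions: the randomly initialized 1-D convolution stack stands in for the pretrained FeatureLoss Net, the per-layer L1 distance and equal layer weights are assumed, and the names `make_feature_net` and `feature_loss` are hypothetical.

```python
import numpy as np


def conv1d_relu(x, kernels):
    """Same-padded 1-D convolution followed by ReLU.

    x: (channels_in, T); kernels: (channels_out, channels_in, k) with odd k.
    """
    c_out, _, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros((c_out, T))
    for t in range(T):
        window = xp[:, t:t + k]  # (channels_in, k)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def make_feature_net(seed=0, widths=(1, 4, 8), k=3):
    """Placeholder feature network: random weights instead of a pretrained model."""
    rng = np.random.default_rng(seed)
    return [0.5 * rng.standard_normal((c_out, c_in, k))
            for c_in, c_out in zip(widths[:-1], widths[1:])]


def feature_loss(denoised, clean, feature_net, layer_weights=None):
    """Weighted sum of mean absolute activation differences across layers."""
    if layer_weights is None:
        layer_weights = [1.0] * len(feature_net)
    h_d, h_c = denoised[None, :], clean[None, :]
    loss = 0.0
    for w, kernels in zip(layer_weights, feature_net):
        h_d = conv1d_relu(h_d, kernels)
        h_c = conv1d_relu(h_c, kernels)
        loss += w * np.mean(np.abs(h_d - h_c))
    return loss


# Toy usage: compare a clean sine wave against a noisy copy of it.
net = make_feature_net()
clean = np.sin(np.linspace(0, 8 * np.pi, 64))
noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(64)
print(feature_loss(noisy, clean, net), feature_loss(clean, clean, net))
```

In training, the denoising network's output would take the place of `noisy`, and the gradient of this loss would update only the denoising network while the feature network stays fixed.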

Download/View here