I am a third-year CS PhD student at the University of Michigan, advised by Justin Johnson. I work in computer vision, and my recent work focuses on using natural language supervision to solve vision tasks.

In summer 2021, I was a research intern at Facebook AI Research, working with Laurens van der Maaten and Ishan Misra. Prior to joining UMich, I spent a wonderful year as a visiting scholar at the Georgia Institute of Technology, working with the labs of Devi Parikh and Dhruv Batra.

I graduated from the Indian Institute of Technology Roorkee in 2018, with a major in Electrical Engineering and a minor in Computer Science. I was selected twice as a Google Summer of Code student, with TARDIS Foundation (2016) and OpenCV (2017). In parallel with my second GSoC, I interned at Goldman Sachs, Bangalore. I was also an active member of the Mobile Development Group IITR, and started a reading group, now named Vision and Language Group.

Feel free to say hi: kdexd at umich dot edu

What's New

[Aug 2021] My recent paper "RedCaps: web-curated image-text data created by the people, for the people" has been accepted to the first NeurIPS 2021 Track on Datasets and Benchmarks!
[Jun 2021] Presented VirTex and CAST at CVPR 2021.
[Jun 2021] Recognized as an outstanding reviewer at CVPR 2021!
[May 2021] Started as a research intern at Facebook AI Research, New York (remote)!
[Mar 2021] Serving as a reviewer for ICCV and NeurIPS 2021.
[Dec 2020] Paper out on arXiv: CASTing Your Model: Learning to Localize Improves Self-Supervised Representations.
[Nov 2020] Serving as a reviewer for CVPR 2021.
[Jun 2020] Paper out on arXiv: VirTex: Learning Visual Representations from Textual Annotations.


CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
Ramprasaath R. Selvaraju*, Karan Desai*, Justin Johnson, Nikhil Naik

VirTex: Learning Visual Representations from Textual Annotations
Karan Desai and Justin Johnson

nocaps: novel object captioning at scale
Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh

Open Source

Starter code for the Visual Dialog Challenge. Built using PyTorch v1.0, with out-of-the-box support for CUDA 9 and cuDNN 7. Provides a simple implementation of a Late Fusion encoder and a Discriminative decoder. Complete with efficient scripts for data preprocessing, image feature extraction, training and evaluation, along with support for generating a submission file for the challenge.
Implementation of the paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" by Kottur et al. (EMNLP 2017), using PyTorch and ParlAI.
Implementation of Neural Turing Machines, introduced in Graves et al. (2014), using PyTorch. Supports training and evaluation on four out of six tasks described in the paper.
Trianglify is a highly customizable library to generate beautiful triangle art views for Android. Uses the Delaunay triangulation algorithm under the hood.
Yolog wraps vanilla git log and displays the commit history complete with graph, timestamp, author and refs. Colors are configurable, and all standard git log commands work.

First projects

These are my humble beginnings.

My first neural network (2015), a multilayer perceptron classifier for MNIST written using NumPy. This repository stayed on the GitHub trending charts for almost a week, with 700+ stars.
My first GitHub repository (2015), a browser-based game of Snake, written during my first year of undergraduate studies. The game still works on gh-pages!