Resource-Constrained Image Generation and Visual Understanding: An Interview with Aniket Roy

In the latest in our series of interviews meeting the AAAI/SIGAI Doctoral Consortium participants, we caught up with Aniket Roy to find out more about his research on generative models for computer vision tasks.

Tell us a bit about your PhD – where did you study, and what was the topic of your research?

I recently completed my PhD in Computer Science at Johns Hopkins University, where I worked under the supervision of Bloomberg Distinguished Professor Rama Chellappa. My research primarily focused on developing methods for resource-constrained image generation and visual understanding. In particular, I explored how modern generative models can be adapted to operate efficiently while maintaining strong performance.

During my PhD, I worked broadly at the intersection of generative AI, multimodal learning, and few-shot learning. Much of my work involved designing methods that enable models to learn new concepts or perform complex visual tasks with limited data or computational resources. This included research on diffusion models, personalized image generation, and multimodal representation learning. Overall, my work aims to make advanced vision and generative AI systems more adaptable, efficient, and practical for real-world applications.

Could you give us an overview of the research you carried out during your PhD?

During my PhD, my research broadly focused on improving the adaptability, efficiency, and quality of modern generative models for computer vision tasks. The rapid progress in generative AI, particularly diffusion models and vision-language models, has created new opportunities to address long-standing challenges such as data scarcity, controllable generation, and personalized image synthesis. My work aimed to develop methods that allow these large models to adapt effectively with limited data and computational resources while maintaining high visual fidelity.

One line of my research addressed learning in data-constrained settings. For example, I proposed FeLMi, a few-shot learning framework that leverages uncertainty-guided hard mixup strategies to improve robustness and generalization when only a small number of labeled samples are available. Building on this idea of improving training data quality, I also developed Cap2Aug, which introduces caption-guided multimodal augmentation. This approach uses textual descriptions to guide synthetic image generation, increasing visual diversity while reducing the domain gap between real and generated data.

Overview of Cap2Aug.
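Mixup, the building block behind FeLMi's "hard mixup", trains on convex combinations of sample pairs: x_mix = λ·x_i + (1 - λ)·x_j, with the one-hot labels mixed the same way and λ drawn from a Beta(α, α) distribution. The Python sketch below illustrates one plausible reading of "uncertainty-guided" selection: generate random mixes and keep those on which the current model's predictive entropy is highest. It is not FeLMi's actual procedure; the function names, the entropy criterion, and the `predict_proba` callable are assumptions made for illustration.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of an (N, C) matrix of class probabilities."""
    return -np.sum(probs * np.log(probs + eps), axis=1)


def uncertainty_guided_mixup(x, y, predict_proba, n_candidates=32, keep=8,
                             alpha=1.0, rng=None):
    """Hypothetical sketch: build random mixup candidates and keep the `keep`
    mixes on which the current model is most uncertain.

    x: (N, ...) inputs, y: (N, C) one-hot labels.
    predict_proba: assumed callable mapping (M, ...) inputs to (M, C) probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pick random pairs of training samples and one mixing coefficient per pair.
    i = rng.integers(0, len(x), size=n_candidates)
    j = rng.integers(0, len(x), size=n_candidates)
    lam = rng.beta(alpha, alpha, size=n_candidates)
    # Standard mixup: convex combination of the inputs and of the labels.
    lam_x = lam.reshape(-1, *([1] * (x.ndim - 1)))  # broadcast over feature dims
    x_mix = lam_x * x[i] + (1.0 - lam_x) * x[j]
    y_mix = lam[:, None] * y[i] + (1.0 - lam[:, None]) * y[j]
    # "Hard" mixes are assumed here to be those with the highest predictive entropy.
    scores = predictive_entropy(predict_proba(x_mix))
    hardest = np.argsort(scores)[::-1][:keep]  # most uncertain first
    return x_mix[hardest], y_mix[hardest]
```

In a few-shot training loop, the returned mixes would be added to each batch; `keep` controls how many hard examples are added, and `alpha` controls how strongly pairs are blended.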

Another aspect of my research focused on improving the perceptual quality of images generated by diffusion models. In this direction, I proposed DiffNat, a plug-and-play regularization method based on the kurtosis-concentration property observed in natural images. Incorporating this principle into diffusion models through a kurtosis-concentration (KC) loss gives the generated images more natural texture statistics and improved perceptual realism, which also benefits downstream vision tasks.

A major part of my work explored personalization and efficient adaptation of large generative models. I introduced DuoLoRA, a parameter-efficient framework for composing low-rank adapters (LoRAs) that enables fine-grained control over content and style without requiring full retraining of the base model. I further extended personalization to zero-shot settings using a training-free textual inversion approach that allows arbitrary objects to be customized directly during generation. Finally, I proposed MultiLFG, a frequency-guided multi-LoRA composition framework that uses wavelet-domain representations and timestep-aware weighting to enable accurate, training-free fusion of multiple concepts in diffusion models.

Overview of DuoLoRA.
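For readers unfamiliar with LoRA: an adapter replaces a full update of a weight matrix W (d × k) with a low-rank product, W' = W + (α / r)·B·A, where B is d × r, A is r × k, and r is much smaller than d and k, so each adapter stores only r(d + k) parameters. The sketch below shows the naive baseline that composition methods such as DuoLoRA and MultiLFG aim to improve on: a weighted sum of two adapters' updates, with a hypothetical timestep-dependent weighting. The linear schedule, the content/style ordering, and all function names are assumptions; MultiLFG's wavelet-domain frequency guidance is not reproduced here.

```python
import numpy as np


def lora_delta(A, B, alpha):
    """LoRA weight update (alpha / r) * B @ A, with A: (r, k) and B: (d, r)."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)


def merge_loras(W, adapters, weights):
    """Naive linear merge: W + sum_i w_i * DeltaW_i.

    `adapters` is a list of (A, B, alpha) tuples; the base W is not modified.
    """
    merged = np.array(W, dtype=float)  # copy of the frozen base weight
    for (A, B, alpha), w in zip(adapters, weights):
        merged += w * lora_delta(A, B, alpha)
    return merged


def timestep_weights(t, num_steps):
    """Hypothetical schedule for two adapters (content, style): rely on the
    content adapter at noisy early steps (t near num_steps) and shift toward
    the style adapter as t approaches 0."""
    s = t / num_steps
    return [s, 1.0 - s]
```

At each denoising step t, the effective weight would be `merge_loras(W, [content, style], timestep_weights(t, T))`. Because the base matrix is copied rather than modified, the same frozen W can be reused across steps and adapters can be swapped without retraining.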

Overall, my research contributes toward building generative systems that are more efficient, adaptable, and controllable, enabling high-quality image generation and understanding even in data-limited or resource-constrained scenarios.

Was there a particular project or aspect of your research that was especially interesting?

One project that I found particularly interesting during my PhD is DiffNat, which was published in TMLR 2025. Diffusion models have become the backbone of many modern generative AI systems and have achieved impressive results in generating and editing realistic images. However, improving the perceptual quality and naturalness of generated images remains an important challenge.

Overview of DiffNat.

In this work, we introduced a simple but effective regularization technique called the kurtosis concentration (KC) loss, which can be integrated into standard diffusion model pipelines as a plug-and-play component. The idea was inspired by a statistical property of natural images: when an image is decomposed into different band-pass filtered versions, for example using the Discrete Wavelet Transform (DWT), the kurtosis values across these frequency bands tend to be relatively consistent. In contrast, generated images often show large discrepancies across these bands. Our method reduces the gap between the highest and lowest kurtosis values across the frequency components, encouraging the generated images to follow more natural image statistics.
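The quantity the KC loss penalizes can be sketched in a few lines of Python. The version below assumes a grayscale image, an orthonormal Haar wavelet, two decomposition levels, and Pearson (non-excess) kurtosis; DiffNat's exact choice of band-pass filters and its loss weighting may differ, and in training the loss would be computed with a differentiable framework on the model's output rather than in NumPy. The function names are illustrative.

```python
import numpy as np


def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar DWT on an (H, W) array with even H, W.
    Returns the low-pass band and three detail (band-pass) bands."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    low = (a + b + c + d) / 2.0
    details = ((a - b + c - d) / 2.0,   # horizontal differences
               (a + b - c - d) / 2.0,   # vertical differences
               (a - b - c + d) / 2.0)   # diagonal differences
    return low, details


def kurtosis(x, eps=1e-12):
    """Pearson (non-excess) kurtosis: E[(x - mean)^4] / Var(x)^2."""
    centered = x.ravel() - x.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / (var ** 2 + eps)


def band_pass_responses(img, levels=2):
    """Collect the detail bands of a multi-level Haar decomposition.
    H and W must be divisible by 2 ** levels."""
    bands, low = [], np.asarray(img, dtype=float)
    for _ in range(levels):
        low, details = haar_dwt2(low)  # recurse on the low-pass band
        bands.extend(details)
    return bands


def kc_loss(img, levels=2):
    """Gap between the largest and smallest band kurtosis (0 = fully concentrated)."""
    k = [kurtosis(band) for band in band_pass_responses(img, levels)]
    return max(k) - min(k)
```

A value near zero means the detail bands share nearly the same kurtosis, which is the property described above for natural images; minimizing it pushes generated images toward that regime.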

In addition, we introduced a condition-agnostic perceptual guidance strategy during inference that further improves image fidelity without requiring additional training signals. We evaluated the approach across several diverse tasks, including personalized few-shot finetuning with text guidance, unconditional image generation, image super-resolution, and blind face restoration. Across these tasks, incorporating the KC loss and perceptual guidance consistently improved perceptual quality, as measured by metrics such as FID and MUSIQ, as well as through human evaluation.
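FID, one of the metrics mentioned above, fits a Gaussian to deep features of real and generated images (conventionally Inception-v3 pooling features) and measures the Fréchet distance between the two: ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2)), where lower is better. The sketch below assumes features have already been extracted into (N, D) arrays with D ≥ 2; the feature extractor is not included, and MUSIQ, a learned no-reference quality model, is not sketched.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets, D >= 2.

    FID = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^(1/2)) equals the sum of square roots of the eigenvalues of
    # S_a S_b; for PSD covariances these are real and non-negative, so tiny
    # negative or imaginary round-off is discarded before taking the root.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Computing the trace term from eigenvalues avoids an explicit matrix square root; widely used implementations call `scipy.linalg.sqrtm` instead and rely on many thousands of samples, since FID estimates are biased at small N.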

What I particularly liked about this project is that it connects classical image statistics with modern diffusion models. It shows that relatively simple statistical insights about natural images can still play a powerful role in improving large generative models.

What are your plans for building on the PhD – where are you working now and what will you be investigating next?

During my PhD, I discovered that I genuinely enjoy the process of research, especially the moment when an intuition or idea turns out to work in practice. That process of exploring new ideas and pushing the boundaries of what we know is something I find very motivating.

To continue pursuing this, I will be joining NEC Laboratories America as a Research Scientist. In this role, I hope to build on my PhD work by developing new methods for generative models and exploring how these models can interact with broader multimodal systems. In particular, I’m interested in advancing research at the intersection of generative models, vision-language-action models, and embodied AI. More broadly, my goal is to contribute to the development of intelligent systems that can understand, generate, and interact with the visual world more effectively, while also continuing to push forward the scientific understanding of these models.

I’m interested in how you got into the field. What inspired you to study computer vision and machine learning?

My interest in computer vision and machine learning started during my undergraduate studies, when I took courses in signal processing and image processing. I found these subjects particularly fascinating because they allowed you to experiment with algorithms and immediately see their effects on images. That visual and intuitive aspect made the field very engaging, and it helped me appreciate how mathematical concepts can directly translate into meaningful visual results.

At the same time, I was also curious about how the human brain processes visual information: how we are able to recognize objects, understand scenes, and interpret complex visual signals so effortlessly. That curiosity led me to wonder whether we could design computational models that mimic aspects of human perception and enable machines to understand visual data in a similar way.

A major influence during this time was my professor, Dr. Kuntal Ghosh, who encouraged me to think more deeply about these problems and approach them with a scientific mindset. His mentorship played an important role in shaping my interest in research. Since then, that curiosity about visual perception and intelligent systems has continued to drive my work in computer vision and machine learning.

What was your experience of the Doctoral Consortium at AAAI?

Unfortunately, I was not able to attend the AAAI Doctoral Consortium in person due to visa-related issues. However, a colleague kindly presented my poster on my behalf during the event. Even though I couldn’t be there physically, I was very encouraged by the response my work received. Several researchers reached out to me after seeing the poster, and we had some very insightful discussions about the ideas and potential future directions of the research. In that sense, I still found the experience quite rewarding. The Doctoral Consortium is a great platform for sharing early-stage ideas, receiving feedback from the community, and connecting with other researchers working on related problems. I appreciated the opportunity to engage with people who were interested in the work, and those interactions helped spark new perspectives and collaborations.

Could you tell us an interesting (non-AI related) fact about you?

Outside of research, I’m a big fan of music and stand-up comedy, and I really enjoy traveling whenever I get the chance. Exploring new places, cultures, and perspectives is something I find refreshing; it’s a great way to recharge and stay curious about the world beyond work. I also enjoy writing poetic satire from time to time, and I occasionally perform it. It’s a fun creative outlet that allows me to mix humor and storytelling, which is quite different from the analytical nature of the research work I usually do.

About Aniket Roy

Aniket is currently a Research Scientist at NEC Labs America. He received his PhD from the Computer Science department at Johns Hopkins University under the guidance of Bloomberg Distinguished Professor Rama Chellappa. Prior to that, he completed a Master’s at the Indian Institute of Technology Kharagpur. He received the Best Paper Award at IWDW 2016 and the Markose Thomas Memorial Award for the best research paper at the Master’s level. During his PhD, he explored the domains of few-shot learning, multimodal learning, diffusion models, LLMs, and LoRA merging, with publications in leading venues such as NeurIPS, ICCV, TMLR, WACV, and CVPR, and he has filed 3 US patents. Alongside his PhD, he gained industrial experience through several internships at Amazon, Qualcomm, MERL, and SRI International. He was named an Amazon Fellow (2023-24) at JHU and was selected to participate in the ICCV’25 and AAAI’26 doctoral consortia.

AIhub is a non-profit dedicated to connecting the AI community to the public by providing free, high-quality information in AI.
