ICPR-Compression-Workshop

 

Workshop on The Art of Compressing LLMs: Pruning, Distillation, and Quantization Demystified (1st Edition)

Co-located with the 28th International Conference on Pattern Recognition (ICPR 2026)

Location: Lyon, France
Dates: August 21–22, 2026


Overview

As Large Language Models (LLMs) grow in complexity, the computational cost of inference and training has become a significant barrier for the Pattern Recognition and HPC communities.
This workshop explores the art and engineering of compressing these models to make them cheaper, faster, and easier to deploy while preserving as much practical capability as possible.

We move beyond pure theory to demonstrate how pruning and quantization interact with actual hardware constraints on the NVIDIA stack. Participants will engage with foundational concepts and advanced implementation patterns designed to maximize GPU utilization and reduce end-to-end time-to-solution.
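As a flavor of the hands-on material, the sketch below shows symmetric per-tensor int8 quantization, one of the basic compression techniques the workshop covers. This is an illustrative example written for this page, not code from the workshop labs or from NVIDIA's libraries.

```python
# Illustrative sketch: symmetric per-tensor int8 quantization.
# Real pipelines (e.g. on the NVIDIA stack) use calibrated, per-channel
# schemes; this minimal version only shows the core idea.

def quantize_int8(weights):
    """Map float weights to int8 values using a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

Each weight is recovered to within half a quantization step (`scale / 2`), which is the trade-off the workshop's lab sessions examine in more realistic settings.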


Topics of Interest

We invite practitioners and researchers to explore a range of topics centered on model efficiency.


Workshop Goals


Format & Schedule

This is a Full-Day Workshop.
The agenda is modular and includes specialized lab sessions designed to provide hands-on experience with compression implementation.


Important Dates

Organisers

Lavinia Ghita


Lavinia is a Solutions Architect Manager at NVIDIA, leading the EMEA technical team for the Financial Services Industry (FSI). Her work focuses on large‑scale AI systems, with emphasis on distillation methods, domain‑adaptation techniques and model compression methods for efficient AI pipelines. Her research also covers foundation‑model architectures for time‑series analysis and the development of general methods for modelling structure and latent dynamics in temporal processes. Prior to NVIDIA, Lavinia worked on research and applications in this space at a quantitative hedge fund and in big tech. She holds advanced degrees in Mathematics from the École Polytechnique Fédérale de Lausanne (EPFL).


Liana Mikaelyan


Liana Mikaelyan is a Senior Solutions Architect at NVIDIA, specializing in model compression as well as training and inference for efficient deployment of LLMs. Prior to joining NVIDIA, she worked as an applied scientist at Microsoft, where she developed methods for injecting knowledge into LLMs and enabling on-device LLM capabilities. With a background in mathematics, she holds an MSc in Machine Learning from University College London.


Sergio Perez


Sergio Perez is a Senior Solutions Architect at NVIDIA specialised in training and inference of LLMs. He has experience in both research and industrial applications of Generative AI and scientific computing, having completed his PhD at Imperial College London and worked as an AI engineer at Graphcore and Amazon. Sergio has contributed to deep learning courses on Coursera and in NVIDIA's Deep Learning Institute, reaching more than 50k students worldwide.


Harshita Seth


Harshita is a Senior Solutions Architect at NVIDIA, focusing on driving AI adoption among Global System Integrators. She has experience in building and deploying large language models, with a particular focus on applications in the audio domain. Prior to joining NVIDIA, she was part of the Samsung Research team, where she contributed to the development of customized on-device AI applications for Samsung phones. In her current role, Harshita leverages her deep expertise to design cutting-edge AI solutions and empower partners to integrate advanced technologies seamlessly into their workflows.