Second IEEE Workshop on Coding for Machines

July 15, 2024, Niagara Falls, Canada

in conjunction with IEEE ICME 2024 

Technical Program

1:00pm - 1:05pm Welcome and opening

1:05pm - 2:00pm Keynote lecture: Enabling Collaborative Intelligence in Dynamic Environments via Learnt Compression  [Slides]
Dr. Nilesh A. Ahuja, Intel Labs

2:00pm - 2:05pm Break

2:05pm - 3:00pm Invited lecture:  Visual Data Compression in the AI Era   [Slides]
Prof. Fengqing Maggie Zhu, Purdue University

3:00pm - 3:20pm Break

3:20pm - 5:00pm Technical session: Visual Coding for Machines

Region-Of-Interest-Based Video Coding for Machines  [Slides]
O. Stankiewicz, T. Grajek, S. Maćkowiak, J. Stankowski, S. Różek, M. Lorkiewicz, M. Wawrzyniak, M. Domański (Poznan University of Technology)

Compression Without Compromise: Optimizing Point Cloud Object Detection With Bottleneck Architectures for Split Computing   [Slides]  [paper]
N. A. Ahuja, O. Tickoo, and V. Kashyap (Intel Labs)

AFC: Asymmetrical Feature Coding for Multi-Task Machine Intelligence   [Slides]  [paper]
Y. Zhang, H. Wang, Y. Li (China Telecom) and L. Yu (Zhejiang University)

Towards Task-Compatible Compressible Representations   [Slides]  [arXiv]
A. de Andrade and I. V. Bajić (Simon Fraser University)

Compressive Feature Selection for Remote Visual Multi-Task Inference  [Slides]  [arXiv]
S. Ranjbar Alvar and I. V. Bajić (Simon Fraser University)

Keynote lecture

Enabling Collaborative Intelligence in Dynamic Environments via Learnt Compression

Edge computing enables real-time analysis and decision making close to the source of data. Low-power IoT and client devices can leverage the power of the edge by streaming data to a nearby edge server where AI-based analytics can be deployed. Split computing is an emerging paradigm for such usages, wherein a deep neural network (DNN) or other AI model is partitioned into a client-side front-end and a server-side back-end. Task-specific intermediate representations are compressed via end-to-end learned compression and transmitted from the front-end to the back-end. This has been shown to achieve far superior rate-accuracy performance compared to compressing and transmitting raw data using standard image or video codecs. In most split-computing approaches, however, the parameters of the DNN need to be retrained if either the compression level or the split point needs to be changed. This is a serious limitation for deployment in realistic environments, where operating network and platform conditions change dynamically. We will present methods to design lightweight, rate-distortion-optimized, trainable neural network layers, commonly known as 'bottleneck units', that compress DNN features. These can be inserted at any split point of the DNN without modifying its original weights. We demonstrate on a variety of image analytics tasks that this approach achieves state-of-the-art performance and provides the adaptivity required in dynamic operating conditions. We also extend this approach to video analytics by introducing flow-based prediction in the feature space. This further improves rate-accuracy performance by exploiting the temporal correlations inherent in video data, while simultaneously reducing compute complexity. Finally, we present early results of extending our approach to 3D visual AI with point-cloud data.
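The idea of inserting a trainable compression layer at a split point of a frozen network can be sketched as follows. This is a minimal, hypothetical illustration in PyTorch, not the speaker's actual implementation: the class name `BottleneckUnit`, the channel sizes, and the toy backbone are all assumptions, and quantization/entropy coding of the transmitted code is omitted.

```python
# Hypothetical sketch of a "bottleneck unit": a lightweight, trainable
# compression layer inserted at a split point of a frozen DNN backbone.
# Names and dimensions are illustrative assumptions, not the authors' design.
import torch
import torch.nn as nn

class BottleneckUnit(nn.Module):
    """Compress intermediate features to a low-dimensional code on the
    client side and restore them on the server side. The backbone's own
    weights are never modified; only these layers would be trained."""
    def __init__(self, in_channels: int, code_channels: int):
        super().__init__()
        # Encoder: reduce channel dimension before (omitted) quantization
        # and entropy coding for transmission.
        self.encode = nn.Conv2d(in_channels, code_channels, kernel_size=1)
        # Decoder: restore the original feature dimensionality.
        self.decode = nn.Conv2d(code_channels, in_channels, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        code = self.encode(features)
        # A real system would quantize and entropy-code `code` here;
        # we pass it straight through for simplicity.
        return self.decode(code)

# Toy frozen backbone, split into a client-side front-end and a
# server-side back-end; the unit sits at the split point.
backbone_front = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
backbone_back = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
for p in list(backbone_front.parameters()) + list(backbone_back.parameters()):
    p.requires_grad = False  # original weights stay fixed

unit = BottleneckUnit(in_channels=64, code_channels=8)
x = torch.randn(1, 3, 32, 32)
y = backbone_back(unit(backbone_front(x)))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```

Because the unit only needs to match the feature dimensionality at its insertion point, different units (with different code sizes) could in principle be swapped in at different split points without retraining the backbone, which is what enables adaptation to changing network and platform conditions.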

Dr. Nilesh A. Ahuja

Intel Labs

Nilesh A. Ahuja is an AI Research Scientist at Intel Labs. His current research focus is adaptive AI systems for the edge. This includes the development of efficient and reliable methods for uncertainty estimation in AI systems; their application to real-world problems such as out-of-distribution detection for industrial anomaly detection and novelty detection for continual learning systems; and efficient and adaptive deployment on edge systems via split computing. His other research interests include 3D computer vision; odometry and SLAM; super-resolution; image and video processing; and AI methods for video compression. He received his Ph.D. degree in Electrical Engineering from the Pennsylvania State University in 2008. His work has been published in top-tier journals and conferences, and he has over 20 issued or pending US and international patents.

Invited lecture

Visual Data Compression in the AI Era

This talk will delve into the evolution of visual data compression in recent years, spotlighting how compression techniques and AI models are being integrated. Traditionally, image and video codecs such as JPEG, HEVC, and AV1 have been designed primarily for accurate pixel reconstruction. However, the advancement of AI technologies has begun transforming these frameworks to meet modern application demands.

This talk will discuss this shift through three lenses:

Prof. Fengqing Maggie Zhu

Purdue University 

Fengqing Maggie Zhu is an Associate Professor in the Elmore Family School of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana. Dr. Zhu received the B.S.E.E. (with highest distinction), M.S., and Ph.D. degrees in Electrical and Computer Engineering from Purdue University in 2004, 2006, and 2011, respectively. Prior to joining Purdue in 2015, she was a Staff Researcher at Futurewei Technologies, where she received a Certification of Recognition for Core Technology Contribution in 2012. She is the recipient of an NSF CISE Research Initiation Initiative (CRII) award in 2017, a Google Faculty Research Award in 2019, and an ESI and trainee poster award at the NIH Precision Nutrition workshop in 2021. Her research interests include smart health with a focus on image-based dietary assessment and wearable sensor data analysis, visual coding for machines, and application-driven visual data analytics. Dr. Zhu is a senior member of the IEEE. She is an associate editor of the IEEE Transactions on Circuits and Systems for Video Technology and serves as chair of the IEEE Multimedia Signal Processing Technical Committee's award subcommittee. She has served on the organizing and program committees of major conferences in her field and has received recognition such as the Outstanding Area Chair award at ICME 2021.

Workshop scope

Multimedia signals – speech, audio, images, video, point clouds, light fields, … – have traditionally been acquired, processed, and compressed for human use. However, it is estimated that in the near future the majority of Internet connections will be machine-to-machine (M2M). So, increasingly, the data communicated across networks is primarily intended for automated machine analysis. Applications include remote monitoring, surveillance, and diagnostics; autonomous driving and navigation; smart homes, buildings, neighborhoods, and cities; and so on. This necessitates a rethinking of traditional compression and pre-/post-processing methods to facilitate efficient machine-based analysis of multimedia signals. As a result, standardization efforts such as MPEG VCM (Video Coding for Machines), MPEG FCM (Feature Coding for Machines), and JPEG AI have been launched.

Both the theory and early design examples have shown that significant bit savings for a given inference accuracy are possible compared to traditional human-oriented coding approaches. However, a number of open issues remain. These include a thorough understanding of the tradeoffs involved in coding for machines, coding for multiple machine tasks as well as combined human-machine use, model architectures, software and hardware optimization, error resilience, privacy, security, and others. The workshop is intended to bring together researchers from academia, industry, and government who are working on related problems, provide a snapshot of the current research and standardization efforts in the area, and generate ideas for future work. We welcome papers on the following and related topics:

Important dates

Paper submission: 6 Apr 2024

Acceptance notification: 2 May 2024

Camera-ready papers: 16 May 2024

Workshop date: 15 July 2024

Organizers

Fengqing Maggie Zhu, Purdue University, USA

Heming Sun, Yokohama National University, Japan

Hyomin Choi, InterDigital, USA

Ivan V. Bajić, Simon Fraser University, Canada

Technical Program Committee

Balu Adsumilli, Google/YouTube, USA

Nilesh Ahuja, Intel Labs, USA

João Ascenso, Instituto Superior Técnico, Portugal

Zhihao Duan, Purdue University, USA

Yuxing (Erica) Han, Tsinghua University, China

Wei Jiang, Futurewei, USA

Hari Kalva, Florida Atlantic University, USA

André Kaup, Friedrich-Alexander University Erlangen-Nuremberg, Germany

Xiang Li, Google, USA

Weisi Lin, Nanyang Technological University, Singapore

Jiaying Liu, Peking University, China

Saeed Ranjbar Alvar, Huawei, Canada

Shiqi Wang, City University of Hong Kong, Hong Kong

Shurun Wang, Alibaba DAMO Academy, China

Li Zhang, ByteDance, USA