First IEEE Workshop on Coding for Machines
July 10, 2023, Brisbane, Australia
in conjunction with IEEE ICME 2023
Technical Program
8:30am - 9:00am Welcome and opening
9:00am - 10:00am Keynote lecture: Learnt compression for visual analytics on the edge (Prof. Yao Wang, NYU) [Slides]
10:00am - 10:30am Coffee break
10:30am - 12:00pm Tutorial: CompressAI and CompressAI-Vision (Dr. Hyomin Choi, InterDigital) [Slides]
12:00pm - 1:30pm Lunch break
1:30pm - 3:00pm Session 1: Visual Coding for Machines
Rate-Controllable and Target-Dependent JPEG-Based Image Compression Using Feature Modulation [Slides]
Seongmoon Jeong, Kang Eun Jeon, Jong Hwan Ko (Sungkyunkwan University)
Stabilizing the Convolution Operations for Neural Network-based Image and Video Codecs for Machines
Honglei Zhang, Nam Le, Francesco Cricri, Hamed Rezazadegan Tavakoli (Nokia Technologies)
Feature-Guided Machine-Centric Image Coding for Downstream Tasks
Sangwoon Kwak (ETRI), Joungil Yun (ETRI), Hyon-Gon Choo (ETRI), Munchurl Kim (KAIST)
Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems
Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu (Purdue University)
3:00pm - 3:30pm Coffee break
3:30pm - 5:00pm Session 2: Coding for Humans and Machines
NARV: An Efficient Noise-Adaptive ResNet VAE for joint image compression and denoising
Yuning Huang, Zhihao Duan, Fengqing Zhu (Purdue University)
Conditional and Residual Methods in Scalable Coding for Humans and Machines [Slides]
Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić (Simon Fraser University)
VVC+M: Plug and Play Scalable Image Coding for Humans and Machines [Slides]
Alon Harell, Yalda Foroutan, Ivan V. Bajić (Simon Fraser University)
Face Restoration-Based Scalable Quality Coding for Video Conferencing
Hyomin Choi, Fabien Racapé, Simon Feltman (InterDigital)
Keynote lecture
Learnt compression for visual analytics on the edge
This talk will discuss how to compress images to optimize for visual analytics tasks such as object detection and image classification, and its application to edge-assisted visual computing. We will contrast two approaches: image compression on the mobile device followed by decompression and visual analytics at the server, versus splitting the deep-learning computation between the mobile device and the server and compressing the intermediate analytics features. We illustrate effective approaches for compressing analytics features and for end-to-end training to optimize the rate-analytics trade-off. We demonstrate that split computing not only yields superior rate-analytics performance but also substantially reduces mobile computing time. We further describe how to generate and code the analytics features progressively, to facilitate adaptation to the bandwidth between the mobile device and the edge, and to the battery status of the mobile device.
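The rate-analytics trade-off mentioned in the abstract is commonly formulated as a Lagrangian objective: minimize the bit rate plus a λ-weighted task loss. A toy, self-contained Python sketch of that bookkeeping (the function names and numbers below are illustrative, not taken from the talk):

```python
import math

def bits_per_pixel(likelihoods, num_pixels):
    # Rate estimate: total bits = -sum(log2 p) over the latent symbol
    # likelihoods, normalized by the number of pixels in the input.
    return -sum(math.log2(p) for p in likelihoods) / num_pixels

def rate_task_objective(bpp, task_loss, lam):
    # Lagrangian trade-off: rate plus lambda-weighted analytics task loss.
    return bpp + lam * task_loss

likelihoods = [0.5] * 32                 # toy latent symbol probabilities
bpp = bits_per_pixel(likelihoods, 16)    # 32 bits over 16 pixels -> 2.0 bpp
loss = rate_task_objective(bpp, task_loss=0.25, lam=4.0)  # 2.0 + 4.0*0.25 = 3.0
```

In end-to-end trained codecs for machines, the likelihoods come from a learned entropy model and the task loss from a downstream network (e.g., a detector), with λ sweeping out the rate-accuracy curve.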
Prof. Yao Wang
New York University
Yao Wang is a Professor at the New York University Tandon School of Engineering, with a joint appointment in the Departments of Electrical and Computer Engineering and Biomedical Engineering. She has also served as Associate Dean for Faculty Affairs at NYU Tandon since June 2019. Her research areas include video coding and streaming, multimedia signal processing, computer vision, and medical imaging. She is the lead author of the textbook Video Processing and Communications and has published over 300 papers in journals and conference proceedings. She received the New York City Mayor's Award for Excellence in Science and Technology in the Young Investigator category in 2000. She was elected Fellow of the IEEE in 2004 for contributions to video processing and communications. She received the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2004 and the IEEE Communications Society Multimedia Communication Technical Committee Best Paper Award in 2011. She was a keynote speaker at the 2010 International Packet Video Workshop, the 2014 INFOCOM Workshop on Contemporary Video, the 2018 Picture Coding Symposium, the 2020 ACM Multimedia Systems Conference (MMSys’20), and the 2022 Picture Coding Symposium. She received the NYU Tandon Distinguished Teacher Award in 2016.
CompressAI and CompressAI-Vision: A Tutorial
This tutorial covers two open-source libraries from InterDigital – CompressAI and CompressAI-Vision – with practical examples. In the first part, we will discuss CompressAI (https://interdigitalinc.github.io/CompressAI/), a widely used open-source library for learning-based compression in PyTorch. Practical details such as building an architecture, training, and testing a model will be discussed, and a hands-on example will be demonstrated in a Jupyter Notebook. In the second part, we will introduce a newer open-source library, CompressAI-Vision (https://github.com/InterDigitalInc/CompressAI-Vision), whose purpose is to help researchers develop, test, and evaluate compression models with standardized tests in the context of coding for machines. Various coding-for-machines pipelines will be categorized and discussed, including an overview of how the library will support them going forward.
Dr. Hyomin Choi
InterDigital
Hyomin Choi received his Ph.D. degree in engineering science from Simon Fraser University, Burnaby, BC, Canada, in 2022. From 2012 to 2016, he was a Research Engineer at the System IC Research Center, LG Electronics, Seoul, Korea. Since 2021, he has been with the Emerging Technologies Lab, InterDigital, in Los Altos, CA, USA. His research interests encompass end-to-end learning-based image/video coding, video coding for machines, and machine learning with applications in multimedia processing. He has won a number of academic and research awards, including the 2017 Vanier Canada Graduate Scholarship, the 2022 MMSP Best Paper Honorable Mention Award, the 2023 Governor General's Gold Medal from Simon Fraser University, and the 2023 IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award.
Workshop scope
Multimedia signals – speech, audio, images, video, point clouds, light fields, … – have traditionally been acquired, processed, and compressed for human use. However, it is estimated that in the near future the majority of Internet connections will be machine-to-machine (M2M), so an increasing share of the data communicated across networks will be intended primarily for automated machine analysis. Applications include remote monitoring, surveillance, and diagnostics; autonomous driving and navigation; smart homes, buildings, neighborhoods, and cities; and so on. This necessitates a rethinking of traditional compression and pre-/post-processing methods to facilitate efficient machine-based analysis of multimedia signals. As a result, standardization efforts such as MPEG VCM (Video Coding for Machines) and JPEG AI have been launched.
Both theory and early designs have shown that, compared to traditional human-oriented coding approaches, significant bit savings are possible for a given inference accuracy. However, a number of open issues remain, including a thorough understanding of the trade-offs involved in coding for machines; coding for multiple machine tasks and for combined human and machine use; model architectures; software and hardware optimization; error resilience; privacy; and security. The workshop is intended to bring together researchers from academia, industry, and government who are working on related problems, provide a snapshot of current research and standardization efforts in the area, and generate ideas for future work. We welcome papers on the following and related topics:
Theories and frameworks for coding for machines
Standards related to coding for machines
Methods for feature compression
End-to-end approaches for coding for machines
Compression for human-and-machine use
Compressed-domain multimedia analysis (understanding, translation, classification, object detection, segmentation, pose estimation, etc.)
Compressed-domain multimedia processing (denoising, super-resolution, enhancement, …)
Datasets for coding for machines
Error resilience in coding for machines
Privacy and security in coding for machines
Important dates
Paper submission: 30-Mar-23
Acceptance notification: 24-Apr-23
Camera-ready papers: 1-May-23
Workshop date: 10-Jul-23
Organizers
Ying Liu, Santa Clara University, USA
Heming Sun, Waseda University, Japan
Hyomin Choi, InterDigital, USA
Fengqing Maggie Zhu, Purdue University, USA
Jiangtao Wen, Tsinghua University, China
Ivan V. Bajić, Simon Fraser University, Canada
Technical Program Committee
Balu Adsumilli, Google/YouTube, USA
Nilesh Ahuja, Intel Labs, USA
João Ascenso, Instituto Superior Técnico, Portugal
Zhihao Duan, Purdue University, USA
Yuxing (Erica) Han, Tsinghua University, China
Wei Jiang, Futurewei, USA
Hari Kalva, Florida Atlantic University, USA
André Kaup, Friedrich-Alexander University Erlangen-Nuremberg, Germany
Xiang Li, Google, USA
Weisi Lin, Nanyang Technological University, Singapore
Jiaying Liu, Peking University, China
Ambarish Natu, Australian Government
Saeed Ranjbar Alvar, Huawei, Canada
Donggyu Sim, Kwangwoon University, Korea
Shiqi Wang, City University of Hong Kong
Li Zhang, ByteDance, USA