First IEEE Workshop on Coding for Machines
July 10, 2023, Brisbane, Australia
in conjunction with IEEE ICME 2023
Technical Program
8:30am - 9:00am Welcome and opening
9:00am - 10:00am Keynote lecture: Learnt compression for visual analytics on the edge (Prof. Yao Wang, NYU) [Slides]
10:00am - 10:30am Coffee break
10:30am - 12:00pm Tutorial: CompressAI and CompressAI-Vision (Dr. Hyomin Choi, InterDigital) [Slides]
12:00pm - 1:30pm Lunch break
1:30pm - 3:00pm Session 1: Visual Coding for Machines
Rate-Controllable and Target-Dependent JPEG-Based Image Compression Using Feature Modulation [Slides]
Seongmoon Jeong, Kang Eun Jeon, Jong Hwan Ko (Sungkyunkwan University)
Stabilizing the Convolution Operations for Neural Network-based Image and Video Codecs for Machines
Honglei Zhang, Nam Le, Francesco Cricri, Hamed Rezazadegan Tavakoli (Nokia Technologies)
Feature-Guided Machine-Centric Image Coding for Downstream Tasks
Sangwoon Kwak (ETRI), Joungil Yun (ETRI), Hyon-Gon Choo (ETRI), Munchurl Kim (KAIST)
Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems
Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu (Purdue University)
3:00pm - 3:30pm Coffee break
3:30pm - 5:00pm Session 2: Coding for Humans and Machines
NARV: An Efficient Noise-Adaptive ResNet VAE for joint image compression and denoising
Yuning Huang, Zhihao Duan, Fengqing Zhu (Purdue University)
Conditional and Residual Methods in Scalable Coding for Humans and Machines [Slides]
Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić (Simon Fraser University)
VVC+M: Plug and Play Scalable Image Coding for Humans and Machines [Slides]
Alon Harell, Yalda Foroutan, Ivan V. Bajić (Simon Fraser University)
Face Restoration-Based Scalable Quality Coding for Video Conferencing
Hyomin Choi, Fabien Racapé, Simon Feltman (InterDigital)
Keynote lecture
Learnt compression for visual analytics on the edge
This talk will discuss how to compress images to optimize for visual analytics tasks such as object detection and image classification, and its application to edge-assisted visual computing. We will contrast two approaches: image compression on the mobile device followed by decompression and visual analytics at the server, versus splitting the deep-learning computation between the mobile device and the server and compressing the intermediate analytics features. We illustrate effective approaches for compressing analytics features and for end-to-end training to optimize the rate-analytics trade-off. We demonstrate that split computing not only yields superior rate-analytics performance but also substantially reduces mobile computing time. We further describe how to generate and code the analytics features progressively, to facilitate adaptation to the bandwidth between the mobile device and the edge, and to the battery status of the mobile device.
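The rate-analytics trade-off mentioned in the abstract is commonly formulated as a Lagrangian objective: minimize the bit rate plus a λ-weighted task loss. A toy, self-contained Python sketch of that bookkeeping (the function names and numbers below are illustrative, not taken from the talk):

```python
import math

def bits_per_pixel(likelihoods, num_pixels):
    # Rate estimate: total bits = -sum(log2 p) over the latent symbol
    # likelihoods, normalized by the number of pixels in the input.
    return -sum(math.log2(p) for p in likelihoods) / num_pixels

def rate_task_objective(bpp, task_loss, lam):
    # Lagrangian trade-off: rate plus lambda-weighted analytics task loss.
    return bpp + lam * task_loss

likelihoods = [0.5] * 32                 # toy latent symbol probabilities
bpp = bits_per_pixel(likelihoods, 16)    # 32 bits over 16 pixels -> 2.0 bpp
loss = rate_task_objective(bpp, task_loss=0.25, lam=4.0)  # 2.0 + 4.0*0.25 = 3.0
```

In end-to-end trained codecs for machines, the likelihoods come from a learned entropy model and the task loss from a downstream network (e.g., a detector), with λ sweeping out the rate-accuracy curve.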
Prof. Yao Wang
New York University
Yao Wang is a Professor at the New York University Tandon School of Engineering, with a joint appointment in the Departments of Electrical and Computer Engineering and Biomedical Engineering. She has also served as Associate Dean for Faculty Affairs at NYU Tandon since June 2019. Her research areas include video coding and streaming, multimedia signal processing, computer vision, and medical imaging. She is the lead author of the textbook Video Processing and Communications and has published over 300 papers in journals and conference proceedings. She received the New York City Mayor's Award for Excellence in Science and Technology in the Young Investigator category in 2000. She was elected Fellow of the IEEE in 2004 for contributions to video processing and communications. She received the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2004 and the IEEE Communications Society Multimedia Communication Technical Committee Best Paper Award in 2011. She was a keynote speaker at the 2010 International Packet Video Workshop, the 2014 INFOCOM Workshop on Contemporary Video, the 2018 Picture Coding Symposium, the 2020 ACM Multimedia Systems Conference (MMSys’20), and the 2022 Picture Coding Symposium. She received the NYU Tandon Distinguished Teacher Award in 2016.
CompressAI and CompressAI-Vision: A Tutorial
This tutorial covers two open-source libraries from InterDigital – CompressAI and CompressAI-Vision – with practical examples. In the first part, we will discuss CompressAI (https://interdigitalinc.github.io/CompressAI/), a widely used open-source library for learning-based compression in PyTorch. Practical details such as building an architecture, training, and testing a model will be discussed, and a hands-on example will be demonstrated in a Jupyter Notebook. In the second part, we will introduce a newer open-source library, CompressAI-Vision (https://github.com/InterDigitalInc/CompressAI-Vision), whose purpose is to help researchers develop, test, and evaluate compression models with standardized tests in the context of coding for machines. Various coding-for-machines pipelines will be categorized and discussed, including an overview of how the library will support them going forward.
Dr. Hyomin Choi
InterDigital
Hyomin Choi received his Ph.D. degree in engineering science from Simon Fraser University, Burnaby, BC, Canada, in 2022. From 2012 to 2016, he was a Research Engineer at the System IC Research Center, LG Electronics, Seoul, Korea. Since 2021, he has been with the Emerging Technologies Lab, InterDigital, in Los Altos, CA, USA. His research interests encompass end-to-end learning-based image/video coding, video coding for machines, and machine learning with applications in multimedia processing. He has won a number of academic and research awards, including the 2017 Vanier Canada Graduate Scholarship, the 2022 MMSP Best Paper Honorable Mention Award, the 2023 Governor General's Gold Medal from Simon Fraser University, and the 2023 IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award.
Workshop scope
Multimedia signals – speech, audio, images, video, point clouds, light fields, … – have traditionally been acquired, processed, and compressed for human use. However, it is estimated that in the near future the majority of Internet connections will be machine-to-machine (M2M), so an increasing share of the data communicated across networks will be intended primarily for automated machine analysis. Applications include remote monitoring, surveillance, and diagnostics; autonomous driving and navigation; smart homes, buildings, neighborhoods, and cities; and so on. This necessitates a rethinking of traditional compression and pre-/post-processing methods to facilitate efficient machine-based analysis of multimedia signals. As a result, standardization efforts such as MPEG VCM (Video Coding for Machines) and JPEG AI have been launched.
Both theory and early designs have shown that, compared to traditional human-oriented coding approaches, significant bit savings are possible for a given inference accuracy. However, a number of open issues remain, including a thorough understanding of the trade-offs involved in coding for machines; coding for multiple machine tasks and for combined human and machine use; model architectures; software and hardware optimization; error resilience; privacy; and security. The workshop is intended to bring together researchers from academia, industry, and government who are working on related problems, provide a snapshot of current research and standardization efforts in the area, and generate ideas for future work. We welcome papers on the following and related topics:
Theories and frameworks for coding for machines
Standards related to coding for machines
Methods for feature compression
End-to-end approaches for coding for machines
Compression for human-and-machine use
Compressed-domain multimedia analysis (understanding, translation, classification, object detection, segmentation, pose estimation, etc.)
Compressed-domain multimedia processing (denoising, super-resolution, enhancement, …)
Datasets for coding for machines
Error resilience in coding for machines
Privacy and security in coding for machines
Important dates
Paper submission: 30-Mar-23
Acceptance notification: 24-Apr-23
Camera-ready papers: 1-May-23
Workshop date: 10-Jul-23
Organizers
Ying Liu, Santa Clara University, USA
Heming Sun, Waseda University, Japan
Hyomin Choi, InterDigital, USA
Fengqing Maggie Zhu, Purdue University, USA
Jiangtao Wen, Tsinghua University, China
Ivan V. Bajić, Simon Fraser University, Canada
Technical Program Committee
Balu Adsumilli, Google/YouTube, USA
Nilesh Ahuja, Intel Labs, USA
João Ascenso, Instituto Superior Técnico, Portugal
Zhihao Duan, Purdue University, USA
Yuxing (Erica) Han, Tsinghua University, China
Wei Jiang, Futurewei, USA
Hari Kalva, Florida Atlantic University, USA
André Kaup, Friedrich-Alexander University Erlangen-Nuremberg, Germany
Xiang Li, Google, USA
Weisi Lin, Nanyang Technological University, Singapore
Jiaying Liu, Peking University, China
Ambarish Natu, Australian Government
Saeed Ranjbar Alvar, Huawei, Canada
Donggyu Sim, Kwangwoon University, Korea
Shiqi Wang, City University of Hong Kong
Li Zhang, ByteDance, USA