Is Quality Enough : Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Recommended citation: C. Douwes, G. Bindi, A. Caillon, P. Esling and J. -P. Briot, "Is Quality Enough : Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10096975. https://ieeexplore.ieee.org/document/10096975

Deep learning models are now core components of modern audio synthesis, and their use has increased significantly in recent years, leading to highly accurate systems for multiple tasks. However, this quest for quality comes at a tremendous computational cost, which incurs vast energy consumption and greenhouse gas emissions. At the heart of this problem are the standardized evaluation metrics used by the scientific community to compare various contributions. In this paper, we propose relying on a multi-objective metric based on Pareto optimality, which gives equal weight to the accuracy and the energy consumption of a model. By applying our measure to current state-of-the-art generative audio models, we show that it can drastically change the significance of the results. We hope to raise awareness of the need to investigate the energy efficiency of high-quality models more systematically, in order to place computational costs at the center of deep learning research priorities.
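The abstract does not spell out the exact form of the metric, so the following is only a minimal sketch of the general idea of Pareto optimality over two objectives, not the paper's implementation. It assumes a quality score where higher is better and an energy figure (here in kWh) where lower is better. A model dominates another if it is no worse on both objectives and strictly better on at least one. Repeatedly peeling off the non-dominated set (non-dominated sorting) then assigns each model a Pareto rank. The names `ModelResult`, `dominates`, `pareto_front` and `pareto_ranks`, as well as the model names and numbers, are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    quality: float     # hypothetical quality score: higher is better
    energy_kwh: float  # hypothetical energy consumption: lower is better


def dominates(a: ModelResult, b: ModelResult) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.quality >= b.quality and a.energy_kwh <= b.energy_kwh
    strictly_better = a.quality > b.quality or a.energy_kwh < b.energy_kwh
    return no_worse and strictly_better


def pareto_front(results: list[ModelResult]) -> list[ModelResult]:
    """Models not dominated by any other model (order of the input is kept)."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


def pareto_ranks(results: list[ModelResult]) -> dict[str, int]:
    """Non-dominated sorting: rank 1 is the Pareto front, rank 2 the front of the rest, etc."""
    remaining = list(results)
    ranks: dict[str, int] = {}
    rank = 1
    while remaining:
        front = pareto_front(remaining)
        for r in front:
            ranks[r.name] = rank
        remaining = [r for r in remaining if r not in front]
        rank += 1
    return ranks


# Hypothetical measurements, for illustration only.
results = [
    ModelResult("A", quality=0.90, energy_kwh=100.0),
    ModelResult("B", quality=0.85, energy_kwh=20.0),
    ModelResult("C", quality=0.80, energy_kwh=50.0),  # dominated by B
    ModelResult("D", quality=0.70, energy_kwh=5.0),
    ModelResult("E", quality=0.60, energy_kwh=10.0),  # dominated by D
]

print([r.name for r in pareto_front(results)])  # ['A', 'B', 'D']
print(pareto_ranks(results))  # {'A': 1, 'B': 1, 'D': 1, 'C': 2, 'E': 2}
```

In this toy example, model A has the best quality score, yet B and D share its rank because no model beats them on both axes at once. This is the kind of reordering a quality-only leaderboard would hide.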
