Visual onoma-to-wave
Accepted to ICASSP 2023.
We propose a method for synthesizing environmental sounds from visual onomatopoeia and sound-source images (Ohnaka et al., 2023).
The external project page is here.
At the time of this work, the DALL-E prototype had only just been released, and my impression was that automated image generation was still difficult to use effectively. I believe a much higher-quality method could be built now.