Abstract: Diffusion-based speech enhancement has demonstrated remarkable performance, but existing models still lack precise time-frequency modeling capabilities and fall short in aligning with ...