Struggling with Audio Enhancement using GANs - Any Suggestions?

I'm working on a Python project that aims to transform phone-quality acoustic guitar recordings into studio-quality ones. My approach uses a Generative Adversarial Network (GAN) with two components: a Generator and a Discriminator. Here's a quick rundown of my process:

- Data Loading & Preprocessing: Convert the acoustic guitar recordings to spectrograms and split them into training and validation sets (see the preprocessing sketch below).
- Generator: A neural network trained to produce studio-quality spectrograms from low-quality inputs.
- Discriminator: A second neural network trained to differentiate real studio spectrograms from generator-created ones.
- Training: Train the Generator and Discriminator against each other in a cat-and-mouse game of deception and detection (see the training-step sketch below).
- Audio Enhancement: Feed the Generator a low-quality spectrogram, get a higher-quality one out, and convert it back into an audio file (see the reconstruction sketch below).

I'm reaching out because I'm not satisfied with the quality of the output: the enhanced audio is just rhythmic noise. What am I missing in how I generate the audio? I'm wondering if anyone here has experience with GANs for audio enhancement and can offer some advice. Is there something missing in my approach? Are there any tips or tricks you've found helpful in your own work?

And yes, I'm prepared for you to tear me a new one. Bring on the constructive criticism!
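
For concreteness, here's a minimal sketch of the preprocessing step I have in mind (not my exact code; it assumes librosa and keeps only the STFT magnitude, discarding phase):

```python
import numpy as np
import librosa

def audio_to_spectrogram(path, sr=22050, n_fft=1024, hop_length=256):
    """Load a recording and return a normalized log-magnitude spectrogram.

    Only the magnitude of the STFT is kept; the phase is thrown away here.
    """
    y, _ = librosa.load(path, sr=sr)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    mag_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
    # amplitude_to_db with ref=np.max yields values in roughly [-80, 0] dB;
    # rescale them into [0, 1] so the networks see a bounded input.
    return (mag_db + 80.0) / 80.0
```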
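
And a rough sketch of a single adversarial training step (PyTorch here; the toy architectures are placeholders, not my actual networks):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real networks; the actual architectures will differ.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # one logit per spectrogram
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(low_q, high_q):
    """One adversarial step on a batch of (low-quality, studio) spectrogram pairs."""
    # Discriminator: score real studio spectrograms against generated ones.
    opt_d.zero_grad()
    fake = G(low_q).detach()
    d_loss = bce(D(high_q), torch.ones(high_q.size(0), 1)) + \
             bce(D(fake), torch.zeros(low_q.size(0), 1))
    d_loss.backward()
    opt_d.step()
    # Generator: try to make the discriminator label its output as real.
    opt_g.zero_grad()
    g_loss = bce(D(G(low_q)), torch.ones(low_q.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```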
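
Finally, the inversion step, which is where I suspect something may be going wrong. Since only magnitudes are modeled, a phase has to be estimated when going back to audio; in this sketch I use Griffin-Lim (again an assumption about the approach, not necessarily the right fix):

```python
import librosa
import soundfile as sf

def spectrogram_to_audio(norm_db, out_path, sr=22050, hop_length=256):
    """Invert a normalized log-magnitude spectrogram back to a waveform.

    The phase is unknown, so Griffin-Lim iteratively estimates one that is
    consistent with the given magnitudes.
    """
    # Undo the [0, 1] normalization back to dB, then back to linear amplitude.
    # Absolute loudness is lost because preprocessing used a per-file ref=np.max;
    # that's acceptable for a sketch.
    mag = librosa.db_to_amplitude(norm_db * 80.0 - 80.0)
    y = librosa.griffinlim(mag, n_iter=64, hop_length=hop_length)
    sf.write(out_path, y, sr)
```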

