In our last blog post, we touched briefly on the concept of GANs, short for Generative Adversarial Networks. GANs are a relatively new branch of unsupervised machine learning. They were first introduced by Ian Goodfellow and his colleagues in 2014, and have spurred major interest among scientists and researchers thanks to their wide range of applications and remarkably good results.
Adding to the excitement, Yann LeCun, Director of AI Research at Facebook, recently described GANs as the most important development in deep learning, and “the most interesting idea in the last 10 years in ML (Machine Learning).”
A GAN consists of two separate networks, one generative and one discriminative, that are trained together as adversaries. Quite literally, the generative network generates novel synthesized instances, while the discriminative network tries to tell synthesized instances apart from real ones.
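Formally, the original 2014 paper by Goodfellow et al. frames this contest as a two-player minimax game over a value function V(D, G). Here D(x) is the discriminator's estimated probability that x is real, G(z) maps a noise sample z to a synthesized instance, p_data is the distribution of real data and p_z is the noise prior:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator pushes V up by scoring real samples close to 1 and synthesized ones close to 0; the generator pushes V down by making D(G(z)) as large as it can.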
One way to picture this is as a contest between an art forger and an art investigator. The generator, in this case the forger, wants to create, say, a fake Van Gogh painting. He starts by learning what Van Gogh paintings look like, then imitates them with the goal of fooling other people. The discriminator, in this case the investigator, also starts by learning the characteristics of Van Gogh's work so that he can recognize fakes. Whenever one side loses (the forger gets caught, or the investigator gets fooled), that side works harder to improve. To win the battle, both the forger and the investigator keep training and escalating until both become experts.
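To make the alternating contest concrete, here is a minimal sketch in plain Python of GAN training on a deliberately tiny, hypothetical problem: "real" data drawn from a normal distribution with mean 4, a generator that is just an affine map G(z) = a*z + b of Gaussian noise, and a discriminator that is a single logistic unit D(x) = sigmoid(w*x + c). Each step first nudges the discriminator to score real samples higher and synthesized ones lower, then nudges the generator to raise the discriminator's score on its own samples, using the "non-saturating" generator loss suggested in the original GAN paper. The gradients are written out by hand; real GANs use deep networks and automatic differentiation, and every name and number here is an assumption for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_step(params, real, noise, lr=0.05):
    """One alternating GAN update on a toy 1-D problem (hypothetical setup).

    Generator:     G(z) = a * z + b
    Discriminator: D(x) = sigmoid(w * x + c), the probability that x is real
    """
    a, b, w, c = params
    fake = [a * z + b for z in noise]

    # Discriminator step: gradient descent on the binary cross-entropy
    # -mean(log D(real)) - mean(log(1 - D(fake))), differentiated via the logit.
    gw = gc = 0.0
    for x in real:
        g = (sigmoid(w * x + c) - 1.0) / len(real)
        gw += g * x
        gc += g
    for x in fake:
        g = sigmoid(w * x + c) / len(fake)
        gw += g * x
        gc += g
    w -= lr * gw
    c -= lr * gc

    # Generator step: gradient descent on the non-saturating loss
    # -mean(log D(G(z))), using the freshly updated discriminator.
    ga = gb = 0.0
    for z in noise:
        g = (sigmoid(w * (a * z + b) + c) - 1.0) / len(noise)
        ga += g * w * z
        gb += g * w
    a -= lr * ga
    b -= lr * gb
    return a, b, w, c


rng = random.Random(0)
params = (1.0, 0.0, 0.0, 0.0)  # a, b, w, c
for step in range(2000):
    real = [rng.gauss(4.0, 1.0) for _ in range(64)]   # "real" data ~ N(4, 1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]  # generator input ~ N(0, 1)
    params = train_step(params, real, noise)
print("a=%.3f b=%.3f w=%.3f c=%.3f" % params)
```

A one-unit linear discriminator can essentially only detect a difference in means, so this sketch illustrates the back-and-forth updates rather than a useful model; in setups like this the generator's mean b can circle around the data mean instead of settling on it.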
Now imagine expanding on this concept: if machines can soon create masterpieces in art and design, we may see artists on the “Endangered Jobs” list before long.
Applications of GANs
GANs can be applied to multiple scenarios, including image classification, speech recognition, video production, robot behavior generation, etc. One of the most common applications is image generation – more specifically, the generation of “natural” images.
Many of you may have played the “What will you look like when you are old?” game online, or something of that sort, usually just for a laugh. Using GANs, scientists have made this kind of age simulation much more reliable, to the point that it could one day help missing-person investigations.
In this application, the generative network was trained on 5,000 faces labeled with ages. The network learns the characteristic signatures of aging and then applies them to faces to make them look older. In the second step, the discriminative network compares the original “before” images with the synthesized “after” images to check whether they show the same person.
When compared against other face-aging techniques, the GAN-based approach produced 60% more “before” and “after” pairs that were identified as the same person.
Beyond face aging, GANs have also proven useful in astronomy research, as a group of Swiss scientists has shown. Until now, our ability to observe outer space has been limited by the capabilities of telescopes, and however advanced modern telescopes are, scientists are never satisfied with the amount of detail they can show.
In the study, scientists took a space image and deliberately degraded its resolution. Using the degraded image and the original, they trained a GAN to recover the original from the degraded version as faithfully as possible. The trained GAN then produced a much sharper version of the degraded image, and the scientists found that it recovered features better than any method used to date.
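Training data for this kind of recovery task comes in pairs: a degraded input and the original it should be restored to. The sketch below builds such pairs from images stored as nested lists of pixel values, using block averaging as a stand-in degradation. The degradation method and the function names `degrade` and `make_training_pairs` are assumptions for illustration, not necessarily what the study used.

```python
def degrade(image, factor=2):
    """Blur an image (a list of rows) by block averaging.

    Each factor x factor block is replaced by its mean, so the output keeps
    the input's size but loses fine detail. This is a stand-in degradation
    chosen for illustration, not the study's procedure.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor")
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            rows = range(top, top + factor)
            cols = range(left, left + factor)
            mean = sum(image[i][j] for i in rows for j in cols) / factor ** 2
            for i in rows:
                for j in cols:
                    out[i][j] = mean
    return out


def make_training_pairs(images, factor=2):
    # (input, target) pairs: the GAN learns to map degraded -> original.
    return [(degrade(img, factor), img) for img in images]


image = [[0, 2, 4, 6],
         [2, 4, 6, 8],
         [1, 1, 1, 1],
         [3, 3, 3, 3]]
pairs = make_training_pairs([image])
print(pairs[0][0])
```

In a conditional setup, the generator would receive the degraded image as input, and the discriminator would judge whether a (degraded, restored) pair looks like a genuine (degraded, original) pair.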
Extensions of GANs
Ever since GANs were introduced, researchers have focused on improving the stability of GAN training. Better-suited architectures have been developed to constrain the training and to tackle specific image generation tasks.
A CGAN (conditional GAN) extends the basic GAN with a conditional setting. It takes external information, such as a label, a piece of text or another image, into account to control what the generated images depict. The scary cat drawing we mentioned in the previous blog post and the space image recovery technique are both results of CGANs. Other applications being explored include text-to-image synthesis and image-to-image translation.
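At the input level, a common way to condition a GAN, in the spirit of the original conditional GAN paper by Mirza and Osindero, is to give the condition to both networks: append an encoded label to the generator's noise vector, and append the same label to whatever the discriminator inspects. The sketch below assumes a one-hot class label; the function names and sizes are hypothetical.

```python
import random


def one_hot(label, num_classes):
    # Encode an integer class label as a one-hot vector.
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def generator_input(label, noise_dim, num_classes, rng):
    # Conditional generator input: noise z concatenated with the label y.
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, num_classes)


def discriminator_input(sample, label, num_classes):
    # The discriminator sees the same label, so it judges "is this a real
    # example of class y?" rather than just "is this real?".
    return list(sample) + one_hot(label, num_classes)


rng = random.Random(42)
g_in = generator_input(label=3, noise_dim=100, num_classes=10, rng=rng)
d_in = discriminator_input([0.5] * 784, label=3, num_classes=10)
print(len(g_in), len(d_in))
```

For text-to-image or image-to-image tasks, the one-hot label would typically be replaced by a text embedding or an entire input image, but the principle of showing the condition to both networks stays the same.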
A LAPGAN is a Laplacian Pyramid of GANs, used to generate high-quality samples of natural images. Training a LAPGAN starts by breaking the original generation task into multiple manageable stages, one per level of the pyramid. At each stage, a generative model is trained using a GAN. In other words, a LAPGAN makes learning easier by letting each model handle a simpler sub-problem, with images then generated stage by stage from coarse to fine. According to the research paper, LAPGAN-generated images were mistaken for real images around 40% of the time, compared to 10% for a basic GAN.
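A Laplacian pyramid splits a signal into a coarse, low-resolution version plus, at each scale, a residual holding the detail lost by downsampling; adding the residuals back while upsampling recovers the original exactly. In a LAPGAN, the generator at each level learns to produce such a residual, conditioned on the upsampled coarser image. The sketch below shows only the pyramid itself, on a 1-D signal for brevity, with pair-averaging and sample repetition as assumed down- and upsampling operators (the paper works on 2-D images with its own filters).

```python
def downsample(signal):
    # Halve the resolution by averaging neighbouring pairs.
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def upsample(signal):
    # Double the resolution by repeating each sample.
    out = []
    for v in signal:
        out.extend([v, v])
    return out


def build_pyramid(signal, levels):
    # Returns the per-level residuals (finest first) and the coarsest signal.
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        up = upsample(coarse)
        residuals.append([c - u for c, u in zip(current, up)])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    # Coarse-to-fine: upsample, then add back the detail for that level.
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [u + r for u, r in zip(upsample(current), residual)]
    return current


signal = [1, 3, 2, 2, 5, 7, 0, 4]  # length must be divisible by 2**levels
residuals, coarsest = build_pyramid(signal, levels=2)
print(coarsest, residuals)
print(reconstruct(residuals, coarsest))
```

When a trained LAPGAN generates a new image, the residuals would come from the per-level generators rather than from a real image, which is what lets each generator focus on detail at a single scale.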
DCGAN is short for Deep Convolutional GAN, a family of more stable architectures proposed in a paper published in 2016. Its generator works roughly as the reverse of a Convolutional Neural Network (CNN), building an image up from a compact representation instead of reducing an image to one, and the paper aims to help bridge the gap between the success of CNNs in supervised learning and unsupervised learning. In the paper, the researchers point to promising extensions of the DCGAN framework into domains such as video frame prediction and speech synthesis.
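The "reverse of a CNN" idea shows up in the layer shapes: the generator uses transposed (fractionally-strided) convolutions that upsample where an ordinary strided convolution downsamples. The sketch below applies the standard output-size formulas (they match, for example, PyTorch's `Conv2d` and `ConvTranspose2d` documentation with dilation 1). Kernel size 4, stride 2 and padding 1 are an assumed, commonly used configuration, and the channel counts follow the 4x4x1024 to 64x64x3 generator pictured in the DCGAN paper.

```python
def conv_out(size, kernel, stride, padding):
    # Spatial output size of an ordinary (downsampling) convolution.
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    # Spatial output size of a transposed ("fractionally-strided") convolution.
    return (size - 1) * stride - 2 * padding + kernel + output_padding


KERNEL, STRIDE, PADDING = 4, 2, 1  # assumed configuration

# Generator: z is projected and reshaped to a 4x4 map, then upsampled 4 times.
gen_channels = [1024, 512, 256, 128, 3]
gen_sizes = [4]
for _ in range(len(gen_channels) - 1):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], KERNEL, STRIDE, PADDING))

# Discriminator: the mirror image, downsampling a 64x64 image back to 4x4.
disc_sizes = [64]
for _ in range(4):
    disc_sizes.append(conv_out(disc_sizes[-1], KERNEL, STRIDE, PADDING))

for size, channels in zip(gen_sizes, gen_channels):
    print(f"generator layer: {size}x{size}x{channels}")
print("discriminator sizes:", disc_sizes)
```

With this configuration each transposed convolution exactly doubles the spatial size and each ordinary convolution exactly halves it, which is why the two networks can mirror each other layer for layer.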
InfoGAN is an information-theoretic extension of the Generative Adversarial Network. It learns interpretable, disentangled representations without supervision by maximizing the mutual information between a small subset of the latent variables and the observation. Examples of the concepts it has learned include the brightness, rotation and width of objects, and even hairstyles and expressions on human faces.
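InfoGAN splits the generator's input into unstructured noise z and a latent code c, and adds a term that rewards an auxiliary network Q for recovering c from the generated sample. Because the true mutual information I(c; G(z, c)) is hard to compute, the paper maximizes a variational lower bound, E[log Q(c|x)] + H(c). The sketch below samples a latent input and estimates that bound for a uniform categorical code from Q's predicted probabilities. The code layout (one 10-way categorical code plus two continuous codes uniform on [-1, 1]) mirrors the paper's MNIST setup; the noise size of 62 and the function names are assumptions.

```python
import math
import random


def sample_latent(rng, noise_dim=62, num_categories=10, num_continuous=2):
    # Generator input: noise z, one categorical code, and continuous codes.
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    c_cat = rng.randrange(num_categories)
    c_cont = [rng.uniform(-1.0, 1.0) for _ in range(num_continuous)]
    return z, c_cat, c_cont


def mi_lower_bound(codes, q_probs, num_categories):
    # Monte Carlo estimate of E[log Q(c|x)] + H(c) for a uniformly sampled
    # categorical code. codes[i] is the code used to generate sample i, and
    # q_probs[i] is Q's predicted distribution over codes for that sample.
    avg_log_q = sum(math.log(p[c]) for c, p in zip(codes, q_probs)) / len(codes)
    entropy = math.log(num_categories)  # H(c) of a uniform categorical code
    return avg_log_q + entropy


codes = [0, 1, 2]
perfect = [[1.0 if k == c else 0.0 for k in range(3)] for c in codes]
guessing = [[1.0 / 3] * 3 for _ in codes]
print(mi_lower_bound(codes, perfect, 3), mi_lower_bound(codes, guessing, 3))
```

The bound reaches its maximum, H(c) = log K, when Q recovers every code perfectly, and drops to zero when Q can only guess; training the generator to raise it pushes each code value to leave a recognizable trace in the output, which is how concepts such as rotation or width end up tied to individual codes.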
Challenges of GANs
GANs have attracted major attention in academia since their advent three years ago. Near the end of last year, Apple published its very first AI paper, describing its use of adversarial, GAN-style training to make synthetic training images look more realistic.
In addition to the extensions above, more variations of GANs are being studied to improve the model and tackle its shortcomings, including the difficulty and instability of the training process, which Ian Goodfellow has discussed in detail in an answer on Quora.
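One well-known source of training difficulty is already pointed out in Goodfellow et al.'s 2014 paper: early in training, the discriminator rejects generated samples with high confidence, so the generator's original objective, minimizing log(1 - D(G(z))), saturates and yields almost no gradient. The paper's suggested remedy is to train G to maximize log D(G(z)) instead. The sketch below compares the two gradients with respect to the discriminator's logit, where D = sigmoid(logit); the function names are illustrative.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def saturating_grad(logit):
    # d/dlogit of log(1 - sigmoid(logit)): the original minimax generator loss.
    return -sigmoid(logit)


def non_saturating_grad(logit):
    # d/dlogit of -log(sigmoid(logit)): the "maximize log D(G(z))" variant.
    return sigmoid(logit) - 1.0


for logit in (-10.0, -2.0, 0.0):
    # A very negative logit means D is confident the sample is fake.
    print(f"logit={logit:6.1f}  saturating={saturating_grad(logit):+.5f}"
          f"  non-saturating={non_saturating_grad(logit):+.5f}")
```

Both losses push the generator in the same direction, but when the discriminator is confident (logit of -10) the saturating gradient is tens of thousands of times smaller, which is exactly the regime a poorly trained generator starts in.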
As researchers continue to advance GAN models and scale up their training, we can expect to see fairly accurate and realistic machine-generated videos, images, text, interactions and more in the near future. Which raises the question: if machines pitted against each other gain human-like abilities to mimic and validate, will they at some point not only reflect the world to us, but also have a hand in creating it?
If you’re ready to build your GANs and need the most powerful machine learning engines in the world, please visit https://www.amax.com/solutions/deep-learning-solutions/.