CVPR 2018 Recap: The Hottest Developments in Computer Vision & Pattern Recognition

renemeyer

In June, the best and brightest minds in computer vision flocked to Salt Lake City for CVPR 2018, the world’s premier conference for computer vision and pattern recognition. With over 6,500 attendees and 3,300 paper submissions (of which 979 were accepted, an acceptance rate of roughly 30%), the conference has grown significantly year over year as potential applications in AI and across a wide range of industries draw greater competition and development. Fast-growing areas included autonomous technology; CGI (computer-generated imagery) for film and animation, covering character motion, environmental physics, and advanced ray tracing to better mimic real life; deep learning for 3D object detection, tracking, and generation; and the rise of GANs (more on that later).

The Best Paper award went to a joint research group from Stanford University and the University of California at Berkeley for the paper titled “Taskonomy: Disentangling Task Transfer Learning.” The paper proposes a “fully computational approach for modeling the structure of space of visual tasks” done by creating a computational taxonomic map for task transfer learning.

The Best Student Paper award went to “Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies” from a Carnegie Mellon University research group. The paper presents a unified deformation model for the markerless capture of multiple scales of human movement, including facial expressions, body motion, and hand gestures.

Other papers of interest were in the areas of 3D face reconstruction, camera localization, human body pose estimation, point-cloud processing, optical flow and natural language processing.

DCGANs, SRGANs, iGANs, Everywhere GANs

Could 2018 be the year of the GANs? Google AI Research Scientist Jordi Pont-Tuset performed a keyword analysis of the accepted papers at CVPR and found that Generative Adversarial Networks (GANs) are the rising star of computer intelligence. As detailed in his blog post, 8% of the papers accepted at CVPR 2018 included “GANs” in their title, double the share at CVPR 2017.

So what are GANs, and what’s their advantage over traditional neural networks? In short, traditional networks have proven extremely effective at content classification (i.e., this is a cat, a dog, or a mouse) but cannot create new content based on acquired knowledge. Introduced in 2014 by Ian Goodfellow and colleagues, GANs add this ability, paving the way for creative AI tasks, starting with realistic image and scene creation and extending to cognitive tasks based on conceptual unsupervised learning, abstraction and intuition. Instead of a single classifying network, GANs consist of two competing networks: a generator and a discriminator. The generator tries to fool the discriminator into classifying a created image as real, while the discriminator tries to catch the generator submitting fakes. The training of both networks is where deep learning meets game theory. Over time, the two learn from each other: the generator creates more and more realistic images, and the discriminator gets better at recognizing generated ones. Training is complete when the discriminator can no longer tell whether an image is real or generated.
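The adversarial loop described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how production GANs are built: the "images" are single numbers drawn around 4.0, the generator is a linear map `a*z + b`, the discriminator is a one-feature logistic classifier, and the generator uses the common non-saturating loss. All function and parameter names here are hypothetical and chosen for this example only.

```python
import math
import random


def sigmoid(u):
    """Logistic function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def generate(gen, z):
    """Generator: turn a noise value z into a fake sample."""
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    """Discriminator: estimated probability that sample x is real."""
    return sigmoid(disc["w"] * x + disc["c"])


def discriminator_step(disc, real, fake, lr):
    """One gradient step on -[mean log D(real) + mean log(1 - D(fake))]."""
    grad_w = grad_c = 0.0
    for x in real:  # push D(real) toward 1
        p = discriminate(disc, x)
        grad_w += -(1.0 - p) * x / len(real)
        grad_c += -(1.0 - p) / len(real)
    for x in fake:  # push D(fake) toward 0
        p = discriminate(disc, x)
        grad_w += p * x / len(fake)
        grad_c += p / len(fake)
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c


def generator_step(gen, disc, noise, lr):
    """One gradient step on -mean log D(G(z)): try to fool the discriminator."""
    grad_a = grad_b = 0.0
    for z in noise:
        p = discriminate(disc, generate(gen, z))
        common = -(1.0 - p) * disc["w"]  # d(loss)/d(fake sample)
        grad_a += common * z / len(noise)
        grad_b += common / len(noise)
    gen["a"] -= lr * grad_a
    gen["b"] -= lr * grad_b


def train(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}
    disc = {"w": 0.0, "c": 0.0}
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # assumed "real" data
        fake = [generate(gen, rng.gauss(0.0, 1.0)) for _ in range(batch)]
        discriminator_step(disc, real, fake, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        generator_step(gen, disc, noise, lr)
    return gen, disc
```

Calling `train()` returns the final generator and discriminator parameters. The sketch is meant to show the alternating updates, not stable convergence: with a setup this simple the two players can keep chasing each other around the equilibrium, which hints at why real GAN training is delicate.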

The potential of GANs is intriguing. Neural networks can now actively draw pictures of cats and dogs instead of just classifying them. Algorithms fill in missing content in images to create super resolution, even allowing a user to replace an unwanted person in the picture with a photorealistic generated background. Ordinary users can easily create Photoshopped images that look incredibly realistic.

While GANs bring exciting new capabilities to deep learning and we are just scratching the surface, the question is: what is the best compute setup for training these types of networks?

Compute Resource Recommendations for Training GANs

GANs and deeper neural networks require compute-intensive resources. Platforms featuring NVIDIA GPUs, particularly the NVIDIA Tesla V100, are the preferred choice for GAN training, due to:

  • A. the highest raw GPU compute power available, enabled by 5,120 CUDA cores
  • B. 32 GB of ultra-high-bandwidth HBM2 on-GPU memory per card
  • C. additional hardware acceleration through 640 Tensor Cores

Because systems featuring high-end NVIDIA GPUs like the V100 can be large investments, companies looking to leverage GANs for their development often choose between doing their development in the cloud versus buying on-premise hardware.

To Cloud or Not To Cloud?

One major topic at the show for AI startups, enterprises, and university researchers and developers alike was when to leverage deep learning compute offered by public cloud services like AWS and Google Cloud, and when to invest in on-premise solutions. While the answer ultimately depends on how committed a company is to its AI/deep learning initiative and how much it spends or plans to spend per month in the cloud, AMAX has published a white paper quantifying performance and cost for AWS instances and AMAX deep learning platforms. The tables below compare a sampling of comparable AWS instances with an 8x GPU AMAX server (the DL-E280) in terms of hardware cost, hardware plus estimated overhead, and estimated leasing costs, along with the estimated breakeven point between cloud and on-premise, to give our readers a benchmark for when cloud spend can exceed potential on-premise GPU hardware spend.

Furthermore, for more entry-level compute, AMAX also offers GPU-integrated AI development workstations featuring 1 to 4 GPUs starting at $4,000, which can be leased for $200-400/month depending on configuration, making on-premise a good option for companies spending $200+ a month in the cloud. The entire white paper can be requested here.
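As a rough illustration of the breakeven idea, the calculation can be sketched as below. This is a simplified model and not the white paper's method: it assumes a flat monthly cloud bill and a flat monthly on-premise overhead (power, hosting, administration). The function name and the overhead and cloud-spend figures in the usage note are hypothetical; only the $4,000 entry price comes from the text above.

```python
import math


def breakeven_months(hardware_cost, monthly_overhead, monthly_cloud_spend):
    """Whole months until cumulative cloud spend reaches the on-premise total.

    Returns None when the cloud bill never exceeds the on-premise overhead,
    because in that case buying hardware never pays for itself.
    """
    monthly_saving = monthly_cloud_spend - monthly_overhead
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)
```

For example, a $4,000 workstation with an assumed $50/month overhead, compared against an assumed $500/month cloud bill, gives `breakeven_months(4000, 50, 500)`, which works out to 9 months, since 4000 / 450 is about 8.9.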

What’s New From AMAX?

Meanwhile, AMAX showcased its award-winning machine learning and deep learning compute platforms at Booth #304. We had great conversations with attendees and exhibitors alike. Hot topics included GPU acceleration for autonomous vehicle development and training; facial and video analysis, particularly for video surveillance operations; and GAN development. Of particular interest was the DL-E48A, our latest and greatest 4U 8x GPU rackmount server. One notable visitor, NVIDIA CEO Jensen Huang himself, checked it out and declared, “That’s a beautiful system!”

What makes this platform stunning is not just that it’s a heavy-duty, balls-to-the-wall, performance-oriented server designed for machine learning and deep learning dev clusters and for integration into large-scale data center deployments. Its most distinctive feature is the ability to reconfigure the PCIe architecture between single root complex and dual root complex via software, so organizations can test both configurations to determine optimal performance for various machine learning and deep learning applications, and can toggle the root complex remotely and on the fly. That means busy IT folks never have to set foot in the data center to reconfigure the system.

DL-E48A: the industry’s only accelerated GPU computing solution to feature software-reconfigurable single and dual root complex PCIe architecture

Generally, single root complex is optimal for GPU-intensive deep learning workloads because it reduces GPU-to-GPU memory copy latency and increases bandwidth. Dual root complex is ideal for CPU-intensive and parallel computing applications because it optimizes CPU/memory-to-GPU communications. More information on the DL-E48A can be found here.

For researchers and developers, the DL-E400 was the most popular GPU workstation at the show. This award-winning Deep Learning DevBox features 4x NVIDIA Titan V or 1080 Ti cards, onboard dual 1G/10G networking, and an enterprise-grade motherboard for high performance in a compact, ultra-quiet footprint.

If you are interested in any of our deep learning workstation or deep learning server products, please contact us here. If you’re not sure what system is right for you, we also offer a free initial consultation with our AI solutions architects to determine the best solution to fit your needs.

What topics or trends really caught your attention at CVPR 2018? Let us know!