NVIDIA BioNeMo Enables Generative AI for Drug Discovery on AWS
Pharma and techbio companies can now access the NVIDIA Clara healthcare suite, including BioNeMo, via Amazon SageMaker, AWS ParallelCluster, and NVIDIA DGX Cloud on AWS.
November 28, 2023 – Researchers and developers at leading pharmaceutical and biotech companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services.
The initiative, announced today at AWS re:Invent, lets healthcare and life sciences developers who use AWS cloud resources integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo, a generative AI platform for drug discovery. BioNeMo is coming to NVIDIA DGX Cloud on AWS and is currently available via the AWS ParallelCluster cluster management tool for high-performance computing and the Amazon SageMaker machine learning service.
AWS is used by thousands of healthcare and life sciences companies worldwide. They can now use BioNeMo to build or customize digital biology foundation models with proprietary data, scaling up model training and deployment on AWS using NVIDIA GPU-accelerated cloud servers.
Alchemab Therapeutics, Basecamp Research, Character Biosciences, Evozyne, Etcembly, and LabGenius are among the AWS users who have already started using BioNeMo for generative AI-accelerated drug discovery and development. This collaboration provides them with additional options for rapidly scaling up cloud computing resources for developing generative AI models trained on biomolecular data.
This announcement extends NVIDIA’s existing healthcare-focused offerings available on AWS — NVIDIA MONAI for medical imaging workflows and NVIDIA Parabricks for accelerated genomics.
New to AWS: NVIDIA BioNeMo Advances Generative AI for Drug Discovery
BioNeMo is a domain-specific framework for digital biology generative AI, including pretrained large language models (LLMs), data loaders, and optimized training recipes that can help advance computer-aided drug discovery by speeding target identification, protein structure prediction, and drug candidate screening.
Drug discovery teams can use their proprietary data to build or optimize models with BioNeMo and run them on cloud-based high-performance computing clusters.
One of these models, ESM-2, a powerful LLM that supports protein structure prediction, achieves almost linear scaling on 256 NVIDIA H100 Tensor Core GPUs. Researchers can scale to 512 H100 GPUs to complete training in a few days, instead of the month of training time published in the original paper.
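As a rough illustration of what "almost linear scaling" means, the Python sketch below computes scaling efficiency: the measured speedup divided by the ideal speedup when the GPU count grows. The function name and the throughput figures are hypothetical placeholders chosen for the example; they are not measurements from BioNeMo or from the ESM-2 paper.

```python
def scaling_efficiency(base_gpus, base_throughput, scaled_gpus, scaled_throughput):
    """Return measured speedup divided by ideal (linear) speedup.

    A value of 1.0 means perfectly linear scaling; values slightly
    below 1.0 correspond to "almost linear" scaling.
    """
    if base_gpus <= 0 or scaled_gpus <= 0 or base_throughput <= 0:
        raise ValueError("GPU counts and base throughput must be positive")
    ideal_speedup = scaled_gpus / base_gpus
    measured_speedup = scaled_throughput / base_throughput
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical numbers: 8 GPUs process 100 samples/s,
    # 256 GPUs process 3,040 samples/s.
    efficiency = scaling_efficiency(8, 100.0, 256, 3040.0)
    print(f"Scaling efficiency: {efficiency:.2f}")
```

With these made-up inputs, the ideal speedup is 32x and the measured speedup is 30.4x, so the efficiency comes out to about 0.95. Real efficiency depends on the model, batch size, and interconnect.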
Developers can train ESM-2 at scale using checkpoints of 650 million or 3 billion parameters. Additional AI models supported in the BioNeMo training framework include the small-molecule generative model MegaMolBART and the protein sequence generation model ProtT5.
BioNeMo's pretrained models and optimized training recipes are available through self-managed services like AWS ParallelCluster and Amazon ECS, as well as through integrated, managed services such as NVIDIA DGX Cloud and Amazon SageMaker. They can help R&D teams build foundation models that explore more drug candidates, optimize wet lab experimentation, and find promising clinical candidates faster.
Also Available on AWS: NVIDIA Clara for Medical Imaging and Genomics
Project MONAI, cofounded and enterprise-supported by NVIDIA to support medical imaging workflows, has been downloaded more than 1.8 million times and is available for deployment on AWS. Developers can harness their proprietary healthcare datasets already stored on AWS cloud resources to rapidly annotate and build AI models for medical imaging.
These models, trained on NVIDIA GPU-powered Amazon EC2 instances, can be used for interactive annotation and fine-tuning for segmentation, classification, registration, and detection tasks in medical imaging. Developers can also harness the MRI image synthesis models available in MONAI to augment training datasets.
To accelerate genomics pipelines, Parabricks enables variant calling on a whole human genome in around 15 minutes, compared to a day on a CPU-only system. On AWS, developers can quickly scale up to process large amounts of genomic data across multiple GPU nodes.
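The figures above can be turned into a back-of-the-envelope estimate. The Python sketch below is an illustration only: it assumes "a day" means 24 hours, that each GPU node processes one genome at a time in 15 minutes, and that work spreads across nodes with no scheduling or data-transfer overhead. The function names are hypothetical and are not part of Parabricks or any AWS API.

```python
import math

GPU_MINUTES_PER_GENOME = 15        # "around 15 minutes" from the text
CPU_MINUTES_PER_GENOME = 24 * 60   # assumption: "a day" = 24 hours


def speedup(cpu_minutes, gpu_minutes):
    """Ratio of CPU-only runtime to GPU-accelerated runtime."""
    return cpu_minutes / gpu_minutes


def batch_wall_clock_minutes(num_genomes, num_nodes,
                             minutes_per_genome=GPU_MINUTES_PER_GENOME):
    """Estimate wall-clock time for a batch, one genome per node at a time.

    Assumes ideal scaling: genomes are processed in rounds of `num_nodes`,
    with no queueing, I/O, or startup overhead.
    """
    if num_genomes < 0 or num_nodes <= 0:
        raise ValueError("num_genomes must be >= 0 and num_nodes must be > 0")
    rounds = math.ceil(num_genomes / num_nodes)
    return rounds * minutes_per_genome


if __name__ == "__main__":
    print(speedup(CPU_MINUTES_PER_GENOME, GPU_MINUTES_PER_GENOME))  # 96.0
    print(batch_wall_clock_minutes(1000, 10))  # 1500 minutes, i.e. 25 hours
```

Under these simplifying assumptions, the stated runtimes correspond to roughly a 96x speedup per genome, and 1,000 genomes on 10 nodes would take about 25 hours. Actual throughput will vary with instance type, sequencing depth, and pipeline configuration.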
More than a dozen Parabricks workflows are available on AWS HealthOmics as Ready2Run workflows, which enable customers to easily run pre-built pipelines.