We’re excited to announce an expanded partnership between AWS and Hugging Face to accelerate the training, refinement, and deployment of large language and vision models used to build generative AI applications. Generative AI applications can perform a variety of tasks, including summarizing text, answering questions, generating code, creating images, and writing essays and articles.
AWS has a deep history of generative AI innovation. For example, Amazon uses AI to deliver conversational experiences with Alexa, which customers interact with billions of times each week, and is increasingly using generative AI as part of new experiences like Create with Alexa. In addition, M5, a group within Amazon Search that helps Amazon teams bring large models to their applications, has trained large models to improve search results on Amazon.com. AWS is constantly innovating across all areas of ML, including infrastructure, tools in Amazon SageMaker, and AI services like Amazon CodeWhisperer, a service that improves developer productivity by generating code suggestions based on the code and comments in the IDE. AWS has also built purpose-built ML accelerators for the training (AWS Trainium) and inference (AWS Inferentia) of large language and vision models on AWS.
Hugging Face chose AWS because it offers flexibility with state-of-the-art tools to train, fine-tune, and deploy Hugging Face models, including Amazon SageMaker, AWS Trainium, and AWS Inferentia. Developers using Hugging Face can now easily optimize performance and reduce costs to bring generative AI applications to production faster.
Highly efficient and cost-effective generative AI
Building, training, and deploying large language and vision models is an expensive and time-consuming process that requires deep expertise in machine learning (ML). Because models are so complex and can contain hundreds of billions of parameters, generative AI is largely out of reach for many developers.
To close this gap, Hugging Face is now partnering with AWS to make it easier for developers to access AWS services and deploy Hugging Face models specifically for generative AI applications. For example, Amazon EC2 Trn1 instances powered by AWS Trainium deliver faster time to train while offering up to 50% cost savings over comparable GPU-based instances. Amazon EC2's new Inf2 instances, powered by the latest generation of AWS Inferentia, are purpose-built to host the latest generation of large language and vision models and improve on Inf1's performance, delivering up to 4x higher throughput and up to 10x lower latency. Developers can use AWS Trainium and AWS Inferentia through managed services such as Amazon SageMaker, a service with ML tools and workflows, or they can self-manage on Amazon EC2.
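As a minimal sketch of what the managed path looks like in practice (the training script name, hyperparameters, IAM role, library versions, and instance type below are illustrative placeholders, not values from this announcement), a training job can be submitted with the SageMaker Python SDK's Hugging Face estimator:

```python
# Sketch: launching a Hugging Face training job on Amazon SageMaker.
# Submitting the job requires the `sagemaker` package and valid AWS
# credentials; every concrete value below is a placeholder.
hyperparameters = {
    "model_name_or_path": "distilbert-base-uncased",  # any Hub model (placeholder)
    "epochs": 3,
    "per_device_train_batch_size": 32,
}

def launch_training(role_arn: str):
    """Submit the managed training job. Trainium (ml.trn1.*) instance types
    require a Neuron-compatible container; a GPU type is shown here."""
    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(
        entry_point="train.py",          # your training script (placeholder name)
        instance_type="ml.p3.2xlarge",   # or e.g. ml.trn1.2xlarge where supported
        instance_count=1,
        role=role_arn,                   # IAM role with SageMaker permissions
        transformers_version="4.26",     # must match an available DLC
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters=hyperparameters,
    )
    estimator.fit()  # starts the managed training job on AWS
    return estimator

# launch_training("arn:aws:iam::111122223333:role/SageMakerExecutionRole")
```

The hyperparameters dict is passed to the training script as command-line arguments, so the same `train.py` can run locally or on a managed instance unchanged.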
Get started today!
Customers can get started using Hugging Face models on AWS in three ways: through SageMaker JumpStart, the Hugging Face AWS Deep Learning Containers (DLCs), or the tutorials for deploying models to AWS Trainium or AWS Inferentia. The Hugging Face DLCs are packed with optimized transformers, datasets, and tokenizers libraries that enable generative AI applications to be up and running in hours instead of weeks, with minimal code changes. SageMaker JumpStart and the Hugging Face DLCs are available in all regions where Amazon SageMaker is available and come at no additional cost. Read the documentation and discussion forums to learn more, or try the sample notebooks today.
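To make the DLC path concrete, here is a hedged sketch of deploying a Hub model to a SageMaker real-time endpoint (the model ID, task, IAM role, library versions, and instance type are illustrative placeholders, not values from this announcement):

```python
# Sketch: hosting a Hugging Face Hub model on a SageMaker endpoint via
# the Hugging Face Deep Learning Containers (DLCs). Creating the endpoint
# requires the `sagemaker` package and valid AWS credentials; the values
# below are placeholders.

# The DLC fetches the model from the Hugging Face Hub at container
# startup, driven by these two environment variables.
hub_config = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
    "HF_TASK": "text-classification",
}

def deploy(role_arn: str):
    """Create the real-time endpoint and return a predictor."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=hub_config,
        role=role_arn,                 # IAM role with SageMaker permissions
        transformers_version="4.26",   # must match an available DLC
        pytorch_version="1.13",
        py_version="py39",
    )
    # Instance type is a placeholder; Inferentia-backed types can be used
    # where the model and container support them.
    return model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# predictor = deploy("arn:aws:iam::111122223333:role/SageMakerExecutionRole")
# print(predictor.predict({"inputs": "Hugging Face models on AWS"}))
# predictor.delete_endpoint()  # avoid ongoing charges when done
```

Because the container pulls the model by ID at startup, swapping in a different Hub model is usually a one-line change to `HF_MODEL_ID`.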