AWS announces the general availability of Trainium2 instances at re:Invent 2024

AWS has launched Trainium2 EC2 instances, offering a groundbreaking 30-40% improvement in price performance for AI model training. UltraServers connect 64 chips to deliver unprecedented computational power for large language and foundation models.

by Ellie Ramirez-Camara
Credit: Amazon/AWS

AWS announced this Tuesday at re:Invent 2024 that it is readying itself for the next generation of LLMs with the general availability of Trainium2 (Trn2)-powered Amazon Elastic Compute Cloud (EC2) instances. The company also introduced new Trn2 UltraServers and gave the world a glimpse of Trainium3, the next generation of its in-house AI chips.

The new Trn2 instances offer a compelling 30-40% improvement in price performance compared to the current generation of GPU-based EC2 instances. Each instance connects 16 Trainium2 chips with NeuronLink interconnect to provide 20.8 peak petaflops of compute, making it well suited to the most demanding generative AI workloads. Going further, the new Trn2 UltraServers interconnect four Trn2 instances into a single massive server that leverages 64 Trainium2 chips to deliver an unprecedented 83.2 peak petaflops of compute. AWS is also collaborating with Anthropic to build an EC2 UltraCluster of Trn2 UltraServers, named Project Rainier. Project Rainier will enable Anthropic to distribute model training across hundreds of thousands of chips, with roughly five times the exaflops used to train its current Claude models.
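As a quick back-of-envelope check on those figures (the per-chip number below is derived from AWS's stated instance total, not an official per-chip spec), the following sketch divides the instance-level peak across its 16 chips and scales up to the 64-chip UltraServer:

```python
# Back-of-envelope check of the peak compute figures quoted above.
# The per-chip value is inferred from AWS's stated instance total,
# not an official per-chip specification.

TRN2_INSTANCE_PETAFLOPS = 20.8      # peak petaflops per Trn2 instance (AWS figure)
CHIPS_PER_INSTANCE = 16             # Trainium2 chips per Trn2 instance
INSTANCES_PER_ULTRASERVER = 4       # Trn2 instances linked into one UltraServer

per_chip = TRN2_INSTANCE_PETAFLOPS / CHIPS_PER_INSTANCE             # ~1.3 petaflops
ultraserver_chips = CHIPS_PER_INSTANCE * INSTANCES_PER_ULTRASERVER  # 64 chips
ultraserver_petaflops = per_chip * ultraserver_chips                # 83.2 petaflops

print(f"~{per_chip:.1f} peak petaflops per Trainium2 chip")
print(f"{ultraserver_chips} chips -> {ultraserver_petaflops:.1f} peak petaflops per UltraServer")
```

The numbers line up: 20.8 petaflops per 16-chip instance works out to about 1.3 petaflops per chip, and 64 chips yield the 83.2 peak petaflops AWS quotes for an UltraServer.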

In addition to Anthropic, AWS partners Hugging Face, Databricks, and poolside are preparing to incorporate Trn2 instances into their infrastructure. Databricks plans to adopt Trn2 to deliver improved outcomes and lower costs for its customers. By incorporating Trainium2 into its offerings, Hugging Face will enable developers to leverage the chips' performance when developing and deploying AI models. Finally, poolside will also leverage Trn2 instances to better serve its users and plans to use Trn2 UltraServers to train future models.

Offering a glimpse of the future, AWS teased Trainium3, expected in late 2025. Built on a 3-nanometer process, these chips are projected to be 4x more performant than the current generation, signaling AWS's commitment to pushing the boundaries of AI computing.


