Preparing the data center for a world with generative AI
Posted: Mon Dec 23, 2024 7:14 am
Entering the world of generative AI (GenAI) is like entering a new realm, filled with unique challenges and opportunities. Just as Dorothy needed help getting around Oz, organizations must prepare their data centers to handle the demands of AI infrastructure.
The Emerald City's IT Requirements
Implementing an AI infrastructure presents significant challenges, starting with the computational requirements, the most demanding of which are those related to model training. Even if an organization does not train models from scratch, the computational requirements for large language model inference (plus vector embedding for retrieval-augmented generation [RAG]) and fine-tuning far exceed those of current applications.
To meet these requirements, the physical size, weight, cabling, networking, power, and cooling specifications of GPU-powered generative AI servers are several times higher than the corresponding specifications of standard servers. Careful planning is needed for organizations to deploy this AI infrastructure in their data centers.
For example, the Dell PowerEdge XE9680 server, validated by Dell for inference use cases, is a 6U server with 8 NVIDIA H100 GPUs. Due to its dense construction and cooling capabilities, this server weighs over 90 kg (200 lbs). A rack with 4 XE9680 servers consumes 20-40 kW of power, contains over 100 cables, and weighs over 453 kg (1000 lbs).
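To see how these figures add up for capacity planning, here is a minimal sketch of a rack-level power and weight estimate. The per-server values are illustrative assumptions derived from the example above (a rack of 4 servers drawing 20-40 kW), not vendor specifications:

```python
# Rough rack sizing sketch for GPU servers.
# All figures are illustrative assumptions based on the XE9680 example in the text.

SERVER_WEIGHT_KG = 90       # assumed per-server weight ("over 90 kg")
SERVER_POWER_KW_MIN = 5.0   # assumed per-server draw, lower bound (20 kW / 4 servers)
SERVER_POWER_KW_MAX = 10.0  # assumed per-server draw, upper bound (40 kW / 4 servers)
SERVERS_PER_RACK = 4

rack_weight_kg = SERVERS_PER_RACK * SERVER_WEIGHT_KG
rack_power_min_kw = SERVERS_PER_RACK * SERVER_POWER_KW_MIN
rack_power_max_kw = SERVERS_PER_RACK * SERVER_POWER_KW_MAX

print(f"Server weight per rack: ~{rack_weight_kg} kg (excluding rack, PDUs, cabling)")
print(f"Rack power budget: {rack_power_min_kw:.0f}-{rack_power_max_kw:.0f} kW")
```

Even this back-of-the-envelope estimate makes it clear why floor loading, power distribution, and cooling capacity must be verified before such racks are deployed.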
Depending on your needs and the scope of your AI deployment, you may choose to apply the recommendations outlined in this blog to your data center as a whole or to a dedicated AI section within your data center.
The Scarecrow's Brain: Data Center Capacity
In the classic story, the Scarecrow says he needs a brain and his plan is to follow Dorothy to find the Wizard. In the world of AI infrastructure, it is critical to have a plan for data center sizing and space allocation for server and rack installation, airflow optimization, and maintenance.
Dell Services implementation specialists can work with your team to design the space to efficiently manage large numbers of AI infrastructure racks and provide additional capacity for future expansion.
Organizing racks to allow easy maintenance of, and access to, servers and infrastructure is critical to good data center design, and it applies to AI infrastructure as well. Teams should establish and follow a regular maintenance schedule, including periodic inspections and the replacement of air filters, fans, and cooling units as needed.
The Lion's Courage: Effective Airflow Management
Airflow is critical to managing the heat generated by servers and infrastructure systems. AI infrastructure consumes significantly more power than traditional servers, which generates more heat and makes airflow and cooling even more important.
Organizations should use structured airflow management strategies, such as hot- and cold-aisle containment, directing cool air to server inlets and channeling hot exhaust air away from equipment. This increases cooling efficiency and reduces energy costs.