
New AI servers 2025: rethinking design for managed power

  • Cedric KTORZA
  • Nov 17, 2025
  • 8 min read

Updated: Nov 18, 2025


In 2025, AI infrastructure shifts from “more power” to “managed power,” reshaping server, rack and facility design so performance, efficiency and resiliency advance together. This article explains what’s changing, why it matters, and how a pragmatic, integrated approach helps you deploy AI at scale without losing control of energy, cooling or uptime.

 

At a glance

  • AI servers now treat power as a first-class constraint: design from rack power budgets down to workload caps.

  • Liquid cooling, 48V distribution and rack-scale power shelves are moving into mainstream AI deployments.

  • Telemetry (Redfish/PMBus) and power-aware orchestration let you cap, schedule and right-size compute in real time.

  • Facilities matter: grid capacity, renewables, storage and heat reuse must be co-architected with IT.

  • At Score Group, Noor ITS, Noor Energy and Noor Technology deliver an end-to-end path from audit to run.

 

Why AI servers in 2025 demand a new design envelope

The AI boom concentrates compute into accelerators with extreme power density and memory bandwidth demands. That stress surfaces new bottlenecks: power delivery, heat removal, and cross-node interconnects. The question is no longer “can we feed the GPUs?” but “how do we govern power so performance and availability remain predictable?”

External factors amplify the shift. Industry observers expect data centre electricity use to rise materially this decade, driven in part by AI workloads. The International Energy Agency highlights a steep increase through 2026, pressing operators to improve efficiency and embrace demand-side flexibility (IEA report). Managed power becomes both an engineering and a sustainability imperative.

Power has become a software-defined resource. The winning AI platforms are those that can sense, decide and act on power—at the chip, the rack and the grid edge.

 

Managed power, not just more power

“Managed power” means designing the stack to deliver target service levels within explicit energy and thermal budgets.

Key principles:

  • Budget from the outside in: set site and room limits, then allocate per row/rack/node.

  • Instrument everything: fine-grained telemetry informs planning, control and capacity forecasting.

  • Orchestrate with intent: align workload placement and tuning with power, cooling and SLA constraints.

  • Engineer for adaptability: use modular power/cooling blocks and firmware-controlled limits to match shifting AI demand.

This rethink spans hardware selections (accelerators, interconnects, power distribution), software (schedulers, capping, observability), and facilities (cooling topologies, grid contracts, renewables and storage).
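To make "budget from the outside in" concrete, here is a minimal Python sketch of a cascading power budget. The rack names, wattages and allocation shares are hypothetical; a real allocation would follow your electrical design, redundancy model and derating rules.

```python
# Illustrative sketch: cascade a site power budget down to racks and nodes.
# All names and numbers are hypothetical; real budgets come from your
# electrical design and facility derating rules.

from dataclasses import dataclass, field

@dataclass
class Rack:
    name: str
    budget_w: float                                   # allocated rack budget in watts
    node_caps_w: list = field(default_factory=list)   # per-node power caps

    def allocated_w(self) -> float:
        return sum(self.node_caps_w)

    def headroom_w(self) -> float:
        return self.budget_w - self.allocated_w()

SITE_BUDGET_W = 1_200_000          # hypothetical 1.2 MW IT budget
ROOM_SHARE = 0.5                   # this room gets half of the site budget

racks = [
    Rack("ai-rack-01", budget_w=60_000, node_caps_w=[10_500] * 5),
    Rack("ai-rack-02", budget_w=60_000, node_caps_w=[10_500] * 6),  # over-allocated
]

room_budget = SITE_BUDGET_W * ROOM_SHARE
room_allocated = sum(r.budget_w for r in racks)

print(f"Room budget {room_budget:,.0f} W, allocated to racks {room_allocated:,.0f} W")
for r in racks:
    status = "OK" if r.headroom_w() >= 0 else "OVER-ALLOCATED"
    print(f"{r.name}: {r.allocated_w():,.0f} / {r.budget_w:,.0f} W ({status})")
```

The same pattern extends upward (site to room) and downward (node to accelerator), which is what lets schedulers later enforce caps against a budget that facilities have actually signed off on.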

 

Hardware building blocks reshaped by power constraints

 

Accelerators and interconnects

Modern AI servers cluster accelerators and rely on high-bandwidth GPU-GPU fabrics. Vendor designs increasingly expose profiles to trade absolute performance for watts per token or watts per training step. New platforms like NVIDIA’s Blackwell-generation GB200 put emphasis on rack-scale systems with liquid cooling and optimized power delivery (NVIDIA data center platform).

Interconnect choices matter. PCIe 5/6 for host-to-accelerator links and NVLink- or CXL-based fabrics for peer-to-peer traffic influence both performance and efficiency. Standards bodies like PCI-SIG continue evolving bandwidth and power features that affect board-level design (PCI-SIG specifications).

 

Memory and CXL pooling

AI models are increasingly memory-bound. Compute Express Link (CXL) allows memory pooling and tiering across nodes so you can right-size DRAM/HBM footprints and reduce stranded capacity—an indirect but important lever on energy per inference/training job (CXL Consortium). The payoff: fewer overprovisioned servers and better performance per watt with memory closer to the workload’s actual needs.

 

Networking at 400/800G

Moving gradients and embeddings at scale pushes east-west traffic to 400/800G Ethernet or InfiniBand. Higher speeds add power draw per port; designs mitigate that by using energy-efficient optics, topology choices (e.g., leaf-spine consolidation) and workload placement that reduces cross-rack chatter. IEEE 802.3 work on high-speed Ethernet frames the evolution path and implementation trade-offs (IEEE 802.3 Ethernet Working Group).

 

Power delivery: 48V, busbars and rack-scale shelves

As rack densities rise, higher-voltage distribution reduces copper losses and improves efficiency. Open Compute Project’s Open Rack exemplifies 48V busbar distribution with modular power shelves—an architecture well-suited to AI clusters where per-rack draw can be substantial (OCP Open Rack). Designers also evaluate 415V three-phase at the row and point-of-load (PoL) converters on the board, balancing efficiency, redundancy and serviceability.
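A rough worked example shows why higher-voltage distribution cuts copper losses: for a fixed delivered power, current scales as P/V and conduction loss as I²R, so raising an in-rack bus from 12 V to 48 V reduces resistive loss in the same conductor path by roughly 16×. The rack power and path resistance below are invented illustration values, not vendor figures.

```python
# Illustrative only: resistive (I^2 * R) loss in a distribution path for the
# same delivered power at different bus voltages. The resistance is a
# hypothetical end-to-end busbar/cable value; real designs also change
# converter stages, which this toy calculation ignores.

def conduction_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    current_a = power_w / voltage_v
    return current_a ** 2 * resistance_ohm

RACK_POWER_W = 30_000        # hypothetical 30 kW rack
PATH_RESISTANCE = 0.0002     # hypothetical 0.2 milliohm distribution path

for v in (12, 48):
    loss = conduction_loss_w(RACK_POWER_W, v, PATH_RESISTANCE)
    print(f"{v:>2} V bus: {loss:.0f} W lost in copper "
          f"({loss / RACK_POWER_W:.1%} of rack power)")
```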

 

Cooling: from advanced air to liquid

Air can still serve moderate densities with containment and high-efficiency coils. Beyond a threshold, direct-to-chip (D2C) liquid cooling and, in some cases, immersion become the practical path for thermal headroom and acoustic control. ASHRAE’s datacom guidance covers environmental envelopes and liquid-cooling considerations useful for design teams and facility operators (ASHRAE Datacom resources).

Industry surveys note rising planned rack densities, with operators increasingly preparing for tens of kilowatts per rack and liquid-cooled designs enabling far higher. The specific numbers depend on your hardware and duty cycles; what’s constant is that thermal strategy must be locked to power governance from day one.

 

Software orchestration for power-aware AI

 

Telemetry and open control planes

You cannot manage what you cannot measure. Server and power gear increasingly expose sensors and controls via DMTF Redfish and PMBus. Integrating these feeds into DCIM/observability stacks enables real-time and historical views of watts, temperatures, and fan or pump speeds (DMTF Redfish, PMBus). At the application layer, per-job power data feeds chargeback, SLO tracking and model optimization.
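As an illustration, a minimal polling loop against a Redfish BMC might look like the sketch below. The BMC address, credentials and resource path are placeholders; actual endpoints and payloads vary by vendor and schema version (newer platforms expose Sensors/EnvironmentMetrics rather than the classic Power resource).

```python
# Minimal sketch: poll chassis power draw over Redfish so it can be pushed
# into an observability pipeline. The BMC address, credentials and resource
# path are placeholders; adjust for your platform's actual Redfish tree.

import time
import requests

BMC_URL = "https://10.0.0.42"               # hypothetical BMC address
AUTH = ("metrics-ro", "change-me")          # read-only monitoring account
POWER_PATH = "/redfish/v1/Chassis/1/Power"  # classic Redfish Power resource

def read_power_watts() -> float:
    # verify=False keeps the sketch short; use proper TLS trust in production.
    resp = requests.get(BMC_URL + POWER_PATH, auth=AUTH, verify=False, timeout=5)
    resp.raise_for_status()
    # PowerControl[0].PowerConsumedWatts is the chassis-level reading in the
    # classic Power schema; the field name may differ on newer platforms.
    return resp.json()["PowerControl"][0]["PowerConsumedWatts"]

if __name__ == "__main__":
    while True:
        watts = read_power_watts()
        print(f"{time.strftime('%H:%M:%S')} chassis power: {watts:.0f} W")
        time.sleep(30)   # a 30 s interval is plenty for capacity trending
```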

 

Workload scheduling, caps and right-sizing

Schedulers should place and tune jobs based on both performance targets and instantaneous power/cooling headroom. Techniques include (a toy placement sketch follows this list):

  • Per-accelerator power caps aligned to SLA tiers.

  • Queuing or migrating low-priority inference during thermal excursions.

  • Selecting precision/algorithm variants that reduce joules per token.

Open-source efforts are emerging around energy-aware Kubernetes, with exporters that surface power metrics for scheduling decisions—see community initiatives like Kepler in the CNCF ecosystem (Kepler project).
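As a toy illustration of the placement-plus-capping idea (not any specific scheduler's API), the sketch below picks the rack with the most power headroom and derives a per-accelerator cap from a hypothetical SLA tier. In production this logic would live in your orchestrator and drive vendor capping interfaces.

```python
# Toy power-aware placement: choose the rack with the most remaining power
# headroom that can absorb the job's capped draw, and derive a per-GPU cap
# from the job's SLA tier. Rack data and cap values are hypothetical.

from typing import Optional

SLA_GPU_CAP_W = {"gold": 700, "silver": 550, "batch": 400}   # per-accelerator caps

racks = {
    "ai-rack-01": {"budget_w": 60_000, "used_w": 41_000},
    "ai-rack-02": {"budget_w": 60_000, "used_w": 28_500},
}

def place_job(gpus: int, sla: str) -> Optional[tuple[str, int]]:
    cap_w = SLA_GPU_CAP_W[sla]
    demand_w = gpus * cap_w
    # Prefer the rack with the largest remaining headroom that still fits.
    candidates = sorted(racks.items(),
                        key=lambda kv: kv[1]["budget_w"] - kv[1]["used_w"],
                        reverse=True)
    for name, rack in candidates:
        if rack["budget_w"] - rack["used_w"] >= demand_w:
            rack["used_w"] += demand_w
            return name, cap_w
    return None   # queue the job or shed low-priority work instead

print(place_job(gpus=8, sla="silver"))   # e.g. ('ai-rack-02', 550)
print(place_job(gpus=8, sla="gold"))
```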

 

Security and resilience by design

New control planes widen the attack surface. Power and thermal telemetry, firmware controls and BMC interfaces must be hardened. Align cyber practices with safety and availability: network segmentation for out-of-band management, secure firmware supply chains, and incident response plans tied to power/cooling anomalies. Uptime Institute’s research underscores resilience discipline as densities and complexity rise (Uptime Institute research hub).

 

Facilities and energy: grid, renewables and heat reuse

AI-ready facilities need more than bigger feeders. They benefit from grid-aware operations, on-site generation, storage and, where feasible, heat reuse. The EU’s work on data centre efficiency and reporting, along with voluntary codes of conduct, illustrates how policy and practice are evolving in tandem (EU JRC – Energy efficiency in data centres).

Practical steps include:

  • Contracting capacity flexibly and participating in demand response to cap peak charges and support the grid.

  • Integrating solar PV and batteries to shave peaks and backstop critical loads (a simple dispatch sketch follows this list).

  • Engineering heat-recovery loops in districts that can accept low-grade heat.

As the IEA notes, balancing digital growth with energy system stability will rely on both efficiency and flexibility measures (IEA overview).
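The peak-shaving step in the second bullet can be sketched very simply. The contracted peak, battery size and load profile below are invented for illustration; a real dispatch would also respect battery degradation, round-trip efficiency, tariffs and demand-response signals.

```python
# Illustrative peak-shaving logic: discharge the battery when site demand
# exceeds the contracted peak, drawing the remainder from the grid.
# Thresholds, battery size and the load profile are hypothetical.

CONTRACTED_PEAK_KW = 2_000
BATTERY_CAPACITY_KWH = 1_500
MAX_DISCHARGE_KW = 500

load_profile_kw = [1_700, 1_900, 2_300, 2_450, 2_100, 1_800]   # hourly samples
soc_kwh = BATTERY_CAPACITY_KWH

for hour, load in enumerate(load_profile_kw):
    excess = max(0, load - CONTRACTED_PEAK_KW)
    discharge = min(excess, MAX_DISCHARGE_KW, soc_kwh)   # kW over one hour ~ kWh
    soc_kwh -= discharge
    grid_draw = load - discharge
    print(f"h{hour}: load {load} kW, battery {discharge:.0f} kW, "
          f"grid {grid_draw:.0f} kW, SoC {soc_kwh:.0f} kWh")
```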

 

How Score Group delivers an integrated path

At Score Group, we bring energy, digital infrastructure and new tech together so you can scale AI with control and confidence. Our three-pillar architecture ensures your strategy is cohesive from board to grid.

  • Noor ITS – The infrastructure backbone

    • Data center design and optimization, cloud/hybrid architectures, networks and disaster-recovery/business-continuity plans (PRA/PCA) for resilience.

    • Integration of Redfish/PMBus telemetry into unified observability and capacity planning.

    • Secure-by-design implementations across on-prem and hosted environments.

  • Noor Energy – Intelligence for energy performance

    • Smart energy management: metering, monitoring and optimization of consumption across sites.

    • Building management (GTB/GTC), renewable integration (PV, storage), EV infrastructure and demand response.

    • Thermal engineering: advanced air containment, liquid cooling readiness, and heat reuse studies.

  • Noor Technology – Innovation applied to operations

    • AI-driven automation and predictive analytics for power/cooling forecasting.

    • IoT sensor meshes and real-time connectivity for fine-grained environmental control.

    • Application development to expose power-aware controls to your DevOps and MLOps teams.

As an integrator, we align stakeholders—IT, facilities, finance and sustainability—around measurable outcomes. We tailor solutions to each of your needs. Explore our approach at Score Group.

 

A phased roadmap to “managed power” AI

  1. Baseline and objectives: audit site and rack power, thermal capacity and network topology; define service tiers (training vs. inference) and power/SLA targets.

  2. Architecture decisions: select accelerator platforms, interconnects and memory strategies (including CXL readiness); choose power distribution (48V, busbars, shelves) and cooling topology (advanced air, D2C liquid, or immersion where justified).

  3. Telemetry and control: standardize on Redfish/PMBus, integrate into DCIM/observability and expose APIs to schedulers; implement per-job and per-node power caps, and create runbooks for thermal or grid events.

  4. Facilities integration: plan grid capacity, renewables, storage and possible heat reuse; align BMS with IT telemetry for coordinated control loops.

  5. Operate and optimize: track KPIs such as performance per watt, PUE/TUE and energy per inference or training step; iterate via firmware settings, workload placement and preventive maintenance (a small KPI example follows this list).
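For step 5, a minimal KPI roll-up from power telemetry and job counters might look like this. The sample readings and request counts are invented; in practice the series would come from your Redfish/PMBus exporters and the serving or training stack.

```python
# Illustrative KPI roll-up from power telemetry and job counters: energy per
# inference and performance per watt over one measurement window. Sample
# values are made up for the sketch.

samples = [                      # (interval_seconds, average_power_watts)
    (30, 9_800), (30, 10_150), (30, 10_400), (30, 9_950),
]
inferences_served = 1_240_000    # requests completed over the same window

window_s = sum(dt for dt, _ in samples)
energy_joules = sum(dt * watts for dt, watts in samples)
energy_kwh = energy_joules / 3_600_000
avg_power_w = energy_joules / window_s
throughput = inferences_served / window_s

print(f"Energy: {energy_kwh:.2f} kWh ({energy_joules / inferences_served:.2f} J/inference)")
print(f"Perf/W: {throughput / avg_power_w:.3f} inferences per second per watt")
```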

 

AI server design levers and power governance matrix

| Layer | 2025 design choice | Power governance lever | Typical impact | Notes/standards |
| --- | --- | --- | --- | --- |
| Accelerator fabric | GPU-centric nodes with high-speed links | Per-accelerator power caps; workload precision tuning | Stabilizes rack draw; predictable SLAs | Vendor power profiles; PCIe/NVLink evolution (PCI-SIG) |
| Memory | CXL-enabled pooling/tiering | Right-size memory to workload; reduce stranded DRAM | Lower idle power; higher perf/W | CXL Consortium |
| Networking | 400/800G leaf–spine | Optics selection; placement to cut east–west traffic | Less network energy per job | IEEE 802.3 |
| Power distribution | 48V busbar, rack power shelves | Rack-level budgeting and shedding | Higher delivery efficiency; modularity | OCP Open Rack |
| Cooling | Direct-to-chip liquid | Variable pump curves; inlet temperature control | Higher density with stable temps | ASHRAE Datacom |
| Telemetry | Redfish + PMBus | Closed-loop scheduling, alerting | Measurable control of watts and thermals | DMTF Redfish, PMBus |
| Schedulers | Power-aware orchestration | Job capping; DR participation | SLA compliance within power caps | CNCF ecosystem (e.g., Kepler) |
| Facility | PV + storage + BMS integration | Peak shaving; demand response | Cost and carbon reduction | IEA overview |

 

Implementation pitfalls to avoid

  • Designing the rack without a facility plan: cooling and grid constraints surface late and force compromises.

  • Telemetry blindness: without standardized sensors and APIs, power-aware scheduling remains theoretical.

  • One-size-fits-all SLAs: treat training, batch inference and real-time inference differently to optimize power and cost.

  • Overlooking cyber-physical risk: unsecured BMCs or power controllers can become single points of failure.

 

FAQ

 

What rack power density should we plan for with AI servers?

Densities vary by hardware and workloads. Industry guidance shows many operators now planning for tens of kilowatts per rack, with liquid cooling enabling much higher levels for dense accelerator clusters. The safe approach is to design for your heaviest duty cycle, include a contingency margin, and adopt modular power and cooling blocks that scale. Consult resources from ASHRAE on environmental envelopes and consider surveys from groups like Uptime Institute for trends; then validate with empirical tests during pilot phases.

 

Do we need liquid cooling, or can advanced air still work?

Both can work—choice depends on target density, acoustic/space constraints, and climate. Advanced air (containment, high-efficiency coils, optimized airflow) can support moderate densities. Direct-to-chip liquid cooling unlocks higher thermal headroom and more stable operations for dense GPU nodes. Begin with thermal modeling, evaluate vendor heat rejection specs, and consider total lifecycle aspects (maintenance procedures, water quality, leak detection) as outlined in ASHRAE datacom guidance before committing.

 

How does managed power affect AI performance?

Managed power aims to keep performance predictable under real-world constraints. Instead of peak benchmarks, you operate to a defined SLA with caps, placement rules and, where necessary, model precision adjustments. Many accelerator platforms now expose power and frequency profiles that trade minimal performance for significant efficiency gains. The net effect is better performance per watt and fewer thermal or electrical excursions that cause throttling—typically improving delivered performance over time.

 

What telemetry is essential to control AI power effectively?

Start with per-node power draw, inlet/outlet temperatures, fan/pump speeds, and per-accelerator power and utilization. Use standard interfaces like DMTF Redfish for server/BMC data and PMBus for power components. Correlate these with workload metadata (job type, SLA, dataset, precision) in your observability stack. This context enables schedulers to cap or move jobs intelligently, and facilities to adjust cooling dynamically. Consistent time-series data also supports capacity planning and energy reporting frameworks.

 

How do facilities and grid strategy fit into AI server design?

Treat facilities as part of the architecture, not a backdrop. Coordinate with energy teams to secure capacity, evaluate on-site renewables and storage, and establish demand-response playbooks. Integrate building management systems with IT telemetry so cooling and power respond coherently to workload changes. Policy frameworks and guidance—such as the EU’s focus on data centre efficiency and IEA insights on electricity demand—can help shape investment choices and reporting obligations.

 

Key takeaways

  • AI in 2025 demands a shift from raw power to governed power across server, rack and facility layers.

  • Liquid cooling, 48V distribution and rack-scale power shelves are key enablers of stable high density.

  • Telemetry and open control planes (Redfish/PMBus) are prerequisites for power-aware orchestration.

  • Facilities strategy—grid capacity, renewables, storage and heat reuse—must be co-designed with IT.

  • Score Group unites Noor ITS, Noor Energy and Noor Technology to deliver end-to-end, managed-power AI platforms.

Ready to align your AI ambitions with energy, resilience and sustainability goals? Talk to us at Score Group.

 
 