TL;DR: Intel® Data Center Manager, a suite of APIs and plugins, unlocks information from infrastructure already in your datacenter, providing real-time power and thermal monitoring without changes to your current hardware. It helps you manage power consumption and operational costs so your datacenter runs more efficiently. With additional diagnostics from the companion tool Virtual Gateway, you can troubleshoot remotely and extend the life of your hardware.
How unsettling is it to get a surprisingly high electric bill? You open the envelope, see a higher number than you expected — and have no idea where you used the extra power. With multiple people fighting over the thermostat and some rooms running hotter than others, you might be spending more than you budgeted running the heat while a roommate has the window open.
If your home were a datacenter, Intel would be able to tell you exactly where the problems are. Its Data Center Manager (DCM) tools tap into the sensors and instrumentation already built into the hardware to provide real-time metrics on power consumption and thermal output.
The information helps datacenter operators lower costs and extend infrastructure lifespans by automating data collection and presenting insights into ideal operating conditions and configurations, according to Jeff Klaus, General Manager of Data Center Solutions.
“You want as many data points as possible with this huge investment because if there’s a problem, you want to know exactly where it is,” he said. “We take the single utility bill you get from your house or your datacenter, and we break it down into as many small pieces as possible.”
Intel’s Decision to Use Software to Extract More Value From Hardware
Jeff said Intel recognized the need for information about power consumption and infrastructure management and realized its growth and market share depended on more than just churning out powerful hardware components.
“It needs to be a combination of what insights and capabilities we can provide beyond just the ability to do compute,” he said. “What enhancements can we do to make the overall solution more valuable to the customer?”
According to Jeff, Intel will sometimes release an open-source program with emerging functionalities to gauge interest. Intel leaders invested in a small team to build and support a software development kit (SDK), a set of APIs that streamlines the process of integrating DCM into existing interfaces.
Data Center Manager Takes Advantage of Existing Infrastructure
Jeff said that for a hardware company as reputable as Intel to invest in a software solution, the program needs to have a demonstrable impact on the company’s platform of chipsets, processors, network interface controllers, integrated circuits, and flash memory.
“The ability for software to extract value out of the hardware is often where the special sauce or the differentiation comes about,” he said. “Every two to three years, we come out with a new set of platform capabilities with more instrumentation and more abilities to peek around the corner into the hardware and see what it’s doing like never before.”
Jeff said he still frequently runs into customers from major companies who aren’t aware DCM can provide data from servers already running in their datacenter. Manufacturers that use Intel components, such as Dell, HP, and Lenovo, might not promote DCM as a product differentiator, but Jeff said hardware built after 2007 has at least some data to share.
“You’ve already bought it,” he said. “You don’t have to buy anything else.”
Users Extend Data Center Manager Into Automation and App Monitoring
As customers and partners become more familiar with DCM, Jeff said they are improving and adding onto the tools to adapt the functionality to more specific needs.
Some uses of DCM have evolved into application performance monitoring and more sophisticated analysis of how the hardware and software work together.
“It’s evolving to become more automated,” he said of the platform. “I really couldn’t envision the level of usage several years ago, but it has become pretty exciting to see what people have done with it.”
How Real-Time Power and Thermal Monitoring Reduce Operational Costs
Instead of requiring operators to spend dozens of hours each week manually gathering information and checking power sensors, DCM automates and centralizes data collection. Depending on the infrastructure, DCM can pinpoint exactly the information you need.
“We can go really, really small with our data,” Jeff said. “We can get to the point where it’s only the real geeks who find it useful.”
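To make the idea of breaking one bill into small pieces concrete, here is a minimal Python sketch that rolls per-server power readings up to the rack level and computes each server's share of the total. It does not use the DCM API; the sample readings, field names, and function names are all hypothetical and stand in for whatever a monitoring tool would actually return.

```python
from collections import defaultdict

# Hypothetical per-server power samples in watts. A real deployment would
# pull these from the monitoring tool instead of hard-coding them.
READINGS = [
    {"rack": "A1", "server": "web-01", "watts": 310.0},
    {"rack": "A1", "server": "web-02", "watts": 290.0},
    {"rack": "B4", "server": "db-01", "watts": 450.0},
]


def power_by_rack(readings):
    """Sum instantaneous power draw (watts) for each rack."""
    totals = defaultdict(float)
    for reading in readings:
        totals[reading["rack"]] += reading["watts"]
    return dict(totals)


def share_of_total(readings):
    """Return each server's fraction of the total power draw."""
    total = sum(reading["watts"] for reading in readings)
    if total == 0:
        return {reading["server"]: 0.0 for reading in readings}
    return {reading["server"]: reading["watts"] / total for reading in readings}


if __name__ == "__main__":
    print(power_by_rack(READINGS))
    print(share_of_total(READINGS))
```

The same grouping could be keyed on row, room, or application instead of rack, which is one way to get from a single bill to progressively smaller pieces.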
One common recommendation Jeff sees emerge from DCM data is to raise the temperature in datacenters to reduce cooling costs.
“Don’t look at the old-school way of doing things and keep it at a temperature that’s unnecessary,” he said. “You’ve got more data points than ever, so just extract that data and give yourself more confidence to be able to turn up the temperature and lower your air conditioning costs.”
From there, the DCM team can set threshold levels and implement algorithms that predict temperatures and alert datacenter operators to potential problems.
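As an illustration of thresholds combined with prediction, the sketch below fits a straight line to recent inlet temperature samples and warns if the trend is projected to cross a limit. The article does not say which algorithms DCM uses, so the least-squares trend, the function names, and the default look-ahead are assumptions chosen for simplicity.

```python
def linear_forecast(samples, steps_ahead):
    """Fit a least-squares line to evenly spaced samples and extrapolate.

    This is an assumed, deliberately simple predictor, not DCM's method.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def check_inlet(samples, limit_c, steps_ahead=3):
    """Return 'ALERT' if the latest reading is at or over the limit,
    'WARN' if the trend is projected to reach it, otherwise 'OK'."""
    if samples[-1] >= limit_c:
        return "ALERT"
    if linear_forecast(samples, steps_ahead) >= limit_c:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Hypothetical inlet temperatures in degrees Celsius, oldest first.
    print(check_inlet([24.0, 25.0, 26.0, 27.0], limit_c=29.0))
```

Separating the hard threshold from the projected one lets operators get an early warning while a rack is still within limits.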
Data Center Manager Learns Languages to Communicate Across OEMs
Most server manufacturers follow the Intelligent Platform Management Interface, or IPMI, specifications to report performance metrics independently of the hardware’s CPU, firmware, or operating system. Each brand customizes its IPMI feed slightly to differentiate its products.
Jeff called DCM a “living, breathing organism of languages” that collects the various IPMI feeds from servers, power supplies, storage devices, and temperature sensors. As more products and platforms are released, DCM’s library of languages grows.
DCM provides a simplified data feed to infrastructure and application performance managers to interpret or to connect with a facilities management interface. The out-of-band solution has its own discovery mechanism to locate network devices and languages.
“If there’s ever a system we can’t communicate with that’s kind of off in the corner, we’ll look at that and say, ‘What’s your language?'” Jeff said. “We’ll add that to our library, so the value of the product is really the ability to maintain a current library.”
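The "library of languages" idea can be pictured as a lookup table that translates each vendor's sensor labels and units into one common schema. The Python sketch below is only an illustration: the vendor names, sensor labels, and unit conventions are invented, and it does not reflect how DCM is actually implemented.

```python
# Hypothetical per-vendor "dialects". Each maps a vendor's sensor label to
# a common field name plus a function that converts to common units.
DIALECTS = {
    "vendor_a": {
        "Inlet Temp": ("inlet_temp_c", lambda v: float(v)),
        "Pwr Consumption": ("power_w", lambda v: float(v)),
    },
    "vendor_b": {
        # This made-up vendor reports Fahrenheit and kilowatts.
        "AmbientTempF": ("inlet_temp_c", lambda v: (float(v) - 32.0) * 5.0 / 9.0),
        "SysPowerKW": ("power_w", lambda v: float(v) * 1000.0),
    },
}


def normalize(vendor, raw):
    """Translate one vendor-specific reading into the common schema.

    Labels the dialect does not know are skipped; an unknown vendor
    raises KeyError so a new dialect can be added to the library.
    """
    dialect = DIALECTS.get(vendor)
    if dialect is None:
        raise KeyError(f"no dialect registered for {vendor!r}")
    out = {}
    for label, value in raw.items():
        if label in dialect:
            field, convert = dialect[label]
            out[field] = convert(value)
    return out


if __name__ == "__main__":
    print(normalize("vendor_a", {"Inlet Temp": "24", "Pwr Consumption": "310"}))
    print(normalize("vendor_b", {"AmbientTempF": 77, "SysPowerKW": 0.31}))
```

Supporting a new device then means adding one entry to the table rather than changing the consumers of the simplified feed.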
Use Cases Help Interpret Data to Increase Efficiency and Rack Density
Rather than dumping infrastructure metrics on operators without context, the DCM team devised ways to package the information into use cases and actionable insights.
The use cases, which come with videos, white papers, and case studies, cover the six most commonly improved areas of operation: workload management, rack density, ghost server identification, disaster avoidance, thermal profiles, and power management.
“The use cases really help people interpret the technology in a way that a COO or somebody who is paying the bills really understands how this can help them,” Jeff said.
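One of those use cases, spotting ghost servers, lends itself to a short sketch. The article does not define the term, so the Python example below assumes a ghost server is one that keeps drawing power while doing almost no work; the thresholds, field names, and sample data are hypothetical.

```python
def find_ghost_servers(servers, max_cpu_pct=5.0, min_watts=50.0):
    """Return names of servers that draw power but show little CPU use.

    Both thresholds are illustrative assumptions, not DCM defaults.
    """
    return sorted(
        s["name"]
        for s in servers
        if s["avg_cpu_pct"] <= max_cpu_pct and s["avg_watts"] >= min_watts
    )


if __name__ == "__main__":
    # Hypothetical weekly averages per server.
    fleet = [
        {"name": "web-01", "avg_cpu_pct": 42.0, "avg_watts": 310.0},
        {"name": "old-app-07", "avg_cpu_pct": 1.2, "avg_watts": 180.0},
        {"name": "spare-02", "avg_cpu_pct": 0.0, "avg_watts": 0.0},
    ]
    print(find_ghost_servers(fleet))
```

A powered-off spare is excluded because it draws nothing, while a machine that idles at a steady power draw is flagged for review.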
Remote Diagnostics and Troubleshooting With Virtual Gateway
Following the success of DCM, Intel partners asked Jeff if his team could access any other useful information. By running a remote session, Jeff’s team could access logs and BIOS data to monitor system health metrics.
DCM’s companion product, Virtual Gateway, is a set of APIs that let datacenter operators tap into those resources with a keyboard-video-mouse (KVM) interface.
“Not many datacenter operators want to add more hardware if they can help it,” Jeff said. “We were able to accomplish roughly the same set of abilities through the existing network and not have to add more hardware.”
Intel Teams Push for Efficiency Beyond Data Center Manager
With the DCM team in “maintenance mode” between platform releases, Jeff’s team is exploring Rack Scale Design to further increase the efficient use of computing resources.
“DCM helps you start to understand how much underutilized infrastructure you have, but Rack Scale Design truly allows you to have an orchestration of what’s truly needed in order to size that appropriately to the types of workloads your cloud environment or business demand,” Jeff said. “It really only turns on what you need for compute, network, and storage in order to complete that workload as efficiently as possible.”