Failure Analysis of Direct Liquid Cooling System in Data Centers
Source: Journal of Electronic Packaging, 2018, volume 140, issue 2, page 020902
Author: Alkharabsheh, Sami; Puvvadi, Udaya L. N.; Ramakrishnan, Bharath; Ghose, Kanad; Sammakia, Bahgat
DOI: 10.1115/1.4039137
Publisher: The American Society of Mechanical Engineers (ASME)
Abstract: In this paper, the impact of direct liquid cooling (DLC) system failure on the information technology (IT) equipment is studied experimentally. The main factors that are anticipated to affect the IT equipment response during failure are the central processing unit (CPU) utilization, coolant set point temperature (SPT), and the server type. These factors are varied experimentally and the IT equipment response is studied in terms of chip temperature and power, CPU utilization, and total server power. It was found that failure of this cooling system is hazardous and can lead to data center shutdown in less than a minute. Additionally, the CPU frequency throttling mechanism was found to be vital to understand the change in chip temperature, power, and utilization. Other mechanisms associated with high temperatures were also observed such as the leakage power and the fans' speed change. Finally, possible remedies are proposed to reduce the probability and the consequences of the cooling system failure.
contributor author | Alkharabsheh, Sami | |
contributor author | Puvvadi, Udaya L. N. | |
contributor author | Ramakrishnan, Bharath | |
contributor author | Ghose, Kanad | |
contributor author | Sammakia, Bahgat | |
date accessioned | 2019-02-28T11:14:08Z | |
date available | 2019-02-28T11:14:08Z | |
date copyright | 5/9/2018 12:00:00 AM | |
date issued | 2018 | |
identifier issn | 1043-7398 | |
identifier other | ep_140_02_020902.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4254138 | |
publisher | The American Society of Mechanical Engineers (ASME) | |
title | Failure Analysis of Direct Liquid Cooling System in Data Centers | |
type | Journal Paper | |
journal volume | 140 | |
journal issue | 2 | |
journal title | Journal of Electronic Packaging | |
identifier doi | 10.1115/1.4039137 | |
journal firstpage | 020902 | |
journal lastpage | 020902-8 | |
tree | Journal of Electronic Packaging:;2018:;volume( 140 ):;issue: 002 | |
contenttype | Fulltext |