Health Monitoring and Prognostics for Computer Servers

Abstract

Prognostics solutions for mission critical systems require a comprehensive methodology for proactively detecting and isolating failures, recommending and guiding condition-based maintenance actions, and estimating in real time the remaining useful life of critical components and associated subsystems.

A major challenge has been to extend the benefits of prognostics to include computer servers and other electronic components. The key enabler for prognostics capabilities is monitoring time series signals relating to the health of executing components and subsystems. Time series signals are processed in real time using pattern recognition for proactive anomaly detection and for remaining useful life estimation. Examples will be presented of the use of pattern recognition techniques for early detection of a number of mechanisms that are known to cause failures in electronic systems, including: environmental issues; software aging; degraded or failed sensors; degradation of hardware components; degradation of mechanical, electronic, and optical interconnects. Prognostics pattern classification is helping to substantially increase component reliability margins and system availability goals while reducing costly sources of "no trouble found"

events that have become a significant warranty-cost issue.

Bios

Aleksey Urmanov is a research scientist at Sun Microsystems. He earned his doctoral degree in Nuclear Engineering at the University of Tennessee in 2002. Dr. Urmanov's research activities are centered around his interest in pattern recognition, statistical learning theory and ill-posed problems in engineering. His most recent activities at Sun focus on developing health monitoring and prognostics methods for EP-enabled computer servers. He is a founder and an Editor of the Journal of Pattern Recognition Research.

Anton Bougaev holds a M.S. and a Ph.D. degrees in Nuclear Engineering from Purdue University. Before joining Sun Microsystems Inc. in 2007, he was a lecturer in Nuclear Engineering Department and a member of Applied Intelligent Systems Laboratory (AISL), of Purdue University, West Lafayette, USA. Dr. Bougaev is a founder and the Editor-in-Chief of the Journal of Pattern Recognition Research. His current focus is in reliability physics with emphasis on complex system analysis and the physics of failures which are based on the data driven pattern recognition techniques.

Data and Resources

Additional Info

Field Value
Maintainer Elizabeth Foughty
Last Updated March 31, 2025, 13:54 (UTC)
Created March 31, 2025, 13:54 (UTC)
accessLevel public
accrualPeriodicity irregular
bureauCode {026:00}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id https://data.nasa.gov/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
harvest_object_id 66b3a150-a74c-40ae-b784-1162895b9129
harvest_source_id 61638e72-b36c-4866-9d28-551a3062f158
harvest_source_title DNG Legacy Data
identifier DASHLINK_60
issued 2010-09-10
landingPage https://c3.nasa.gov/dashlink/resources/60/
modified 2020-01-29
programCode {026:029}
publisher Dashlink
resource-type Dataset
source_datajson_identifier true
source_hash c78e01960680bab6c17cf2134392f4ba53eff5cbf90223e5e8d93abfad53b5a0
source_schema_version 1.1