HPC Cloud traces for better cloud service reliability

This dataset contains anonymised system metrics, such as CPU and memory utilisation, together with hard drive metrics from SMART (Self-Monitoring, Analysis, and Reporting Technology). The data were collected from more than 100 cloud servers and are used in our Editor's Choice study “A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence”. If you use this dataset in your work, please cite [1, 2].

[1]. Chhetri, T.R., Dehury, C.K., Lind, A., Srirama, S.N. and Fensel, A., 2022. A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence. Big Data and Cognitive Computing, 6(1), p.26.

[2]. Dehury, C.K., Chhetri, T.R., Lind, A., Srirama, S.N. and Fensel, A., 2021. HPC Cloud traces for better cloud service reliability.

The dataset contains 20 feature columns, which are described in the table below.

| Metric name | Description |
|---|---|
| CPU utilization | Host CPU usage in % |
| Memory utilization | Memory usage in bytes |
| IO utilization | IO usage in time |
| Network overhead | Network usage in bytes |
| Bits read | Data read from disk, in bytes |
| Bits write | Data written to disk, in bytes |
| Smart 188 | Command timeout |
| Smart 197 | Current pending sector count |
| Smart 198 | Uncorrectable sector count |
| Smart 9 | Power-on hours |
| Smart 1 | Read error rate |
| Smart 5 | Reallocated sectors count |
| Smart 187 | Reported uncorrectable errors |
| Smart 7 | Seek error rate |
| Smart 3 | Spin-up time |
| Smart 4 | Start/stop count |
| Smart 194 | Temperature |
| Smart 199 | UltraDMA CRC error count |
| Time | Timestamp |
| id | Anonymised server identifier |
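
As an illustration of how these columns might be used, the sketch below groups trace rows by server and averages CPU utilisation per server, using only the Python standard library. It assumes the data is available as a CSV file whose headers match the metric names above (`id`, `Time`, `CPU utilization`); the actual file format and header spelling may differ, so check them against the downloaded file. The helper names `group_by_server` and `mean_cpu` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumed column headers; verify them against the actual file before use.
ID_COL = "id"
TIME_COL = "Time"
CPU_COL = "CPU utilization"


def group_by_server(csv_file):
    """Read trace rows from an open CSV file and group them by server id."""
    servers = defaultdict(list)
    for row in csv.DictReader(csv_file):
        servers[row[ID_COL]].append(row)
    return dict(servers)


def mean_cpu(rows):
    """Average CPU utilisation (%) over rows, skipping missing or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[CPU_COL]))
        except (KeyError, TypeError, ValueError):
            continue
    return sum(values) / len(values) if values else None


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "id,Time,CPU utilization\n"
    "srv-a,2021-01-01 00:00:00,10.0\n"
    "srv-a,2021-01-01 00:05:00,30.0\n"
    "srv-b,2021-01-01 00:00:00,50.0\n"
)

by_server = group_by_server(sample)
for server_id, rows in sorted(by_server.items()):
    print(server_id, len(rows), mean_cpu(rows))
```

To run this on the real data, replace `sample` with a file opened via `open(path, newline="")`. Grouping by `id` first keeps each server's time series separate, which is usually what per-server reliability analysis needs.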