HPC Cloud traces for better cloud service reliability

Overview

This dataset contains anonymised system metrics, such as CPU and memory utilisation, together with hard drive metrics from SMART (Self-Monitoring, Analysis, and Reporting Technology). The data was collected from more than 100 cloud servers and is used in our study “A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence”.

The dataset contains 20 feature columns, described in the table below.

| SN | Metric name | Description |
|----|-------------|-------------|
| 1 | CPU utilisation | Host CPU usage in % |
| 2 | Memory utilisation | Memory usage in bytes |
| 3 | IO utilisation | IO usage in time |
| 4 | Network overhead | Network usage in bytes |
| 5 | Bits read | Data read from disk in bytes |
| 6 | Bits write | Data written to disk in bytes |
| 7 | Smart 188 | Command timeout |
| 8 | Smart 197 | Current pending sector count |
| 9 | Smart 198 | Uncorrectable sector count |
| 10 | Smart 9 | Power-on hours |
| 11 | Smart 1 | Read error rate |
| 12 | Smart 5 | Reallocated sectors count |
| 13 | Smart 187 | Reported uncorrectable errors |
| 14 | Smart 7 | Seek error rate |
| 15 | Smart 3 | Spin-up time |
| 16 | Smart 4 | Start/stop count |
| 17 | Smart 194 | Temperature |
| 18 | Smart 199 | UltraDMA CRC error count |
| 19 | Time | Timestamp |
| 20 | id | Anonymised server identifier |
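
The SMART rows of the table can be captured as a small lookup, which may be handy when labelling plots or reports. This is a minimal sketch: the attribute IDs and descriptions come from the table, while the helper `describe_smart_column` and the assumption that column names look like `Smart 188` are illustrative and may not match the headers in the data files.

```python
# SMART attribute IDs and their descriptions, as listed in the table.
SMART_ATTRIBUTES = {
    1: "Read error rate",
    3: "Spin-up time",
    4: "Start/stop count",
    5: "Reallocated sectors count",
    7: "Seek error rate",
    9: "Power-on hours",
    187: "Reported uncorrectable errors",
    188: "Command timeout",
    194: "Temperature",
    197: "Current pending sector count",
    198: "Uncorrectable sector count",
    199: "UltraDMA CRC error count",
}


def describe_smart_column(column_name: str) -> str:
    """Return the description for a name such as 'Smart 188'.

    Assumption: the numeric SMART ID is the last whitespace-separated
    token of the column name. Adjust if the real headers differ.
    """
    attribute_id = int(column_name.split()[-1])
    return SMART_ATTRIBUTES[attribute_id]
```

For example, `describe_smart_column("Smart 9")` looks up ID 9 in the mapping.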

Directory structure

  • Root
    • README.md
    • anonymised.py - The code used for anonymisation.
    • data - The directory that contains the actual data (total 101 files).
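
A quick way to inspect the `data` directory is to count the records in each file. The sketch below assumes the files are CSV with a header row and a `.csv` extension; the file format is not stated here, so adapt the glob pattern and reader if it differs. The function name `count_rows_per_file` is hypothetical.

```python
import csv
from pathlib import Path


def count_rows_per_file(data_dir):
    """Map each CSV file name in data_dir to its number of data rows.

    Assumption: every file is a CSV whose first line is a header.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)  # consumes the header row
            counts[path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    # Run from the repository root so that "data" resolves correctly.
    for name, rows in count_rows_per_file("data").items():
        print(name, rows)
```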