Information technology - Data centre facilities and infrastructures - Part 4-31: Key performance indicators for Resilience

This document
a)   defines metrics as key performance indicators (KPIs) for resilience, dependability, fault tolerance and availability tolerance for data centres;
b)   covers the data centre infrastructure (DCI) of power distribution and supply, and environmental control;
c)   can be referred to for covering further infrastructures, e.g. telecommunications cabling;
d)   defines the measurement and calculation of the metrics and resilience levels (RLs);
e)   targets maintainability, recoverability and vulnerability;
f)   provides examples for calculating these KPIs for the purpose of analytical comparison of different DCIs.
This document does not apply to IT equipment, cloud services, software or business applications.

Informationstechnik - Einrichtungen und Infrastrukturen von Rechenzentren - Teil 4-31: Leistungskennzahlen für die Ausfallsicherheit

Technologie de l’information - Installation et infrastructures de centres de traitement de données - Partie 4-31: Indicateurs-clés de performance pour la résilience

Informacijska tehnologija - Objekti in infrastrukture podatkovnega centra – 4-31. del: Ključni kazalniki uspešnosti za odpornost

General Information

Status
Published
Public Enquiry End Date
29-Jun-2024
Publication Date
24-Oct-2024
Current Stage
6060 - National Implementation/Publication (Adopted Project)
Start Date
24-Oct-2024
Due Date
29-Dec-2024
Completion Date
25-Oct-2024
Technical specification
SIST-TS CLC/TS 50600-4-31:2024
English language
66 pages

Standards Content (Sample)


SLOVENIAN STANDARD
01-December-2024
Informacijska tehnologija - Objekti in infrastrukture podatkovnega centra – 4-31. del: Ključni kazalniki uspešnosti za odpornost
Information technology - Data centre facilities and infrastructures - Part 4-31: Key performance indicators for Resilience
Informationstechnik - Einrichtungen und Infrastrukturen von Rechenzentren - Teil 4-31: Leistungskennzahlen für die Ausfallsicherheit
Technologie de l’information - Installation et infrastructures de centres de traitement de données - Partie 4-31: Indicateurs-clés de performance pour la résilience
This Slovenian standard is identical to: CLC/TS 50600-4-31:2024
ICS:
35.020 Information technology (IT) in general
35.110 Networking
35.160 Microprocessor systems
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

TECHNICAL SPECIFICATION CLC/TS 50600-4-31

SPÉCIFICATION TECHNIQUE
TECHNISCHE SPEZIFIKATION October 2024
ICS 35.020; 35.110; 35.160
English Version
Information technology - Data centre facilities and infrastructures - Part 4-31: Key performance indicators for Resilience
Technologie de l'information - Installation et infrastructures de centres de traitement de données - Partie 4-31: Indicateurs-clés de performance pour la résilience
Informationstechnik - Einrichtungen und Infrastrukturen von Rechenzentren - Teil 4-31: Leistungskennzahlen für die Resilienz
This Technical Specification was approved by CENELEC on 2024-09-02.

CENELEC members are required to announce the existence of this TS in the same way as for an EN and to make the TS available promptly
at national level in an appropriate form. It is permissible to keep conflicting national standards in force.

CENELEC members are the national electrotechnical committees of Austria, Belgium, Bulgaria, Croatia, Cyprus, the Czech Republic,
Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the
Netherlands, Norway, Poland, Portugal, Republic of North Macedonia, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland,
Türkiye and the United Kingdom.

European Committee for Electrotechnical Standardization
Comité Européen de Normalisation Electrotechnique
Europäisches Komitee für Elektrotechnische Normung
CEN-CENELEC Management Centre: Rue de la Science 23, B-1040 Brussels
© 2024 CENELEC All rights of exploitation in any form and by any means reserved worldwide for CENELEC Members.
Ref. No. CLC/TS 50600-4-31:2024 E

Contents
European foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions, abbreviations and symbols
3.1 Terms and definitions
3.2 Abbreviations
3.3 Symbols
4 Area of application
4.1 General
4.2 DCI service definition
5 Resilience considerations as part of the life cycle
5.1 Implementation in the design process
5.2 Documentation during operation
5.3 Documentation of resilience level
5.4 Documentation of dependability
5.5 Documentation of fault tolerance
5.6 Documentation of availability tolerance
6 Determination of KPIs for resilience
6.1 General
6.2 Structuring of the KPIs for resilience
6.3 Dependability
6.4 Fault tolerance
6.5 Availability tolerance
6.6 Resilience level
6.7 Application to data centre infrastructures
Annex A (informative) Failure Mode Effects and Criticality Analysis
Annex B (informative) Dependability data
Annex C (informative) Resilience analysis for data centre infrastructures
Annex D (informative) SPoF Analysis for DCIs
Annex E (informative) Resilience level analysis for DCIs
Annex F (informative) Interval of confidence
F.1 Overview
F.2 Estimation of the mean failure rate
F.3 Estimation of the boundaries of the failure rate
F.4 Case when no failure has appeared
F.4.1 General
F.4.2 Example 1
F.4.3 Example 2
Bibliography
European foreword
This document (CLC/TS 50600-4-31:2024) has been prepared by CLC/TC 215, “Electrotechnical aspects of
telecommunication equipment”.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. CENELEC shall not be held responsible for identifying any or all such patent rights.
This document is based on, but not identical to, ISO/IEC TS 22237-31:2023.
This document has been prepared under a standardization request addressed to CENELEC by the European
Commission. The Standing Committee of the EFTA States subsequently approves these requests for its
Member States.
Any feedback and questions on this document should be directed to the users’ national committee. A complete
listing of these bodies can be found on the CENELEC website.

Introduction
The unrestricted access to internet-based information demanded by the information society has led to an
exponential growth of both internet traffic and the volume of stored/retrieved data. Data centres house and
support the information technology and network telecommunications equipment for data processing, data
storage and data transport. They are required both by network operators (delivering those services to customer
premises) and by enterprises within those customer premises.
Data centres usually provide modular, scalable and flexible facilities and infrastructures to easily accommodate
the rapidly changing requirements of the market. In addition, energy consumption of data centres has become
critical both from an environmental point of view (reduction of environmental footprint) and with respect to
economical considerations (cost of energy) for the data centre operator.
The implementation of data centres varies in terms of:
a) purpose (enterprise, co-location, co-hosting or network operator facilities);
b) security level;
c) physical size;
d) accommodation (mobile, temporary and permanent constructions).
The needs of data centres also vary in terms of availability of service, the provision of security and the objectives
for energy efficiency. These needs and objectives influence the design of data centres in terms of building
construction, power distribution, environmental control, telecommunications cabling and physical security as
well as the operation of the data centre. Effective management and operational information are important in
order to monitor achievement of the defined needs and objectives.
Recognizing the substantial resource consumption, particularly of energy, of larger data centres, it is also
important to provide tools for the assessment of that consumption both in terms of overall value and of source
mix and to provide Key Performance Indicators (KPIs) to evaluate trends and drive performance improvements.
At the time of publication of this document, the EN 50600 series is designed as a framework of standards,
technical specifications and technical reports covering the design, the operation and management, the key
performance indicators for energy efficient operation of the data centre as well as a data centre maturity model.
The EN 50600-2 series defines the requirements for the data centre design.
The EN 50600-3 series defines the requirements for the operation and the management of the data centre.
The EN 50600-4 series defines the key performance indicators for the data centre.
The CLC/TS 50600-5 series defines the data centre maturity model requirements and recommendations.
The CLC/TR 50600-99-X Technical Reports cover recommended practices and guidance for specific topics
around data centre operation and design.
This series of documents specifies requirements and recommendations to support the various parties involved
in the design, planning, procurement, integration, installation, operation and maintenance of facilities and
infrastructures within data centres. These parties include:
1) owners, operators, facility managers, ICT managers, project managers, main contractors;
2) consulting engineers, architects, building designers and builders, system and installation designers,
auditors, test and commissioning agents;
3) facility and infrastructure integrators, suppliers of equipment;
4) installers, maintainers.
At the time of publication of this document, the EN 50600-4 series comprises the following documents:
— EN 50600-4-1, Information technology — Data centre facilities and infrastructures — Part 4-1: Overview of
and general requirements for key performance indicators;
— EN 50600-4-2, Information technology — Data centre facilities and infrastructures — Part 4-2: Power
Usage Effectiveness;
— EN 50600-4-3, Information technology — Data centre facilities and infrastructures — Part 4-3: Renewable
Energy Factor;
— EN 50600-4-6, Information technology — Data centre facilities and infrastructures — Part 4-6: Energy
Reuse Factor;
— EN 50600-4-7, Information technology — Data centre facilities and infrastructures — Part 4-7: Cooling
Efficiency Ratio;
— EN 50600-4-8, Information technology — Data centre facilities and infrastructures — Part 4-8: Carbon
Usage Effectiveness;
— EN 50600-4-9, Information technology — Data centre facilities and infrastructures — Part 4-9: Water
Usage Effectiveness.
The inter-relationship of the documents within the EN 50600 series is shown in Figure 1.

Figure 1 — Schematic relationship between the EN 50600 series documents
EN 50600-2-X documents specify requirements and recommendations for particular facilities and infrastructures
to support the relevant classification for “availability”, “physical security” and “energy efficiency enablement”
selected from EN 50600-1.
EN 50600-3-X documents specify requirements and recommendations for data centre operations, processes
and management.
EN 50600-4-X documents specify requirements and recommendations for key performance indicators (KPIs)
used to assess and improve the resource usage efficiency and effectiveness, respectively, of a data centre.
NOTE Within the EN 50600-4-X series, the term “resource usage effectiveness” is more generally used for KPIs in
preference to “resource usage efficiency”, which is restricted to situations where the input and output parameters used to
define the KPI have the same units.
The various parts of the EN 50600 series reference four qualitative Availability Classes as well as structural
definitions to categorize different designs. The documents also refer to resilience criteria in order to improve
structural requirements for a qualitative approach.

This document introduces quantitative metrics as key performance indicators (KPIs), in order to meet the
requirements necessary for evaluating or comparing different designs or to validate service level agreements
(SLAs) for data centres. The proposed KPIs cover resilience attributes, including dependability and fault
tolerance metrics. The characteristics of aging of infrastructures are covered by reliability criteria.
Through the use of KPIs, the comparison of designs, functional elements and components of infrastructure
designs becomes possible. In addition, it is possible to optimize data centre infrastructures (DCI) with holistic
targets. It is recommended to use the KPIs of this document in combination with the efficiency and sustainability
KPIs of the EN 50600-4 series.
EN 50600-1:2019, Annex A, demonstrates that a single KPI, such as Availability, is not sufficient to describe
the complexity of a DCI. In recognition, this document has been developed in order to compare and value
different designs with different Availability Classes of DCIs based on a set of selected KPIs.
Furthermore, the document has been created to establish KPIs for resilience of DCIs with defined resilience
levels. The resilience objectives can vary depending on the outcome of the EN 50600-1 risk analysis, the end
user information technology equipment (ITE) process criticality, and the data centre type of business.
Using the different stages of a data centre design process, this document describes in which phases the
application of KPIs for resilience is appropriate. With its assistance, data centre designers, planners and
operators will be supported in defining resilience levels, performing theoretical assessments and designing and
operating DCIs which are able to meet SLAs.
Additional standards in the EN 50600-4-X series will be developed, each describing a specific KPI for resource
usage effectiveness or efficiency.
The EN 50600-4-X series does not specify limits or targets for any KPI and does not describe or imply, unless
specifically stated, any form of aggregation of individual KPIs into a combined or an overall KPI for data centre
resource usage effectiveness or efficiency.
This document is intended for use by and collaboration between architects, building designers and builders,
system and installation designers and main contractors.
This series of documents does not address the selection of information technology and network
telecommunications equipment, software and associated configuration issues.

1 Scope
This document
a) defines metrics as key performance indicators (KPIs) for resilience, dependability, fault tolerance and
availability tolerance for data centres;
b) covers the data centre infrastructure (DCI) of power distribution and supply, and environmental control;
c) can be referred to for covering further infrastructures, e.g. telecommunications cabling;
d) defines the measurement and calculation of the metrics and resilience levels (RLs);
e) targets maintainability, recoverability and vulnerability;
f) provides examples for calculating these KPIs for the purpose of analytical comparison of different DCIs.
This document does not apply to IT equipment, cloud services, software or business applications.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references, the
latest edition of the referenced document (including any amendments) applies.
EN 50600-1:2019, Information technology — Data centre facilities and infrastructures — Part 1: General
concepts
EN 50600-2-2:2019, Information technology — Data centre facilities and infrastructures — Part 2-2: Power
supply and distribution
EN 50600-2-3, Information technology — Data centre facilities and infrastructures — Part 2-3: Environmental
control
EN 50600-4-1, Information technology — Data centre facilities and infrastructures — Part 4-1: Overview of and
general requirements for key performance indicators
3 Terms, definitions, abbreviations and symbols
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in EN 50600-1, EN 50600-2-2,
EN 50600-2-3 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— IEC Electropedia: available at https://www.electropedia.org/
— ISO Online browsing platform: available at https://www.iso.org/obp
3.1.1
availability
ability to be in a state to perform as required
[SOURCE: IEC 60050-192:2015, 192-01-23 – modified: Notes 1 and 2 have been deleted]
3.1.2
availability tolerance
ability to be in a state to perform as required with certain failures present

3.1.3
dependability
ability to perform as and when required
Note 1 to entry: In this document, the term is used for the determination of data centre reliability, availability and failure
rate.
[SOURCE: IEC 60050-192:2015, 192-01-22 – modified: Notes 1 and 2 to entry have been replaced by a new
Note 1 to entry]
3.1.4
double point of failure
combination of two functional elements whose simultaneous failures cause overall system fault
[SOURCE: IET, Journal of Engineering, Vol. 2019 Iss. 12, pp. 8419-8427] [13]
3.1.5
double point of reduced availability
combination of two functional elements whose simultaneous failures result in the violation of the service level
agreement
[SOURCE: IET, Journal of Engineering, Vol. 2019 Iss. 12, pp. 8419-8427] [13]
3.1.6
down state
state of being unable to perform as required, due to failures or faults
Note 1 to entry: The state can be related to failures of items or faults at a specified operation point (OP).
[SOURCE: IEC 60050-192:2015, 192-02-20, modified – definition has been reworded and Notes 1 and 2 have
been replaced by a new Note to entry]
3.1.7
event
something that happens and leads to one or more failures or faults
3.1.8
failure
loss of ability to perform as required
Note 1 to entry: In this context it is irrelevant if the cause was planned or unplanned.
[SOURCE: IEC 60050-192:2015, 192-03-01 – modified: Notes 1 to 3 to entry have been replaced by Note 1 to
entry]
3.1.9
failure rate
limit of the ratio of the conditional probability that the instant of time, T, of a failure of a product falls within a
given time interval (t, t + Δt) and the duration of this interval, Δt, when Δt tends towards zero, given that the item
is in an up state at the start of the time interval
[SOURCE: IEC 60050-821:2017, 821-12-21 – modified: Notes 1 and 2 have been deleted]
3.1.10
fault
inability to perform as required, due to an internal state
Note 1 to entry: Opposite of success. In the context of the expected resilience level (RL), at a specified operation point (OP).

[SOURCE: IEC 60050-192:2015, 192-04-01, modified – Notes 1 to 4 have been replaced by a new Note to
entry]
3.1.11
fault tolerance
ability to continue functioning with certain faults present
[SOURCE: IEC 60050-192:2015, 192-10-09]
3.1.12
information technology equipment
equipment providing data storage, processing and transport services together with equipment dedicated to
providing direct connection to core and/or access networks
[SOURCE: EN 50600-2-2:2019, 3.1.13]
3.1.13
infrastructure
technical systems providing functional capability of the data centre
Note 1 to entry: Examples are power distribution, environmental control, telecommunications cabling, physical security.
[SOURCE: EN 50600-1:2019, 3.1.22, modified: examples were moved into Note 1 to entry and
“telecommunications cabling” has been added to the list]
3.1.14
inherent availability
availability provided by the design under ideal conditions of operation and maintenance
Note 1 to entry: Delays associated with maintenance, such as logistic and administrative delays, are excluded.
[SOURCE: IEC 60050-192:2015, 192-08-02]
3.1.15
mean down time
average downtime caused by scheduled and unscheduled maintenance, including any logistics time
Note 1 to entry: For the purposes of this document, this definition deliberately differs from that given in
IEC 60050-192:2015, 192-08-10.
[SOURCE: IEEE Std. 493-2007, Annex Q]
3.1.16
mean operating time between failures
average time calculated between failure occurrences
Note 1 to entry: For the purposes of this document, this definition deliberately differs from that given in
IEC 60050-192:2015, 192-05-13.
[SOURCE: IEEE Std. 493-2007, Annex Q]
3.1.17
mean operating time to failure
expectation of the operating time to failure
Note 1 to entry: In the case of non-repairable items with an exponential distribution of operating times to failure, i.e. a
constant failure rate, the mean operating time to failure is numerically equal to the reciprocal of the failure rate. This is also
true for repairable items if after restoration they can be considered to be “as-good-as-new”.
Note 2 to entry: The term “mean time to failure” (MTTF) is used synonymously in this document.

[SOURCE: IEC 60050-192:2015, 192-05-11 – modified: Note 2 to entry was replaced by the one above]
3.1.18
mean time between maintenance
average time between scheduled and unscheduled maintenance, including any logistics time
[SOURCE: IEEE Std. 493-2007, Annex Q]
3.1.19
mean time to restoration
average time to accomplish repairs on an item
Note 1 to entry: For the purposes of this document, this definition deliberately differs from that given in IEC 60050-
192:2015, 192-07-23.
[SOURCE: IEEE Std. 493-2007, Annex Q]
3.1.20
normal resilience level
resilience level mandatory during nominal operation
3.1.21
operation point
point of reference for which calculation of resilience level is performed
Note 1 to entry: This can be an individual socket taking into account the entire data centre infrastructure (DCI) or certain
defined parts of the infrastructure. The documentation of the referenced operation point (OP) is required for any key
performance indicator (KPI).
3.1.22
operational availability
availability experienced under actual conditions of operation and maintenance
[SOURCE: IEC 60050-192:2015, 192-08-03 – modified: Note 1 to entry has been deleted]
3.1.23
past availability
availability measured during a period of 1 year
Note 1 to entry: For the purposes of this document, 1 year equals 8 760 hours.
3.1.24
reduced resilience level
resilience level mandatory during reduced operation in case of one or more failures
3.1.25
resilience
ability to withstand and reduce the magnitude and/or duration of disruptive events, including the capability to
anticipate, absorb, adapt to, and/or rapidly recover from such an event
[SOURCE: IEEE Task Force on Definition and Quantification of Resilience, PES-TR65:2018-04] [14]
3.1.26
resilience level
enumeration of attributes for the determination of resilience aspects of a defined service at a defined operation
point (OP)
3.1.27
redundancy
provision of more than one means for performing a function
Note 1 to entry: In a data centre, redundancy can be achieved by duplication of devices, functional elements, and/or supply
paths.
[SOURCE: IEC 60050-192:2015, 192-10-02, modified – Note 1 to entry has been replaced by a new Note 1 to
entry.]
3.1.28
reliability
ability to perform as required, without failure, for a given time interval, under given conditions
[SOURCE: IEC 60050-192:2015, 192-01-24, modified – Notes 1 to 3 to entry have been deleted.]
3.1.29
resilience model
representation $x$ of the data centre infrastructure (DCI) that shows all required subsystems, components and
items as well as their systemic interdependencies
3.1.30
service level agreement
agreement defining the content and quality of the service to be delivered and the timescale in which it is to be
delivered
[SOURCE: EN 50600-3-1:2016, 3.1.20]
3.1.31
single point of failure
functional element whose failure causes overall system fault
[SOURCE: IET, Journal of Engineering, Vol. 2019 Iss. 12, pp. 8419-8427] [13]
3.1.32
single point of reduced availability
functional element whose failure results in the violation of the service level agreement
[SOURCE: IET, Journal of Engineering, Vol. 2019 Iss. 12, pp. 8419-8427] [13]
3.1.33
socket
connection enabling supply of power to attached equipment
Note 1 to entry: This can be a de-mateable or a hardwired connection.
[SOURCE: EN 50600-2-2:2019, 3.1.29]
3.1.34
system success path
infrastructural path, consisting of a minimum of functional elements, to express the success of the infrastructure
system at the operation point (OP) to be in the up state
Note 1 to entry: Each functional element can consist of one or more devices.
3.1.35
time interval
part of the time axis limited by two instants
[SOURCE: IEC 60050-113:2011, 113-01-10, modified – Notes 1 to 3 have been deleted.]

3.1.36
up state
state of being able to perform as required
Note 1 to entry: The state can be related to items or to a specified operation point (OP).
[SOURCE: IEC 60050-192:2015, 192-02-01 modified – Notes 1 to 4 have been deleted and replaced by a new
Note to entry.]
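The numerical relationships mentioned in the notes to 3.1.17 and 3.1.23 can be illustrated with a short calculation. The following is a minimal sketch and not part of the specification. It assumes a constant failure rate (exponential distribution), for which the mean operating time to failure is the reciprocal of the failure rate, as stated in Note 1 to entry of 3.1.17. The reliability function exp(-λt) is the standard result for that distribution, and the past availability is computed here as a simple up-time ratio over 8 760 hours, which is an assumption of this sketch. All function names and numeric values are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 3.1.23, Note 1 to entry: 1 year equals 8 760 hours


def mttf_from_failure_rate(failure_rate_per_hour: float) -> float:
    """MTTF for a constant failure rate (3.1.17, Note 1 to entry)."""
    return 1.0 / failure_rate_per_hour


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of no failure in the interval, assuming an exponential distribution."""
    return math.exp(-failure_rate_per_hour * hours)


def past_availability(down_hours: float) -> float:
    """Share of one year spent in the up state (assumed up-time ratio)."""
    return (HOURS_PER_YEAR - down_hours) / HOURS_PER_YEAR


failure_rate = 1.0e-5  # hypothetical value, failures per hour
print(mttf_from_failure_rate(failure_rate))       # MTTF in hours
print(reliability(failure_rate, HOURS_PER_YEAR))  # reliability over one year
print(past_availability(down_hours=4.38))         # availability for 4.38 h of downtime
```

The sketch only covers the constant-failure-rate case. Ageing effects or other distributions would require a different reliability function.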
3.2 Abbreviations
For the purposes of this document, the abbreviations given in EN 50600-1, EN 50600-4-1 and the following
apply.
DCI data centre infrastructure (infrastructure residing within a data centre)
DPoF double point of failure
DPoRA double point of reduced availability
FAT factory acceptance test
FMECA Failure Mode Effects and Criticality Analysis
ITE information technology equipment
KPI key performance indicator
MDT mean down time
MTBF mean operating time between failures
MTBM mean time between maintenance
MTTF mean time to failure
MTTR mean time to restoration
NRL normal resilience level
OP operation point
PDF probability density function
PREP power reliability enhancement program
RBD reliability block diagram
RL resilience level
RRL reduced resilience level
SLA service level agreement
SPoF single point of failure
SPoRA single point of reduced availability
SSP system success path
UPS Uninterruptible Power System
3.3 Symbols
For the purposes of this document, the symbols given in EN 50600-1, EN 50600-4-1 and the following apply.
$\alpha$ confidence rate
$A_{\mathrm{i}}$ inherent availability
$A_{\mathrm{o}}$ operational availability
$A_{\mathrm{o,NRL}}$ normal resilience level operational availability
$A_{\mathrm{o,req}}$ required operational availability
$A_{\mathrm{o,RRL}}$ reduced resilience level operational availability
$A_{\mathrm{p}}$ past availability
$\chi$ chi-square distribution function law with two degrees of freedom
$D_{\mathrm{N}}$ nominal diameter
$D(x)$ disjoint sum of system success paths of $x$
$\Delta t$ duration of time interval
$f$ frequency
$f(t)$ probability density function (PDF)
$I_{\mathrm{N}}$ nominal current
$\lambda_{\mathrm{i}}$ inherent failure rate
$\lambda_{\mathrm{mean}}$ mean failure rate
$\lambda_{\mathrm{o}}$ operational failure rate
$\lambda_{\mathrm{p}}$ past failure rate
$N_{\mathrm{f}}$ number of failures during time interval $t$
$N_{x}$ number of $x$
$P_{\mathrm{N}}$ nominal power
$Q_{\mathrm{N}}$ nominal cooling capacity
$R(t)$ reliability in time interval $t$
$R_{\mathrm{i}}$ inherent reliability
$R_{\mathrm{o}}$ operational reliability
$R_{\mathrm{p}}$ past reliability
$S(x)$ success, $x$ is in the up state
$S(x_{\mathrm{E}})$ environmental control success function
$S(x_{\mathrm{OP}})$ overall success function
$S(x_{\mathrm{P}})$ power and distribution success function
$t_{x}$ time interval of $x$
$T$ instant of time
$U_{\mathrm{N}}$ nominal voltage
$x_{m}$ vector of elements of $x$ of the mth DCI
$x_{m(i)}$ functional element X of the mth DCI with the index $i$
$X_{m}$ set of all functional elements X of the mth DCI
4 Area of application
4.1 General
The KPIs for resilience, including the dependability, fault tolerance and availability tolerance KPIs, as specified
in this document are associated with the DCIs of the EN 50600 series:
a) EN 50600-2-2: Power supply and distribution;
b) EN 50600-2-3: Environmental control.
The application may be extended to additional infrastructures, e.g. EN 50600-2-4 (telecommunications cabling
infrastructure).
4.2 DCI service definition
To determine system success at the operation point (OP), it is required to define the relevant DCI. In general,
the overall success function $S(x_{\mathrm{OP}})$ is represented by a certain number, $N$, of successes of infrastructures
inside the DCI as shown in Formula (1):
$$S(x_{\mathrm{OP}}) = \bigcap_{m=1}^{N} S(x_{m}) \qquad (1)$$
The success $S(x_{m})$ of the enumerated infrastructures $x_{m}$ is connected by the ∩ operator. In general, these
infrastructures are not mutually exclusive, because the functions depend on each other. Functional
dependencies shall be taken into account in the calculations.
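Formula (1) can be read as a logical conjunction: the operation point is in the up state only if every enumerated infrastructure is. The following is a minimal sketch of that reading, not part of the specification. The function name and the snapshot of three infrastructures are hypothetical, and functional dependencies between infrastructures, which the text requires to be taken into account, are deliberately not modelled here.

```python
from typing import Mapping


def overall_success(infrastructure_success: Mapping[str, bool]) -> bool:
    """S(x_OP): True only if every enumerated infrastructure is in the up state."""
    return all(infrastructure_success.values())


# Hypothetical snapshot of N = 3 infrastructures at one operation point.
snapshot = {"power": True, "cooling": True, "cabling": False}
print(overall_success(snapshot))  # False: one infrastructure is down
```

A boolean per infrastructure is the simplest possible representation. A real model would derive each value from the state of the underlying functional elements.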
To operate the information technology equipment (ITE) within the permitted parameters, the service success
requires:
— adequate service quality of the power supply and distribution, fed by the sockets;
— adequate service quality of the cooling by the environmental control.
The DCI is represented by the vector $x$, which refers to Formula (1). The operation of the DCI is considered to
be successful if power supply and distribution $S(x_{\mathrm{P}})$ and environmental control $S(x_{\mathrm{E}})$ are by themselves
operating successfully at the specified OP. Formula (2) defines the system success function as follows:
$$S(x_{\mathrm{OP}}) = S(x_{\mathrm{P}}) \cap S(x_{\mathrm{E}}) \qquad (2)$$
The operation of the power supply and distribution system is deemed successful, $S(x_{\mathrm{P}}) = 1$, if the
infrastructure provides the required power quality to the specific socket defined as OP. A violation of the power
quality, as required by the ITE at a specific socket, is defined as a failure: $S(x_{\mathrm{P}}) = 0$. The cause of the failure
can be planned or unplanned.
The operation of the environmental control system is deemed successful, $S(x_{\mathrm{E}}) = 1$, if the environmental
requirements of the ITE at the specified socket defined as OP are satisfied. A violation of the environmental
conditions of a specific functional element or device is defined as a failure: $S(x_{\mathrm{E}}) = 0$. The cause of the failure
can be planned or unplanned.
A failure or the combination of failures which leads to $S(x_{\mathrm{OP}}) = 0$ is deemed a fault. For calculation purposes
using Formula (2), the following criteria shall be taken into account:
a) The power and cooling capacity of the entire DCI shall be specified.
b) The OP shall be selected in relation to the outcome of the risk analysis.
c) The specified power and cooling capacity shall be given for the selected OP.
d) The service quality of power supply and distribution and environmental control at the selected OP shall be
represented by the DCI model.
The selection of the OP depends on the specific task. In general, the OPs with the highest requirements of
service quality are of relevance.
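The success functions of Formulae (1) and (2) can be illustrated with a minimal Python sketch. It assumes that the success state of each infrastructure at the selected OP has already been evaluated, so functional dependencies would have to be resolved before this step. The function name `system_success` and the keys "P" and "E" are illustrative and not part of this document.

```python
from typing import Mapping


def system_success(infra_success: Mapping[str, bool]) -> bool:
    """Formula (1): success at the OP requires success of every infrastructure."""
    return all(infra_success.values())


# Formula (2): power supply and distribution (P) and environmental control (E)
# at one hypothetical OP.
nominal = {"P": True, "E": True}
cooling_lost = {"P": True, "E": False}

print(system_success(nominal))       # True
print(system_success(cooling_lost))  # False
```

Adding a further infrastructure, for example telecommunications cabling, would only add one more entry to the mapping.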
5 Resilience considerations as part of the life cycle
5.1 Implementation in the design process
5.1.1 General
According to EN 50600-1, the data centre design process is split into 11 project phases. The resilience of the
DCI can be managed all along the life cycle, from the strategy phase (1) until the operation phase (11). In
particular, the KPIs for resilience are applied in the following phases.
5.1.2 Phase 1 — Strategy
Phase 1 is for information collection in order to define the project objectives. This phase requires the following:
a) gather the requirements, for example, SLAs;
b) decide about application of resilience KPIs for design;
c) decide about application of resilience KPIs for operation;
d) define the DCI services for application of KPIs for resilience.
5.1.3 Phase 2 — Objectives
Phase 2 is handled by the owner to convert the strategy into objectives. This phase requires the definition of
the resilience objectives according to the risk analysis respective to SLAs.
a) Define the OP, for example: protected/non-protected sockets, server racks, rack rows, etc.
b) Define the maximum accepted downtime at the OP, for example:
— the maximum time interval of loss of the power supply (see EN 50600-2-2);
— the maximum time interval of loss of the power distribution (see EN 50600-2-2);
— supply boundary that ITE can tolerate without experiencing unexpected shutdowns or malfunctions
(see Reference [17]);
— the maximum time interval of loss of the environmental control (see EN 50600-2-3);
— the maximum time of fault of the entire DCI.
c) Define the maximum accepted rate of failures at the OP that are deemed faults during the reporting interval.
d) Define the set of KPIs depending on the resilience objective, for example:
— dependability requirements (reliability, availability, failure rate);
— fault tolerance requirements (number of SPoF, number of DPoF);
— availability tolerance requirements (number of SPoRA, number of DPoRA).
The definitions of resilience objectives can be made by making the provisions of 6.6 mandatory during nominal
operation (NRL) and during reduced operation (RRL).
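The objectives listed in items a) to d) can be recorded in a simple data structure, as in the following sketch. All field names and values are hypothetical and serve only as an illustration; the actual limits result from the risk analysis and the SLAs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResilienceObjective:
    operating_point: str              # item a): e.g. a protected socket or a rack row
    max_downtime_h: dict[str, float]  # item b): accepted downtime per service, in hours
    max_faults_per_year: float        # item c): accepted rate of faults at the OP
    kpis: tuple[str, ...]             # item d): KPI set chosen for this objective

    def downtime_ok(self, service: str, observed_h: float) -> bool:
        """Check an observed downtime against the accepted maximum."""
        return observed_h <= self.max_downtime_h[service]


# Hypothetical objective for one OP; none of these values come from this document.
objective = ResilienceObjective(
    operating_point="protected socket, rack row A",
    max_downtime_h={"power_supply": 0.0, "environmental_control": 0.25},
    max_faults_per_year=0.1,
    kpis=("availability", "SPoF", "SPoRA"),
)

print(objective.downtime_ok("environmental_control", 0.1))  # True
```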
5.1.4 Phase 3 — System specifications
Phase 3 defines the target specifications for all infrastructures. The output of the specifications shall be validated
in accordance with the objectives of Phase 2.
5.1.5 Phase 4 — Design proposal
Phase 4 offers several options for a design proposal. This phase requires the following:
a) compare/optimize different designs through the application of KPIs for resilience;
b) approve compliance of the designs for the defined requirements.
5.1.6 Phase 6 — Functional design
Phase 6 offers the functional design. This phase requires the following:
a) approve the functional design through the application of KPIs for resilience.
5.1.7 Phase 8 — Final design and project plan
During Phase 8 the designer defines volume and/or pieces for all items of the DCI. To meet the resilience
objectives, the definitions made in previous phases shall be taken into account, with the help of the applied KPIs
for resilience.
5.1.8 Phase 10 — Construction
Phase 10 includes supervision and acceptance verification of the DCI, until it is put into service. The resilience
objectives shall be taken into account during the following:
a) factory acceptance tests (FATs);
b) equipment transportation and installation on site;
c) commissioning tests, such as functional performance tests (FPT) and integrated system tests (IST);
d) failure simulations on functional elements;
e) failure simulations on the entire DCI.
The outcome of this phase is deeper knowledge of the resilience properties of the DCI.

5.1.9 Phase 11 — Operation
Phase 11 describes the handover to the owner for operation. This phase requires the following:
a) approve compliance of the DCI for the assumptions of the KPIs used;
b) monitor the defined KPIs of resilience during operation;
c) approve compliance of the DCI for the defined requirements in case of planned interruptions, times for
logistics, response times;
d) review and, if required, recalculate the KPIs for resilience of the DCI.
5.2 Documentation during operation
Documentation of metrics and causes is the basis for optimization of resilience during operation. In order to
be able to monitor aspects of resilience, the organization shall document the following metrics:
a) MTBF and MTTR of the utility supply;
b) MTBF, MTTR, MTBM and MDT data of the functional elements or components;
c) causes for failures and/or faults;
d) causes and scope of restoration.
For evaluation and documentation of failures, the Failure Mode Effects and Criticality Analysis (FMECA) is
applicable. See Annex A.
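As an illustration of how items a) and b) can be derived from an operational log, the following sketch computes MTBF and MTTR for one reporting period. It assumes that MTBF is taken as the mean up time per failure and uses the widely used steady-state relation A = MTBF / (MTBF + MTTR). The function names and the log values are hypothetical, and the formulae of this document take precedence.

```python
def mtbf_mttr(period_h: float, outage_durations_h: list[float]) -> tuple[float, float]:
    """Return (MTBF, MTTR) in hours from the outages logged in one period."""
    failures = len(outage_durations_h)
    if failures == 0:
        raise ValueError("no failure recorded; MTBF and MTTR are undefined")
    downtime_h = sum(outage_durations_h)
    mtbf_h = (period_h - downtime_h) / failures  # mean up time per failure
    mttr_h = downtime_h / failures               # mean restoration time
    return mtbf_h, mttr_h


def steady_state_availability(mtbf_h: float, mttr_h: float) -> float:
    """Common textbook relation, used here as an assumption."""
    return mtbf_h / (mtbf_h + mttr_h)


# Hypothetical utility supply log: two outages in one year (8 760 h).
mtbf_h, mttr_h = mtbf_mttr(8760.0, [2.0, 4.0])
print(mtbf_h, mttr_h)                                      # 4377.0 3.0
print(f"{steady_state_availability(mtbf_h, mttr_h):.6f}")  # 0.999315
```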
5.3 Documentation of resilience level
5.3.1 General
In order to evaluate KPIs for resilience, the following information shall be provided:
a) the resilience model of the DCI;
b) the OPs studied and their load assumptions;
c) the MTBF, MTTR, MTBM and MDT data of the functional elements or components;
d) the number of SPoF and DPoF;
e) if applicable, the number of SPoRA and DPoRA;
f) the calculation method.
Periods of runtime shall be documented on an annual basis, where 1 a = 8 760 h.
The recalculation of the resilience KPIs is required after an incident that involves structural modifications as well
as modifications to functional elements. Structural change requires the review and, if necessary, the revision
of the resilience model.
5.3.2 Requirements
Cause and duration of violations of the resilience level shall be documented to calculate the past reliability, past
availability, and past failure rate.
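A minimal sketch of such a record is given below. It assumes that past availability is the fraction of the reporting period without violation and that the past failure rate is the number of violations per hour of the reporting period. Both are assumptions made for illustration, as are the class and function names; past reliability is not covered.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760.0  # 1 a = 8 760 h


@dataclass(frozen=True)
class Violation:
    cause: str
    duration_h: float


def past_availability(violations: list[Violation],
                      period_h: float = HOURS_PER_YEAR) -> float:
    """Share of the reporting period in which the resilience level was met."""
    downtime_h = sum(v.duration_h for v in violations)
    return 1.0 - downtime_h / period_h


def past_failure_rate(violations: list[Violation],
                      period_h: float = HOURS_PER_YEAR) -> float:
    """Violations per hour of the reporting period (assumed definition)."""
    return len(violations) / period_h


# Hypothetical log for one year of operation.
log = [Violation("utility outage", 1.5), Violation("chiller fault", 0.75)]
print(past_availability(log), past_failure_rate(log))
```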

5.4 Documentation of dependability
5.4.1 Requirements
In general, reliability, availability and failure rate shall be reported at a minimum of four and a maximum of six
decimal places. The chosen OP and the load assumption of the DCI shall always be quoted alongside
documented values.
To gauge the availability KPI, a corresponding NRL shall be defined.
5.4.2 Recommendations
To distinguish between calculated availabilities, i.e. the inherent availability, the operational availability, and the
measured past availability of a data centre in operation, the measurement of $A_p$ (past availability) should be
documented in percentage terms. This is also applicable to the measurement of the past reliability, $R_p$, and the
past failure rate, $\lambda_p$.
A reduced resilience level (RRL) during periods of planned reconstruction, adaptation or renewal should be
defined.
To avoid rounding errors, the data of the system's items should be used with a precision at least one order of
magnitude higher than that of the KPIs to be calculated.
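The reporting rules of 5.4.1 and 5.4.2 can be sketched as a small formatting helper. The function name `format_kpi` is hypothetical, and applying the number of decimal places to the value as printed, including percentage values, is an assumption of this sketch.

```python
def format_kpi(value: float, decimals: int = 6, percent: bool = False) -> str:
    """Format a dependability KPI with four to six decimal places."""
    if not 4 <= decimals <= 6:
        raise ValueError("use a minimum of four and a maximum of six decimal places")
    if percent:
        # Measured past values, e.g. past availability, in percentage terms.
        return f"{value * 100:.{decimals}f} %"
    return f"{value:.{decimals}f}"


print(format_kpi(0.9993150685))                            # 0.999315
print(format_kpi(0.9993150685, decimals=4, percent=True))  # 99.9315 %
```

The input value carries more digits than the reported KPI, in line with the recommendation on rounding errors.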
5.5 Documentation of fault tolerance
The number of SPoF and DPoF shall be documented as integers; see Formulae (14), (15). Based on the
resilience model of the DCI, the KPIs of SPoF and DPoF shall be calculated.
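Counting SPoF and DPoF on a resilience model can be sketched by enumerating single and double failures against the success function. The topology below is a hypothetical toy model, not an example from this document. Pairs that contain an SPoF are excluded from the DPoF count as an assumption; Formulae (14) and (15) remain the authoritative definitions.

```python
from itertools import combinations

# Hypothetical DCI: one utility feed, one PDU, two redundant UPSs, two redundant chillers.
ELEMENTS = ("utility", "pdu", "ups_a", "ups_b", "chiller_1", "chiller_2")


def success(failed: frozenset) -> bool:
    """Success at the OP for a given set of failed elements."""
    def up(element: str) -> bool:
        return element not in failed

    power = up("utility") and up("pdu") and (up("ups_a") or up("ups_b"))
    cooling = up("chiller_1") or up("chiller_2")
    return power and cooling


def count_points_of_failure(elements, success_fn) -> tuple[int, int]:
    """Return (number of SPoF, number of DPoF) by exhaustive enumeration."""
    spof = {e for e in elements if not success_fn(frozenset({e}))}
    dpof = [
        pair for pair in combinations(elements, 2)
        if not spof.intersection(pair) and not success_fn(frozenset(pair))
    ]
    return len(spof), len(dpof)


print(count_points_of_failure(ELEMENTS, success))  # (2, 2)
```

In this toy model the utility feed and the PDU are the SPoF, and the two UPSs and the two chillers form the DPoF pairs.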
5.6 Documentation of availability tolerance
5.6.1 Requirements
The number of SPoRA and DPoRA shall be documented as integers; see Formulae (16), (17).
The RRL, as a condition of planned maintenance, shall be defined. Based on the resilience model of the DCI,
the operational availability for all cases of SPoF and DPoF shall be calculated. The number of violations of
$A_{o,RRL}$ in cases of SPoF gives the number of SPoRA, and in cases of DPoF gives the number of DPoRA.
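The counting step can be sketched as follows. The sketch assumes that the operational availability has already been calculated for each case from the resilience model and that a violation means a value strictly below the required availability during reduced operation. The case names and all values are hypothetical.

```python
def count_violations(case_availability: dict[str, float], a_o_rrl: float) -> int:
    """Count the cases whose operational availability falls below the RRL requirement."""
    return sum(1 for a in case_availability.values() if a < a_o_rrl)


# Hypothetical operational availabilities, one per single-element case.
single_cases = {
    "ups_a out of service": 0.99990,
    "chiller_1 out of service": 0.99940,
    "pdu_b out of service": 0.99995,
}

spora = count_violations(single_cases, a_o_rrl=0.9995)
print(spora)  # 1
```

The same function applied to the double-element cases would give the number of DPoRA.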
5.6.2 Recommendations
Comparing DCI models in terms of the number of SPoRA and DPoRA allows deeper insights into the resilience
characteristics than are achievable using the number of SPoF and DPoF. Particularly for the optimization and/or
comparison of different DCIs, these KPIs are crucial.
6 Determination of KPIs for resilience
6.1 General
EN 50600-1 gives a qualitative availability classification, starting with class 1 for low availability and going up to
class 4 for high availability. Furthermore, EN 50600-1 uses the term “resilience” to specify the coherence
between fault tolerance, determined by the number and the design of supply paths, and the data centre
availability.
The EN 50600 series opens the possibility to combine different Availability Classes in the paths of power supply
distribution and environmental control. Asymmetric infrastructure designs are possible, where supply or
distribution paths can be different from each other. Different design goals, such as high energy efficiency or
minimizing life cycle cost, can lead to very different results.

The design and/or operation of data centres is often contracted by an SLA. SLAs can be limited to the availability
for a defined infrastructure service. Occasionally SLAs do not take into account aspects such as reliability, failure
rate, fault to
...
