Q&A - Outage - SO/SONE
The Contact Us function at the top of every page on the tl9000.org
website is the preferred means for asking questions and receiving answers from the subject matter experts
of the TIA QuEST Forum. Over the last few years many questions have been answered through this means. The
number of each question is the ticket number in the Contact Us tracking system.
These questions generally relate to the system outage measurements that impact end-user customers (SO)
and network equipment (SONE).
Question 9030 — This question concerns outage reporting (SO and
SONE) for Family 4.2.1 On-line Critical Operation Support Systems. Our systems may manage a
single network element or they may be managing an entire network. In general, an outage of our system
does not impact the network being managed; only the management system itself is impacted, for example by an
inability to add trails, survey the customer’s activity, etc. The normalization unit for SO and SONE is
“system”. How should we determine the percent of the system impacted if there is no impact on the network
or if we do not readily know the size of the network?
Answer — SO and SONE are concerned with the loss of functionality of
the product being measured. SO deals only with the loss of the primary function. SONE measures loss of
any functionality. For Family 4.2.1, the primary function is network management. Additional
functionality would include alarm reporting, performance monitoring, and any other OA&M function the
product was designed to perform. The end-user here is the service provider using the equipment to manage
their network and not the end-user of the network itself. Therefore, the impact to be considered is that
caused by the loss of function in the network management system and not the network being managed. For
guidance on the weighting of different partial outages of these products, please see the entry for 4.2.1
in Table A-3.
Question 9111 — We have questions about category 4.2.1.1. In our
company we have a Network Management System that manages one or more elements. The Network
Management System manages Element Management Systems (EMS, up to 10000 for example in one network).
Which category relates to management of big networks? Is category 4.2.1.1 for one EMS only? The benchmark data
for 4.2.1.1 includes SO data. In a network management system we don't have SO for the end users, since the
traffic is not lost, so we think SO is not relevant. We expected that SO data would not be available for
category 4.2.1.1, so this is not clear to us.
Answer — The definition for Family 4.2.1 On-line Critical
Operations Support Systems is “Real time network management systems, demanding high availability,
typically 24 hours a day and 7 days per week.” The number of elements being managed is not a direct
factor, but products in this family typically manage large networks from a single point. SO
measures the loss of primary functionality to the end-user of the product. The primary functionality is
indicated by the bold text in the definition. The one thing to understand with regard to the application
of the SO measurement to this category is that the user of the product is the service provider
managing their network and not the end-user of the network being managed. Therefore, SO should measure
loss of network management capability. Since the SO normalization for this category is system, the
weighting of different partial outages as shown for 4.2.1 in Table A-3 can be used for guidance on how to
measure events which do not involve complete loss of functionality by the network manager.
Question 12346 — According to the TL 9000 Measurements Handbook, an event
lasting more than 15 seconds is considered an outage; otherwise it is not. But we have different outage
criteria for different customers according to the SLAs contracted with them. For example, for one
customer only an event lasting more than 3 minutes is considered an outage. Must we
follow the 15-second criterion in the Measurements Handbook, or can we follow the criteria
contracted with the customer?
Answer — The rules listed in the TL 9000 Measurements Handbook
concerning the length of outage are to be followed. These are 6.1.4 b) 2) for the SO measure and 6.2.4 b)
3) for SONE. Customer SLAs do not modify these rules.
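As a rough illustration of the answer above, the sketch below contrasts the Handbook's 15-second duration criterion with a contractual SLA threshold. It models only the duration test quoted in this Q&A. The function names and the 180-second SLA value (taken from the 3-minute example in the question) are illustrative assumptions, and the remaining counting rules in 6.1.4 b) and 6.2.4 b) are not modeled.

```python
# Duration floor quoted in this Q&A: events must last longer than 15 seconds.
# The constant and function names are illustrative, not part of TL 9000.
HANDBOOK_MIN_DURATION_S = 15.0


def meets_handbook_duration(duration_s: float) -> bool:
    """True when the event exceeds the 15-second duration criterion."""
    return duration_s > HANDBOOK_MIN_DURATION_S


def meets_sla_duration(duration_s: float, sla_min_duration_s: float) -> bool:
    """True when the event exceeds a customer-specific SLA threshold (hypothetical)."""
    return duration_s > sla_min_duration_s


# A 60-second event under a hypothetical 3-minute (180 s) SLA threshold.
event_s = 60.0
include_in_tl9000_data = meets_handbook_duration(event_s)
include_in_sla_report = meets_sla_duration(event_s, 180.0)
print(include_in_tl9000_data, include_in_sla_report)
```

In this example the 60-second event passes the Handbook duration test even though the SLA would not treat it as an outage, so the two reports can legitimately differ.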
Question 10695 — Customers are expressly authorized to determine if
a reported event is an outage, but the TL 9000 counting rules constrain which events are countable as
outages. Can the customer insist for example that a 10-second service affecting outage (rather than 15
seconds) be treated as a valid outage measurement that we have to report to the MRS? Can the customer
insist that loss of 10% (rather than the minimum 20%) of end-user mailboxes for a category 6.1 product is
a reportable outage? If the customer contractually requires us to report TL 9000-compliant data to them
but insists we use their non-compliant counting rules, can we call the customer report TL 9000 compliant
and at the same time continue to report to TIA QuEST Forum different data we believe is genuinely valid?
Answer — The data for reported outages submitted to the MRS must follow
the TL 9000 reporting rules. These rules cannot be modified by any agreement between the organization and
its customer(s). In order to ensure comparability of the reported data, there are no exceptions or
modifications to these rules based on any SLA the organization may have with the customer. What is
required to be reported by the organization to its customer is between those two parties. The MRS can
only comment on the correct interpretation of the TL 9000 Measurement rules. It cannot become involved
with contractual discussions between the organization and any customer.
Question 10854 — There are several measurement requirements within
TL 9000 that are difficult to support given our business model. As an example, we are not aware of
customer outages, the number of outage incidents reported, or outage fix response times. Can we still proceed
with implementing a TL 9000 program and seek certification with these holes in our data?
Answer — It is not unusual for customers to not report outage data to
their suppliers so you can be exempt from reporting system outage measurements in those cases where your
customer doesn't supply outage data, including the number of outage incidents and the duration of these
incidents. You can, therefore, proceed with implementing a TL 9000 program.
Note however that problems reported to you can be counted and fix time for those problems can be
measured. These problems and fixes must be reported.
If no customers provide the data needed for you to report the outage measurement and you have no other
means to collect the data, you may claim an exemption from reporting the measurement. As noted in Section
4.2.1 Customer Source Data in the Measurements Handbook, you would enter “Exempt” in the data submission.
You are also required to document the justification for the exemption for review by your auditor.
Question 12444 — Our products are DC to AC inverters, AC and DC
switch gear, etc. I am responsible for the TL 9000 measurement data submission. We have a problem with SO
measurement submission:
Almost all of our major customers cannot give us detailed SO data for our products in the
telecommunication network. Could we enter "Exempt" when we submit the SO measurement for PC 5.3
according to the Measurements Handbook, Section 4.2.1?
Answer — Please see Section 6.1.5 of the TL 9000 Measurements Handbook
for a full explanation of the requirements covering this case.
You are required to report measurement SO if any of your customers provide you with the needed
information or if you can determine the information from internal sources.
If a subset of your customers report SO to you, then you are to adjust SOs and the outage
sub-measurements to cover only those customers that report the data.
If none of your customers report SO data to you and you are not able to obtain the data from your own
records, then you may enter EXEMPT in all sub-measurements of the SO measurement on your data
submissions.
You should also document this situation including showing that you have attempted to contact your
customer for this data and that the customer refused to supply this data. This justification will be
needed for your audits to TL 9000. It is not sufficient just to say that the customer didn't supply the
data. You must show that you asked for the data and that the customer would not supply the data.
Question 12365 — For optical amplification, how do I calculate the
optical channel and network element counts? For WDM, how do I calculate the optical channel and network
element counts?
Answer — Please refer to the glossary in the Measurements Handbook for
a definition of network element. For the type of equipment in category 3.2.2.1.2.2, it is likely each network node
will be a network element. The optical channel count would be the total number of optical channels that
node can handle excluding any protection channels.
Question 11189 — Is there some relationship between outage and
critical problems? In the glossary, there are some descriptions of problem report-critical examples,
which read "such as product inoperability (total or partial outage)". Do all outages automatically become
critical problems, no matter if they are SO or SONE? In our company, there are differing views about
the relationship.
Answer — There is not a defined relationship between outages and
critical problem reports in TL 9000. However, it is not uncommon for an outage to result in a critical
problem report, especially if the outage impacts end-customers, which will make the outage an SO reported
outage. There are some outages that do not impact end-customers and only result in the total or partial
loss of a network element, which impacts the service provider. These outages can result in a critical
problem report depending on the impact to the service provider and other factors. For instance, the
failure of a non-redundant network element that results in an outage will result in data for SO and/or
SONE and would generally result in a countable problem report. Remember that all problem reports must
originate with the customer. It is their decision to make a problem report based on an outage.
Question 11548 — In establishing a product-attributable outage, do
performance issues count as outages? Specifically, a customer wants to include as a service impacting
outage the following: 1) record one-way-audio condition on a fraction of voice calls, and 2) the
situation where some calls are dropped after being established. Would either of these scenarios qualify
as a "partial outage"?
Answer — Partial outages do not apply for SO. The Service Impact
Outage Measurement (SO) counts all events “that result in a complete loss of primary functionality for all or
part of the system for a duration greater than 15 seconds…”. There is no minimum number of users
specified. Clearly a continuing one-way audio condition would meet this criterion and such an event would
be reportable under SO. If the system was consistently dropping calls for a period of time, then that
would be reportable also. For the Network Element Impact Outage Measurement (SONE), there are minimum
amounts of traffic that must be impacted before the event is included in the reported data (Table A-3).
If the minimum is met for the category, then the event would be reportable in SONE as well.
Question 11196 — What is the relationship between SO and SONE?
Answer — Outages that impact the end-user are reported in SO. Loss of a
network element in whole or in part is reported in SONE if the outage meets the conditions for the
category described in Table A-3 of Appendix A of the TL 9000 Measurements Handbook. An outage may
be reported in SO only, in SONE only, in both, or in neither, depending on the nature of the outage as
determined by the counting rules in 6.1.4 b), 6.2.4 b) and Table A-3.
There is no simple connection between SO and SONE.
Question 11110 — Service impact product-attributable SO3/SO4. We
were out of compliance on our SO3/SO4 measurements and our service provider customer requested a
corrective action that we have given them and also have fixed the issue with the deployment of a software
patch. We have developed and delivered a solution to our customer that will result in the product
performing at the agreed level. Until our customer deploys the SW solution the expectation is that the
product will remain out of compliance as a direct result of our customer not deploying the solution.
Question: Since this situation is caused by our customer not deploying the fix do we have to continue to
accept these non-compliances against our SO3 and SO4 Measurements?
Answer — TL 9000 does not get involved with the setting of specific
performance objectives for the TL 9000 measurements. That is between the organization and its customers.
So, we cannot comment on whether your organization has to accept the non-compliances from your customer
against your SO3 and SO4 performance. We can offer some insight into how the failure to implement the
required fix may or may not impact the calculation of SO3/SO4. If the delay in deployment is due to the
normal length of time it takes for the customer to validate the new software and install it on all
systems, then any outages due to the problem fixed by the software change will still need to be included
in the SO3/SO4 data. Your organization could, of course, offer to speed up the deployment by providing
assistance in the form of field service personnel, etc. If the customer has decided to delay the
deployment of the fix or to not deploy it at all due to reasons of its own not related to verification of
the fix and the performance of the new software, then any new outages due to the problem fixed by the new
software would be considered customer attributable and not counted in SO3/SO4. It is important to note
that if the fix is only available to the customer in a software release that they must purchase, then the
fix has not been delivered to them and all events related to the problem would still count in SO3/SO4. Review
counting rule 6.1.4 b) 4).
Question 11219 — Just recently our company released a bulletin to
our clients, advising them that a software patch has been released to fix an observed problem in our
product. The bulletin mentioned that the fix is necessary to avoid a system outage on our product. The
client, after reading the bulletin, decided not to apply the software patch to their system. Sometime
later, the problem appeared on the client's system and an outage was observed by the client. The
question: Should this outage be included in any of the TL 9000 measurements?
Answer — If the customer has decided to delay the deployment of the fix
or to not deploy it at all due to reasons of its own not related to verification of the fix and the
performance of the new software, then any new outages due to the problem fixed by the new software would
be considered customer attributable and counted in SO1/SO2 and not counted in SO3/SO4. It is important to
note that if the fix is only available to the customer in a software release that they must purchase,
then the fix has not been delivered to them and all events related to the problem will count in SO3/SO4.
See counting rule 6.1.4 b) 4). The same logic applies to SONE.
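The attribution logic described in the two answers above can be sketched as a small decision function. This is a simplified reading of those answers, not the text of rule 6.1.4 b) 4). The enum, its labels, and the function name are hypothetical, and the excessive-delay handling under Rule 7 is deliberately left out.

```python
from enum import Enum


class DeploymentStatus(Enum):
    """Hypothetical labels for why a delivered fix is not yet deployed."""
    NORMAL_VALIDATION = "customer is still validating and installing the fix"
    CUSTOMER_DECLINED = "customer delayed or declined for reasons of its own"


def counts_in_so3_so4(fix_requires_purchase: bool, status: DeploymentStatus) -> bool:
    """Return True when a repeat outage would remain product-attributable."""
    if fix_requires_purchase:
        # A fix only available in a release the customer must buy is
        # treated as not delivered, so the outage still counts in SO3/SO4.
        return True
    if status is DeploymentStatus.CUSTOMER_DECLINED:
        # Customer-attributable: counted in SO1/SO2 but not in SO3/SO4.
        return False
    # Normal validation and installation time: still product-attributable.
    return True


# The bulletin scenario: a free patch the client chose not to apply.
print(counts_in_so3_so4(False, DeploymentStatus.CUSTOMER_DECLINED))
```

The purchase check comes first because, per the answers, a fix that must be bought has not been delivered, regardless of what the customer decided.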
Question 12615 — When applying rule 4 of 6.1.4 b), when does fix
deployment start:
1) When the customer acquires the fix through download or receipt of media?
2) When the customer first loads the fix on a lab or acceptance system for testing?
3) When the customer first loads the fix on an in-service system?
4) How do we know when the customer has commenced deployment?
Answer — Rule 4 is only concerned with a decision by the customer not
to implement a fix that would have prevented an outage.
Rule 4 explicitly pulls in rule 7 if there is an issue with the customer taking an excessively long time
to deploy a fix. In those cases, the outage itself is still product-attributable for the outage frequency
measures. Rule 7 does require the organization to keep detailed start and stop times for an excessive
delay that is to be excluded from the product-attributable duration and counted as customer-attributable
duration. If, with customer agreement, it has been determined that the customer has taken an excessively
long time to deploy the fix (and the organization has offered to assist with that deployment per the no
cost part of the first clause in Rule 4) then outages which occur from that point on would be included in
the product-attributable outage frequency but with zero duration and in customer-attributable duration
but with no frequency.
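One way to picture the split described in the last sentence of this answer is the sketch below. It assumes the organization and customer have already agreed that the delay is excessive. The record type and field names are hypothetical, and any weighting of partial outages is not modeled.

```python
from dataclasses import dataclass


@dataclass
class OutageContribution:
    """Hypothetical record of how one outage feeds the outage measurements."""
    product_frequency: int
    product_duration_min: float
    customer_frequency: int
    customer_duration_min: float


def split_outage(duration_min: float, after_excessive_delay: bool) -> OutageContribution:
    """Split one product-caused outage between product and customer totals."""
    if after_excessive_delay:
        # Counted once in product-attributable frequency with zero duration,
        # and in customer-attributable duration with no frequency.
        return OutageContribution(1, 0.0, 0, duration_min)
    # Before the agreed excessive-delay point everything stays product-attributable.
    return OutageContribution(1, duration_min, 0, 0.0)


print(split_outage(30.0, after_excessive_delay=True))
```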
Question 12156 — How does SOTS implement Category Table A-3?
The SOTS record does not include the category-specific details needed to answer the questions in
Table A-3, so should we assume the customer has already excluded the unreportable NE outages? How are
outages recorded in the SOTS data record that meet SO counting rules but not SONE counting rules, and the
reverse?
Answer — The SOTS template provides all the information required to
report all the TL 9000 outage measurements. As noted in the description of the SOTS template, the partial
outage information is to be completed in accordance with Table A-3. The value in the form can therefore
be used as is for the outage calculations. The same is true of the other data fields in the SOTS record.
They are to be taken at face value when performing the calculations for SO, SONE, and SSO. For more
information on SOTS see https://tl9000.org/sots/overview.html
Question 9539 — Regarding Outage Calculations: Using
category 3.3.1 as an example, the TL 9000 measurements handbook says a partial outage is recognized when
there is a loss of 5% or more of provisioned capacity. Outage Downtime calculation begins at this time.
Does outage downtime end when the loss of provisioned capacity drops below 5% or when the
system is 100% restored (0%)?
Answer — The information in Table A-3 defines events that are to be
reported as a partial outage. Once the event exhibits one or more of the conditions it is reportable in
the Network Element Outage measurement. Outage downtime continues until the event is over, that is, until all
functional capability is restored to the network element (100% functionality restored).
Question 12561 — I am a service-provider employee responsible for
working with our suppliers to measure performance and develop reliability improvement initiatives. We
have established a supplier report card using many of the standard TL 9000 measurements to measure
supplier performance.
I have a question on the counting rules for Partial impact outages. We are seeing a number of outages
where a portion of the outage is below the TL 9000 5% provisioned capacity impact threshold and the
balance of the outage is above it. As an example, we recently had an outage that was 943 minutes in
duration. The first 937 minutes of the event had a 4% impact to capacity and the last 6 minutes had a 14%
impact.
How should we be treating these outages when calculating the partial impact duration? Do we disregard the
portion of the outage that is below the 5% threshold, or since a portion of the event was above the
threshold, weight the duration for each portion of the outage by its impact and sum the weighted
durations for an overall impact?
Answer — Per the rules for the SONE measure, only the time after the 5% threshold
was reached would be counted as a partial impact outage. The time where the impact was below the 5%
threshold would not be counted in SONE. The entire time would be counted in the SO service impact measure
as there is no minimum customer impact floor for that measure. SO does not include any loss of NE
functionality other than customer traffic while SONE includes loss of OA&M, alarms, and other
non-traffic related capabilities. These differences are why the two outage measurements are included in
the TL 9000 measurement set.
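The example in the question can be worked through with a short sketch. It assumes the event is described as consecutive segments of (minutes, percent of provisioned capacity lost) ending at full restoration, and that once the 5% threshold is reached the SONE clock keeps running until the event ends, which is one reading of the answer to Question 9539. The function names and the segment representation are hypothetical, and no impact weighting is applied.

```python
from typing import List, Tuple

# (minutes, percent of provisioned capacity lost) for each consecutive segment
Segment = Tuple[float, float]


def sone_countable_minutes(segments: List[Segment], threshold_pct: float = 5.0) -> float:
    """Minutes from the first segment at or above the threshold to the event end."""
    counting = False
    total = 0.0
    for minutes, impact_pct in segments:
        if impact_pct >= threshold_pct:
            counting = True  # threshold reached; assumed to count until restoration
        if counting:
            total += minutes
    return total


def total_event_minutes(segments: List[Segment]) -> float:
    """Full event duration, with no minimum impact floor applied."""
    return sum(minutes for minutes, _ in segments)


# The 943-minute event from the question: 937 min at 4%, then 6 min at 14%.
event = [(937.0, 4.0), (6.0, 14.0)]
print(sone_countable_minutes(event), total_event_minutes(event))
```

For this event the sketch yields 6 minutes of SONE partial-outage time against a 943-minute event overall, matching the answer above.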
Question 13114 — I have a query regarding outage impact assessment
on a partial outage on an MSC. The MSC in question supports both mobility traffic and gateway traffic
(land-to-land). For simplicity's sake, let us suppose that these call volumes are roughly equal (50-50).
Let us suppose that there was a degradation in the system that affected only the mobility traffic. 25% of
the mobility traffic was impacted. In such a scenario, our assessment is that the MSC had a 12.5% impact
on a Nodal level (25% of 50% of the total traffic on the MSC). Can you please confirm that this is an
accurate assessment of the impact?
Answer — Yes, this is an accurate assessment.
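The arithmetic behind this confirmation can be written out as a short sketch. It assumes the nodal impact is the sum, over traffic types, of each type's share of total traffic multiplied by the fraction of that type that was impacted. The function and dictionary names are illustrative only.

```python
from typing import Dict


def nodal_impact(traffic_share: Dict[str, float], impacted_fraction: Dict[str, float]) -> float:
    """Sum of (share of total traffic) x (fraction of that traffic impacted)."""
    return sum(
        share * impacted_fraction.get(kind, 0.0)
        for kind, share in traffic_share.items()
    )


# Values from the question: a 50-50 traffic split, 25% of mobility traffic impacted.
shares = {"mobility": 0.50, "gateway": 0.50}
impacted = {"mobility": 0.25}
impact = nodal_impact(shares, impacted)
print(f"{impact:.1%}")  # prints 12.5%
```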