Is the data center you built really reliable?


In the data center industry, redundant equipment is widely used to achieve higher system availability, with targets typically around 99.999% ("five nines"). However, the level of redundancy required depends on the reliability of the equipment.
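
To make the "five nines" figure concrete, the short sketch below converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration only.

```python
# Sketch: convert an availability target into an annual downtime budget.
# Assumes a 365-day year; the function name is illustrative.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability: float) -> float:
    """Return the allowed downtime in minutes per year for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999),
                      ("four nines", 0.9999),
                      ("five nines", 0.99999)]:
    print(f"{label}: {annual_downtime_minutes(target):.2f} minutes/year")
```

At 99.999%, the budget works out to a little over five minutes of downtime per year.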

For example, an "N+1" redundant system cannot achieve high availability if its components are unreliable, because the likelihood of those components failing at the same time is high. Reliability affects availability, but the two are not the same when it comes to the downtime and failure events that data centers experience. Reliability also affects data center operating costs: more downtime means more maintenance and repair expenses.
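
The effect of component reliability on an N+1 design can be sketched with a simple k-out-of-n calculation. This is a simplified model: it assumes identical units that fail independently, which is optimistic, since common-cause failures would make the result worse. The unit availability figures are hypothetical.

```python
# Sketch: availability of an N+1 system modeled as "k out of n units must work".
# Assumes identical, independently failing units (an idealization).
from math import comb


def k_of_n_availability(k: int, n: int, unit_availability: float) -> float:
    """Probability that at least k of n independent units are available."""
    a = unit_availability
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))


# Hypothetical example: 4 units are needed to carry the load, 5 are installed.
for unit_availability in (0.90, 0.99, 0.999):
    system = k_of_n_availability(4, 5, unit_availability)
    print(f"unit {unit_availability:.3f} -> system {system:.6f}")
```

Under these assumptions, units that are each available 90% of the time give a system availability of only about 92%, while units at 99.9% bring the same N+1 arrangement close to five nines.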


Reliability is the probability that an item will perform its intended function for a specified time interval under specified conditions. There are some important questions to ask about reliability:

• Does the data center use the Reliability Centered Maintenance (RCM) concept to optimize maintenance?
• Has the equipment criticality analysis been completed?
• Is the Mean Time Between Failures (MTBF) tracked regularly?
• Has the preventive maintenance (PM) program been optimized?
• Are you tracking equipment failures and improving the process accordingly?
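
As an illustration of tracking MTBF and relating it to the definition of reliability above, the sketch below estimates MTBF from operating hours and failure counts, then computes the probability of surviving a given interval. It assumes a constant failure rate (the exponential model), which is a common simplification and not something the source specifies. The fleet figures are hypothetical.

```python
# Sketch: estimate MTBF from failure records and derive reliability over an interval.
# Assumes a constant failure rate (exponential model); all numbers are hypothetical.
from math import exp


def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures = operating hours / number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one recorded failure is needed to estimate MTBF")
    return total_operating_hours / failure_count


def reliability(interval_hours: float, mtbf: float) -> float:
    """Probability of running interval_hours without failure: R(t) = exp(-t / MTBF)."""
    return exp(-interval_hours / mtbf)


# Hypothetical fleet: 10 units, each running 8,760 hours, with 4 failures in total.
fleet_mtbf = mtbf_hours(10 * 8760, 4)
print(f"MTBF: {fleet_mtbf:.0f} hours")
print(f"Reliability over one year: {reliability(8760, fleet_mtbf):.3f}")
```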

Goal: Minimize expenses and maximize reliability.

In today's highly competitive market, operating expenses must be minimized without sacrificing reliability and uptime. Many data centers develop the service scopes for their critical equipment based only on OEM service recommendations. Although this can produce adequate results, it is usually not the best approach. Often these recommendations serve the interests of the OEM rather than the end user. In practice, applying reliability-centered maintenance (RCM) principles is usually a better way to increase reliability while reducing costs.

Although reliability-centered maintenance (RCM) programs have proven effective, they can be costly and require significant resources. They involve creating a detailed Failure Mode and Effects Analysis (FMEA) and filling in decision worksheets, which requires expertise and can be very time consuming. With this in mind, implementing a comprehensive RCM program in a data center is often not cost effective. Instead, a preventive maintenance (PM) optimization program that uses key RCM elements and historical information about common failure modes is a cost-effective strategy. It has proven effective in other industries, and its adoption there provides a good model.

The figure below shows the P-F curve (the interval between a detectable potential failure and functional failure) under preventive and predictive maintenance strategies.


The P-F curve is based on the principle of reliability-centered maintenance (RCM) and can be successfully applied without detailed analysis. Many of these reliability tools can be used to significantly improve the condition and longevity of your assets.
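
One practical use of the P-F curve is setting inspection frequency: a condition check only helps if it happens between the point where a potential failure becomes detectable and the point of functional failure. The sketch below applies a common rule of thumb, inspecting at no more than half the P-F interval. That rule and the example interval are assumptions for illustration, not figures from the source.

```python
# Sketch: derive a maximum inspection interval from an estimated P-F interval.
# The "at least two inspections per P-F interval" default is a rule of thumb.
def max_inspection_interval(pf_interval_days: float, inspections_per_pf: int = 2) -> float:
    """Longest inspection interval that still fits the requested number of
    inspections inside one P-F interval."""
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    if inspections_per_pf < 1:
        raise ValueError("at least one inspection per P-F interval is required")
    return pf_interval_days / inspections_per_pf


# Hypothetical: bearing wear is detectable by vibration analysis about 90 days
# before functional failure.
print(max_inspection_interval(90))  # 45.0
```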

Implement a reliability plan

In 2017, data center operator RagingWire decided to implement a reliability program for its data center. The company employs reliability engineers with a production background.

Its initial reliability measures include:

Service scopes

1. Developed for 81 types of equipment
2. Incorporated OEM recommendations and the codes of regulatory bodies (IEEE, ANSI/NETA, ASHRAE, NFPA)
3. The equipment list includes supporting equipment such as forklifts, pallet lifts, elevators, lightning protection devices, overhead doors, loading docks, valves, and water supply systems
4. Used to create task lists for all equipment and to set up the preventive maintenance (PM) program in the Computerized Maintenance Management System (CMMS)

Computerized Maintenance Management System (CMMS)

1. Developed and documented standards
2. Reconfigured the system to remove unused or unwanted information
3. Added reliability fields such as failure, cause, and maintenance codes, and service life
4. Entered corrective work orders for internal and external work activities
5. Trained staff across the company on the changes
6. Established a training matrix for ongoing annual training and for new employees
7. Established an advisory team that meets monthly to discuss usage and changes that could improve the program
8. Created a detailed user guide
9. Developed periodic environmental health and safety (EHS) requirements to ensure compliance

1. Developed a roadmap with assigned responsibilities
2. Formed a reliability steering group

Cost reduction

1. Established a cost reduction team that includes engineering and operations personnel
2. The procurement team negotiated agreements covering major equipment and expenses
3. Using the service scopes can save US$250,000 per year

Preventive maintenance (PM)

1. Established a team to analyze diesel generator and transformer oil, with online reporting
2. Optimized preventive maintenance (PM) for critical equipment using Failure Mode and Effects Analysis (FMEA)
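
To show how an FMEA can feed PM optimization, the sketch below ranks failure modes by risk priority number (RPN), the product of severity, occurrence, and detection scores that is widely used in FMEA practice. The equipment entries and the 1 to 10 scores are hypothetical and are not taken from RagingWire's program.

```python
# Sketch: rank failure modes by risk priority number (RPN = S x O x D).
# All entries and scores are hypothetical; scores use a 1-10 scale.
from dataclasses import dataclass


@dataclass
class FailureMode:
    equipment: str
    mode: str
    severity: int    # impact if the failure occurs
    occurrence: int  # how often the failure is expected
    detection: int   # higher means harder to detect before failure

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


modes = [
    FailureMode("Diesel generator", "Fails to start", 9, 3, 4),
    FailureMode("UPS battery string", "Capacity loss", 8, 5, 3),
    FailureMode("CRAH unit", "Fan bearing wear", 5, 4, 2),
]

# Highest RPN first: these failure modes get PM attention first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.rpn:4d}  {m.equipment}: {m.mode}")
```

PM tasks that address the top-ranked failure modes are candidates to keep or strengthen, while tasks that address no credible failure mode are candidates to drop.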

Asset management

1. Defined assets and created lists
2. Defined the equipment hierarchy
3. Determined equipment criticality
4. Determined maintenance strategies: preventive maintenance (PM), failure finding, redesign, run-to-failure

Root cause analysis (RCA)

1. Developed the process, with an approved policy and detailed procedures
2. Selected root cause analysis (RCA) software to consolidate the process
3. Trained selected engineers and operators

Policies and documents

Created preventive maintenance (PM) optimization strategies and procedures, thermal imaging strategies and procedures, predictive maintenance strategies, oil analysis strategies and procedures, motor circuit analysis strategies, vibration analysis strategies, and computerized maintenance management system (CMMS) usage strategies.

Typical benefits expected from a reliability program include reduced equipment failures and maintenance costs, increased work order efficiency, increased asset life, and a safer environment resulting from reduced equipment maintenance risks.

Added benefits include collecting equipment history for asset management and annual budgeting, systematically eliminating the root causes of failures, and evaluating maintenance activities for continuous improvement.

The operator has seen cost savings and productivity gains from its new reliability program. Capturing failure data and improving the maintenance process are expected to keep extending asset life, thereby reducing capital expenditures. Key metrics can also be tracked to ensure that results match expectations. Prioritizing reliability, alongside the redundancy already built into the data center, is an important step toward becoming a more reliable and economical data center provider.
