Practical example
Friday, September 30th, 2022
Reasons for Storage Migration
Legacy landscapes are increasingly reaching their limits. The amount of data that needs to be stored is constantly growing and the requirements for storage infrastructures are constantly increasing. In our three-part blog on storage migration, we described in detail the reasons why storage migration is necessary.
Among other things, a storage migration can be carried out for pure performance optimization, but is often postponed for cost reasons, even though it leads to cost reductions in the long term. However, sooner or later, an outdated storage solution brings problems with it that make the migration unavoidable. The following describes a practical example in which a simple extinguishing system test in a data center room provided the fundamental reason for the migration.
Initial situation for the problem
The technology installed in data centers generates considerable heat, which makes sufficient cooling indispensable. Data center rooms must be kept at a constant temperature. If the temperature at individual points rises above the limits specified for the hardware, certain components can fail.
At the same time, technical defects and short circuits generally cannot be ruled out. This combination leads to an increased risk of fire within data centers. To counteract this, legally prescribed safety precautions must be observed. These include the installation of an extinguishing system, which helps to get a fire under control in the event of an emergency. To ensure that the system functions properly, regular tests must be carried out. Failure to comply with the regulations could result in the loss of the operating license.
Furthermore, a functioning extinguishing system helps to keep the economic damage as small as possible in the event of a fire. Recovering data after a fire involves high costs and effort. In the worst case, business-critical data, such as patent, product development, and warranty data, can be completely lost. The direct danger to employees if the fire spreads must not be neglected either.
In our practical example, one of the server rooms posed a major problem for the upcoming extinguishing system test. Business-critical data that is essential for daily operations was stored on one of the servers located there. However, the data resided on an outdated storage system with conventional hard disks (HDDs with mechanical read/write heads). Most systems of this type have reached their end of service by now but are often kept in use for cost reasons. Given the state of the system, a regulation-compliant extinguishing system test would have posed a direct threat to the installed hardware and thus to the data on it.
An important part of the extinguishing system test is testing the alarm sound. The alarm is very loud and therefore causes vibrations within the room.
The vibrations could cause the read/write heads of the HDDs to vibrate and damage themselves or the disk platters. This already happened in 2018 in a Swedish data center. An article on the incident can be found on heise online: https://www.heise.de/newsticker/meldung/Loeschanlagen-Ton-zerstoert-Festplatten-in-schwedischem-Rechenzentrum-4029730.html.
As a result, measures were necessary to protect the important data on the servers. These had to be planned and implemented from scratch, for which in our case only a small window of time was available. Postponement was not an option, as failure to meet the deadline threatened to result in the servers being shut down by the responsible authority.
To ensure that the problem would be solved in the long term and would not recur in the same form during the next extinguishing system test, it was decided to renew the hardware as the long-term measure and to migrate the storage as the temporary solution.
Within the available time window, the hardware for the new storage solution had to be procured and installed, and the data had to be migrated. But even purchasing the hardware turned out to be a major hurdle because of supply chain problems caused by the COVID-19 pandemic. It was sold out at all manufacturers and could not be procured even through direct contacts with them. An intensive search on aftermarket sites was also unsuccessful.
To prepare the move of the data, it was first necessary to identify the people responsible for it as well as the subject-matter experts. They worked in different areas of expertise and therefore had a variety of requirements for the new storage solution. In addition, they had differing wishes regarding the timing of the data migration.
Taken together, these conditions led to the conclusion that the storage migration could not be carried out within the time window available before the extinguishing system test. The migration was therefore treated as a long-term goal for the time being. Nevertheless, a transitional solution was needed to enable the extinguishing system test to be carried out.
Solution Variants
On the one hand, the transitional solution had to be implemented quickly due to the tight timeframe, and on the other hand, it had to be as resource-efficient as possible due to the upcoming storage migration. In addition, the server could not be shut down, as it might not have been possible to start it up again due to its age.
First and foremost, concepts were analyzed that did not require the immediate commissioning of another storage solution. In the course of this, possibilities were investigated to move the server to other facilities before the extinguishing system test, or to shield it sufficiently against the sound. However, both options carried a residual risk due to lack of experience.
None of the solutions examined could completely rule out a possible defect of the server, so the use of additional storage solutions was unavoidable.
Therefore, virtual machines were to be put into operation. Copying the data to the virtual machines was intended to keep day-to-day business running and to allow data recovery in the event of a defect in the old systems. For the duration of the extinguishing system test, the data was accessed via the virtual machines rather than directly via the server.
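Before access is switched to a copy like this, it is worth confirming that the copy is complete. The following is only a minimal sketch of such a check, assuming both the original data and the copy are reachable as ordinary directory trees. The function names `file_digest` and `find_mismatches` are hypothetical, and the tooling actually used in the project is not described here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root: Path, copy_root: Path) -> list[str]:
    """List relative paths that are missing from the copy or differ in content.

    Both roots are hypothetical mount points: the old server's data
    and the copy held on the virtual machines.
    """
    mismatches = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        copied_file = copy_root / relative
        if (not copied_file.is_file()
                or file_digest(source_file) != file_digest(copied_file)):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty result from `find_mismatches` would indicate that every file on the old system has an identical counterpart in the copy. Files that exist only in the copy are not reported by this sketch.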
The goal of the long-term planned storage migration is to build a new infrastructure to enable the move of all corporate data to virtual machines. This brings several advantages:
- Consolidation: Running multiple virtual servers on one physical server reduces investment and operating costs and simplifies data center operations.
- Intelligent management: Virtual servers can be managed much more intelligently and flexibly. Automating many tasks is no longer a problem.
- Fast provisioning: Virtual workloads can be easily scaled or moved. This makes it possible to respond more quickly to new requirements.
- Security and availability: Virtual servers help to reduce application downtime and can significantly accelerate disaster recovery.
Once all backup processes had finished, the extinguishing system test was carried out without any problems. As predicted, one hard disk failed during the test. However, the data on it was restored without difficulty from the backup on the virtual machines to the original system.
After the extinguishing system test, the data was again accessed directly via the old server. The backup remains on the virtual machines as a safety net, because the failure of further hard disks cannot be ruled out due to their age. This keeps the data protected until the upcoming storage migration is complete.
Conclusion
By reacting quickly and appropriately to the circumstances, a solution was found and the extinguishing system test was possible within the specified timeframe. At the same time, the approved budget was adhered to, thus saving resources for the planned storage migration.
The practical example shows that, in addition to basic know-how, the ability to transfer knowledge to new situations is essential for carrying out a storage migration, because every storage migration requires flexible responses to problems that arise individually. Experience and best practices help to keep the actual goal in sight when dealing with such problems. In concrete terms, this means always balancing the long-term strategic planning of a migration with the ability to act operationally. This should be reflected, among other things, in the design of decision-making paths and coordination. In this way, the project can be completed successfully despite the high dynamics and complexity of a storage migration.
Patrick Hanke
Author
Project Manager
Paul Stapf
Author
Junior Consultant