August 25, 2008
Diagnosing Downtime — Radiology IT Experts Offer Tips on Handling Scheduled and Unscheduled System Interruptions
By Beth W. Orenstein
Vol. 9 No. 17 P. 18
Working in a digital environment is a tradeoff of sorts, say those involved in radiology IT. The technology has improved both workflow and diagnostics, but it has also made a department's or facility's operations more vulnerable to network downtime.
When hospitals and imaging centers were film-based, computer downtime was less disruptive to workflow. Some work could still be done if the system was taken down for maintenance or if it crashed.
However, most imaging facilities today rely on a picture archiving and communication system (PACS) and a radiology information system (RIS) to serve their patients. In a digital environment, even a brief interruption of computer service, whether planned or unplanned, can wreak havoc on clinical operations.
Planning Is Paramount
The good news is that PACS administrators and informatics experts agree the disruption from scheduled downtimes and even crashes can be minimized with proper planning.
“Planning is critical and must involve all of those who are involved with the systems—the technical staff, the clinicians, and the administration,” says Steven C. Horii, MD, FACR, FSIIM, a professor of radiology and the clinical director of the medical informatics group at the University of Pennsylvania in Philadelphia.
There is never a good time to take a PACS or RIS down, says R. L. “Skip” Kennedy, MSc, CIIP, technical director of imaging informatics for Kaiser Permanente medical centers in Northern California, but by negotiating with all the clinicians who use it and the facility’s administrators, you can find some times that are better than others and try to accommodate everyone as well as possible.
Most facilities, even hospitals with level 1 trauma centers, have times that are less busy and, if the staff know about a downtime in advance, they can plan to work with a PACS on backup and diminished capacity, Kennedy says.
At Kaiser Permanente, he says, “We try really hard not to schedule downtime during any department hours when the system is running at its highest capacity.”
To schedule downtime for maintenance or upgrades, the medical center requires at least 14 days' notice, so everyone who could be affected knows about the downtime well in advance, Kennedy says.
And the schedulers can adjust accordingly. “If we know the system will be operating on backup, we won’t schedule CTs or MRIs for that block of time. The backup systems should be able to handle the walk-in emergencies,” Kennedy explains.
If you are planning to take your PACS down, says Michael Toland, PACS administrator team manager at the University of Maryland Medical System in Baltimore, remember that it’s more than the radiology department that you need to alert. For example, he says, “Surgery needs to be involved because the surgeons rely heavily on PACS during OR [operating room] procedures. From a business perspective, administrators need to be involved to understand the business impact.”
Toland says that to keep scheduled downtime to a minimum, it is important to run a full backup before the downtime starts. Horii also notes that backups should be verified before any downtime. "Never just assume that they are working," he says.
PACS vendors will generally recommend an unscheduled full backup before any upgrades because there’s always the risk of losing data, Toland says.
However, a common mistake people make is to start the downtime and then do the backup. “That means it’s going to create a longer downtime since it will include the backup plus whatever you have to do with that downtime. You want to be able to work with the vendor to schedule the backup so that you have it start before the downtime and end as close to the beginning of the downtime as possible,” he explains.
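Toland's scheduling advice boils down to a small calculation: work backward from the start of the downtime window by the expected backup duration, plus a cushion. The Python sketch below is a hypothetical helper, not part of any vendor's tooling; the function names, the 15-minute default margin, and the six-hour backup in the example are all assumptions, and a real estimate would come from the facility's own history of backup runs.

```python
from datetime import datetime, timedelta


def latest_backup_start(downtime_start: datetime,
                        est_backup_duration: timedelta,
                        margin: timedelta = timedelta(minutes=15)) -> datetime:
    """Hypothetical helper: latest time to start a full backup so that it
    finishes `margin` before the downtime window opens."""
    if est_backup_duration <= timedelta(0):
        raise ValueError("estimated backup duration must be positive")
    return downtime_start - est_backup_duration - margin


def total_downtime(work: timedelta, backup: timedelta,
                   backup_inside_window: bool) -> timedelta:
    """Outage users experience. Starting the backup after the system is
    taken down (the mistake Toland describes) adds its full duration."""
    return work + backup if backup_inside_window else work


# Illustrative figures: a two-hour upgrade window opening at 2:00 a.m.
window = datetime(2008, 8, 30, 2, 0)
print(latest_backup_start(window, timedelta(hours=6)))  # 2008-08-29 19:45:00
print(total_downtime(timedelta(hours=2), timedelta(hours=6), True))   # 8:00:00
print(total_downtime(timedelta(hours=2), timedelta(hours=6), False))  # 2:00:00
```

The comparison at the end shows why the order matters: with the backup run inside the window, a two-hour job becomes an eight-hour outage.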
Any planned downtime should have a detailed action plan, PACS administrators agree. “It is important to have this in advance,” Toland says, “because you want to be able to go over this action plan and think of all the things that can go wrong. Initially, you should get documentation from the vendor that details what is required for the downtime. Then you can internally discuss the action plan and revisit the plan with the vendor. Ultimately, the goal is to have a detailed plan of action that has consideration for errors that occur and instructions for those scenarios. This provides more confidence for the engineers and admins involved, but it also eases the mind of the administrators and clinical staff.”
As part of the action plan, Toland says to identify the point of no return. “When you’ve taken the system down to do an upgrade or just make some configuration changes to resolve a problem, you have to decide at what point you can’t go backwards and undo the changes,” he says. If something goes wrong, decide whether to stop and reschedule the downtime or proceed and know that the PACS may be down for additional time, he says.
One possible worst-case scenario is that you will have to reinstall from backup, Toland says. "Restoring from backup is always difficult due to the large quantity of data often associated with PACS." A radiology department that performs about 60,000 studies per year has about 1.75 terabytes of data, and restoring it from tape could take more than 24 hours. "If your planned downtime was two hours and it goes bad, it could turn into a two-day catastrophe," he says.
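Toland's figures can be checked with simple arithmetic. Assuming decimal units (1 TB = 10^12 bytes) and treating restore time T as data volume D divided by effective tape throughput r, a restore longer than 24 hours implies an effective rate below roughly 20 MB/s, and the same data volume averages out to roughly 29 MB per study. These derived values are illustrations, not figures from the source.

```latex
T = \frac{D}{r}, \qquad D = 1.75\ \text{TB} = 1.75 \times 10^{12}\ \text{bytes}

T > 24\ \text{h} = 86{,}400\ \text{s}
\;\Longrightarrow\;
r < \frac{1.75 \times 10^{12}\ \text{bytes}}{86{,}400\ \text{s}}
  \approx 2.0 \times 10^{7}\ \text{bytes/s} \approx 20\ \text{MB/s}

\frac{D}{60{,}000\ \text{studies}}
  = \frac{1.75 \times 10^{12}\ \text{bytes}}{6 \times 10^{4}}
  \approx 2.9 \times 10^{7}\ \text{bytes} \approx 29\ \text{MB per study}
```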
The point of no return can be somewhat flexible as long as the clinical stakeholders are kept informed and involved in the decision-making process, Toland says. For example, if there is a problem while taking the system down to do an upgrade at his facility, an academic medical center with a shock trauma center, he will consult with the administrators and have everyone decide whether to proceed or postpone the changes. “If they will be operating on patients and will need the PACS, we can do the upgrade at a later time,” he says. “In most cases, for a planned downtime, you have the option to wait. You rarely hit that point where you can’t push it off another couple days or whatever would satisfy key stakeholders.”
The action plan will help ensure that downtime activities go smoothly and the engineers and administrators can accomplish their tasks in the time scheduled, Toland says.
Not everyone will be involved in the downtime planning at the technical level, but a detailed action plan can help everyone understand what will happen, Toland says. The same is true for on-site staff when a vendor performs the work. "They're sending out the vendor because the vendor specializes in it, but the on-site resources want to participate and have a better understanding of what is happening," he explains.
The action plan must also include documentation of all tasks to be performed during a scheduled downtime, and the documents become a template for future downtimes, Toland says. “An action plan also helps because you’re going to sit down with a group of people and think about every step that has to happen. The more you plan about the different steps of the upgrade, the more seamless it will go. You feel a lot more confident as you’re doing it since you have a detailed action plan,” he says.
Let People Know
Another key component to the success of planned downtimes is communication, not only with the users but with all key stakeholders and vendors. "You want to give them adequate communications of what's going to happen," Toland says. "Avoid sending an e-mail with only times of system outages."
Toland says he spends a lot of time crafting the communications because it’s important to let the users know exactly what the downtime will mean to them and their ability to do their jobs. “People in surgery might not necessarily associate an e-mail that says, ‘PACS will be down on such and such,’ to ‘I’m going to be in surgery, and I can’t view the images or 3D models on PACS.’ So we are careful to tell everyone what exactly it means to them, such as ‘You will not be able to view images,’ or ‘If RIS goes down, you won’t be able to schedule new exams.’”
Horii adds that, if possible, it helps to schedule extra staff for a planned downtime. At Penn, he says, three additional attending radiologists and four additional fourth-year residents each covered a six-hour shift during a planned downtime. He and his colleagues then compared report turnaround times for a previous planned downtime, an unplanned downtime, and the planned downtime with the extra staffing. Report turnaround for the planned downtime with extra staff was significantly better than for the downtime without the extra staff. It was also better than during typical uptime periods, he says.
The point is that comprehensive planning for scheduled downtime can keep radiology performance on a similar level to normal operations, according to Horii. He also notes that “because the additional faculty and residents are not expected to work for free, administration has to agree to support the added salary costs if this option is used.”
Vendors believe their systems are extremely reliable and not subject to crashes. “I’ve heard vendors say that their systems never fail, but that’s simply not true,” says Kennedy.
Unfortunately, not much can be done when it comes to unplanned downtime, but having the disaster recovery and business continuity plans in place that The Joint Commission and HIPAA require will help, Kennedy says. How redundant the systems are (or need to be) depends on the type of facility and whether it is a major medical center with level 1 trauma care or an imaging facility in a rural area.
One key when a system crashes is to set timelines for action. Decide up front how long the system can be down before the technicians call for help from the vendor or others. “You can say that if the system is out for more than 30 minutes, you will call the vendor,” Toland says. If you don’t plan that call and stick to your deadline, he says, another 10 or 20 minutes could pass while you try different solutions and nothing works. “The next thing you know, an hour has gone by and you haven’t told anybody,” he says.
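Toland's advice amounts to a fixed escalation deadline agreed on before troubleshooting begins. The following Python sketch encodes that rule; the function and constant names are hypothetical, and the 30-minute default simply mirrors his example rather than any standard, so each facility would substitute its own threshold.

```python
from datetime import datetime, timedelta

# Escalation threshold from Toland's example; each facility sets its own.
VENDOR_CALL_AFTER = timedelta(minutes=30)


def should_call_vendor(outage_start: datetime, now: datetime,
                       threshold: timedelta = VENDOR_CALL_AFTER) -> bool:
    """True once the outage has lasted more than the agreed threshold."""
    return now - outage_start > threshold


def time_until_escalation(outage_start: datetime, now: datetime,
                          threshold: timedelta = VENDOR_CALL_AFTER) -> timedelta:
    """Time left before the threshold is reached (zero once it has passed)."""
    remaining = outage_start + threshold - now
    return max(remaining, timedelta(0))


start = datetime(2008, 8, 25, 9, 0)
print(should_call_vendor(start, start + timedelta(minutes=20)))     # False
print(time_until_escalation(start, start + timedelta(minutes=20)))  # 0:10:00
print(should_call_vendor(start, start + timedelta(minutes=45)))     # True
```

Because the deadline is computed from the recorded outage start rather than from when troubleshooting began, the extra 10 or 20 minutes Toland warns about cannot quietly slip past it.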
Also, when the system crashes, keep users informed of what’s happening and how much longer it may be down, Toland says. If they are informed, they won’t keep calling the IT department and distracting those who are working to fix it.
Once the problem is resolved, Toland says, make sure you do a postincident review and look back at what happened. “You want to record what the problem was, how it was noticed, what the clinical impact was, what the business impact was, and what actions were taken throughout the life of the incident,” he says. “You want to time stamp things as much as possible, so you know who is doing what and when they are doing it.”
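One way to capture the items Toland lists is a simple structured record with time-stamped actions. The Python sketch below is a hypothetical illustration, not a format any vendor or regulator prescribes; the class names are assumptions, and the incident details in the example are invented placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimedAction:
    timestamp: datetime
    who: str
    what: str


@dataclass
class IncidentReview:
    """Hypothetical record with the fields Toland lists for a postincident review."""
    problem: str
    how_noticed: str
    clinical_impact: str
    business_impact: str
    actions: List[TimedAction] = field(default_factory=list)

    def log(self, when: datetime, who: str, what: str) -> None:
        """Record one time-stamped action taken during the incident."""
        self.actions.append(TimedAction(when, who, what))

    def timeline(self) -> List[str]:
        """All logged actions in chronological order, one line each."""
        ordered = sorted(self.actions, key=lambda a: a.timestamp)
        return [f"{a.timestamp:%Y-%m-%d %H:%M} {a.who}: {a.what}" for a in ordered]


review = IncidentReview(
    problem="PACS image viewer unavailable",
    how_noticed="Reading room called the help desk",
    clinical_impact="Radiologists unable to view new studies",
    business_impact="Report turnaround delayed",
)
review.log(datetime(2008, 8, 25, 9, 40), "vendor", "Restarted archive service")
review.log(datetime(2008, 8, 25, 9, 5), "PACS admin", "Notified users of outage")
for line in review.timeline():
    print(line)
```

Sorting on the timestamp means entries can be logged as they are remembered during a hectic outage and still produce a clean chronology for the review.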
That information will be extremely useful should something similar happen in the future, he says. From the postincident review, you can create a proactive plan to prevent the failure from recurring.
— Beth W. Orenstein is a freelance medical writer and a regular contributor to Radiology Today. She writes from her home in Northampton, Pa.