Ask any IT organization to identify the No. 1 cause of network performance problems, and they’ll probably point to high-profile events: denial-of-service attacks, computer viruses, fiber cuts, power outages or hardware failures. However, studies show that more than two-thirds of network issues are actually tied to a simple everyday activity: The ungoverned process of IT staff making network configuration changes.
Every change is an opportunity for a mistake. Internal errors, often inadvertent, can take a heavy toll on overall network performance. This is a serious IT challenge, because last-minute fire drills consume massive amounts of staff time. IT organizations typically spend anywhere from 60 percent to 85 percent of their time doing unplanned work, most of which is reactive, time-consuming troubleshooting.
Check out the following five network management pitfalls. Are you a victim? If so, you’re wasting precious resources that could be spent on more strategic activities.
And remember: change happens. Just be prepared.
Mistake 1: Manage Change Reactively
Change management is not the same as managing change. The two processes are different but complementary. Change management is how IT departments develop, request, schedule and implement changes to network devices. Once a change is implemented, this process is complete.
Managing change is the process of understanding a change’s impact on network health and compliance. Even the smallest, most routine update can knock an entire network out of compliance. Instead of waiting for user complaints to come flooding in, this process finds potential issues before performance degrades.
Organizations should automate both processes: change management and managing change. A best practice includes both 1) a change management process that focuses on planning, scheduling and deploying the change, and 2) a managing-change process that ensures the planned change is, and remains, a positive modification once it has been implemented.
Mistake 2: Too Many Manual Configuration Changes
Each time human hands interact with equipment, there’s greater potential for error, even with experienced staff, and fat-fingered users can wreak serious havoc.
The benefit of custom-built scripts or programs is that they reduce the time-consuming, manual effort of collecting and storing configuration and change data. However, each additional script needed over time adds to the complexity of the custom-built solution. On top of this is the added worry of staff turnover: when the original creator leaves the organization, so does their knowledge. Custom scripts, then, can reduce the manual effort, but they also grow unwieldy.
Automating the collection, storage, analysis and reporting of network change and configuration data not only reduces time and effort; it also lowers the risk of degraded service by reducing the number of individual, human touches. Automation empowers IT to focus on projects that can improve the overall success of the organization, instead of spending time on manual, repetitive tasks.
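As a rough illustration of what automated collection and change detection can look like, here is a minimal Python sketch. It assumes the configuration text has already been retrieved from the device by some other means; the ConfigHistory class and its record method are hypothetical names invented for this example, not part of any product.

```python
import difflib
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConfigHistory:
    """Hypothetical in-memory store of config snapshots, keyed by device."""
    snapshots: dict = field(default_factory=dict)

    def record(self, device: str, config_text: str,
               when: Optional[datetime] = None):
        """Store a snapshot only if the config differs from the last one.

        Returns a unified diff (list of lines) when a change is detected,
        or None when the config is unchanged or this is the first snapshot.
        """
        when = when or datetime.now(timezone.utc)
        digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
        history = self.snapshots.setdefault(device, [])
        if history and history[-1]["digest"] == digest:
            return None  # nothing changed since the last collection
        previous = history[-1]["text"] if history else None
        history.append({"when": when, "digest": digest, "text": config_text})
        if previous is None:
            return None  # the first snapshot is only a baseline
        return list(difflib.unified_diff(
            previous.splitlines(), config_text.splitlines(),
            fromfile=f"{device} (previous)", tofile=f"{device} (current)",
            lineterm="",
        ))
```

Run on a schedule, a routine like this keeps a timestamped record of every change and reports exactly which lines differ, with no one having to compare configurations by hand.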
Mistake 3: Treat Performance Management and Change Management in Isolation
Almost every organization has tools that give visibility into network performance. The challenge is that these tools live in separate bubbles, frequently managed by different people. This becomes even more problematic once you start overlaying new technologies such as virtualization or cloud computing on top of the network infrastructure. Without a correlated view, IT must play a guessing game.
In more complex scenarios, a change doesn’t immediately cause the network to exceed a monitoring threshold, so it doesn’t trigger an alert. Hours, days, potentially even weeks later, the suboptimal configuration can combine with other factors, such as new usage patterns, to create unexpected network service degradation. In these cases, troubleshooting is especially tedious — the root cause is likely buried in a stack of historical reports, and staff must play detective, slogging through every possible cause of performance degradation.
The most successful IT organizations tie network change views to network performance views. Instead of having multiple tools, a single system provides a correlated view, eliminating guesswork.
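To make the idea of a correlated view concrete, here is a small Python sketch, offered under stated assumptions rather than as a description of any particular tool. It assumes change records are available as (timestamp, device, summary) tuples; the function name changes_before_alert and the 14-day look-back default are illustrative choices.

```python
from datetime import datetime, timedelta


def changes_before_alert(changes, alert_time, device,
                         lookback=timedelta(days=14)):
    """Return changes made to `device` within `lookback` before `alert_time`.

    `changes` is an iterable of (timestamp, device, summary) tuples.
    Results are sorted most recent first, since the latest change is
    usually the first suspect.
    """
    window_start = alert_time - lookback
    hits = [c for c in changes
            if c[1] == device and window_start <= c[0] <= alert_time]
    return sorted(hits, key=lambda c: c[0], reverse=True)


# Hypothetical change log for illustration only.
change_log = [
    (datetime(2024, 3, 1, 9, 0), "core-sw1", "QoS policy updated"),
    (datetime(2024, 3, 10, 22, 30), "core-sw1", "MTU changed on uplink"),
    (datetime(2024, 3, 11, 8, 0), "edge-rtr2", "ACL entry added"),
    (datetime(2024, 1, 5, 14, 0), "core-sw1", "Firmware upgrade"),
]

suspects = changes_before_alert(
    change_log, alert_time=datetime(2024, 3, 12, 10, 0), device="core-sw1")
```

With the sample data, suspects holds the two March changes to core-sw1, most recent first. A wide look-back window matters because, as noted above, a suboptimal configuration may take days or weeks to show up as degraded service.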
Mistake 4: Grant Too Much Administrative Access
Are you too trusting? There’s a tendency to provide full administrative rights to any and all IT staff who manage devices. This is risky, especially once the list of “privileged” personnel grows to a substantial size.
IT folks usually make device modifications individually as they see fit, and often with the best of intentions. They think, “This is a small change, it won’t impact anything. I’ll just make it myself and not wait for the maintenance window.” But keeping the rest of the IT team in the dark increases the potential for an undesirable ripple effect: one misconfiguration can affect a multitude of devices.
Organizations need to give access based upon individual roles and responsibilities. These should all be documented and managed from a central console. Success here hinges on giving each member of the IT team the appropriate level of access and system views, without granting more than the role requires.
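Role-based access can be sketched in a few lines of Python. The role names, action names and users below are hypothetical and chosen only for illustration; a real deployment would define its own and keep them in the central console.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "viewer": {"view_config"},
    "operator": {"view_config", "request_change"},
    "engineer": {"view_config", "request_change", "deploy_change"},
}

# Hypothetical assignment of users to roles, kept in one central place.
USER_ROLES = {
    "alice": "engineer",
    "bob": "operator",
    "carol": "viewer",
}


def is_allowed(user: str, action: str) -> bool:
    """Return True only if the user's role explicitly grants the action.

    Unknown users and unknown roles get no permissions (deny by default).
    """
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())
```

Here is_allowed("bob", "deploy_change") returns False: an operator can request a change but cannot push it to a device. Denying by default keeps the list of privileged personnel from growing by accident.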
Mistake 5: Ignore the Impact of Change on Neighboring Devices
Ignorance is not bliss. One of the most frequent IT gaffes is taking a narrow, device-centric view when configuring an individual network component. It’s crucial to implement each modification correctly and to understand how it affects neighboring devices and overall network health.
For example, say there are several different help tickets. The respective issues are fixed and the necessary changes made; each modification is reviewed individually and appears fine. However, negative consequences can still be in store. A single change can affect neighboring network devices if the new configuration triggers a ripple effect, and major problems can surface later as users, applications or usage patterns vary.
Instead of looking at devices in isolation, IT groups must view the impact of changes holistically, across neighboring devices and the network as a whole. With only a manual process, it’s virtually impossible to determine the domino effect of a change across a complex, multi-device network. Successful organizations build automation into the process to identify potential issues before end users are affected.
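One simple way to automate the first step, finding which devices sit close enough to a change to be worth checking, is a breadth-first search over the network topology. The Python sketch below assumes the topology is available as an adjacency mapping; the function name devices_within and the device names are hypothetical.

```python
from collections import deque


def devices_within(topology, changed_device, max_hops=1):
    """Breadth-first search for devices near a changed device.

    `topology` maps each device name to an iterable of its direct
    neighbors. Returns {device: hop_count} for every device reachable
    within `max_hops`, excluding the changed device itself.
    """
    distances = {changed_device: 0}
    queue = deque([changed_device])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in topology.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[changed_device]
    return distances


# Hypothetical four-device topology for illustration only.
topology = {
    "core1": ["dist1", "dist2"],
    "dist1": ["core1", "acc1"],
    "dist2": ["core1"],
    "acc1": ["dist1"],
}

to_check = devices_within(topology, "acc1", max_hops=2)
```

For a change on acc1, to_check contains dist1 at one hop and core1 at two hops. This only identifies where to look; whether the change actually harms those devices still has to be verified against their configuration and performance data.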
In Conclusion
Change is inevitable, but organizations can take control of it in a way that reduces risk. The key takeaway is to move away from a reactive, troubleshooting approach to network change and configuration and toward a proactive, monitoring-based one that limits risk by minimizing human configuration errors and greatly reducing the time and effort required to isolate and correct problems.
When it comes to networking, what you don’t know definitely can hurt you.
Don Pyle is chairman and CEO of Netcordia.