Storage Peer Incite: Notes from Wikibon’s September 13, 2011 Research Meeting
Network management is plagued by unnecessary complexity. Switch and router vendors provide monitoring and management tools, but only for their products. And they only provide spot views of what is happening in the individual devices. This means that managing the typical heterogeneous network with these tools requires interpreting information constantly scrolling across numerous monitors to get some idea of the data flow over the entire network. And when something starts going wrong in one of those devices, the cascade effect causes devices across the network to report errors. Spotting the device causing the problem becomes an unnecessarily complex, time-consuming problem-solving exercise.
For decades network monitoring companies like Ipswitch, maker of the popular WhatsUpGold product for SMBs, and divisions of enterprise firms such as HP, have worked to provide "single pane of glass" network monitoring and management systems that can automatically map all devices on a network and report on data flows across the entire network graphically and, when problems happen, identify the root cause and in many cases take action automatically. These efforts have been largely successful today, but 100% coverage is still often frustrated by devices that do not support SNMP. Users should make SNMP support a basic requirement for all network devices on their network and eliminate any non-SNMP switches and routers from their purchasing short lists.
The "single pane of glass" is a huge step forward in network management, but is it enough in this age of virtualized environments? David Vellante, in his article below, presents a vision of the next generation of network management. Whether you agree completely or not, it does raise the question of what we need beyond the glass.
The articles below were inspired by a recent Peer Incite meeting discussing the issues behind network management with a particular focus on WhatsUp Gold from Ipswitch, an independent provider of network management tools. G. Berton Latamore, Editor
IT organizations are under constant pressure to keep up with the growth and changing requirements of the lines-of-business. As in other infrastructure silos, this has led in networking to a heterogeneous mix of switches, routers, and other gear that staff must manage and support. While customers may be considering emerging options for designing next-generation “fabric” architectures, most mid-market customers simply want to deploy new gear faster and simplify overall management. Virtualization has a ripple effect on networking by requiring even more flexibility and mobility than traditional server infrastructure. The bottom line is that IT staffs require tools that can manage infrastructure with the minimum amount of effort.
The Wikibon community concluded in a Peer Incite Research Meeting that the industry still has work to do to create a single pane of glass for managing the entire IT infrastructure. Even so, moving to a single tool set for network management delivers progress towards that ultimate goal. Rene Delbe, Systems Engineer at Infinigate, a German-based distributor of Ipswitch’s WhatsUpGold (WUG), discussed one such example.
Consolidation of Tools
When it comes to installing, monitoring and managing changes in a network, an administrator can end up with a tool for every family of switch, router, and end device in the data center with some spreadsheets and network diagrams to help consolidate information manually. While enterprise accounts will consider investing in orchestration tools, Rene said that the mid-market customers often turn to open source software to try and consolidate on a single tool.
Rene goes on-site with his resellers on a weekly basis to support WhatsUpGold, and he says that they can deploy 30-3500 device configurations in 1-2 days, which he expects would take months to deploy with open source alternatives. WhatsUpGold utilizes SNMP to collect, monitor and manage the various pieces of the datacenter. SNMP solutions allow agent-less management of an environment, and for most business-grade devices, this means that no additional setup is required (Windows administrators will need to enable the management IP ports).
While there may be cases where a device-specific management tool or agent-based solution has deeper functionality, Rene stated that SNMP provides the functionality that his customers require. While Rene did not quote any specific ROI numbers for the solution that he installs, it is his experience that the deployment of WUG can allow the network team to manage environments with a reduced headcount.
While server virtualization adds a layer of abstraction between the hardware and operating systems, the stability of the physical infrastructure becomes even more critical, as an outage will affect more users. Monitoring tools have the ability to track everything from the basic environmentals of power and temperature of the physical environment to the status of the advanced virtualization mobility activity of VMware vMotion and similar tools. While hypervisor-based management tools such as VMware vCenter and Microsoft SCVMM are increasingly becoming the center of activity for server virtualization administrators, APIs also allow external management tools to manage into the virtual environment.
Rene discussed a WUG plug-in module for supporting VMware environments that integrates with VMware APIs and vCenter, but runs from a separate machine, just like standard WUG. It is worth noting that VMware’s native virtual switching (vSwitch) is not managed by SNMP, but that WUG can monitor the physical infrastructure and have visibility into the VMs. SNMP was designed for physical devices with fixed resources, which does not fit with the virtual model; this is why API integration is needed. See Server virtualisation management complications for more on this topic.
Deploying a network management toolset requires coordination beyond the network team. Rene says that a typical planning session should include server, Windows, VMware, and database (WUG uses SQL) administrators. Users should be encouraged to leverage services from resellers to implement the management software as there will be a mix of automated discovery devices that require enabling monitoring and additional considerations such as using APIs as discussed in the virtualization section above.
Rolling out a solution is a good time to inventory equipment and confirm notification/escalation paths. Tools can often be set with scripts to act according to certain scenarios and alerts can be sent via various communication methods.
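As an illustration of the notification/escalation paths mentioned above, the following is a minimal sketch in Python. It is not taken from WhatsUp Gold or any other product; the step delays, channels, and contact names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes an alert may stay unacknowledged before this step fires
    channel: str        # how the contact is reached
    contact: str        # who is notified


# Hypothetical escalation path, ordered by delay.
ESCALATION_PATH = [
    EscalationStep(0, "email", "network-team"),
    EscalationStep(15, "sms", "on-call-engineer"),
    EscalationStep(60, "phone", "it-manager"),
]


def contacts_due(minutes_unacknowledged, path=ESCALATION_PATH):
    """Return (channel, contact) for every step whose delay has elapsed."""
    return [
        (step.channel, step.contact)
        for step in path
        if minutes_unacknowledged >= step.after_minutes
    ]


print(contacts_due(20))
```

Writing the path down as data like this makes it easy to review with each team during the rollout, whatever tool eventually enforces it.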
Justification of Network Management
In general, network management is justified by helping prevent or reduce the time-to-resolution of a network outage. A complex network with multiple points of management will require time to determine the source of the issue through the various alerts and rippling impacted devices. The ROI of a good tool can also be determined through both the initial time to deploy and saved man-hours from moving to a central management console.
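The rippling alerts described above can be narrowed down once the topology is known. The sketch below shows one simple approach, assuming each device is reached through exactly one upstream device; real tools use richer dependency models, and the device names here are invented for illustration.

```python
def find_root_causes(upstream, down):
    """Return the down devices whose upstream device is not itself down.

    upstream maps each device to the device it is reached through
    (None for the device closest to the monitoring station).
    """
    down = set(down)
    return sorted(d for d in down if upstream.get(d) not in down)


# Hypothetical topology: child -> upstream parent.
upstream = {
    "core-router": None,
    "dist-switch-a": "core-router",
    "dist-switch-b": "core-router",
    "access-sw-1": "dist-switch-a",
    "access-sw-2": "dist-switch-a",
    "server-1": "access-sw-1",
}

# Every device behind dist-switch-a reports as unreachable.
down = {"dist-switch-a", "access-sw-1", "access-sw-2", "server-1"}

roots = find_root_causes(upstream, down)
suppressed = sorted(down - set(roots))
print("root cause:", roots)
print("dependent alerts:", suppressed)
```

Four alerts collapse to a single device to investigate, which is where the reduction in time-to-resolution comes from.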
Action item: IT organizations are under constant pressure to extract greater utility out of infrastructure while managing the growing environment with fewer resources. Having a well-documented environment that can be monitored and controlled from a centralized tool (both at a Web console and increasingly via mobile access) is a key step towards allowing IT staffs to provide more strategic support to development and application requirements rather than fighting to keep the lights on. The management ecosystem is still highly fragmented and users should look to leverage services along with software to properly deploy efficient management.
The current state of network management is a collection of tools for each component in the network, and heterogeneous methods to monitor and manage those components. Every device has its own set of tools for network monitoring and management, making network administration a cumbersome and time consuming task. Root cause analysis and network planning are especially challenging in this environment. Network management costs are not linear and tend to increase at a faster rate than the network size increases.
Network management nirvana is the so-called "single pane of glass" where:-
- All of the network topology can be discovered and mapped automatically with the minimum of agents;
- Role-based authorization pushes responsibility down to the right people;
- Consolidated alerts provide a view of what is happening across the network rather than a fragmented set of snapshots at individual switches, and automated responses are used when appropriate to respond instantly to specific events;
- A common database collects all historical data and network change information;
- Analysis tools work seamlessly across all components of the network;
- Mobile devices are fully supported;
- Security and compliance objectives are supported across the whole network;
- What-if planning of changes to the network can be done against the data and against different future scenarios.
The key to working towards this single-pane-of-glass nirvana for a complete installation (we are not there yet!) is committing the organization to a common set of network reporting standards. SNMP has improved over time to include the ability to monitor and manage all of a network with the exception of storage, VMware virtualized networks, and some older equipment.
Committing to the latest SNMP standards, and to vendors who support them fully now and in the future, will allow CIOs to make significant reductions in network CAPEX and network administration costs while improving quality of service, reducing security risks for the network, and improving the ability to work with other parts of IT on complex problem resolution.
VMware is plotting its own path in network management. Most IT infrastructure environments will be a mixture of virtualized and bare metal for some time to come, with the most demanding systems outside the virtualization walls. Network management needs to encompass the whole network. There are APIs in VMware that allow some network data to flow out to SNMP-based management tools. However, the lack of SNMP reporting by VMware breaks the single-pane-of-glass concept and complicates overall network management. The predominance of VMware means that this will have to be managed around, although as other virtualization platforms mature and if they offer true SNMP support, there may be a case for moving work sensitive to network management to these platforms.
Many network management tools provide good, full-function SNMP support, including (this is not a complete list) the following:-
- WhatsUp Gold from Ipswitch (the subject of a recent Wikibon Peer Incite);
- Orion from SolarWinds;
- OpManager from ManageEngine (formerly AdventNet);
- SNMPc from Castle Rock.
The weak points for most of the tools are integration with VMware and support for mobile. The roadmap for implementation of these requirements should be explored in depth.
Action item: In order to minimize the cost of network equipment and network management and provide higher levels of service and security, CIOs should set a standard that all networking equipment (excluding storage) should support the latest levels of SNMP and that network management tools that support SNMP fully should be deployed. CIOs should be satisfied that the SNMP-based tool integrates with VMware, Hyper-V, and other virtualization platforms as well as possible and will support the future requirements for mobile within the organization.
When it comes to managing infrastructure, administrators are looking for Simple Network Management Protocol (SNMP) to live up to its name. SNMP was defined by the IETF to be the standard for monitoring IP devices. While SNMP has been around since the late 1980s and most devices have SNMP support, there are many considerations ranging from device discovery through scripting for custom functionality. For those that are unfamiliar with implementing SNMP, take a look at this introduction article by Matt Simmons; it’s especially illuminating to read about some of the inconsistent support between versions of SNMP.
One of the biggest advantages with using solutions based on SNMP is that agents (which can be obtrusive and must be kept up to date) are not required. Most devices that support SNMP can be discovered automatically by pinging or by using TCP requests. Some devices (such as Windows servers) must first be configured to respond to SNMP requests. As discussed in the summary of the recent Peer Incite on network management, VMware environments are discovered and managed via an API rather than SNMP. Management tools can pull together SNMP and other agent-less methods of monitoring and controlling devices.
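A discovery sweep of the kind described above can be sketched as follows. This is an illustration only: the `probe` callable is a stand-in for whatever reachability check a real tool performs (a ping or an SNMP request, for example), and it is stubbed here so the example runs without touching a network. The address range is a documentation-only block.

```python
import ipaddress


def discover(cidr, probe):
    """Return the host addresses in cidr for which probe(address) is truthy."""
    network = ipaddress.ip_network(cidr)
    return [str(host) for host in network.hosts() if probe(str(host))]


# Stub probe: pretend only these two addresses answer.
responding = {"192.0.2.1", "192.0.2.5"}


def fake_probe(address):
    return address in responding


found = discover("192.0.2.0/29", fake_probe)
print(found)
```

Keeping the probe separate from the sweep lets the same loop combine several agent-less checks, which is how a management tool can mix SNMP with other discovery methods.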
SNMP allows for vendor-neutral management, but that does not mean that all vendor equipment is supported equally. Some vendors do not fully adhere to the SNMP versions (which are documented in IETF RFCs). There are conflicting specifications in different versions, which can be confusing and frustrating to sort out.
Once devices are discovered, SNMP operates on the MIB (management information base, the structured set of objects through which a device is managed). The core operations are Gets (retrieving information), Sets (changing something on the device), and Traps (notifications the device sends when certain criteria are met). Network management software can support automation to enhance SNMP, replacing scripts that customers would have either written themselves or paid professional services to create in the past.
Action item: While SNMP has gaps and inconsistencies, it is still the most broadly supported vendor-neutral option for network management. Users should leverage vendor documentation and knowledgeable partners to help successfully install network management.
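To make the three operations concrete, here is a toy in-memory model in Python. It does not speak the SNMP wire protocol and is not based on any particular library; the `ToyAgent` class and its method names are invented for illustration. The two object identifiers are the standard MIB-II ones for `sysName.0` and for `ifOperStatus` of interface 1.

```python
SYS_NAME = "1.3.6.1.2.1.1.5.0"              # sysName.0
IF_OPER_STATUS_1 = "1.3.6.1.2.1.2.2.1.8.1"  # ifOperStatus, ifIndex 1 (1 = up, 2 = down)


class ToyAgent:
    """Hypothetical stand-in for a managed device's MIB."""

    def __init__(self, mib, writable, trap_sink):
        self._mib = dict(mib)
        self._writable = set(writable)
        self._trap_sink = trap_sink  # callable that receives trap notifications

    def get(self, oid):
        """Get: retrieve the current value of an object."""
        return self._mib[oid]

    def set(self, oid, value):
        """Set: change a writable object on the device."""
        if oid not in self._writable:
            raise PermissionError(f"{oid} is read-only")
        self._mib[oid] = value

    def link_changed(self, oid, status):
        """Simulate a device-side event: update the MIB and emit a trap."""
        self._mib[oid] = status
        self._trap_sink((oid, status))


traps = []
agent = ToyAgent(
    {SYS_NAME: "switch-01", IF_OPER_STATUS_1: 1},
    writable={SYS_NAME},
    trap_sink=traps.append,
)

name_before = agent.get(SYS_NAME)        # Get
agent.set(SYS_NAME, "edge-switch-01")    # Set
agent.link_changed(IF_OPER_STATUS_1, 2)  # the device raises a Trap
print(name_before, agent.get(SYS_NAME), traps)
```

The point of the model is the direction of each operation: Gets and Sets are initiated by the management station, while Traps originate on the device.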
Footnotes: See NetworkManagementSoftware.com for more on this space
Over a decade ago I was just learning about LANs for my small business. Although I had decades of experience in IT, my exposure to TCP/IP/Ethernet troubleshooting was limited.
I had several servers networked on a LAN for file sharing and shared Internet access. When I relocated one of those servers to a remote corner of the building, network performance for that server ground almost to a halt. I checked the wiring, which was brand new, connections etc., but everything seemed physically fine, and I could ping the server no problem.
Then I started searching for troubleshooting tools. I tried many. Most were either too cryptic or reported nothing was wrong.
Then I stumbled upon WhatsUp Gold (WUG) from Ipswitch, Inc. and after installing a 30-day free trial of the standard version I began to troubleshoot my LAN. WUG was wonderful! It had an intuitive interface, produced a nice topology map, and told me things about my network infrastructure I never knew before. I probably learned more about LANs and Microsoft networking in one day than I had in the whole previous year.
Having gotten comfortable with WUG, I addressed the network performance problems with this one server. In no time at all, I learned that it was experiencing huge packet losses and the likely cause was too long a cable run. Given that I had, indeed, moved that server quite a distance from the patch panel, that seemed logical. I put a new switch between the server and the patch panel, and the problem was instantly solved!
For the rest of the trial period, I played with WUG and came to love it. As the end of the trial neared I explored purchasing a license for WUG, but at more than $1,500 for a single license I was dismayed. That was way over my budget, and I ended up not purchasing it. To this day I miss WUG.
Action item: Ipswitch should offer a license option for small businesses that is truly affordable and put the “S” back in SMB.
Ethernet is offering improved capabilities for connecting storage and servers. iSCSI is increasingly offered on lower-end storage arrays, and the extensions to Ethernet allowing lossless connection have enabled fibre channel over Ethernet (FCoE) as a protocol. This has allowed the vision of a single Ethernet fabric to include all the connections within an organization:-
- Voice connections;
- WAN connections;
- LAN connections;
- Low performance storage connections;
- High performance storage connections;
- Inter-processor connections.
The connection list sequence above has several characteristics:-
- The number of connection points tends to decrease with numerical position;
- The rate of change tends to decrease with numerical position;
- The bandwidth requirements tend to increase with numerical position;
- The availability requirements tend to increase with numerical position;
- The tolerable latency tends to decrease with numerical position;
- The equipment connected tends to increase in cost with numerical position.
Ethernet is the protocol of choice for multiple connection points, and it is very easy and cheap to switch Ethernet traffic. It is very easy to make changes, and the bandwidth capabilities have increased significantly to 10Gb. The ability to oversubscribe the network reduces the cost of Ethernet networks. The availability of the latest SNMP standards enables effective monitoring and management of the whole network with tools like WhatsUp Gold and others.
In low performance storage environments, Ethernet connections are both suitable and lower cost. In many file-and-print environments, CIFS and NFS protocols based on Ethernet allow suitable performance and improved connectivity to very large file systems.
However, Ethernet does not work so well in database environments where random IO performance for small block sizes and latency are the key requirements. Latency is the time taken to get the data, and is of importance when there is serialization of tasks. It becomes very important in database systems and in high performance systems. Dennis Martin gave a presentation at the 2011 Flash Summit illustrating the difference in IOPS performance and latencies of different storage protocols, which shows clearly that neither 10Gb nor 1Gb iSCSI is well suited for high-performance storage environments. Figures 1 and 2 are based on this presentation.
Virtualization, erasure coding, and flash storage are all changing the storage landscape. Virtualization is putting pressure on IO performance and making it more random. Erasure coding is enabling much lower costs for distributed storage. Flash storage is reducing or eliminating the long latencies of hard disk mechanics, so the storage network protocol now accounts for a relatively higher proportion of the IO time and is of increasing importance.
When looking at systems interconnect, Ethernet can work for lower performance systems but does not have the bandwidth and latency for high performance systems. Of the top 100 supercomputers, only one (position 45 as of June 2011) is using 10Gb Ethernet as the processor interconnect fabric. Very low latency InfiniBand from Mellanox and QLogic dominates in this space.
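A little arithmetic shows why latency matters for serialized work and why the network share of IO time grows with flash. The latency figures below are round illustrative assumptions, not measurements and not taken from the presentation cited above.

```python
def serialized_iops(latency_seconds):
    """Upper bound on IOs per second when each IO must finish before the next starts."""
    return 1.0 / latency_seconds


def network_share(device_ms, network_ms):
    """Fraction of total IO time spent in the storage network."""
    return network_ms / (device_ms + network_ms)


# Assumed, illustrative latencies in milliseconds.
NETWORK_MS = 0.1
DISK_MS = 5.0   # mechanical hard disk
FLASH_MS = 0.1  # flash device

disk_share = network_share(DISK_MS, NETWORK_MS)
flash_share = network_share(FLASH_MS, NETWORK_MS)

print(f"serialized IOPS at 1 ms per IO: {serialized_iops(0.001):.0f}")
print(f"network share with disk:  {disk_share:.1%}")
print(f"network share with flash: {flash_share:.1%}")
```

Under these assumptions the network accounts for about 2% of IO time with disk and half of it with flash, so the same protocol overhead that was negligible becomes the dominant term.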
Some networking companies and networking professionals have argued that Ethernet now offers the potential to provide all the interconnect requirements for an organization. Wikibon believes that this is not the case, and that inter-system and high performance storage requires different protocols.
Very importantly, these system networks should be managed by the system side of the house (system, database, and high-performance storage), and not by the LAN and WAN networking groups. The availability requirements for storage networks are much higher, performance requirements (especially latency) are more demanding, the rate of change lower and the requirements for change control more stringent. These requirements are more in line with the tools and mindset required to assist the server and database management groups.
The lack of SNMP standards for storage means that Wikibon would advise as a general rule keeping high-performance storage networks and storage devices separate from general-purpose networks. However, the systems and storage side have an important role in helping the networking group by ensuring that latest levels of SNMP are available on all server and storage equipment.
Action item: Wikibon recommends that CTOs and CIOs should continue to separate out the management of the networks of high-performance storage and systems from the general purpose LAN and WAN networks. Equally, Wikibon recommends that IT organizations should work with networking vendors for general purpose networking, and system and high-performance storage vendors for high performance/low-latency networks. This will help minimize (but not eliminate) the unproductive religious wars that can break out between the different mindsets that are required for the two environments.
Too often network management today still depends on huge numbers of tools with different ways to manage or monitor individual devices on the network, a situation that has not changed significantly since the 1980s. Generally, switches, routers and the like have their own sets of tools for monitoring the health of a network, creating a sea of complexity for network managers, often making management a difficult and expensive task.
The concept of a single pane of glass for network management is alluring and in and of itself can deliver ROI by cutting costs. The reality however is that the vendor community needs to go further. The key to 10X increases in value can be found in the application portfolio. Specifically, applications are the link between network infrastructure and business value creation.
By taking an application view, network managers can switch the conversation from cost cutting to business value creation. Supporting this approach increases a vendor's standing within customer organizations and can create lasting strategic relationships that include both tactical problem solving and ongoing value delivery.
Indeed, Wikibon practitioners who identify and rank their company’s critical applications report positive outcomes that directly impact the way in which they design and ultimately monitor, manage, and remediate network infrastructure. A major focus of these customers is on ensuring policy-based quality of service (QoS).
The process taken by these practitioners is to map supporting network elements to key applications and prioritize applications based on business value contribution (business benefit divided by TCO to support the apps). Providing an application view of network management enables customers to focus monitoring and troubleshooting efforts on applications that are delivering the most business value. A key aspect of this approach is to proactively measure and report on time to recover from application outages and even degraded application performance.
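The prioritization described above (business benefit divided by TCO, then mapped onto supporting network elements) can be sketched in a few lines of Python. The application names, dollar figures, and device names are hypothetical and serve only to show the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    annual_benefit: float    # estimated business benefit
    annual_tco: float        # total cost of ownership to support the app
    network_elements: tuple  # devices the app depends on

    @property
    def value_ratio(self):
        return self.annual_benefit / self.annual_tco


def prioritize(apps):
    """Order applications from highest to lowest business value contribution."""
    return sorted(apps, key=lambda a: a.value_ratio, reverse=True)


def element_priority(apps):
    """Give each network element the highest value ratio among the apps it supports."""
    priority = {}
    for app in apps:
        for element in app.network_elements:
            priority[element] = max(priority.get(element, 0.0), app.value_ratio)
    return priority


apps = [
    Application("email", 200_000, 100_000, ("core-router", "mail-switch")),
    Application("order-entry", 900_000, 150_000, ("core-router", "db-switch")),
    Application("intranet-wiki", 30_000, 30_000, ("mail-switch",)),
]

ranked = [a.name for a in prioritize(apps)]
priorities = element_priority(apps)
print(ranked)
print(priorities)
```

A shared device inherits the priority of the most valuable application it carries, which tells the network team where monitoring and troubleshooting effort should go first.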
Where Should Vendors Focus? Vendors should integrate application monitoring explicitly into tool sets. Specifically, there are five areas the Wikibon community identifies as critical for network management suppliers to deliver, including:
- Reporting that maps application value to network infrastructure. Specifically, cost of downtime and/or degraded performance should be a key performance indicator delivered through customizable dashboards.
- The monitoring of critical IP services should be reported on an application-by-application basis to ensure that key processes supporting applications are available and performing to SLA requirements.
- Monitoring should provide anticipatory identification of application performance bottlenecks before users are affected.
- Monitoring should identify application performance issues in real-time or near real-time.
- The network monitoring and management system should provide the assurance (supported through reports) that data integrity exists across various IP services.
Vendors should focus efforts on supporting and monitoring performance and health at the application level for IP services (e.g. Ping, SNMP, HTTP, etc), e-mail services (e.g. Exchange), database services (e.g. SQL Server and Oracle), OS services and other critical infrastructure that support business applications.
By performing some basic research and allocating R&D dollars to provide application views, network management will go from being seen as a roadblock to change to a business value enabler.
Action item: Efforts by network management vendors to market a single pane of glass as a means of reducing complexity are noble and differentiable. However vendors should endeavor to deliver far more business value than can be realized by consolidating tool sets. Specifically, while such consolidation might reduce the need to add headcount, cut training costs and improve agility, focusing on application-level performance indicators can deliver orders of magnitude more business value and change the perception of network management from cost containment to profit generation. Taking an application view will allow vendors not only to fulfill tactical customer needs but also deliver long-term strategic value to clients.
Today's complex IT environment brings with it a broad collection of device-specific tools to populate asset catalogs, monitor system health and performance, send alerts, manage devices, and provide a visual representation of the environment. Larger organizations consolidate much of this information into a monitoring and management framework, with hooks into, and data out of, device-specific tools. But these frameworks are often beyond the budget of smaller IT organizations.
In many smaller organizations, systems and network administrators have often relied on spreadsheets to keep track of devices, and have relied almost exclusively on device-specific tools for health and status. As a result, they often lack an end-to-end view of the environment.
The broad deployment of server virtualization software and the more dynamic application deployment and movement that is enabled by leading virtualization suppliers, however, creates a more challenging monitoring and management problem for network administrators. Discrete tools and spreadsheets will not typically be sufficient to meet the needs of network and server administrators in highly-virtualized environments.
Recently, agentless consolidated tools that leverage the broad support for SNMP have become available. These consolidated tools are simple to deploy and maintain. Although they may lack some of the automated management of larger frameworks and the ability to integrate with device-specific tools, they typically provide a solution that is affordable by smaller organizations and offer a sufficient level of monitoring and alerting for most devices in the environment. These tools can eliminate much of the need for device-specific monitoring tools, allowing organizations to reduce training and provide more complete, end-to-end monitoring of a dynamic virtual environment.
Terms in existing management software license agreements may limit the initial savings and simplification that may come from consolidating into a single network management framework. Some device-specific tools may still be needed when SNMP doesn't provide the level of detail needed. In those instances, an agent-based monitoring tool may be necessary. That said, having a consolidated tool that can eliminate the bulk of device-specific tools offers savings in contracts management, license fees, training, maintenance and support and enables a simplified foundation for maintaining and managing service levels for applications.
Action item: When it comes time to consider consolidation of management, it is often best to under-promise and over-deliver. In order to properly set expectations, the individual responsible for selecting a comprehensive management tool should:
- Assemble a team with representatives from server management, virtualization management, network management, storage management, and contract management, strategic sourcing, or IT procurement.
- Create a comprehensive list of current tools in use and the level of asset detail, monitoring, and alerting provided by each of the tools.
- Review each of the current management software licenses for license expiration dates and maintenance renewal periods.
- Select dates for implementation of new tools and retirement of legacy tools, recognizing that all tools may not be replaced at the same time.
- Provide an opportunity for specialists in each of the disciplines to maintain device-specific tools, when necessary to provide more comprehensive levels of monitoring, alerting, and management.