Journal of Network and Systems Management, Vol. 4, No. 1, 1996
Thresholds
Edited by Lawrence Bernstein
The System is the Business--But at What Cost?

Lawrence Bernstein, Ed Bryan, and C. M. Yuhas

Networks are breaking down office walls every day. People's offices are wherever they are, not where their desks are. Databases are stretching around the world. The future is here. This is where it's at: AT&T saves 44% in real estate costs in California with aggressive network computing; lawyers and judges carry entire legal libraries into court with a 7-pound PC; express delivery drivers update their database the instant a recipient signs their computer notebook. These examples are just hints of a tremendous change in the physical structure of business brought about by network computing.

Where do you fit as the network manager? The good manager is no longer providing a "support" system; the system is the business. The network manager becomes its living heart. For network managers, this is a level of responsibility even higher than what is experienced today. Today a number of companies depend on network reliability for virtually all their business activity. That model is spreading. Soon, all network managers will be affected by the sternest requirement for reliability as increasing functionality migrates onto the network.

How can information professionals address the increased importance of reliability in information services? As your company approaches new levels of highly distributed information processing and sharing, how will you balance the opposing pulls of swift problem-solving via PCs versus protecting the company's information assets by maintaining accounting control and valid databases? Fortunately, the emerging paradigm is an amalgamation of several ideas that have been around long enough for solid engineering principles to be available for their adoption, adaptation and application. The departure from the past is not so radical as to require new tools for analyzing and evaluating system requirements.
There is no need for a good manager to do "seat of the pants" system manipulation. Here are six approaches to quality and reliability to help managers distinguish among the fantastic claims, terrific proposals, and honest promises brandished by vendors. They'll help you write a contract
1064-7570/96/0300-0001$09.50/0 © 1996 Plenum Publishing Corporation
for a system with qualifying conditions adequate to ensure the vendor's keen attention to detail and your reputation for running a system that serves. Get a firm grip on your engineering principles and consider the following factors in judging any new aspect of network computing:

1. RELIABILITY. Set a high standard. The future will belong to those software systems that realize better than 99.9% reliability. There are serious operational problems in the range between 98% and 99.9%. Below 98%, the system is intolerable. When estimating reliability in proposals and systems you're examining, count outages from all causes. Users don't care if the problem is power, hardware, operations or software, but they do care about every outage that hits them on line. Software may come to the rescue of other system elements that can experience failure. Good software design can compensate for deficiencies in the environment so that the user obtains acceptable reliability. Good power protection, fast recovery techniques and fault tolerance built into the software are the most important ingredients in achieving this goal.

2. SYSTEM RESPONSE TO THE UNEXPECTED. The customer (or end user) is the boss. He or she is also a human being, prone to use a system in ways that, from the point of view of the software designer, may appear capricious. Unexpected stimuli to the system are inevitable, and every system must be able to handle them, meeting inventive and unanticipated operations with aplomb. In engineering terms, the system you adopt has to function in a broad range around its designed operating point. Watch for signs that a system is highly optimized for a single operating scenario, for in this case high performance can mean low quality. The concrete measure here is a rated load test. A rated load is more than just a certain number of transactions per second. Rated load incorporates a proportionally comparable transaction mix from the soak site and the storage requirements.

Consider peaks and variations in traffic distribution in light of 24-hour-a-day operation. Also, consider the effects of the absence of a load. System designers don't always do this, but you have to. For the user, failure under no load destroys productivity just like failure under high load. A less exact but still compelling sense of how well a system deals with the unexpected can come from anecdotal evidence obtained at the soak site and from sampling at other sites as the system is widely deployed. Assign someone to call the soak-site operations manager every day. A regular chat will pay off in insights and subtle clues to the "unexpected" that will occur later on.

3. SYSTEM THROUGHPUT--THEORETICAL OR REAL? A third measure of distributed system quality is true system capacity. This is a combination of throughput and response time.
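One way to make that combination concrete is to treat capacity as the highest offered load whose response time still meets its bound. A minimal sketch, with hypothetical figures, function names, and thresholds (none of them from the article):

```python
# Sketch (hypothetical figures): "true capacity" as the highest offered load
# whose 95th-percentile response time still meets a bound, rather than raw
# transactions per second alone.

RESPONSE_BOUND_S = 2.0  # required 95th-percentile response time, in seconds

def p95(latencies):
    """95th-percentile latency from a list of samples (seconds)."""
    ordered = sorted(latencies)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]

def true_capacity(load_test):
    """load_test maps offered load (tx/sec) -> latency samples from a rated
    load run. Returns the highest offered load that still meets the bound."""
    passing = [tps for tps, lats in load_test.items()
               if p95(lats) <= RESPONSE_BOUND_S]
    return max(passing, default=0)

# Toy rated-load results: response time degrades as offered load rises.
run = {
    50:  [0.3, 0.4, 0.5, 0.6],
    100: [0.8, 1.0, 1.2, 1.9],
    150: [1.5, 2.5, 3.0, 4.0],  # p95 exceeds the bound here
}
print(true_capacity(run))  # -> 100
```

Defining capacity this way, as a load figure conditioned on a response-time requirement, is what lets it be written into a specification and re-measured at the soak site and after deployment.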
Defined in these terms during specification writing, capacity can be measured during system test and again at the soak site. The manager takes periodic measurements for analysis after system deployment. The measurements must come from all sites, not only those that display problems. Also, consider whether the on-line system is engineered such that it outruns its recovery system. Any on-line system capacity in excess of the recovery system's capacity is useless. In the event of an outage, there are not enough hours in the day to recover the database.

4. SERVICE CALLS--TROUBLES OR FAILURES? This quality measure is the number of service calls the customer can expect to make. When you're evaluating a system, measure its quality not by the total number of reported problems, but only by those that cause outages or capacity problems. Trouble reports can be valuable in grading a system if the nature and severity of the problems determine the grade, not just the raw count.

5. COST/EFFECTIVENESS--THE BUSINESS CASE. Client/server is the architecture of choice because of the freedom it gives the end user, the low cost of the PC, the rich software office tools available and the wide choice of suppliers. Even though first costs are low, life-cycle costs need to be carefully managed. Here are some things to look for:

a. Requirements. Establishing system requirements is a substantial and vital effort. Models must be produced and costs, savings and performance predicted. Then the models must be calibrated against data from the crucible of field experience.

b. Custom Development. For many large projects, especially those that must be tightly integrated with a company's business, no ready-to-use application will be available. The new system must be tested with users. Therefore, prototypes must be built, developed with a selected tool set and development process. Several iterations of these tasks can be expected because few projects get the requirements, design, and development done in one pass.

c. Training. User training and developer training need to be carefully planned. Success of the system depends heavily on acceptance by those who will use it. Network managers are often asked to make up for inadequate training with elaborate help desks.

d. The Prototype Trap. An advantage of client/server systems is the availability of very good, productive development tools which can cut development and maintenance costs and improve product quality, but therein also lies a trap. Prototypes of the target system can be made very easily and quickly, sometimes in a matter of a few days. These are excellent for showing how the system will operate, particularly at the user interface. They are a trap if they give a false impression of how quickly and easily the full system can be implemented. Often what is shown is only a small fraction of the full system. The boundary conditions are left yet to do, as are the error checks, the special situation calculations, the business rules for unique cases and detailed database design, including
those for adequate production performance, system security, backup and recovery, and transaction integrity.

6. THE VALUE PROPOSITION. Client/server systems get much of their advantage from the availability of standard parts that can easily work together. This gives wide choice of capabilities, performance and suppliers. Unfortunately, vendors often miss the mark and do not exactly meet standards--parts don't always work together, resulting in bugs, system failure, and intermittent problems. The network manager selects the parts that meet the requirements and assembles the system. Real life happens during both initial test and operation when the network manager must find the source of any problems, identify the responsible vendor and find an operational work-around while waiting for the fix. The difficulty and cost for these efforts are often wildly underestimated. A typical analysis shows that the cost of a PC-LAN client/server system is $6,445 per user. Here is a sample of the costs that go along with the benefits of networked computing.
Item                                    Cost per Client                Cost per Server
Client hardware                         $3K to $10K                    --
Server hardware                         --                             $30K to $150K
Network adapter (ATM or TCP/IP)         $1.6K                          --
System software                         $50                            --
Network software                        $50 to $100                    --
Middleware                              $1K to $9K                     --
Database management                     --                             --
Transaction processing monitor          $120 to $1.5K                  --
Mail software                           --                             --
Workflow software                       $300 to $1.5K                  --
System management:
  User authorization                    $1K                            --
  Usage accounting                      $5K to $40K                    --
  Security                              $15K                           --
  File backup and restore               $500                           --
  Tape management                       $2K to $4K                     --
  Performance monitoring                varies greatly by application  --
  Load balancing/workload scheduling    varies greatly by application  varies greatly by application
  Software dist./updating               $60                            varies greatly by application
  Software license control              --                             $5 to $35
Consultation                            --                             $150 per call and $100 per hour
Software upgrades                       $6K to $8K                     5% to 25% of first cost per year
Training                                $1K to $5K                     varies greatly by application
Management support staff                $700                           varies greatly by application
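A per-user figure like the $6,445 above comes from summing line items of this kind over the system's life cycle. A minimal sketch of the roll-up, using a few illustrative ranges (this short list is not the complete table, and the totals are not the article's figure):

```python
# Sketch: rolling up per-client cost ranges into a low/high estimate.
# The line items and ranges below are illustrative, not a complete list.

per_client = {                 # item: (low, high) in dollars
    "client hardware":  (3_000, 10_000),
    "network adapter":  (1_600, 1_600),
    "system software":  (50, 50),
    "network software": (50, 100),
    "middleware":       (1_000, 9_000),
}

low = sum(lo for lo, _ in per_client.values())
high = sum(hi for _, hi in per_client.values())
print(f"per-client estimate: ${low:,} to ${high:,}")  # -> $5,700 to $20,750
```

The width of the resulting range is itself informative: when the high estimate is several times the low one, the business case depends on exactly the kind of careful life-cycle management discussed in point 5.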
The role of the network manager is to satisfy the customer in the real world of messy operations, strange flukes and general carelessness. Poor performance and a timid approach to investments on any of the quality measures will make the client/server system unacceptable, no matter how rich its features. Meticulous engineering, however, can give a huge competitive advantage.
Lawrence Bernstein is Chief Technical Officer of the Operations Systems Business Unit at AT&T Bell Laboratories. He holds a BEE from RPI and an MEE from NYU. He has contributed to the evolution of network management and is a Fellow of ACM, IEEE, and Ball State. He is listed in Who's Who in America.
G. Edward Bryan is a consultant in software development and methodology. He has worked in software development since 1958, primarily in operating systems and communications. He has held software design and development positions at International Meta Systems, Honeywell, Xerox, Scientific Data Systems, The Rand Corporation, and Bell Telephone Laboratories.
C. M. Yuhas is a freelance writer who has published articles on network management in IEEE Journal on Selected Areas in Communications and IEEE Network. She has a Bachelor's in English from Douglass College and a Master's in Communications from NYU.