In-Depth
Cross-Platform Information Sharing
As companies build their infrastructures to accommodate information access from various computer platforms, storage vendors are making strides to deliver universal access to a single copy of data. The concept of universal information sharing traditionally involves data written from the mainframe to a disk drive with direct read/write access provided to UNIX, NT or other systems types. Users are beginning to realize, however, that in many cases the associated risks and long-term costs of universal and direct information access may, in fact, outweigh the benefits.
Often heralded as the "Holy Grail of IT," the ability to provide common access to a single copy of data by all applications and all users may over-simplify the problem. Attempts at single-copy methods have confronted a formidable list of complexities, including data security, data integrity, transaction application performance and organizational concerns.
New technology exists to enable companies to isolate the information sharing process from these perils. This enabling technology for cross-platform information sharing has nothing to do with the actual movement of data across systems types.
Dealing with Mixed Environments
Two of the most significant IT trends are the closer alignment of technology with business demands and a move towards data-centricity. The former establishes the principle of implementing applications on the most suitable platform; the latter endorses the same principle, but it also emphasizes that the relative suitability of applications platforms changes, and that data can outlive them as a corporate asset. As a result, data storage must be capable of underpinning a varied and varying mix of application environments.
Enterprises must improve, reconstruct and integrate their business processes; pursue product and commercial differentiators, such as time to market; and be prepared to adjust corporate structures through divestment, merger, alliance or take-over. As a result, decision makers have placed increasing demands on IS and are less and less willing to accept technical restrictions on the volume of data available to key decision support applications, or on the promptness with which that data reflects the state of production transaction processing (TP) systems.
Data has an intrinsic value reflecting its contribution to business activities rather than the status of the platform that "owns" it. But the organizational and political issues surrounding universal access to a single copy of data often overshadow the technical ability to make it happen. Ownership of data, and the corollary risk of sharing it, has a significant impact on the implementation of information sharing solutions. One group within the organization, usually the data center manager, has control of the corporation’s production information, but the necessity of sharing it introduces additional burden and overhead. The other side, typically the data warehouse or other applications manager, doesn’t have ready access and views the exercise as the price of admission for advancing the business.
For example, because billing applications equal direct revenue, a business unit manager’s willingness to provide access to mainframe transaction data for data warehouse updates becomes secondary. Corporate management will show little tolerance if customer billing cannot produce bills.
However, if IT can provide controlled access to the company’s billing data under well-defined terms and conditions (e.g., predefined schedules, zero impact on data integrity and performance, etc.), only then do the technology and psychology of information sharing align. Technology creates the ability to share. If the psychological barriers that prevent sharing today are removed, then "Holy Grail" information sharing technology comes into harmony with real-world business constraints.
Now, rather than waiting two weeks for access, new technology gives the data warehouse access to billing systems every hour if necessary, with no impact on corporate data security, integrity or system performance. Open systems gain access to current data and the mainframe preserves the sanctity of its primary copy of the data without massive reengineering.
Storage Evolution
Applications are now platform-transcendent. As such, open systems – UNIX and Windows NT – must have access to the industrial-strength storage that is familiar in OS/390 circles.
Even in the more cost-conscious off-mainframe context, raw price per megabyte is becoming extinct as a major factor in acquisition decisions, to be replaced by more long-term total cost of ownership considerations, and a greater awareness of added-value contributors, such as facilitating business function. Those responsible for data must provide access to it at personal computing levels while also managing, controlling and securing it to mainframe standards. Finally, the increasingly international nature of business, especially with the rise of electronic commerce, dictates that production systems and the data to support them are available 24x7.
Increasingly powerful controllers have enabled data storage to progress from strings of dumb devices to disk arrays forming sophisticated input/output (I/O) subsystems. Open server platforms have also gained storage to mainframe standards, in terms of data management services, such as automated archiving, as well as the engineering of the device itself.
In the middle of the decade a debate on device sharing and data sharing raged between industry analysts and I/O subsystem suppliers. Without rehashing the arguments in detail, device sharing allows a single physical device to be shared, by partitioning it internally and attaching each partition to one type of applications server, offering economies of scale and an overall simplification of management and administration.
Information sharing takes the further step of removing the partitioning and permitting heterogeneous access to a single copy of the data, adding to conceptual simplicity but introducing a range of technical challenges, which will be discussed later. This is only one aspect of making data available to more than one platform, and at this time, turns out not to be the most significant one.
Non-Disruptive Multiple Mirroring
A "killer application" has emerged in storage that generates an independent copy of data which is separately addressable to multiple other servers, be they mainframes, UNIX or NT. The huge impact of a new ability to manipulate time through multiple mirroring more than justifies any incremental acquisition costs. Specifically, corporate data is available on a 24x7 basis to support production as well as test and decision support systems.
Vendors and users have only just begun to explore the potential of multiple mirroring. Recent developments are based on making a local, non-disruptive, point-in-time copy of a volume of production data, also known as business continuance volumes (BCV). Production processing experiences minimal to no interruption while database disk-write buffers are flushed. Multiple, non-disruptive copies of mainframe and open systems production data are being used for backups, Year 2000 testing, "euro" currency conversion, data warehouse loading, decision support applications, application development and other activities that require copies of the data.
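The sequence described above (flush the write buffers, then detach a mirror as an independent copy) can be pictured with a toy model. This is only a sketch under stated assumptions: the MirroredVolume class and its write, flush and split methods are hypothetical names invented for illustration, and Python dictionaries stand in for disk volumes. It is not any vendor's actual interface.

```python
import copy


class MirroredVolume:
    """Toy model of a production volume with a detachable mirror (hypothetical)."""

    def __init__(self):
        self.buffer = {}    # writes accepted but not yet on disk
        self.primary = {}   # production copy
        self.mirror = {}    # mirror kept in step with the primary

    def write(self, key, value):
        # Production writes land in the buffer first.
        self.buffer[key] = value

    def flush(self):
        # Push buffered writes to the primary and to the attached mirror.
        self.primary.update(self.buffer)
        self.mirror.update(self.buffer)
        self.buffer.clear()

    def split(self):
        """Flush, then detach the mirror as an independent point-in-time copy."""
        self.flush()
        snapshot = self.mirror
        # Attach a fresh mirror so production stays protected after the split.
        self.mirror = copy.deepcopy(self.primary)
        return snapshot


volume = MirroredVolume()
volume.write("invoice-1", 100)
bcv = volume.split()             # point-in-time copy for the warehouse
volume.write("invoice-1", 250)   # production carries on unaffected
volume.flush()
```

After the split, the detached copy still holds the value as of the split, while the production copy moves on. Changes made to the detached copy never reach the primary.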
Providing additional, "zero-production-impact" copies of data within a single applications environment is only one option. Once available, the I/O subsystem can be employed to make its content available to other business areas, other applications and servers, via a translation process and across a partition within the subsystem if necessary. To give an example, production DB2 or IMS data on a System/390 can be propagated into an Oracle, Sybase or Informix data warehouse environment on a parallel processing UNIX server. Such mechanisms depend on device rather than information sharing, and they relieve the applications servers of copying, transferring, and translating duties, imposing only a minimal inter-task communication overhead to trigger the mirroring operation.
There is still scope for further refinement of these processes, especially as speed is always an issue. The effectiveness and performance of independently addressable data will be improved when copies can be taken at a sub-volume level, reducing the amount of data that needs to be copied and propagated to achieve a particular business result.
If information sharing can offer a higher level of access than multiple mirroring, why are storage vendors approaching it so cautiously? In comparing information sharing with multiple mirroring, we need to assess its benefits and take a realistic look at limitations.
By definition, information sharing saves disk space. Multiple mirroring, conversely, has to support as many instances of data as there are types of application server accessing it, plus the temporary overhead of BCVs during the copying and propagating processes.
How serious this issue is depends on the trade-off between disk space, which is a relatively cheap commodity, and the ability to improve the business’s competitiveness, which is relatively hard to come by. Initial acquisition costs are more than justified by the increased business impact and by reductions in both technical staff overhead and risk.
One variable is the timeliness of access. Real-time information sharing – instantaneous access between servers and storage devices – is incredibly complex and, thus, currently impractical. Where more than one on-line system needs to access information in exactly the same state, even multiple mirroring would have to carry out continual copying exercises. The coordination required between applications, servers and storage systems renders this design over-engineered and creates unnecessary overhead on the entire operation.
As users begin to relax the requirement of real-time access and accept the concept of rapid, near-real-time access, an interesting thing happens. A much larger common ground surfaces between the principal players. The focus on why frequent data access cannot be achieved quickly dissolves into a focus on the benefits of timely, fresh data.
Near-real-time access, as defined by specific user situations, is measured by the acceptable period of time between copies of the data – every hour, every four hours, or once a day as the business demands. Near-real-time sharing creates an environment where data becomes more available with the added benefits of increased data security, data integrity, system performance and organizational efficiency.
By using multiple mirroring technology to address the time aspects of information sharing, near-real-time information sharing suddenly becomes feasible.
Overcoming Security Issues
The technology of protecting mainframe data is 30 years mature. The technology of information sharing, however, has the ability to bypass 30 years of security through direct physical device access. Depending on your perspective, this either is really good or Armageddon. The reality is, the people who "own" the data are not going to share it without a contract that mainframe security rules will be obeyed.
Data security is a major issue. There is a global obligation for organizations to exercise due care, reinforced by general legal requirements in some countries and more stringent rules in some industries, such as banking and financial services. It is axiomatic that any computing environment is only as secure as its weakest point. While off-mainframe platforms can be made as secure as a mainframe, there is a tendency toward security problems. Concerns have been intensified by Internet-based access and the wider aspects of supply chain management. Penetrating security on Web sites has become a non-spectator sport, with numerous well-reported instances.
Storage-based information sharing can be implemented by enforcing mainframe security rules. Since the mainframe only performs security checks, systems performance remains relatively untouched. All subsequent I/Os are between the UNIX or NT host and the storage subsystem.
By incorporating known security procedures and technology, near-real-time information sharing becomes feasible.
Overcoming Integrity Issues
Integrity applies to write access, or physically "touching" the data. Widespread access to the core data raises the possibility of data corruption, placing key business data in jeopardy.
Data integrity has always posed problems for online systems implementers, with multiple update privileges necessitating specific locking techniques to prevent simultaneous updates from causing inconsistencies. Imagine five people showing up at the airport with tickets for the same airplane seat. While this is no longer a problem within one processing environment, multiple updating in an information sharing environment demands repeated operation of a comparable locking mechanism and inter-task communication between platforms, significantly increasing operational complexity. Any compromise increases the risk of data corruption, and there are particular problems in carrying out recovery. Current examples of information sharing normally simplify these issues by restricting updating privileges to one platform, allowing read-only access to any others.
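The two ideas in this paragraph, a lock that stops simultaneous updates and update privileges restricted to a single platform, can be sketched in a few lines. This is an illustrative sketch only: the SeatMap class, its method names and the platform labels are hypothetical, and a real locking mechanism spanning platforms would be far more involved.

```python
import threading


class SeatMap:
    """Toy seat map: one platform may update, all others read only (hypothetical)."""

    def __init__(self, writer_platform):
        self.writer_platform = writer_platform
        self._seats = {}
        self._lock = threading.Lock()   # serializes check-and-assign

    def book(self, platform, seat, passenger):
        # Only the designated platform holds update privileges.
        if platform != self.writer_platform:
            raise PermissionError(f"{platform} has read-only access")
        with self._lock:
            if seat in self._seats:
                return False            # seat already taken, refuse the update
            self._seats[seat] = passenger
            return True

    def holder(self, seat):
        # Any platform may read.
        with self._lock:
            return self._seats.get(seat)


seats = SeatMap(writer_platform="mainframe")

# Five passengers try to claim the same seat; only the first succeeds.
results = [seats.book("mainframe", "12A", f"passenger-{n}") for n in range(5)]

# An update attempt from a read-only platform is rejected outright.
try:
    seats.book("unix", "12B", "passenger-9")
    unix_update_allowed = True
except PermissionError:
    unix_update_allowed = False
```

Keeping the check and the assignment inside one locked section is what prevents two requests from both seeing the seat as free.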
Data center mainframe managers "own" the current state-of-the-business data. The risks associated with multiple-application access, particularly prior to backup, jeopardize the business and may be unacceptable. However, creating an independently addressable copy of the data solves the problem. This allows the production manager to maintain integrity of the primary copy while allowing data access from the secondary copy. Should the second copy become corrupted, the primary copy remains safe and intact.
By pointing the information sharing solution at a copy of the data, thus maintaining data integrity for the enterprise, near-real-time information sharing becomes feasible.
Overcoming Performance Issues
Anything that compromises systems performance is anathema to IT in general and to OLTP in particular. If the method of access dictates the need to stop posting transactions while open systems applications read the file, there is a problem. The introduction of information sharing also can introduce unknown performance risks. The technology of multiple mirrors introduces little to no performance impact, and it is well-known and understood.
Maintaining integrity while sharing data poses distinct performance problems, especially as wide area networks are bound to be involved. Apart from its unwelcome complexity, recovery will also be slow. Open systems platforms are improving in this respect, but they still suffer more performance degradation than a mainframe of comparable theoretical horsepower when carrying out data administration tasks.
Multiple mirroring essentially carries out a single bulk lock, involving much simpler inter-task communication. After the lock has been withdrawn, further on-line processing and, typically, batch processing carry on separately, benefiting the performance of both. As the term "business continuance volume" implies, multiple mirroring benefits from simple and fast recovery.
As each application platform type expects its data to be held in its own particular format, any form of data transfer demands some translation. This is carried out once in bulk during multiple mirroring; data sharing does it during production processing for all but one of the attached applications servers, and does it every time data is accessed by the non-native application.
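The difference in translation work can be made concrete with a small count. The sketch below assumes, purely for illustration, that the mainframe holds text in EBCDIC (code page 037) and that the open systems side wants ordinary strings; the record contents, function names and the figure of 100 reads are all invented. It uses Python's standard cp037 codec.

```python
# Assumption: mainframe text is EBCDIC code page 037; sample records are invented.
RECORDS = [text.encode("cp037") for text in ("ACME CORP", "GLOBEX", "INITECH")]


def translate(record, counter):
    """Convert one EBCDIC record to a string and count the work done."""
    counter["translations"] += 1
    return record.decode("cp037")


def mirrored_reads(records, reads_per_record):
    """Multiple mirroring: translate once in bulk, then read the translated copy."""
    counter = {"translations": 0}
    translated = [translate(record, counter) for record in records]
    for _ in range(reads_per_record):
        for text in translated:
            _ = text                    # reads cost no further translation
    return counter["translations"]


def shared_reads(records, reads_per_record):
    """Single shared copy: the non-native reader translates on every access."""
    counter = {"translations": 0}
    for _ in range(reads_per_record):
        for record in records:
            translate(record, counter)
    return counter["translations"]


bulk = mirrored_reads(RECORDS, reads_per_record=100)
per_access = shared_reads(RECORDS, reads_per_record=100)
```

With three records read 100 times each, the mirrored copy is translated three times in total, while the shared copy is translated on all 300 accesses.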
Through multiple mirroring technology, users deflect OLTP performance-threatening activities away from production systems. As a result, near-real-time information sharing becomes feasible.
Overcoming Organizational Issues
Simply stated, it often is easier to get computers to talk with each other than to overcome the internal political and sociological complexities involved with cross-functional cooperation.
Information sharing, by definition, involves two major stakeholders with seemingly diametric needs – those who create the data and those who need access to it. They likely answer to separate management with differing agendas, perhaps separated by geography, and may have varied backgrounds and philosophies regarding risk and change.
While the trend towards consolidation places responsibility for management and administration back under the aegis of the IT professionals, users still expect to control their own destinies. As NT grows up into a departmental server platform and beyond, the tendency for an application to span platforms will be reinforced, as will departmental determination to own the application environment. This only exaggerates the need for cooperation and flexibility. Corporate structures are seldom static and shifting platform allegiances add to the inevitability of some change occurring at almost any time.
Within this environment, embracing heterogeneity is more productive than trying to eliminate it. This places a premium on flexibility, where the combination of multiple mirroring and information sharing technologies has distinct advantages. By developing solutions based on multiple mirroring that create common ground and provide access to the data when it is needed, organizational barriers to sharing become irrelevant and near-real-time information sharing becomes feasible.
Information sharing was in danger of becoming an end in itself rather than a means to an end. No technology is useful if it cannot be deployed to solve real-world business problems.
Oh, the real secret: First, make a copy.
About the Author:
John Howard is Product Marketing Manager at EMC (Hopkinton, Mass.).