Science in China Series E: Technological Sciences © 2008
SCIENCE IN CHINA PRESS
Springer
www.scichina.com tech.scichina.com www.springerlink.com
Corrosion science general-purpose data model and interface (II): OOD design and corrosion data markup language (CDML)
TANG ZiLong
School of Materials Science and Engineering, Tianjin University, Tianjin 300072, China (email: [email protected])
Using object-oriented design and analysis, a general-purpose corrosion data model (GPCDM) and a corrosion data markup language (CDML) are created to meet the increasing demand for integrating and sharing multi-source corrosion data. The “corrosion data island” is proposed to model corrosion data that is comprehensive and self-contained. The island has a tree-like structure with six first-level child nodes that characterize every important aspect of the corrosion data. Each first-level node recursively holds further child nodes as data containers. The data structure inside the island is designed to flatten the learning curve and lower the acceptance barrier of GPCDM and CDML. The role and meaning of the first-level nodes are explained in detail, with examples chosen to review the design goals and requirements proposed in the previous paper. The CDML tag structure and the CDML application programming interface (API) are then introduced in logical order. Finally, the roles of GPCDM, CDML and its API in multi-source corrosion data integration and information sharing are highlighted and projected.
Keywords: corrosion, data model, corrosion data markup language (CDML), application programming interface (API), information sharing
Received April 25, 2007; accepted July 24, 2007; published online September 10, 2008
doi: 10.1007/s11431-008-0121-x
Sci China Ser E-Tech Sci | Nov. 2008 | vol. 51 | no. 11 | 1850-1857
Multi-source corrosion data shows great diversity and polymorphism, as discussed in the previous work. These two characteristics reach so deep and wide that they make corrosion data sharing and information extraction very hard. Until now, no substantial effort has been made, because time is limited and few people have a strong background in both corrosion and computer science. To meet the constantly increasing demand for valuable corrosion information, three aspects of work are very important: 1) integration and synchronization of multi-source data, 2) discovery of new knowledge about material corrosion, and 3) data mining techniques for more accurate information. The general-purpose corrosion data model (GPCDM) and a public data exchange interface are proposed to integrate the multi-source data. In the GPCDM infrastructure, data is expected to be distributed across as many servers as possible rather than held on one or a few servers. The distributed design not only fits the current “owned by author” picture, but also makes the integrated information system more reliable and robust, and lets it mature in an evolutionary manner. Each participant in the integrated information system, whether data producer or data consumer, has equal rights, which helps lower the information barrier. Continuing the previous work, a reference implementation of GPCDM is created, based on a comprehensive discussion of the characteristics of multi-source corrosion data from both the corrosion and the IT points of view. In addition, the corrosion data markup language (CDML) and its application programming interface (API) are created.
1 Requirements of general corrosion data model
1.1 Requirements from information techniques
A data model must offer reliability and stability in system performance, as well as extendibility and security. Extendibility broadens the coverage of data and information and, as a consequence, prolongs the life span of the information system. The security requirement arises from data transportation and distribution. Object-oriented design (OOD)[1,2] is currently the dominant approach in software design. OOD/A models real-world objects as modules in the way people habitually perceive them. A successful OOD helps flatten the learning curve of a new model and lowers the barrier to its acceptance. Encapsulation, inheritance and polymorphism are the key concepts in OOD/A. Data objects generated by OOD/A are hierarchical, self-contained, independent, extendible, encapsulated and reusable, which exactly matches the GPCDM requirements[3]. Since most modern programming languages, such as Java[4] and C# on .NET, fully support OOD and OOP, GPCDM data objects should be shareable and reusable across multiple programming languages and operating systems. Portability of GPCDM data objects makes information sharing and distribution easier.
1.2 Requirements from corrosion science
From the viewpoint of corrosion science, GPCDM should have the following characteristics:
● Comprehensiveness: A modularized class must match the attributes and methods of the real-world object. The real-world object is characterized by specific attributes defined in a class. Specific methods reflect how the real-world object responds to outside stimulation.
● Independence: A relatively independent object is suitable for embedding into others to construct complicated objects. Independent classes can be adopted either directly or with little modification, by adding new attributes and/or methods or by redefining them. Given the wide variety of corrosion data types, an independent design increases the efficiency of class usage, minimizes data redundancy and keeps the model compact.
● Container: Scientific data usually comes in collected form. The value classes of GPCDM must act as containers that encapsulate data of various types.
● Template: Templates help standardize CDML object construction, simplify user operations and save time.
● Balance: The object tree or model must balance complexity against comprehensiveness. A model finer than necessary adds processing burden at run time and may even become too complicated to use. On the other hand, an oversimplified model may fail to cover required attributes and methods, and exposes the system to endless model patching.
2 Node design of general corrosion data model
The “corrosion data island” concept, termed CDMLIsland in CDML, is proposed to characterize a comprehensive data object and to act as the access root. The hierarchy diagram of CDMLIsland is illustrated in Figure 1. In the diagram, three icons denote, respectively, the node being discussed, a sub-node of that node shown without detail, and the attributes or methods of a node. The same notation is adopted in the following diagrams.
Figure 1 Structure diagram of corrosion data island with first-level children.
A CDMLIsland includes six child nodes:
● IslandInfo: describes the CDMLIsland itself;
● Experiment: describes how the experiment was conducted;
● Media: refers to the corrosion environment, such as the liquid medium in which the material exists;
● Measurement: data related to the measurement, such as the equipment used;
● Electrode: the corrosion electrode and any other auxiliary electrodes;
● IslandValue: container for various types of corrosion data such as text, numbers and binary data.
The first five nodes are necessary. Without them, the CDMLIsland node becomes an isolated “island” in the information ocean, because no path leads to it. The first five nodes also provide the data needed for information searching and querying. For performance optimization, null assignment to the first five nodes and their sub-nodes is allowed, except for some required data in the IslandInfo node. Additionally, a utility node called CDMLCluster (refer to the API document for detail) is included to facilitate the organization and transportation of a collection of CDMLIsland nodes.
2.1 IslandInfo node
Filling in the IslandInfo node (Figure 2) is strongly encouraged, because it facilitates search and query of the CDMLIsland.
● DoneBy: mandatory; refers to who completed or provided the result. Searching on this item generates a list of what “DoneBy” did.
● Subject: the topic of this CDMLIsland, for example, corrosion evaluation of stainless steel in seawater.
● Keyword: a short list of key words that identify features and/or anything important.
2.2 Experiment node
The Experiment node (Figure 3) is designed to record data related to the experiment configuration.
● Date and Site: the start time, end time and duration of the experiment are encapsulated in the “Date” element. The “Site” element refers to the site, such as a lab, where the experiment was done, and it supports GPS coordinates.
● Note: records observations at three experiment phases, i.e. before, during and after.
● ProcedureList: any procedure adopted in the experiment should be added to this container.
● StandardList: any standard adopted in the experiment should be added to this container. A Standard node is first constructed from the standard name, issue date and organization, and is then put into the StandardList container. For example, when the ASTM G59-97(2003) polarization curve measurement is adopted in a pitting resistance test of stainless steel, a Standard node is first constructed from [ASTM G59-97(2003), 2003, ASTM] and then added to the StandardList container.
Figure 2 Structure diagram of IslandInfo data object.
Figure 3 Structure diagram of Experiment data object.
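The construction pattern described for IslandInfo and StandardList can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions: the class names IslandSketch, Island, IslandInfo, Experiment and Standard, their fields, and the value "Example Lab" are hypothetical and are not taken from the published CDML API; only the mandatory DoneBy rule and the [name, issue date, organization] pattern come from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not the published CDML API.
class IslandSketch {
    // IslandInfo: DoneBy is mandatory; Subject and Keyword support search.
    record IslandInfo(String doneBy, String subject, List<String> keywords) {
        IslandInfo {
            if (doneBy == null || doneBy.isBlank()) {
                throw new IllegalArgumentException("DoneBy is mandatory");
            }
            keywords = List.copyOf(keywords);
        }
    }

    // Standard: built from name, issue date (a year here) and organization.
    record Standard(String name, int issuedYear, String organization) {}

    // Experiment: only the StandardList container is modeled in this sketch.
    static final class Experiment {
        private final List<Standard> standardList = new ArrayList<>();

        void addStandard(Standard standard) {
            standardList.add(standard);
        }

        List<Standard> standards() {
            return List.copyOf(standardList);
        }
    }

    // CDMLIsland as access root; the other four first-level nodes are omitted.
    record Island(IslandInfo info, Experiment experiment) {}

    static Island example() {
        IslandInfo info = new IslandInfo(
            "Example Lab", // hypothetical DoneBy value
            "corrosion evaluation of stainless steel in seawater",
            List.of("stainless steel", "seawater"));
        Experiment experiment = new Experiment();
        // Construct the Standard node first, then put it into the container.
        experiment.addStandard(new Standard("ASTM G59-97(2003)", 2003, "ASTM"));
        return new Island(info, experiment);
    }

    public static void main(String[] args) {
        Island island = example();
        System.out.println(island.info().subject());
        System.out.println(island.experiment().standards());
    }
}
```

Rejecting an empty DoneBy at construction time is one possible way to enforce the mandatory field; the paper does not state how the real API enforces it.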
2.3 Media node
Data related to the corrosion environment and media is encapsulated in the Media container (Figure 4).
● MediaComponentList: a list of the chemical components of the media. A MediaComponent node is constructed from the component name, concentration and concentration unit, and is then put into this container. An example is the evaluation of inhibition performance in HCl: two MediaComponent nodes, [HCl, 1, mol/L] for HCl and [Inhibitor A, 0.1, mmol/L] for the inhibitor, are constructed and then put into this container.
● MediaParameters: contains a set of media parameters, such as pH and flow rate, in the form of Parameter nodes.
● EnvironmentParameters: same as MediaParameters, but for environment parameters such as temperature.
Figure 4 Structure diagram of Media data object.
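The inhibitor example above might be expressed in Java along the following lines. This is a sketch only: MediaSketch, Media, MediaComponent and Parameter are hypothetical names, and the temperature value is an illustrative assumption that does not appear in the paper. The two MediaComponent triples are the ones given in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not the published CDML API.
class MediaSketch {
    // MediaComponent: component name, concentration, concentration unit.
    record MediaComponent(String name, double concentration, String unit) {}

    // Parameter: name, value, unit; reused for media and environment.
    record Parameter(String name, double value, String unit) {}

    // Media container with its three sub-containers.
    static final class Media {
        final List<MediaComponent> mediaComponentList = new ArrayList<>();
        final List<Parameter> mediaParameters = new ArrayList<>();
        final List<Parameter> environmentParameters = new ArrayList<>();
    }

    static Media inhibitorInHcl() {
        Media media = new Media();
        // The two components from the inhibition example in the text.
        media.mediaComponentList.add(new MediaComponent("HCl", 1.0, "mol/L"));
        media.mediaComponentList.add(
            new MediaComponent("Inhibitor A", 0.1, "mmol/L"));
        // Assumed value for illustration only; not given in the paper.
        media.environmentParameters.add(
            new Parameter("temperature", 298.15, "K"));
        return media;
    }

    public static void main(String[] args) {
        Media media = inhibitorInHcl();
        media.mediaComponentList.forEach(System.out::println);
    }
}
```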
2.4 Measurement node
The Measurement node (Figure 5) encapsulates all data required to make the measurement understandable.
● Equipments: holds all equipment used, with Equipment nodes as input. In turn, the equipment name, vendor’s name, model and role in the measurement are required to construct an Equipment node.
● MeasurementControl: includes the control factor and the control pattern. The control factor refers to the controlled parameter of the measurement, such as potential. The control pattern refers to how the control is applied, such as constant potential control. Default options are provided for both items.
● ControlParameters: a set of Parameter nodes constructed from parameter name, value and unit. An example is a polarization curve test with a potential range of 0.0–1.0 V and a period of 1000 s. Three Parameter nodes, [min, 0.0, V], [max, 1.0, V] and [scan rate, 1, mV/s] (1.0 V swept in 1000 s), might be constructed in general.
Figure 5 Structure diagram of Measurement data object.
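The arithmetic behind the ControlParameters example can be checked with a short Java sketch. It assumes that the 1000 s period is the duration of one linear sweep across the whole potential range, which the paper does not state explicitly; under that assumption a 1.0 V range swept in 1000 s corresponds to 0.001 V/s, i.e. 1 mV/s. The names ControlParametersSketch, Parameter and polarizationSweep are hypothetical.

```java
import java.util.List;

// Hypothetical sketch; names are illustrative, not the published CDML API.
class ControlParametersSketch {
    // Parameter node: name, value, unit.
    record Parameter(String name, double value, String unit) {}

    // Builds the three control parameters of a linear potential sweep.
    // Assumes periodSeconds is the time taken to sweep from minVolt to maxVolt.
    static List<Parameter> polarizationSweep(
            double minVolt, double maxVolt, double periodSeconds) {
        if (periodSeconds <= 0 || maxVolt <= minVolt) {
            throw new IllegalArgumentException("invalid sweep definition");
        }
        // Convert the range to millivolts, then divide by the duration.
        double scanRateMvPerSecond = (maxVolt - minVolt) * 1000.0 / periodSeconds;
        return List.of(
            new Parameter("min", minVolt, "V"),
            new Parameter("max", maxVolt, "V"),
            new Parameter("scan rate", scanRateMvPerSecond, "mV/s"));
    }

    public static void main(String[] args) {
        polarizationSweep(0.0, 1.0, 1000.0).forEach(System.out::println);
    }
}
```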
2.5 Electrode node
The Electrode node (Figure 6) holds the data required to describe the electrodes involved in the test.
● WorkElectrode: the material name, such as Q235 steel, is mandatory in the Electrode node. The Geometry node defines the shape, area and area unit of the working electrode. Default options are available for the shape and area unit elements. Special symbols, such as ∞ denoting an extremely large area, are also supported in the area element. The step-by-step processing history of the working electrode can be recorded in the Pre-Processing and Post-Processing nodes.
● OtherElectrodes: home for the reference, counter and any other electrodes. An ElectrodeInfo node, filled with electrode type, role and shape, is required as input to this container.
Figure 6 Structure diagram of Electrode data object.
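One possible shape for the Electrode data objects is sketched below in Java. All names (ElectrodeSketch, WorkElectrode, Geometry, ElectrodeInfo) are hypothetical, and so are the default shape "plate", the default unit "cm2", the area of 1.0 and the pre-processing steps; the paper only says that defaults exist, not what they are. Representing the ∞ symbol by Double.POSITIVE_INFINITY is likewise an assumption of this sketch.

```java
import java.util.List;

// Hypothetical sketch; names and defaults are illustrative assumptions.
class ElectrodeSketch {
    // Geometry: shape, area and area unit of the working electrode.
    record Geometry(String shape, double area, String areaUnit) {
        static final String DEFAULT_SHAPE = "plate";   // assumed default
        static final String DEFAULT_AREA_UNIT = "cm2"; // assumed default

        // Uses the default options for shape and area unit.
        static Geometry ofArea(double area) {
            return new Geometry(DEFAULT_SHAPE, area, DEFAULT_AREA_UNIT);
        }

        // Models the special symbol for an extremely large area.
        static Geometry extremelyLarge() {
            return ofArea(Double.POSITIVE_INFINITY);
        }
    }

    // WorkElectrode: material name is mandatory; history lists are optional.
    record WorkElectrode(String material, Geometry geometry,
                         List<String> preProcessing,
                         List<String> postProcessing) {
        WorkElectrode {
            if (material == null || material.isBlank()) {
                throw new IllegalArgumentException("material name is mandatory");
            }
            preProcessing = List.copyOf(preProcessing);
            postProcessing = List.copyOf(postProcessing);
        }
    }

    // ElectrodeInfo: input for the OtherElectrodes container.
    record ElectrodeInfo(String type, String role, String shape) {}

    static WorkElectrode example() {
        // Area and processing steps are made-up illustrative values.
        return new WorkElectrode("Q235 steel", Geometry.ofArea(1.0),
            List.of("ground", "degreased"), List.of());
    }

    public static void main(String[] args) {
        System.out.println(example());
        System.out.println(Geometry.extremelyLarge());
    }
}
```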
2.6 IslandValue node
Figure 7 illustrates IslandValue, the most complicated and valuable node, which includes six sub-nodes.
● ResultParameters: a list of individual pieces of corrosion data. Each piece of data needs a Parameter node filled with parameter name, value and unit, and the constructed Parameter node is then put into this container. An example is a long-term weight loss measurement accompanied by instantaneous polarization resistance tests. Two parameters, weight loss rate [Weight Loss, 1.0, mm/a] and average polarization resistance [Rp, 1000, Ω·cm²], are needed in this container. If a parameter has no unit, the unit is left empty. Corrosion handbooks contain many such examples, like leveled or indexed corrosion rates.
● SingleValueLists, TwoValuesList, and ThreeValuesLists: designed to hold lists of numeric data with one to three columns. TwoValuesList and ThreeValuesLists are likely to be the most frequently used, because the data generated by most electrochemistry workstations comes in these formats. Taking polarization curve measurement as an example, some equipment generates a two-column list in the format potential-current, and other equipment generates a three-column list in the format potential-current-time. All data held in these three containers could also be held in the MatrixList container. However, separating out these containers is necessary for storage and speed optimization, because they are used so frequently.
● MatrixList: a general-purpose container for both numeric and text data. A use case is long-term corrosion monitoring of a pipeline at multiple locations. All results can be encapsulated in the MatrixList container, with one Matrix node per location as input.
● Images: since photos and pictures are common corrosion data, this container is designed to hold them. In fact, the Images container can hold any binary data without modification. It is named so only because images are so common.
Figure 7 Structure diagram of IslandValue data object.
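The relation between the specialized list containers and MatrixList can be illustrated with a small Java sketch. It assumes, as one plausible reading of the storage and speed argument, that a two-column list keeps its data in primitive arrays, while a matrix holds generic cells that may be numbers or text. The class names and the potential-current sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not the published CDML API.
class IslandValueSketch {
    // TwoValuesList: two numeric columns, e.g. potential and current,
    // stored compactly as primitive arrays.
    static final class TwoValuesList {
        private final double[] first;
        private final double[] second;

        TwoValuesList(double[] first, double[] second) {
            if (first.length != second.length) {
                throw new IllegalArgumentException("columns differ in length");
            }
            this.first = first.clone();
            this.second = second.clone();
        }

        int size() {
            return first.length;
        }

        // Re-expresses the same data in the general-purpose matrix form.
        Matrix toMatrix() {
            List<List<Object>> rows = new ArrayList<>();
            for (int i = 0; i < first.length; i++) {
                rows.add(List.<Object>of(first[i], second[i]));
            }
            return new Matrix(rows);
        }
    }

    // Matrix: rows of cells; each cell may hold a number or text.
    record Matrix(List<List<Object>> rows) {}

    static TwoValuesList samplePolarizationCurve() {
        // Made-up potential (V) and current (A) values for illustration.
        return new TwoValuesList(
            new double[] {0.0, 0.1, 0.2},
            new double[] {1.0e-6, 2.5e-6, 7.0e-6});
    }

    public static void main(String[] args) {
        System.out.println(samplePolarizationCurve().toMatrix().rows());
    }
}
```

The conversion shows why the specialized containers are an optimization rather than a necessity: nothing is lost when the same rows are moved into a Matrix, but every value is then boxed as an object.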
3 Corrosion data markup language (CDML)
CDML is implemented to define the data structure and access pattern of GPCDM, taking W3C XSD[5–7] as the working language. CDML comprises a namespace and a set of more than 50 tags. The purpose of the namespace is to avoid potential name conflicts with tags defined in other languages. The CDML namespace is defined as “http://www.tju.edu.cn/zlt633/schemas/cdml”, abbreviated as “cdml”. All tags in CDML belong to this namespace. Tags are categorized into packages. The nesting of tags defines the structure and relationships of corrosion data. The structures of the first-level child nodes of CDMLIsland and the corresponding CDML source code are presented in Figure 8. The tag structure exactly matches that of GPCDM. Because of length limits, no further tag structure is presented. Interested users may refer to the API document and the CDML source code for detail. CDML defines the structure and layers of corrosion data rather than the corrosion data itself. The data is contained in an XML document, and the XML document is validated against CDML to confirm that it conforms to GPCDM. An API library of CDML for a programming language is required to make CDML work.
Figure 8 Snapshot of CDML source code for CDMLIsland Tag.
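Validating an XML document against an XSD schema, as described above, can be sketched with the standard javax.xml.validation API. The schema embedded below is a deliberately tiny, hypothetical stand-in and is not the real CDML schema: it reuses the CDML namespace and the tag names CDMLIsland, IslandInfo, DoneBy and Subject from the text, but its content model is an assumption made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch; the schema is a toy stand-in, not the real CDML.
class CdmlValidationSketch {
    static final String NS = "http://www.tju.edu.cn/zlt633/schemas/cdml";

    // Toy schema: a CDMLIsland holds an IslandInfo with a required DoneBy.
    static final String SCHEMA = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://www.tju.edu.cn/zlt633/schemas/cdml"
                   elementFormDefault="qualified">
          <xs:element name="CDMLIsland">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="IslandInfo">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="DoneBy" type="xs:string"/>
                      <xs:element name="Subject" type="xs:string"
                                  minOccurs="0"/>
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Wraps the given IslandInfo content in a namespaced CDMLIsland document.
    static String island(String infoBody) {
        return "<cdml:CDMLIsland xmlns:cdml=\"" + NS + "\"><cdml:IslandInfo>"
            + infoBody + "</cdml:IslandInfo></cdml:CDMLIsland>";
    }

    // Returns true if the XML text is well formed and satisfies the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            schema.newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = island("<cdml:DoneBy>Example Lab</cdml:DoneBy>");
        String bad = island("<cdml:Subject>no author given</cdml:Subject>");
        System.out.println(isValid(ok));
        System.out.println(isValid(bad));
    }
}
```

In this sketch a document without DoneBy fails validation, which mirrors the mandatory status of that item in the IslandInfo node.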
4 Corrosion data markup language and its API
Two API libraries are implemented by the author, one for the Java language from Sun and one for the .Net languages from Microsoft. The Java API library is used for explanation in this work. In total, 11 packages and 65 classes are included in the CDML API library, as illustrated in Figure 9. Packages and classes are arranged to match the node structure of GPCDM. This arrangement also accords with the learning habits of corrosion scientists and therefore flattens the learning curve of the CDML API library. Fortunately, users do not have to deal with the API library directly, because a GUI front end does this for them. The GUI is simple enough that anyone with web browser experience should have no problem using it. The primary function of the API library is the seamless transformation of data among text, streams and data objects of various programming languages. It is this function that makes GPCDM and CDML critical to data integration and sharing. The advantages of GPCDM and CDML are summarized in the following short list:
● GPCDM and CDML guide the design of database schemas.
● Adopting the CDML API on both the client and the server side promotes data sharing and exchange.
● GPCDM minimizes data redundancy and can be extended for special cases.
● GPCDM facilitates searching, querying and other data access operations.
● Data objects defined in different programming languages and operating systems can be exchanged seamlessly.
● The built-in encryption/decryption of the CDML API makes data protection transparent and information secure.
Figure 9 Snapshot of CDML Java API package.
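The transformation among data object, stream and text that the API library is said to provide can be illustrated with a round trip built on the standard Java XML APIs. This is a sketch under assumptions, not the CDML API itself: the class RoundTripSketch, the element name Parameter and its name and unit attributes are hypothetical, and encryption is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch; element and attribute names are assumptions.
class RoundTripSketch {
    // Data object: a Parameter node with name, value and unit.
    record Parameter(String name, double value, String unit) {}

    // Data object -> byte stream holding an XML document.
    static byte[] toStream(Parameter p) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element element = doc.createElement("Parameter");
            element.setAttribute("name", p.name());
            element.setAttribute("unit", p.unit());
            element.setTextContent(Double.toString(p.value()));
            doc.appendChild(element);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Byte stream -> text view of the same XML document.
    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Byte stream -> data object.
    static Parameter fromStream(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
            Element element = doc.getDocumentElement();
            return new Parameter(
                element.getAttribute("name"),
                Double.parseDouble(element.getTextContent()),
                element.getAttribute("unit"));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Parameter original = new Parameter("Weight Loss", 1.0, "mm/a");
        byte[] stream = toStream(original);
        System.out.println(toText(stream));
        System.out.println(fromStream(stream).equals(original));
    }
}
```

Because the intermediate form is plain XML, a consumer written in another language can read the same bytes, which is the portability property the list above attributes to GPCDM data objects.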
5 Summary and conclusion
GPCDM and the corrosion data markup language (CDML), including API libraries for the Java and .Net platforms, are created:
(1) The “corrosion data island” concept is proposed to model corrosion data that is comprehensive and self-contained. It also acts as the access root of all sub-nodes.
(2) The data structure of GPCDM follows the logic that corrosion scientists are used to. The acceptance barrier and the learning curve of GPCDM are expected to be lowered as a consequence.
(3) GPCDM data is portable across computer languages and operating environments.
(4) The API libraries of CDML for Java and .Net can be used in standard and embedded applications. The built-in security makes data encryption and decryption transparent.
1 Riel A J. Object-Oriented Design Heuristics. Boston: Addison Wesley, 1996. 1–19
2 Kuchana P. Software Architecture Design Patterns in Java. London: CRC Press LLC, 2004. 55–75
3 Tang Z L. Corrosion science general-purpose data model and interface (I): Meanings and issues of design and implementation. Sci China Ser E-Tech Sci, 2008, 51(8):
4 Tang Z L, Qian X, Zhao K. J2SE Advanced Features (in Chinese). Beijing: Mechanical Industry Press, 2004. 1–35
5 Fowler M. UML Distilled. Boston: Addison Wesley, 2003. 39–56
6 Bates C. XML in Theory and Practice. England: John Wiley & Sons Ltd, 2003. 13–99
7 Gabrick K A, Weiss D B. J2EE and XML Development. Greenwich: Manning Publications Co., 2002. 37–155