Chin. Sci. Bull. (2014) 59(15):1755–1764 DOI 10.1007/s11434-014-0225-6
csb.scichina.com www.springer.com/scp
Article
Materials Science
Integrated materials design and informatics platform within the materials genome framework

Zhuo Wang • Xiaoyu Yang • Yufei Zheng • Qilong Yong • Hang Su • Caifu Yang
Received: 14 August 2013 / Accepted: 15 October 2013 / Published online: 11 March 2014
© Science China Press and Springer-Verlag Berlin Heidelberg 2014
Abstract The fundamental idea of the Materials Genome Initiative is to integrate computing, experimental, and data platforms in order to speed up materials innovation and thereby reduce time and cost. This paper describes the basic concept of building an integrated computational platform and data platform for materials innovation from the perspective of high-throughput simulation and materials knowledge management. Particular attention is given to a materials data platform that can integrate materials databases, heterogeneous materials data, various scripts, and open-source materials simulation code. Taking metallic materials as an example, a brief introduction to metallic materials data management is given, and the management of semi-structured and unstructured iron and steel materials data is also presented.

Keywords Integrated materials design · Materials database · High-throughput materials simulation · Materials informatics · Experimental data management · MatCloud

Electronic supplementary material The online version of this article (doi:10.1007/s11434-014-0225-6) contains supplementary material, which is available to authorized users.

SPECIAL ISSUE: Materials Genome

Z. Wang (✉) · Q. Yong · H. Su · C. Yang
Central Iron & Steel Research Institute, Beijing 100081, China
e-mail: [email protected]

X. Yang
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China

Y. Zheng
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
1 Introduction

The exploration and production of new materials have become one of the key factors restricting the rapid development of industrialization. For decades, developed countries, the United States for instance, have maintained their undisputed leadership in global competitiveness by developing advanced materials and using them to design new products [1]. However, the United States government clearly recognizes that problems lie hidden beneath this prosperity. The cycle of materials development and optimization has not kept pace with expectations. The application of new materials manufacturing techniques has become more and more difficult, and the application period has become shorter and shorter. On one hand, advanced computing and engineering technologies that can radically reduce the time needed to optimize new products are already available. On the other hand, there is at present no integrated information platform specialized for materials design and engineering, so the development cycle of new materials cannot keep up with the cycle of product design and development. Materials development has thus become a bottleneck in the product design process and a potential threat to the competitiveness of certain influential industries in the United States, such as the electronics, automotive, and aviation industries. In the past, it was the efficient collaboration of product design, materials development, and manufacturing processes over the years that gave the United States a competitive advantage in these industries. On June 24, 2011, U.S. President Barack Obama therefore announced the Advanced Manufacturing Partnership (AMP) program, with more than 500 million U.S. dollars, which aimed to strengthen the U.S. advantage in manufacturing through cooperation among
the government, universities, and enterprises. The Materials Genome Initiative (MGI), one of the four sub-plans in the program, holds an important position for the efficient development and application of new advanced materials. MGI aims to achieve the efficient development of new materials through the development and integration of computational tools, experimental tools, and data and information tools. Shortly after the announcement of AMP, the U.S. government announced another program, the Big Data Research and Development Initiative, on March 29, 2012, to promote the ability to acquire knowledge and insight from large data collections. The Big Data initiative has reinforced the idea of MGI that increased use of data and computing technologies in new material design can accelerate industrial development.

MGI promotes materials innovation through three major components: a computing platform, an experiment platform, and a data platform. In this paper, visions for integrated materials design and a materials data platform are proposed. In the authors' opinion, the main function of the materials data platform is to help achieve integrated materials design, and the platform is mainly composed of a basic material design system and a material information knowledge database. Taking metallic materials as an example, the data source of the basic material information system can be composed of two parts: basic materials data obtained from first-principles calculations, and materials thermodynamics data optimized by the calculation of phase diagrams (CALPHAD, a representative method) and similar methods. The materials knowledge system covers the acquisition, cleaning, processing, storage, deep search, data mining, and analysis of materials information.
The role of the materials data platform is to obtain theoretical data on materials by high-performance, high-throughput computation and to combine them with experimental and empirical data, processing massive, multi-source heterogeneous material data in order to search for the materials "gene" by means of knowledge discovery, data sharing, and cross-platform combined study. Integrated materials design can be an effective way to abandon the traditional "trial-and-error" mode of materials development and move to a mode of materials innovation that is theory based and predictable.
2 Integrated material design and data management

2.1 Integrated materials design: effective integration of materials simulation, materials database, and materials data analysis

The traditional "trial-and-error" approach to new material design is time-consuming and hence ineffective. We think the
computational combinatorial chemistry and informatics techniques can be appropriately employed to identify the material "gene" in a theoretical and predictable way. This can be an effective way to move new material design from the "trial-and-error" approach to "integrated material design." The combinatorial chemistry approach first gained considerable success in pharmaceutical drug discovery [2] and is now widely used experimentally in new material design to reduce time and cost. Its principle is to prepare sets of chemical "building blocks" that are combined into a diverse set of molecular entities; scientists can then employ a "high-throughput screening" methodology to identify the appropriate compounds. Material informatics is "the application of computational methodologies to process and interpret materials science and engineering data" [3]. Modern materials science and engineering research produces a large amount of heterogeneous data, and computational methods that help organize, manage, interpret, and analyze these data are becoming essential tools for the materials science and engineering community. However, the integrated material design approach requires a comprehensive e-Infrastructure to facilitate material development. The existing e-infrastructures and material computational platforms or tools (e.g., AFLOW) are either closely tied to a material simulation code that imposes usage restrictions or are not entirely open access, which has restricted material innovation to some extent. To address this limitation, a high-throughput density functional theory computation e-infrastructure that facilitates the integrated material design approach is now being developed by the Computer Network Information Center of the Chinese Academy of Sciences.
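The "building block" and "high-throughput screening" idea described above can be illustrated with a minimal Python sketch. It is not part of MatCloud or any platform named in this paper: the element lists, the per-element scores, and the function names are hypothetical, and the scoring function merely stands in for a real simulation result.

```python
from itertools import product

# Hypothetical building blocks for an A-B2-X2 compound family. The element
# lists and the per-element scores are made up for illustration only.
SITE_A = ["Ca", "Sr", "Yb"]
SITE_B = ["Zn", "Cd"]
SITE_X = ["Sb", "Bi"]
TOY_ELEMENT_SCORE = {"Ca": 1, "Sr": 2, "Yb": 3, "Zn": 1, "Cd": 2, "Sb": 1, "Bi": 2}


def enumerate_candidates(site_a, site_b, site_x):
    """Combine the building blocks into every (A, B, X) candidate."""
    return list(product(site_a, site_b, site_x))


def toy_score(candidate):
    """Stand-in for a simulated property; a real platform would run a calculation."""
    return sum(TOY_ELEMENT_SCORE[element] for element in candidate)


def screen(candidates, score, threshold):
    """High-throughput screening: keep candidates whose score reaches the threshold."""
    return [c for c in candidates if score(c) >= threshold]


def formula(candidate):
    """Format an (A, B, X) tuple as an A-B2-X2 formula string."""
    a, b, x = candidate
    return f"{a}{b}2{x}2"


candidates = enumerate_candidates(SITE_A, SITE_B, SITE_X)
hits = screen(candidates, toy_score, threshold=6)
print(len(candidates), [formula(c) for c in hits])
```

Separating enumeration from scoring means the placeholder score could later be replaced by a call that submits a calculation and reads back a computed property, without changing the screening loop.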
We aim to investigate how the ideas of "building blocks" and "high-throughput screening" from combinatorial chemistry can be used in computational material science to build a material computational platform and associated database in which data, various scripts, and open-source material simulation code can be integrated. We believe the methods and techniques for computational material science and material informatics developed in this project can open new dimensions for "integrated material design" and MGI. A high-level schematic of the e-Infrastructure that facilitates integrated material design is shown in Fig. 1. We think the integrated materials design e-Infrastructure will consist of the following four modules (Fig. 1): high-throughput materials computation, computing environments, materials databases, and data analysis. These modules are further discussed as follows: (1) High-throughput materials computation will support multi-scale simulation, including first-principles calculation (such as quantum mechanics, thermodynamics, and kinetics), and
Fig. 1 (Color online) Integrated materials design: effective integration of material simulation, materials database, and materials big data
the calculated data will be stored in the material database; (2) The computing environment will provide a responsive, low-cost, and easy-to-use deployment of high-throughput computing hardware and software; (3) The database will include a computing database and a basic material database; and (4) Material computational data, experimental data, and empirical data constitute the multi-source heterogeneous material data. Material data analysis modules use techniques such as mathematical statistics, machine learning, and data mining to establish a materials knowledge base and thereby explore the "materials gene" that decides critical properties of materials ("arrangement of atoms in a material microstructure decides the properties of the material, like gene sequencing decides functions of the human body in the human cells," so that the "materials gene" can in some way be considered a "component-structure-property" relationship [4]), which is then used to inform new materials design. A demonstration of high-throughput material computation is discussed as follows.

2.1.1 MatCloud

MatCloud is an integrated high-throughput materials design platform, which is now under development by the Computer Network Information Center of the Chinese Academy of Sciences following the above integrated materials
design concept. MatCloud (v 1.0) currently supports the CASTEP material calculation software and is already connected to the Supercomputing Centre of the Chinese Academy of Sciences (http://www.sccas.cn). It supports on-line crystal structure modeling, on-line submission and monitoring of high-throughput computational jobs, post-simulation calculation, and data integration. In addition, MatCloud (v 1.0) supports the setup of exchange-correlation functionals, pseudopotentials, and k points. The whole simulation process, including post-simulation calculations, can be completed automatically without human intervention.

2.1.2 Test case

The calculation of a band structure by MatCloud is demonstrated. YbZn2Sb2 is a new thermoelectric material developed by the project partner, for which abundant experimental data are available. YbZn2Sb2 usually needs to be doped to modulate its properties. In this case we need to calculate the electrical properties, thermal conductivity, electric conductivity, etc., of this material under different doping concentrations. This involves a series of simulations and calculations that are tedious to carry out manually. With the help of MatCloud, running this series of simulations and calculations becomes straightforward.
Here we look at the calculation of the band structure of YbZn2Sb2 at a certain doping concentration. MatCloud can automate the generation of simulation jobs, job submission and monitoring, data download and archiving, band structure calculation, and band structure plotting. Once this series of tasks ends, the band structure diagram has already been generated, as shown in Fig. 2. The generated band structure can be verified against the CASTEP output as follows: in the CASTEP output, there are 188 k points distributed in 4 groups; each group has 47 k points; and each k point has 39 electronic energy values. MatCloud automatically performs the critical data extraction and creates this band structure diagram.

2.2 Materials data platform

The U.S. National Institute of Standards and Technology (NIST) has built dozens of types of databases, among which material databases account for a large proportion, for instance, the alloy phase diagram database, ceramic phase diagram database, material corrosion database, and material friction and wear database. The crystal data center holds 100,000 crystal data entries, covering metallic and non-metallic materials. Data can be searched in a variety of ways, such as advanced search by type of material, nature, composition, product name, manufacturer, etc., and retrieval is very convenient. NIST is also one of the main promoters of the American Materials Genome Initiative, which aims to create a more efficient way to share materials research and development information and, in combination with the manufacturing industry, a more efficient and rapid material application process. Japan's National Institute for Materials Science (NIMS) is also very active in materials databases. In the past few
Fig. 2 (Color online) Creation of band structure diagram automatically by MatCloud
decades, NIMS built the online materials database website MatNavi. MatNavi is now composed of eleven materials databases, including: radioactive substances and purifying agent database; neutron transmutation data; thermal conductivity database; diffusion database; superconducting materials database; metal material database; CCT database; and creep, fatigue, corrosion, and strength databases for construction materials. MatNavi not only supports fundamental education in materials science on crystal structures and phase diagrams, but also supports the development of new materials and helps researchers and engineers in private enterprise with the optimized selection and application of materials, in order to build a safer society. In addition, MatNavi is a useful tool for predicting material properties, comparing material properties, and identifying materials.

In terms of material data systems, the Massachusetts Institute of Technology and Lawrence Berkeley National Laboratory jointly developed the Materials Project under the framework of the MGI, a web-based platform for materials information acquisition, calculation, and prediction. The Materials Project uses advanced methods of scientific computing and innovative design tools and can speed up the research and development of new materials. It currently contains 30,732 material data; 3,044 band structure data; 409 intercalation compound battery material data; and 15,175 battery material data, and it is equipped with functions and modules such as material retrieval, phase diagram tools, battery material retrieval, chemical reaction calculators, structure prediction, and a crystal structure toolbox.

The Central Iron and Steel Research Institute (CISRI) has 10 years of experience in constructing steel materials databases.
In cooperation with other scientific research institutes and enterprises, it has built the aircraft engine material database, marine steel material database, automotive steel material database, petrochemical steel material database, advanced iron and steel database, alloy steel database, international iron and steel material database, etc. These steel material databases make it convenient for users to choose an economical and reasonable material and process according to the specific conditions of use. Moreover, the collected data reflect the needs of national economic development since the founding of the country: drawing on foreign steels and on the characteristics of China's resources to develop new steels, the major findings and research trends recorded there can basically meet the current industrial demand for alloy steel varieties and have shaped China's own steel material system. However, with the rapid development of information technology, the development of a modern steel material database is no longer simply a matter of the
database structure, data collection and classification, and query and retrieval functions. With greatly enhanced computing capacity and the development of data search engines, materials science, physics, chemistry, mathematics, engineering mechanics, and many other disciplines intersect to form the initial framework of integrated R&D. A new generation of materials databases will be capable of seamlessly integrating large-scale material property databases; material reference databases; repositories of standards, books, patents, and industrial applications; cloud computing platforms for materials simulation; and experimental data traceability systems. Intelligent cross-database data mining and analysis, unified across data structures, will uncover innovative methods and trends in research and application and bring them into a uniform integration framework that improves the efficiency of research and deployment, with the ultimate goal of shortening the cycle of material application.

Material database applications: the choice and optimization of the material. CISRI works in close cooperation with the simulation team and user units. Following the characteristics of the materials genome framework, it combines the materials database with materials application systems and focuses on optimized material selection and on extending the breadth of calculation. Here a case combining the steel material database with a petrochemical equipment management system is used to describe our research ideas.
The case implemented refinery process material management based on API 571 and simulation of the corrosion environment, and explored models for estimating material cost and weldability from typical compositions. This laid a good foundation for the subsequent design and development of an auxiliary material selection system for petrochemical equipment, and it preliminarily achieves material selection and optimization based on automatic material matching [5] (Fig. S1 online). On the basis of the material selection and optimization system, a cost analysis model can be established to estimate cost:

$$P = 2 \sum_{i=1}^{N} w_i p_i,$$
where $P$ is the imputed cost of the material, $p_i$ is the price of component (element) $i$ (Yuan/kg), and $w_i$ is the content of component (element) $i$ in the typical steel. The factor of 2 roughly accounts for the cost increase due to processing factors and profit. The elements mainly taken into account are Si, Mn, Cr, Ni, Mo, V, Cu, Nb, Ti, Al, W, Co, B, Re, and so on. The cheap elements C, P, and S and impurities are excluded.

In a compound corrosive environment, the corrosion mechanism with the highest corrosion rate is assumed to dominate, and the corrosion rate under this mechanism is taken as the corrosion rate of the compound environment. In this process, the user enters the material, the parameters needed by each corrosion mechanism, the corrosion rate threshold range, and the corrosion mechanisms to be considered in order to calculate the corrosion rate. Building on this research, expanding and deepening the database cluster and developing and refining the intelligent selection model will further support the development of a fully functional database of steels for petrochemical refineries and an aided selection system. Such a system can serve refinery projects throughout the life cycle, including engineering design, material procurement, installation, construction, operation, maintenance, fault analysis, and life assessment, and has broad application prospects. Following this idea, the system can also be combined with additional manufacturing data systems and can shorten the cycle from material discovery to application development.
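The cost model and the dominant-mechanism rule above can be written as two short functions. This is a minimal sketch under stated assumptions: the content $w_i$ is taken to be a mass fraction, and the function names, the example composition, the prices, and the mechanism names and rates are all hypothetical values chosen for illustration, not data from the CISRI system.

```python
def estimate_cost(composition, prices, factor=2.0, excluded=("C", "P", "S")):
    """Return P = factor * sum(w_i * p_i) over the counted alloying elements.

    composition: element -> content w_i (assumed here to be a mass fraction)
    prices:      element -> price p_i in Yuan/kg
    Elements listed in `excluded` (cheap elements) are skipped.
    """
    return factor * sum(
        w * prices[element]
        for element, w in composition.items()
        if element not in excluded
    )


def compound_corrosion_rate(rates_by_mechanism):
    """Return (mechanism, rate) for the mechanism with the highest corrosion rate."""
    mechanism = max(rates_by_mechanism, key=rates_by_mechanism.get)
    return mechanism, rates_by_mechanism[mechanism]


# Hypothetical inputs, for illustration only.
composition = {"Cr": 0.18, "Ni": 0.08, "C": 0.001}
prices = {"Cr": 10.0, "Ni": 100.0}
cost = estimate_cost(composition, prices)

rates = {"mechanism A": 0.12, "mechanism B": 0.30, "mechanism C": 0.05}
dominant, rate = compound_corrosion_rate(rates)
print(cost, dominant, rate)
```

Keeping the factor of 2 as a default argument makes the rough allowance for processing and profit explicit and easy to change if a different allowance is preferred.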
3 Application of semi-structured or unstructured data managing and computing technology

Information can be categorized into three types: structured, semi-structured, and unstructured. Structured information refers to data with a clearly defined schema and fixed structure, such as data in a relational database. Semi-structured information refers to data with no clearly defined schema or fixed structure, for example, a comprehensive database integrated from heterogeneous databases, as well as manuals, standards, web pages, and so on. Unstructured information refers to data with no pattern, such as the text of technical papers, patents, and records of work experience. Owing to the breadth and complexity of materials knowledge, a large part of it is unstructured or semi-structured. Traditional information processing technology is capable of dealing with structured data and information, but it offers no effective management method for semi-structured or unstructured data and information. To resolve this problem, our material knowledge service platform employs advanced technology from the knowledge management field: ontology, metadata, data mining, and information extraction techniques are used to collect, classify, and cluster semi-structured or unstructured data, and they are combined with deep learning, deep search, and computing technologies to realize trend analysis and information discovery, so as to provide strong data and analysis support for materials research and deployment.

In order to resolve the descriptive and semantic heterogeneity of materials knowledge documents, which are massive and
semi-structured or unstructured, we must first build the material knowledge ontology, define material metadata, and express clearly and semantically the concepts and properties in the field of materials and the links between them; the material ontology is then used to transform semi-structured or unstructured material knowledge into structured knowledge.
3.1 Materials ontology

Ontology originated as a philosophical concept, but it has acquired its own distinct meaning through continued development in the field of computer science. An ontology is an explicit description of the concepts and relations existing in a domain and an explicit specification of a conceptualization. Generally speaking, the ontology provides a set of terms and concepts to describe a field of knowledge, and a knowledge base uses these terms to express the facts in the field [6]. At present, ontology is gaining more and more attention in computer science and has been used widely in the fields of knowledge intelligence, database design, information modeling, information inquiry, etc. In philosophy, the concept of ontology is used in research on the existential nature of entities; in computer science it provides the means for knowledge sharing and interoperation among different systems and for knowledge representation, sharing, and reuse. The representative definition
was given by Gruber [7] in 1993: an ontology is a formal and explicit specification of a shared conceptualization. The definition carries four meanings: conceptualization, explicitness, formality, and sharing. An ontology is designed by humans as a formal representation of the conceptual model of a field. Applications in the real world can be abstracted or generalized into a set of concepts and the relationships between them, which are used to construct the ontology of that field and to better support collaboration between people and computers.

In the design of the materials ontology, the basic element is the category, which represents a collective in the material field. Each category has corresponding individuals and instances. The basic relationships between categories are inheritance and implementation. We employ a frame-based representation to describe material knowledge objects, which uses slot descriptors to define a category and its individuals and instances and uses facets to constrain the slot name and slot value, as shown in the following figure. Things in the real world usually have many attributes and relationships, so the definition of a category may contain a considerable number of slots to describe them. Each frame includes a frame name and a set of slots. The frame name is the identifier of the knowledge object (here, the steel material), and hence the name is unique in the material knowledge database. A slot is a pair of a slot name and a slot value, and the slots currently used are of two types: attribute slots and relationship slots.
Ontology definition of "steel" (partly)
{
Subclass: steel
English name: steel
Form definition: all (X: substance) is a (X, steel) is main chemical composition (iron, X) ^ with chemical compositions (X, C, 0.02%-2.04%)
Property: steel grade
  : Type string
  : Use of country
  : Comment "the grade of steel called steel grade for short, is the name taken on each specific steel products"
Property: chemical composition
  : Type string array
  : Comment "the chemical elements of material"
Property: physical property
  : Type string array
  : Value domain: strength, toughness, plasticity, hardness…
Classification based on: (application; aeronautics and astronautics, ships, marine engineering…)
Classification based on: (chemical composition; Non-alloy steel, low alloy steel, alloy steel…)
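The frame-based representation (a uniquely named frame holding attribute and relationship slots, each constrained by facets) can be sketched as a small Python data structure. The class names `Slot`, `Frame`, and `KnowledgeBase` are hypothetical and are not taken from the CISRI platform; the slot contents are copied from the partial "steel" definition above.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A slot is a (name, value) pair; facets constrain the name and value."""
    name: str
    value: object = None
    kind: str = "attribute"  # "attribute" or "relationship"
    facets: dict = field(default_factory=dict)


@dataclass
class Frame:
    """A frame has a unique name and a set of slots."""
    name: str
    slots: dict = field(default_factory=dict)

    def add_slot(self, slot):
        self.slots[slot.name] = slot


class KnowledgeBase:
    """Stores frames and enforces that frame names are unique."""

    def __init__(self):
        self._frames = {}

    def add(self, frame):
        if frame.name in self._frames:
            raise ValueError(f"frame name already used: {frame.name}")
        self._frames[frame.name] = frame

    def get(self, name):
        return self._frames[name]


steel = Frame("steel")
steel.add_slot(Slot("steel grade", facets={"type": "string"}))
steel.add_slot(Slot("chemical composition", facets={"type": "string array"}))
steel.add_slot(Slot(
    "physical property",
    facets={
        "type": "string array",
        "value domain": ["strength", "toughness", "plasticity", "hardness"],
    },
))
# Relationship slot taken from the formal definition: iron is the main component.
steel.add_slot(Slot("main chemical composition", value="iron", kind="relationship"))

kb = KnowledgeBase()
kb.add(steel)
```

Storing facets as a plain dictionary keeps the sketch short; a fuller implementation would presumably validate slot values against the declared type and value domain.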
3.2 Material metadata

Metadata, that is, data about data, is a description of data information [8]. It is structured data describing some resource, including basic concepts, relationships, and constraint semantics. It also includes the steps and principles for constructing the database model. Metadata is descriptive, dynamic, diverse, complex, multi-level, and supportive in nature. It consists of words and symbols that describe resource features. Specific metadata can be formed from the mutual relations and different combinations of these elements and applied to different areas of expertise. Metadata is used to identify, evaluate, and track resource changes in the course of application in order to achieve effective discovery, searching, and efficient management of a large number of information resources. Metadata provides specifications, descriptive benchmarks, and approaches for the various forms of digital information units and resource collections, and plays an increasingly large role in the organization and management of network information resources. Metadata is the basis and prerequisite for information sharing and exchange and plays its full role in data document building, data issuing, data converting, data usage, etc. Metadata can provide an instructive description of data sources, data owner, data quality, data series (data production history), etc. It can also enable sharing and interoperation of complex information resources, addressing fuzzy model semantics, model integration, and information sharing. According to the above characteristics, the metadata can be
applied to manage complex product data, supporting integration, sharing, query, reading, and exchange. Tables 1 and 2 list the names and values of metadata, respectively.

3.3 Description method of material data based on metadata and ontology

Ontology and metadata can be used as a standard encoding language for the formalizing process and can be used to describe a range of resources. The ontology that describes a domain is called the domain ontology, while the metadata that describes the domain is called the metadata standard. The metadata standard defines the specifications for describing resources. Without metadata standards, the description of resources cannot be unified. Metadata alone is not enough, however; ontology is also needed to specify metadata semantics. An ontology is an interpretation of the concepts in a specific area. Because it can express semantic logic and can be used for reasoning, the ontology organizes the terminology of the field into a knowledge system. Metadata in general focuses on the taxonomy of information resources and the description of each resource itself, and it expresses relationships among resources more weakly and less clearly than ontology does. Metadata is the solution for describing resources, while ontology describes the relationships between resources. The relationship between metadata and ontology can be seen as the relationship between syntax and semantics, or between micro and macro views. Material data have many characteristics, such as wide variety, different formats, complex relationships, and unstructured features. Traditional descriptive methods for
Table 1 Name of materials metadata

Metadata name | Instruction
Upper class of metadata | List of upper metadata name
No. of upper class metadata | No. of upper class metadata
No. of materials metadata | Using mnemonic hybrid coding, has a unique identifying role
Description of materials metadata | Description of materials metadata
Type of materials metadata | String, integer, float, time, picture
Comparison of materials metadata | More than, less than, not greater than, not less than
Keywords list | The keyword described material metadata
References | Information material for reference
Language | Language encoding value in accordance with IEEE
Editor name | According to the definition of language habits
Editor code | Code for editor, unique
Editor time | You can use the characters (such as "/" or "-") or space separated date elements
Class materials related | Materials related to the material class
Maintenance records | Maintenance records of point. Format is as follows: update time, update people, update contents
On-line time | Information published
Other properties | Spare
Table 2 Value of material metadata

Metadata name | Instruction
No. of materials metadata value | Using mnemonic hybrid coding, has a unique identifying role
No. of materials type | No. of material, e.g., 08# steel
No. of materials metadata | The code of material metadata name, e.g., steel grade
No. of materials type value | No. of material, e.g., 08# steel
Value of materials metadata | Description of value of materials metadata, e.g., 08, 08F
Keywords list | The keyword described material metadata
References | Information material for reference
Language | Language encoding value in accordance with IEEE
Editor name | According to the definition of language habits
Editor code | Code for editor, unique
Editor time | You can use the characters (such as "/" or "-") or space separated date elements
Class materials related | Materials related to the material class
Maintenance records | Maintenance records of point. Format is as follows: update time, update people, update contents
On-line time | Information published
Other properties | Spare
resources based on keywords and attributes cannot provide a complete description of resources and cannot achieve automatic discovery of material data. Using semantic web technology and building on the study of metadata and ontology, a description method for material data based on metadata and ontology is proposed. The method introduces an explicit formal semantic description of resources. By introducing semantics, it gives materials users and materials applications a consistent and common understanding of resources. Figure 3 shows the metadata- and ontology-based resource description method. First, XML Schema is used to describe material metadata information and realize format description for
materials data. Then, material ontology is used to describe the relationship among metadata. With the help of semantic description of materials data, it provides formal specification description for metadata and its relationships and realizes the formal description for material data. 3.4 The process of semi-structured or unstructured data into structured knowledge First, the pretreatment such as data structures characteristic extraction, redundant information cleaning, and remodeling and indexing pretreatment of semi-structured or unstructured material data are processed on the basis of the
Fig. 3 (Color online) Description method of materials data based on metadata and ontology
123
Chin. Sci. Bull. (2014) 59(15):1755–1764
established material metadata and material ontology. Then the preprocessed data are organized by means of the domain dictionary for word segmentation and speech marking, and relevant set of terms in the field are extracted and used to build material knowledge and expand material ontology. Finally, according to the material ontology to obtain, represent, and organize various knowledge in fields, to form the knowledge libraries and associated knowledge specification libraries based on ontology, to provide system specifications and knowledge source for future knowledge retrieval and sharing, and to establish open bridge between unstructured knowledge sources and application service layer, Fig. 4 shows the specific process. 3.5 Information management system for steel and iron: the processing of unstructured and semi-structured data Through the above design concept, CISRI set up steel materials information management platform based on the construction of metadata, organization of material knowledge, and depth retrieval that can realize the description and flexible management of unstructured and semi-structured data. First, we try to make metadata description about unstructured and semi-structured data in steel knowledge, see Fig. S2a (online); Upon completion of the metadata description of the steel materials, knowledge system of steel was systematically build, see Fig. S2b (online).
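As a rough illustration of such a metadata description, the sketch below represents one record of the kind listed in Table 2 as a Python dataclass and serializes it to XML. The field names, the code values, and the XML element names are assumptions made for illustration; they are not the schema used by the CISRI platform.

```python
from dataclasses import dataclass, field, asdict
import xml.etree.ElementTree as ET


@dataclass
class MetadataValue:
    """One Table 2 style record (field names are hypothetical)."""
    value_no: str            # No. of materials metadata value (unique code)
    material_type_no: str    # No. of materials type
    metadata_no: str         # No. of materials metadata, e.g. code for "steel grade"
    value: str               # Value of materials metadata, e.g. "08F"
    keywords: list = field(default_factory=list)
    language: str = "en"


def to_xml(record: MetadataValue) -> str:
    """Serialize a record to a flat XML element, one child per field."""
    root = ET.Element("metadataValue")
    for name, val in asdict(record).items():
        child = ET.SubElement(root, name)
        # Lists (the keywords) are joined with ';' so each field stays one element.
        child.text = ";".join(val) if isinstance(val, list) else str(val)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; the codes are invented for this example.
record = MetadataValue("MV-0001", "08# steel", "steel_grade", "08F", ["carbon steel"])
xml_text = to_xml(record)
```

A real deployment would presumably validate such documents against the XML Schema mentioned in Sect. 3.3; that step is omitted here.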
Fig. 4 (Color online) Process of transforming semi-structured or unstructured data into structured knowledge
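The three stages of Sect. 3.4 (pretreatment, dictionary-based term extraction, and ontology expansion) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the domain dictionary, the class names, and the sample sentence are invented, and a simple dictionary lookup stands in for real word segmentation and part-of-speech tagging.

```python
import re

# Hypothetical domain dictionary: term -> ontology class it belongs to.
DOMAIN_DICT = {
    "tensile strength": "Property",
    "quenching": "Process",
    "martensite": "Microstructure",
}


def pretreat(text: str) -> str:
    """Stage 1: drop markup debris, collapse whitespace, normalize case."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_terms(text: str) -> list:
    """Stage 2: look up domain-dictionary terms in the text
    (a stand-in for word segmentation and part-of-speech tagging)."""
    return [term for term in DOMAIN_DICT if term in text]


def expand_ontology(ontology: dict, terms: list, source: str) -> dict:
    """Stage 3: file each term under its class and record where it was seen."""
    for term in terms:
        ontology.setdefault(DOMAIN_DICT[term], {}).setdefault(term, []).append(source)
    return ontology


raw = "<p>After  quenching, the steel shows Martensite and high tensile strength.</p>"
ontology = expand_ontology({}, extract_terms(pretreat(raw)), source="report-001")
```

Keeping the source identifier with each term is one simple way to preserve the link between the structured knowledge and the unstructured document it came from.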
4 Deep retrieval
On this basis, the knowledge service platform uses advanced search technology to provide users with better service: semantic retrieval, improved recall and precision of material knowledge retrieval, and efficient data analysis. First, a user model containing the personalization information shown in Fig. 5 is created to record the user's knowledge background, habitual terms, etc. When a retrieval request is received from a specific user, it is preprocessed according to the user model and then submitted to the knowledge management system, where ontology matching techniques are used to search, match, and filter at the semantic level to obtain the optimal search results. When needed, the user can pass the search results to an analysis and calculation module of the system for further processing, and the system automatically returns the desired final result. Finally, the user model is corrected according to the results the user selects, so that retrieval quality improves continuously.
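The retrieval flow described above can be illustrated with a small sketch: a user model maps a user's habitual term to an ontology concept, the concept is expanded to related terms, and documents are ranked by how many expanded terms they contain. The ontology fragment, the document index, and the user term are all hypothetical, and simple set overlap stands in for the platform's ontology matching.

```python
# Hypothetical ontology fragment: concept -> equivalent or narrower terms.
ONTOLOGY = {
    "stainless steel": {"stainless steel", "304", "316l"},
    "strength": {"strength", "tensile strength", "yield strength"},
}

# Hypothetical knowledge base: document id -> indexed terms.
DOCS = {
    "d1": {"304", "tensile strength"},
    "d2": {"316l", "corrosion"},
    "d3": {"carbon steel", "yield strength"},
}


def expand(query_terms, user_model):
    """Map the user's habitual terms to concepts, then add ontology terms."""
    expanded = set()
    for term in query_terms:
        concept = user_model.get(term, term)
        expanded |= ONTOLOGY.get(concept, {concept})
    return expanded


def search(query_terms, user_model):
    """Rank documents by the number of expanded query terms they contain."""
    terms = expand(query_terms, user_model)
    scores = {doc: len(terms & index) for doc, index in DOCS.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, score in ranked if score > 0]


def precision_recall(retrieved, relevant):
    """Precision and recall of a result list against a known relevant set."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


user_model = {"inox": "stainless steel"}   # this user's habitual term
results = search(["inox"], user_model)
p, r = precision_recall(results, relevant={"d1", "d2"})
```

Without the user model, the same query matches nothing in this toy index, which is the effect the personalization step is meant to avoid. Updating `user_model` from the results a user selects would correspond to the final feedback step.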
Fig. 5 Search based on knowledge service platform
5 Experimental data management system
Improving the capability of computational simulation and of materials data processing and analysis can accelerate materials R&D. However, cross-scale, multi-layered computational simulation of materials cannot yet be fully established because of limitations in computing power, precision, and underlying theory. Therefore, the development of advanced, fast laboratory testing methods and efficient experimental data processing capabilities, such as combinatorial ("combo") chip technology and the diffusion-multiple method, is also essential. In other words, rapid processing and analysis of experimental data can speed up materials researchers' basic understanding and the pace of new material research, effectively filling the gap left by current computational simulation. At the same time, comparison and validation against experimental data provide computational simulation with data for continuous optimization and a data basis for models. The relationship between composition, processing, and properties of new materials is the main subject of experiment. Comparing the uncertain data generated during these experiments, which complement calculation methods and theory [9], with the deterministic system (mature material systems that have already been evaluated), together with the “empty data” [10] left in the distribution of material properties in that system, can reveal new directions for materials development that fill particular property gaps, thereby improving the ability to design and select materials. Based on the above concept, CISRI has developed a continuous cooling transformation (CCT) curve data processing system. The system handles the reconstruction, storage, analysis, and visualization of discrete data points, and microstructure photos are aggregated with the CCT data. By comparing large amounts of data, researchers can quickly find the relationships and regularities between microstructure and heat treatment for the target material, thus speeding up materials screening. This system has been fully applied to the development of a variety of structural materials to enable rapid development (Fig. S3 online).
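To make the idea of reconstructing curves from discrete points concrete, the following sketch groups CCT data points by transformation product and orders each group by cooling rate, keeping the microstructure photo attached to every point. The record layout, the file paths, and the numerical values are placeholders chosen for illustration; they are not measured data and do not describe the actual CISRI system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CCTPoint:
    cooling_rate: float   # cooling rate, deg C per second
    phase: str            # transformation product, e.g. "ferrite"
    start_temp: float     # transformation start temperature, deg C
    photo: str            # path of the associated microstructure image


def rebuild_curves(points):
    """Group discrete points by phase and order each curve by cooling rate."""
    curves = defaultdict(list)
    for point in points:
        curves[point.phase].append(point)
    for phase in curves:
        curves[phase].sort(key=lambda point: point.cooling_rate)
    return dict(curves)


# Placeholder values for illustration only, not measured data.
points = [
    CCTPoint(10.0, "bainite", 520.0, "img/b_10.png"),
    CCTPoint(1.0, "ferrite", 700.0, "img/f_1.png"),
    CCTPoint(5.0, "bainite", 560.0, "img/b_5.png"),
]
curves = rebuild_curves(points)
bainite_rates = [point.cooling_rate for point in curves["bainite"]]
```

Because each point carries its own image reference, a query over the reconstructed curves can return the microstructure photos alongside the heat treatment conditions that produced them.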
6 Conclusions
The integrated material computational platform and data platform are becoming increasingly important. In this paper, the integrated material design approach was primarily discussed, and a high-level schematic presentation of the supporting e-Infrastructure was proposed. An integrated high-throughput computational material platform, namely MatCloud, is now under development, and a demonstration of using MatCloud to automatically calculate and plot a band structure was given. A brief introduction to the Metallic Materials Data Management Platform and the associated techniques for handling unstructured data was given. Finally, the process of transforming semi-structured or unstructured data into structured knowledge was described.
Acknowledgments This work was partially supported by the Hundred Talents Program of the Chinese Academy of Sciences and the Director Funding of the Computer Network Information Center, Chinese Academy of Sciences. We thank the “China Engineering Science and Technology Knowledge Center” of the Chinese Academy of Engineering for support and funding of the metal material information platform, and the Key Laboratory of Transparent Optical Functional Inorganic Materials of the Shanghai Institute of Ceramics, Chinese Academy of Sciences, for assistance in the development of the MatCloud platform.
References
1. National Science and Technology Council. Materials Genome Initiative for Global Competitiveness, 2011 June
2. Li JWH, Vederas JC (2009) Drug discovery and natural products: end of an era or an endless frontier. Science 325:161–165
3. Rodgers JR, Cebon D (2006) Materials informatics. MRS Bull 31:975–980
4. Balachandran PV, Broderick SR, Rajan K (2011) Identifying the “inorganic gene” for high-temperature piezoelectric perovskites through statistical learning. Proc R Soc A 467:2271–2290
5. Su H, Zhang J, Chen XL et al (2005) A computer matching technique for steel grades comparison between different standards. Mater Rev 11:8–11 (in Chinese)
6. William S, Austin T (1999) Ontologies. IEEE Intell Syst 14:18–19
7. Gruber TR (1993) Towards principles for the design of ontologies used for knowledge sharing. Int J Hum Comput Stud 43:907–928
8. Selim SZ, Alsultmi K (1991) A simulated annealing algorithm for the clustering problem. Pattern Recogn 24:1003–1008
9. Ashby MF (2005) Hybrids to fill holes in material property space. Philos Mag 85:3235–3257
10. Ashby MF (1998) Checks and estimates for material properties, I: ranges and simple correlations. Proc R Soc London Ser A 454:1301–1321