J Grid Computing (2018) 16:317–344 https://doi.org/10.1007/s10723-018-9438-2
A Novel Graph-Based Approach for the Management of Health Data on Cloud-Based WSANs Yacine Djemaiel · Sarra Berrahal · Noureddine Boudriga
Received: 17 July 2017 / Accepted: 18 February 2018 / Published online: 7 March 2018 © Springer Science+Business Media B.V., part of Springer Nature 2018
Abstract Nowadays, healthcare applications are among the most important services that help to improve public safety through, for example, preventing the propagation of epidemic diseases from one patient to another or, more broadly, from one location to another. Wireless Body Area Networks (WBANs) are considered among the major sources that enable the collection of this kind of data, which in most cases shares the properties of big data and needs to be stored and managed in an efficient manner. Regarding this need, Wireless Storage Area Networks (WSANs) have been considered among the alternatives to store generated health data. Even if these infrastructures enable the storage of huge volumes of data, some issues related to the efficient storage and processing of health data are still unresolved and are of interest to the research community. In this context, this paper proposes a cloud-based WSAN approach that enables the storage and the management of health data in an efficient manner by representing the collected data and their dependencies using Temporal Conceptual Graphs (TCGs). The validity of the generated graphs is verified by the proposed graph checker, which localizes semantic errors in such structures in order to prevent threats against the stored health data from being realized and to ensure the privacy of patients. The efficiency of the proposed approach is illustrated for different defined disease scenarios and their associated health data represented through generated TCGs.
Keywords WSAN · WBAN · TCG · Cloud · Storage · Dependency · SRB · Health data
Y. Djemaiel · S. Berrahal · N. Boudriga Communications Networks and Security Research Laboratory, University of Carthage, Carthage, Tunisia e-mail: [email protected]
N. Boudriga Computer Science Department, University of Western Cape, Belleville, Cape Town, South Africa
1 Introduction
With the emergence of new technological trends such as wearable and integrated wireless sensing devices as well as Cloud Computing (CC), healthcare systems have evolved from stand-alone solutions offering a limited range of services to interconnected e-health systems that encompass more effective, comprehensive, and advanced services [1, 28]. In fact, a set of wireless and multi-functional sensors can be strategically attached to the patient’s body to form what is commonly known as a WBAN. The latter improves the monitoring of patients’ health statuses through the collection of valuable information (e.g., the patient’s vital signs, motion parameters, and environmental data) and its wireless transmission to remote healthcare servers, where it is processed and analyzed to make an appropriate medical diagnosis and
to prescribe the associated treatments with minimal effort from the patient [24]. The main purpose of monitoring patients’ health statuses through a Sensor-Cloud based infrastructure is to enhance the quality of healthcare services by enabling the real-time processing and analysis of the collected health data, facilitating data sharing and inter-system collaboration, and improving the delivery of e-health services over the Internet to a large community of users and several types of applications. However, it raises several hard issues in terms of properly managing, storing, querying, visualizing, and securing data generated from heterogeneous and geographically distributed sensing and storage platforms [6, 22]. To manage the generated data, medical departments are expected to use several servers based on traditional storage solutions, which entail high deployment, configuration, and maintenance costs. Such storage solutions are only suited for small-scale and short-term deployments and are designed to fit the requirements of a specific application [20]. In addition, they are generally based on Relational Database Management Systems (RDBMSs) that, in spite of their success in meeting the requirements of classical applications, are considered less cloud-friendly and are not appropriate for managing e-health big data due to its massive and varied nature [23]. In fact, while RDBMSs expect data that is structured in a fixed way, e-health big data includes several types of data such as structured Electronic Health Record (EHR) data, unstructured medical notes, genetic data, and medical imaging data [29, 30]. Besides, the generated data sets may be connected to each other either by sharing some common features or by being time-dependent.
As data in RDBMSs is stored in multiple tables, finding dependent data sets requires multiple Join operations that match the primary and foreign keys of the rows of the tables to be joined. These operations are computed at query time, which makes them intensive in terms of computation and memory consumption [26]. Therefore, several restrictions in terms of effective data processing, scalability, and cost are associated with RDBMSs when used for big data [16]. We argue that the management of voluminous medical storage systems, with huge amounts of health data entities coming from a variety of sources and sharing several types of relationships, can be provided through
graph-based structures [18]. Indeed, graphs are considered a promising approach that facilitates the representation of big data and the linking of different kinds of interrelated information. They allow the representation, modeling, and querying of highly connected data in an easy and intuitive way (i.e., using constructs such as vertices and edges and operations such as pattern matching and neighborhood traversal), regardless of its significant size and value [25]. In addition, compared to relational models, instead of providing a list of answers to a received query, graph-based models only have to traverse the path that connects the information included in the query’s parameters in order to provide an answer [19]. In a cloud-based environment, the graph-based model performs better than relational models in terms of query execution time, database management flexibility (e.g., new relationships can be added without the need to alter the whole database schema), and data representation expressiveness [3, 26].
On the other hand, to effectively support the requirements of e-health big data, a variety of enhanced storage technologies such as WSANs have been used as an Infrastructure as a Service (IaaS) by most public and even private cloud providers [17]. WSANs are generally employed for permanent data storage and for providing data archiving and data retrieval services to multiple applications simultaneously. In addition, a high degree of reliability can be provided by WSANs through the interconnection of several distributed high-speed storage devices holding data collections from several sources and the back-end servers of a given organization [11]. In fact, such an interconnected storage environment allows the stored data to be easily moved from one device to another and to be shared between several connected servers.
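The contrast drawn earlier in this section between join-based lookups in RDBMSs and path traversal in graph models can be illustrated with a minimal sketch. The link table `depends_on`, the identifiers `d1` to `d3`, and both function names are hypothetical and are not taken from the paper; the sketch only shows why one approach rescans a table per hop while the other follows existing edges.

```python
from collections import defaultdict, deque

# Hypothetical link table: (source, target) means "source is linked to target".
depends_on = [("d1", "d2"), ("d2", "d3")]


def dependents_by_join(pairs, start, hops):
    """Relational style: one pass over the whole link table per hop."""
    frontier = {start}
    found = set()
    for _ in range(hops):
        # Each hop rescans every row, much like a join evaluated at query time.
        frontier = {dst for (src, dst) in pairs if src in frontier}
        found |= frontier
    return found


def build_adjacency(pairs):
    """Graph style: store each vertex together with its outgoing edges."""
    adjacency = defaultdict(list)
    for src, dst in pairs:
        adjacency[src].append(dst)
    return adjacency


def dependents_by_traversal(adjacency, start):
    """Breadth-first traversal that only touches edges reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

Both functions return the same set of dependent items for this toy input; the difference lies in how much data each one has to inspect, which is the point made in the text about query-time Join operations.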
However, WSANs in their current form will not be able to properly support the provisioning of e-health monitoring services due to the high costs associated with equipment deployment, the complexity of data processing, and expensive network maintenance [4]. Therefore, a novel form of WSAN has to be proposed through the redefinition of the storage and communication components so that they can be integrated in an infrastructure made available on demand to multiple users and applications simultaneously. Accordingly, we introduce in this paper the concept of cloud-based WSANs as a long-term storage
solution that relies on the integration of massively distributed networks of WBANs and cloud-based storage solutions. This is in line with the objective of optimizing the representation and the interrogation of the stored data sets, while being aware of the possible forms of dependencies that could exist between them. The contributions of this paper are fourfold:
1) A graph-based approach is proposed in order to represent and manage e-health data by considering big data requirements and by representing the different kinds of dependencies that may exist between the stored health data. Two types of graphs are proposed; namely, the TCG and the Global TCG (GTCG). While the former enables the representation of dependent e-health data stored in the same storage devices, the latter represents dependencies that exist between data belonging to different storage devices (i.e., between TCGs);
2) A hybrid architecture that integrates private and public cloud storage systems along with networks of WBANs is proposed as a cloud-based WSAN for the provision of comprehensive and easy-to-use e-health monitoring services. The WBANs are organized into sensor-cloudlets that are in charge of continuously collecting patients’ data as well as environmental parameters using smart sensors. The cloud-based WSANs refer to a set of active storage cloud platforms that are geographically distributed and provide real-time processing and retrieval of the collected data. A data interrogation model that allows the storage system to be interrogated effectively and supports the construction and management of the associated TCGs and GTCGs is also proposed;
3) The management of the stored health data is based on the content of TCGs and GTCGs, which are verified against a set of defined validation rules provided by a graph checker model that we propose. The validation rules make it possible to check the consistency of the represented data, the violation of health data privacy, and the accuracy of the collected data;
4) The linking property associated with the defined TCGs and GTCGs enables the tracing and propagation control of some dynamic phenomena. Therefore, the monitoring of the Ebola Virus Disease (EVD) is proposed as a case study in our approach in order to evaluate the performance of the proposed graphs in tracing the propagation of such an epidemic disease.
The remainder of this paper is structured as follows. In Section 2, we give a brief overview of the main related work that investigated the provision of a storage solution for healthcare and we highlight their major limitations using a comparative table. Section 3 details the set of requirements that should be met in order to properly manage big e-health data. The proposed WSAN-based architecture and the associated processing for the main components are then introduced in Section 4. Section 5 describes how temporal graphs facilitate an effective representation and querying of the collected and stored health data, in addition to the different kinds of dependencies that are represented through TCGs and GTCGs. In Section 7, the efficiency of the proposed architecture and the proposed graph checker is illustrated through a defined cloud-sensor health monitoring system and a set of scenarios for the monitoring and tracking of some diseases. In addition, a simulation is conducted in the same section in order to compare the graph-based approach with a traditional storage system in terms of data processing time and data dependency tracking. The last section concludes the paper.
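The distinction between the two graph types named in the first contribution can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names, the fields `nodes`, `edges`, and `cross_edges`, and the `kind` label are hypothetical, and the formal definitions of TCGs and GTCGs given later in the paper may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TCG:
    """Hypothetical TCG: dependencies between items on ONE storage device."""
    device_id: str
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target, kind)

    def add_dependency(self, source, target, kind):
        self.nodes.update((source, target))
        self.edges.add((source, target, kind))


@dataclass
class GTCG:
    """Hypothetical GTCG: dependencies between items held in different TCGs."""
    tcgs: dict = field(default_factory=dict)        # device_id -> TCG
    cross_edges: set = field(default_factory=set)   # ((dev, item), (dev, item), kind)

    def add_tcg(self, tcg):
        self.tcgs[tcg.device_id] = tcg

    def link(self, src_dev, src_item, dst_dev, dst_item, kind):
        if src_dev == dst_dev:
            # Same-device dependencies are the TCG's job in this sketch.
            raise ValueError("same-device dependencies belong in the TCG")
        for dev, item in ((src_dev, src_item), (dst_dev, dst_item)):
            if item not in self.tcgs[dev].nodes:
                raise KeyError(f"{item} is not represented in the TCG of {dev}")
        self.cross_edges.add(((src_dev, src_item), (dst_dev, dst_item), kind))
```

A usage example: two devices each hold a TCG, and the GTCG records that an item on the first device is related to an item on the second.

```python
tcg_a = TCG("sd1")
tcg_a.add_dependency("fever_p1", "visit_p1", "temporal")
tcg_b = TCG("sd2")
tcg_b.add_dependency("fever_p2", "visit_p2", "temporal")

gtcg = GTCG()
gtcg.add_tcg(tcg_a)
gtcg.add_tcg(tcg_b)
gtcg.link("sd1", "fever_p1", "sd2", "fever_p2", "spatial")
```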
2 Related Work
In this section, we provide a brief overview of the research works that investigated the provision of sensor-cloud storage solutions for e-health data as well as the management and representation of such data. A comparison between the selected works and the approach that we propose is provided in Table 1, based on a set of criteria including the support of big data management, the consideration of data dependency during either the storage or the querying process, and the type of data representation.
In [4], the concept of a WSAN Cloud is introduced in order to provide a Network as a Service (NaaS) to different applications, users, and data collection systems. The WSAN cloud infrastructure encompasses three main tiers. The first tier is a network of wireless embedded sensing devices which are in charge of
Table 1 Comparison between the current works and our approach

| Work | Big data management | Data interrogation | Data representation | Data processing type | Dependency-based storage | Dependency-based querying | Storage system | Provision for model validation |
|---|---|---|---|---|---|---|---|---|
| [9] | Yes | Yes | JSON format | Centralized | No | No | cloud | No |
| [15] | Yes | Yes | Relational model | Centralized | No | No | cloud | No |
| [21] | Yes | Yes | JSON format | Centralized | No | No | cloud | No |
| [7] | No | Yes | NA | Distributed/Centralized | No | Yes | cloud/WSN | No |
| [23] | Yes | NA | NA | Distributed | No | No | cloud | No |
| [6] | Yes | Yes | NA | Distributed/Centralized | No | Yes | cloud/WSN | No |
| [31] | No | Yes | Graph-based | Centralized | Yes | No | database | No |
| [12] | Yes | Yes | Graph-based | Distributed | Yes | Yes | cloud | No |
| [13] | Yes | Yes | Graph-based | Distributed | Yes | Yes | cloud | No |
| Our work | Yes | Yes | Graph-based | Distributed/Centralized | Yes | Yes | WSAN/WBAN/cloud | Yes |
performing sensing and transmission tasks. The second tier is the gateway that interconnects the sensor nodes to the back-end system and, compared to the first tier, is endowed with enhanced computation capabilities. The third tier is the back-end core located at the enterprise side. This tier is the main component and the most powerful in terms of computation capacity. It usually runs on the enterprise server and provides a platform for managing the system’s components interfacing with the WSAN domain. Automatic service provisioning is described through an Orchestration Model for Service Provisioning (OMSP). This paper only focuses on the description of the concept of the WSAN cloud as an on-demand infrastructure that allows users to access a shared pool of configurable storage and computing resources.
The authors in [9] present a hospital cloud architecture that is able to manage HL7-compliant big data. Health Level 7 (HL7) [27] refers to a set of international standards that facilitate the exchange of clinical and administrative health data between various healthcare providers. The proposed cloud architecture includes three main components: Ingest, Archival Storage, and Access. The Ingest component is in charge of receiving and preparing data for storage. The medical data can be produced by patients, medical instruments, doctors, or any other Health Information System (HIS). The Archival Storage component focuses on storing health data by managing the memorization, recovery, and long-term availability of data. The Access component is responsible for preparing data for transmission. This solution is designed following the directives for an Open Archival Information System (OAIS). To manage big data and facilitate interoperability, HL7 XML-to-JSON and JSON-to-XML parsers and the NoSQL distributed parallel database MongoDB were included [9].
In [15], a hybrid technological solution that stores e-health big data is proposed.
This data can be collected from different healthcare departments or at patients’ homes through heterogeneous medical devices and systems. The health monitoring system is composed of low-cost wearable sensing solutions (e.g., smart watches and bracelets) and is equipped with different wireless communication technologies that allow patients to be connected anytime and anywhere. The environment where such a system is deployed is characterized by an active broadband connection (e.g., Wi-Fi, mobile, and Bluetooth). A
neurologic Tele-Rehabilitation (TR) is considered as a use case to present and validate the proposed system. To describe the organization of health data and the schedule of their transmission, the authors use an Entity Relationship (E-R) model.
In [21], a hybrid storage architecture for storing huge amounts of data collected from different gathering interfaces is proposed. Document-oriented and object-oriented strategies are coupled to improve the storage, querying, and retrieval of data. The hybrid architecture is composed of two main components: the Data Manager and the Data Publishing. The Data Manager Component (DMC) is proposed to implement data abstraction and to provide a uniform semantic description of heterogeneous data. Sensor Web Enablement (SWE) specifications are used to extend the information schema of each object with context-aware metadata compliant with [21].
In [7], a WBAN-Cloud architecture composed of mobile cloudlets (i.e., networks of WBANs), private e-health clouds (i.e., private storage systems), and a public e-health cloud (i.e., a data sharing platform) is proposed to build epidemic disease databases in rural areas. The WBANs act as temporary databases that may return data streams in response to real-time queries. The proposal implements a querying approach that combines an SQL-based language, the Synchronized SQL (SyncSQL) language, with TinyDB, a distributed query processor for smart sensor networks. The declarative SyncSQL language is used to support the querying of continuous data streams, whereas TinyDB is used to support the interrogation of a network of geographically distributed sensors. Two levels of querying are supported. The Descendant Queries (DesQ) are completed using the outputs of cloud databases and can also be updated using real-time, streamed data generated by the WBAN nodes.
The Ascending Queries (AscQ) are implemented as read/write transactions issued from the WBAN nodes to interact with the private e-health clouds.
In [23], a storage system is made up of an application cluster that manages incoming user requests and a storage cluster that makes data storage available as a service. The latter operates on the basis of MongoDB and NoSQL technology and distributes the data sets over multiple servers. This work only focuses on the description of a private cloud solution
that implements attached storage systems (Network Attached Storage, NAS) to store e-health big data, without highlighting issues related to the complex management of such data. The management of big health data should deal with the efficient representation of heterogeneous, voluminous, and highly variable data.
The main purpose of all the previously described works is to propose storage systems for health data based on architectures that integrate networks of sensors (the medical data sources) and cloud computing (through which e-health services are provided). Although the visualization of heterogeneous types of data is one of the main requirements for properly managing e-health big data, it has been overlooked by all these solutions. For example, the solution in [9] only focuses on the storage and processing of one type of medical data, namely HL7. However, patients’ medical information may include medical images, doctors’ notes, and sensor readings. Each type of data has its own requirements and features that need to be considered by the proposed solution. In addition, representing interrelations and dependencies between medical data is another issue that has not been addressed. Such dependencies need to be carefully studied and tracked in order to properly manage patients’ health statuses as well as symptoms that could be associated with epidemic diseases. Therefore, an approach that considers and manages relationships and health data dependencies is needed.
In the literature, few works ([12, 23, 31]) have addressed the optimization of big data management in an environment where multiple dependent data, services, and applications are run. These works have investigated the integration of several storage solutions in a cloud-based environment to provide either temporary or permanent storage services. The description and the limitations of these works are detailed in the following.
In [6], an extension of the previously described querying solution is proposed. In particular, cooperative Sensor-Clouds are assumed to be deployed in areas where the communication infrastructure can be poor or even absent to facilitate the provision and composition of public safety services. A querying approach that combines an SQL-based language, which is the Continuous Query Language (CQL), and the TinyDB language is described. Such a querying approach allows the retrieval of data already collected from the networks of sensors and stored in the
sensor-cloud environment (i.e., in cloud databases) as well as the immediate processing of the incoming data streams before they are stored. In addition, a model for efficiently composing Sensor-Cloud services is provided through the consideration of both static and dynamic service composition. This model relies on an ontology-based query translation module that integrates a semantically rich conceptual model (i.e., a vocabulary) to facilitate the decomposition of complex queries into sub-queries that can be executed by cloudlets of sensor networks. Besides, a service composition management module is proposed to manage the generation of redundant and dependent data. This approach does not consider data representation. In addition, although data dependencies have been considered, the description of how such dependencies are traced is missing.
In [31], a graph-based database is proposed in order to store and retrieve biological data from heterogeneous and complex networks. The interrogation model is based on Neo4j, a frequently used graph database. To evaluate the performance of their approach, several biological data sets such as drug-target, protein-protein interaction, and gene-disease data are collected. The interrogation complexity of this work is compared with MySQL in diverse situations using the collected data and a set of relationships established between them. It was demonstrated that Neo4j is better than MySQL in terms of execution time when querying heterogeneous data with complex relationships. The limitation of this work is that the provided results are only valid for a single database, where dependent data collected from heterogeneous sources are stored.
In addition, the challenging issues related to the management of big e-health data that we discuss in this paper, such as the representation of data and their dependencies, the tracing of data dependencies, and the improvement of health data storage, are not taken into consideration.
In [12], a graph-based big data management scheme that extends the description of Conceptual Graphs is proposed. This work supports heterogeneous types of data (i.e., structured, semi-structured, or unstructured) by designing a structured querying language that exploits conceptual graphs and dynamic marks. In addition, to secure data management, an approach based on conceptual graphs is used in order to trace inter-related data and learn new attack strategies by considering mainly the
marks. However, as the proposed approach deals with huge volumes of data, the conceptual graph may become very large, and consequently its management and processing will be time-consuming [13].
In [13], the time feature was integrated into the conceptual graph and a TCG was introduced. The latter aims to optimize data storage and information extraction. It does not present all the data in the monitored system from the initial time; it only provides a snapshot of the storage environment at a given instant, which allows the size of the graph to be optimized. A TCG-based big data management approach has also been discussed for the reconstruction of attack scenarios using information contained in the big data. The efficiency of such a proposal was illustrated through an analytical study that showed an improvement in graph size. Therefore, the issue related to the processing time of the conceptual graph described in the previous work is handled. The efficiency of such a solution can be investigated in environments characterized by the presence of heterogeneous, dependent, and variable big data, as is the case for health monitoring applications.
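The idea that a temporal graph exposes only a snapshot at a given instant, rather than everything since the initial time, can be sketched as follows. The validity-interval encoding (`start`, `end`) and the names `TemporalEdge` and `snapshot` are assumptions made for illustration; the TCG of [13] may encode time differently.

```python
from typing import NamedTuple, Optional


class TemporalEdge(NamedTuple):
    """Hypothetical edge annotated with a validity interval."""
    source: str
    target: str
    start: float            # instant at which the relation appears
    end: Optional[float]    # None means the relation is still valid


def snapshot(edges, instant):
    """Return only the edges that are valid at the given instant."""
    return [
        edge for edge in edges
        if edge.start <= instant and (edge.end is None or instant < edge.end)
    ]


history = [
    TemporalEdge("p1", "ward_a", 0, 10),     # expired at t = 10
    TemporalEdge("p1", "ward_b", 10, None),  # still valid
    TemporalEdge("p2", "ward_b", 30, None),  # not yet valid at t = 20
]
current = snapshot(history, 20)
```

Here `current` holds a single edge out of three, which is the sense in which a snapshot keeps the graph to be processed smaller than the full history.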
3 Requirements for e-Healthcare Applications
In order to properly define an effective e-health data management system, the following requirements need to be met.
Store and process the voluminous e-health data in an efficient and scalable way: The massive usage of the newest technological trends for health monitoring and clinical analysis is leading to the production of highly varied and voluminous data, ranging from simple medical records to image and video data. In 2012, the volume of universal healthcare data was equal to 500 petabytes and it is expected to reach 25,000 petabytes in 2020 [23]. In addition, traditional storage systems have some inherent limitations in terms of storage scalability, storing and retrieving varied data formats, read/write speed, and correlating heterogeneous types of data. These limitations point out the need to design new e-health storage solutions that provide scalable, reliable, secure, and rich storage and querying services. Moreover, the large volume of e-health data generated at each instant, in addition to the dependency of some kinds of data, should be
considered during the storage and the interrogation of such data by adopting an efficient management scheme.
Cope with issues related to e-health data heterogeneity: Beyond traditional sources of health data generated from medical activities, it is currently possible to monitor patients’ health through all kinds of physical devices that are interconnected through public/private networks [14]. This data may be structured, semi-structured, or unstructured and describes a set of varied and heterogeneous parameters ranging from quantitative (e.g., sensors’ records, medical images, lab tests, and genomic data) to qualitative data sources (e.g., doctors’ notes, medical journals, history of diseases, and treatments) [12]. Thus, major problems in data storage for healthcare applications are related to the heterogeneity of information collected using different devices, from different systems, and in different places. A set of metadata should therefore be considered in order to represent the properties associated with each generated health data item. A marking scheme enables, in this case, the attachment of such information to each health data item.
Trace e-health data dependencies and consider data-time dependency: Data dependency is among the main issues that should be highlighted when dealing with monitoring applications such as healthcare. Different types of dependencies can exist between the data stored within the same or different storage systems (e.g., spatial, temporal, spatio-temporal, and even contextual). Therefore, the capability of tracing data dependencies in healthcare applications will facilitate, for example, the identification of the causes of diseases and the prevention of their occurrence and propagation, and will help in the provision of customized medical diagnosis [10].
In addition, due to the variable and complex nature of the monitored environment as well as the evolving nature of health-related data, time dependency is considered a key aspect for effectively analyzing time-dependent data. Indeed, health parameters tend to vary over time, and time traceability should be considered. A temporal relationship among the collected health parameters suggests the presence of a chronological evolution of their values over time. Therefore, such dependency through time should be considered in order to optimize interrogation and also to enhance storage capacities.
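The two requirements above, attaching metadata to each data item through marks and tracing the chronological evolution of a parameter, can be pictured together in a short sketch. The mark names (`patient`, `parameter`, `timestamp`, `source`) and the grouping rule are hypothetical choices made for illustration and are not the marking scheme defined by the authors.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MarkedItem:
    """A health data item together with its attached marks (metadata)."""
    value: object
    marks: dict = field(default_factory=dict)


def temporal_chains(items):
    """Group items by (patient, parameter) and order each group by time."""
    chains = defaultdict(list)
    for item in items:
        key = (item.marks["patient"], item.marks["parameter"])
        chains[key].append(item)
    for chain in chains.values():
        # Chronological order exposes how a parameter evolves over time.
        chain.sort(key=lambda item: item.marks["timestamp"])
    return dict(chains)


readings = [
    MarkedItem(39.5, {"patient": "p1", "parameter": "temperature",
                      "timestamp": 20, "source": "wban"}),
    MarkedItem(37.0, {"patient": "p1", "parameter": "temperature",
                      "timestamp": 10, "source": "wban"}),
    MarkedItem("persistent cough", {"patient": "p2", "parameter": "note",
                                    "timestamp": 15, "source": "doctor"}),
]
chains = temporal_chains(readings)
```

Because the marks travel with the item, structured sensor values and unstructured notes can share one representation while remaining distinguishable by their metadata.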
Protect health data privacy and data ownership confidentiality: Considering the ethical, social, and legal aspects of healthcare systems, the collected e-health data is highly sensitive and needs to be properly managed to guarantee the patient’s privacy [8, 11]. In addition, data sharing over cloud-based systems carries the risks of private data exposure, data theft or leakage, and data loss. Therefore, secure transmission, processing, and storage of e-health data have to be ensured. To guarantee data confidentiality, access to the stored sensitive data must be strictly restricted to authorized users. In addition, the integrity of the received data should be verified in order to prevent wrong diagnoses due to malicious or erroneous modifications. On the other hand, when dealing with inter-country cooperation, there is so far a lack of regulations mandating the protection of data privacy and governing the interchange of medical data between countries. Data protection standards and laws differ across countries, and cooperation may be governed by complex policies that make it extremely hard to achieve [2]. Therefore, it is essential to discuss an approach that ensures the protection of data privacy and that operates on the same level between different organizations, systems, and countries in order to facilitate the provision of a cooperative international e-health monitoring solution.
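The access restriction and integrity verification called for above can be illustrated with a minimal sketch based on Python's standard `hmac` and `hashlib` modules. The choice of HMAC-SHA256, the shared key, and the record layout are assumptions for illustration only; the paper does not prescribe a specific mechanism here.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the payload (assumed shared key)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(sign(payload, key), tag)


def read_record(user, record, authorized_users, key):
    """Return the payload only for authorized users and intact records."""
    if user not in authorized_users:
        raise PermissionError(f"{user} is not allowed to read this record")
    if not verify(record["payload"], record["tag"], key):
        raise ValueError("integrity check failed: record was modified")
    return record["payload"]


key = b"hypothetical-shared-key"
payload = b"patient=p1;temperature=39.5"
record = {"payload": payload, "tag": sign(payload, key)}
```

A modified payload produces a different tag, so `read_record` rejects it before it can reach a diagnosis step; key distribution and user authentication are outside the scope of this sketch.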
4 WSAN as a Storage Service for e-health Applications: The Proposed Architecture and Processing
In this section, we first describe the proposed e-health monitoring architecture that integrates networks of WBANs and cloud-based WSAN storage systems. Subsequently, we describe the processing performed for the different types of requests generated by the healthcare applications.
4.1 Architecture Description
The basic components of the proposed architecture are illustrated in Fig. 1 and are defined as follows:
WBANs: a set of physical wearable sensor nodes is attached to the patient’s body to form a WBAN that performs sensing and monitoring tasks. Among these sensors, we consider that multimedia
Fig. 1 Cloud-based WSAN architecture
sensors are used to enhance the monitoring quality through the generation of instantaneous, real-time image and video streams of the monitored patient. The WBANs are in charge of sensing and collecting health data regarding the monitored patients, such as their vital signs (e.g., temperature and blood saturation level) and their motion parameters, as well as environmental data (e.g., humidity and temperature level) to control and explore their surroundings. All these elementary sensors are connected through a main node, known as the Body Node Coordinator (BNC) [6]. The latter, compared to the sensors, is a powerful device that is responsible for collecting sensors’ readings, processing them in real time, storing them temporarily (if necessary), and transmitting them wirelessly to the remote healthcare processing unit. Commonly, the BNC can be a Personal Digital Assistant (PDA), a smart mobile phone, or any connected device. As delay-sensitive data may be generated, the WBANs have to ensure the wireless transmission of the
collected data to the remote destination within reasonable delays. In addition, a set of mobile WBANs can be organized into collaborative teams to perform a common monitoring/sensing task and to provide an ad-hoc communication infrastructure in infrastructure-less areas. All the collected health data forwarded to the corresponding healthcare department will be stored in a WSAN where they will be processed and analyzed.
WSANs: a set of storage devices (SDs) is interconnected through a wireless link to the main component, denoted as the Storage Volume Controller (SVC). These components form a WSAN. A set of servers is also attached to the WSAN in order to provide the required services that need to access the health data stored on the WSAN. As defined in [15], SAN systems are made of SDs, specialized networks for connecting SDs to servers, and the management layer required for setting up and maintaining connections between these servers and SDs. Based on this definition, a WSAN is built on
the basis of the same components; it ensures the communication between them over a wireless link and supports the particularities of this link. This infrastructure, and especially its wireless connectivity, is required to ensure the storage and the interaction with deployed WBANs that aim to collect a set of measures related to a patient. Cloud-based WSAN: The synergy between WSANs, sensor networks, and cloud computing provides a potential solution for managing e-health big data. Indeed, by integrating the WBANs and the WSANs in a cloud-based architecture, a set of logical entities is built on top of physical systems in order to provide sensing, monitoring, and storage services to a large number of users and applications. The collaborative teams of WBANs enable the building and deployment of mobile-cloudlets, which are dynamically grouped in response to a received request. A mobile-cloudlet is an architectural element of the cloud that brings the cloud capabilities closer to patients living in areas with poor communication infrastructures, such as rural or infrastructure-less areas, by offering omnipresent e-health services, minimizing hardware maintenance costs, and facilitating the free deployment of new, personalized, and low-cost services. All the collected data forwarded to the corresponding healthcare department will be stored in a WSAN where they will be processed and analyzed. The cloud-based WSAN storage solution is based on a shared pool of configurable resources that are solicited on an on-demand basis. There are three main component classes in a cloud-based WSAN system: the Storage Resource Broker (SRB), the servers, and the SDs. The role of each component is detailed in the sequel.
SRB (Storage Resource Broker): Brokerage is a crucial process that should be considered in order to effectively enable the delivery of cloud-based WSAN services over heterogeneous and distributed infrastructures. The SRB extends the objective of the SVC of a classic WSAN. It is responsible for service arbitrage, aggregation, and selection among a set of available services [6]. To address the high variability and heterogeneity of the monitored environment, two types of SRBs are used, namely the Master Storage
Resource Broker (MSRB) and the Local Storage Resource Broker (LSRB).
– The MSRB: represents the main component of the cloud-based WSAN system. It ensures the brokering between the different cloud WSANs and interacts with the set of local SRBs (LSRBs) attached to each WBAN. This component frequently interrogates the management SD in order to access or update the set of available TCGs and GTCGs when a request is issued by a healthcare application.
– The LSRB: compared to the previous type of broker, the LSRB allows more flexibility in the composition of a cloud-based WSAN service, since it is a mobile agent that adapts its behavior according to changes in its surrounding environment and available resources. The LSRB is a special type of WBAN that belongs to a mobile-cloudlet and is endowed with the capability of providing temporary storage and enhanced communication quality. Therefore, an LSRB plays an intermediate role between the WBANs and the MSRB in order to facilitate the dynamic composition of services. Such a role is crucial when a real-time data retrieval request is received by the MSRB and additional knowledge has to be provided by patients to properly diagnose and investigate a typical symptom. Upon reception of a new real-time data retrieval request, the MSRB distributes it over a set of candidate LSRBs. Each candidate LSRB is in charge of running the request inside its mobile-cloudlet by issuing a selection request in order to acquire health data from the group of WBANs that match the attributes referenced in the received request. The returned data are then collected by the local broker for real-time delivery to the MSRB. Depending on the received data, the LSRB can take immediate actions. For example, in case of an emergency, the LSRB can configure the monitoring system to change the sampling period in the WBANs and the threshold limits.
A set of structures is needed at the MSRB level in order to ensure the processing of incoming requests provided by healthcare applications attached to the WSAN: 1) a mapping table that holds, for each entry, the request and the associated SDs; 2) a mapping between a GTCG and the attached TCGs; and 3) a
structure that ensures the mapping of each high level health data to its physical allocation at the SD (e.g. File allocation Table).
SDs (Storage Devices): are devices used for storage, such as disk arrays and tape libraries. They are accessible to the servers so that they appear to the operating system as locally attached devices. Two types of SDs are considered for the WSAN system: 1) SDs used for storing health data and 2) management SDs that hold the TCGs used to represent health data and their dependencies. Servers: are connected to the WSAN and host the healthcare applications that are used by deployed WBANs to collect health data and provide healthcare services based on it.
4.2 Health Data Interrogation in Cloud Based WSAN
Since the MSRB is considered the main component of the cloud-based WSAN, the availability of this component should be ensured. When the current MSRB is unable to manage the incoming requests issued by the attached healthcare applications, the selection of a new MSRB among the available nodes is initiated. When the new MSRB is selected, a connection is established between the two nodes and the exchange of the set of structures used for the management of the health data is initiated. After the completion of this process, the current MSRB notifies the cloud-based WSAN components of the new MSRB. Upon reception of the acknowledgments from the notified nodes, the switch to the new MSRB is performed. The management SD is also among the critical components used by the MSRB. Therefore, a redundancy strategy should be implemented for this component in order to ensure the availability of the overall storage system. This strategy is based on the duplication of the content on several backup SDs defined at the SRB side for local health data and at the MSRB side for the different backup nodes attached to the available WSANs.
The processing of health data is performed mainly by the SRB components according to the different kinds of requests issued by the healthcare applications. In this section, the processing performed by the MSRB is detailed for the two types of incoming request: read requests and storage requests.
MSRB Processing for a Health Data Read Request: The processing performed by the MSRB for an incoming read request is given in Algorithm 1.

Algorithm 1 MSRB processing for incoming read requests
Input: an incoming request for processing a health data hd_i provided from a WBAN_k and generated by a sensor s_j
1. For each request, do
2.   dependent_hd_GTCG_set{} = fetch_dependent_health_data_from_GTCG(hd_i)
3.   for each health data hd_d_GTCG_j ∈ dependent_hd_GTCG_set{}, do
4.     dependent_hd_SD_TCG_set{} = fetch_dependent_health_data_from_TCG(hd_d_GTCG_j)
5.     hd_output_set{} = hd_output_set{} ∪ dependent_hd_SD_TCG_set{}
6.   end
7.   return hd_output_set{}
8. end

According to the proposed scheme, the set of dependent health data that belong to different WSAN SDs is first identified by checking the needed GTCG from the management SD (line 2). For each element of the identified result, the set of dependent health data that belong to the same SD is determined and then added to the result set in order to be processed by the MSRB (lines 3-5).
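The two-level lookup of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GTCG and the per-SD TCGs are modelled as plain dictionaries from a health data identifier to the identifiers it depends on, and the function and variable names are hypothetical rather than taken from the paper.

```python
from typing import Dict, Set


def process_read_request(hd_id: str,
                         gtcg: Dict[str, Set[str]],
                         tcg: Dict[str, Set[str]]) -> Set[str]:
    """Collect dependent health data across SDs first, then within each SD.

    gtcg and tcg are hypothetical in-memory stand-ins for the graphs held
    on the management SD (assumption, not the paper's storage format).
    """
    output: Set[str] = set()
    # Line 2 of Algorithm 1: dependencies recorded in the GTCG.
    for dependent in gtcg.get(hd_id, set()):
        # Lines 3-5: expand each dependent data through the TCG of its SD.
        output |= tcg.get(dependent, set())
    return output


gtcg = {"hd1": {"hd2", "hd7"}}
tcg = {"hd2": {"hd2", "hd3"}, "hd7": {"hd7", "hd8"}}
result = process_read_request("hd1", gtcg, tcg)
```

A request for an identifier that is unknown to the GTCG simply yields an empty set in this sketch; how the MSRB reports such a case is not specified in the text.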
MSRB Processing for an e-health Data Storage Request: The MSRB ensures the storage of health data on an SD by following the strategy detailed in this section. The proposed scheme aims to localize the SD that stores a set of dependent health data in order to enhance the access to such health data. The check is performed according to Algorithm 2 and is based on the marks attached to the set of GTCGs. Algorithm 2 identifies the available SDs that hold dependent health data associated with the issued storage request. The processing is initiated by the selection of the SDs available for storage and of the set of dependent SDs (lines 3-4). An ordering is then established on the selected SDs according to a set of criteria such as the signal level and the available free space of each SD (lines 5-9). After identifying the ordered set of available SDs, the next step consists in identifying the SDs needed for the issued request (lines 10-17).
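The selection strategy of Algorithm 2 can be illustrated by the following Python sketch. It is a simplified reading, not the authors' implementation: the `StorageDevice` fields, the ordering key (strongest signal first, then largest free space), and the function name are assumptions, and the number of required SDs is passed in directly instead of being computed from the request.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageDevice:
    sd_id: str
    signal_level: float
    free_space: float


def select_sds_for_storage(dependent: List[StorageDevice],
                           available: List[StorageDevice],
                           required: int,
                           min_signal: float,
                           min_free: float) -> List[StorageDevice]:
    # Lines 5-8: keep the dependent SDs that pass both thresholds.
    usable = [sd for sd in dependent
              if sd.signal_level > min_signal and sd.free_space > min_free]
    # Line 9: order the candidates (the ordering key is an assumption).
    ordered = sorted(usable,
                     key=lambda sd: (sd.signal_level, sd.free_space),
                     reverse=True)
    # Lines 11-14: take at most the required number of dependent SDs.
    selected = ordered[:required]
    # Lines 15-18: top up with other available SDs if too few were found.
    for sd in available:
        if len(selected) >= required:
            break
        if sd not in selected:
            selected.append(sd)
    return selected


a = StorageDevice("a", 0.9, 50.0)
b = StorageDevice("b", 0.2, 80.0)   # fails the signal threshold
c = StorageDevice("c", 0.7, 10.0)
chosen = select_sds_for_storage([a, b], [a, c], required=2,
                                min_signal=0.5, min_free=5.0)
```

In this sketch the dependent SDs are always preferred, so that related health data stay co-located whenever the thresholds allow it.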
5 A Conceptual Graph Approach for Health Data Representation and Management
We propose in this section to use conceptual graphs in order to optimize the representation and management of health data in a cloud-based WSAN environment.
5.1 Temporal Conceptual Graphs for WSAN
The management of health data using traditional management systems introduces a processing overhead that may not be tolerated by healthcare applications. In this context, a graph-based management model is proposed to manage health data, based on the TCGs proposed in [13]. Temporal information should be maintained in addition to the handled data, since the measures collected from the sensors deployed on a WBAN may be collected at different time instants, and an update of such measures may be obtained according to the changes in the patient status. The TCG is defined initially in [13] as follows: $TCG = (t, D(t), M(t), C(t), Map_1(t), Map_2(t), KB(t), Map_3(t))$, where t is the instant at which the TCG is generated; D(t) is the set of data handled at an instant t, which in this work represents the handled health data; M(t) represents the set of
defined marks that are available (created or updated) at an instant t and that represent the corresponding health data; C(t) is the set of concepts that are available (created or updated) at an instant t and that represent stored health data. A concept that is valid for an instant t may be attached to more than one mark; $Map_1(t)$ is a mapping relation that defines the correspondence between the marks and the health data that are available at an instant t. This mapping allows the identification of the needed health data (created, modified, removed, etc.) in response to queries issued by healthcare applications at an instant t; $Map_2(t)$ is a mapping relation, valid for an instant t, that defines the relationships between a mark $M_i^t$ and its concepts $C_{i1}^t, \ldots, C_{in}^t$. It is used for the search and retrieval process, since it holds the association between the mark and the different concepts that represent the same health data and that may be used as criteria in the query defined by the user; KB(t) is the knowledge database that includes the different concepts and the existing relationships between them. The links between the concepts are defined using the set of available predicates; $Map_3(t)$ is a mapping relation from a subset (application knowledge) of KB(t) to M(t). It holds the relationships that may exist between the different concepts belonging to the same mark. The TCG may also be defined as a temporal graph that is built on a linking property that enables the optimization of the storage of huge volumes of generated health data. In addition, the mark concept attached to the nodes of such a graph enables the management and the interrogation of both structured and unstructured health data by means of a structured querying language. The marking is performed by the healthcare application that generates the data, based on the KB(t) content associated with the handled health data.
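To make the roles of the components concrete, the TCG tuple can be pictured as a small Python data structure. This is a minimal sketch under assumptions: the field types (sets of identifiers, dictionaries for the three mappings, triples for KB(t)) and the helper `data_for_concept` are illustrative choices and are not prescribed by the definition in [13].

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (concept, predicate, concept), assumed KB form


@dataclass
class TCG:
    t: int                                                      # generation instant
    data: Set[str] = field(default_factory=set)                 # D(t)
    marks: Set[str] = field(default_factory=set)                # M(t)
    concepts: Set[str] = field(default_factory=set)             # C(t)
    map1: Dict[str, str] = field(default_factory=dict)          # mark -> health data
    map2: Dict[str, Set[str]] = field(default_factory=dict)     # mark -> concepts
    kb: Set[Triple] = field(default_factory=set)                # KB(t)
    map3: Dict[str, Set[Triple]] = field(default_factory=dict)  # mark -> KB subset

    def data_for_concept(self, concept: str) -> Set[str]:
        """Resolve a concept to health data through Map2, then Map1."""
        return {self.map1[m] for m, cs in self.map2.items()
                if concept in cs and m in self.map1}


tcg = TCG(t=1, data={"d1"}, marks={"m1"},
          concepts={"temperature", "patient"},
          map1={"m1": "d1"},
          map2={"m1": {"temperature", "patient"}})
found = tcg.data_for_concept("temperature")
```

The lookup goes through the marks only, which mirrors the idea that queries interrogate the mark structure rather than the stored health data itself.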
The set of concepts and predicates are determined by querying the KB(t). This process enables the attachment of a set of metadata to the generated health data. The temporal graph provides a snapshot of the whole graph at an instant t which reduces the size of data that should be handled at that instant. This graph integrates the set of concepts in addition to the defined predicates, the attached marks, and the related health data handled by healthcare applications. The temporal conceptual graph that is generated for an instant t and stored on the management SD, is
denoted as $TCG_t$. Similarly, the global temporal conceptual graph generated at an instant t is denoted as $GTCG_t$.
– The set of concepts that represents the stored health data is referred to as $C_t = \{c_1^t, \ldots, c_m^t\}$, where m is the total number of concepts.
– The set of relations between concepts is called $R_t = \{r_1^t, \ldots, r_k^t\}$, where k is the total number of relations that are defined for concepts.

Algorithm 2 MSRB processing for incoming storage requests
Input: an incoming storage request for a health data hd_i provided from a WBAN_k and generated by a sensor s_j
Goal: identify the available SD that holds dependent health data (generated from the same sensor that belongs to the same WBAN or by another sensor of the same WBAN)
1. set_available_dependent_SDs{} = φ;
2. set_dependent_SDs{} = φ;
3. set_available_SDs{} = check_available_SDs();
4. set_dependent_SDs{} = select_available_SDs(hd_i, WBAN_k, s_j);
5. For each SD ∈ set_dependent_SDs{}, do
6.   If (SD.signal_level > min_thres_signal_level) && (SD.min_free_space > min_thres_free_level);
7.   then set_available_dependent_SDs{} = set_available_dependent_SDs{} ∪ SD;
8. end
9. ordered_set_available_dependent_SDs{} = order_dependent_SDs(set_dependent_SDs);
10. #required_SDs = count_needed_SDs_card(ordered_set_available_dependent_SDs{}, st_i, hd_i);
11. If (card(ordered_set_available_dependent_SDs{}) > #required_SDs)
12. then return set_selected_SDs_for_storage{} = get(ordered_set_available_dependent_SDs, #required_SDs);
13. Else
14.   set_selected_SDs_for_storage{} = get(ordered_set_available_dependent_SDs, #required_SDs);
15.   For each SD ∉ (set_selected_SDs_for_storage{} ∩ set_available_SDs{})
16.     if (card(set_selected_SDs_for_storage{}) < #required_SDs)
17.     then set_selected_SDs_for_storage{} = set_selected_SDs_for_storage{} ∪ SD;
18.     else return set_selected_SDs_for_storage{};
19.   end
20. end
The GTCG is defined to ensure the representation of the structure of the overall storage system, which is composed of a set of SDs. In addition, it helps to identify the dependencies between the available SDs and their attached WBANs. A TCG represents the health data stored on the same SD and the relationships defined between them. A set of TCGs is stored and associated with an SD according to the following relation: $\{TCG_{1i}, \ldots, TCG_{ni}\} \rightarrow SD_i$. The same patient may have collected health data that are stored on different SDs. The dependency
between data that are stored on different SDs is represented as an edge of a Global TCG (GTCG) stored and managed by the SRB. This graph holds the following information on a health data $D_i$: $(D_i, SD_j, WBAN_k)$; at the SD side, however, only the associations between the WBAN and the health data $D_i$ are represented. Two temporal conceptual graphs $TCG_{t_1}$ and $TCG_{t_2}$ are linked if $TCG_{t_1} \cap TCG_{t_2} \neq \emptyset$, where at least one mark $m_{t_1} \in TCG_{t_1}$ is updated at $t_2$ ($t_2 > t_1$). To illustrate the importance of the proposed graph, this structure may hold a set of dependent health data that are stored on different SDs and that are semantically attached to the same patient, who may have visited several medical services in different hospitals. These health data are described by a set of concepts that represent the health data collected from the available sensors, in addition to a set of relations. As an example, the stored temperature data of a given patient may be described through a set of concepts that includes the following: temperature, patient, Celsius, etc. The location of the stored health data is included
in the mark structure that is attached to the node representing such data in the graph. This structure is interrogated instead of the data, which improves the response time for requests targeting huge volumes of data. In addition, even unstructured data (e.g., medical images) taken for the patient using different kinds of medical devices, which may not be explorable with a structured querying language, can be easily explored using the mark structure.
5.1.1 Mark Structure
A mark is a structure attached to the stored health data to facilitate its description using a set of metadata and to enhance the interrogation of TCGs. Two types of marks are considered for the defined TCGs and GTCGs according to the set of information available at each level. According to the specificity of health data, a mark is defined using the following structure, which is attached to a health data:
$mark_{TCG} = \langle mark\_id, health\_data\_id, sensor\_id, sensor\_type, WBAN\_id, concepts, predicates \rangle$
$mark_{GTCG} = \langle G\_mark\_id, WBAN\_id, sensor\_id, health\_data\_id, SD\_id, TCG\_id \rangle$
The mark generation process is initiated by the healthcare application, which generates the set of concepts that represents the processed health data in addition to the WBAN and sensor identifiers; these are communicated to the MSRB within the requested operation on the health data. The request generated by a healthcare application is represented as follows:
Fig. 2 Illustration of the mark structure
request = (id_request, application_id, WBAN_id, sensor_id, operation, health_data_id, concepts).
Figure 2 illustrates an example of a mark structure, $mark_{TCG}$, and the associated health data file. As described in this figure, the mark attached to a health data provides information on the location of the health data, its identifier, the identity of the sensor used to generate it, and the WBAN to which it belongs. Moreover, such a structure describes the health data through a set of concepts and predicates that help to better explore the generated TCGs, to localize the data needed for an incoming request, and to determine the dependent health data.
5.2 Representation of e-health Data Using TCG and GTCG
The health data collected through a set of sensors belonging to the same or to different WBANs may depend on each other according to the different kinds of dependencies that are detailed in this section and illustrated by Fig. 3.
WBAN Dependency An edge is defined between two data $D_i^t$ and $D_j^t$, collected respectively by sensors $s_i$ and $s_j$, that belong to the same $TCG_t$ if the two sensors belong to the same WBAN: $s_i \in WBAN_k$ and $s_j \in WBAN_k$. The dependency between health data should be represented at two levels according to the different types of dependencies that may be defined between health data. The dependency should be represented at the SD
Fig. 3 Health data dependencies
side when a set of dependent health data are stored on the same SD. In addition, the dependency should be represented at the SRB side when dependent health data are stored on different SDs that belong to the same WSAN. Parent/child Dependency Since the handled health data represent, in most cases, huge volumes of data, a fragmentation process is performed and a set of fragments is generated. These fragments may be stored on different storage devices, and the dependencies between them should be maintained through a TCG if all or some of the fragments belong to the same SD. A $TCG_{SD_i}$ is linked to a $TCG_{SD_j}$ if there exist at least two health data $D_1$ and $D_2$ that are linked (e.g., fragmented health data) and belong to different SDs, $SD_i$ and $SD_j$.
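The WBAN and parent/child dependencies described above can be sketched as two small Python helpers. The dictionary layouts (`sensor_of`, `wban_of`, `fragment_sd`) and the function names are hypothetical and serve only to illustrate where an edge is kept inside a TCG and where a link between the TCGs of two SDs is needed.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def wban_dependency_edges(sensor_of: Dict[str, str],
                          wban_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Edges between health data whose sensors belong to the same WBAN."""
    edges: Set[Tuple[str, str]] = set()
    for d_i, d_j in combinations(sorted(sensor_of), 2):
        if wban_of[sensor_of[d_i]] == wban_of[sensor_of[d_j]]:
            edges.add((d_i, d_j))
    return edges


def linked_sd_pairs(fragment_sd: Dict[str, Dict[str, str]]) -> Set[Tuple[str, str]]:
    """Pairs of SDs whose TCGs must be linked because they share fragments.

    fragment_sd maps a health data id to {fragment id: SD id} (assumed layout).
    """
    pairs: Set[Tuple[str, str]] = set()
    for fragments in fragment_sd.values():
        for sd_i, sd_j in combinations(sorted(set(fragments.values())), 2):
            pairs.add((sd_i, sd_j))
    return pairs


sensor_of = {"D1": "s1", "D2": "s2", "D3": "s3"}
wban_of = {"s1": "WBAN1", "s2": "WBAN1", "s3": "WBAN2"}
edges = wban_dependency_edges(sensor_of, wban_of)
links = linked_sd_pairs({"scan42": {"f1": "SD1", "f2": "SD2", "f3": "SD1"}})
```

Fragments that share an SD produce no cross-SD link in this sketch; their dependency would be kept inside the TCG of that SD.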
Conditional Dependency Another type of dependency between health data is defined based on the requirements of healthcare applications; it enables a dynamic structure for $TCG_{st}$ based on e-health service requirements and is illustrated by the relation defined between $D_2$ and $D_3$. As an example of a conditional dependency, consider the following scenario: $D_1$ and $D_2$ share the same type (e.g., temperature) and present a value that is larger than a defined threshold. In this case, the dependency between health data is defined according to the following condition: $(Type(D_i) \;\mathrm{eq}\; Type(D_j)) \wedge (D_i > th) \wedge (D_j > th)$. Based on the aforementioned types of health data dependencies, advanced medical services are provided in order to enable the monitoring of the patient's health status, the surveillance of parameters related to certain diseases, and the tracking of disease propagation through a population.
5.3 TCG/GTCG Generation Process
The generation of TCG and GTCG is performed by the MSRB in a dynamic manner by considering the incoming requests from the attached healthcare
applications through the available LSRBs. For a given request, the generation process performed by the MSRB follows Algorithm 3. The generation process of the GTCG and TCG is initiated for each incoming storage request. Based on the issued storage request and the set of SDs selected for storage as a result of the processing detailed in Algorithm 2, the mark is generated (line 1). Next, the existence of the associated TCG is checked, followed by the identification of the dependencies associated with the storage request (lines 2-4). The update of the TCG is then performed (line 5). The next step consists in generating the global mark and generating or updating the associated GTCG (lines 7-11). In order to reduce the processing time of a query issued by healthcare applications to the MSRB, the set of generated marks is regrouped. The grouping is performed by checking the similarity rate, which is calculated from the values of the different mark fields in addition to the available dependencies. This operation is performed by checking, for each mark, the set of related concepts. To illustrate this optimization process, let us define two marks $m_i$ and $m_j$ and their related sets of concepts $C_{m_i} = \{c_{1i}, \ldots, c_{ni}\}$ and $C_{m_j} = \{c_{1j}, \ldots, c_{mj}\}$, where n and m are their respective cardinalities. The two marks are regrouped if $C_{m_i} \cap C_{m_j} \neq \emptyset$. The result of the grouping is a mark $m_{ij}$ that has the following structure: (id_group_mark, id_marks, concepts). After regrouping the marks, an updated TCG is generated. In this TCG, the mark groups are considered as concepts, and the dependencies between them, if they exist, are represented as relations.
5.4 E-health Data Interrogation
Like big data, health data may be structured or unstructured. Therefore, the interrogation process requires a language that is able to handle these different types of data.
In this context, a structured language may be used if the interrogation is performed on marks instead of the attached health data. An SQL-like language has been initially proposed in [12] to interrogate big data. This structured query language for big data, called NSQL-DB, is used in this case to interrogate available marks through healthcare applications and
to explore available TCGs in order to determine the needed health data, whether it is structured or not. According to this language, a query is defined with the following syntax:
select concepts where (mark.field operator value | concept operator value | predicates)+
The querying is performed based on a combination of several criteria, including mark field values, concept values, and predicates that are evaluated for concepts. The + sign means that the query may include multiple criteria that are considered to fetch the associated health data. The operator keyword represents the basic operators such as =, <, >, etc., in addition to the proposed relations such as Boolean (AND, OR, and NOT), temporal adjacency (ADJ), and interval operators (DURING, OVERLAP, etc.). In response to a query, a set of concepts is identified in the processed TCGs, which enables the identification of the related predicates and marks that are attached to the requested health data. The dependency between the different stored TCGs is evaluated when the interrogation is initiated for the set of available TCGs. The TCG and GTCG contents may be updated using the NSQL-BD by issuing a query with the following syntax:
update mark | concept | predicate set concept = value | mark.field = value | predicate = value
According to the aforementioned query, the update may concern all components of the TCG, including the marks, the concepts, and the predicates, by assigning the defined values using the set keyword. If the specified element does not exist, it is added and the content of the graph is updated. For example, if the query specifies the following arguments for the set keyword: mark.id = val, mark.id_data = id, mark.concepts = body, hospital, temperature, then the mark id is checked first; if it does not exist, a new mark is created and the provided values are assigned to the mark fields.
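The select and update semantics described in this section can be mimicked on an in-memory list of marks. The sketch below is not an implementation of the query language of [12]: it only supports equality criteria on mark fields and concept membership, and the dictionary layout of a mark and both function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Set

Mark = Dict[str, Any]


def select_marks(marks: List[Mark],
                 field_equals: Dict[str, Any],
                 required_concepts: Set[str]) -> List[Mark]:
    """Return the marks matching every field criterion and every concept."""
    return [m for m in marks
            if all(m.get(k) == v for k, v in field_equals.items())
            and required_concepts <= set(m["concepts"])]


def upsert_mark(marks: List[Mark], mark_id: int, **fields: Any) -> Mark:
    """Update the mark with this id, or create it if it does not exist."""
    for m in marks:
        if m["id"] == mark_id:
            m.update(fields)
            return m
    new_mark: Mark = {"id": mark_id, **fields}
    marks.append(new_mark)
    return new_mark


marks: List[Mark] = [
    {"id": 1, "id_data": "d1", "sensor_type": "temperature",
     "concepts": {"body", "temperature"}},
    {"id": 2, "id_data": "d2", "sensor_type": "camera",
     "concepts": {"image", "chest"}},
]
hits = select_marks(marks, {"sensor_type": "temperature"}, {"body"})
created = upsert_mark(marks, 3, id_data="d3",
                      concepts={"body", "hospital", "temperature"})
```

Because the criteria are evaluated on mark fields only, the same filter applies to structured measurements and to unstructured data such as images.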
6 Graph checker for Temporal Conceptual Graphs
The validation of the graph content is among the tasks that should be performed in order to prevent the occurrence of some threats related to the health data processed by healthcare applications. In this section, the need for performing graph validation is explained
in addition to the set of rules and techniques used to evaluate the TCGs and GTCGs.

Algorithm 3 TCG/GTCG generation process
Input: an incoming storage request req_i for a health data hd_i at instant t_i according to the structure described in Section 5.1 and the set of selected SDs for storage set_selected_SDs_for_storage{}
1. m_i = generate_TCG_mark(req_i, set_selected_SDs_for_storage{});
2. TCG_id = fetch_TCG(t_i);
3. if (TCG_id ≠ 0) then
4.   set_dependency{} = check_dependency(m_i, TCG_id);
5.   update_TCG(TCG_id, m_i, set_dependency{});
6. else TCG_id = generate_TCG(m_i);
7. mg_i = generate_Global_mark(req_i, set_selected_SDs_for_storage{}, TCG_id);
8. GTCG_id = fetch_GTCG(t_i);
9. if (GTCG_id ≠ 0) then
10.   update_GTCG(GTCG_id, mg_i, set_dependency{});
11. else GTCG_id = generate_GTCG(mg_i);
12. return (GTCG_id, TCG_id)

6.1 Need for Graph Validation
The generated TCGs are used to represent health data in addition to the dependencies between them. These graphs are also interrogated in order to localize the set of health data needed by a healthcare application. These structures may include errors that prevent the correct processing of the healthcare applications or even introduce a denial of service (DoS) attack. For example, a missing edge between a mark and a health data node prevents the healthcare application from processing such data and accessing its metadata. Moreover, the privacy of such critical data may not be preserved when a user is able to gain access to a patient's health data while he should only be allowed to access the patient id or location. Privacy may also not be preserved if a doctor is able to access the health data of a patient who is not under his supervision. Checking the graph content also helps to verify the correctness of the dependency links between the different stored TCGs, since incorrect links may be a threat to the privacy of stored health data.
The validation of the graph may also help to check the consistency of the stored data on the set of WSAN attached SDs. The identification of redundant fragments that are stored on the same SD may help to identify a storage process that has been executed twice or more on the same SD for the same health data where there is no redundancy strategy that is configured in
this case. The verification of the graph content may also lead to the identification of errors that originate outside the WSAN, at the WBAN side. For example, battery depletion is among the threats for attached WBAN devices. In most cases, this problem is observed through corrupted data sent by the sensor or even through the absence of such data. Since the marking process is initiated at the WBAN side, it generates, as a consequence, a corrupted or missing mark. The validation of the graph content is used in this case to detect such failures of WBAN devices and may also prevent the realization of such a threat in some cases.
6.2 Graph Checker Principles
This tool checks the TCG/GTCG against a set of rules in order to determine whether the graph is correct or whether it contains semantic errors. Some examples of rules that are verified by the graph checker are listed in the following:
– An edge should not be defined between two successive relations.
– An edge should be added between a mark and a health data.
– An edge is required between a mark and a concept.
– A mark holds a set of concepts that should be identified in the TCG.
– A relation should be defined between two concepts.
Some specific rules are defined according to the graph type. For example, a GTCG may be checked against the following set of rules:
– A mark attached to a GTCG should include at least an SD_id.
– Two health data that belong to the same TCG should not be added to the GTCG.
A sample TCG that shows three errors from the aforementioned list (highlighted in red) is illustrated by Fig. 4. The first error corresponds to the missing edge between the mark $M_1^{t_1}$ and the health data $D_1^{t_1}$. The second error consists in the existence of an edge between two successive relations $R_2^{t_1}$ and $R_3^{t_1}$. The last error is related to the missing relation between the two concepts $C_3^{t_1}$ and $C_5^{t_1}$. A violation of any of the available rules means that the checked graph is not valid; the corresponding errors are then localized.
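A rule-based check of this kind can be sketched in Python as follows. The sketch covers only three of the listed rules (no edge between two relations, a mark must be linked to a health data, a mark must be linked to a concept); the graph encoding, with a node-type dictionary and undirected edges given as pairs, and the function name are assumptions rather than the checker's actual design.

```python
from typing import Dict, List, Set, Tuple


def check_tcg(node_type: Dict[str, str],
              edges: Set[Tuple[str, str]]) -> List[str]:
    """Return the list of rule violations found in a TCG (empty if valid)."""
    neighbours: Dict[str, Set[str]] = {n: set() for n in node_type}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    errors: List[str] = []
    # Rule: an edge should not be defined between two successive relations.
    for a, b in sorted(edges):
        if node_type[a] == "relation" and node_type[b] == "relation":
            errors.append(f"edge between relations {a} and {b}")
    # Rules: every mark needs an edge to a health data and to a concept.
    for node in sorted(node_type):
        if node_type[node] != "mark":
            continue
        kinds = {node_type[n] for n in neighbours[node]}
        if "data" not in kinds:
            errors.append(f"mark {node} has no edge to a health data")
        if "concept" not in kinds:
            errors.append(f"mark {node} has no edge to a concept")
    return errors


node_type = {"M1": "mark", "D1": "data", "C1": "concept",
             "C2": "concept", "R2": "relation", "R3": "relation"}
edges = {("M1", "C1"), ("C1", "R2"), ("R2", "R3"), ("R3", "C2")}
errors = check_tcg(node_type, edges)
```

The rule requiring a relation between two concepts is left out here because deciding which concept pairs must be related depends on the knowledge base, which this sketch does not model.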
A rule may be composed of a set of conditions that are verified for the graph and that may be represented as a predicate. A graph is considered consistent if a data node is attached at least to a concept and a predicate. This rule is expressed as a propositional logic formula $\Phi = edge(D, C) \wedge edge(C, R) \wedge edge(D, M)$. The graph is considered correct if $\Phi$ evaluates to true for a given state s related to the graph content, as described in Section 6.3, according to the following expression: $s \models \Phi$ iff $L(s) \models \Phi$. Given that $\Phi$ is a propositional logic formula, s satisfies $\Phi$ if the evaluation induced by L(s) makes $\Phi$ true. The verification of the graph may be performed in two ways: 1) the exploration of the content of the graph and the checking of the set of available rules for each part, considering the current node type; and 2) the interrogation of the graph and the verification of the obtained results against the defined set of rules. When the checking is based on the interrogation of the TCG, a set of queries should be defined in addition to the rules in order to explore the graph and to test whether the linking properties preserve or violate the privacy of the handled health data. A set of properties may also be checked for TCGs and GTCGs in order to verify whether these graphs are valid. The invariance property is among these properties, and it may be attached to different features of both health data and healthcare applications. For example, privacy should be preserved (invariance property) even if a query is executed by a healthcare application or a linking property is applied to available graphs. Privacy means in this context preventing unauthorized users and healthcare services from accessing the health data, in addition to restricting TCG linking and dependency checking.
6.3 Graph Management Specification as a Transition System
Fig. 4 Illustration of possible errors for generated TCGs
The management of TCGs and GTCGs by the graph checker is specified as a transition system where the states represent the graph content at an instant $t$. The initial state corresponds to the graph content at an instant $t_0$, which in most cases is the instant before a healthcare application performs an interrogation of
the handled graph. A transition in the proposed scheme corresponds to either: 1) a process that entails the generation, the update, and the checking of the graph; or 2) a process of dependency generation based on linking properties verified according to a set of criteria, such as the dependency between two health data generated by the same WBAN. The different kinds of dependencies have been detailed in Section 5.2. In the general case, a state represents the content of the graph obtained by activating a transition. As defined in [5], a transition system $TS$ is a tuple $(S, Act, \rightarrow, I, AP, L)$ where:
• $S$ is a set of states that are associated to the graph content at an instant $t$,
• $Act$ is a set of actions (querying or dependency verification) that enable the transition to another graph content,
• $\rightarrow \subseteq S \times Act \times S$ is a transition relation,
• $I \subseteq S$ is a set of initial states that corresponds in this work to the TCGs and GTCGs generated at instant $t$,
• $AP$ is a set of atomic propositions, and
• $L : S \rightarrow 2^{AP}$ is a labeling function.
The transition system starts in some initial state $s_0 \in I$ that represents the content of the stored graphs and evolves according to the transition relation $\rightarrow$. An initial state corresponds to the set of available TCG and GTCG contents defined at an instant $t$. $L(s)$ represents the atomic propositions $a \in AP$ that are satisfied by state $s$. As illustrated by Fig. 5, the elements of the transition system are defined as follows: $S = \{s_1, s_2, s_3, s_4\}$, $I = \{s_0\}$, $Act = \{A_1, A_2, A_3\}$. In this example, only two TCGs are considered: $TCG_{SD_i,t}$ and $TCG_{SD_j,t}$. The different states generated for the illustrated storage system are checked using the proposed graph checker, based on the rules defined in Section 6.2, in order to determine whether such graphs or parts of graphs are valid. As defined in [5], the following rules should be considered when dealing with a transition system: 1) if a state has more than one outgoing transition, the "next" transition is chosen in a nondeterministic manner; 2) when the set of initial states consists of more than one state, the start state is selected nondeterministically. By performing a check on
Fig. 5 Illustration of a sample transition system for graph management and checking
the content of both graphs $TCG_{SD_i,t_1}$ and $TCG_{SD_j,t_1}$ (activation of the transition $A_2$), a set of errors is identified for $TCG_{SD_j,t_1}$ and a new graph is then generated and associated to the state $s_4$. At this state, the generated graphs are ready for interrogation by e-health applications.
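The tuple $(S, Act, \rightarrow, I, AP, L)$ and the satisfaction relation $s \models \Phi$ iff $L(s) \models \Phi$ can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the state names `s_before` and `s_after`, the action name `check`, and the encoding of atomic propositions as strings are hypothetical and do not reproduce Fig. 5; the formula is restricted to a conjunction of atomic propositions, as in the consistency rule $edge(D, C) \wedge edge(C, R) \wedge edge(D, M)$.

```python
from dataclasses import dataclass


@dataclass
class TransitionSystem:
    states: set        # S
    actions: set       # Act
    transitions: set   # subset of S x Act x S, stored as (s, a, s') triples
    initial: set       # I, a subset of S
    labels: dict       # L: maps each state to the atomic propositions it satisfies

    def successors(self, state):
        """Return the (action, next_state) pairs reachable from `state`."""
        return {(a, t) for (s, a, t) in self.transitions if s == state}

    def satisfies(self, state, conjunction):
        """s |= Phi iff L(s) |= Phi, for Phi a conjunction of atomic propositions."""
        return all(p in self.labels[state] for p in conjunction)


# Consistency rule, encoded as the list of atomic propositions it conjoins.
PHI = ("edge(D,C)", "edge(C,R)", "edge(D,M)")

# Hypothetical two-state system: the graph lacks the data/mark edge before
# the checking transition and contains it afterwards.
ts = TransitionSystem(
    states={"s_before", "s_after"},
    actions={"check"},
    transitions={("s_before", "check", "s_after")},
    initial={"s_before"},
    labels={
        "s_before": {"edge(D,C)", "edge(C,R)"},
        "s_after": {"edge(D,C)", "edge(C,R)", "edge(D,M)"},
    },
)

print(ts.satisfies("s_before", PHI))  # False: edge(D,M) is missing
print(ts.satisfies("s_after", PHI))   # True
```

Storing the labels per state keeps the check independent of how the graph itself is stored: the checker only needs to know which atomic propositions hold in the current graph content.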
7 Graph-Based Approach Evaluation

In this section, we evaluate the proposed approach using an EVD monitoring application as a case study and a simulation scenario, in order to illustrate the efficiency of the proposed graph-based data representation approach in comparison to the traditional storage approach.

7.1 Case Study

In order to illustrate the efficiency of the graph-based health data management approach, we propose in this section a scenario of tracking the propagation of the
Ebola Virus Disease (EVD) that was initially detailed in [7]. In this scenario, a cloud-sensor health monitoring system is defined to continuously detect the first symptoms of EVD (e.g., fever, muscle aches, weakness, severe headache, and shivers) and to track the evolution of the disease. To build such a system, we consider that several networks of WBANs are geographically distributed in order to collect real-time health parameters, which are transmitted to the corresponding healthcare department (e.g., a hospital, a medical laboratory, or a clinic). A set of assumptions is considered for the system illustrated by Fig. 6:
• The monitored area is subdivided into adjacent sub-areas, each supervised by at least one cloud-based WSAN infrastructure.
• Two different types of WBANs are considered for each area: the Patient-WBAN system and the Medical-WBAN system.
• The Patient-WBAN system encompasses physiological sensors (e.g., temperature sensors, pressure sensors, and sweat sensors) and motion sensors (e.g., accelerometer).
• The Medical-WBAN system is embedded on medical staff, including doctors and nurses. Compared to a Patient-WBAN system, the Medical-WBAN encompasses additional sensors with enhanced communication capabilities and enriched functionalities (e.g., video sensors to capture real-time videos of the monitored patient/area).
• The first EVD symptoms may appear after an incubation period of 2 to 21 days.

Fig. 6 Illustration of the cloud-sensor health monitoring system
An illustration of a typical TCG is given in Fig. 7, where the health data $D_{i,t_f}$ collected at instant $t_f$ are represented by $TCG_{t_f}$. The latter is associated to the state $s_f$ and is given as a response to a request issued by an e-health application, for example to get the temperature of a given patient. In this case, $D_{i,t_f} = \{Temp1_{t_f}, Temp2_{t_f}, \ldots, Tempn_{t_f}\}$ refers to the temperature values collected from patients equipped with WBANs. As we are dealing in
this case study with the monitoring and tracking of the EVD, we consider that, in addition to the patient's body temperature, several vital parameters referring to the results of diagnostic lab tests (e.g., serologic testing for IgG and IgM or liver function testing) are also made available to healthcare providers. The graph checker verifies the graph by applying the following rules: a mark $M_{i,t_f}$ is attached to each health data $D_{i,t_f}$; a concept $C_{i,t_f}$ is associated to each mark $M_{i,t_f}$ and a relation $R_{i,t_f}$ is introduced between them. The different concepts $C_{i,t_f}$ are Temperature and Body. $IsP\text{-}WBAN(x)$ and $IsM\text{-}WBAN(y)$ are two relations referring to the fact that the collected data belong either to a Patient-WBAN or a Medical-WBAN system. If a form of dependency between data
represented in two different graphs exists, the corresponding health data will be represented by dashed lines. A set of rules has to be verified by the graph checker. The violation of these rules means that the checked graph is not valid and, therefore, the set of errors will be localized.

Rule 1: A health data $D_i$ should be associated to a mark $M_i$ for an instant $t$. Therefore, an edge between them should exist. Such a rule can be expressed using the following propositional logic formula, $\Phi_{R1} : edge(D_i, M_i)$. To verify the validity of $TCG_{t_f}$, this rule is checked for each health data $D_{i,t_f}$ in the graph:

$$\begin{aligned}
\Phi_{R1} :\; & edge(Temp1_{t_f}, M1_{t_f}) \wedge edge(Temp2_{t_f}, M2_{t_f}) \wedge edge(Temp3_{t_f}, M3_{t_f}) \\
& \wedge\; edge(Temp4_{t_f}, M4_{t_f}) \wedge edge(Temp5_{t_f}, M7_{t_f}) \wedge edge(Dset5_{t_f}, M6_{t_f}) \\
& \wedge\; edge(\text{X-ray}, M6_{t_f}) \wedge edge(liver, M6_{t_f}) \wedge edge(IgG, M6_{t_f}).
\end{aligned}$$
Fig. 7 Illustration of a sample TCG for health data
The graph is considered incorrect since $\Phi_{R1}$ evaluates to false for $s_f$.

Rule 2: A health data $D_{i,t}$ that has no dependency relationship should be associated to only one mark $M_i$. We consider that $Card(D_{i,t})$ refers to the number of marks attached to a health data. A valid TCG has to verify that $Card(D_{i,t}) = 1$ if the data is not linked to other TCGs. Another predicate used by this rule is $Dep(D_{i,t})$, which is set to true if the medical data $D_{i,t}$ has a dependency relationship. As no dependency for the health data $Temp1_{t_f}$ is depicted in $TCG_{t_f}$, the following propositional logic formula $\Phi_{R2}$ is checked: $\Phi_{R2} : edge(D_{i,t}, M_i) \wedge edge(D_{i,t}, M_j) \wedge Dep(D_{i,t}) \wedge (Card(D_{i,t}) > 1)$
$$\left\{\begin{array}{l}
edge(Temp1_{t_f}, M1_{t_f}) = True \;\wedge \\
edge(Temp1_{t_f}, M5_{t_f}) = True \;\wedge \\
Dep(D_i) = False \;\wedge \\
Card(Temp1_{t_f}) > 1
\end{array}\right. \Longrightarrow \Phi_{R2} \text{ is } False$$
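The checks behind Rules 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the graph checker itself: the graph is reduced to a set of undirected edges, the helper names (`edge`, `card`, `rule1`, `rule2`) are hypothetical, and the small graph below is a toy example rather than the graph of Fig. 7. Rule 2 is implemented from its prose statement (a data item without dependency carries exactly one mark) rather than from the exact shape of $\Phi_{R2}$. The double mark on `Temp1` (marks `M1` and `M5`) mirrors the evaluation above, while the missing `Temp2`/`M2` edge is an assumption made only to trigger Rule 1.

```python
def edge(edges, a, b):
    """True if an undirected edge links nodes a and b."""
    return (a, b) in edges or (b, a) in edges


def card(edges, marks, d):
    """Number of marks attached to the data node d (Card in the text)."""
    return sum(1 for m in marks if edge(edges, d, m))


def rule1(edges, expected_mark):
    """Rule 1: every data node is linked to its associated mark."""
    return all(edge(edges, d, m) for d, m in expected_mark.items())


def rule2(edges, marks, data, dependent):
    """Rule 2: a data node with no dependency carries exactly one mark."""
    return all(card(edges, marks, d) == 1 for d in data if d not in dependent)


marks = {"M1", "M2", "M5"}
expected_mark = {"Temp1": "M1", "Temp2": "M2"}  # data node -> its mark

# Toy faulty graph: Temp1 carries two marks, Temp2 carries none.
faulty = {("Temp1", "M1"), ("Temp1", "M5")}
# Toy repaired graph: one mark per data node.
repaired = {("Temp1", "M1"), ("Temp2", "M2")}

for name, graph in (("faulty", faulty), ("repaired", repaired)):
    ok1 = rule1(graph, expected_mark)
    ok2 = rule2(graph, marks, expected_mark, dependent=set())
    print(name, "Rule 1:", ok1, "Rule 2:", ok2)
```

Keeping each rule as a separate function makes it possible to report which rule failed, which is what the localization of errors requires.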
The graph is considered correct if $\Phi_{R2}$ evaluates to true for the state $s_f$ associated with the provided graph content, according to the following expression: $s_f \models \Phi_{R2}$ iff $L(s_f) \models \Phi_{R2}$. In our case, the graph is considered incorrect since $\Phi_{R2}$ evaluates to false for $s_f$.

Rule 3: Two concepts $C_i$ and $C_j$ should not be directly linked; a relation $R_k$ should exist and be linked to both concepts. Such a rule is verified by checking the logical formula $\Phi_{R3}$, which is expressed as: $\Phi_{R3} : edge(C_i, R_k) \wedge edge(C_j, R_k) \wedge \neg edge(C_i, C_j)$. In our case, the graph checker verifies this rule for the two concepts Temperature and Body:
$$\left\{\begin{array}{l}
edge(Temperature, Has\_Temperature()) = True \;\wedge \\
edge(Body, Has\_Temperature()) = True \;\wedge \\
\neg edge(Temperature, Body) = True
\end{array}\right. \Longrightarrow \Phi_{R3} \text{ is valid}$$

The graph is considered correct with respect to Rule 3 since $\Phi_{R3}$ evaluates to true for the state $s_f$.

Rule 4: A mark $M_i$ should be attached to one and only one health data $D_{i,t}$. The predicate $cardmark(M_i, 1)$ returns True if the number of data to which the mark is attached is equal to 1. $\Phi_{R4} : edge(D_{i,t}, M_i) \wedge edge(D_{j,t}, M_i) \wedge cardmark(M_i, 1)$ where $D_{i,t} \neq D_{j,t}$

$$\left\{\begin{array}{l}
edge(Temp1_{t_f}, M1_{t_f}) = True \;\wedge \\
edge(Temp2_{t_f}, M1_{t_f}) = True \;\wedge \\
cardmark(M1_{t_f}, 1) = False
\end{array}\right. \Longrightarrow \Phi_{R4} \text{ is not valid}$$

The graph is considered incorrect with respect to Rule 4 since $\Phi_{R4}$ evaluates to false for $s_f$. According to the aforementioned rules, the obtained graph $TCG_{t_f}$ is not valid since the overall verified rule $\Theta : \Phi_{R1} \wedge \Phi_{R2} \wedge \Phi_{R3} \wedge \Phi_{R4}$ is not valid (i.e., $\Phi_{R1} = False$, $\Phi_{R2} = False$, $\Phi_{R3} = True$, and $\Phi_{R4} = False$). Accordingly, a set of errors is identified for the defined graph and illustrated by Fig. 8.

In the following, and for the sake of simplicity, we pick the sub-graph illustrated in Fig. 9 to identify some medical scenarios that could be executed by medical staff to retrieve health data stored in the WSAN-cloud system and that require the e-health monitoring system to trace and identify the dependencies that could exist between the collected data. To trace such dependencies, a set of rules is also identified and validated by the model checker.

Scenario 1: The Monitoring of patient health status through WBANs: A form of WBAN dependency

The medical department (doctor or other medical staff) needs to find all possible health data stored in the cloud-based WSAN and belonging to the same patient (identified by $WBAN_{id}$) in order to track and monitor the patient's health status. Therefore, it needs to access the health data that have been previously collected and are now stored in the cloud-based WSAN. For example, doctors may request, in addition to the temperature measurements, several diagnostic lab tests such as serologic testing
Fig. 8 An example of rules violation
for IgG and IgM and liver function testing in order to check symptoms that could be linked to the EVD. These data might have been recorded either at the same location (i.e., belonging to the same SD) or at different locations (e.g., different hospitals of a given hospital chain, clinics, medical laboratories).

Fig. 9 Dependencies representation in $TCG_{t_f}$

More formally, the process of dependency tracking tries to find all sensor readings and medical notes that are associated to the same WBAN owner (i.e., patient) and stored (e.g., in case of frequent patient displacement) in different storage systems. Such a dependency can be traced using $TCG_{t_f}$ as shown in Fig. 9. Here,
$Temp5_{t_f}$ and $IgG5_{t_f}$ belong to the same patient (i.e., the same P_WBAN(2Yb)).

Rule 5: A Form of WBAN Dependency: if a data set $D_{i,t}$ is linked to another data set $D_{j,t}$ and both health data belong to the same WBAN, then a form of WBAN dependency is identified between these two data sets. This rule can be checked by verifying the logical formula: $\Phi_{R5} : edge(D_{i,t}, D_{j,t}) \wedge Same(ID(D_{i,t}, M_i), ID(D_{j,t}, M_j))$. The function $ID(D_{i,t}, M_i)$ returns the identification number of the WBAN from which the health data $D_{i,t}$ is generated. Such information can be easily retrieved from the mark $M_i$ attached to the data. The predicate $Same(x, y)$ is set to true if the value of $x$ is equal to $y$. In our scenario, the temperature value $Temp5_{t_f}$ depends on the health data set $Dset5_{t_f}$ since they belong to the same WBAN, which has the identification number 2Yb. Therefore, by checking the graph, the rule $\Phi_{R5} : edge(Temp5_{t_f}, Dset5_{t_f}) \wedge Same(ID(Temp5_{t_f}, M7_{t_f}), ID(Dset5_{t_f}, M6_{t_f}))$ is found valid and the dependency form is validated.

Scenario 2: Epidemic Evolution: A form of Conditional Dependency

The EVD spreads among human populations through direct contact with body fluids or secretions (e.g., blood) from living or deceased infected individuals, or through exposure to contaminated objects. Therefore,
the medical department needs to find all suspected patients that could be affected by the epidemic disease. The health data collected from these patients share some features. For example, the system checks the MSRB in order to find all possibly dependent data. Here the dependency links medical records that belong to different patients (e.g., different P_WBANs) but share the same condition (e.g., symptoms). In Fig. 9, a conditional dependency between two temperature measurements is represented by a double line link. These measurements are stored in two different SDs, belong to two different TCGs, and satisfy the following rule:

Rule 6: A Form of Conditional Dependency: if a data set $D_{i,t}$ is linked to another data set $D_{j,t}$ with a double line, $D_{i,t}$ and $D_{j,t}$ do not belong to the same WBAN, and a threshold $th$ is exceeded by both data, then a form of conditional dependency is identified between these two data sets. This rule can be checked by verifying the logical formula: $\Phi_{R6} : edge(D_{i,t}, D_{j,t}) \wedge \neg Same(ID(D_{i,t}, M_i), ID(D_{j,t}, M_j)) \wedge Thre(D_{i,t}, th) \wedge Thre(D_{j,t}, th)$. The predicate $Thre(D_{i,t}, th)$ is set to true if the value of $D_{i,t}$ is higher than $th$. Here, we consider that the collected temperature values $Temp5_{t_f}$ and $Temp2_{t_f}$ are linked to each other. These measurements belong to two different WBANs and they exceed a predefined threshold $th$. Accordingly, $\Phi_{R6}$ can be expressed as:
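$$\left\{\begin{array}{l}
edge(Temp2_{t_f}, Temp5_{t_f}) = True \;\wedge \\
\neg Same(ID(Temp2_{t_f}, M2_{t_f}), ID(Temp5_{t_f}, M7_{t_f})) = True \;\wedge \\
Thre(Temp2_{t_f}, th) = True \;\wedge \\
Thre(Temp5_{t_f}, th) = True
\end{array}\right. \Longrightarrow \Phi_{R6} \text{ is valid}$$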
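The two dependency tests $\Phi_{R5}$ and $\Phi_{R6}$ differ only in the comparison of WBAN identifiers and in the threshold condition, which the following sketch makes explicit. It is an illustration under stated assumptions: the helper names are hypothetical, a single edge set is used for both dashed and double-line links, the identifier `1Xa` assigned to the second patient is invented (only `2Yb` appears in the scenario), and the temperature values and the threshold of 38.0 are placeholders, not measured data.

```python
def edge(edges, a, b):
    """True if an undirected edge links nodes a and b."""
    return (a, b) in edges or (b, a) in edges


def wban_id(mark_of, wban_of_mark, d):
    """ID(D, M): WBAN identifier read from the mark attached to data d."""
    return wban_of_mark[mark_of[d]]


def rule5(edges, mark_of, wban_of_mark, di, dj):
    """WBAN dependency: linked data generated by the same WBAN."""
    same = wban_id(mark_of, wban_of_mark, di) == wban_id(mark_of, wban_of_mark, dj)
    return edge(edges, di, dj) and same


def rule6(edges, mark_of, wban_of_mark, value, di, dj, th):
    """Conditional dependency: linked data from different WBANs, both above th."""
    same = wban_id(mark_of, wban_of_mark, di) == wban_id(mark_of, wban_of_mark, dj)
    return edge(edges, di, dj) and not same and value[di] > th and value[dj] > th


mark_of = {"Temp5": "M7", "Dset5": "M6", "Temp2": "M2"}
wban_of_mark = {"M7": "2Yb", "M6": "2Yb", "M2": "1Xa"}  # "1Xa" is hypothetical
value = {"Temp2": 39.2, "Temp5": 39.5}                   # placeholder readings
edges = {("Temp5", "Dset5"), ("Temp2", "Temp5")}
TH = 38.0                                                # placeholder threshold

print(rule5(edges, mark_of, wban_of_mark, "Temp5", "Dset5"))          # True
print(rule6(edges, mark_of, wban_of_mark, value, "Temp2", "Temp5", TH))  # True
```

Reading the WBAN identifier through the mark, rather than storing it on the data node, follows the remark that this information is retrieved from the mark attached to the data.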
Scenario 3: Pattern Matching: A form of Fragmented Data Dependency

The doctor may need to access data that were collected and recorded shortly before the consultation. These data may be collected from any connected diagnostic device deployed in the hospital, in a medical laboratory, or at home during long-term monitoring, or from sensors implanted in or attached to a human body for therapy or diagnosis. As these devices may be located in the same or several
locations, fragmented data could be present and the dependency between them should be matched to properly identify the disease source and to predict its propagation. For example, since the incubation period of the EVD is 2 to 21 days, patients affected by the virus (but for whom no real symptoms have yet been noticed) can visit different medical departments and even travel from one country to another during this period, and therefore a set of fragments (generated from the same sensor node) may be stored on different storage devices. As illustrated in Fig. 9 and using the TCG, the parent/child dependencies between fragments associated to health data recorded in medical laboratories can be traced. By matching the dependency between the high temperature measurements of a given patient, the IgG testing, the liver functions, and the X-ray images, infected subjects are easily detected.

Rule 7: A Form of Fragmented Data Dependency: Let $D_{i,t}$ be a given health data represented by $TCG_t$ and $D_{j,t} = \{d_k \mid k = 1 \ldots n\}$ be a set of fragments of $D_{i,t}$. To verify that $D_{j,t}$ is a fragmentation of $D_{i,t}$, the following logical formula $\Phi_{R7}$ is checked for every element of $D_{j,t}$: $\Phi_{R7} : edge(D_{i,t}, d_k) \wedge Same(ID(D_{i,t}, M_i), ID(d_k, M_j)) \wedge IsPart(D_{i,t}, d_k)$, where the predicate $IsPart(D_{i,t}, d_k)$ is set to true if $d_k$ is included in $D_{i,t}$ ($d_k \subset D_i$). According to this rule, the execution of the medical task described by the third scenario consists in checking the validity of $\Phi_{R7}$. Here $D_{i,t}$ is $Dset5_{t_f}$ and $D_{j,t} = \{\text{x-ray}, liver, IgG\}$.
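The per-fragment check of $\Phi_{R7}$ can be sketched as a loop over the fragments. This is a simplified illustration: the function name `rule7` and the dictionaries are hypothetical, the WBAN identifier is stored directly per node instead of being read from the mark, each node's content is modeled as a set of item names, and inclusion is read as non-strict set inclusion (an assumption about the meaning of $\subset$ here).

```python
def edge(edges, a, b):
    """True if an undirected edge links nodes a and b."""
    return (a, b) in edges or (b, a) in edges


def rule7(edges, wban_of, content, parent, fragments):
    """Fragmented data dependency: each fragment is linked to the parent,
    comes from the same WBAN, and is included in the parent's content."""
    return all(
        edge(edges, parent, f)
        and wban_of[parent] == wban_of[f]
        and content[f] <= content[parent]   # IsPart(parent, f)
        for f in fragments
    )


content = {
    "Dset5": {"x-ray", "liver", "IgG"},
    "x-ray": {"x-ray"},
    "liver": {"liver"},
    "IgG": {"IgG"},
}
wban_of = {"Dset5": "2Yb", "x-ray": "2Yb", "liver": "2Yb", "IgG": "2Yb"}
edges = {("Dset5", "x-ray"), ("Dset5", "liver"), ("Dset5", "IgG")}
fragments = ["x-ray", "liver", "IgG"]

print(rule7(edges, wban_of, content, "Dset5", fragments))  # True
```

Because the rule is a conjunction over all fragments, a single missing edge or a fragment coming from another WBAN is enough to invalidate the dependency.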
$$\left\{\begin{array}{l}
edge(Dset5_{t_f}, \text{x-ray}) = True \;\wedge \\
edge(Dset5_{t_f}, liver) = True \;\wedge \\
edge(Dset5_{t_f}, IgG) = True \;\wedge \\
Same(ID(Dset5_{t_f}, M6_{t_f}), ID(\text{x-ray}, M6_{j})) = True \;\wedge \\
Same(ID(Dset5_{t_f}, M6_{t_f}), ID(liver, M6_{j})) = True \;\wedge \\
Same(ID(Dset5_{t_f}, M6_{t_f}), ID(IgG, M6_{j})) = True \;\wedge \\
IsPart(Dset5_{t_f}, \text{x-ray}) = True \;\wedge \\
IsPart(Dset5_{t_f}, liver) = True \;\wedge \\
IsPart(Dset5_{t_f}, IgG) = True
\end{array}\right. \Longrightarrow \Phi_{R7} \text{ is valid}$$

7.2 Performance Analysis

In order to illustrate the efficiency of the proposed graph-based approach for the management of health data on cloud-based WSANs, a simulation is conducted for a cloud-based WSAN composed of an MSRB, for different values of the number of SDs, the number of WBANs and the number of requests to be processed by the MSRB. The values used for the different simulation parameters are given in Table 2. The aim of the conducted simulation is to evaluate the gain in processing time of the proposed graph-based management scheme for health data compared to the traditional approach, in which the management is based on a set of structures defined at the MSRB level. These structures hold the mapping between the SDs and the set of stored health data. The first conducted simulation aims to evaluate the processing time of the approach based on the TCG/GTCG for the management of the health data
Table 2 Simulation parameters
#SD    #WBAN    #requests    #MSRB    #LSRB
100    40       200          1        40

for cloud-based WSAN compared to the traditional behavior of the MSRB without using the graph structure. The processing time obtained for the graph-based approach is illustrated by Fig. 10. The simulated MSRB processing is detailed in Section 4.2. The maximum processing time for a request is about $15 \times 10^{-6}$; it corresponds to a requested health data that has a large number of dependent health data to be fetched by the MSRB. The average value of the processing time is around $5 \times 10^{-6}$. For the MSRB processing performed without the use of the graph, the obtained processing time is illustrated by Fig. 11. With this approach, the MSRB takes more time to process the requests issued by the healthcare application than with the graph-based approach. The gain in processing time between the two approaches is illustrated by Fig. 12. The average of this gain is around 102, which means that the MSRB without the graph takes more time to process a request compared to the
Fig. 10 MSRB processing time with graph-based approach
graph-based approach. This gain increases with the number of requests to be processed by the MSRB. The storage approach performed by the MSRB according to the scheme detailed in Section 4.2 is evaluated in order to illustrate the efficiency of the proposed technique compared to the traditional
Fig. 11 MSRB Processing time without graph
processing of an MSRB without using a graph. Figure 13 shows that, without the graph, the distribution of the dependent health data is not performed in an efficient manner, since these data are not stored on the same SD, which degrades the processing performance of the MSRB when a read request is issued by the healthcare application. The SDs that are mentioned
Fig. 12 Processing time gain for the graph-based approach
in the figure below with an empty bar store health data that are not dependent. As a consequence, the dependent data are in this case dispatched over several SDs for the following parameters: #SDs = 100, #health data = 1000, #WBAN = 100.
Fig. 13 Distribution of the storage of dependent health data on SDs without graph
However, the distribution of the health data over the set of SDs is enhanced by the graph-based approach, which optimizes the processing time of requests accessing stored health data, since the requested dependent health data are fetched in most cases from the same SD, as shown by Fig. 14.
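The effect of co-locating dependent data can be illustrated with a toy model that counts how many SDs a read request has to contact. This is not the storage scheme of Section 4.2 nor the simulation behind Figs. 13 and 14: the round-robin placement standing in for the approach without graph, the function names, the item names other than `Temp5`, `IgG5` and `Temp2`, and the number of SDs are all assumptions chosen for illustration.

```python
def place_round_robin(items, n_sd):
    """Placement that ignores dependencies: items are spread over the SDs."""
    return {item: i % n_sd for i, item in enumerate(items)}


def place_by_group(groups, n_sd):
    """Dependency-aware placement: each group of dependent items shares one SD."""
    placement = {}
    for g, group in enumerate(groups):
        for item in group:
            placement[item] = g % n_sd
    return placement


def sds_touched(placement, group):
    """Number of distinct SDs contacted to read a whole dependency group."""
    return len({placement[item] for item in group})


# Hypothetical dependency groups, as they could be derived from a TCG.
groups = [["Temp5", "IgG5", "liver5", "xray5"], ["Temp2", "IgG2"]]
items = [item for group in groups for item in group]
N_SD = 4

without_graph = place_round_robin(items, N_SD)
with_graph = place_by_group(groups, N_SD)

for group in groups:
    print(group[0], sds_touched(without_graph, group), sds_touched(with_graph, group))
```

In this toy setting, reading the first group contacts four SDs under round-robin placement and a single SD under dependency-aware placement, which is the kind of difference the two distributions are meant to show.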
Fig. 14 Distribution of the storage of dependent health data on SDs with graph approach
8 Conclusion

In this paper, a novel cloud-based WSAN has been proposed for healthcare applications handling huge volumes of health data generated by deployed sensors belonging to a set of WBANs. This system provides data monitoring, storage, and querying as a service to different applications belonging to private or public clouds. In addition, the representation and the management of health data have been enhanced by adopting TCGs as a structure to represent such data. The proposed TCGs are considered as a tool to represent, trace and control the dependencies between the collected data. The evaluation of the proposed system has been conducted using a typical case study of an EVD monitoring system and a simulation. According to this evaluation, the proposed architecture optimizes the data processing time for attached healthcare applications and provides additional tracing capabilities that enable the monitoring of the propagation of diseases through a set of patients and populations. As future work, we plan to address the tracing of some kinds of diseases across different WSANs, in addition to the deployment of HDFS as a file system for the WSAN. Moreover, the processing of healthcare transactions, instead of elementary requests, will be among the aspects to be modeled for a cloud-based WSAN.
References

1. Abdullah, W.A.N.W., Yaakob, N., Elobaid, M.E., Warip, M.N.M., Sitti, A.Y.: Energy-efficient remote healthcare monitoring using iot: a review of trends and challenges. In: Proceedings of the International Conference on Internet of Things and Cloud Computing, ICC '16, pp. 29:1–29:8. ACM, New York (2016)
2. AbuKhousa, E., Mohamed, N., Jameela, A.-J.: E-health cloud: opportunities and challenges. Future Internet 12(4), 621–645 (2012)
3. Akintoye, S.B., Bagula, A.B., Djemaiel, Y., Boudriga, N.: Lightweight cloud computing for development: a graph based data model. In: Cunningham, P., Cunningham, M. (eds.) Proceedings of the 12th IST-Africa Conference, Namibia. IIMC International Information Management Corporation (2017)
4. Aslam, M.S., Rea, S., Pesch, D.: Provisioning within a wsan cloud concept. SIGBED Rev. 10(1), 48–53 (2013)
5. Baier, C., Katoen, J.-P.: Principles of Model Checking (Representation and Mind Series). MIT Press, Cambridge (2008)
6. Berrahal, S., Boudriga, N., Bagula, A.: Cooperative Sensor-Clouds for Public Safety Services in Infrastructure-Less Areas. In: Proceedings of the 22nd Asia-Pacific Conference on Communications (APCC), Indonesia, August 25 - 27 (2016)
7. Berrahal, S., Boudriga, N., Bagula, A.: Healthcare systems in rural areas: a cloud-sensor based approach for epidemic diseases management. In: Belqasmi, F., Glitho, R., Zennaro, M., Agueh, M. (eds.) Proceedings of the 7th EAI International Conference on e-Infrastructure and e-Services for Developing Countries (AFRICOMM), volume 171. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer (2015)
8. Butca, C.G., Suciu, G., Ochian, A., Fratu, O., Halunga, S.: Wearable Sensors and Cloud Platform for Monitoring Environmental Parameters in E-Health Applications. In: Proceedings of the 11th International Symposium on Electronics and Telecommunications (ISETC), Number 1 - 4, Romania (2014)
9. Celesti, A., Fazio, M., Romano, A., Villari, M.: Hospital Cloud-Based Archival Information System for the Efficient Management of Hl7 Big Data. In: Proceedings of the 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2016), Croatia, May 30 - June 3 (2016)
10. Chen, M., Ma, Y., Song, J., Lai, C.-F., Hu, B.: Smart clothing: connecting human with clouds and big data for sustainable health monitoring. Mobile Netw. Appl. 21(5), 825–845 (2016)
11. Djemaiel, Y., Boudriga, N., Zouaidi, S.: An intrusion tolerant transaction management model for wireless storage area networks. J Netw. Technol. 4(3), 127–138 (2013)
12. Djemaiel, Y., Essaddi, N., Boudriga, N.: Optimizing Big Data Management Using Conceptual Graphs: a Mark-Based Approach. In: Proceedings of the International Conference on Business Information Systems (BIS), pp. 1–12, Cyprus (2014)
13. Djemaiel, Y., Fessi, B.A., Boudriga, N.: A Mark Based-Temporal Conceptual Graphs for Enhancing Big Data Management and Attack Scenario Reconstruction. In: Proceedings of the International Conference on Business Information Systems (BIS), pp. 62–73, Poland (2015)
14. Doukas, C., Maglogiannis, I.: Managing Wearable Sensor Data through Cloud Computing. In: Proceedings of the IEEE Third International Conference on Cloud Computing Technology and Science (Cloudcom), Athens, Greece, 29 November - 1 (2011)
15. Fazio, M., Bramanti, A., Celesti, A., Bramanti, P., Villari, M.: A Hybrid Storage Service for the Management of Big E-Health Data: A Tele-Rehabilitation Case of Study. In: Proceedings of the 12th ACM Symposium on Qos and Security for Wireless and Mobile Networks (Q2SWinet '16), pp. 1–8, Malta, November 13 - 17 (2016)
16. Goli-Malekabadi, Z., Sargolzaei-Javan, M., Akbari, M.K.: An effective model for store and retrieve big health data in cloud computing. Comput. Methods Programs Biomed. 132(Supplement C), 75–82 (2016)
17. Jones, M., Kepner, J., Arcand, W., Bestor, D., Bergeron, B., Gadepally, V., Houle, M., Hubbell, M., Michaleas, P., Prout, A., Reuther, A., Samsi, S., Monticiollo, P.: Performance measurements of supercomputing and cloud storage solutions. arXiv:1708.00544 (2017)
18. Junghanns, M., Petermann, A., Neumann, M., Rahm, E.: Management and Analysis of Big Graph Data: Current Systems and Open Challenges, chapter Handbook of Big Data Technologies, pp. 457–505. Springer, Berlin (2017)
19. Lee, E.: Partitioning a graph into small pieces with applications to path transversal. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '17, pp. 1546–1558. Society for Industrial and Applied Mathematics, Philadelphia (2017)
20. Lim, Y., Park, J.: Sensor resource sharing approaches in sensor-cloud infrastructure. Int. J. Distrib. Sens. Netw. 10(4), 476090 (2014)
21. Fazio, A.P.M., Celesti, A., Villari, M.: Big data storage in the cloud for smart environment monitoring. In: Procedia Computer Science, editor Proceedings of the 6th International Conference on Ambient Systems, Networks and Technologies (ANT), vol. 52, pp. 500–506 (2015)
22. Arsenio, A., Sales, N., Remedios, O.: Wireless sensor and actuator system for smart irrigation on the cloud. 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT) 00, 693–698 (2015)
23. Pandey, M.K., Subbiah, K.: A Novel Storage Architecture for Facilitating Efficient Analytics of Health Informatics Big Data in Cloud. In: Proceedings of the IEEE International Conference on Computer and Information Technology (CIT), Fiji (2016)
24. Pantelopoulos, A., Bourbakis, N.G.: A survey on wearable sensor-based systems for health monitoring and prognosis. Trans. Sys., Man Cyber Part C 40(1), 1–12 (2010)
25. Robinson, I., Webber, J., Eifrem, E.: Graph Databases: New Opportunities for Connected Data. O'Reilly Media, Inc., Sebastopol (2015)
26. Strohbach, M., Daubert, J., Ravkin, H., Lischka, M.: New Horizons for a Data-Driven Economy, chapter Big Data Storage, pp. 119–141. Springer, Berlin (2016)
27. Viangteeravat, T., Anyanwu, M.N., Nagisetty, V.R., Kuscu, E., Sakauye, M.E., Duojiao, W.: Clinical data integration of distributed data sources using health level seven (hl7) v3-rim mapping. J. Clin. Bioinf. 1(1), 32 (2011)
28. Vilaplana, J., Solsona, F., Abella, F., Filgueira, R., Torrento, J.R.: The cloud paradigm applied to e-health. BMC Med. Inf. & Decision Making 13, 35 (2013)
29. Wei-Qi, W., Denny, J.C.: Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med. 7(1), 1–14 (2015)
30. Wullianallur, R., Raghupathi, V.: Big data analytics in healthcare: Promise and potential. Health Inform. Sci. Syst. 2(3), 1–10 (2014)
31. Yoon, B.-H., Kim, S.-K., Kim, S.-Y.: Use of graph database for the integration of heterogeneous biological data. Genome Inform. 15(1), 19–27 (2017)